Hi,
So I have a data stream application which pulls data from MongoDB using
CDC, and after the process runs for a few days it fails with the following
stacktrace:
com.mongodb.MongoCommandException: Command failed with error 286
(ChangeStreamHistoryLost): 'PlanExecutor error during aggregation :: caused
o-chairs for the stream processing track, and we would
love to see you there and hope that you will consider submitting a talk.
About the Streaming track:
There are many top-level ASF projects which focus on and push the envelope
for stream and event processing. ActiveMQ, Beam, Bookkeeper, Camel,
So my task is to set up Apache Flink on my Linux Ubuntu machine with two
databases, PostgreSQL and MySQL.
The two databases should be connected in such a way that any change or
update in my PostgreSQL database is immediately reflected in my MySQL
database.
But this is the error I'm e
Hi Nadia
You don't need to call `Env.execute("Flink CDC Job");` because the job will
be submitted after executing `Result.executeInsert("mysql_table");`.
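For reference, a minimal end-to-end sketch of that pattern (table names,
schemas and all connector options below are made-up placeholders; a
'postgres-cdc' source and a 'jdbc' sink are just one plausible way to wire
Postgres to MySQL):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class PostgresToMySqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical CDC source; the options depend on your Postgres setup.
        tEnv.executeSql(
            "CREATE TABLE pg_source (id INT, name STRING, "
          + "  PRIMARY KEY (id) NOT ENFORCED) WITH ("
          + "  'connector' = 'postgres-cdc', 'hostname' = 'localhost', "
          + "  'port' = '5432', 'username' = 'flink', 'password' = 'secret', "
          + "  'database-name' = 'mydb', 'schema-name' = 'public', "
          + "  'table-name' = 'users')");

        // Hypothetical JDBC sink pointing at MySQL.
        tEnv.executeSql(
            "CREATE TABLE mysql_table (id INT, name STRING, "
          + "  PRIMARY KEY (id) NOT ENFORCED) WITH ("
          + "  'connector' = 'jdbc', 'url' = 'jdbc:mysql://localhost:3306/mydb', "
          + "  'table-name' = 'users', 'username' = 'flink', "
          + "  'password' = 'secret')");

        Table result = tEnv.sqlQuery("SELECT id, name FROM pg_source");

        // executeInsert() submits the job by itself; no env.execute() needed.
        result.executeInsert("mysql_table");
    }
}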
Best,
Feng
On Mon, Nov 18, 2024 at 12:05 PM Nadia Mujeeb
wrote:
>
> So my task is to set up a Apache flink in my Linux Ubuntu wherein I sh
>
is open now through April 15, 2024. (This is two months earlier than last
year!)
I am one of the co-chairs for the stream processing track, and we would
love to see you there and hope that you will consider submitting a talk.
About the Streaming track:
There are many top-level ASF projects wh
Broadcast streaming join is a very interesting addition to streaming
SQL; I'm glad to see it's been brought up.
One of the major differences between streaming and batch is state.
A regular join uses "Keyed State" (the key is deduced from the join
condition), so for a regular broadcas
state/#important-considerations
David Anderson wrote on Friday, February 2, 2024 at 23:57:
> I've seen enough demand for a streaming broadcast join in the community to
> justify a FLIP -- I think it's a good idea, and look forward to the
> discussion.
>
> David
>
> On Fri, Feb 2, 2024 at 6:5
I've seen enough demand for a streaming broadcast join in the community to
justify a FLIP -- I think it's a good idea, and look forward to the
discussion.
David
On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote:
> +1 a FLIP for this topic.
>
>
> Best,
> Feng
>
>
>> 2. Stream processing represents a long-running scenario, and it is quite
>> difficult to determine whether a small table will become a large table
>> after a long period of operation.
>>
>> However, as you mentioned, join hints do indeed have their significance
>> in s
operation.
>
> However, as you mentioned, join hints do indeed have their significance in
> streaming. If you want to support the implementation of "join hints +
> broadcast join" in streaming, the changes I can currently think of include:
> 1. At optimizer, changing the exc
is quite
difficult to determine whether a small table will become a large table after a
long period of operation.
However, as you mentioned, join hints do indeed have their significance in
streaming. If you want to support the implementation of "join hints + broadcast
join" in stre
Hi Feng,
Thanks for your prompt response.
If we were to solve this in Flink, my high-level viewpoint is:
1. First, implement broadcast join in Flink streaming SQL so that it works
across the Table API (e.g. via `left.join(right, join_type="broadcast")`).
2. Then, support a Broadcast hint
pache.org/thread/ovyltrhztw7locn301f0wqfvlykw6l9z> for
> the FLIP that implemented the Broadcast join hint for batch sql:
>
>
> > But currently, only in batch the optimizer has different Join strategies
> for Join and
>
> > there is no choice of join strategies in th
f join strategies in the stream. The join hints listed in the current
FLIP should be ignored (maybe with a warning) in streaming mode. When, in the
future, streaming mode has a choice of join strategies, I think that's a good
time to discuss how the join hint can affect the streami
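For context, a hedged sketch of the BROADCAST hint as it works today in
batch mode only (the tables and datagen setup are made up); in streaming
mode the current optimizer simply ignores it, which is the gap being
discussed:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BroadcastHintBatchSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Two made-up tables backed by the datagen connector.
        tEnv.executeSql("CREATE TABLE fact_table (id INT, dim_id INT) WITH ("
                + "'connector' = 'datagen', 'number-of-rows' = '1000')");
        tEnv.executeSql("CREATE TABLE dim_table (id INT, name STRING) WITH ("
                + "'connector' = 'datagen', 'number-of-rows' = '10')");

        // Asks the batch optimizer to broadcast dim_table to all subtasks.
        tEnv.executeSql("SELECT /*+ BROADCAST(dim_table) */ f.id, d.name "
                + "FROM fact_table f JOIN dim_table d ON f.dim_id = d.id")
            .print();
    }
}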
a blog-post some time ago (see joins part) why upsert-kafka can be
useful when joining event tables:
https://www.ververica.com/blog/streaming-modes-of-flink-kafka-connectors
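As a rough illustration of that pattern (assuming a TableEnvironment
`tEnv` and an already-registered append-only `events` table; topic, schema
and broker address are placeholders), an upsert-kafka table keeps only the
latest row per key, so a join against it behaves like a join against a
continuously updated table:

// Hypothetical "latest state per key" table backed by upsert-kafka.
tEnv.executeSql(
    "CREATE TABLE users_latest ("
  + "  user_id STRING, user_name STRING, "
  + "  PRIMARY KEY (user_id) NOT ENFORCED"
  + ") WITH ("
  + "  'connector' = 'upsert-kafka', "
  + "  'topic' = 'users-compacted', "
  + "  'properties.bootstrap.servers' = 'localhost:9092', "
  + "  'key.format' = 'json', "
  + "  'value.format' = 'json')");

// Enrich the event stream with the latest user attributes.
tEnv.executeSql(
    "SELECT e.*, u.user_name "
  + "FROM events AS e JOIN users_latest AS u ON e.user_id = u.user_id");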
Best regards,
Alexey
On Wed, Aug 9, 2023 at 5:05 AM liu ron wrote:
> Hi, David
>
> Regarding the N-way join, this
023/08/04 08:21:51 Сыроватский Артем Иванович wrote:
> > > Hello, Flink community!
> > >
> > > I have some important use case for me, which shows extremely bad
> performance:
> > >
> > > * Streaming application
> > > * sql table api
ртем Иванович wrote:
> > Hello, Flink community!
> >
> > I have some important use case for me, which shows extremely bad
> > performance:
> >
> > * Streaming application
> > * sql table api
> > * 10 normal joins (state should be kept fore
operations and we are looking forward to the arrival of Flink 1.19 to help
solve your problem.
On 2023/08/04 08:21:51 Сыроватский Артем Иванович wrote:
> Hello, Flink community!
>
> I have an important use case that shows extremely bad performance:
>
> * Stream
airs for the stream processing track, and we would
love to see you there and hope that you will consider submitting a talk.
About the Streaming track:
There are many top-level ASF projects which focus on and push the envelope
for stream and event processing. ActiveMQ, Beam, Bookkeeper, Camel, Flink,
K
ableName, tenv.fromDataStream(XStream, schema))*
> then
> *tenv.toChangelogStream(tenv.sqlQuery(sql))*
> or
> *tenv.toDataStream(tenv.sqlQuery(sql))*
>
> My code works in streaming mode, but while in batch mode, I got the
> following error:
>
> org.apache.flink.client.progr
eated with
*tenv.createTemporaryView(tableName, tenv.fromDataStream(XStream, schema))*
then
*tenv.toChangelogStream(tenv.sqlQuery(sql))*
or
*tenv.toDataStream(tenv.sqlQuery(sql))*
My code works in streaming mode, but in batch mode I get the
following error:
org.apache.flink.client.program.ProgramInvocationExce
Hi Shammon
Are you suggesting that I use OVER and PARTITION BY, right? If so, I must
define an agg_func on a specific column.
For example, I have a product table.
Before PARTITION BY:
select user, product, amount
FROM product
After PARTITION BY:
select user, product, amount, FIRST_VA
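For what it's worth, the truncated query above could be completed roughly
like this (a sketch only, assuming a TableEnvironment `tEnv` and a
processing-time attribute `proctime` on the `product` table; note that
`user` is a reserved keyword in Flink SQL, hence the backticks):

tEnv.executeSql(
    "SELECT `user`, product, amount, "
  + "  FIRST_VALUE(amount) OVER ("
  + "    PARTITION BY product ORDER BY proctime "
  + "    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_amount "
  + "FROM product")
  .print();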
Hi hjw
To rescale data for the dim join, I think you can use `partition by` in SQL
before the `dim join`, which will redistribute data by a specific column. In
addition, you can add a cache for the `dim table` to improve performance too.
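A hedged sketch of the lookup join with the cache enabled (the `orders`
table, its `proctime` attribute, and all connection values are
assumptions; the cache options shown are the JDBC connector's):

// Dimension table with the JDBC lookup cache enabled.
tEnv.executeSql(
    "CREATE TABLE dim_table (id INT, name STRING) WITH ("
  + "  'connector' = 'jdbc', "
  + "  'url' = 'jdbc:mysql://localhost:3306/db', "
  + "  'table-name' = 'dim_table', "
  + "  'lookup.cache.max-rows' = '10000', "
  + "  'lookup.cache.ttl' = '10min')");

// Lookup ("dim") join against it, keyed on the join column.
tEnv.executeSql(
    "SELECT o.order_id, o.amount, d.name "
  + "FROM orders AS o "
  + "JOIN dim_table FOR SYSTEM_TIME AS OF o.proctime AS d "
  + "  ON o.dim_id = d.id");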
Best,
Shammon FY
On Tue, Apr 4, 2023 at 10:28 AM Hang Ruan wrote:
> Hi,
Hi, hjw,
IMO, parallelism 1 is enough for your job if we do not consider the sink. I
do not know why you need to set the lookup join operator's parallelism to 6.
The SQL planner decides the type of the edge and we cannot change it.
Maybe you could share the execution graph
For example, I create a Kafka source to subscribe to a topic that has one
partition and set the default parallelism of the job to 6. The next operator
after the Kafka source is a lookup join against a MySQL table. However, the
relationship between the Kafka source and the lookup join operator is Forward, so
Hi Chirag
CSVBulkWriter implements BulkWriter, and no special methods are added in
CSVBulkWriter, so I think you can use BulkWriter instead of CSVBulkWriter in
your application directly. You can give it a try, thanks.
Best,
Shammon
On Fri, Mar 10, 2023 at 4:05 PM Shammon FY wrote:
>
>
Thanks for the reply, Shammon. I looked at the DataStreamCsvITCase - it gives a
very good example. I can implement something similar. However, the
CSVBulkWriter that it uses to create a factory has default (package-private)
access, which can be accessed from this test case but not from my application.
Hi all,
One thing to note is that the CSVBulkReader does not support the
splittable property. Previously, with TextInputFormat, we were able to use
the block size to split files, but in the streaming world this is not available.
Regards
Ram
On Wed, Mar 8, 2023 at 7:22 AM yuxia wrote:
> Hi, as the
, March 7, 2023, 7:35:47 PM
Subject: CSV File Sink in Streaming Use Case
Hi,
I am working on a Java DataStream application and need to implement a File sink
with CSV format.
I see that I have two options here - Row and Bulk ( [
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connecto
Hi
You can create a `BulkWriter.Factory` which creates a `CsvBulkWriter`, and
create the `FileSink` with `FileSink.forBulkFormat`. You can see the details
in `DataStreamCsvITCase.testCustomBulkWriter`.
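A minimal sketch of that approach with a hand-rolled factory (records are
plain Strings here, CSV escaping is omitted, and the path is a placeholder):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.Path;

/** Writes each incoming String as one CSV line, standing in for the
 *  package-private CsvBulkWriter. */
public class CsvLineBulkWriterFactory implements BulkWriter.Factory<String> {

    @Override
    public BulkWriter<String> create(FSDataOutputStream out) throws IOException {
        return new BulkWriter<String>() {
            @Override
            public void addElement(String csvLine) throws IOException {
                out.write((csvLine + "\n").getBytes(StandardCharsets.UTF_8));
            }

            @Override
            public void flush() throws IOException {
                out.flush();
            }

            @Override
            public void finish() throws IOException {
                flush(); // the sink closes the stream itself
            }
        };
    }
}

// Usage:
// FileSink<String> sink = FileSink
//     .forBulkFormat(new Path("s3://bucket/csv-output"),
//                    new CsvLineBulkWriterFactory())
//     .build();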
Best,
Shammon
On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user
wrote:
> Hi,
>
> I am working o
Hi,
I am working on a Java DataStream application and need to implement a File sink
with CSV format.
I see that I have two options here - Row and Bulk
(https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1)
So for CSV file distribution wh
Do you have a great data streaming story to share?
We want to hear from you!
Speaking at Current 2023 is a great way to connect with hundreds of your
peers, become more involved in the data streaming community, and have a
public platform for you to share your story of the future of streaming and
On Flink Native K8s mode, the pod of JM and TM will disappear if the
> streaming job failed.Are there any ways to get the log of the failed
> Streaming job?
> I only think of a solution that is to mount job logs to NFS for
> persistence through pv-pvc defined in pod-template.
>
>
In Flink native K8s mode, the JM and TM pods disappear if the streaming job
fails. Are there any ways to get the logs of the failed streaming job?
The only solution I can think of is to mount job logs to NFS for persistence
through a PV/PVC defined in the pod template.
ENV:
Flink version:1.15.0
unning job graph in Flink UI?
>
> Best regards,
> Yuxia
>
> --
> From: "hjw"
> To: "User"
> Sent: Thursday, December 8, 2022, 12:05:00 AM
> Subject: How to set disableChaining like streaming multiple INSERT
> statements in a Stat
Could you please post the image of the running job graph in Flink UI?
Best regards,
Yuxia
From: "hjw"
To: "User"
Sent: Thursday, December 8, 2022, 12:05:00 AM
Subject: How to set disableChaining like streaming multiple INSERT statements in a
StatementSet ?
Hi,
I create
Hi,
I create a StatementSet that contains multiple INSERT statements.
I found that the multiple INSERT tasks are organized into an operator chain
when StatementSet.execute() is invoked.
How can I set disableChaining, like in streaming, for multiple INSERT
statements with the StatementSet API?
env
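As a hedged aside (not an authoritative answer): `StatementSet` has no
per-statement `disableChaining()`, but chaining can be turned off for the
whole pipeline through configuration, which at least separates the INSERT
branches into their own tasks. Table names are placeholders, and whether
every edge is affected may depend on the Flink version:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.TableEnvironment;

public class UnchainedStatementSetSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Disable operator chaining for the whole job.
        tEnv.getConfig().getConfiguration()
            .setString("pipeline.operator-chaining", "false");

        StatementSet set = tEnv.createStatementSet();
        set.addInsertSql("INSERT INTO sink_a SELECT * FROM source_table");
        set.addInsertSql("INSERT INTO sink_b SELECT * FROM source_table");
        set.execute();
    }
}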
KafkaSource to fully utilize
Kafka rack awareness.
2.
> But, this got me wondering...would it be possible to run a streaming app
> in an active/active mode, where in normal operation, half of the work was
> being done in each DC, and in failover, all of the work would automatically
> failover
that FLIP :)
Best regards,
Martijn
On Wed, Oct 5, 2022 at 10:00 AM Andrew Otto wrote:
> (Ah, note that I am considering very simple streaming apps here, e.g.
> event enrichment apps. No windowing or complex state. The only state is
> the Kafka offsets, which I suppose would a
(Ah, note that I am considering very simple streaming apps here, e.g. event
enrichment apps. No windowing or complex state. The only state is the
Kafka offsets, which I suppose would also have to be managed from Kafka,
not from Flink state.)
On Wed, Oct 5, 2022 at 9:54 AM Andrew Otto wrote
the multi-datacenter deployment architecture
of streaming applications.
A Kafka stretch cluster is one in which the brokers span multiple
datacenters, relying on the usual Kafka broker replication for multi DC
replication (rather than something like Kafka MirrorMaker). This is
feasible with Kafka to
Hey Pavel,
I was looking for something similar a while back and the best thing I came
up with was using the DataStream API to do all the shuffling and THEN
converting the stream to a table using fromDataStream/fromChangelogStream.
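A hedged sketch of that approach (broker address, topic and group id are
placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class RebalanceBeforeTableSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("input-topic")
                .setGroupId("rebalance-sketch")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Shuffle in the DataStream API so downstream work is not limited
        // to the topic's partition count, then hand over to the Table API.
        DataStream<String> shuffled = env
                .fromSource(source, WatermarkStrategy.noWatermarks(), "kafka")
                .rebalance();

        Table t = tEnv.fromDataStream(shuffled);
        tEnv.createTemporaryView("input", t);
        // ... continue with SQL over `input`.
    }
}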
On Wed, Oct 5, 2022 at 4:54 AM Pavel Penkov wrote:
> I have a ta
I have a table that reads a Kafka topic and effective parallelism is equal
to the number of Kafka partitions. Is there a way to reshuffle the data
like with DataStream API to increase effective parallelism?
This issue has been going on for a while, but I couldn't figure it out no
matter what I tried.
Some general info:
Flink 1.14.5 with checkpoint/HA storage in S3.
We have 3 jobs with identical code; the only difference is which Kafka
topic is read and what prefix is used in the S3 sink.
This means tha
Hi David,
Thanks for confirming my research. Adding some more up to date
documentation to that function seems like an easy first contribution.
Best,
Ben
r my test,
>> customised with my own MockEnvironment + Configuration. To my surprise, the
>> configuration is always empty. So I did some reading and debugging and came
>> across this:
>>
>>
>> https://github.com/apache/flink/blob/62786320eb55
surprise, the
> configuration is always empty. So I did some reading and debugging and came
> across this:
>
>
> https://github.com/apache/flink/blob/62786320eb555e36fe9fb82168fe97855dc54056/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractUdfStreamOp
, the
configuration is always empty. So I did some reading and debugging and came
across this:
https://github.com/apache/flink/blob/62786320eb555e36fe9fb82168fe97855dc54056/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractUdfStreamOperator.java#L100
Given that open
Hi Venkat,
I guess you're using another compression algorithm which isn't zlib, you'll
have to adapt the code to work with your algorithm of choice.
Kind regards,
Francis
On Fri, 22 Jul 2022 at 17:27, Ramana wrote:
> Hi Francis - Thanks for the snippet. I tried using the same, however I get
>
Hi Francis - Thanks for the snippet. I tried using the same, however I get
an error.
Following is the error -
java.util.zip.DataFormatException: incorrect header check.
I see multiple errors; I believe I am seeing this stack trace for every
message. Any idea as to what could be causing this?
T
Hi Venkat,
there's nothing that I know of, but I've written a zlib decompressor for
our payloads which was pretty straightforward.
public class ZlibDeserializationSchema extends
AbstractDeserializationSchema {
@Override
public byte[] deserialize(byte[] message) throws IOException {
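For completeness, one way the truncated snippet above might continue, using
java.util.zip.Inflater (a sketch, not the original code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

public class ZlibDeserializationSchema
        extends AbstractDeserializationSchema<byte[]> {

    @Override
    public byte[] deserialize(byte[] message) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(message);
        ByteArrayOutputStream out = new ByteArrayOutputStream(message.length * 4);
        byte[] buffer = new byte[8192];
        try {
            // Inflate the zlib-compressed payload chunk by chunk.
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; stop instead of spinning
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            // The "incorrect header check" case: the bytes are not zlib data.
            throw new IOException("Payload is not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}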
Hi - We have a requirement to read the compressed messages emitted from
RabbitMQ and have them processed using PyFlink. However, I am not
finding any out-of-the-box functionality in PyFlink that can help
decompress the messages.
Could anybody help me with an example of how to go about this?
Hi, vtygoss
> I'm working on migrating from full-data-pipeline(with spark) to
> incremental-data-pipeline(with flink cdc), and i met a problem about accuracy
> validation between pipeline based flink and spark.
Glad to hear that !
> For bounded data, it's simple to validate the two result se
Hi, all.
From my understanding, validating the accuracy of the sync pipeline requires
snapshotting the source and sink at certain points. It is just like having a
checkpoint that contains all the data at some point in time for both sink and
source. Then we can compare the contents of the checkpoints and find the
differ
I think that for unbounded data we can only check the result at one point in
time; that is what Watermark [1] does. What about tagging a point in time and
validating the data accuracy at that moment?
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/create/#watermar
It's a good question. Let me ping @Leonard to share more thoughts.
Best,
Shengkai
vtygoss wrote on Friday, May 20, 2022 at 16:04:
> Hi community!
>
>
> I'm working on migrating from full-data-pipeline(with spark) to
> incremental-data-pipeline(with flink cdc), and i met a problem about
> accuracy validation bet
Hi community!
I'm working on migrating from a full-data pipeline (with Spark) to an
incremental data pipeline (with Flink CDC), and I met a problem with accuracy
validation between the Flink-based and Spark-based pipelines.
For bounded data, it's simple to validate whether the two result sets are
consistent or not.
Hi everyone,
ApacheCon Asia [1] will feature the Streaming track for the second year.
Please don't hesitate to submit your proposal if there is an interesting
project or Flink experience you would like to share with us!
The conference will be online (virtual) and the talks will be pre-rec
>
> Best,
> Georg
>
> On Mon, May 9, 2022 at 14:45 Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>> Hi Georg,
>>
>> No they wouldn't. There is no capability out of the box that lets you
>> start Flink in streaming mode, run everythi
ility out of the box that lets you
> start Flink in streaming mode, run everything that's available at that
> moment and then stops when there's no data anymore. You would need to
> trigger the stop yourself.
>
> Best regards,
>
> Martijn
>
> On Fri, 6 May 202
Hi Georg,
No, they wouldn't. There is no capability out of the box that lets you start
Flink in streaming mode, run everything that's available at that moment, and
then stop when there's no more data. You would need to trigger the stop
yourself.
Best regards,
Martijn
On Fri, 6
Hi,
I would disagree:
In the case of Spark, it is a streaming application offering full
streaming semantics (but with lower cost and higher latency) as it triggers
less often. In particular, windowing and stateful semantics, as well as
late-arriving data, are handled automatically using the
e.org/docs/latest/structured-streaming-programming-guide.html#triggers
> offers a variety of triggers.
>
> In particular, it also has the "once" mode:
>
> *One-time micro-batch* The query will execute *only one* micro-batch to
> process all the available data and then stop on it
Hi,
spark
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
offers a variety of triggers.
In particular, it also has the "once" mode:
*One-time micro-batch* The query will execute *only one* micro-batch to
process all the available data and th
Thank you for the information!
From: Farouk
Sent: Thursday, April 21, 2022 1:14:00 AM
To: Aeden Jameson
Cc: Alexey Trenikhun ; Flink User Mail List
Subject: Re: Integration Test for Kafka Streaming job
Hi
I would recommend to use kafka-junit5 from salesforce
Hi
I would recommend using kafka-junit5 from Salesforce:
https://github.com/salesforce/kafka-junit
On top of that, you can use
org.apache.flink.runtime.minicluster.TestingMiniCluster.
Your stack should be complete.
Cheers
On Thu, Apr 21, 2022 at 07:10, Aeden Jameson wrote:
> I've had success
I've had success using Kafka for Junit,
https://github.com/mguenther/kafka-junit, for these kinds of tests.
On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote:
>
> Hello,
> We have Flink job that read data from multiple Kafka topics, transforms data
> and write in output Kafka topics. We wan
Hello,
We have a Flink job that reads data from multiple Kafka topics, transforms the
data, and writes to output Kafka topics. We want to write an integration test
for it. I've looked at KafkaTableITCase; we can do a similar setup of Kafka
topics and prepopulate data, but since in our case it is an endless stream, we n
?
Adrian
On Fri, Apr 8, 2022 at 8:20 PM Roman Khachatryan wrote:
> Hi Carlos,
>
> AFAIK, Flink FileSource is capable of checkpointing while reading the
> files (at least in Streaming Mode).
> As for the watermarks, I think FLIP-182 [1] could solve the problem;
> however, it
Hi Carlos,
AFAIK, Flink FileSource is capable of checkpointing while reading the
files (at least in Streaming Mode).
As for the watermarks, I think FLIP-182 [1] could solve the problem;
however, it's currently under development.
I'm also pulling in Arvid and Fabian who are more familia
Hi,
We have an in-house platform that we want to integrate with external
clients via HDFS. They have lots of existing files and they continuously
put more data to HDFS. Ideally, we would like to have a Flink job that
takes care of ingesting data as one of the requirements is to execute SQL
on top
-configuring-checkpointing
On 25/02/2022 08:18, Jin Yi wrote:
so we have a streaming job where the main work to be done is
processing infinite kafka sources. recently, i added a fromCollection
(finite) source to simply write some state once upon startup. this
all seems to work fine. the finite
One possible option is to look into the hybrid source released in Flink
1.14 to support your use-case:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/
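A hedged sketch of what that could look like for the finite-then-infinite
case above (paths, topic and broker are placeholders; the exact name of the
text-line format class differs slightly between Flink releases):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded part: one-off bootstrap data read from files.
        FileSource<String> bootstrap = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                                       new Path("/data/bootstrap"))
                .build();

        // Unbounded part: the regular Kafka topic.
        KafkaSource<String> kafka = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("events")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Read the bounded source to completion first, then switch to Kafka.
        HybridSource<String> hybrid =
                HybridSource.builder(bootstrap).addSource(kafka).build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid").print();
        env.execute("hybrid-source-sketch");
    }
}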
On Fri, Feb 25, 2022, 09:19 Jin Yi wrote:
> so we have a streaming job where the main w
So we have a streaming job where the main work to be done is processing
infinite Kafka sources. Recently, I added a fromCollection (finite) source
to simply write some state once upon startup. This all seems to work
fine. The finite source operators all finish, while all the infinite
source
Hi all,
I recently put up a question about a deduplication query related to a join
and realised that I was probably asking the wrong question. I'm using Flink
1.15-SNAPSHOT (97ddc39945cda9bf1f52ab159852fdb606201cf2) as we're using the
RabbitMQ connector with pyflink. We won't go to prod until 1.15
6:23 AM M Singh wrote:
Hi Folks:
I am trying to monitor a jdbc source and continuously streaming data in an
application using the jdbc connector. However, the application stops after
reading the data in the table.
I've checked the docs
(https://nightlies.apache.org/flink/flink-docs-m
Hi,
You can try flink's cdc connector [1] to see if it meets your needs.
[1] https://github.com/ververica/flink-cdc-connectors
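For example, a hedged sketch of a CDC table using that connector (MySQL
here purely for illustration; all connection values are placeholders, and a
TableEnvironment `tEnv` plus the flink-cdc connector dependency are
assumed). Unlike the plain JDBC source, it keeps streaming changes after the
initial snapshot:

tEnv.executeSql(
    "CREATE TABLE orders_cdc ("
  + "  id INT, amount DOUBLE, PRIMARY KEY (id) NOT ENFORCED"
  + ") WITH ("
  + "  'connector' = 'mysql-cdc', "
  + "  'hostname' = 'localhost', "
  + "  'port' = '3306', "
  + "  'username' = 'flink', "
  + "  'password' = 'secret', "
  + "  'database-name' = 'shop', "
  + "  'table-name' = 'orders')");

// The query keeps running and emits changes as they happen in the database.
tEnv.executeSql("SELECT * FROM orders_cdc").print();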
Best,
Guowei
On Mon, Feb 21, 2022 at 6:23 AM M Singh wrote:
> Hi Folks:
>
> I am trying to monitor a jdbc source and continuously streaming data in an
Hi Folks:
I am trying to monitor a JDBC source and continuously stream data in an
application using the JDBC connector. However, the application stops after
reading the data in the table.
I've checked the docs
(https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table
).
String executionPlan = env.getExecutionPlan();
env.execute();
logger.info("Started job; executionPlan={}", executionPlan);
On 14/02/2022 17:40, Shane Bishop wrote:
Hi all,
My team has started seeing the error "java.lang.IllegalStateException:
No operators defined in s
Hi all,
My team has started seeing the error "java.lang.IllegalStateException: No
operators defined in streaming topology. Cannot execute." However, even with
this error, the Flink application starts and runs fine, and the Flink job
renders fine in the Flink Dashboard.
Attached i
Hi
This is my first question to the community so welcome everyone :) On a
daily basis I’m using Apache Beam for developing streaming pipelines but I
would like to learn native Flink as well. I’m looking for examples on how
to write integration tests with full programmatic control over watermark
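One pattern that gives a test full programmatic control is a small source
that emits timestamps and watermarks explicitly; a hedged sketch
(SourceFunction is deprecated in recent Flink versions but still widely
used in tests):

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.watermark.Watermark;

/** Test source that decides element timestamps and watermark positions itself. */
public class ControlledWatermarkSource implements SourceFunction<Long> {

    @Override
    public void run(SourceContext<Long> ctx) {
        ctx.collectWithTimestamp(1L, 1_000L);
        ctx.collectWithTimestamp(2L, 2_000L);
        ctx.emitWatermark(new Watermark(2_000L)); // advance event time here
        ctx.collectWithTimestamp(3L, 3_000L);
        ctx.emitWatermark(new Watermark(Long.MAX_VALUE)); // fire remaining windows
    }

    @Override
    public void cancel() {
        // nothing to interrupt: run() returns after emitting the fixed elements
    }
}

In a test you would add this with env.addSource(new ControlledWatermarkSource())
and assert on the windowed output.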
Thanks
On Fri, Jan 28, 2022, 07:47 Caizhi Weng wrote:
> Hi!
>
> This job will work as long as your SQL statement is valid. Did you meet
> some difficulties? Or what is your concern? A record of 100K is sort of
> large, but I've seen quite a lot of jobs with such record size so it is OK.
>
> HG
Hi!
This job will work as long as your SQL statement is valid. Did you meet
some difficulties? Or what is your concern? A record of 100K is sort of
large, but I've seen quite a lot of jobs with such record size so it is OK.
HG wrote on Thursday, January 27, 2022 at 02:57:
> Hi,
>
> I need to calculate elapsed times b
Hi,
I need to calculate elapsed times between steps of a transaction.
Each step is an event. All steps belonging to a single transaction have the
same transaction id. Every event has a handling time.
All information is part of a large JSON structure.
But I can have the incoming source supply trans
DataStream API
--
Sender:Guowei Ma
Sent At:2022 Jan. 26 (Wed.) 21:51
Recipient:Shawn Du
Cc:user
Subject:Re: create savepoint on bounded source in streaming mode
Hi, Shawn
Thank you for your sharing. Unfortunately I do
ent At:2022 Jan. 26 (Wed.) 19:50
> Recipient:Shawn Du
> Cc:user
> Subject:Re: create savepoint on bounded source in streaming mode
>
> Hi,Shawn
>
> You want to use the correct state(n-1) for day n-1 and the full amount of
> data for day n to produce the correct state(n) for da
right!
--
Sender:Guowei Ma
Sent At:2022 Jan. 26 (Wed.) 19:50
Recipient:Shawn Du
Cc:user
Subject:Re: create savepoint on bounded source in streaming mode
Hi,Shawn
You want to use the correct state(n-1) for day n-1 and the
:
> Hi Gaowei,
>
> think the case:
> we have one streaming application built by flink, but kinds of
> reason, the event may be disordered or delayed terribly.
> we want to replay the data day by day(the data was processed like
> reordered.). it
Hi Gaowei,
Think of the case:
we have one streaming application built with Flink, but for various reasons
the events may be disordered or terribly delayed.
We want to replay the data day by day (the data is processed as if
reordered). It looks like a batch job but with
bounded source in streaming mode
Hi Shawn,
You could also take a look at the hybrid source[1]
Best,
Dawid
[1]https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
On 26/01/2022 08:39, Guowei Ma wrote:
Hi Shawn
Currently Flink can not trigger the sp at the end of
Thanks.
--
Sender:Guowei Ma
Sent At:2022 Jan. 26 (Wed.) 14:04
Recipient:Shawn Du
Cc:user
Subject:Re: create savepoint on bounded source in streaming mode
Hi, Shawn
I think Flink does not support this mechanism yet.
Would you like to share the scenari
any ideas?
>
>Thanks.
>
> --
> Sender:Guowei Ma
> Sent At:2022 Jan. 26 (Wed.) 14:04
> Recipient:Shawn Du
> Cc:user
> Subject:Re: create savepoint on bounded source in streaming mode
>
> Hi, Shawn
> I think Fl
in streaming mode
Hi, Shawn
I think Flink does not support this mechanism yet.
Would you like to share the scenario in which you need this savepoint at the
end of the bounded input?
Best,
Guowei
On Wed, Jan 26, 2022 at 1:50 PM Shawn Du wrote:
Hi experts,
assume I have several files and I want
lay these files in order in
> streaming mode and create a savepoint when files play at the end. it is
> possible?
> I wrote a simple test app, and job are finished when source is at the end.
> I have no chance to creat a savepoint. please help.
>
> Thanks
> Shawn
>
Hi experts,
assume I have several files and I want to replay these files in order in
streaming mode and create a savepoint when the files reach the end. Is that
possible? I wrote a simple test app, and the job finishes when the source
reaches the end. I have no chance to create a savepoint. Please help.
ink to output files in s3.
>>
>> When running in Streaming mode, it seems that parquet files are flushed
>> and rolled at each checkpoint. The result is a crazy high number of very
>> small parquet files which completely defeats the purpose of that format.
>>
>>
&
t; When running in Streaming mode, it seems that parquet files are flushed
> and rolled at each checkpoint. The result is a crazy high number of very
> small parquet files which completely defeats the purpose of that format.
>
>
> Is there a way to build larger output parquet files? O
Hello,
We’re trying to use a Parquet file sink to output files in s3.
When running in Streaming mode, it seems that parquet files are flushed and
rolled at each checkpoint. The result is a crazy high number of very small
parquet files which completely defeats the purpose of that format.
Is
Sink: SnowflakeSinkProvider(...) (2/3) (attempt #0) with attempt id
>> 8d046c60a84900cba31877ec28f81124 to ... (dataPort=64218) with allocation id
>> fe0a5941283557538901c8a9774a2584
>> 2021-12-22 09:25:28,890 INFO o.a.f.r.e.Execution ...-sink-batch -> Sink:
>> SnowflakeSink
8466a27572
> 2021-12-22 09:25:28,917 INFO Finished successfully with value: 0
> 2021-12-22 09:25:28,922 INFO o.a.f.r.e.ClusterEntrypoint Shutting
> StandaloneSessionClusterEntrypoint down with application status UNKNOWN.
> Diagnostics Cluster entrypoint has been closed externally..
>