Flink CDC 3.1 for MongoDB repeatedly failing in streaming mode after running for a few days despite setting heartbeat

2025-05-23 Thread Sachin Mittal
Hi, So I have a data stream application which pulls data from MongoDB using CDC, and after the process runs for a few days it fails with the following stacktrace: com.mongodb.MongoCommandException: Command failed with error 286 (ChangeStreamHistoryLost): 'PlanExecutor error during aggregation :: caused
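
ChangeStreamHistoryLost (error 286) means the connector's resume token has already aged out of MongoDB's oplog, so besides the heartbeat, the oplog retention itself usually has to grow. A minimal sketch of the heartbeat option on the SQL connector, assuming the `mongodb-cdc` option names from the Flink CDC docs; table name and fields here are hypothetical:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MongoCdcHeartbeatSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // 'heartbeat.interval.ms' keeps the resume token advancing while the
        // collection is idle. It does NOT help once the token has already aged
        // out of the oplog -- for that, the oplog retention itself must grow,
        // e.g. db.adminCommand({replSetResizeOplog: 1, size: 51200}) in MongoDB.
        tEnv.executeSql(
                "CREATE TABLE orders (\n"
                        + "  _id STRING,\n"
                        + "  amount DOUBLE,\n"
                        + "  PRIMARY KEY (_id) NOT ENFORCED\n"
                        + ") WITH (\n"
                        + "  'connector' = 'mongodb-cdc',\n"
                        + "  'hosts' = 'mongo-host:27017',\n"
                        + "  'database' = 'shop',\n"
                        + "  'collection' = 'orders',\n"
                        + "  'heartbeat.interval.ms' = '30000'\n"
                        + ")");
    }
}
```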

Announcing the Community Over Code 2025 Streaming Track

2025-04-04 Thread James Hughes
o-chairs for the stream processing track, and we would love to see you there and hope that you will consider submitting a talk. About the Streaming track: There are many top-level ASF projects which focus on and push the envelope for stream and event processing. ActiveMQ, Beam, Bookkeeper, Camel,

No operators defined in streaming topology. Cannot execute.

2024-11-18 Thread Nadia Mujeeb
So my task is to set up Apache Flink on my Ubuntu Linux machine with two databases, Postgres and MySQL. The two databases should be connected in such a way that any change or update in my Postgres database is immediately reflected in my MySQL database. But this is the error I'm e

Re: No operators defined in streaming topology. Cannot execute.

2024-11-18 Thread Feng Jin
Hi Nadia You don't need to call `Env.execute("Flink CDC Job");` because the job will be submitted after executing `Result.executeInsert("mysql_table");`. Best, Feng On Mon, Nov 18, 2024 at 12:05 PM Nadia Mujeeb wrote: > > So my task is to set up a Apache flink in my Linux Ubuntu wherein I sh
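
A minimal sketch of the pattern Feng describes: `executeInsert` submits the job itself, so a trailing `env.execute()` finds an empty DataStream topology and throws the "No operators defined" error. The table names here are hypothetical:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

public class CdcSyncJob {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // ... CREATE TABLE DDL for postgres_source / mysql_table omitted ...
        Table result = tEnv.sqlQuery("SELECT * FROM postgres_source");

        // executeInsert() submits the job -- no separate env.execute() needed.
        TableResult tr = result.executeInsert("mysql_table");
        // tr.await(); // optionally block until a bounded job finishes
    }
}
```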

Announcing the Community Over Code 2024 Streaming Track

2024-03-20 Thread James Hughes
> is open now through April 15, 2024. (This is two months earlier than last year!) I am one of the co-chairs for the stream processing track, and we would love to see you there and hope that you will consider submitting a talk. About the Streaming track: There are many top-level ASF projects wh

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Benchao Li
Broadcast streaming join is a very interesting addition to streaming SQL; I'm glad to see it's been brought up. One of the major differences between streaming and batch is state. Regular join uses "Keyed State" (the key is deduced from the join condition), so for a regular broadcas

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-03 Thread Shengkai Fang
state/#important-considerations David Anderson wrote on Fri, Feb 2, 2024 at 23:57: > I've seen enough demand for a streaming broadcast join in the community to > justify a FLIP -- I think it's a good idea, and look forward to the > discussion. > > David > > On Fri, Feb 2, 2024 at 6:5

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread David Anderson
I've seen enough demand for a streaming broadcast join in the community to justify a FLIP -- I think it's a good idea, and look forward to the discussion. David On Fri, Feb 2, 2024 at 6:55 AM Feng Jin wrote: > +1 a FLIP for this topic. > > > Best, > Feng > >

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Feng Jin
t; 2. Stream processing represents a long-running scenario, and it is quite >> difficult to determine whether a small table will become a large table >> after a long period of operation. >> >> However, as you mentioned, join hints do indeed have their significance >> in s

Re: Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Martijn Visser
operation. > > However, as you mentioned, join hints do indeed have their significance in > streaming. If you want to support the implementation of "join hints + > broadcast join" in streaming, the changes I can currently think of include: > 1. At optimizer, changing the exc

Re:Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-02 Thread Xuyang
is quite difficult to determine whether a small table will become a large table after a long period of operation. However, as you mentioned, join hints do indeed have their significance in streaming. If you want to support the implementation of "join hints + broadcast join" in stre

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Prabhjot Bharaj via user
Hi Feng, Thanks for your prompt response. If we were to solve this in Flink, my higher-level viewpoint is: 1. First, implement Broadcast join in Flink Streaming SQL so that it works across the Table API (e.g. via a `left.join(right, , join_type="broadcast")` 2. Then, support a Broadcast hint

Re: Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Feng Jin
pache.org/thread/ovyltrhztw7locn301f0wqfvlykw6l9z> for > the FLIP that implemented the Broadcast join hint for batch sql: > > > > But currently, only in batch the optimizer has different Join strategies > for Join and > > > there is no choice of join strategies in th

Case for a Broadcast join + hint in Flink Streaming SQL

2024-02-01 Thread Prabhjot Bharaj via user
f join strategies in the stream. The join hints listed in the current > flip should be ignored (maybe can be warned) in streaming mode. When in the > future the stream mode has the choice of join strategies, I think that's a good time > to discuss that the join hint can affect the streami
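
For reference, the batch-only hint that the thread contrasts with looks like this today; in streaming mode the optimizer has a single regular-join strategy and the hint is ignored, which is the gap being discussed. A sketch with hypothetical `fact`/`dim` tables:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BroadcastHintBatchOnly {
    public static void main(String[] args) {
        // Join hints (FLIP-229) are only honored by the batch optimizer.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        tEnv.executeSql(
                "SELECT /*+ BROADCAST(dim) */ f.id, dim.name "
                        + "FROM fact f JOIN dim ON f.dim_id = dim.id")
            .print();
    }
}
```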

Re: Streaming join performance

2023-08-14 Thread Alexey Novakov via user
a blog-post some time ago (see joins part) why upsert-kafka can be useful when joining event tables: https://www.ververica.com/blog/streaming-modes-of-flink-kafka-connectors Best regards, Alexey On Wed, Aug 9, 2023 at 5:05 AM liu ron wrote: > Hi, David > > Regarding the N-way join, this

Re: Streaming join performance

2023-08-08 Thread liu ron
023/08/04 08:21:51 Сыроватский Артем Иванович wrote: > > > Hello, Flink community! > > > > > > I have some important use case for me, which shows extremely bad > performance: > > > > > > * Streaming application > > > * sql table api &

Re: Streaming join performance

2023-08-08 Thread David Anderson
ртем Иванович wrote: > > Hello, Flink community! > > > > I have some important use case for me, which shows extremely bad > > performance: > > > > * Streaming application > > * sql table api > > * 10 normal joins (state should be kept fore

RE: Streaming join performance

2023-08-05 Thread shuai xu
operations and we are looking forward to the arrival of Flink 1.19 to help solve your problem. On 2023/08/04 08:21:51 Сыроватский Артем Иванович wrote: > Hello, Flink community! > > I have some important use case for me, which shows extremely bad performance: > > * Stream

Announcing the Community Over Code 2023 Streaming Track

2023-06-09 Thread James Hughes
airs for the stream processing track, and we would love to see you there and hope that you will consider submitting a talk. About the Streaming track: There are many top-level ASF projects which focus on and push the envelope for stream and event processing. ActiveMQ, Beam, Bookkeeper, Camel, Flink, K

Re: Table API behaves differently in STREAMING mode v.s. BATCH mode

2023-05-04 Thread Shammon FY
ableName, tenv.fromDataStream(XStream, schema))* > then > *tenv.toChangelogStream(tenv.sqlQuery(sql))* > or > *tenv.toDataStream(tenv.sqlQuery(sql))* > > My code works in streaming mode, but while in batch mode, I got the > following error: > > org.apache.flink.client.progr

Table API behaves differently in STREAMING mode v.s. BATCH mode

2023-05-01 Thread Luke Xiong
eated with *tenv.createTemporaryView(tableName, tenv.fromDataStream(XStream, schema))* then *tenv.toChangelogStream(tenv.sqlQuery(sql))* or *tenv.toDataStream(tenv.sqlQuery(sql))* My code works in streaming mode, but while in batch mode, I got the following error: org.apache.flink.client.program.ProgramInvocationExce
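
The snippet doesn't show how the environments were created, and one thing worth checking when mixing the DataStream and Table APIs is that both sides agree on batch mode. A minimal sketch of aligning them, which may or may not be the cause of the error above:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class BatchModeTableEnv {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Both the DataStream runtime mode and the table settings must be
        // batch; a mismatch between the two is a common source of errors.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        StreamTableEnvironment tEnv =
                StreamTableEnvironment.create(env, EnvironmentSettings.inBatchMode());
    }
}
```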

Re: Re: How to set rebalance in Flink SQL like Streaming API?

2023-04-04 Thread hjw
Hi Shammon Are you suggesting that I use OVER and PARTITION BY, right? If it is like this, I must define an agg_func on a specific column. For example, I have a product table. Before partition by: select user,product,amount FROM product After partition by: select user,product,amount, FIRST_VA

Re: How to set rebalance in Flink SQL like Streaming API?

2023-04-03 Thread Shammon FY
Hi hjw To rescale data for the dim join, I think you can use `partition by` in SQL before the `dim join`, which will redistribute data by a specific column. In addition, you can add a cache for the `dim table` to improve performance too. Best, Shammon FY On Tue, Apr 4, 2023 at 10:28 AM Hang Ruan wrote: > Hi,

Re: How to set rebalance in Flink SQL like Streaming API?

2023-04-03 Thread Hang Ruan
Hi, hjw, IMO, I think parallelism 1 is enough for your job if we do not consider the sink. I do not know why you need to set the lookup join operator's parallelism to 6. The SQL planner will help us to decide the type of the edge and we cannot change it. Maybe you could share the Execution graph

How to set rebalance in Flink SQL like Streaming API?

2023-04-03 Thread hjw
For example: I create a Kafka source that subscribes to a topic with one partition and set the default parallelism of the job to 6. The next operator after the Kafka source is a lookup join against a MySQL table. However, the relationship between the Kafka source and the lookup join operator is Forward, so

Re: CSV File Sink in Streaming Use Case

2023-03-12 Thread Shammon FY
Hi Chirag CSVBulkWriter implements BulkWriter and no special methods are added in CSVBulkWriter. I think you can use BulkWriter instead of CSVBulkWriter in your application directly. You can have a try, thanks Best, Shammon On Fri, Mar 10, 2023 at 4:05 PM Shammon FY wrote: > >

Re: CSV File Sink in Streaming Use Case

2023-03-10 Thread Chirag Dewan via user
Thanks for the reply Shammon. I looked at the DataStreamCsvITCase - it gives a very good example. I can implement something similar. However, the CSVBulkWriter that it uses to create a factory has default (package-private) access, which can be accessed from this test case, but not from my application.

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread ramkrishna vasudevan
Hi all, One thing to note is that the CSVBulkReader does not support the splittable property. Previously, with TextInputFormat, we were able to use the block size to split them, but in the streaming world this is not there. Regards Ram On Wed, Mar 8, 2023 at 7:22 AM yuxia wrote: > Hi, as the

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread yuxia
, Mar 7, 2023, 7:35:47 PM Subject: CSV File Sink in Streaming Use Case Hi, I am working on a Java DataStream application and need to implement a File sink with CSV format. I see that I have two options here - Row and Bulk ( [ https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connecto

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread Shammon FY
Hi You can create a `BulkWriter.Factory` which will create a `CsvBulkWriter` and create a `FileSink` via `FileSink.forBulkFormat`. You can see the details in `DataStreamCsvITCase.testCustomBulkWriter` Best, Shammon On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user wrote: > Hi, > > I am working o
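
Since `CsvBulkWriter` is package-private (the issue raised later in this thread), a self-contained `BulkWriter.Factory` can be written directly. A minimal sketch, without CSV quoting/escaping of field values:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.Path;

/** A bare-bones CSV BulkWriter (no quoting/escaping of field values). */
public class SimpleCsvWriterFactory implements BulkWriter.Factory<String[]> {

    @Override
    public BulkWriter<String[]> create(FSDataOutputStream out) throws IOException {
        return new BulkWriter<String[]>() {
            @Override
            public void addElement(String[] row) throws IOException {
                out.write((String.join(",", row) + "\n").getBytes(StandardCharsets.UTF_8));
            }

            @Override
            public void flush() throws IOException {
                out.flush();
            }

            @Override
            public void finish() throws IOException {
                flush(); // the sink owns and closes the stream; don't close it here
            }
        };
    }

    /** Wire the factory into a bulk-format FileSink. */
    public static FileSink<String[]> csvSink(String outputDir) {
        return FileSink.forBulkFormat(new Path(outputDir), new SimpleCsvWriterFactory())
                .build();
    }
}
```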

CSV File Sink in Streaming Use Case

2023-03-07 Thread Chirag Dewan via user
Hi, I am working on a Java DataStream application and need to implement a File sink with CSV format. I see that I have two options here - Row and Bulk (https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1) So for CSV file distribution wh

[ANNOUNCE] Share your Streaming Stories with us at Current 2023

2023-01-24 Thread Israel Ekpo
Do you have a great data streaming story to share? We want to hear from you! Speaking at Current 2023 is a great way to connect with hundreds of your peers, become more involved in the data streaming community, and have a public platform for you to share your story of the future of streaming and

Re: How to get failed streaming Flink job log in Flink Native K8s mode?

2023-01-03 Thread Yang Wang
In Flink Native K8s mode, the pods of the JM and TM will disappear if the > streaming job fails. Are there any ways to get the logs of the failed > streaming job? > The only solution I can think of is to mount job logs to NFS for > persistence through a pv-pvc defined in the pod-template. > >

How to get failed streaming Flink job log in Flink Native K8s mode?

2022-12-22 Thread hjw
In Flink Native K8s mode, the pods of the JM and TM will disappear if the streaming job fails. Are there any ways to get the logs of the failed streaming job? The only solution I can think of is to mount job logs to NFS for persistence through a pv-pvc defined in the pod-template. ENV: Flink version: 1.15.0

Re: Re: How to set disableChaining like streaming multiple INSERT statements in a StatementSet ?

2022-12-10 Thread Hang Ruan
unning job graph in Flink UI? > > Best regards, > Yuxia > > -- > *From:* "hjw" > *To:* "User" > *Sent:* Thursday, Dec 8, 2022, 12:05:00 AM > *Subject:* How to set disableChaining like streaming multiple INSERT > statements in a Stat

Re: How to set disableChaining like streaming multiple INSERT statements in a StatementSet ?

2022-12-07 Thread yuxia
Could you please post the image of the running job graph in Flink UI? Best regards, Yuxia From: "hjw" To: "User" Sent: Thursday, Dec 8, 2022, 12:05:00 AM Subject: How to set disableChaining like streaming multiple INSERT statements in a StatementSet ? Hi, I create

How to set disableChaining like streaming multiple INSERT statements in a StatementSet ?

2022-12-07 Thread hjw
Hi, I create a StatementSet that contains multiple INSERT statements. I found that multiple INSERT tasks will be organized into an operator chain when StatementSet.execute() is invoked. How do I set disableChaining like streaming multiple INSERT statements in a StatementSet API? env
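
There is no per-statement `disableChaining()` in the Table API, but chaining can be switched off for the whole pipeline via configuration. A minimal sketch; the source/sink table names are hypothetical:

```java
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.TableEnvironment;

public class UnchainedStatementSet {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Disable operator chaining for the whole pipeline
        // ('pipeline.operator-chaining' = false).
        tEnv.getConfig().getConfiguration().set(PipelineOptions.OPERATOR_CHAINING, false);

        StatementSet set = tEnv.createStatementSet();
        set.addInsertSql("INSERT INTO sink_a SELECT * FROM src");
        set.addInsertSql("INSERT INTO sink_b SELECT * FROM src");
        set.execute(); // the INSERTs now appear as separate, unchained operators
    }
}
```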

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-12 Thread Mason Chen
KafkaSource to fully utilize Kafka rack awareness. 2. > But, this got me wondering...would it be possible to run a streaming app > in an active/active mode, where in normal operation, half of the work was > being done in each DC, and in failover, all of the work would automatically > failover

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Martijn Visser
that FLIP :) Best regards, Martijn On Wed, Oct 5, 2022 at 10:00 AM Andrew Otto wrote: > (Ah, note that I am considering very simple streaming apps here, e.g. > event enrichment apps. No windowing or complex state. The only state is > the Kafka offsets, which I suppose would a

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Andrew Otto
(Ah, note that I am considering very simple streaming apps here, e.g. event enrichment apps. No windowing or complex state. The only state is the Kafka offsets, which I suppose would also have to be managed from Kafka, not from Flink state.) On Wed, Oct 5, 2022 at 9:54 AM Andrew Otto wrote

KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Andrew Otto
the multi-datacenter deployment architecture of streaming applications. A Kafka stretch cluster is one in which the brokers span multiple datacenters, relying on the usual Kafka broker replication for multi DC replication (rather than something like Kafka MirrorMaker). This is feasible with Kafka to
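
One concrete piece of stretch-cluster support: the standard Kafka `client.rack` consumer property (KIP-392 follower fetching) can be passed through `KafkaSource`. A sketch; the broker addresses, topic, and the way the rack id is obtained are hypothetical:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

public class RackAwareKafkaSource {
    public static KafkaSource<String> build(String rackId) {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker-dc1:9092,broker-dc2:9092")
                .setTopics("events")
                .setGroupId("enrichment-app")
                .setStartingOffsets(OffsetsInitializer.earliest())
                // KIP-392 follower fetching: consumers read from replicas in
                // their own datacenter when 'client.rack' matches broker.rack.
                .setProperty("client.rack", rackId)
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }
}
```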

Re: How to rebalance a Flink streaming table?

2022-10-05 Thread Yaroslav Tkachenko
Hey Pavel, I was looking for something similar a while back and the best thing I came up with was using the DataStream API to do all the shuffling and THEN converting the stream to a table using fromDataStream/fromChangelogStream. On Wed, Oct 5, 2022 at 4:54 AM Pavel Penkov wrote: > I have a ta
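
A minimal sketch of the pattern Yaroslav describes, with a placeholder source standing in for the Kafka table:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class RebalanceThenTable {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Placeholder for a source whose parallelism is pinned to the
        // Kafka partition count.
        DataStream<String> raw = env.fromElements("a", "b", "c");

        // rebalance() round-robins records across all downstream subtasks,
        // breaking the 1:1 coupling with the source parallelism.
        DataStream<String> shuffled = raw.rebalance();

        Table t = tEnv.fromDataStream(shuffled);
        tEnv.createTemporaryView("events", t);
    }
}
```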

How to rebalance a Flink streaming table?

2022-10-05 Thread Pavel Penkov
I have a table that reads a Kafka topic, and its effective parallelism is equal to the number of Kafka partitions. Is there a way to reshuffle the data, like with the DataStream API, to increase the effective parallelism?

Failure to recover from failed checkpoint when using S3 streaming file sink

2022-09-10 Thread Oran Shuster via user
This issue has been going on for a while but I couldn't figure it out no matter what I tried. Some general info: Flink 1.14.5 with checkpoint/HA storage in S3. We have 3 jobs with identical code; the only difference is which Kafka topic is read and what prefix is used in the S3 sink. This means tha

Re: RichFunctions, streaming, and configuration (it's always empty)

2022-08-07 Thread Ben Edwards
Hi David, Thanks for confirming my research. Adding some more up to date documentation to that function seems like an easy first contribution. Best, Ben

Re: RichFunctions, streaming, and configuration (it's always empty)

2022-08-07 Thread David Anderson
r my test, >> customised with my own MockEnvironment + Configuration. To my surprise, the >> configuration is always empty. So I did some reading and debugging and came >> across this: >> >> >> https://github.com/apache/flink/blob/62786320eb55

Re: RichFunctions, streaming, and configuration (it's always empty)

2022-08-06 Thread Ben Edwards
surprise, the > configuration is always empty. So I did some reading and debugging and came > across this: > > > https://github.com/apache/flink/blob/62786320eb555e36fe9fb82168fe97855dc54056/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractUdfStreamOp

RichFunctions, streaming, and configuration (it's always empty)

2022-08-06 Thread Ben Edwards
, the configuration is always empty. So I did some reading and debugging and came across this: https://github.com/apache/flink/blob/62786320eb555e36fe9fb82168fe97855dc54056/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractUdfStreamOperator.java#L100 Given that open
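
For context, the `Configuration` passed to `open()` is indeed always empty in the DataStream API (a DataSet-era leftover); the conventional workaround is to pass parameters through the rich function's constructor. A minimal sketch:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

/**
 * Parameters passed as constructor arguments are serialized with the UDF
 * at job-graph build time, so they survive distribution to the TaskManagers.
 */
public class PrefixingMapper extends RichMapFunction<String, String> {

    private final String prefix; // captured when the job graph is built

    public PrefixingMapper(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public void open(Configuration ignored) {
        // 'ignored' is always empty here; initialize state/clients instead.
    }

    @Override
    public String map(String value) {
        return prefix + value;
    }
}
```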

Re: Decompressing RMQ streaming messages

2022-07-24 Thread Francis Conroy
Hi Venkat, I guess you're using another compression algorithm which isn't zlib; you'll have to adapt the code to work with your algorithm of choice. Kind regards, Francis On Fri, 22 Jul 2022 at 17:27, Ramana wrote: > Hi Francis - Thanks for the snippet. I tried using the same, however I get >

Re: Decompressing RMQ streaming messages

2022-07-22 Thread Ramana
Hi Francis - Thanks for the snippet. I tried using the same, however I get an error. Following is the error - java.util.zip.DataFormatException: incorrect header check. I see multiple errors; I believe I am seeing this stack trace for every message? Any idea as to what could be causing this? T

Re: Decompressing RMQ streaming messages

2022-07-21 Thread Francis Conroy
Hi Venkat, there's nothing that I know of, but I've written a zlib decompressor for our payloads which was pretty straightforward. public class ZlibDeserializationSchema extends AbstractDeserializationSchema { @Override public byte[] deserialize(byte[] message) throws IOException {
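
The snippet above is cut off by the archive; a hedged reconstruction of a complete zlib deserializer along the same lines, using `java.util.zip.Inflater`. Note Ramana's "incorrect header check" follow-up above: that error usually means the payload is raw deflate (`new Inflater(true)`) or gzip (`GZIPInputStream`) rather than zlib:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

public class ZlibDeserializationSchema extends AbstractDeserializationSchema<byte[]> {

    @Override
    public byte[] deserialize(byte[] message) throws IOException {
        // new Inflater() expects zlib headers; "incorrect header check" usually
        // means raw deflate (use new Inflater(true)) or gzip (use GZIPInputStream).
        Inflater inflater = new Inflater();
        inflater.setInput(message);
        ByteArrayOutputStream out = new ByteArrayOutputStream(message.length * 4);
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; return what was decoded so far
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Failed to inflate message", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```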

Decompressing RMQ streaming messages

2022-07-21 Thread Ramana
Hi - We have a requirement to read the compressed messages emitted by RabbitMQ and to have them processed using PyFlink. However, I am not finding any out-of-the-box functionality in PyFlink which can help decompress the messages. Could anybody help me with an example of how to go about this?

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Leonard Xu
Hi, vtygoss > I'm working on migrating from a full-data pipeline (with Spark) to an > incremental-data pipeline (with Flink CDC), and I met a problem with accuracy > validation between the Flink- and Spark-based pipelines. Glad to hear that ! > For bounded data, it's simple to validate the two result se

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Shengkai Fang
Hi, all. From my understanding, the accuracy of the sync pipeline requires snapshotting the source and sink at some points. It is just like we have a checkpoint that contains all the data at some time for both sink and source. Then we can compare the content in the checkpoint and find the differ

Re:accuracy validation of streaming pipeline

2022-05-24 Thread Xuyang
I think for unbounded data, we can only check the result at one point in time; that is the work that Watermark[1] does. What about tagging one point in time and validating the data accuracy at that moment? [1] https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/create/#watermar

Re: accuracy validation of streaming pipeline

2022-05-23 Thread Shengkai Fang
It's a good question. Let me ping @Leonard to share more thoughts. Best, Shengkai vtygoss wrote on Fri, May 20, 2022 at 16:04: > Hi community! > > > I'm working on migrating from a full-data pipeline (with Spark) to an > incremental-data pipeline (with Flink CDC), and I met a problem with > accuracy validation bet

accuracy validation of streaming pipeline

2022-05-20 Thread vtygoss
Hi community! I'm working on migrating from a full-data pipeline (with Spark) to an incremental-data pipeline (with Flink CDC), and I met a problem with accuracy validation between the Flink- and Spark-based pipelines. For bounded data, it's simple to validate whether the two result sets are consistent or not.

[ANNOUNCE] Call for Presentations for ApacheCon Asia 2022 streaming track

2022-05-18 Thread Yu Li
Hi everyone, ApacheCon Asia [1] will feature the Streaming track for the second year. Please don't hesitate to submit your proposal if there is an interesting project or Flink experience you would like to share with us! The conference will be online (virtual) and the talks will be pre-rec

Re: trigger once (batch job with streaming semantics)

2022-05-10 Thread Martijn Visser
> > Best, > Georg > > On Mon, May 9, 2022 at 14:45, Martijn Visser < > martijnvis...@apache.org> wrote: > >> Hi Georg, >> >> No they wouldn't. There is no capability out of the box that lets you >> start Flink in streaming mode, run everythi

Re: trigger once (batch job with streaming semantics)

2022-05-09 Thread Georg Heiler
ility out of the box that lets you > start Flink in streaming mode, run everything that's available at that > moment and then stops when there's no data anymore. You would need to > trigger the stop yourself. > > Best regards, > > Martijn > > On Fri, 6 May 202

Re: trigger once (batch job with streaming semantics)

2022-05-09 Thread Martijn Visser
Hi Georg, No they wouldn't. There is no capability out of the box that lets you start Flink in streaming mode, run everything that's available at that moment and then stop when there's no data anymore. You would need to trigger the stop yourself. Best regards, Martijn On Fri, 6

Re: trigger once (batch job with streaming semantics)

2022-05-06 Thread Georg Heiler
Hi, I would disagree: In the case of Spark, it is a streaming application offering full streaming semantics (but with lower cost and higher latency) as it triggers less often. In particular, windowing and stateful semantics as well as late-arriving data are handled automatically using the

Re: trigger once (batch job with streaming semantics)

2022-05-06 Thread Martijn Visser
e.org/docs/latest/structured-streaming-programming-guide.html#triggers > offers a variety of triggers. > > In particular, it also has the "once" mode: > > *One-time micro-batch* The query will execute *only one* micro-batch to > process all the available data and then stop on it

trigger once (batch job with streaming semantics)

2022-05-02 Thread Georg Heiler
Hi, spark https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers offers a variety of triggers. In particular, it also has the "once" mode: *One-time micro-batch* The query will execute *only one* micro-batch to process all the available data and th
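
For Kafka sources specifically, something close to Spark's once trigger does exist: `setBounded()` makes the job consume up to the offsets observed at startup and then finish on its own. A sketch; the broker, topic, and group id are hypothetical:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class OnceLikeKafkaJob {
    public static KafkaSource<String> build() {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("input")
                .setGroupId("once-job")
                // Resume from the last committed offsets (earliest on first run)...
                .setStartingOffsets(
                        OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
                // ...and stop at the offsets observed at startup: the job then
                // finishes on its own, roughly like Spark's once trigger.
                .setBounded(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }
}
```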

Re: Integration Test for Kafka Streaming job

2022-04-21 Thread Alexey Trenikhun
Thank you for the information! From: Farouk Sent: Thursday, April 21, 2022 1:14:00 AM To: Aeden Jameson Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Integration Test for Kafka Streaming job Hi I would recommend using kafka-junit5 from Salesforce

Re: Integration Test for Kafka Streaming job

2022-04-21 Thread Farouk
Hi I would recommend using kafka-junit5 from Salesforce https://github.com/salesforce/kafka-junit On top of that, you can use org.apache.flink.runtime.minicluster.TestingMiniCluster Your stack should be complete. Cheers On Thu, Apr 21, 2022 at 07:10, Aeden Jameson wrote: > I've had success

Re: Integration Test for Kafka Streaming job

2022-04-20 Thread Aeden Jameson
I've had success using Kafka for Junit, https://github.com/mguenther/kafka-junit, for these kinds of tests. On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote: > > Hello, > We have Flink job that read data from multiple Kafka topics, transforms data > and write in output Kafka topics. We wan

Integration Test for Kafka Streaming job

2022-04-20 Thread Alexey Trenikhun
Hello, We have a Flink job that reads data from multiple Kafka topics, transforms the data, and writes to output Kafka topics. We want to write an integration test for it. I've looked at KafkaTableITCase; we can do a similar setup of Kafka topics and prepopulate data, but since in our case it is an endless stream, we n

Re: HDFS streaming source concerns

2022-04-19 Thread Adrian Bednarz
? Adrian On Fri, Apr 8, 2022 at 8:20 PM Roman Khachatryan wrote: > Hi Carlos, > > AFAIK, Flink FileSource is capable of checkpointing while reading the > files (at least in Streaming Mode). > As for the watermarks, I think FLIP-182 [1] could solve the problem; > however, it

Re: HDFS streaming source concerns

2022-04-08 Thread Roman Khachatryan
Hi Carlos, AFAIK, Flink FileSource is capable of checkpointing while reading the files (at least in Streaming Mode). As for the watermarks, I think FLIP-182 [1] could solve the problem; however, it's currently under development. I'm also pulling in Arvid and Fabian who are more familia

HDFS streaming source concerns

2022-04-06 Thread Carlos Downey
Hi, We have an in-house platform that we want to integrate with external clients via HDFS. They have lots of existing files and they continuously put more data to HDFS. Ideally, we would like to have a Flink job that takes care of ingesting data as one of the requirements is to execute SQL on top

Re: streaming mode with both finite and infinite input sources

2022-02-25 Thread Dawid Wysakowicz
-configuring-checkpointing On 25/02/2022 08:18, Jin Yi wrote: so we have a streaming job where the main work to be done is processing infinite kafka sources.  recently, i added a fromCollection (finite) source to simply write some state once upon startup.  this all seems to work fine.  the finite
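
The link above points at the checkpointing configuration; in Flink 1.14 checkpointing after some tasks finish sits behind a feature flag. A minimal sketch of enabling it:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MixedSourcesCheckpointing {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Flink 1.14 feature flag: keep checkpointing after the finite source
        // subtasks have finished (enabled by default from 1.15 onward).
        conf.setString(
                "execution.checkpointing.checkpoints-after-tasks-finish.enabled", "true");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(60_000);
    }
}
```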

Re: streaming mode with both finite and infinite input sources

2022-02-25 Thread Yuval Itzchakov
One possible option is to look into the hybrid source released in Flink 1.14 to support your use-case: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/ On Fri, Feb 25, 2022, 09:19 Jin Yi wrote: > so we have a streaming job where the main w
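
A sketch of the hybrid-source shape Yuval points at: bootstrap from a finite file source, then switch to Kafka once the files are exhausted. Class names follow recent Flink releases; the path, broker, and topic are hypothetical:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;

public class FiniteThenInfinite {
    public static HybridSource<String> build() {
        // Finite part: read the bootstrap data from files...
        FileSource<String> files = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), new Path("/data/bootstrap"))
                .build();
        // ...infinite part: switch to Kafka once the files are exhausted.
        KafkaSource<String> kafka = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        return HybridSource.builder(files).addSource(kafka).build();
    }
}
```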

streaming mode with both finite and infinite input sources

2022-02-24 Thread Jin Yi
So we have a streaming job where the main work to be done is processing infinite Kafka sources. Recently, I added a fromCollection (finite) source to simply write some state once upon startup. This all seems to work fine. The finite source operators all finish, while all the infinite source

Joining a flink materialised view table with a streaming table

2022-02-22 Thread Francis Conroy
Hi all, I recently put up a question about a deduplication query related to a join and realised that I was probably asking the wrong question. I'm using Flink 1.15-SNAPSHOT (97ddc39945cda9bf1f52ab159852fdb606201cf2) as we're using the RabbitMQ connector with pyflink. We won't go to prod until 1.15

Re: Apache Flink - Continuously streaming data using jdbc connector

2022-02-21 Thread M Singh
6:23 AM M Singh wrote: Hi Folks: I am trying to monitor a JDBC source and continuously stream data in an application using the JDBC connector. However, the application stops after reading the data in the table. I've checked the docs (https://nightlies.apache.org/flink/flink-docs-m

Re: Apache Flink - Continuously streaming data using jdbc connector

2022-02-20 Thread Guowei Ma
Hi, You can try flink's cdc connector [1] to see if it meets your needs. [1] https://github.com/ververica/flink-cdc-connectors Best, Guowei On Mon, Feb 21, 2022 at 6:23 AM M Singh wrote: > Hi Folks: > > I am trying to monitor a jdbc source and continuously streaming data in an

Apache Flink - Continuously streaming data using jdbc connector

2022-02-20 Thread M Singh
Hi Folks: I am trying to monitor a JDBC source and continuously stream data in an application using the JDBC connector. However, the application stops after reading the data in the table. I've checked the docs (https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table

Re: "No operators defined in streaming topology" error when Flink app still starts successfully

2022-02-15 Thread Chesnay Schepler
). String executionPlan = env.getExecutionPlan(); env.execute(); logger.info("Started job; executionPlan={}", executionPlan); On 14/02/2022 17:40, Shane Bishop wrote: Hi all, My team has started seeing the error "java.lang.IllegalStateException: No operators defined in s

"No operators defined in streaming topology" error when Flink app still starts successfully

2022-02-14 Thread Shane Bishop
Hi all, My team has started seeing the error "java.lang.IllegalStateException: No operators defined in streaming topology. Cannot execute." However, even with this error, the Flink application starts and runs fine, and the Flink job renders fine in the Flink Dashboard. Attached i

How to test stateful streaming pipeline?

2022-02-01 Thread Marcin Kuthan
Hi This is my first question to the community so welcome everyone :) On a daily basis I’m using Apache Beam for developing streaming pipelines but I would like to learn native Flink as well. I’m looking for examples on how to write integration tests with full programmatic control over watermark

Re: Unbounded streaming with table API and large json as one of the columns

2022-01-28 Thread HG
Thanks On Fri, Jan 28, 2022, 07:47 Caizhi Weng wrote: > Hi! > > This job will work as long as your SQL statement is valid. Did you meet > some difficulties? Or what is your concern? A record of 100K is sort of > large, but I've seen quite a lot of jobs with such record size so it is OK. > > HG

Re: Unbounded streaming with table API and large json as one of the columns

2022-01-27 Thread Caizhi Weng
Hi! This job will work as long as your SQL statement is valid. Did you meet some difficulties? Or what is your concern? A record of 100K is sort of large, but I've seen quite a lot of jobs with such record size so it is OK. HG wrote on Thu, Jan 27, 2022 at 02:57: > Hi, > > I need to calculate elapsed times b

Unbounded streaming with table API and large json as one of the columns

2022-01-26 Thread HG
Hi, I need to calculate elapsed times between steps of a transaction. Each step is an event. All steps belonging to a single transaction have the same transaction id. Every event has a handling time. All information is part of a large JSON structure. But I can have the incoming source supply trans

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
DataStream API -- Sender:Guowei Ma Sent At:2022 Jan. 26 (Wed.) 21:51 Recipient:Shawn Du Cc:user Subject:Re: create savepoint on bounded source in streaming mode Hi, Shawn Thank you for your sharing. Unfortunately I do

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Guowei Ma
ent At:2022 Jan. 26 (Wed.) 19:50 > Recipient:Shawn Du > Cc:user > Subject:Re: create savepoint on bounded source in streaming mode > > Hi,Shawn > > You want to use the correct state(n-1) for day n-1 and the full amount of > data for day n to produce the correct state(n) for da

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
right! -- Sender:Guowei Ma Sent At:2022 Jan. 26 (Wed.) 19:50 Recipient:Shawn Du Cc:user Subject:Re: create savepoint on bounded source in streaming mode Hi,Shawn You want to use the correct state(n-1) for day n-1 and the

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Guowei Ma
: > Hi Guowei, > > Consider the case: > we have one streaming application built with Flink, but for various > reasons the events may be disordered or delayed terribly. > We want to replay the data day by day (the data is processed as if > reordered). It

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
Hi Guowei, Consider the case: we have one streaming application built with Flink, but for various reasons the events may be disordered or delayed terribly. We want to replay the data day by day (the data is processed as if reordered). It looks like a batch job but with

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
bounded source in streaming mode Hi Shawn, You could also take a look at the hybrid source[1] Best, Dawid [1]https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/ On 26/01/2022 08:39, Guowei Ma wrote: Hi Shawn Currently Flink can not trigger the sp at the end of

Re: create savepoint on bounded source in streaming mode

2022-01-25 Thread Dawid Wysakowicz
   Thanks. -- Sender:Guowei Ma Sent At:2022 Jan. 26 (Wed.) 14:04 Recipient:Shawn Du Cc:user Subject:Re: create savepoint on bounded source in streaming mode Hi, Shawn I think Flink does not support this mechanism yet. Would you like to share the scenari

Re: create savepoint on bounded source in streaming mode

2022-01-25 Thread Guowei Ma
any ideas? > >Thanks. > > -- > Sender:Guowei Ma > Sent At:2022 Jan. 26 (Wed.) 14:04 > Recipient:Shawn Du > Cc:user > Subject:Re: create savepoint on bounded source in streaming mode > > Hi, Shawn > I think Fl

Re: create savepoint on bounded source in streaming mode

2022-01-25 Thread Shawn Du
in streaming mode Hi, Shawn I think Flink does not support this mechanism yet. Would you like to share the scenario in which you need this savepoint at the end of the bounded input? Best, Guowei On Wed, Jan 26, 2022 at 1:50 PM Shawn Du wrote: Hi experts, assume I have several files and I want

Re: create savepoint on bounded source in streaming mode

2022-01-25 Thread Guowei Ma
lay these files in order in > streaming mode and create a savepoint when files play at the end. it is > possible? > I wrote a simple test app, and job are finished when source is at the end. > I have no chance to creat a savepoint. please help. > > Thanks > Shawn >

create savepoint on bounded source in streaming mode

2022-01-25 Thread Shawn Du
Hi experts, assume I have several files and I want to replay these files in order in streaming mode and create a savepoint when the files reach the end. Is it possible? I wrote a simple test app, and the job finishes when the source reaches the end, so I have no chance to create a savepoint. Please help

Re: Parquet files in streaming mode

2021-12-27 Thread Martijn Visser
ink to output files in s3. >> >> When running in Streaming mode, it seems that parquet files are flushed >> and rolled at each checkpoint. The result is a crazy high number of very >> small parquet files which completely defeats the purpose of that format. >> >> &

Re: Parquet files in streaming mode

2021-12-27 Thread Deepak Sharma
t; When running in Streaming mode, it seems that parquet files are flushed > and rolled at each checkpoint. The result is a crazy high number of very > small parquet files which completely defeats the purpose of that format. > > > Is there a way to build larger output parquet files? O

Parquet files in streaming mode

2021-12-27 Thread Mathieu D
Hello, We’re trying to use a Parquet file sink to output files in S3. When running in streaming mode, it seems that Parquet files are flushed and rolled at each checkpoint. The result is a crazy high number of very small Parquet files, which completely defeats the purpose of that format. Is
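
This rolling-on-checkpoint behavior is inherent to bulk formats (it's what makes the S3 sink exactly-once), so the levers are a longer checkpoint interval or post-hoc compaction. A minimal sketch of the former, with the latter noted in a comment:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParquetSinkTuning {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk writers (Parquet) roll on every checkpoint by design, so fewer
        // checkpoints means fewer, larger files:
        env.enableCheckpointing(10 * 60 * 1000); // 10 minutes
        // Alternatively, FileSink#enableCompact (added in Flink 1.15) can merge
        // committed small files; note that Parquet files cannot simply be
        // byte-concatenated, so a record-wise compactor would be needed there.
    }
}
```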

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-22 Thread Yuval Itzchakov
Sink: SnowflakeSinkProvider(...) (2/3) (attempt #0) with attempt id >> 8d046c60a84900cba31877ec28f81124 to ... (dataPort=64218) with allocation id >> fe0a5941283557538901c8a9774a2584 >> 2021-12-22 09:25:28,890 INFO o.a.f.r.e.Execution ...-sink-batch -> Sink: >> SnowflakeSink

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-22 Thread Caizhi Weng
8466a27572 > 2021-12-22 09:25:28,917 INFO Finished successfully with value: 0 > 2021-12-22 09:25:28,922 INFO o.a.f.r.e.ClusterEntrypoint Shutting > StandaloneSessionClusterEntrypoint down with application status UNKNOWN. > Diagnostics Cluster entrypoint has been closed externally.. >
