Re: [Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Alexey Romanenko
Hmm, do you mean that if you have a windowed PCollection as an input for "JdbcIO.write().withResults()" (the "signal" transform) and then you apply "Wait.on(signal)", it doesn't work as expected [1]?
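For context, the wiring being discussed looks roughly like this (a sketch only; the element type, statement and configuration names are placeholders, not from the thread):

    PCollection<Row> windowedRows = ...;   // the windowed input in question ("Row" is a placeholder type)
    PCollection<Void> signal = windowedRows.apply("WriteToDb",
        JdbcIO.<Row>write()
            .withDataSourceConfiguration(dataSourceConfig)
            .withStatement("INSERT INTO my_table (value) VALUES (?)")
            .withPreparedStatementSetter((row, stmt) -> stmt.setString(1, row.toString()))
            .withResults());                                   // the "signal" transform
    PCollection<Row> gated = windowedRows.apply("WaitOnDb", Wait.on(signal));  // released per window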

Re: [Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Alexey Romanenko
Thanks Cham, but actually we already had a similar open issue for this [1] and I'm currently working on that one. Also, I created an umbrella Jira for such tasks for all other Java SDK IOs, since I believe this behaviour can be in high demand in general [2]. I'd be happy to discuss it in more

Synchronizing after parallel stages

2022-01-26 Thread Mark Striebeck
In our Beam pipeline we are writing out the results (to MySQL) in separate (parallel) stages. When all stages have finished writing successfully, we want to set a "final" flag on all DB tables (so any client who uses this data will always use the latest data, but only if it was completely processed).
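A rough sketch of one way to express this with Wait.on, assuming each parallel write uses JdbcIO.write().withResults() as its completion signal (table names, statements and the flag update are hypothetical placeholders):

    PCollection<Void> doneA = rowsA.apply("WriteTableA",
        JdbcIO.<String>write()
            .withDataSourceConfiguration(dataSourceConfig)
            .withStatement("INSERT INTO table_a (value) VALUES (?)")
            .withPreparedStatementSetter((v, stmt) -> stmt.setString(1, v))
            .withResults());
    PCollection<Void> doneB = rowsB.apply("WriteTableB",
        JdbcIO.<String>write()
            .withDataSourceConfiguration(dataSourceConfig)
            .withStatement("INSERT INTO table_b (value) VALUES (?)")
            .withPreparedStatementSetter((v, stmt) -> stmt.setString(1, v))
            .withResults());

    // A single trigger element is held back by Wait.on until every write signal has
    // completed, and is then used once to flip the "final" flag.
    pipeline
        .apply("Trigger", Create.of("flip-final-flag"))
        .apply("WaitForAllWrites", Wait.on(doneA, doneB))
        .apply("SetFinalFlag", ParDo.of(new DoFn<String, Void>() {
          @ProcessElement
          public void process(ProcessContext c) throws Exception {
            // plain JDBC here, e.g. "UPDATE table_a SET final = 1" and "UPDATE table_b SET final = 1"
          }
        }));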

Re: [Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Yomal de Silva
Yes, it is needed for KafkaIO as well. Thank you for creating the ticket. For such JDBC operations, are there any suggestions on a way forward? I am currently trying out an approach using a JDBC template inside a DoFn. This creates complexity when performing batch operations. On Thu, Jan 27, 2022 at
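For what it's worth, a minimal sketch of the DoFn-based workaround being described: write each element to the database and only then emit it downstream. The class name, connection string and table are hypothetical, and batching writes per bundle is exactly where the extra complexity mentioned above comes in.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import org.apache.beam.sdk.transforms.DoFn;

    class WriteThenEmitFn extends DoFn<String, String> {
      private transient Connection connection;

      @Setup
      public void setup() throws Exception {
        // hypothetical connection details; a pooled DataSource would be more typical
        connection = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass");
      }

      @ProcessElement
      public void process(@Element String element, OutputReceiver<String> out) throws Exception {
        try (PreparedStatement stmt =
            connection.prepareStatement("INSERT INTO file_state (value) VALUES (?)")) {
          stmt.setString(1, element);
          stmt.executeUpdate();
        }
        out.output(element);   // emit only after the insert succeeded
      }

      @Teardown
      public void teardown() throws Exception {
        if (connection != null) {
          connection.close();
        }
      }
    }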

Re: [Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Yomal de Silva
Hi Alexey, Thank you for your reply. Yes, that's the use case here. I tried the approach that you suggested, but for the window to get triggered, shouldn't we apply some transform like GroupByKey? Or else the window is ignored, right? As we don't have such a transform applied, we did not observe

Re: [Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Alexey Romanenko
Hi Yomal de Silva, IIUC, you need to pass the records downstream only once they have been stored in the DB with JdbcIO, don't you? And your pipeline is unbounded. So, why not split your input data PCollection into windows (if that was not done upstream) and use Wait.on() with
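Something along these lines (a sketch only, under the assumptions stated in the thread; the window size, DB statement, broker address and topic are made-up placeholders):

    PCollection<String> records = ...;   // unbounded input, e.g. records parsed from the SFTP files

    PCollection<String> windowed = records.apply("Window",
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));

    PCollection<Void> signal = windowed.apply("PersistState",
        JdbcIO.<String>write()
            .withDataSourceConfiguration(dataSourceConfig)
            .withStatement("INSERT INTO file_state (value) VALUES (?)")
            .withPreparedStatementSetter((v, stmt) -> stmt.setString(1, v))
            .withResults());

    windowed
        .apply("WaitForDbWrite", Wait.on(signal))   // each window is released once its writes finish
        .apply("PublishToKafka",
            KafkaIO.<Void, String>write()
                .withBootstrapServers("broker:9092")
                .withTopic("output-topic")
                .withValueSerializer(StringSerializer.class)
                .values());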

[Question] Can Apache Beam be used for chaining IO transforms

2022-01-26 Thread Yomal de Silva
Hi Beam team, I have been working on a data pipeline which consumes data from an SFTP location; the data is passed through multiple transforms and finally published to a Kafka topic. During this we need to maintain the state of the file, so we write to the database during the pipeline. We came