Hi Yomal de Silva,

IIUC, you need to pass records downstream only once they have been stored in the DB 
with JdbcIO, correct? And your pipeline is unbounded. 

So why not split your input PCollection into windows (if that was not already 
done upstream) and use Wait.on() with JdbcIO.write().withResults()? This is 
exactly the case for which it was originally developed.
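A rough sketch of what I mean, assuming fixed windows; FileRecord, EnrichFn, the table, the connection settings, and the Kafka topic are placeholders you would replace with your own:

```java
// Window the unbounded input so Wait.on() has something to wait per.
PCollection<FileRecord> windowed = input
    .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(1))));

// withResults() makes the JDBC write emit a PCollection<JdbcWriteResult>
// signal per window, instead of a terminal PDone.
PCollection<JdbcWriteResult> written = windowed.apply(
    "WriteFileState",
    JdbcIO.<FileRecord>write()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://host:5432/db"))
        .withStatement("INSERT INTO file_state (file_name, state) VALUES (?, ?)")
        .withPreparedStatementSetter((record, stmt) -> {
            stmt.setString(1, record.getFileName());
            stmt.setString(2, record.getState());
        })
        .withResults());

// Wait.on() holds each window of records until the write signal for that
// window fires, then lets them flow on to enrichment and Kafka.
windowed
    .apply("WaitForDbWrite", Wait.on(written))
    .apply("Enrich", ParDo.of(new EnrichFn()))       // assumed to emit KV<String, String>
    .apply("PublishToKafka", KafkaIO.<String, String>write()
        .withBootstrapServers("kafka:9092")
        .withTopic("output-topic")
        .withKeySerializer(StringSerializer.class)
        .withValueSerializer(StringSerializer.class));
```

The same pattern works a second time for your final "update file state" step: take the write signal from the Kafka stage (or a follow-up JDBC write) and gate the update on it.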

—
Alexey
 

> On 26 Jan 2022, at 10:16, Yomal de Silva <yomal.prav...@gmail.com> wrote:
> 
> Hi beam team,
> I have been working on a data pipeline which consumes data from an SFTP 
> location and the data is passed through multiple transforms and finally 
> published to a kafka topic. During this we need to maintain the state of the 
> file, so we write to the database during the pipeline. 
> We found that JdbcIO.write() is only intended as a sink and returns 
> a PDone. But for our requirement we need to pass down the elements that were 
> persisted.
> 
> Using writeVoid() is also not useful, since this is an unbounded stream and we 
> cannot use Wait.on() with it without using windows and triggers. 
> 
> Read file --> write the file state in db --> enrich data --> publish to kafka 
> --> update file state
> 
> The above is the basic requirement from the pipeline. Any suggestions?
> 
> Thank you
