Lets say I set FixedWindows.of(Duration.standardMinutes(10)) Since my event time is determined by WithTimestamps.of(Instant.now()), I can safely assume window is closed once that 10 min period is passed. But how do I ensure that all records belonged to that window are already flushed to disk. That I can safely create a .COMPLETE flag?
From: Luke Cwik <[email protected]> Sent: Wednesday, June 24, 2020 5:10 PM To: user <[email protected]> Subject: Re: How to create a marker file when each window completes? CAUTION: This email originated from outside of D&B. Please do not click links or open attachments unless you recognize the sender and know the content is safe. Knowing when a window is "closed" is based upon having the watermark advance which is based upon even time. On Wed, Jun 24, 2020 at 9:05 AM Sunny, Mani Kolbe <[email protected]<mailto:[email protected]>> wrote: Hi Luke, Sorry forgot to mention, we override the event timestamp to current using WithTimestamps.of(Instant.now()) as we don’t really care actual event time. So FixedWindow closes when current time passes window.end time. It is a standard practice in oozie world to trigger downstream jobs based on marker files. So thought someone might have encountered this before problem before me. Regards, Mani From: Luke Cwik <[email protected]<mailto:[email protected]>> Sent: Wednesday, June 24, 2020 4:23 PM To: user <[email protected]<mailto:[email protected]>> Subject: Re: How to create a marker file when each window completes? CAUTION: This email originated from outside of D&B. Please do not click links or open attachments unless you recognize the sender and know the content is safe. What do you consider complete? (I ask this since you are using element count and processing time triggers) Generally the idea is that you can feed the output PCollection<WriteFilesResult> to a stateful DoFn with an @OnWindowExpiration setup but this works only if you completeness is controlled by watermark advancement. Note that @OnWindowExpiration is new to Beam so it may not yet work with Spark. On Wed, Jun 24, 2020 at 4:22 AM Sunny, Mani Kolbe <[email protected]<mailto:[email protected]>> wrote: Hello, We are developing a beam pipeline which runs on SparkRunner on streaming mode. This pipeline read from Kinesis, do some translations, filtering and finally output to S3 using AvroIO writer. We are using Fixed windows with triggers based on element count and processing time intervals. Outputs path is partitioned by window start timestamp. allowedLateness=0sec This is all working fine. But to support some legacy downstream services, we need to ensure that output partitions has marker file to indicate that data completeness and is ready for downstream conception. Something like hadoop’s .SUCCESS file or a .COMPLETE. Is there way to create such a marker file in Beam on window closing event? Regards, Mani
