What do you consider complete? (I ask this since you are using element count and processing time triggers)
Generally the idea is that you can feed the output PCollection<WriteFilesResult> to a stateful DoFn with an @OnWindowExpiration setup but this works only if you completeness is controlled by watermark advancement. Note that @OnWindowExpiration is new to Beam so it may not yet work with Spark. On Wed, Jun 24, 2020 at 4:22 AM Sunny, Mani Kolbe <[email protected]> wrote: > Hello, > > > > We are developing a beam pipeline which runs on SparkRunner on streaming > mode. This pipeline read from Kinesis, do some translations, filtering and > finally output to S3 using AvroIO writer. We are using Fixed windows with > triggers based on element count and processing time intervals. Outputs path > is partitioned by window start timestamp. allowedLateness=0sec > > > > This is all working fine. But to support some legacy downstream services, > we need to ensure that output partitions has marker file to indicate that > data completeness and is ready for downstream conception. Something like > hadoop’s .SUCCESS file or a .COMPLETE. Is there way to create such a marker > file in Beam on window closing event? > > > > Regards, > > Mani >
