What do you consider complete? (I ask this since you are using element
count and processing time triggers)

Generally the idea is that you can feed the output
PCollection<WriteFilesResult> to a stateful DoFn with
an @OnWindowExpiration setup but this works only if you completeness is
controlled by watermark advancement. Note that @OnWindowExpiration is new
to Beam so it may not yet work with Spark.

On Wed, Jun 24, 2020 at 4:22 AM Sunny, Mani Kolbe <[email protected]> wrote:

> Hello,
>
>
>
> We are developing a beam pipeline which runs on SparkRunner on streaming
> mode. This pipeline read from Kinesis, do some translations, filtering and
> finally output to S3 using AvroIO writer. We are using Fixed windows with
> triggers based on element count and processing time intervals. Outputs path
> is partitioned by window start timestamp. allowedLateness=0sec
>
>
>
> This is all working fine. But to support some legacy downstream services,
> we need to ensure that output partitions has marker file to indicate that
> data completeness and is ready for downstream conception. Something like
> hadoop’s .SUCCESS file or a .COMPLETE. Is there way to create such a marker
> file in Beam on window closing event?
>
>
>
> Regards,
>
> Mani
>

Reply via email to