Hi Sandeep,
The watermark estimation itself should not be related to load. Can you
please clarify:
a) Are you using any custom timestamp policy (e.g. set via KafkaIO's
withTimestampPolicyFactory)? A sketch of what that looks like follows
below.
b) Do you see any backpressure in Flink's UI? Backpressure could, under
some circumstances, cause delays in watermark propagation. It _might_
help to increase parallelism in that case.
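For reference, here is a minimal sketch of what I mean by a custom
timestamp policy, assuming String records, placeholder broker/topic
names, and an arbitrary 1-minute watermark delay:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.CustomTimestampPolicyWithLimitedDelay;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Sketch: the per-partition watermark trails the maximum observed
// record timestamp by at most the configured delay.
Pipeline p = Pipeline.create();
p.apply(KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")            // placeholder
    .withTopic("events")                            // placeholder
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withTimestampPolicyFactory(
        (tp, previousWatermark) ->
            new CustomTimestampPolicyWithLimitedDelay<>(
                record -> new Instant(record.getTimestamp()),
                Duration.standardMinutes(1),
                previousWatermark)));

If you do not set a policy yourself, KafkaIO uses one of the built-in
factories (e.g. TimestampPolicyFactory.withProcessingTime() or
withLogAppendTime()), which behave differently with respect to the
watermark.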
Best,
Jan
On 11/15/21 18:22, Kathula, Sandeep wrote:
Hi,
We are running a Beam application on the Flink runner (Beam 2.29,
Flink 1.12) that reads from Kafka and writes to S3 once every 5
minutes. Our windowing and S3 write look like:
// 'events' is the PCollection<GenericRecord> read from Kafka.
events
    .apply("Batch Events",
        Window.<GenericRecord>into(
                FixedWindows.of(Duration.standardMinutes(5)))
            // Fire once, when the watermark passes the end of the window.
            .triggering(AfterWatermark.pastEndOfWindow())
            .withAllowedLateness(Duration.ZERO, Window.ClosingBehavior.FIRE_ALWAYS)
            .discardingFiredPanes())
    .apply(FileIO.<GenericRecord>write()
        .via(ParquetIO.sink(schema))
        .to(outputPath)
        .withNumShards(5)
        .withNaming(new CustomFileNaming("snappy.parquet")));
Resources allocated: 5 task slots, each with 3 CPUs and 32 GB RAM. We
are using RocksDB as the state backend and giving 50% of the memory to
off-heap.
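For completeness, the corresponding Flink 1.12 configuration looks
roughly like this (a sketch; the values are assumptions based on the
description above, not our verbatim config):

# flink-conf.yaml (sketch)
state.backend: rocksdb
taskmanager.numberOfTaskSlots: 5
taskmanager.memory.process.size: 32g
# RocksDB uses Flink's managed (off-heap) memory; ~50% of the total
taskmanager.memory.managed.fraction: 0.5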
It runs fine with lighter loads, but when it gets heavier load from
Kafka (7,500 or more records per second, each record around 7 KB in
size), we see that no files are being written to S3. We are using
AfterWatermark.pastEndOfWindow(), which triggers only when the
watermark reaches the end of the window.
After debugging, we found that the watermarks are not advancing under
heavy load. As a result, the event-time trigger that should fire once
the watermark reaches the end of the window never fires, so the S3
writes never happen. The data keeps accumulating in off-heap memory,
which results in an out-of-memory error after some time.
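For comparison, a trigger variant with processing-time early firings
would keep flushing panes even while the watermark stalls, at the cost
of multiple partial panes per window. This is only a sketch (the
5-minute early-firing delay is arbitrary), not code we run:

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Sketch: same 5-minute window, plus early firings on processing time,
// so data is flushed even when the watermark does not advance.
// Downstream consumers must then tolerate several partial
// (discarding-mode) panes per window.
Window<GenericRecord> windowing =
    Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(5)))
        .triggering(
            AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(5))))
        .withAllowedLateness(Duration.ZERO, Window.ClosingBehavior.FIRE_ALWAYS)
        .discardingFiredPanes();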
Can you please let us know why the watermarks are not advancing under
high load?
Thanks,
Sandeep