Hi,
    I have a Beam pipeline on the Flink runner that reads from Kafka, applies a 
10-minute window, and writes the data to S3 in Parquet format. It runs fine 
when everything goes well. But if the pipeline stops running for an hour or 
two due to some issue, then when it resumes from the latest Flink checkpoint 
it reads from Kafka at a very high rate to catch up and tries to dump it all 
to S3 in Parquet. Because the data processed in the latest 10-minute window is 
huge while catching up with the lag, the job fails with an out-of-memory error 
and never manages to run successfully with my current resources. I found a 
Beam property called maxBundleSize that controls the maximum size of a bundle, 
but I couldn't find any property to limit the number of bundles processed 
within the window interval.
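
   For context, this is roughly how I am setting maxBundleSize today (Java 
SDK; the value shown is just illustrative):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class PipelineMain {
      public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        // Caps how many elements go into a single bundle, but it does not
        // limit how many bundles (and hence records) land in one 10-minute
        // window while the pipeline is catching up on Kafka lag.
        options.setMaxBundleSize(1000L);
        // ... build and run the rest of the pipeline with these options
      }
    }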

   I wanted to check if there is any way to limit the number of records 
processed within a window interval.

Thanks,
Sandeep
