Hello,

We are developing a beam pipeline which runs on SparkRunner on streaming mode. 
This pipeline read from Kinesis, do some translations, filtering and finally 
output to S3 using AvroIO writer. We are using Fixed windows with triggers 
based on element count and processing time intervals. Outputs path is 
partitioned by window start timestamp. allowedLateness=0sec

This is working fine. But as we productionize it, we need to ensure no data 
loss during failures or when stopping streaming for planned downtime etc. We 
are worried if already-read-but-yet-to-be-processed records will be lost during 
such events. Essentially we need a way to pause reading from source, allow it 
drain already read records, then do some maintenance activity and then resume 
streaming. Or move kinesis checkpointing to after "processing" parDos. Is any 
way to implement these?

Regards,
Mani

Reply via email to