I don't think barries can "expire" as of now. Might be a nice idea
thought -- I don't know if this might be a problem in production.

Furthermore, I want to point out, that an "expiring checkpoint" would
not break exactly-once processing, as the latest successful checkpoint
can always be used to recover correctly. Only the recovery-time would be
increase. because if a "barrier expires" and no checkpoint can be
stored, more data has to be replayed using the "old" checkpoint".


-Matthias

On 05/12/2016 09:21 PM, Srikanth wrote:
> Hello,
> 
> I was reading about Flink's checkpoint and wanted to check if I
> correctly understood the usage of barriers for exactly once processing.
>  1) Operator does alignment by buffering records coming after a barrier
> until it receives barrier from all upstream operators instances.
>  2) Barrier is always preceded by a watermark to trigger processing all
> windows that are complete.
>  3) Records in windows that are not triggered are also saved as part of
> checkpoint. These windows are repopulated when restoring from checkpoints. 
> 
> In production setups, were there any cases where alignment during
> checkpointing caused unacceptable latency?
> If so, is there a way to indicate say wait for a MAX 100 ms? That way we
> have exactly-once in most situations but prefer at least once over
> higher latency in corner cases.
> 
> Srikanth

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to