Question about checkpointing with stateful operators and state recovery

Federico D'Ambrosio Thu, 28 Sep 2017 02:46:47 -0700

Hi, I've got a couple of questions concerning the topics in the subject:

    1. If an operator is getting applied on a keyed stream, do I still have
to implement the CheckpointedFunction trait and define the snapshotState
and initializeState methods, in order to successfully recover the state
from a job failure?


    2. While using a FlinkKafkaConsumer, enabling checkpointing allows
exactly once semantics end to end, provided that the sink is able to
guarantee the same. Do I have to set
setCommitOffsetsOnCheckpoints(true)? How would someone implement exactly
once semantics in a sink?

    3. What are the advantages of externalized checkpoints and which are
the cases where I would want to use them?

    4. Let's suppose a scenario where: checkpointing is enabled every 10
seconds, I have a kafka consumer which is set to start from the latest
records, a sink providing at least once semantics and a stateful keyed
operator inbetween the consumer and the sink. Is it correct that, in case
of task failure, happens the following?
        - the kafka consumer gets reverted to the latest offset (does it
happen even if I don't set setCommitOffsetsOnCheckpoints(true)?)
        - the operator state gets reverted to the latest checkpoint
        - the sink is stateless so it doesn't really care about what
happened
        - the stream restarts and probably some of the events coming to the
sink have already been     processed before

Thank you for attention,
Kind regards,
Federico

Question about checkpointing with stateful operators and state recovery

Reply via email to