The issue is that we are trying to avoid checkpointing: our datasets are heavy, and in a few of our apps all of the state is transient (flushed within a few seconds). The high volume/velocity and the transient nature of the state make those apps good candidates for running without checkpoints at all.
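With checkpointing disabled, the Flink Kafka consumer falls back to the Kafka client's own periodic auto-commit for offsets, driven by consumer properties along these lines (the interval value here is illustrative, not our actual setting):

```properties
# Commit consumed offsets back to Kafka on the client's own timer,
# since there is no checkpoint to hook offset commits into.
enable.auto.commit=true
auto.commit.interval.ms=5000
```

Those committed offsets are only a best-effort restore position, which is consistent with our tolerance for small gaps/duplicates.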
We do have offsets committed to Kafka, AND we have "some" tolerance for gaps/duplicates. However, we do want to handle "graceful" restarts/shutdowns. For shutdown, we have been taking savepoints (which works great), but for restart we just can't find a way.
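For reference, the shutdown path that works for us is the CLI's cancel-with-savepoint; the job id, jar name, and savepoint directory below are placeholders:

```
# Trigger a savepoint and cancel the job in one step
flink cancel -s hdfs:///savepoints/my-app <jobId>

# On the next deploy, resume from the savepoint that was written
flink run -s <savepointPath> my-app.jar
```

The missing piece is the equivalent for an unplanned restart, where there is no opportunity to trigger a savepoint first.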
Bottom line - we are trading off resiliency for resource utilization and performance, but we would like to harden the apps for production deployments as much as possible.

Hope that makes sense.
> On Mar 6, 2018, at 10:19 PM, Tzu-Li Tai <tzuli...@gmail.com> wrote:
> Hi Ashish,
> Could you elaborate a bit more on why you think the restart of all operators
> lead to data loss?
> When a restart occurs, Flink will restart the job from the latest completed
> checkpoint. All operator state will be reloaded from the state written in that
> checkpoint, and the position of the input stream will also be rewound.
> I don't think there is a way to force a checkpoint before restarting occurs,
> but as I mentioned, that should not be required, because the last complete
> checkpoint will be used.
> Am I missing something in your particular setup?