Hi Gordon,

The issue is really that we are trying to avoid checkpointing: the datasets are 
really heavy and all of the state in a few of our apps is transient (flushed 
within a few seconds). The high volume/velocity and the transient nature of the 
state make those apps good candidates to just not have checkpoints.
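
To make that concrete, here is a rough sketch of what I mean by not having 
checkpoints (the class name, restart values and job name below are just 
placeholders, not our actual code):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NoCheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpointing is simply never enabled (no env.enableCheckpointing(...) call),
        // so no snapshots of the heavy, short-lived state are ever written.

        // Without checkpointing the default restart strategy is "no restart",
        // so set one explicitly to still get automatic restarts after failures.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L));

        env.fromElements("a", "b", "c").print(); // stand-in for the real pipeline

        env.execute("transient-state-app"); // job name is a placeholder
    }
}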

We do have offsets committed to Kafka AND we have “some” tolerance for gaps / 
duplicates. However, we do want to handle “graceful” restarts / shutdowns. For 
shutdown, we have been taking savepoints (which works great), but for restart we 
just can’t find a way. 
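
To illustrate the offset side, something along these lines (the 0.11 connector, 
broker address, group id and topic below are just placeholders): with 
checkpointing off, the consumer falls back to Kafka's periodic auto-commit, and 
a plainly restarted job resumes from those group offsets.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaOffsetsWithoutCheckpoints {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder address
        props.setProperty("group.id", "transient-app");        // placeholder group id
        // With checkpointing disabled, the consumer falls back to Kafka's own
        // periodic offset commits.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");

        FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props); // placeholder topic

        // On a plain restart (no savepoint), resume from the group offsets
        // committed to Kafka, accepting the gap / duplicate window.
        consumer.setStartFromGroupOffsets();

        DataStream<String> stream = env.addSource(consumer);
        stream.print(); // stand-in for the real pipeline

        env.execute("kafka-offsets-without-checkpoints");
    }
}

With that, duplicates after a restart are roughly bounded by the auto-commit 
interval, and records in flight past an already-committed offset can be 
dropped, which is where the gap / duplicate tolerance comes in.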

Bottom line - we are trading off resiliency for resource utilization and 
performance, but we would like to harden the apps for production deployments as 
much as we can.

Hope that makes sense.

Thanks, Ashish

> On Mar 6, 2018, at 10:19 PM, Tzu-Li Tai <tzuli...@gmail.com> wrote:
> 
> Hi Ashish,
> 
> Could you elaborate a bit more on why you think the restart of all operators
> leads to data loss?
> 
> When restart occurs, Flink will restart the job from the latest complete
> checkpoint.
> All operator states will be reloaded with the state written in that checkpoint,
> and the position of the input stream will also be rewound.
> 
> I don't think there is a way to force a checkpoint before restarting occurs,
> but as I mentioned, that should not be required, because the last complete
> checkpoint will be used.
> Am I missing something in your particular setup?
> 
> Cheers,
> Gordon
> 
