Hi Stephan -

I agree that the savepoint-shutdown-restart model is nominally the same as a 
rolling restart, with one notable exception: a lack of atomicity. There is a 
gap between invoking the savepoint command and the shutdown command. My problem 
isn't fortunate enough to have idempotent operations: replaying events ends up 
double-counting. With the current model (or at least as far as I can tell from 
the documentation you linked), I will double-process any events that arrive 
shortly after the savepoint but before the shutdown.

One thing that could alleviate this is an atomic shutdown-with-savepoint (or 
savepoint-with-shutdown, I’m not so picky about which way it is, I only want it 
to be atomic). With this, I can be assured that the savepoint matches the 
actual last-processed state. 

My understanding of the processing within Flink is that this could be modeled 
by injecting a "savepoint" event followed immediately by a "shutdown" event 
into the event stream, but my understanding is a bit cartoonish so I'm sure 
it's more involved.
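To make the cartoon concrete, here is a toy sketch (plain Python, not Flink API; all names are hypothetical) of why in-stream markers would give atomicity: because the "savepoint" and "shutdown" markers travel through the same ordered stream as the data, no data event can be processed between the snapshot and the stop, so the savepoint matches the last-processed state exactly.

```python
def run(stream):
    """Process events in order. A SAVEPOINT marker snapshots the state;
    an immediately following SHUTDOWN marker stops the job before any
    further event is consumed, so nothing is double-counted on restart."""
    count = 0          # stand-in for operator state (e.g. an event counter)
    savepoint = None
    for event in stream:
        if event == "SAVEPOINT":
            savepoint = count        # snapshot exactly the processed state
        elif event == "SHUTDOWN":
            return savepoint, count  # nothing consumed after the snapshot
        else:
            count += 1               # normal event processing
    return savepoint, count

# Injecting the two markers back-to-back: the savepoint equals the final
# state, and the trailing event "d" is never consumed.
saved, final = run(["a", "b", "c", "SAVEPOINT", "SHUTDOWN", "d"])
assert saved == final == 3
```

In the non-atomic two-command version, the markers would effectively be separated by whatever data events slip in between them, which is exactly the double-processing window described above.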

Ron
—
Ron Crocker
Principal Engineer & Architect
( ( •)) New Relic
rcroc...@newrelic.com
M: +1 630 363 8835

> On Dec 20, 2016, at 12:40 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> Hi Andrew!
> 
> Would be great to know if what Aljoscha described works for you. Ideally, 
> this costs no more than a failure/recovery cycle, which one typically also 
> gets with rolling upgrades.
> 
> Best,
> Stephan
