Re: After job cancel, leftover ZK state prevents job manager startup

2018-12-12 Thread Micah Wylde
>> orked on the fix to hear his opinion. Maybe the current
>> fix only made the problem less likely to appear but is not complete, yet?
>>
>> Best,
>> Stefan
>>
>> > On 11. Dec 2018, at 05:19, Micah Wylde wrote:
>> >
>> > Hello,

After job cancel, leftover ZK state prevents job manager startup

2018-12-10 Thread Micah Wylde
Hello,

We've been seeing an issue with several Flink 1.5.4 clusters that looks like this:

1. Job is cancelled with a savepoint
2. The jar is deleted from our HA blobstore (S3)
3. The jobgraph in ZK is *not* deleted
4. We restart the cluster
5. Startup fails in recovery because the jar is not