Hi Till,

Thank you for the explanation.

Looking forward to this feature being added in the near future.

Best,

Kenzyme

-------- Original Message --------
On Oct. 20, 2020, 8:46 a.m., Till Rohrmann wrote:

> Hi Kenzyme,
>
> at the moment Flink will stop the execution of jobs when it loses its
> connection to ZooKeeper for whatever reason. If a ZK rolling update causes
> the connection to the quorum to be lost, then that is what you are seeing.
> FLINK-10052 aims to add a feature that allows Flink to tolerate a SUSPENDED
> ZK connection for a short amount of time. I haven't tried it out with a
> rolling ZK update, but it might solve the problem you are observing.
>
> Cheers,
> Till
>
> On Tue, Oct 20, 2020 at 5:41 AM Kenzyme <[email protected]> wrote:
>
>> Hi Roman,
>>
>> Thank you for your reply.
>>
>> I'm not 100% sure if those features discussed in the threads will fix the 
>> issue, but they seemed related in some way.
>>
>> Basically, the behaviour I expected from Flink was similar to how Kafka
>> works, i.e. Kafka services continue without disruption as long as the ZK
>> quorum is maintained during rolling updates.
>>
>> Best,
>>
>> Kenzyme Le
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Monday, October 19th, 2020 at 4:38 PM, Khachatryan Roman 
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> AFAIK, the features discussed in the threads you mentioned are not yet 
>>> implemented. So there is no way to avoid Job restarts in case of ZK rolling 
>>> restarts.
>>> I'm pulling in Till as he might know better.
>>>
>>> Regards,
>>> Roman
>>>
>>> On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Related to 
>>>> https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w...@mail.gmail.com%3E
>>>> and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering if 
>>>> there's a way to prevent Flink instances from failing while doing a 
>>>> rolling restart on ZK followers while still keeping the quorum?
>>>>
>>>> This is what was shown in the Flink logs while restarting ZK:
>>>> ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are 
>>>> not monitored (temporarily).
>>>>
>>>> I was able to reproduce this twice with a quorum of 5 ZK nodes while doing 
>>>> some ZK maintenance.
>>>>
>>>> Thanks!
>>>>
>>>> Kenzyme Le
