Hi guys,

It does look suspicious that TM pod termination is potentially delayed by
reconnect attempts to a killed JM.
I created an issue to investigate this:
https://issues.apache.org/jira/browse/FLINK-15946
Let's continue the discussion there.

Best,
Andrey

On Wed, Feb 5, 2020 at 11:49 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Maybe you need to check the kubelet logs to see why it gets stuck in the
> "Terminating" state for so long. Even if it needs to clean up the
> ephemeral storage, it should not take that much time.
>
>
> Best,
> Yang
>
> On Wed, Feb 5, 2020 at 10:42 AM, Li Peng <li.p...@doordash.com> wrote:
>
>> My yml files follow most of the instructions here:
>>
>>
>> http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/
>>
>> What command did you use to delete the deployments? I use: helm
>> --tiller-namespace prod delete --purge my-deployment
>>
>> I noticed that for environments without much data (like staging), this
>> works flawlessly, but in production, with a high volume of data, it gets
>> stuck in a loop. I suspect that the extra time needed to clean up the
>> task managers under high traffic delays the shutdown until after the job
>> manager terminates, and then the task manager gets stuck in a loop when
>> it detects that the job manager is dead.
>>
>> Thanks,
>> Li
>>
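The debugging step Yang suggests above can be sketched with standard kubectl/journalctl commands. The pod name, namespace, and time window below are hypothetical examples, not taken from the thread:

```shell
# Inspect the pod stuck in "Terminating" for events and container status
# (pod name and namespace are placeholders for your deployment).
kubectl describe pod flink-taskmanager-0 -n prod

# List recent events attached to that pod.
kubectl get events -n prod \
  --field-selector involvedObject.name=flink-taskmanager-0

# On the node hosting the pod, check the kubelet logs for teardown errors
# (assumes kubelet runs as a systemd unit on that node).
journalctl -u kubelet --since "30 min ago" | grep flink-taskmanager-0
```

If the kubelet shows the container exited promptly but the pod object lingers, the delay is more likely in finalizers or volume/ephemeral-storage cleanup than in the TM process itself.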
