Great, thanks for the update.
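For reference, in case others run into the same issue: the relevant knobs live in flink-conf.yaml, and a minimal sketch looks roughly like the following. The key names come from the memory setup guide linked as [1] below; the values are purely illustrative and need to be sized for your workload.

    # flink-conf.yaml -- illustrative values only, size them for your workload
    taskmanager.memory.process.size: 4096m       # total memory of the TaskManager process/container
    taskmanager.memory.jvm-metaspace.size: 256m  # JVM metaspace, accounted inside process.size since 1.10
    taskmanager.memory.managed.fraction: 0.4     # share of Flink memory reserved for managed (off-heap) memory

Raising taskmanager.memory.process.size (together with the matching container limit, if any) is typically the first thing to try when TaskManagers are killed for exceeding their memory budget.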
On Wed, Apr 28, 2021 at 7:08 PM Sambaran <sambaran2...@gmail.com> wrote:

> Hi Till,
>
> Thank you for the response. We are currently running Flink with an
> increased memory configuration; so far the taskmanager is working fine.
> We will check whether there are any further issues and will update you.
>
> Regards
> Sambaran
>
> On Wed, Apr 28, 2021 at 5:33 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Sambaran,
>>
>> Could you also share with us the cause why the checkpoints could not be
>> discarded?
>>
>> With Flink 1.10, we introduced a stricter memory model for the
>> TaskManagers. That could be a reason why you see more TaskManagers being
>> killed by the underlying resource management system. You could check
>> whether your resource management system logs that some containers/pods
>> are exceeding their memory limits. If this is the case, then you should
>> give your Flink processes a bit more memory [1].
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup.html
>>
>> Cheers,
>> Till
>>
>> On Tue, Apr 27, 2021 at 6:48 PM Sambaran <sambaran2...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> We recently migrated from Flink 1.7 to 1.12. Although the jobs are
>>> running fine, the task manager sometimes gets killed (quite frequently,
>>> 2-3 times a day).
>>>
>>> Logs:
>>> INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] -
>>> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>>>
>>> While checking further logs, we see that Flink is not able to discard
>>> old checkpoints:
>>> org.apache.flink.runtime.checkpoint.CheckpointsCleaner [] - Could not
>>> discard completed checkpoint 173.
>>>
>>> We are not sure what the reason is here; has anyone faced this before?
>>>
>>> Regards
>>> Sambaran
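For the container-limit check mentioned in the quoted mail: assuming the cluster runs on Kubernetes (the deployment type is not stated in this thread), something along these lines shows whether a TaskManager pod was killed for exceeding its memory limit:

    kubectl describe pod <taskmanager-pod-name> | grep -A 5 "Last State"

A container that was killed for memory reasons shows "Reason: OOMKilled" in its last state; on YARN, the NodeManager logs typically contain an "is running beyond physical memory limits" message instead.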