Hi Till,

Thank you for the response. We are currently running Flink with increased
memory for the TaskManagers, and so far they are working fine. We will keep
monitoring and update you if any further issues come up.
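
For anyone following the thread, increasing the TaskManager memory boils
down to raising the total process memory in flink-conf.yaml. A minimal
sketch with illustrative values (not our exact production settings):

# Total memory of the TaskManager JVM process; the container/pod request
# is derived from this value.
taskmanager.memory.process.size: 4096m

# Optional headroom for metaspace and JVM overhead, which are common
# causes of container OOM kills under the stricter 1.10+ memory model.
taskmanager.memory.jvm-metaspace.size: 256m
taskmanager.memory.jvm-overhead.fraction: 0.1

Raising taskmanager.memory.process.size grows the budget of the whole
process (heap, managed memory, network buffers, metaspace and JVM
overhead), which is the relevant knob when the resource manager kills
TaskManagers for exceeding their memory limit.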

Regards
Sambaran

On Wed, Apr 28, 2021 at 5:33 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Sambaran,
>
> could you also share with us why the checkpoints could not be discarded?
>
> With Flink 1.10, we introduced a stricter memory model for the
> TaskManagers. That could be a reason why you see more TaskManagers being
> killed by the underlying resource management system. You could maybe check
> whether your resource management system logs that some containers/pods are
> exceeding their memory limitations. If this is the case, then you should
> give your Flink processes a bit more memory [1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup.html
>
> Cheers,
> Till
>
> On Tue, Apr 27, 2021 at 6:48 PM Sambaran <sambaran2...@gmail.com> wrote:
>
>> Hi there,
>>
>> We have recently migrated from Flink 1.7 to 1.12. Although the jobs are
>> running fine, the TaskManager sometimes gets killed (quite frequently,
>> 2-3 times a day).
>>
>> Logs:
>> INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -
>> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>>
>> While checking further logs, we see that Flink is not able to discard
>> old checkpoints:
>> org.apache.flink.runtime.checkpoint.CheckpointsCleaner       [] - Could
>> not discard completed checkpoint 173.
>>
>> We are not sure what the reason is here; has anyone faced this before?
>>
>> Regards
>> Sambaran
>>
>
