Great, thanks for the update.
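For reference, in case others run into the same issue: the relevant knobs live in flink-conf.yaml, and a minimal sketch looks roughly like the following. The key names come from the memory setup guide linked as [1] below; the values are purely illustrative and need to be sized for your workload.

    # flink-conf.yaml -- illustrative values only, size them for your workload
    taskmanager.memory.process.size: 4096m       # total memory of the TaskManager process/container
    taskmanager.memory.jvm-metaspace.size: 256m  # JVM metaspace, accounted inside process.size since 1.10
    taskmanager.memory.managed.fraction: 0.4     # share of Flink memory reserved for managed (off-heap) memory

Raising taskmanager.memory.process.size (together with the matching container limit, if any) is typically the first thing to try when TaskManagers are killed for exceeding their memory budget.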
On Wed, Apr 28, 2021 at 7:08 PM Sambaran <sambaran2...@gmail.com> wrote:

> Hi Till,
>
> Thank you for the response. We are currently running Flink with an
> increased memory configuration; so far the taskmanager is working fine.
> We will check whether there are any further issues and will update you.
>
> Regards
> Sambaran
>
> On Wed, Apr 28, 2021 at 5:33 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Sambaran,
>>
>> Could you also share with us the cause why the checkpoints could not be
>> discarded?
>>
>> With Flink 1.10, we introduced a stricter memory model for the
>> TaskManagers. That could be a reason why you see more TaskManagers being
>> killed by the underlying resource management system. You could check
>> whether your resource management system logs that some containers/pods
>> are exceeding their memory limits. If this is the case, then you should
>> give your Flink processes a bit more memory [1].
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup.html
>>
>> Cheers,
>> Till
>>
>> On Tue, Apr 27, 2021 at 6:48 PM Sambaran <sambaran2...@gmail.com> wrote:
>>
>>> Hi there,
>>>
>>> We recently migrated from Flink 1.7 to 1.12. Although the jobs are
>>> running fine, the task manager sometimes gets killed (quite frequently,
>>> 2-3 times a day).
>>>
>>> Logs:
>>> INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] -
>>> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>>>
>>> While checking further logs, we see that Flink is not able to discard
>>> old checkpoints:
>>> org.apache.flink.runtime.checkpoint.CheckpointsCleaner [] - Could not
>>> discard completed checkpoint 173.
>>>
>>> We are not sure what the reason is here; has anyone faced this before?
>>>
>>> Regards
>>> Sambaran
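For the container-limit check mentioned in the quoted mail: assuming the cluster runs on Kubernetes (the deployment type is not stated in this thread), something along these lines shows whether a TaskManager pod was killed for exceeding its memory limit:

    kubectl describe pod <taskmanager-pod-name> | grep -A 5 "Last State"

A container that was killed for memory reasons shows "Reason: OOMKilled" in its last state; on YARN, the NodeManager logs typically contain an "is running beyond physical memory limits" message instead.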