@Konstantin: Could you share the relevant part of the heap dump so we can take a second look?
The timer tasks are responsible for aborting a checkpoint if its timeout expires. You can decrease the timeout via the CheckpointConfig (env.getCheckpointConfig().setCheckpointTimeout(long), value in milliseconds); the current default is 10 minutes.

On a first skim of the checkpoint coordinator code, I didn't see anything that cancels these tasks once the checkpoint is fully ack'd.

@Stephan: I think we should cancel them at that point. What do you think?

On Tue, Feb 28, 2017 at 4:06 PM, Konstantin Knauf <konstantin.kn...@tngtech.com> wrote:

> Hi everyone,
>
> I am currently running a small Flink job locally, which checkpoints
> every 100 ms.
>
> After a few minutes the JM crashes with an OOME. In the heap dump I can
> see that a TimerTask holds references to all completed
> CheckpointCoordinators. I assume this task is supposed to clean these
> checkpoints up eventually.
>
> First, is this the expected behaviour? Second, is there a configuration
> option to trigger this cleanup timer earlier?
>
> Cheers,
>
> Konstantin
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
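
To make the cancellation idea from the reply above concrete, here is a minimal sketch using only java.util.concurrent. It is not Flink's CheckpointCoordinator code: the class CheckpointTimeoutSketch, the PendingCheckpoint stand-in, and all method names are hypothetical, and the use of a ScheduledThreadPoolExecutor (rather than whatever timer the coordinator actually uses) is an assumption made for illustration. The point is only that a timeout task which is cancelled and removed on acknowledgement no longer keeps the checkpoint reachable for the full timeout.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Flink code: shows a timeout task being
// cancelled (and dropped from the timer queue) once a checkpoint is acked.
class CheckpointTimeoutSketch {

    // Hypothetical stand-in for a pending checkpoint.
    static final class PendingCheckpoint {
        final long id;
        volatile boolean aborted;

        PendingCheckpoint(long id) {
            this.id = id;
        }

        void abort() {
            aborted = true;
        }
    }

    private final ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);

    CheckpointTimeoutSketch() {
        // Without this, a cancelled task stays in the queue until its delay elapses.
        timer.setRemoveOnCancelPolicy(true);
    }

    // Schedules the abort-on-timeout task; the task references the checkpoint.
    ScheduledFuture<?> scheduleTimeout(PendingCheckpoint checkpoint, long timeoutMillis) {
        return timer.schedule(checkpoint::abort, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called when every task has acknowledged: the timeout is no longer needed.
    void onFullyAcknowledged(ScheduledFuture<?> timeoutTask) {
        timeoutTask.cancel(false);
    }

    // Number of timeout tasks still held by the timer.
    int pendingTimeoutTasks() {
        return timer.getQueue().size();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        CheckpointTimeoutSketch sketch = new CheckpointTimeoutSketch();
        PendingCheckpoint checkpoint = new PendingCheckpoint(1L);

        // 10 minutes, matching the default timeout mentioned above.
        ScheduledFuture<?> timeoutTask = sketch.scheduleTimeout(checkpoint, 600_000L);
        System.out.println("queued before ack: " + sketch.pendingTimeoutTasks());

        sketch.onFullyAcknowledged(timeoutTask);
        System.out.println("queued after ack: " + sketch.pendingTimeoutTasks());

        sketch.shutdown();
    }
}
```

With a 100 ms checkpoint interval and a 10 minute timeout, a timer that never cancels could hold on the order of thousands of such tasks at once, which would be consistent with the OOME described in the quoted message.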