Thanks a lot! This is very helpful. In addition to your comments, what items are retained by NetworkEnvironment? They seem to grow indefinitely; do they ever shrink?
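As a rough sanity check on the "400,000,000 bytes retained" figure from the earlier reply: in Flink 1.3 the NetworkEnvironment owns the NetworkBufferPool, which pre-allocates a fixed set of memory segments at TaskManager startup and holds them for the lifetime of the TM, so that retained size is expected to stay constant rather than shrink. A minimal back-of-the-envelope sketch, assuming the Flink 1.3 default segment size of 32 KiB; the buffer count of 12,800 is only inferred from the heap dump here, not confirmed from the actual configuration:

```python
# Sanity check of the ~400,000,000 bytes retained by NetworkEnvironment.
# Assumptions (not confirmed from the actual cluster config):
#   - taskmanager.memory.segment-size is at its Flink 1.3 default of 32 KiB
#   - taskmanager.network.numberOfBuffers is set to roughly 12,800

SEGMENT_SIZE = 32 * 1024   # bytes per network buffer segment (32 KiB default)
num_buffers = 12_800       # hypothetical configured buffer count

retained_bytes = num_buffers * SEGMENT_SIZE
print(retained_bytes)      # 419430400, i.e. the ~400 MB seen in the heap dump
```

Since these buffers are allocated once and held for the whole TM lifetime, a flat ~400 MB retained by NetworkEnvironment is expected behavior, not a leak.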
I think there is a GC issue, because my TaskManager is somehow killed after a job run. The time until it dies correlates with the volume of the Kafka topics: the higher the volume, the faster the TM dies. Do you have any tips for debugging this?

On Thu, Nov 16, 2017, 01:35 Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi,
>
> I cannot spot anything that indicates a leak from your screenshots. Maybe
> you misinterpret the numbers? In your heap dump, there is only a single
> instance of org.apache.flink.runtime.io.network.NetworkEnvironment and it
> retains about 400,000,000 bytes from being GCed because it holds references
> to the network buffers. This is perfectly normal because the buffer
> pool is part of this object, and for as long as it lives, the referenced
> buffers should not be GCed; the current size of all your buffers is
> around 400 million bytes.
>
> Your heap space is also not growing without bound, but always goes down
> after a GC is performed. Looks fine to me.
>
> Last, I think the number for G1_Young_Generation is a counter of how many
> GC cycles have been performed, and the time is a sum. So naturally, those
> values will always increase.
>
> Best,
> Stefan
>
> > On 15.11.2017 at 18:35, Hao Sun <ha...@zendesk.com> wrote:
> >
> > Hi team, I am looking at some memory/GC issues in my Flink setup. I am
> > running Flink 1.3.2 in Docker for my development environment, and
> > Kubernetes for production.
> > I see that instances of org.apache.flink.runtime.io.network.NetworkEnvironment
> > are growing dramatically and are not GC-ed very well in my application.
> > My simple app consumes Kafka events, transforms the information, and
> > logs the results.
> >
> > Is this expected? I am new to Java memory analysis and not sure what is
> > actually wrong.
> >
> > <image.png>
> > <image.png>
> > <image.png>
> > <image.png>
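One way to start debugging is to turn on GC logging and OOM heap dumps for the TaskManager JVM. A minimal sketch of a `flink-conf.yaml` fragment, assuming a Java 8 JVM (these `-XX` flags are the pre-Java-9 GC logging options) and an illustrative log path that you would adjust for your setup:

```yaml
# Hypothetical flink-conf.yaml fragment: enable GC logging and a heap dump
# on OutOfMemoryError for the Flink JVMs (Java 8 style flags).
env.java.opts: >-
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -Xloggc:/tmp/flink-gc.log
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/tmp/flink-heapdump.hprof
```

If the GC log shows the heap staying healthy right up until the TM disappears, the JVM heap is likely not the culprit. Since you run in Docker/Kubernetes, also check whether the container was OOM-killed from outside the JVM (e.g. `kubectl describe pod` showing `OOMKilled`, or `dmesg` on the node): total process memory includes off-heap and metaspace, so a container memory limit can kill a TM whose heap looks perfectly fine.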