OK, maybe it can be documented? So, just trying to understand: how do most people run their jobs? I mean, do they run fewer tasks, but tasks that have a lot of direct or mapped memory? Like a small JVM heap but huge state outside the JVM?
I also recorded this issue: https://issues.apache.org/jira/browse/FLINK-16278 so we can maybe get it documented.

On Tue, 25 Feb 2020 at 02:57, Xintong Song <tonysong...@gmail.com> wrote:

> In that case, I think the default metaspace size is too small for your
> setup. The default configurations are not intended for such large task
> managers.
>
> In Flink 1.8 we do not set the JVM '-XX:MaxMetaspaceSize' parameter, which
> means you have 'unlimited' metaspace size. We changed that in Flink 1.10 to
> have stricter control on the overall memory usage of Flink processes.
>
> Thank you~
>
> Xintong Song
>
>
> On Tue, Feb 25, 2020 at 1:24 PM John Smith <java.dev....@gmail.com> wrote:
>
>> I would like to also add that the same exact jobs on Flink 1.8 were
>> running perfectly fine.
>>
>> On Tue, 25 Feb 2020 at 00:20, John Smith <java.dev....@gmail.com> wrote:
>>
>>> Right after job execution started. Basically as soon as I deployed a 5th
>>> job. At 4 jobs it was OK; at 5 jobs it would take 1-2 minutes at most and
>>> the node would just shut off.
>>> So far with MaxMetaspaceSize 256m it's been stable. My task nodes are 16GB
>>> and the memory config is done as follows...
>>> taskmanager.memory.flink.size: 12g
>>> taskmanager.memory.jvm-metaspace.size: 256m
>>>
>>> 100% of the jobs right now are ETL with checkpoints, NO state:
>>> Kafka -----> Json Transform ----> DB
>>> or
>>> Kafka ----> DB lookup (to small local cache) ----> Json Transform -----> Apache Ignite
>>>
>>> None of the jobs are related.
>>>
>>> On Mon, 24 Feb 2020 at 20:59, Xintong Song <tonysong...@gmail.com> wrote:
>>>
>>>> Hi John,
>>>>
>>>> The default metaspace size is intended to work for a major proportion
>>>> of jobs. We are aware that for some jobs that need to load lots of
>>>> classes, the default value might not be large enough. However, a larger
>>>> default value means that for other jobs that do not load many classes, the
>>>> overall memory requirements might be unnecessarily high. (Imagine you have
>>>> a task manager with the default total memory of 1.5GB, but 512m of it is
>>>> reserved for metaspace.)
>>>>
>>>> Another possible problem is a metaspace leak. When you say "eventually
>>>> task nodes started shutting down with OutOfMemory Metaspace", does this
>>>> problem happen shortly after the job execution starts, or does it happen
>>>> after the job has been running for a while? Does the metaspace footprint
>>>> keep growing or become stable after the initial growth? If the metaspace
>>>> keeps growing over time, it's usually an indicator of a metaspace memory
>>>> leak.
>>>>
>>>> Thank you~
>>>>
>>>> Xintong Song
>>>>
>>>>
>>>> On Tue, Feb 25, 2020 at 7:50 AM John Smith <java.dev....@gmail.com> wrote:
>>>>
>>>>> Hi, I just upgraded to 1.10 and started deploying my jobs.
>>>>> Eventually task nodes started shutting down with OutOfMemory Metaspace.
>>>>>
>>>>> I looked at the logs and the task managers are started with:
>>>>> -XX:MaxMetaspaceSize=100663296
>>>>>
>>>>> So I configured: taskmanager.memory.jvm-metaspace.size: 256m
>>>>>
>>>>> It seems to be OK for now. What are your thoughts? And should I try
>>>>> 512m or is that too much?
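
[Editor's note: for context, below is a minimal sketch of the kind of stateless Kafka -> JSON transform -> DB job described in the thread, written against the DataStream API available in Flink 1.10. The topic name, Kafka properties, transform, and the stubbed database sink are placeholders for illustration, not the actual jobs discussed above. Each such job loads its own user-code classes into metaspace, which is why deploying a 5th job on a task manager can push it past a fixed MaxMetaspaceSize.]

    import java.util.Properties;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToDbEtlJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpointing only; the job itself keeps no keyed/operator state.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder broker address
            props.setProperty("group.id", "etl-example");         // placeholder consumer group

            env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
               .map(new MapFunction<String, String>() {
                   @Override
                   public String map(String json) {
                       // Placeholder for the real JSON transformation.
                       return json;
                   }
               })
               .addSink(new SinkFunction<String>() {
                   @Override
                   public void invoke(String row, Context ctx) {
                       // Placeholder: the real jobs write to a database or Apache Ignite here.
                   }
               });

            env.execute("kafka-to-db-etl");
        }
    }

[Each job packaged and submitted separately to the same task managers gets its own user-code classloader, so the metaspace requirement grows roughly with the number of deployed jobs rather than with their state size.]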