Hi Timur,

Indeed, if you use JNI libraries then that memory is allocated off-heap and
the -Xmx limit will not be respected. Currently, we don't expect users to
allocate memory through JNI. We might want to enforce a stricter direct
memory limit in the future; in that case, you would get an OutOfMemoryError
before Yarn could kill the container. Neither outcome is nice to have :)

You will have to adjust 'yarn.heap-cutoff-ratio' or 'yarn.heap-cutoff-min'
(for an absolute memory cutoff) to account for your JNI memory needs.

Cheers,
Max
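To make the first point concrete, here is a minimal, self-contained sketch
(not Flink code; the class name and sizes are made up): direct buffers live
off-heap but are still tracked by the JVM, so exhausting
-XX:MaxDirectMemorySize fails with an OutOfMemoryError inside the process,
while memory malloc'ed by a JNI library is tracked by neither -Xmx nor that
flag and only shows up in the footprint that Yarn measures.

    // Run with e.g.: java -Xmx256m -XX:MaxDirectMemorySize=64m DirectMemoryLimitDemo
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectMemoryLimitDemo {
        public static void main(String[] args) {
            List<ByteBuffer> buffers = new ArrayList<>();
            try {
                while (true) {
                    // Direct buffers are off-heap but count against -XX:MaxDirectMemorySize.
                    buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
                }
            } catch (OutOfMemoryError e) {
                // Typically "Direct buffer memory" -- the JVM fails on its own,
                // before Yarn ever sees the container exceed its limit.
                System.err.println("Hit the direct memory limit: " + e.getMessage());
            }
        }
    }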
On Mon, Apr 25, 2016 at 8:27 PM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
> Great answer, thank you Max for a very detailed explanation! It is
> illuminating how the off-heap parameter affects the memory allocation.
>
> I read this post:
> https://blogs.oracle.com/jrockit/entry/why_is_my_jvm_process_larger_t
>
> and the thing that jumped out at me is the allocation of memory for JNI
> libs. I do use a native library in my application, which is likely the
> culprit. I need to account for its memory footprint when doing my memory
> calculations.
>
> Thanks,
> Timur
>
> On Mon, Apr 25, 2016 at 10:28 AM, Maximilian Michels <m...@apache.org> wrote:
>>
>> Hi Timur,
>>
>> Shedding some light on the memory calculation:
>>
>> You have a total memory size of 2500 MB for each TaskManager. The
>> default for 'taskmanager.memory.fraction' is 0.7. This is the fraction
>> of the memory used by the memory manager. When you have turned on
>> off-heap memory, this memory is allocated off-heap. As you pointed
>> out, the default Yarn cutoff ratio is 0.25.
>>
>> Memory cutoff for Yarn: 2500 MB * 0.25 = 625 MB
>>
>> Java heap size with off-heap disabled: 2500 MB - 625 MB = 1875 MB
>>
>> Java heap size with off-heap enabled: (2500 MB - 625 MB) * 0.3 = 562.5 MB
>> (~570 MB in your case)
>> Off-heap memory size: (2500 MB - 625 MB) * 0.7 = 1312.5 MB
>>
>> The heap memory limits in your log seem to be calculated correctly.
>> Note that we don't set a strict limit for the off-heap memory because
>> the Flink memory manager controls the amount of memory allocated. It
>> will preallocate memory when you have 'taskmanager.memory.preallocate'
>> set to true. Otherwise it will allocate dynamically. Still, you should
>> have about 500 MB of memory left with everything allocated. There is
>> some more direct (off-heap) memory allocated for the network stack,
>> adjustable with 'taskmanager.network.numberOfBuffers', which is set to
>> 2048 by default and corresponds to 2048 * 32 KB = 64 MB of memory. I
>> believe this can grow to up to twice that size. Still, there should be
>> enough memory left.
>>
>> Are you running a streaming or batch job? Off-heap memory and memory
>> preallocation are mostly beneficial for batch jobs, which use the
>> memory manager a lot for sorting, hashing and caching.
>>
>> For streaming I'd suggest using Flink's defaults:
>>
>> taskmanager.memory.off-heap: false
>> taskmanager.memory.preallocate: false
>>
>> Raising the cutoff ratio should prevent the TaskManagers from being
>> killed. As Robert mentioned, in practice the JVM tends to allocate more
>> than the maximum specified heap size. You can put the following in your
>> flink-conf.yaml:
>>
>> # slightly raise the cutoff ratio (might need to be even higher)
>> yarn.heap-cutoff-ratio: 0.3
>>
>> Thanks,
>> Max
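For reference, the arithmetic above in one small sketch, so it is easy to
redo for other container sizes (a back-of-the-envelope reconstruction of the
numbers in this thread, not Flink's actual allocation code; the class and
variable names are made up):

    // Container memory split for -ytm 2500 with the defaults discussed above.
    public class ContainerMemorySketch {
        public static void main(String[] args) {
            double containerMb = 2500;      // -ytm
            double cutoffRatio = 0.25;      // yarn.heap-cutoff-ratio (default)
            double managedFraction = 0.7;   // taskmanager.memory.fraction (default)

            double cutoff = containerMb * cutoffRatio;      // 625 MB reserved for Yarn overhead
            double afterCutoff = containerMb - cutoff;      // 1875 MB left for the JVM

            // Off-heap disabled: everything after the cutoff is Java heap (1875 MB).
            double heapOnHeapMode = afterCutoff;

            // Off-heap enabled: the managed fraction moves out of the heap.
            double offHeap = afterCutoff * managedFraction;     // 1312.5 MB (direct memory)
            double heapOffHeapMode = afterCutoff - offHeap;     // 562.5 MB (~570 MB observed)

            System.out.printf("cutoff=%.1f MB, heap=%.1f MB, off-heap=%.1f MB%n",
                    cutoff, heapOffHeapMode, offHeap);
        }
    }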
>> On Mon, Apr 25, 2016 at 5:52 PM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
>> >
>> > Hello Maximilian,
>> >
>> > I'm using 1.0.0 compiled with Scala 2.11 and Hadoop 2.7. I'm running
>> > this on EMR. I didn't see any exceptions in the other logs. What are
>> > the logs you are interested in?
>> >
>> > Thanks,
>> > Timur
>> >
>> > On Mon, Apr 25, 2016 at 3:44 AM, Maximilian Michels <m...@apache.org> wrote:
>> >>
>> >> Hi Timur,
>> >>
>> >> Which version of Flink are you using? Could you share the entire logs?
>> >>
>> >> Thanks,
>> >> Max
>> >>
>> >> On Mon, Apr 25, 2016 at 12:05 PM, Robert Metzger <rmetz...@apache.org> wrote:
>> >> >
>> >> > Hi Timur,
>> >> >
>> >> > The reason we only allocate 570 MB for the heap is that you are
>> >> > allocating most of the memory off-heap (direct byte buffers).
>> >> >
>> >> > In theory, the memory footprint of the JVM is limited to 570 (heap) +
>> >> > 1900 (direct mem) = 2470 MB, which is below 2500. But in practice the
>> >> > JVM allocates more memory, which causes these kills by YARN.
>> >> >
>> >> > I have to check the code of Flink again, because I would expect the
>> >> > safety boundary to be much larger than 30 MB.
>> >> >
>> >> > Regards,
>> >> > Robert
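To see the gap Robert describes from inside a running JVM, something like
the following can help (a standalone sketch; the class name is made up). It
prints the part of the footprint the heap settings actually cover; thread
stacks, GC structures, direct buffers and JNI allocations appear in neither
number, which is why the RSS that Yarn and 'top' report can exceed -Xmx plus
-XX:MaxDirectMemorySize.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class MemoryFootprintProbe {
        public static void main(String[] args) {
            MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = bean.getHeapMemoryUsage();
            MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
            // Heap is bounded by -Xmx; non-heap here is metaspace/code cache,
            // not native memory allocated by JNI libraries or direct buffers.
            System.out.printf("heap used/max: %d/%d MB%n",
                    heap.getUsed() >> 20, heap.getMax() >> 20);
            System.out.printf("non-heap used: %d MB%n", nonHeap.getUsed() >> 20);
        }
    }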
>> >> >
>> >> > On Fri, Apr 22, 2016 at 9:47 PM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
>> >> >>
>> >> >> Hello,
>> >> >>
>> >> >> The next issue in the string of things I'm solving is that my
>> >> >> application fails with the message 'Connection unexpectedly closed by
>> >> >> remote task manager'.
>> >> >>
>> >> >> The Yarn log shows the following:
>> >> >>
>> >> >> Container [pid=4102,containerID=container_1461341357870_0004_01_000015] is
>> >> >> running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB
>> >> >> physical memory used; 9.0 GB of 12.3 GB virtual memory used. Killing
>> >> >> container.
>> >> >> Dump of the process-tree for container_1461341357870_0004_01_000015 :
>> >> >> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> >> >> |- 4102 4100 4102 4102 (bash) 1 7 115806208 715 /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m -XX:MaxDirectMemorySize=1900m -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.out 2> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.err
>> >> >> |- 4306 4102 4102 4102 (java) 172258 40265 9495257088 646460 /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m -XX:MaxDirectMemorySize=1900m -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
>> >> >>
>> >> >> One thing that drew my attention is `-Xmx570m`. I expected it to be
>> >> >> TaskManagerMemory*0.75 (due to yarn.heap-cutoff-ratio). I run the
>> >> >> application as follows:
>> >> >>
>> >> >> HADOOP_CONF_DIR=/etc/hadoop/conf flink run -m yarn-cluster -yn 18 -yjm 4096 -ytm 2500 eval-assembly-1.0.jar
>> >> >>
>> >> >> In the flink logs I do see 'Task Manager memory: 2500'. When I look at
>> >> >> the yarn container logs on the cluster node, I see that it starts with
>> >> >> 570 MB, which puzzles me. When I look at the memory actually allocated
>> >> >> for a Yarn container using 'top', I see 2.2 GB used. Am I interpreting
>> >> >> these parameters correctly?
>> >> >>
>> >> >> I have also set the following (it failed in the same way without it):
>> >> >>
>> >> >> taskmanager.memory.off-heap: true
>> >> >>
>> >> >> Also, I don't understand why this happens at all. I assumed that Flink
>> >> >> won't overcommit allocated resources and will spill to disk when
>> >> >> running out of heap memory. I'd appreciate it if someone could shed
>> >> >> light on this too.
>> >> >>
>> >> >> Thanks,
>> >> >> Timur
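To reproduce the effect Timur describes without a real native library, one
option is to allocate raw native memory via sun.misc.Unsafe, which behaves
like malloc in JNI code: it is invisible to -Xmx and -XX:MaxDirectMemorySize,
yet it grows the process RSS that 'top' shows and that Yarn enforces. This is
an illustrative sketch for Java 8 only; the class name and sizes are made up.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class NativeMemoryDemo {
        public static void main(String[] args) throws Exception {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);
            long chunk = 16L * 1024 * 1024;
            for (int i = 1; i <= 64; i++) {
                long addr = unsafe.allocateMemory(chunk);
                // Touch the pages so the OS actually backs them with physical memory.
                unsafe.setMemory(addr, chunk, (byte) 1);
                long heapUsed = (Runtime.getRuntime().totalMemory()
                        - Runtime.getRuntime().freeMemory()) >> 20;
                // Native usage grows while the reported heap stays flat.
                System.out.printf("native: %d MB, heap used: %d MB%n", i * 16, heapUsed);
                Thread.sleep(500);
            }
        }
    }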