Hi Timur,

We only allocate 570 MB for the heap because you are allocating most of the
memory off-heap, as direct byte buffers.
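
For illustration, this is the kind of allocation that ends up outside the
heap (a minimal sketch, not Flink's actual allocation code):

    import java.nio.ByteBuffer;

    public class DirectMemoryDemo {
        public static void main(String[] args) {
            // Allocated outside the -Xmx heap; capped only by
            // -XX:MaxDirectMemorySize (1900m in your container).
            ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);

            // Allocated inside the -Xmx heap (570m in your container).
            ByteBuffer onHeap = ByteBuffer.allocate(64 * 1024 * 1024);

            System.out.println("off-heap is direct: " + offHeap.isDirect()); // true
            System.out.println("on-heap is direct:  " + onHeap.isDirect());  // false
        }
    }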

In theory, the JVM's memory footprint is therefore limited to 570 MB (heap)
+ 1900 MB (direct memory) = 2470 MB, which is below the 2500 MB container
limit. But in practice the JVM allocates more memory than that, which causes
YARN to kill the container.
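
To spell out the arithmetic from the JVM flags in your container log:

      2500 MB  container limit (-ytm 2500)
    -  570 MB  heap (-Xms570m -Xmx570m)
    - 1900 MB  direct memory (-XX:MaxDirectMemorySize=1900m)
    ---------
    =   30 MB  left for everything else the JVM needs
               (metaspace, thread stacks, code cache, GC data structures, ...)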

I have to check the Flink code again, because I would expect the safety
boundary to be much larger than 30 MB.
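
In the meantime, one thing you could try (an untested suggestion on my side;
the key is the same yarn.heap-cutoff-ratio you mention below) is reserving a
larger share of the container for JVM overhead in flink-conf.yaml:

    # flink-conf.yaml
    # Fraction of the container memory removed from the heap / direct
    # memory budget and left for JVM overhead. 0.3 is only an example value.
    yarn.heap-cutoff-ratio: 0.3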

Regards,
Robert


On Fri, Apr 22, 2016 at 9:47 PM, Timur Fayruzov <timur.fairu...@gmail.com>
wrote:

> Hello,
>
> The next issue in the string of things I'm working through is that my
> application fails with the message 'Connection unexpectedly closed by
> remote task manager'.
>
> The YARN log shows the following:
>
> Container [pid=4102,containerID=container_1461341357870_0004_01_000015] is
> running beyond physical memory limits. Current usage: 2.5 GB of 2.5 GB
> physical memory used; 9.0 GB of 12.3 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1461341357870_0004_01_000015 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 4102 4100 4102 4102 (bash) 1 7 115806208 715 /bin/bash -c
> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
> -XX:MaxDirectMemorySize=1900m
> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1>
> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.out
> 2>
> /var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.err
>         |- 4306 4102 4102 4102 (java) 172258 40265 9495257088 646460
> /usr/lib/jvm/java-1.8.0/bin/java -Xms570m -Xmx570m
> -XX:MaxDirectMemorySize=1900m
> -Dlog.file=/var/log/hadoop-yarn/containers/application_1461341357870_0004/container_1461341357870_0004_01_000015/taskmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnTaskManagerRunner --configDir .
>
> One thing that drew my attention is `-Xmx570m`. I expected the heap to be
> TaskManagerMemory * 0.75 = 2500 * 0.75 = 1875 MB (due to
> yarn.heap-cutoff-ratio). I run the application as follows:
> HADOOP_CONF_DIR=/etc/hadoop/conf flink run -m yarn-cluster -yn 18 -yjm
> 4096 -ytm 2500 eval-assembly-1.0.jar
>
> In the Flink logs I do see 'Task Manager memory: 2500'. But when I look at
> the YARN container logs on the cluster node, I see that the JVM starts
> with a 570 MB heap, which puzzles me. When I look at the memory actually
> allocated to the YARN container using 'top', I see 2.2 GB used. Am I
> interpreting these parameters correctly?
>
> I have also set the following (it failed in the same way without it, too):
> taskmanager.memory.off-heap: true
>
> Also, I don't understand why this happens at all. I assumed that Flink
> does not overcommit allocated resources and spills to disk when it runs
> out of heap memory. I'd appreciate it if someone could shed light on this
> too.
>
> Thanks,
> Timur
>
