Hi Jakub,

You have 2 options:
1. Turning off the virtual memory check, as you mentioned.
2. Making yarn.nodemanager.vmem-pmem-ratio larger.

Option 1 is a reasonable choice if you cannot predict virtual memory usage in
advance or you don't have any applications that need the virtual memory check.

Thanks,
- Tsuyoshi

On Thu, Sep 11, 2014 at 7:24 PM, Jakub Stransky <[email protected]> wrote:
> Hi,
>
> thanks for the reply. The machine is pretty small as it has 4GB of total
> memory. So we reserved 1GB for the OS and 1GB for HBase (according to the
> recommendation), so 2GB remains, which is what the nodemanager claims.
>
> Actually it is a cluster of 5 machines, 2 name-nodes and 3 data nodes. All
> machines have similar parameters, so the stronger ones are used for nn and
> the rest for dn. I know that the hw is far from ideal, but it is a small
> cluster for a POC and for gaining some experience.
>
> Back to the problem. At the time when this happens no other job is running
> on the cluster. All mappers (3) have already finished and we have a single
> reduce task which fails at ~70% of its progress on virtual memory
> consumption. The dataset being processed is a 500MB compressed avro data
> file. The reducer doesn't cache anything intentionally, it just divides the
> records into various folders dynamically.
> From the RM console I clearly see that there are free unused resources
> (memory). Is there a way to detect what consumed the assigned virtual
> memory? For a smaller amount of input data (~120MB compressed) the job
> finishes just fine within 3 min.
>
> We obviously have a problem in scaling the task out. Could someone provide
> some hints, as it seems that we are missing something fundamental here?
>
> Thanks for helping me out
> Jakub
>
> On 11 September 2014 11:34, Susheel Kumar Gadalay <[email protected]>
> wrote:
>
>> Your physical memory is 1GB on this node.
>>
>> What are the other containers (map tasks) running on this?
>>
>> You have given map memory as 768M and reduce memory as 1024M and am as
>> 1024M.
>>
>> With AM and a single map task it is 1.7G and cannot start another
>> container for reducer.
>> Reduce these values and check.
>>
>> On 9/11/14, Jakub Stransky <[email protected]> wrote:
>> > Hello hadoop users,
>> >
>> > I am facing the following issue when running an M/R job during the reduce phase:
>> >
>> > Container [pid=22961,containerID=container_1409834588043_0080_01_000010] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
>> > Dump of the process-tree for container_1409834588043_0080_01_000010 :
>> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> > |- 22961 16896 22961 22961 (bash) 0 0 9424896 312 /bin/bash -c /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184 attempt_1409834588043_0080_r_000000_0 10 1>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stdout 2>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stderr
>> > |- 22970 22961 22961 22961 (java) 24692 1165 2256662528 162659 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184 attempt_1409834588043_0080_r_000000_0 10
>> > Container killed on request. Exit code is 143
>> >
>> >
>> > I have the following settings, with the default physical-to-virtual memory
>> > ratio of 2.1:
>> > # hadoop - yarn-site.xml
>> > yarn.nodemanager.resource.memory-mb : 2048
>> > yarn.scheduler.minimum-allocation-mb : 256
>> > yarn.scheduler.maximum-allocation-mb : 2048
>> >
>> > # hadoop - mapred-site.xml
>> > mapreduce.map.memory.mb : 768
>> > mapreduce.map.java.opts : -Xmx512m
>> > mapreduce.reduce.memory.mb : 1024
>> > mapreduce.reduce.java.opts : -Xmx768m
>> > mapreduce.task.io.sort.mb : 100
>> > yarn.app.mapreduce.am.resource.mb : 1024
>> > yarn.app.mapreduce.am.command-opts : -Xmx768m
>> >
>> > I have the following questions:
>> > - Is it possible to track down the vm consumption, i.e. find what caused
>> > such high vm usage?
>> > - What is the best way to solve this kind of problem?
>> > - I found the following recommendation on the internet: "We actually
>> > recommend disabling this check by setting
>> > yarn.nodemanager.vmem-check-enabled to false as there is reason to believe
>> > the virtual/physical ratio is exceptionally high with some versions of
>> > Java / Linux." Is it a good way to go?
>> >
>> > My reduce task doesn't do anything heavy: it just classifies data. For a
>> > given input key it chooses the appropriate output folder and writes the
>> > data out.
>> >
>> > Thanks for any advice
>> > Jakub
>> >
>> >
>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
> --
- Tsuyoshi
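
For reference, the figures in the NodeManager message can be reproduced from the process-tree dump quoted in this thread. The following is a minimal Python sketch, not something posted by anyone in the thread. It assumes the virtual memory limit is the container size multiplied by yarn.nodemanager.vmem-pmem-ratio and that usage is summed over the whole process tree, which is my understanding of the check, and it assumes a 4 KiB page size, which the thread does not state. The function name `summarize` and the dictionary layout are made up for illustration.

```python
# Figures copied from the settings and the process-tree dump in the thread.
CONTAINER_MB = 1024      # mapreduce.reduce.memory.mb
VMEM_PMEM_RATIO = 2.1    # yarn.nodemanager.vmem-pmem-ratio (default, per the thread)
PAGE_SIZE = 4096         # ASSUMPTION: 4 KiB pages; not stated in the thread

# process -> (VMEM_USAGE(BYTES), RSSMEM_USAGE(PAGES)) as printed in the dump
PROCESSES = {
    "bash (pid 22961)": (9424896, 312),
    "java (pid 22970)": (2256662528, 162659),
}

MB = 1024 * 1024


def summarize(processes, container_mb, ratio, page_size=PAGE_SIZE):
    """Sum usage over the process tree and compare it with the vmem limit.

    Hypothetical helper: it mirrors the arithmetic of the check as I
    understand it, not the NodeManager's actual code.
    """
    vmem_mb = sum(vmem for vmem, _ in processes.values()) / MB
    rss_mb = sum(pages for _, pages in processes.values()) * page_size / MB
    limit_mb = container_mb * ratio
    return {
        "vmem_used_mb": vmem_mb,
        "rss_used_mb": rss_mb,
        "vmem_limit_mb": limit_mb,
        "over_limit": vmem_mb > limit_mb,
        # Ratio that would have been needed for the usage seen at kill time.
        "ratio_at_kill": vmem_mb / container_mb,
    }


if __name__ == "__main__":
    result = summarize(PROCESSES, CONTAINER_MB, VMEM_PMEM_RATIO)
    for name, value in result.items():
        print(name, value if isinstance(value, bool) else round(value, 2))
```

Under these assumptions the tree uses about 2161 MB of virtual memory against a limit of 2150.4 MB, while resident memory is only about 636.6 MB, which matches the "636.6 MB of 1 GB physical memory used" in the message. The ratio at kill time is only a lower bound for option 2, because the process may have kept growing had it not been killed.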
