[
https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711992#comment-13711992
]
Vinod Kumar Vavilapalli commented on YARN-501:
----------------------------------------------
Not sure more logs would be useful. The AM was started with -Xmx128m, yet the
JVM is using about 532MB of virtual memory, and the shell wrapper itself
accounts for another 100MB or so. For some reason, this 1-in-500 case of the
JVM using more vmem than usual is what gets the container killed. Maybe it has
something to do with the specific version of Linux and/or the JVM? You could
try submitting with more memory to avoid it, but it would be worth tracking
the randomness down.
Clearly YARN can't do much in this case, other than maybe lengthening the
monitoring cycle - we check the memory usage of container processes about once
every 3 seconds (1 cycle per 3 seconds) and kill those over their limit. You
could try increasing yarn.nodemanager.container-monitor.interval-ms.
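
For reference, the numbers in the quoted report can be reproduced with a few
lines of arithmetic. This is only a rough sketch of the comparison, not the
NodeManager's actual monitoring code. The byte counts and the 128 Mb limit are
copied from the log below; the 2.1 ratio is an assumption inferred from
268.8 / 128 (it matches what I believe is the default of
yarn.nodemanager.vmem-pmem-ratio), and the function names are made up for
illustration.

```python
MIB = 1024 * 1024

# VMEM_USAGE(BYTES) per PID, copied from the process-tree dump in the report.
VMEM_BYTES = {
    21147: 532_643_840,  # java (the ApplicationMaster JVM)
    21141: 108_642_304,  # bash wrapper that launched the JVM
}

PMEM_LIMIT_MIB = 128   # physical memory limit reported in the log
VMEM_PMEM_RATIO = 2.1  # assumption: inferred from 268.8 / 128


def vmem_limit_mib(pmem_limit_mib, ratio):
    """Virtual memory limit implied by the physical limit and the ratio."""
    return pmem_limit_mib * ratio


def tree_vmem_mib(vmem_bytes_by_pid):
    """Total virtual memory of the whole process tree, in MiB."""
    return sum(vmem_bytes_by_pid.values()) / MIB


def is_over_limit(used_mib, limit_mib):
    """Simplified check: usage strictly above the limit."""
    return used_mib > limit_mib


limit = vmem_limit_mib(PMEM_LIMIT_MIB, VMEM_PMEM_RATIO)
used = tree_vmem_mib(VMEM_BYTES)
print(f"limit={limit:.1f} MiB used={used:.1f} MiB over={is_over_limit(used, limit)}")
# prints: limit=268.8 MiB used=611.6 MiB over=True
```

Both figures agree with the "611.6 Mb of 268.8 Mb virtual memory used" line in
the diagnostics, so the kill is consistent with the limit; the open question
is why the JVM's vmem was this high in one run.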
> Application Master getting killed randomly reporting excess usage of memory
> ---------------------------------------------------------------------------
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: applications/distributed-shell, nodemanager
> Affects Versions: 2.0.3-alpha
> Reporter: Krishna Kishore Bonagiri
> Assignee: Omkar Vinit Joshi
>
> I am running a date command using the Distributed Shell example in a loop of
> 500 iterations. It ran successfully every time except once, when it gave
> the following error.
> 2013-03-22 04:33:25,280 INFO [main] distributedshell.Client
> (Client.java:monitorApplication(605)) - Got application report from ASM for,
> appId=222, clientToken=null, appDiagnostics=Application
> application_1363938200742_0222 failed 1 times due to AM Container for
> appattempt_1363938200742_0222_000001 exited with exitCode: 143 due to:
> Container [pid=21141,containerID=container_1363938200742_0222_01_000001] is
> running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb
> physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing
> container.
> Dump of the process-tree for container_1363938200742_0222_01_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c
> /home_/dsadm/yarn/jdk//bin/java -Xmx128m
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster
> --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stdout
>
> 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stderr