[ 
https://issues.apache.org/jira/browse/SPARK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-1930:
-------------------------------

    Fix Version/s: 1.0.1

>  Container  is running beyond physical memory limits
> ----------------------------------------------------
>
>                 Key: SPARK-1930
>                 URL: https://issues.apache.org/jira/browse/SPARK-1930
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Guoqiang Li
>             Fix For: 1.0.1
>
>
> When the container's memory usage reaches about 8 GB, the container is killed.
> YARN NodeManager log:
> {code}
> 2014-05-23 13:35:30,776 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Container [pid=4947,containerID=container_1400809535638_0015_01_000005] is 
> running beyond physical memory limits. Current usage: 8.6 GB of 8.5 GB 
> physical memory used; 10.0 GB of 17.8 GB virtual memory used. Killing 
> container.
> Dump of the process-tree for container_1400809535638_0015_01_000005 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 4947 25417 4947 4947 (bash) 0 0 110804992 335 /bin/bash -c 
> /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill 
> %p' -Xms8192m -Xmx8192m  -Xss2m 
> -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp
>   -Dlog4j.configuration=log4j-spark-container.properties 
> -Dspark.akka.askTimeout="120" -Dspark.akka.timeout="120" 
> -Dspark.akka.frameSize="20" 
> org.apache.spark.executor.CoarseGrainedExecutorBackend 
> akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 
> 10dian72.domain.test 4 1> 
> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stdout
>  2> 
> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stderr
>  
>         |- 4957 4947 4947 4947 (java) 157809 12620 10667016192 2245522 
> /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill 
> %p -Xms8192m -Xmx8192m -Xss2m 
> -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp
>  -Dlog4j.configuration=log4j-spark-container.properties 
> -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 
> -Dspark.akka.frameSize=20 
> org.apache.spark.executor.CoarseGrainedExecutorBackend 
> akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 
> 10dian72.domain.test 4 
> 2014-05-23 13:35:30,776 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Removed ProcessTree with root 4947
> 2014-05-23 13:35:30,776 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1400809535638_0015_01_000005 transitioned from RUNNING 
> to KILLING
> 2014-05-23 13:35:30,777 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,788 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_1400809535638_0015_01_000005 is : 143
> 2014-05-23 13:35:30,829 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1400809535638_0015_01_000005 transitioned from KILLING 
> to CONTAINER_CLEANEDUP_AFTER_KILL
> 2014-05-23 13:35:30,830 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
> absolute path : 
> /yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,830 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=spark        
> OPERATION=Container Finished - Killed   TARGET=ContainerImpl    
> RESULT=SUCCESS  APPID=application_1400809535638_0015    
> CONTAINERID=container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,830 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1400809535638_0015_01_000005 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2014-05-23 13:35:30,830 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Removing container_1400809535638_0015_01_000005 from application 
> application_1400809535638_0015
> {code}
> I think this is related to {{YarnAllocationHandler.MEMORY_OVERHEAD}}:  
> https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala#L562
> Relative to an 8 GB heap, a fixed overhead of 384 MB is too small.
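> As a rough sketch of the arithmetic (assuming 4 KB pages for the RSS figure; the object name, {{scaledOverheadMb}}, and the 0.07 factor are only illustrative assumptions, not actual Spark code):
> {code}
> object MemoryOverheadSketch {
>   // Values taken from the NodeManager log above.
>   val executorHeapMb  = 8192   // -Xmx8192m
>   val fixedOverheadMb = 384    // YarnAllocationHandler.MEMORY_OVERHEAD
>
>   // YARN kills the container once RSS exceeds heap + overhead.
>   val containerLimitMb = executorHeapMb + fixedOverheadMb   // 8576 MB ~ "8.5 GB"
>
>   // Observed usage: 2245522 pages * 4096 bytes ~ 8772 MB ~ "8.6 GB", just over the limit.
>   val observedRssMb = 2245522L * 4096 / (1024 * 1024)
>
>   // Hypothetical alternative: scale the overhead with the heap size,
>   // keeping 384 MB as a floor (the 0.07 factor is purely for illustration).
>   def scaledOverheadMb(heapMb: Int, factor: Double = 0.07, floorMb: Int = 384): Int =
>     math.max(floorMb, (heapMb * factor).toInt)
>
>   def main(args: Array[String]): Unit = {
>     println(s"container limit : $containerLimitMb MB")
>     println(s"observed RSS    : $observedRssMb MB")
>     println(s"scaled overhead : ${scaledOverheadMb(executorHeapMb)} MB")  // 573 MB for an 8 GB heap
>   }
> }
> {code}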



--
This message was sent by Atlassian JIRA
(v6.2#6252)