[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168814#comment-15168814
 ] 

Varun Vasudev commented on YARN-4731:
-------------------------------------

The signal container exception and the container initalization error can be 
ignored. The signal container exception is due to the fact that we call signal 
container as part of the container cleanup and the container initialization 
error is due to the MR AM killing the last reducer.

> Linux container executor fails to delete nmlocal folders
> --------------------------------------------------------
>
>                 Key: YARN-4731
>                 URL: https://issues.apache.org/jira/browse/YARN-4731
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.0
>            Reporter: Bibin A Chundatt
>            Assignee: Varun Vasudev
>            Priority: Blocker
>         Attachments: YARN-4731.001.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_000001
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_000001:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_000001]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_000001
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
>         at org.apache.hadoop.util.Shell.run(Shell.java:838)
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
>         ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw------- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
> -rwx------ 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to