Eric Badger created YARN-8335:
---------------------------------
Summary: Privileged docker containers' jobSubmitDir does not get successfully cleaned up
Key: YARN-8335
URL: https://issues.apache.org/jira/browse/YARN-8335
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Eric Badger
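For privileged Docker containers, the jobSubmitDir directory ends up owned
by root, but the NodeManager deletes the container directory as the
submitting user. That user cannot remove the root-owned contents, which
appears to be why the cleanup fails.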
{noformat}
2018-05-21 19:46:15,124 WARN [DeletionService #0] privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 255. Privileged Execution Operation Stderr:
Stdout: main : command provided 3
main : run as user is ebadger
main : requested yarn user is ebadger
failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/jobSubmitDir/job.split: Permission denied
failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/jobSubmitDir/job.splitmetainfo: Permission denied
failed to rmdir jobSubmitDir: Directory not empty
Error while deleting /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001: 39 (Directory not empty)
Full command array for failed execution:
[/hadoop-3.2.0-SNAPSHOT/bin/container-executor, ebadger, ebadger, 3, /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001]
2018-05-21 19:46:15,124 ERROR [DeletionService #0] nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(848)) - DeleteAsUser for /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:844)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:135)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
    at org.apache.hadoop.util.Shell.run(Shell.java:902)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
    ... 10 more
{noformat}
{noformat}
[foo@bar hadoop]$ ls -l /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/
total 4
drwxr-sr-x 2 root users 4096 May 21 19:45 jobSubmitDir
{noformat}
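To check whether other container directories are affected, one option is
to scan a directory tree for entries whose owner differs from the user
that will perform the delete. The following is only a diagnostic sketch,
not part of YARN: the helper name find_foreign_owned and its parameters
are hypothetical, and it assumes a POSIX system where file ownership is
reported through st_uid.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return a sorted list of paths under root (root included) whose
    owner uid differs from expected_uid.

    Hypothetical diagnostic helper; it only reads metadata and never
    modifies or deletes anything.
    """
    foreign = []
    # lstat() inspects the entry itself and does not follow symlinks.
    if os.lstat(root).st_uid != expected_uid:
        foreign.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                foreign.append(path)
    return sorted(foreign)
```

Run against the container directory from the log with the uid of the
submitting user, this would be expected to report at least jobSubmitDir,
since the ls output shows that directory owned by root. Entries inside a
directory the user cannot write to cannot be unlinked by that user, which
matches the "Permission denied" messages above.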