Jun Gong created YARN-3831:
------------------------------

             Summary: Localization failed when a local disk turns from bad to 
good without NM initializes it
                 Key: YARN-3831
                 URL: https://issues.apache.org/jira/browse/YARN-3831
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Jun Gong
            Assignee: Jun Gong


A local disk turns from bad to good without the NM initializing it (that is, 
creating /path-to-local-dir/usercache and /path-to-local-dir/filecache). When 
localizing a container, container-executor tries to create directories under 
/path-to-local-dir/usercache and fails because that directory does not exist. 
The container's localization then fails.
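
The failure mode and one possible remedy can be sketched as below. This is an 
illustrative Python model, not NodeManager code: the function names are 
hypothetical, and it assumes the failing step is a single-level mkdir under 
usercache, which matches the "No such file or directory" message in the log. 
The remedy shown (recreating the missing subdirectories once a disk is 
considered good again) is only one way the problem might be addressed.

```python
import os
import tempfile

# Subdirectories named in this report; a real NodeManager may manage more.
REQUIRED_SUBDIRS = ("usercache", "filecache")


def missing_subdirs(local_dir):
    """Return the required subdirectories that do not exist under local_dir."""
    return [d for d in REQUIRED_SUBDIRS
            if not os.path.isdir(os.path.join(local_dir, d))]


def reinitialize_local_dir(local_dir):
    """Create any missing required subdirectories; return the ones created."""
    created = missing_subdirs(local_dir)
    for d in created:
        os.makedirs(os.path.join(local_dir, d), exist_ok=True)
    return created


def create_user_dir(local_dir, user):
    """Mimic the failing step (assumed): a non-recursive mkdir under usercache."""
    path = os.path.join(local_dir, "usercache", user)
    os.mkdir(path)  # raises FileNotFoundError if usercache is absent
    return path


if __name__ == "__main__":
    # A fresh temporary directory stands in for a disk that has just recovered.
    with tempfile.TemporaryDirectory() as disk:
        try:
            create_user_dir(disk, "tdwadmin")
        except FileNotFoundError as e:
            print("localization would fail:", e.strerror)
        reinitialize_local_dir(disk)
        print("after re-initialization:", create_user_dir(disk, "tdwadmin"))
```

Calling reinitialize_local_dir a second time on the same directory creates 
nothing and returns an empty list, so in this sketch it would be safe to run 
on every bad-to-good transition.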

The related log is as follows:
{noformat}
2015-06-19 18:00:01,205 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1431957472783_38706012_01_000465
2015-06-19 18:00:01,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /data8/yarnenv/local/nmPrivate/container_1431957472783_38706012_01_000465.tokens. Credentials list: 
2015-06-19 18:00:01,216 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1431957472783_38706012_01_000465 startLocalizer is : 20
org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 0
2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is tdwadmin
2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory
2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
java.io.IOException: Application application_1431957472783_38706012 initialization failed (exitCode=20) with output: main : command provided 0
main : user is tdwadmin
Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory

        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:214)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
        ... 1 more
2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1431957472783_38706012_01_000465 transitioned from LOCALIZING to LOCALIZATION_FAILED
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)