[ https://issues.apache.org/jira/browse/YARN-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598050#comment-14598050 ]
zhihai xu commented on YARN-3831:
---------------------------------

[~hex108], thanks for the confirmation!

> Localization failed when a local disk turns from bad to good without NM initializes it
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3831
>                 URL: https://issues.apache.org/jira/browse/YARN-3831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>
> A local disk turns from bad to good without the NM initializing it (that is,
> without creating /path-to-local-dir/usercache and /path-to-local-dir/filecache).
> When localizing a container, container-executor tries to create directories
> under /path-to-local-dir/usercache and fails because that parent directory
> does not exist. The container's localization then fails.
> The related log is as follows:
> {noformat}
> 2015-06-19 18:00:01,205 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1431957472783_38706012_01_000465
> 2015-06-19 18:00:01,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /data8/yarnenv/local/nmPrivate/container_1431957472783_38706012_01_000465.tokens. Credentials list:
> 2015-06-19 18:00:01,216 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1431957472783_38706012_01_000465 startLocalizer is : 20
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
> 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 0
> 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is tdwadmin
> 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory
> 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
> java.io.IOException: Application application_1431957472783_38706012 initialization failed (exitCode=20) with output: main : command provided 0
> main : user is tdwadmin
> Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:214)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
> Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
>         ... 1 more
> 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1431957472783_38706012_01_000465 transitioned from LOCALIZING to LOCALIZATION_FAILED
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
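
For illustration only, the failure described in the issue (a recovered local dir missing its usercache and filecache subdirectories) could be avoided by re-creating those subdirectories whenever a disk is marked good again. The following is a minimal standalone sketch of that idea, not the patch attached to YARN-3831 and not NodeManager code: the class name `LocalDirInitializer` and the method `ensureInitialized` are hypothetical, and the sketch ignores the ownership and permission settings a real NodeManager would need to apply.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Hadoop: re-creates the per-disk
// subdirectories that the issue description says must exist.
class LocalDirInitializer {
    // Subdirectory names taken from the issue description.
    static final List<String> REQUIRED_SUBDIRS = List.of("usercache", "filecache");

    /**
     * Creates every required subdirectory that is missing under localDir.
     * Returns how many directories were created on this call.
     */
    static int ensureInitialized(Path localDir) throws IOException {
        int created = 0;
        for (String name : REQUIRED_SUBDIRS) {
            Path sub = localDir.resolve(name);
            if (!Files.isDirectory(sub)) {
                // Also creates missing parents; throws if a regular file is in the way.
                Files.createDirectories(sub);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for /path-to-local-dir.
        Path localDir = Files.createTempDirectory("nm-local-dir");
        System.out.println("created on first pass: " + ensureInitialized(localDir));
        System.out.println("created on second pass: " + ensureInitialized(localDir));
    }
}
```

Calling such a check at the moment a directory moves from the bad list back to the good list would make the call idempotent for disks that were already initialized, since existing subdirectories are left untouched.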