[ 
https://issues.apache.org/jira/browse/YARN-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597616#comment-14597616
 ] 

Jun Gong commented on YARN-3831:
--------------------------------

[~zxu], thank you for the reminder. Sorry for the late reply.

The bug was found in version 2.2.0. I checked the latest code, and it seems to 
have been fixed: there is a 'localDirsChangeListener' that handles 
'onDirsChanged', so when a local disk turns from bad to good, 
'localDirsChangeListener' will try to initialize it.
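
To illustrate the idea, here is a minimal sketch of a listener that 
re-initializes local dirs when the set of good dirs changes. It is not the 
actual NodeManager code: only the 'onDirsChanged' callback name and the 
usercache/filecache sub-directories come from this issue, while 
'LocalDirsSketch', 'DirsTracker', 'updateGoodDirs' and 'initializeLocalDir' 
are hypothetical names made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocalDirsSketch {
    /** Hypothetical callback; only the method name comes from the issue. */
    interface DirsChangeListener {
        void onDirsChanged();
    }

    /** Hypothetical stand-in for whatever tracks the healthy local dirs. */
    static class DirsTracker {
        private final Set<Path> goodDirs = new HashSet<>();
        private final List<DirsChangeListener> listeners = new ArrayList<>();

        void register(DirsChangeListener listener) {
            listeners.add(listener);
        }

        Set<Path> getGoodDirs() {
            return new HashSet<>(goodDirs);
        }

        /** Called after a health check; notifies listeners only on a change. */
        void updateGoodDirs(Set<Path> current) {
            if (!goodDirs.equals(current)) {
                goodDirs.clear();
                goodDirs.addAll(current);
                for (DirsChangeListener listener : listeners) {
                    listener.onDirsChanged();
                }
            }
        }
    }

    /** Creates the sub-directories that localization expects to exist. */
    static void initializeLocalDir(Path localDir) throws IOException {
        Files.createDirectories(localDir.resolve("usercache"));
        Files.createDirectories(localDir.resolve("filecache"));
    }

    public static void main(String[] args) throws IOException {
        // A temp directory plays the role of a disk that just became good.
        Path disk = Files.createTempDirectory("local-dir");
        DirsTracker tracker = new DirsTracker();
        tracker.register(() -> {
            for (Path dir : tracker.getGoodDirs()) {
                try {
                    initializeLocalDir(dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
        tracker.updateGoodDirs(Collections.singleton(disk));
        System.out.println(Files.isDirectory(disk.resolve("usercache")));
    }
}
```

In this sketch the listener simply re-runs initialization for every good dir, 
which is safe because creating an existing directory is a no-op.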

Closing it now.

> Localization failed when a local disk turns from bad to good without NM 
> initializes it
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3831
>                 URL: https://issues.apache.org/jira/browse/YARN-3831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>
> A local disk turns from bad to good without the NM initializing it (creating 
> /path-to-local-dir/usercache and /path-to-local-dir/filecache). When 
> localizing a container, container-executor will try to create directories 
> under /path-to-local-dir/usercache, and it will fail because that directory 
> does not exist. The container's localization then fails. 
> The related log is as follows:
> {noformat}
> 2015-06-19 18:00:01,205 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1431957472783_38706012_01_000465
> 2015-06-19 18:00:01,212 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /data8/yarnenv/local/nmPrivate/container_1431957472783_38706012_01_000465.tokens.
>  Credentials list: 
> 2015-06-19 18:00:01,216 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1431957472783_38706012_01_000465 startLocalizer is : 
> 20
> org.apache.hadoop.util.Shell$ExitCodeException: 
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
> 2015-06-19 18:00:01,216 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
> provided 0
> 2015-06-19 18:00:01,216 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
> tdwadmin
> 2015-06-19 18:00:01,216 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Failed to create 
> directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory
> 2015-06-19 18:00:01,216 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.IOException: Application application_1431957472783_38706012 
> initialization failed (exitCode=20) with output: main : command provided 0
> main : user is tdwadmin
> Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such 
> file or directory
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:214)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
> Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205)
>         ... 1 more
> 2015-06-19 18:00:01,216 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1431957472783_38706012_01_000465 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
