[ 
https://issues.apache.org/jira/browse/YARN-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588266#comment-15588266
 ] 

Tao Yang commented on YARN-5749:
--------------------------------

I checked with [~ajisakaa]; this problem can be solved by the patch in 
YARN-5679.

> Fail to localize resources after health status for local dirs changed
> ---------------------------------------------------------------------
>
>                 Key: YARN-5749
>                 URL: https://issues.apache.org/jira/browse/YARN-5749
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Tao Yang
>
> HADOOP-13440 changed the FileContext#setUMask method so that the umask is no 
> longer a local variable but a global one, set by updating the conf value of 
> "fs.permissions.umask-mode". 
> Both LogWriter and ResourceLocalizationService may call this method and 
> thereby update the global umask. 
> After an application finishes, LogWriter sets the umask to "137" while 
> uploading container logs. The global umask changes immediately and affects 
> other services. In my case, after one of the local directories was marked as 
> bad (because its disk usage was above the threshold defined by 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"),
> ResourceLocalizationService reinitialized the remaining local directories and 
> changed their permissions from "drwxr-xr-x" to "drw-r-----" (the umask had 
> changed from "022" to "137"). From then on, the NM always fails to localize 
> resources because the local directories are not executable.
> Detailed logs are as follows:
> {code}
> 2016-10-19 15:36:32,650 WARN org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext: Disk Error Exception:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not executable: /home/yangtao.yt/hadoop-data/nm-local-dir-2/nmPrivate
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:215)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:350)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:412)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,650 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for nmPrivate/container_e26_1476858409240_0004_01_000005.tokens
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e26_1476858409240_0004_01_000005 transitioned from LOCALIZING to LOCALIZATION_FAILED
> {code}
> To solve this problem, in my opinion, it would be better for FileContext to 
> remain compatible with past usage.
> Please feel free to share your suggestions.
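
The permission change quoted in the description ("drwxr-xr-x" with umask "022", "drw-r-----" with umask "137") follows from ordinary umask arithmetic: every bit set in the umask is cleared from the requested mode. The sketch below is plain Java with no Hadoop dependency. The class name UmaskDemo, its helper methods, and the requested mode 0777 are assumptions made for illustration only (0777 is chosen because it reproduces both permission strings from the report); this is not the NodeManager's actual code path.

```java
public class UmaskDemo {

    // Clear every permission bit that is set in the umask.
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    // Render the low nine mode bits in "rwxr-xr-x" form.
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i);
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    // A directory can only be traversed by its owner if the owner
    // execute bit (0100) survives the umask.
    static boolean ownerCanTraverse(int mode) {
        return (mode & 0100) != 0;
    }

    public static void main(String[] args) {
        int requested = 0777; // assumed requested mode, for illustration
        for (int umask : new int[] {0022, 0137}) {
            int mode = applyUmask(requested, umask);
            System.out.printf("umask %03o -> d%s (owner can traverse: %b)%n",
                    umask, toSymbolic(mode), ownerCanTraverse(mode));
        }
        // prints:
        // umask 022 -> drwxr-xr-x (owner can traverse: true)
        // umask 137 -> drw-r----- (owner can traverse: false)
    }
}
```

With umask "137" the owner execute bit is masked out, which is consistent with the "Directory is not executable" error in the log above.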



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
