[
https://issues.apache.org/jira/browse/YARN-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Yang updated YARN-5749:
---------------------------
Description:
HADOOP-13440 changed the FileContext#setUMask method so that the umask is no
longer kept in a local variable but is set globally, by updating the conf
value of "fs.permissions.umask-mode".
Both LogWriter and ResourceLocalizationService may call this method and thereby
change the global umask.
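To illustrate the difference between the two storage strategies, here is a minimal sketch. The classes SharedConf, ConfBackedContext and FieldBackedContext are hypothetical stand-ins, not Hadoop classes, and the sketch assumes that the callers share a single configuration object. It only models where the umask value lives, not how FileContext is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object shared by several services.
final class SharedConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical context that stores the umask in the shared configuration.
final class ConfBackedContext {
    static final String UMASK_KEY = "fs.permissions.umask-mode";
    private final SharedConf conf;

    ConfBackedContext(SharedConf conf) {
        this.conf = conf;
    }

    void setUMask(String umask) {
        conf.set(UMASK_KEY, umask); // visible to every user of the same conf
    }

    String getUMask() {
        return conf.get(UMASK_KEY, "022");
    }
}

// Hypothetical context that keeps the umask in its own field.
final class FieldBackedContext {
    private String umask = "022";

    void setUMask(String umask) {
        this.umask = umask; // visible to this instance only
    }

    String getUMask() {
        return umask;
    }
}

final class UmaskScopeDemo {
    public static void main(String[] args) {
        SharedConf conf = new SharedConf();
        ConfBackedContext logWriter = new ConfBackedContext(conf);
        ConfBackedContext localizer = new ConfBackedContext(conf);
        logWriter.setUMask("137");
        System.out.println(localizer.getUMask()); // prints 137

        FieldBackedContext oldLogWriter = new FieldBackedContext();
        FieldBackedContext oldLocalizer = new FieldBackedContext();
        oldLogWriter.setUMask("137");
        System.out.println(oldLocalizer.getUMask()); // prints 022
    }
}
```

In the conf-backed variant, a change made on behalf of one service is observed by the other, which is the kind of leak described in this issue.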
After an application finishes, LogWriter sets the umask to "137" while
uploading the logs for its containers. The global umask changes immediately and
affects other services. In my case, after one of the local directories was
marked as bad (because its used disk space was above the threshold defined by
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"),
ResourceLocalizationService reinitialized the remaining local directories and
changed their permissions from "drwxr-xr-x" to "drw-r-----" (the umask had
changed from "022" to "137"). From then on, the NM always fails to localize
resources because the local directories are not executable.
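The permission change can be checked with plain umask arithmetic. The sketch below uses no Hadoop classes. The class and method names are illustrative, and the requested mode of 0777 for a new directory is an assumption made for the example.

```java
// Illustrative only: POSIX-style umask arithmetic, independent of Hadoop.
final class UmaskDemo {
    /** Applies a umask to a requested mode: bits set in the umask are cleared. */
    static int applyUMask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    /** Renders the low nine permission bits in "rwxr-xr-x" form. */
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i);
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int requested = 0777; // assumed requested mode for a new directory
        System.out.println("d" + toSymbolic(applyUMask(requested, 0022))); // drwxr-xr-x
        System.out.println("d" + toSymbolic(applyUMask(requested, 0137))); // drw-r-----
    }
}
```

With umask "137" the execute bit is cleared for the owner as well, so the directory can no longer be traversed.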
Detailed logs are as follows:
{code}
2016-10-19 15:36:32,650 WARN org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext: Disk Error Exception:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not executable: /home/yangtao.yt/hadoop-data/nm-local-dir-2/nmPrivate
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:215)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:350)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:412)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
2016-10-19 15:36:32,650 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for nmPrivate/container_e26_1476858409240_0004_01_000005.tokens
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
2016-10-19 15:36:32,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e26_1476858409240_0004_01_000005 transitioned from LOCALIZING to LOCALIZATION_FAILED
{code}
To solve this problem, I think it would be better for FileContext to remain
compatible with its previous usage.
Please feel free to give your suggestions.
> Fail to localize resources after health status for local dirs changed
> ---------------------------------------------------------------------
>
> Key: YARN-5749
> URL: https://issues.apache.org/jira/browse/YARN-5749
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang