[
https://issues.apache.org/jira/browse/YARN-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
William Montaz updated YARN-11906:
----------------------------------
Description:
YARN-7300 introduced DiskValidator in LocalDirAllocator, replacing calls to
DiskChecker.checkDir() with DiskValidator.checkStatus().
The problem is that DiskChecker.checkDir() creates the directory as part of its
check, and BasicDiskValidator does the same. ReadWriteDiskValidator, however,
does not create the directory: it checks whether the directory exists before
calling DiskChecker.checkDir() and fails immediately if the directory is absent.
ContainerLaunch requests paths that do not exist yet (for example, the path of
the launch_container.sh file) and expects DiskChecker to create the missing
directories. Since YARN-7300, using ReadWriteDiskValidator therefore fails on
any healthy disk when the directory does not exist. We end up with exceptions
like the following during any container launch phase:
{noformat}
Could not find any valid local directory for
nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
with requested size -1 as the max capacity in any directory is 0{noformat}
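The difference between the two behaviors can be illustrated with a small, self-contained sketch. It does not use Hadoop classes: checkDirCreating() and checkStatusExistsFirst() are hypothetical stand-ins that only model the create-then-check and exists-first behaviors described above, and the class name, helper names, and directory names are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ValidatorSketch {

    // Stand-in for the behavior described for DiskChecker.checkDir():
    // create the directory if it is missing, then check access.
    static void checkDirCreating(File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory: " + dir);
        }
        if (!dir.canRead() || !dir.canWrite() || !dir.canExecute()) {
            throw new IOException("Directory is not accessible: " + dir);
        }
    }

    // Stand-in for the behavior described for ReadWriteDiskValidator:
    // fail immediately when the directory is absent, never create it.
    static void checkStatusExistsFirst(File dir) throws IOException {
        if (dir == null || !dir.exists()) {
            throw new IOException(dir + " does not exist");
        }
        checkDirCreating(dir);
    }

    // Runs one of the two checks and reports whether the directory passed.
    static boolean passes(boolean existsFirst, File dir) {
        try {
            if (existsFirst) {
                checkStatusExistsFirst(dir);
            } else {
                checkDirCreating(dir);
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A healthy "disk": a fresh, writable temporary directory.
        File root = Files.createTempDirectory("nm-local-dir").toFile();
        File dirA = new File(root, "nmPrivate/app_a/container_a");
        File dirB = new File(root, "nmPrivate/app_b/container_b");

        // Neither directory exists yet, as for a container that is about to launch.
        System.out.println("exists-first check passes: " + passes(true, dirA));
        System.out.println("creating check passes:     " + passes(false, dirB));
    }
}
```

In this sketch the exists-first check rejects a healthy disk only because the container directory has not been created yet, which is the situation an allocator would see as zero usable capacity.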
was:
YARN-7300 introduced DiskValidator in LocalDirAllocator, replacing calls to
DiskChecker.checkDir() with DiskValidator.checkStatus().
The problem is that DiskChecker.checkDir() creates the directory as part of its
check. BasicDiskValidator reproduces this behavior directly. ReadWriteDiskValidator,
however, does not create the directory: it checks whether the directory exists
before calling DiskChecker.checkDir() and fails immediately if the directory is
absent.
ContainerLaunch requests paths that do not exist yet (for example, the path of
the launch_container.sh file) and expects DiskChecker to create the missing
directories. Since YARN-7300, using ReadWriteDiskValidator therefore fails on
any healthy disk when the directory does not exist. We end up with exceptions
like the following during any container launch phase:
{noformat}
Could not find any valid local directory for
nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
with requested size -1 as the max capacity in any directory is 0{noformat}
> Nodemanager broken when using ReadWriteDiskValidator
> ----------------------------------------------------
>
> Key: YARN-11906
> URL: https://issues.apache.org/jira/browse/YARN-11906
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.3.6
> Reporter: William Montaz
> Priority: Major