William Montaz created YARN-11906:
-------------------------------------
Summary: Nodemanager broken when using ReadWriteDiskValidator
Key: YARN-11906
URL: https://issues.apache.org/jira/browse/YARN-11906
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.3.6
Reporter: William Montaz
YARN-7300 introduced DiskValidator in LocalDirAllocator. In detail, it
replaced calls to DiskChecker.checkDir() with DiskValidator.checkStatus().
The problem is that DiskChecker.checkDir() creates missing directories as part
of its check. BasicDiskValidator reproduces this behavior directly.
ReadWriteDiskValidator, however, does not create the directory: it checks
whether the directory exists before calling DiskChecker.checkDir() and fails
immediately if the directory is absent.
ContainerLaunch tries to create paths that do not exist yet (for example, the
launch_container.sh file), expecting DiskChecker to create them. Since
YARN-7300, using ReadWriteDiskValidator therefore fails on any healthy disk
when the directory does not exist. We end up with exceptions like the
following at every container launch:
{noformat}
Could not find any valid local directory for nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh with requested size -1 as the max capacity in any directory is 0
{noformat}
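The difference between the two validators can be illustrated with a minimal,
self-contained sketch. It does not use the Hadoop classes: ValidatorSketch,
checkDirCreating, basicCheck, readWriteCheck and passes are hypothetical
stand-ins that only mimic the behavior described above (a check that creates
the directory versus a check that requires it to exist first).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ValidatorSketch {

    // Hypothetical stand-in for the behavior attributed to
    // DiskChecker.checkDir(): create the directory if it is missing,
    // then verify that it is readable and writable.
    static void checkDirCreating(File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create directory: " + dir);
        }
        if (!dir.canRead() || !dir.canWrite()) {
            throw new IOException("Directory is not readable/writable: " + dir);
        }
    }

    // "Basic" style: delegates straight to the creating check.
    static void basicCheck(File dir) throws IOException {
        checkDirCreating(dir);
    }

    // "Read-write" style: rejects a missing directory before the
    // creating check ever gets a chance to run.
    static void readWriteCheck(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException(dir + " is not a directory!");
        }
        checkDirCreating(dir);
    }

    // Returns true if the chosen check accepts the directory.
    static boolean passes(boolean readWrite, File dir) {
        try {
            if (readWrite) {
                readWriteCheck(dir);
            } else {
                basicCheck(dir);
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("nm-local").toFile();
        // Two distinct missing paths, so the first check cannot
        // create the directory used by the second one.
        File missingA = new File(root, "nmPrivate/app_a");
        File missingB = new File(root, "nmPrivate/app_b");
        System.out.println("basic on missing dir:      " + passes(false, missingA));
        System.out.println("read-write on missing dir: " + passes(true, missingB));
    }
}
```

On a writable temporary directory, the basic-style check should accept the
missing path (creating it as a side effect), while the read-write-style check
should reject it. This matches the asymmetry reported here; it is not a
statement about the exact Hadoop implementation.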
--
This message was sent by Atlassian Jira
(v8.20.10#820010)