[ 
https://issues.apache.org/jira/browse/YARN-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Montaz updated YARN-11906:
----------------------------------
    Description: 
YARN-7300 introduces DiskValidator in LocalDirAllocator. It replaces calls to 
DiskChecker.checkDir() with DiskValidator.checkStatus().

The problem is that DiskChecker.checkDir() creates missing directories as part 
of its check. BasicDiskValidator does this as well. ReadWriteDiskValidator, 
however, does not create the directory: it checks whether the directory exists 
before calling DiskChecker.checkDir() and fails immediately if it is absent.

ContainerLaunch tries to create paths that do not exist yet (for example, the 
launch_container.sh file), expecting DiskChecker to create the directories. 
Since YARN-7300, using ReadWriteDiskValidator therefore fails on any healthy 
disk when the directory does not exist. We end up with exceptions like the 
following at every container launch:
{noformat}
Could not find any valid local directory for 
nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
 with requested size -1 as the max capacity in any directory is 0{noformat}

  was:
YARN-7300 introduces DiskValidator in LocalDirAllocator. It replaces calls to 
DiskChecker.checkDir() with DiskValidator.checkStatus().

The problem is that DiskChecker.checkDir() creates missing directories as part 
of its check. BasicDiskValidator reproduces this behavior directly. 
ReadWriteDiskValidator, however, does not create the directory: it checks 
whether the directory exists before calling DiskChecker.checkDir() and fails 
immediately if it is absent.

ContainerLaunch tries to create paths that do not exist yet (for example, the 
launch_container.sh file), expecting DiskChecker to create the directories. 
Since YARN-7300, using ReadWriteDiskValidator therefore fails on any healthy 
disk when the directory does not exist. We end up with exceptions like the 
following at every container launch:
{noformat}
Could not find any valid local directory for 
nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
 with requested size -1 as the max capacity in any directory is 0{noformat}


> Nodemanager broken when using ReadWriteDiskValidator
> ----------------------------------------------------
>
>                 Key: YARN-11906
>                 URL: https://issues.apache.org/jira/browse/YARN-11906
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.3.6
>            Reporter: William Montaz
>            Priority: Major
>
> YARN-7300 introduces DiskValidator in LocalDirAllocator. It replaces calls to 
> DiskChecker.checkDir() with DiskValidator.checkStatus().
> The problem is that DiskChecker.checkDir() creates missing directories as 
> part of its check. BasicDiskValidator does this as well. 
> ReadWriteDiskValidator, however, does not create the directory: it checks 
> whether the directory exists before calling DiskChecker.checkDir() and fails 
> immediately if it is absent.
> ContainerLaunch tries to create paths that do not exist yet (for example, the 
> launch_container.sh file), expecting DiskChecker to create the directories. 
> Since YARN-7300, using ReadWriteDiskValidator therefore fails on any healthy 
> disk when the directory does not exist. We end up with exceptions like the 
> following at every container launch:
> {noformat}
> Could not find any valid local directory for 
> nmPrivate/application_1765167412015_56041/container_e1150_1765167412015_56041_02_000001//launch_container.sh
>  with requested size -1 as the max capacity in any directory is 0{noformat}
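
The difference between the two validation behaviors described in this issue can be sketched in plain Java. This is only an illustration under stated assumptions, not the Hadoop implementation: the class DiskValidatorSketch and the methods creatingCheck and existenceFirstCheck are hypothetical names, and the sketch uses java.nio.file rather than DiskChecker or the DiskValidator classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskValidatorSketch {

    /**
     * Hypothetical stand-in for a check that creates the directory when it
     * is missing, then verifies that it is usable.
     */
    public static boolean creatingCheck(Path dir) {
        try {
            // Creates missing parents too; does nothing if dir already exists.
            Files.createDirectories(dir);
        } catch (IOException e) {
            return false;
        }
        return Files.isDirectory(dir) && Files.isReadable(dir) && Files.isWritable(dir);
    }

    /**
     * Hypothetical stand-in for a check that rejects a missing directory
     * before the creating check is ever reached.
     */
    public static boolean existenceFirstCheck(Path dir) {
        if (!Files.exists(dir)) {
            return false; // fails immediately; nothing is created
        }
        return creatingCheck(dir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("local-dir");
        Path first = root.resolve("nmPrivate").resolve("app_a").resolve("container_a");
        Path second = root.resolve("nmPrivate").resolve("app_b").resolve("container_b");

        System.out.println("existence-first on missing dir: " + existenceFirstCheck(first));
        System.out.println("creating check on missing dir:  " + creatingCheck(second));
        System.out.println("existence-first after creation: " + existenceFirstCheck(second));
    }
}
```

In this sketch, a caller that relies on the check to create a not-yet-existing container directory succeeds with creatingCheck but is rejected by existenceFirstCheck even though the underlying disk is healthy, which mirrors the failure mode reported above.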



