Bence Kosztolnik created YARN-11703:
---------------------------------------

             Summary: Validate accessibility of Node Manager working directories
                 Key: YARN-11703
                 URL: https://issues.apache.org/jira/browse/YARN-11703
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: yarn
    Affects Versions: 3.5.0
            Reporter: Bence Kosztolnik
            Assignee: Bence Kosztolnik


h3. Problem:

If the permissions of a subdirectory or file under 
*yarn.nodemanager.local-dirs* or *yarn.nodemanager.log-dirs* change so that 
the node manager can no longer access it, the node manager does not become 
unhealthy, but containers launched on that node fail.

h3. Testing:
 - Run an example PI job on a cluster.
 - Make the user's cache directory unreadable by the node manager. For 
example:
{noformat}
chmod 222 ./usercache/{user}
{noformat}

 - The cluster state stays healthy.
 - Re-run the PI job.
 - Containers fail on the affected node with:

{noformat}
... Not able to initialize app-cache directories in any of the configured local directories for user ...
{noformat}

h3. Solution:

Add an extra validation to DirectoryCollection#testDirs that checks whether 
the contents of the local-dirs and log-dirs are accessible by the node 
manager, and marks the node unhealthy if they are not.
A new flag will be introduced to enable this validation: 
*yarn.nodemanager.working-dir-content-accessibility-validation.enabled* 
(default: true).
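
As a rough illustration of the kind of check proposed here, the following Java sketch walks a directory tree and collects every entry the current process cannot access. The class name WorkingDirAccessCheck and the method findInaccessible are hypothetical and not part of Hadoop; the actual change in DirectoryCollection may look quite different. The sketch assumes that directories need read and execute permission and that files need read permission.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Hadoop code): finds entries the current user cannot access. */
public class WorkingDirAccessCheck {

  /** Returns every path under root that the current process cannot read or traverse. */
  public static List<Path> findInaccessible(Path root) throws IOException {
    List<Path> bad = new ArrayList<>();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        // Assumption: a usable directory must be listable (r) and traversable (x).
        if (!Files.isReadable(dir) || !Files.isExecutable(dir)) {
          bad.add(dir);
          return FileVisitResult.SKIP_SUBTREE;
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        // Assumption: a usable file must at least be readable.
        if (!Files.isReadable(file)) {
          bad.add(file);
        }
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // Reached when an entry cannot be opened, e.g. a directory without read permission.
        bad.add(file);
        return FileVisitResult.CONTINUE;
      }
    });
    return bad;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    List<Path> bad = findInaccessible(root);
    if (bad.isEmpty()) {
      System.out.println("All entries under " + root + " are accessible");
    } else {
      bad.forEach(p -> System.out.println("Not accessible: " + p));
    }
  }
}
```

Run against a local-dir after the chmod 222 step from the Testing section, a check like this would be expected to report the affected usercache directory, provided the process is not running as a privileged user that bypasses permission checks.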



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
