[ 
https://issues.apache.org/jira/browse/YARN-11932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11932.
-------------------------------
    Fix Version/s: 3.5.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Fix TestYarnFederationWithFairScheduler timeout caused by shared NodeLabel 
> storage
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-11932
>                 URL: https://issues.apache.org/jira/browse/YARN-11932
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: router
>    Affects Versions: 3.5.1
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> *Problem*
>  
> TestYarnFederationWithFairScheduler#testMetricsInfo intermittently times out 
> during test execution.
>  
> The root cause is that multiple test subclusters share the same NodeLabel 
> storage directory ({{/tmp/hadoop-yarn-$USER/node-labels}}) by default. When 
> tests run sequentially, residual editlog entries containing "delete default 
> label" operations from previous tests cause the ResourceManager to fail 
> during startup recovery with the error:
> {code:java}
> Node label=default to be removed doesn't existed in cluster node labels collection
> {code}
> *Solution*
>  
> Set an isolated NodeLabel storage directory for each subcluster startup to 
> avoid reusing old editlog files.
>  
> In {{TestMockSubCluster.java}}, configure a unique directory per subcluster 
> using:
> * {{GenericTestUtils.getTestDir()}} to create test-specific directories
> * Directory naming pattern: {{node-labels-{subClusterId}-{timestamp}}}
> * Configuration key: {{YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR}}
>  
> *Test Results*
>  
> After the fix, all 38 tests in TestYarnFederationWithFairScheduler pass 
> successfully:
> * Tests run: 38, Failures: 0, Errors: 0, Skipped: 0
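
The per-subcluster isolation described under *Solution* can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the actual patch: the class `NodeLabelDirSketch` and its helper methods are hypothetical, a plain `Map` stands in for Hadoop's `Configuration`, and a caller-supplied root directory stands in for `GenericTestUtils.getTestDir()`. The property name is assumed to be the string value behind `YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR`.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class NodeLabelDirSketch {
  // Assumed string value of YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR.
  static final String FS_NODE_LABELS_STORE_ROOT_DIR =
      "yarn.node-labels.fs-store.root-dir";

  // Hypothetical helper: applies the naming pattern
  // node-labels-{subClusterId}-{timestamp} from the issue description.
  static String nodeLabelDirName(String subClusterId, long timestamp) {
    return "node-labels-" + subClusterId + "-" + timestamp;
  }

  // Hypothetical helper: returns a config map whose NodeLabel store points at
  // a directory unique to this subcluster. In the real test, testRoot would
  // come from GenericTestUtils.getTestDir() and the map would be a
  // Configuration object.
  static Map<String, String> isolatedConf(
      File testRoot, String subClusterId, long timestamp) {
    File dir = new File(testRoot, nodeLabelDirName(subClusterId, timestamp));
    Map<String, String> conf = new HashMap<>();
    conf.put(FS_NODE_LABELS_STORE_ROOT_DIR, dir.getAbsolutePath());
    return conf;
  }

  public static void main(String[] args) {
    File testRoot =
        new File(System.getProperty("java.io.tmpdir"), "yarn-federation-test");
    long now = System.currentTimeMillis();
    // Each subcluster gets its own store directory, so no editlog is shared.
    for (String id : new String[] {"SC-1", "SC-2"}) {
      System.out.println(id + " -> "
          + isolatedConf(testRoot, id, now).get(FS_NODE_LABELS_STORE_ROOT_DIR));
    }
  }
}
```

Because the subcluster ID is part of the directory name, two subclusters started at the same millisecond still get distinct paths, and the timestamp keeps a later run from picking up editlog files left by an earlier one.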



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
