[ 
https://issues.apache.org/jira/browse/YARN-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643981#comment-17643981
 ] 

ASF GitHub Bot commented on YARN-11390:
---------------------------------------

cnauroth commented on code in PR #5190:
URL: https://github.com/apache/hadoop/pull/5190#discussion_r1041276776


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java:
##########
@@ -2959,6 +2960,20 @@ protected ResourceTrackerService createResourceTrackerService() {
     mockRM.stop();
   }
 
+  private void pollingAssert(Supplier<Boolean> supplier, String message)

Review Comment:
   In hadoop-common, there is a similar helper method: 
`org.apache.hadoop.test.GenericTestUtils#waitFor`. This also has some other 
nice features, like providing a thread dump for troubleshooting if it times 
out. Can you please look at reusing that method?
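
   For illustration, the polling pattern behind such a helper can be sketched as follows. This is a self-contained stand-in, not the hadoop-common source: the class name `PollingWait` is hypothetical, the thread-dump feature is omitted, and the parameter order (check, poll interval, overall timeout, both in milliseconds) is assumed to mirror `GenericTestUtils#waitFor`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical stand-in illustrating poll-until-true with a timeout. */
final class PollingWait {
  private PollingWait() {
  }

  /**
   * Re-evaluates {@code check} every {@code checkEveryMillis} until it
   * returns true, or throws once {@code waitForMillis} has elapsed.
   */
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    final long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (true) {
      // Succeed as soon as the condition holds.
      if (Boolean.TRUE.equals(check.get())) {
        return;
      }
      // Give up once the overall timeout has been used up.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
```

   With the hadoop-common test utilities on the classpath, the test could presumably call `GenericTestUtils.waitFor(() -> conditionHolds(), 100, 10000)` instead, where `conditionHolds()` and the two interval values are placeholders rather than values taken from the PR.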





> TestResourceTrackerService.testNodeRemovalNormally: Shutdown nodes should be 
> 0 now expected: <1> but was: <0>
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11390
>                 URL: https://issues.apache.org/jira/browse/YARN-11390
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Bence Kosztolnik
>            Assignee: Bence Kosztolnik
>            Priority: Major
>              Labels: pull-request-available
>
> Sometimes TestResourceTrackerService.{*}testNodeRemovalNormally{*} fails 
> with the following message:
> {noformat}
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but was:<0>
> at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:1723)
> at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:1685)
> at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalNormally(TestResourceTrackerService.java:1530)
> {noformat}
> This can happen when the hardcoded 1s sleep in the test is not long enough for 
> the shutdown to complete.
> To fix this, the test should poll the cluster status with a timeout and check 
> that the cluster reaches the expected state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
