[
https://issues.apache.org/jira/browse/YARN-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643981#comment-17643981
]
ASF GitHub Bot commented on YARN-11390:
---------------------------------------
cnauroth commented on code in PR #5190:
URL: https://github.com/apache/hadoop/pull/5190#discussion_r1041276776
##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java:
##########
@@ -2959,6 +2960,20 @@ protected ResourceTrackerService createResourceTrackerService() {
mockRM.stop();
}
+ private void pollingAssert(Supplier<Boolean> supplier, String message)
Review Comment:
In hadoop-common, there is a similar helper method:
`org.apache.hadoop.test.GenericTestUtils#waitFor`. This also has some other
nice features, like providing a thread dump for troubleshooting if it times
out. Can you please look at reusing that method?
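
For illustration, here is a minimal, self-contained sketch of the polling pattern under discussion. It is not the code from the PR and not Hadoop's implementation: `PollingWaitSketch` and its local `waitFor` are hypothetical stand-ins whose parameters (condition, poll interval in ms, timeout in ms) are modeled on how `GenericTestUtils#waitFor` is commonly called, and the `shutdownNodes` counter is an assumed placeholder for the cluster metric checked by the test. The thread dump on timeout mentioned above is not reproduced here.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class PollingWaitSketch {

  /**
   * Hypothetical local helper: polls the condition until it returns true,
   * or throws TimeoutException once waitForMillis has elapsed.
   * In a real Hadoop test one would call GenericTestUtils.waitFor instead.
   */
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (true) {
      if (Boolean.TRUE.equals(check.get())) {
        return; // condition reached, stop polling
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed stand-in for the "shutdown nodes" metric in the test.
    AtomicInteger shutdownNodes = new AtomicInteger(1);

    // Simulate the node state changing asynchronously after a short delay.
    Thread background = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      shutdownNodes.set(0);
    });
    background.start();

    // Poll every 50 ms for up to 5 s instead of sleeping a fixed 1 s.
    waitFor(() -> shutdownNodes.get() == 0, 50, 5000);
    background.join();
    System.out.println("Shutdown nodes: " + shutdownNodes.get());
  }
}
```

Compared with a fixed sleep, the test returns as soon as the condition holds and fails with a timeout only when the state is never reached.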
> TestResourceTrackerService.testNodeRemovalNormally: Shutdown nodes should be
> 0 now expected: <1> but was: <0>
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-11390
> URL: https://issues.apache.org/jira/browse/YARN-11390
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Bence Kosztolnik
> Assignee: Bence Kosztolnik
> Priority: Major
> Labels: pull-request-available
>
> Sometimes TestResourceTrackerService.{*}testNodeRemovalNormally{*} fails
> with the following message:
> {noformat}
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but was:<0>
>   at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:1723)
>   at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:1685)
>   at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalNormally(TestResourceTrackerService.java:1530)
> {noformat}
> This can happen when the hardcoded 1s sleep in the test is not long enough for
> the nodes to shut down properly.
> To fix this issue, we should poll the cluster status with a timeout and check
> that the cluster reaches the expected state.