[
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814427#comment-16814427
]
Prabhu Joseph commented on YARN-9462:
-------------------------------------
As per the stdout from the failed testcase, the {{NodesListManager#Node Removal
Timer}} invokes {{decrInactiveNMMetrics}} only after the testcase has already
run its validation. This happens when the testcase thread (latch await) and the
Node Removal Timer sleep at the same time. This can be fixed by making the latch
await timeout longer than the timer check interval.
{code}
2019-04-08 08:58:16,771 INFO [main] impl.MetricsSystemImpl
(MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
2019-04-08 08:58:16,771 DEBUG [main] util.MBeans (MBeans.java:unregister(138))
- Unregistering Hadoop:service=ResourceManager,name=MetricsSystem,sub=Control
2019-04-08 08:58:16,772 INFO [main] impl.MetricsSystemImpl
(MetricsSystemImpl.java:shutdown(607)) - ResourceManager metrics system
shutdown complete.
2019-04-08 08:58:16,772 DEBUG [main] service.AbstractService
(AbstractService.java:enterState(443)) - Service:
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore entered
state STOPPED
2019-04-08 08:58:16,772 DEBUG [main] service.AbstractService
(AbstractService.java:enterState(443)) - Service: Dispatcher entered state
STOPPED
2019-04-08 08:58:16,772 INFO [main] event.AsyncDispatcher
(AsyncDispatcher.java:serviceStop(160)) - AsyncDispatcher is draining to stop,
ignoring any new events.
2019-04-08 08:58:16,772 INFO [main] resourcemanager.ResourceManager
(ResourceManager.java:transitionToStandby(1354)) - Transitioned to standby state
2019-04-08 08:58:16,775 INFO [Node Removal Timer]
resourcemanager.NodesListManager (NodesListManager.java:run(151)) - Removed
DECOMMISSIONED node host2 from inactive nodes list
{code}
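The timing relationship described above can be illustrated with a minimal, standalone sketch. It is not the code from NodesListManager or from the attached patch; the class name LatchTimeoutSketch, the method removalObserved, and the interval values are all hypothetical and chosen only for illustration. A periodic timer stands in for the Node Removal Timer, and the calling thread stands in for the testcase waiting on a latch.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual NodesListManager or test code.
public class LatchTimeoutSketch {

    /**
     * Starts a periodic timer that first fires after checkIntervalMs and
     * counts down a latch, then waits on that latch for awaitTimeoutMs.
     * Returns true if the timer fired before the wait timed out.
     */
    static boolean removalObserved(long checkIntervalMs, long awaitTimeoutMs) {
        CountDownLatch removed = new CountDownLatch(1);
        // Daemon timer standing in for the "Node Removal Timer" thread.
        Timer timer = new Timer("Node Removal Timer (sketch)", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Stand-in for removing the node and updating the metrics.
                removed.countDown();
            }
        }, checkIntervalMs, checkIntervalMs);
        try {
            // Stand-in for the latch await in the testcase thread.
            return removed.await(awaitTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        // Await shorter than the check interval: the wait may end before
        // the timer has run, so a later assertion sees stale state.
        System.out.println("short await observed removal: "
            + removalObserved(500, 50));
        // Await longer than the check interval: the timer gets a chance
        // to run before the wait gives up.
        System.out.println("long await observed removal: "
            + removalObserved(50, 2000));
    }
}
```

In this sketch the only thing that changes between the two calls is the ratio of the await timeout to the check interval, which is the same knob the proposed fix adjusts.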
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> -----------------------------------------------------------------------
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, test
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Minor
> Attachments:
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR]
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
> Time elapsed: 3.385 s <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but
> was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
> at
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
> at
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}