[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814427#comment-16814427
 ] 

Prabhu Joseph commented on YARN-9462:
-------------------------------------

As per the stdout from the failed testcase, the {{NodesListManager#Node Removal 
Timer}} calls {{decrInactiveNMMetrics}} only after the testcase has done its 
validation. This happens when both the testcase thread (in latch await) and the 
Node Removal Timer sleep at the same time, so the order in which they wake up 
is not guaranteed. This can be fixed by making the latch await timeout longer 
than the timer check interval.

{code}
2019-04-08 08:58:16,771 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
2019-04-08 08:58:16,771 DEBUG [main] util.MBeans (MBeans.java:unregister(138)) - Unregistering Hadoop:service=ResourceManager,name=MetricsSystem,sub=Control
2019-04-08 08:58:16,772 INFO  [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - ResourceManager metrics system shutdown complete.
2019-04-08 08:58:16,772 DEBUG [main] service.AbstractService (AbstractService.java:enterState(443)) - Service: org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore entered state STOPPED
2019-04-08 08:58:16,772 DEBUG [main] service.AbstractService (AbstractService.java:enterState(443)) - Service: Dispatcher entered state STOPPED
2019-04-08 08:58:16,772 INFO  [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(160)) - AsyncDispatcher is draining to stop, ignoring any new events.
2019-04-08 08:58:16,772 INFO  [main] resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1354)) - Transitioned to standby state
2019-04-08 08:58:16,775 INFO  [Node Removal Timer] resourcemanager.NodesListManager (NodesListManager.java:run(151)) - Removed DECOMMISSIONED node host2 from inactive nodes list
{code}
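
The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the YARN test code or the attached patch: the class {{LatchTimeoutSketch}}, the method {{observeAfterWait}}, the counter and the interval values are all hypothetical stand-ins. A scheduled task plays the role of the Node Removal Timer, and a latch that is never counted down plays the role of the testcase's bounded wait. When the await timeout is comfortably longer than the timer check interval, the timer should get at least one run before the test thread reads the counter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Hadoop YARN.
public class LatchTimeoutSketch {

  // Assumed check interval of the stand-in removal timer.
  static final long TIMER_INTERVAL_MS = 50;

  /** Returns the counter value the "test" thread sees after its bounded wait. */
  static int observeAfterWait(long awaitTimeoutMs) throws InterruptedException {
    // One inactive node is tracked until the timer removes it.
    AtomicInteger inactiveNodes = new AtomicInteger(1);
    ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the removal timer: on each check, drop the node if present.
    timer.scheduleAtFixedRate(
        () -> inactiveNodes.compareAndSet(1, 0),
        TIMER_INTERVAL_MS, TIMER_INTERVAL_MS, TimeUnit.MILLISECONDS);
    try {
      // The latch is never counted down, so await() acts as a bounded wait.
      CountDownLatch latch = new CountDownLatch(1);
      latch.await(awaitTimeoutMs, TimeUnit.MILLISECONDS);
      return inactiveNodes.get();
    } finally {
      timer.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Await much longer than the timer interval before validating.
    System.out.println(observeAfterWait(20 * TIMER_INTERVAL_MS));
  }
}
```

If {{awaitTimeoutMs}} were set equal to {{TIMER_INTERVAL_MS}} instead, the value read by the waiting thread would depend on which thread wakes first, which is the kind of sporadic behaviour described above.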

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> -----------------------------------------------------------------------
>
>                 Key: YARN-9462
>                 URL: https://issues.apache.org/jira/browse/YARN-9462
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, test
>    Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Minor
>         Attachments: TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)  Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>       at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>       at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>       at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>       at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>       at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>       at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
