[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297358#comment-17297358 ]

Peter Bacsko commented on YARN-10672:
-------------------------------------

+1 overall. Committed changes to branch-3.2 too. Thanks [~snemeth] for the contribution.

> All testcases in TestReservations are flaky
> -------------------------------------------
>
>                 Key: YARN-10672
>                 URL: https://issues.apache.org/jira/browse/YARN-10672
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: Screenshot 2021-03-04 at 21.34.18.png, Screenshot
> 2021-03-04 at 22.06.20.png, Screenshot-mockitostubbing1-2021-03-04 at
> 22.34.01.png, Screenshot-mockitostubbing2-2021-03-04 at 22.34.12.png,
> YARN-10672-debuglogs.patch, YARN-10672.001.patch,
> YARN-10672.branch-3.2.001.patch, YARN-10672.branch-3.3.001.patch
>
>
> All testcases in TestReservations are flaky
> Running any single test in TestReservations 100 times never passes every time.
> For example, let's run testReservationNoContinueLook 100 times. For me, it
> produced 39 failed and 61 passed results.
> Sometimes just 1 out of 100 runs fails.
> Screenshot is attached.
> Stacktrace:
> {code:java}
> java.lang.AssertionError:
> Expected :2048
> Actual :0
>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
> {code}
> The test fails here:
> {code:java}
> // Start testing...
> // Only AM
> TestUtils.applyResourceCommitRequest(clusterResource,
>     a.assignContainers(clusterResource, node_0,
>         new ResourceLimits(clusterResource),
>         SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
> assertEquals(2 * GB, a.getUsedResources().getMemorySize());
> {code}
> With some debugging (patch attached), I realized that sometimes there are no
> registered nodes, so the AM can't be allocated and the test fails:
> {code:java}
> 2021-03-04 21:58:25,434 DEBUG [main] allocator.RegularContainerAllocator (RegularContainerAllocator.java:canAssign(312)) - **Can't assign container, no nodes... rmContext: 2a8dd942, scheduler: 2322e56f
> {code}
> In these cases, this is also printed from
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#getNumClusterNodes:
> {code:java}
> 2021-03-04 21:58:25,379 DEBUG [main] capacity.CapacityScheduler (CapacityScheduler.java:getNumClusterNodes(290)) - ***Called real getNumClusterNodes
> {code}
> h2. Let's break this down:
> 1. The mocking happens in
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations#setup(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration, boolean):
> {code:java}
> cs.setRMContext(spyRMContext);
> cs.init(csConf);
> cs.start();
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> Under no circumstances should this return any value other than 3.
> However, as mentioned above, sometimes the real method of
> 'getNumClusterNodes' is called on CapacityScheduler.
> 2. Sometimes, this gets printed to the console:
> {code:java}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies -
>    - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:166)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:114)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:566)
> at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
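
The WrongTypeOfReturnValue message quoted above points at a multi-threaded misuse of the when(...).thenReturn(...) form. The sketch below is a toy model of that interleaving, not Mockito's actual implementation: ToySpy, its record helper, and the stub method are hypothetical names invented for illustration, and the "scheduler thread" is simulated by an ordinary call placed between the two halves of the stubbing. It assumes only that the stubbing target is "whichever method was called last on the shared object", which is enough to reproduce both symptoms from the report.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedStubbingSketch {

  /** Hypothetical toy spy: remembers only the most recent call as the stubbing target. */
  static final class ToySpy {
    private final Map<String, Object> stubs = new HashMap<>();
    private String lastInvoked;       // shared by every caller of the spy
    private Class<?> lastReturnType;

    int getNumClusterNodes() {
      record("getNumClusterNodes", Integer.class);
      Object stub = stubs.get("getNumClusterNodes");
      return stub != null ? (Integer) stub : 0; // unstubbed "real" method: no nodes
    }

    boolean isMultiNodePlacementEnabled() {
      record("isMultiNodePlacementEnabled", Boolean.class);
      Object stub = stubs.get("isMultiNodePlacementEnabled");
      return stub != null ? (Boolean) stub : false;
    }

    private void record(String name, Class<?> returnType) {
      lastInvoked = name;
      lastReturnType = returnType;
    }

    /** Mimics thenReturn(): applies the value to whichever call was recorded last. */
    void thenReturn(Object value) {
      if (!lastReturnType.isInstance(value)) {
        throw new IllegalStateException(
            value.getClass().getSimpleName() + " cannot be returned by " + lastInvoked + "()");
      }
      stubs.put(lastInvoked, value);
    }
  }

  /** Stubs getNumClusterNodes(); optionally lets a simulated scheduler call slip in between. */
  static String stub(ToySpy spy, boolean schedulerInterleaves) {
    spy.getNumClusterNodes();              // the call made inside when(...)
    if (schedulerInterleaves) {
      spy.isMultiNodePlacementEnabled();   // stands in for a call from the started scheduler
    }
    try {
      spy.thenReturn(3);
      return "stubbed: " + spy.getNumClusterNodes();
    } catch (IllegalStateException e) {
      return "error: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(stub(new ToySpy(), false)); // stubbed: 3
    System.out.println(stub(new ToySpy(), true));  // error: Integer cannot be returned by isMultiNodePlacementEnabled()
  }
}
```

In this model the stubbing is only safe while nothing else touches the spy between the two halves of the when(...).thenReturn(...) expression, which is consistent with the failure appearing only in some runs once cs.start() has been called.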
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297345#comment-17297345 ]

Peter Bacsko commented on YARN-10672:
-------------------------------------

Ok, test failures seem to be totally unrelated. The change only concerns "TestReservations" and modifies the order of stubbing.
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297342#comment-17297342 ]

Hadoop QA commented on YARN-10672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 6s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 41s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 34s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green}{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 44s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/751/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}154m 53s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
| |
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297000#comment-17297000 ]

Hadoop QA commented on YARN-10672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 30m 50s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-3.3 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 36m 26s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 30s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 4s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 27s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 50s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/735/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}207m 15s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation |
| | hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors |
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296362#comment-17296362 ]

Peter Bacsko commented on YARN-10672:
-------------------------------------

+1 LGTM. Thanks [~snemeth], committed to trunk. You might want to consider backporting this to branch-3.3 and branch-3.2.
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
    [ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296017#comment-17296017 ]

Szilard Nemeth commented on YARN-10672:
---------------------------------------

As per our offline discussion with [~pbacsko], I'm creating a follow-up to consolidate this and YARN-10447.
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
[ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295881#comment-17295881 ]

Peter Bacsko commented on YARN-10672:
-------------------------------------

The solution is much more straightforward than mine in YARN-10447. We might actually consider applying it to TestLeafQueue as well and undoing my changes there, because my fix is more complicated. (I had no patience to dig deeper into Mockito's internal behavior; I just thought, well, disable that thread and that's enough.)

> All testcases in TestReservations are flaky
> -------------------------------------------
>
>                 Key: YARN-10672
>                 URL: https://issues.apache.org/jira/browse/YARN-10672
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: Screenshot 2021-03-04 at 21.34.18.png,
>                      Screenshot 2021-03-04 at 22.06.20.png,
>                      Screenshot-mockitostubbing1-2021-03-04 at 22.34.01.png,
>                      Screenshot-mockitostubbing2-2021-03-04 at 22.34.12.png,
>                      YARN-10672-debuglogs.patch, YARN-10672.001.patch
>
>
> All testcases in TestReservations are flaky.
> Running any particular test in TestReservations 100 times never yields 100 passes.
> For example, let's run testReservationNoContinueLook 100 times. For me, it
> produced 39 failed and 61 passed results.
> Sometimes just 1 out of 100 runs fails.
> Screenshot is attached.
> Stacktrace:
> {code:java}
> java.lang.AssertionError:
> Expected :2048
> Actual :0
>
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:647)
>     at org.junit.Assert.assertEquals(Assert.java:633)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
> {code}
> The test fails here:
> {code:java}
> // Start testing...
> // Only AM
> TestUtils.applyResourceCommitRequest(clusterResource,
>     a.assignContainers(clusterResource, node_0,
>         new ResourceLimits(clusterResource),
>         SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
> assertEquals(2 * GB, a.getUsedResources().getMemorySize());
> {code}
> With some debugging (patch attached), I realized that sometimes there are no
> registered nodes, so the AM can't be allocated and the test fails:
> {code:java}
> 2021-03-04 21:58:25,434 DEBUG [main] allocator.RegularContainerAllocator (RegularContainerAllocator.java:canAssign(312)) - **Can't assign container, no nodes... rmContext: 2a8dd942, scheduler: 2322e56f
> {code}
> In these cases, this is also printed from
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#getNumClusterNodes:
> {code:java}
> 2021-03-04 21:58:25,379 DEBUG [main] capacity.CapacityScheduler (CapacityScheduler.java:getNumClusterNodes(290)) - ***Called real getNumClusterNodes
> {code}
> h2. Let's break this down:
> 1. The mocking happens in
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations#setup(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration, boolean):
> {code:java}
> cs.setRMContext(spyRMContext);
> cs.init(csConf);
> cs.start();
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> Under no circumstances should this be able to return any value other than 3.
> However, as mentioned above, sometimes the real 'getNumClusterNodes' method
> is called on CapacityScheduler.
> 2. Sometimes, this gets printed to the console:
> {code:java}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub spies -
>    - with doReturn|Throw() family of methods. More in javadocs for Mockito.spy() method.
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:166)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:114)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:566)
>     at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>
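The two symptoms in the issue description (the real getNumClusterNodes being called, and an Integer being offered to isMultiNodePlacementEnabled()) can be modelled without Mockito. The following is a plain-Java sketch, not Mockito code: SpyStubbingSketch, SpyScheduler, thenReturnForLastInvocation and doReturn are hypothetical names, and the single lastInvoked slot is only an assumption standing in for whatever bookkeeping Mockito really does. It shows that a when(spy.foo()) style stub has to invoke foo() first, that an interleaved call (played here sequentially, standing in for a background thread) can redirect the stub to the wrong method, and that a doReturn style stub avoids calling the real method at all.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java sketch (NOT Mockito) of why when(spy.foo()) is fragile.
 * All class and method names here are hypothetical.
 */
public class SpyStubbingSketch {

    /** Stand-in for the real scheduler; counts how often real code runs. */
    static class Scheduler {
        int realCalls = 0;
        int getNumClusterNodes() { realCalls++; return 0; }
        boolean isMultiNodePlacementEnabled() { realCalls++; return false; }
    }

    /** Hand-rolled spy: runs the real method unless a stub is registered. */
    static class SpyScheduler extends Scheduler {
        final Map<String, Object> stubs = new HashMap<>();
        String lastInvoked; // one shared slot, whoever the caller is (assumption)

        @Override int getNumClusterNodes() {
            lastInvoked = "getNumClusterNodes";
            Object stub = stubs.get(lastInvoked);
            return stub != null ? (Integer) stub : super.getNumClusterNodes();
        }

        @Override boolean isMultiNodePlacementEnabled() {
            lastInvoked = "isMultiNodePlacementEnabled";
            Object stub = stubs.get(lastInvoked);
            return stub != null ? (Boolean) stub : super.isMultiNodePlacementEnabled();
        }

        /** when(...).thenReturn(v) style: stubs whatever was invoked last. */
        void thenReturnForLastInvocation(Object value) {
            boolean wantsBoolean = "isMultiNodePlacementEnabled".equals(lastInvoked);
            if (wantsBoolean != (value instanceof Boolean)) {
                throw new IllegalStateException(value.getClass().getSimpleName()
                    + " cannot be returned by " + lastInvoked + "()");
            }
            stubs.put(lastInvoked, value);
        }

        /** doReturn(v).when(spy).foo() style: names the method, never calls it. */
        void doReturn(String method, Object value) { stubs.put(method, value); }
    }

    /** Stubbing "when-style" runs the real method once before the stub exists. */
    public static int realCallsWithWhenStyle() {
        SpyScheduler spy = new SpyScheduler();
        spy.getNumClusterNodes();            // the argument of when(...) is evaluated first
        spy.thenReturnForLastInvocation(3);
        return spy.realCalls;
    }

    /** Stubbing "doReturn-style" never touches the real method. */
    public static int realCallsWithDoReturnStyle() {
        SpyScheduler spy = new SpyScheduler();
        spy.doReturn("getNumClusterNodes", 3);
        spy.getNumClusterNodes();            // served from the stub
        return spy.realCalls;
    }

    /** An interleaved call makes the stub land on the wrong method. */
    public static String interleavedStubbingError() {
        SpyScheduler spy = new SpyScheduler();
        spy.getNumClusterNodes();            // test thread starts stubbing
        spy.isMultiNodePlacementEnabled();   // stands in for a background-thread call
        try {
            spy.thenReturnForLastInvocation(3);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(realCallsWithWhenStyle());
        System.out.println(realCallsWithDoReturnStyle());
        System.out.println(interleavedStubbingError());
    }
}
```

In this model the when-style path is only safe while nothing else touches the spy between the call and the stub, which is hard to guarantee once cs.start() has launched scheduler threads. Whether the attached patch takes the doReturn route is not stated in this thread, so the sketch should be read as an illustration of the Mockito message rather than a description of the fix.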
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
[ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295875#comment-17295875 ]

Peter Bacsko commented on YARN-10672:
-------------------------------------

It's basically the same as YARN-10447. Must have been a good debugging session...
[jira] [Commented] (YARN-10672) All testcases in TestReservations are flaky
[ https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295668#comment-17295668 ]

Hadoop QA commented on YARN-10672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 41s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 6s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 51s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/artifact/out/whitespace-eol.txt{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 59s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK