[ https://issues.apache.org/jira/browse/YARN-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407979#comment-16407979 ]
Eric Payne commented on YARN-5973:
----------------------------------
Thanks [~dibyendu_hadoop] for working on the patch for this. I think the patch
provides a better way to wait for the container actions, but in my testing the
race still occurs about 10% of the time, with the following failure:
{code:java}
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.047 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
testPreemptionForFragmentatedCluster(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption) Time elapsed: 17.027 sec <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testPreemptionForFragmentatedCluster(TestCapacitySchedulerSurgicalPreemption.java:352)
{code}
I want to understand better why the race is still occurring.
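For reference, the kind of wait under discussion can be sketched as a poll-with-deadline helper. This is only an illustration and not the code from the patch: {{PollingWait}}, {{waitFor}}, and the {{liveContainers}} counter are hypothetical names, and the background thread merely stands in for an asynchronous preemption.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop or of the attached patch.
class PollingWait {

    /**
     * Re-evaluates the condition every intervalMs milliseconds until it
     * returns true, or throws TimeoutException once timeoutMs has elapsed.
     */
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the number of live containers of an application.
        AtomicInteger liveContainers = new AtomicInteger(3);

        // Stand-in for an asynchronous preemption that kills one container.
        Thread preemptor = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            liveContainers.decrementAndGet();
        });
        preemptor.start();

        // Wait for the expected state instead of sleeping a fixed time.
        waitFor(() -> liveContainers.get() == 2, 10, 2000);
        preemptor.join();
        System.out.println("live containers: " + liveContainers.get());
    }
}
```

A wait like this only guarantees the polled condition. If a later assertion reads state that is updated separately from that condition, the test could still race; whether that is what happens here has not been established.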
> TestCapacitySchedulerSurgicalPreemption sometimes fails
> -------------------------------------------------------
>
> Key: YARN-5973
> URL: https://issues.apache.org/jira/browse/YARN-5973
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, scheduler preemption
> Affects Versions: 2.8.0
> Reporter: Eric Payne
> Assignee: Dibyendu Karmakar
> Priority: Minor
> Attachments: YARN-5973-branch-2.8.0.001.patch
>
>
> The tests in {{TestCapacitySchedulerSurgicalPreemption}} appear to be racy.
> They often pass, but the following errors sometimes occur:
> {noformat}
> testSimpleSurgicalPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption) Time elapsed: 14.671 sec <<< FAILURE!
> java.lang.AssertionError: null
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.fail(Assert.java:95)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerPreemptionTestBase.waitNumberOfLiveContainersFromApp(CapacitySchedulerPreemptionTestBase.java:110)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSimpleSurgicalPreemption(TestCapacitySchedulerSurgicalPreemption.java:143)
> {noformat}
> {noformat}
> testSurgicalPreemptionWithAvailableResource(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption) Time elapsed: 9.503 sec <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource(TestCapacitySchedulerSurgicalPreemption.java:220)
> {noformat}