[ 
https://issues.apache.org/jira/browse/YARN-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407979#comment-16407979
 ] 

Eric Payne commented on YARN-5973:
----------------------------------

Thanks [~dibyendu_hadoop] for working on the patch for this. I think the patch 
provides a better way to wait for the container actions, but the race still 
occurs in about 10% of my test runs, with the following failure:
{code:java}
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 39.047 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
testPreemptionForFragmentatedCluster(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)  Time elapsed: 17.027 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<2>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testPreemptionForFragmentatedCluster(TestCapacitySchedulerSurgicalPreemption.java:352)
{code}
I want to understand better why the race is still occurring.

> TestCapacitySchedulerSurgicalPreemption sometimes fails
> -------------------------------------------------------
>
>                 Key: YARN-5973
>                 URL: https://issues.apache.org/jira/browse/YARN-5973
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, scheduler preemption
>    Affects Versions: 2.8.0
>            Reporter: Eric Payne
>            Assignee: Dibyendu Karmakar
>            Priority: Minor
>         Attachments: YARN-5973-branch-2.8.0.001.patch
>
>
> The tests in {{TestCapacitySchedulerSurgicalPreemption}} appear to be racy. 
> They often pass, but the following errors sometimes occur:
> {noformat}
> testSimpleSurgicalPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)  Time elapsed: 14.671 sec  <<< FAILURE!
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.fail(Assert.java:95)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerPreemptionTestBase.waitNumberOfLiveContainersFromApp(CapacitySchedulerPreemptionTestBase.java:110)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSimpleSurgicalPreemption(TestCapacitySchedulerSurgicalPreemption.java:143)
> {noformat}
> {noformat}
> testSurgicalPreemptionWithAvailableResource(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)  Time elapsed: 9.503 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<2>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource(TestCapacitySchedulerSurgicalPreemption.java:220)
> {noformat}


