[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2019-10-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951242#comment-16951242
 ] 

Hadoop QA commented on YARN-6901:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 20m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
39s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 17s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 89 unchanged - 1 fixed = 90 total (was 90) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
30s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m  1s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
18s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
 is unsynchronized, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setParent(CSQueue)
 is synchronized  At AbstractCSQueue.java:synchronized  At 
AbstractCSQueue.java:[line 197] |
| Failed junit tests | 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142188#comment-16142188
 ] 

Wangda Tan commented on YARN-6901:
--

[~jlowe], Since this is a preventive fix, I just downgraded priority and moved 
it out.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(java.util.Map,
>  org.apache.hadoop.yarn.api.records.Resource, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=374, line=138 (Compiled 
> frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142169#comment-16142169
 ] 

Jason Lowe commented on YARN-6901:
--

Doing an audit of 2.8.2 Blockers and Criticals.  Do we still feel that we need 
this for 2.8.2 or should we move it out?

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(java.util.Map,
>  org.apache.hadoop.yarn.api.records.Resource, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=374, line=138 (Compiled 
> frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112850#comment-16112850
 ] 

Jason Lowe commented on YARN-6901:
--

Yep, that sounds good to me.


> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(java.util.Map,
>  org.apache.hadoop.yarn.api.records.Resource, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=374, line=138 (Compiled 
> frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111863#comment-16111863
 ] 

Wangda Tan commented on YARN-6901:
--

[~jlowe], make sense to me.

So just repeat my understanding:for this JIRA, we can do following two things:
1) Remove getParent synchronized lock and add volatile to parent ref.
2) Remove getOrderingPolicy synchronized lock and add volatile to 
orderingPolicy ref. 
In addition, keep synchronized lock for setOrderingPolicy. 

Please let me know if it is also your expectation.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111020#comment-16111020
 ] 

Jason Lowe commented on YARN-6901:
--

bq. If there's no existing issue we can find, I agree to delay the fix to 
YARN-6917.

If we're going to change the ordering policy locking then we should also change 
the getQueuePath locking here.  YARN-6917 may not get any traction in the 
short-term unless someone picks it up (unfortunately I probably won't get to it 
soon), and getQueuePath locking behavior is the cause of the deadlock on that 
2.8-ish cluster.  I was simply pointing it out to raise awareness since that 
approach solves the same locking issue and also reduces the load of every 
allocate.  Totally agree we can defer to YARN-6917 if that is going in soon, 
but I also agree with your earlier assessment that we should address this 
sooner than later in case someone tries to call these app methods without 
holding the queue lock.

bq. Actually orderingPolicy can be set only when queue do reinitialize (which 
protected by queue's synchronize lock). Do you think does it still have the 
issue in the case? 

Yes because the way this is called may change in the future,  just like the 
non-issue we are fixing here.  ;-)  If setting the ordering policy is not a 
common occurrence then adding a lock there shouldn't be an issue, and it will 
always do something sane.  The important part is to keep it lock free on the 
getter.


> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109972#comment-16109972
 ] 

Wangda Tan commented on YARN-6901:
--

bq. I agree it would be nice if getQueuePath were lockless, although long-term 
I think something like YARN-6917 would be preferable to a volatile approach. 
I'm OK with volatile in the short-term.
If there's no existing issue we can find, I agree to delay the fix to 
YARN-6917. 

bq. That does not seem to have anything to do with reaching up the hierarchy.
Inside AbstractContainerAllocator:
{code}
application
.getCSLeafQueue()
.getOrderingPolicy()
{code}
To me grabbing lock of leaf queue while holding lock of app is also not a good 
practice. 

bq. It also looks like it introduces a bug if two threads try to call 
setOrderingPolicy at the same time 
Actually orderingPolicy can be set only when queue do reinitialize (which 
protected by queue's synchronize lock). Do you think does it still have the 
issue in the case?  

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109637#comment-16109637
 ] 

Jason Lowe commented on YARN-6901:
--

bq. I checked the code of the problematic cluster, it is a little bit different 
from branch-2.8.

OK, that explains why I couldn't line up the stacktrace with branch-2.8 code or 
recent ancestors to it and also why we've never seen this in practice.

I agree it would be nice if getQueuePath were lockless, although long-term I 
think something like YARN-6917 would be preferable to a volatile approach.  I'm 
OK with volatile in the short-term.

Why does the patch change the synchronization around the ordering policy?  That 
does not seem to have anything to do with reaching up the hierarchy.  It also 
looks like it introduces a bug if two threads try to call setOrderingPolicy at 
the same time, e.g.:
# Thread 1 notices the old ordering policy, policy A, is not null and begins to 
copy the old contents into its new policy, policy B
# Thread 2 notices the old ordering policy, policy A, is not null and begins to 
copy the old contents into its new policy, policy C
# Thread 1 sets the policy to policy B
# Thread 2 sets the policy to policy C

Now we are left with a policy that contains the entities from policy A and C 
and have lost the original entities from B, whereas the old code would result 
in a policy containing the entities of policy A, B, and C regardless of which 
thread won the race for the lock.  I think we can get rid of the lock on the 
getter, but I think it is necessary on the setter or we need to do CAS-like 
logic and loop back around if someone has swapped out the policy while we were 
copying the old one.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109445#comment-16109445
 ] 

Wangda Tan commented on YARN-6901:
--

[~jlowe],

You're right, I checked the code of the problematic cluster, it is a little bit 
different from branch-2.8. When it allocates reserve container, leafqueue's 
synchronized lock isn't acquired properly. So I think the deadlock should not 
be existed in branch-2.8. 

However, I think the fix attached in the JIRA should be get into branch-2.8 in 
any case, since it fixes bad pattern which acquires lock of upper-level 
component while holding lock of lower-level component, this could likely cause 
deadlock in the future.

Downgrade priority to critical since this is more like a preventing fix.



> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-08-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108904#comment-16108904
 ] 

Jason Lowe commented on YARN-6901:
--

bq. This may not be possible, ideally moving an app from one queue to another 
need to lock queue, and assign container need to lock queue as well. It should 
be safe.

I'm not sure I understand.  Could you elaborate on how the deadlock is 
happening?  Looking at the branch-2.8 code, I don't see how LeafQueue is 
calling assignContainers on the app without holding a lock on the queue.  
Therefore I'm confused why the app's assignContainers is blocking on a queue 
lock it should already have.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108146#comment-16108146
 ] 

Wangda Tan commented on YARN-6901:
--

Thanks [~jlowe],

bq. Now I'm wondering how we did not have the lock when trying to get the leaf 
queue path...
This may not be possible, ideally moving an app from one queue to another need 
to lock queue, and assign container need to lock queue as well. It should be 
safe.

bq. On a related note, I found it odd that LeafQueue#assignContainers 
explicitly locks the app before callling assignContainers.
I just checked the code, assignContainers in app grabs synchronized lock, so I 
think we can remove the synchronize(app) from LeafQueue safely.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108126#comment-16108126
 ] 

Jason Lowe commented on YARN-6901:
--

As I understand the stacktraces, we have the preemption monitor thread who has 
locked a leaf queue and is in the process of trying to lock the application 
within the queue.  Then we have the scheduling thread which is calling 
LeafQueue's assignContainers and ultimately locks an application then tries to 
lock the leaf queue for that application.  Is that correct?  If so I'm 
wondering how we're deadlocking since the thread calling 
LeafQueue#assignContainers should already have the lock on the leaf queue.  
Maybe I missed a spot since I couldn't line up the line numbers in the stack 
trace with branch-2.8 source, but everywhere I saw LeafQueue's assignContainers 
calling the application's assignContainers method it was within the context of 
a {{synchronized(this)}} block.

Now I'm wondering how we did _not_ have the lock when trying to get the leaf 
queue path.  Are we somehow trying to examine an allocation for some other 
queue, or does the queue think this application is part of this queue but the 
app thinks it is part of some other queue?  I'm probably missing something 
basic, but I'm not sure why we don't already have the leaf queue locked when 
FiCaSchedulerApp#assignContainers is called.

On a related note, I found it odd that LeafQueue#assignContainers explicitly 
locks the app before callling assignContainers on it when it is trying to 
assign a reserved container but I didn't see a similar lock on the application 
before calling assignContainers for the non-reserved case.  Seems like we're 
either missing a lock we need or grabbing a lock unnecessarily.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107524#comment-16107524
 ] 

Wangda Tan commented on YARN-6901:
--

Thanks [~eepayne]/[~jlowe], 

I think the parent queue lock is to prevent move a sub queue to different 
hierarchy. However I think it is not allowed today:

CapacitySchedulerQueueManager.java:
{code}
if (!oldQueue.getQueuePath().equals(newQueue.getQueuePath())) {
  //Queue's cannot be moved from one hierarchy to other
  throw new IOException(queueName + " is moved from:"
  + oldQueue.getQueuePath() + " to:" + newQueue.getQueuePath()
  + " after refresh, which is not allowed.");
{code}

We should have this logic in branch-2.8 as well, since moving queue hierarchy 
causes many other issues. So add volatile to parent ref should be a simple fix 
we can do for now.

I haven't checked the preemption test failure yet, I can check it offline. The 
biggest issue I can see is, we should never do a reverse lock (e.g. lock app 
first, and before release app lock, lock leaf queue again). Currently 
preemption thread lock hierarchy in normal order (lock leaf queue then lock 
app), so we need to fix main scheduler logic accordingly.

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107522#comment-16107522
 ] 

Eric Payne commented on YARN-6901:
--

I suspect the test failure is a recurring one: YARN-5973

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getReservedContainers()
>  @bci=0, line=336 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.preemptFrom(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp,
>  org.apache.hadoop.yarn.api.records.Resource, java.util.Map, java.util.List, 
> org.apache.hadoop.yarn.api.records.Resource, java.util.Map, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=61, line=277 (Compiled 
> frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(java.util.Map,
>  org.apache.hadoop.yarn.api.records.Resource, 
> org.apache.hadoop.yarn.api.records.Resource) @bci=374, line=138 (Compiled 
> frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107433#comment-16107433
 ] 

Eric Payne commented on YARN-6901:
--

[~leftnoteasy], thanks for the heads-up on this issue.
bq. I have yet to verify it is susceptible to this deadlock
I have compared and analyzed the code bases and I believe that we are still 
susceptible to this.

>From the stack traces provided in the *Description* section, it looks like the 
>deadlock would be between the assigncontainers thread and the preemption 
>thread. We would definitely have noticed this if it were occurring, so, as 
>[~jlowe] pointed out, I'm pretty sure that we have not seen this in practice 
>(yet).

Since the deadlock involves the preemption thread, I do have a concern about 
the above unit test failure for {{TestCapacitySchedulerSurgicalPreemption}}. 
Can you comment?
{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.fail(Assert.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerPreemptionTestBase.waitNumberOfLiveContainersFromApp(CapacitySchedulerPreemptionTestBase.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSimpleSurgicalPreemption(TestCapacitySchedulerSurgicalPreemption.java:143)
{noformat}

> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-31 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107386#comment-16107386
 ] 

Jason Lowe commented on YARN-6901:
--

Thanks for the report and patch!  We have not seen this in practice on our 2.8 
builds yet, and I have yet to verify it is susceptible to this deadlock given 
we have pulled back a few features from 2.9 that may have indirectly solved 
this.

However after a quick look at the proposed fix I'm wondering if [this 
comment|https://issues.apache.org/jira/browse/YARN-3137?focusedCommentId=14306040=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306040]
 in YARN-3137 applies here.  I proposed to get rid of a lock in a similar 
manner but there was concern that the queue hierarchy could change during 
refreshQueues and end up having children that don't point back to their 
parents, etc.  As I commented in that other JIRA I'm not fully convinced it's a 
real issue, but I never got a complete answer to my questions there.  Seems 
like those same concerns people had could happen here with the proposed patch.


> A CapacityScheduler app->LeafQueue deadlock found in branch-2.8 
> 
>
> Key: YARN-6901
> URL: https://issues.apache.org/jira/browse/YARN-6901
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6901.branch-2.8.001.patch
>
>
> Stacktrace:
> {code}
> Thread 22068: (state = BLOCKED)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
>  @bci=0, line=185 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueuePath()
>  @bci=8, line=262 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator.getCSAssignmentFromAllocateResult(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocation,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=183, line=80 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=204, line=747 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=16, line=49 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode,
>  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer) 
> @bci=61, line=468 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(org.apache.hadoop.yarn.api.records.Resource,
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode,
>  org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceLimits, 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.SchedulingMode)
>  @bci=148, line=876 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode)
>  @bci=157, line=1149 (Compiled frame)
>  - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent)
>  @bci=266, line=1277 (Compiled frame)
> 
>  Thread 22124: (state = BLOCKED)
>  - 
> 

[jira] [Commented] (YARN-6901) A CapacityScheduler app->LeafQueue deadlock found in branch-2.8

2017-07-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105970#comment-16105970
 ] 

Hadoop QA commented on YARN-6901:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
48s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 18s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 88 unchanged - 1 fixed = 89 total (was 89) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
24s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 12s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent()
 is unsynchronized, 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setParent(CSQueue)
 is synchronized  At AbstractCSQueue.java:synchronized  At