[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338848#comment-14338848
 ] 

Craig Welch commented on YARN-3251:
---

Sorry if that wasn't clear, to reduce risk removed the minor changes in 
CSQueueUtils

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338782#comment-14338782
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700977/YARN-3251.2-6-0.2.patch
  against trunk revision dce8b9c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6759//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338860#comment-14338860
 ] 

Wangda Tan commented on YARN-3251:
--

if opposite opinions - if no opposite opinions

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338858#comment-14338858
 ] 

Wangda Tan commented on YARN-3251:
--

LGTM +1, I will commit the patch to branch-2.6 this afternoon if opposite 
opinions. Thanks!

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339140#comment-14339140
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701150/YARN-3251.2.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6760//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337305#comment-14337305
 ] 

Wangda Tan commented on YARN-3251:
--

Found few typos, removed patch and will upload again.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337348#comment-14337348
 ] 

Vinod Kumar Vavilapalli commented on YARN-3251:
---

bq. Since the target of your patch is to make a quick fix for old version, it's 
better to create a patch in branch-2.6. and the patch you created will be 
committed to branch-2.6 as well. I noticed some functionalities and interfaces 
being used in your patch are not part of 2.6. And patch I'm working on now will 
remove the CSQueueUtils.computeMaxAvailResource, so it's no need to add a 
intermediate fix in branch-2.
How about we have a separate JIRA solely focused on 2.6.1 - as we have two 
separate patches and two different contributors?

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337443#comment-14337443
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700874/YARN-3251.trunk.1.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6743//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6743//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6743//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337571#comment-14337571
 ] 

Wangda Tan commented on YARN-3251:
--

Removed patch for trunk and uploaded the same one to YARN-3265, reassigned this 
to [~cwelch].

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337465#comment-14337465
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700875/YARN-3251.trunk.1.patch
  against trunk revision caa42ad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestResourceUsage

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6744//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6744//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6744//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337567#comment-14337567
 ] 

Wangda Tan commented on YARN-3251:
--

+1 for the suggestion, moved it out of YARN-3243, changed target version and 
also updated title.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337529#comment-14337529
 ] 

Vinod Kumar Vavilapalli commented on YARN-3251:
---

Actually let's use this JIRA for 2.6.1 and YARN-3243 for the trunk fix.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.trunk.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-25 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337962#comment-14337962
 ] 

Craig Welch commented on YARN-3251:
---

bq. 1) Since the target of your patch is to make a quick fix for old version, 
it's better to create a patch in branch-2.6
done
bq. And patch I'm working on now will remove the 
CSQueueUtils.computeMaxAvailResource, so it's no need to add a intermediate fix 
in branch-2.
I suppose that depends on whether anyone needs a trunk version of the patch 
before the other changes are landed - if someone asks for it I could quickly 
update the original patch to provide it
bq. 2) I think CSQueueUtils.getAbsoluteMaxAvailCapacity doesn't hold 
child/parent's lock together, maybe we don't need to change that, could you 
confirm?
it doesn't, the change there was to insure consistency for multiple values used 
from the queue, as previously it was occurring inside a lock and that was 
guaranteed, now it isn't.  However, there's no need to lock on the parent, so I 
removed that 
bq. 3) Maybe we don't need getter/setter of absoluteMaxAvailCapacity in queue, 
a volatile float is enough?
Yes, that should be safe, done


 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334986#comment-14334986
 ] 

Jason Lowe commented on YARN-3251:
--

Sample stack trace:
{noformat}
Found one Java-level deadlock:
=
IPC Server handler 71 on 8032:
  waiting to lock monitor 0x037f9120 (object 0x00023b060ad8, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by ResourceManager Event Processor
ResourceManager Event Processor:
  waiting to lock monitor 0x02c4b7d0 (object 0x00023aecf620, a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 71 on 8032

Java stack information for the threads listed above:
===
IPC Server handler 71 on 8032:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getQueueInfo(LeafQueue.java:451)
- waiting to lock 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023aecf620 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023af36e70 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueInfo(ParentQueue.java:214)
- locked 0x00023b0d9478 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:910)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:832)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:259)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:413)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2079)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2075)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2073)
ResourceManager Event Processor:
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getParent(AbstractCSQueue.java:185)
- waiting to lock 0x00023aecf620 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:177)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.getAbsoluteMaxAvailCapacity(CSQueueUtils.java:183)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1033)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.checkLimitsToReserve(LeafQueue.java:1341)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1611)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1399)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1278)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:893)
- locked 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:758)
- locked 0x00023ceb53e0 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
- locked 0x00023b060ad8 (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 

[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335021#comment-14335021
 ] 

Sunil G commented on YARN-3251:
---

[~jlowe]  
Recent getAbsoluteMaxAvailCapacity changes cause this.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335002#comment-14335002
 ] 

Jason Lowe commented on YARN-3251:
--

It looks like this is fallout from YARN-2008.  
CSQueueUtils.getAbsoluteMaxAvailCapacity is called with the lock held on the 
LeafQueue and walks up the tree, attempting to grab locks on parents as it 
goes.  That's contrary to the conventional order of locking while walking down 
the tree, and thus we can deadlock.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335044#comment-14335044
 ] 

Jason Lowe commented on YARN-3251:
--

YARN-3243 could remove the need to climb up the hierarchy to compute max avail 
capacity.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335043#comment-14335043
 ] 

Sunil G commented on YARN-3251:
---

Its better to compute the available capacity during the call to 
root.assignContainers. In that scenario, a simpler get will retrieve the 
available capacity. 

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335124#comment-14335124
 ] 

Wangda Tan commented on YARN-3251:
--

Thanks for reporting this, [~jlowe]!

Since this is a blocker for 2.7, I will create a patch for this using method 
described in YARN-3243 first before working on other related refactorings, I 
added this as a sub task of YARN-3251.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335803#comment-14335803
 ] 

Wangda Tan commented on YARN-3251:
--

[~cwelch],
Some comments,
1) Since the target of your patch is to make a quick fix for old version, it's 
better to create a patch in branch-2.6. and the patch you created will be 
committed to branch-2.6 as well. I noticed some functionalities and interfaces 
being used in your patch are not part of 2.6. And patch I'm working on now will 
remove the CSQueueUtils.computeMaxAvailResource, so it's no need to add a 
intermediate fix in branch-2.
2) I think CSQueueUtils.getAbsoluteMaxAvailCapacity doesn't hold child/parent's 
lock together, maybe we don't need to change that, could you confirm?
3) Maybe we don't need getter/setter of absoluteMaxAvailCapacity in queue, a 
volatile float is enough?

Thanks,

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3251.1.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity

2015-02-24 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335765#comment-14335765
 ] 

Craig Welch commented on YARN-3251:
---

It looks like this can occur when a call which walks down the queue tree (in 
this case, getQueueInfo()) happens at the same time as an assignContainers call 
which does not start from the root queue, which is specifically one for a 
reservedContainer where scheduleAsynchronously is false.  Essentially, it isn't 
safe to hold a lock on a queue while locking on a parent queue (as I now see 
noted in other methods in LeafQueue :/).

[YARN-3243] is potentially a long term fix, but it would be nice to fix this 
right away as it clearly is already problematic.  Also, [YARN-3243] depends on 
a number of other sizable changes which have gone in recently, meaning it will 
be difficult to apply it as a fix to older codebases, for which it would be 
very nice to have a fix.

I've attached a patch somewhat along the lines suggest by [~sunilg], it simply 
moves the acquisition of the absoluteMaxAvailCapacity outside the lock on the 
leaf queue - it will lock parent queues individually as it ascends, but it 
never holds a parent and child lock simultaneously, which is the unacceptable 
state.  It follows the pattern for other methods in LeafQueue like 
recoverContainer which access parent queues - they all are careful to make sure 
the parent queue access occurs outside any lock on themselves.  

Unfortunately it's not possible to just do this in root.assignContainers 
because of the reservedContainer case which will not invoke assignContainers on 
the root queue at any point.  Instead, absoluteMaxAvailCapacity is determined 
outside any lock on the leaf queue in assignContainers before entering the 
synchronized method which continues the logic as it is today.  

This looks to me to be the way to fix the issue with the smallest code change 
today pending other changes coming down the line.

 CapacityScheduler deadlock when computing absolute max avail capacity
 -

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Wangda Tan
Priority: Blocker

 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)