[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-02-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142158#comment-15142158
 ] 

Jian He commented on YARN-4519:
---

committed in 2.8 too

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Fix For: 2.8.0
>
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142162#comment-15142162
 ] 

Hudson commented on YARN-4519:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9280 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9280/])
Move YARN-4519 in CHANGES.txt to 2.8 (jianhe: rev 
39a71b606a4abdc46ea6dfce2b79480b47ec18b5)
* hadoop-yarn-project/CHANGES.txt


> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Fix For: 2.8.0
>
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121857#comment-15121857
 ] 

MENG DING commented on YARN-4519:
-

Thanks [~leftnoteasy]. I will log a separate ticket for {{completedContaine}} 
once the fix of this issue is approved.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122493#comment-15122493
 ] 

Hudson commented on YARN-4519:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9202 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9202/])
YARN-4519. Potential deadlock of CapacityScheduler between decrease (jianhe: 
rev 7f46636495e23693d588b0915f464fa7afd9102e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedContainerChangeRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerResizing.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java


> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Fix For: 2.9.0
>
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121063#comment-15121063
 ] 

Wangda Tan commented on YARN-4519:
--

[~mding],

It seems to me schedulerHealth doesn't need synchronized lock as well. It will 
be eventually consistent, and all maps in schedulerHealth are concurrent map, 
so we should expect it is consistent in most of the time.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-26 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117468#comment-15117468
 ] 

MENG DING commented on YARN-4519:
-

Hi, [~leftnoteasy]

bq. IIUC, after this patch, increase/decrease container logic needs to acquire 
LeafQueue's lock. Since container allocation/release acquires Leafqueue's lock 
too, race condition of container/resource will be avoided.
Yes, exactly.

bq. One question not related to the patch, it looks safe to remove synchronized 
lock of CS#completedContainerInternal, correct?
I think we don't need to synchronize the entire function with cs lock, only the 
part that updates the {{schedulerHealth}}. If you think this is worth fixing, I 
will log a separate ticket.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116648#comment-15116648
 ] 

Wangda Tan commented on YARN-4519:
--

Thanks [~mding] for working on this patch,

IIUC, after this patch, increase/decrease container logic needs to acquire 
LeafQueue's lock. Since container allocation/release acquires Leafqueue's lock 
too, race condition of container/resource will be avoided.

One question not related to the patch, it looks safe to remove synchronized 
lock of CS#completedContainerInternal, correct? 

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116460#comment-15116460
 ] 

Jian He commented on YARN-4519:
---

looks good to me overall, thanks for the effort !

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch, YARN-4519.3.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116475#comment-15116475
 ] 

Hadoop QA commented on YARN-4519:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 0 new + 361 unchanged - 8 fixed = 361 total (was 369) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 14s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12784264/YARN-4519.3.patch 

[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-11 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092054#comment-15092054
 ] 

MENG DING commented on YARN-4519:
-

The JavaDoc and unit tests error are not related to this issue.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch, YARN-4519.2.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090357#comment-15090357
 ] 

Hadoop QA commented on YARN-4519:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 145, now 144). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 45s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-07 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088361#comment-15088361
 ] 

MENG DING commented on YARN-4519:
-

Please ignore the previous patch. I think there is way for improvement.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
> Attachments: YARN-4519.1.patch
>
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2016-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088054#comment-15088054
 ] 

Hadoop QA commented on YARN-4519:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 145, now 144). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 16s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12781009/YARN-4519.1.patch |
| JIRA Issue | YARN-4519 |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073991#comment-15073991
 ] 

MENG DING commented on YARN-4519:
-

Hi, [~sandflee]

The delta resource is the difference between currently allocated resource and 
the target resource (or in the case of rollback, the difference between 
currently allocated resource and the last confirmed resource). For decrease 
request, for example, we need to put a CS lock around computing delta resource 
and CS.decreaseContainer, otherwise the currently allocated resource might have 
been changed in the middle by the scheduling thread causing the delta resource 
to be outdated.

decreaseContainer updates core scheduler statistics, it must be locked.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074038#comment-15074038
 ] 

sandflee commented on YARN-4519:


got it, thanks [~mding]!


> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073622#comment-15073622
 ] 

sandflee commented on YARN-4519:


sorry, I don't understand 
1, why should we put compute delta resource under CS lock?
2, why should we put decreaseContainer under CS lock?  
to avoid what race conditions?  thanks.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>Assignee: MENG DING
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072873#comment-15072873
 ] 

MENG DING commented on YARN-4519:
-

I feel that the correct solution would be simply put all decrease requests into 
a pendingDecrease list in the allocate() call (after some initial sanity 
checks, of course). And in the allocateContainersToNode() call, process all the 
pendingDecrease requests first before allocating new/increase resource. This 
would make it easy for the resource rollback too.

Also, the following code may have issues?
{code:title=CapacityScheduler.allocate}
// Pre-process increase requests
List normalizedIncreaseRequests =
checkAndNormalizeContainerChangeRequests(increaseRequests, true);

// Pre-process decrease requests
List normalizedDecreaseRequests =
checkAndNormalizeContainerChangeRequests(decreaseRequests, false);
{code}
There could be race conditions when calculating the delta resource for the 
SchedContainerchangeRequest, since the above code is not synchronized with the 
scheduler?

Thoughts, [~leftnoteasy]?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073067#comment-15073067
 ] 

MENG DING commented on YARN-4519:
-

This approach would be simpler, at the expense of acquiring a CS lock in the 
allocate call (though no worse than existing logic). 

I also think that it is necessary to move the logic of creating 
normalizedDecreaseRequests (i.e. SchedContainerChangeRequest) into the 
decreaseContainer() call (under the scope of CS lock), otherwise there would be 
race condition when creating the delta resources. What do you think?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072993#comment-15072993
 ] 

Wangda Tan commented on YARN-4519:
--

Thanks [~jianhe] found this issue and analysis from [~sandflee]/[~mding].

I think the simplest solution could be, move 
{code}
 // Decrease containers
  decreaseContainers(normalizedDecreaseRequests, application);
{code}
Out of the synchronized lock of application:
{code}
synchronized (application) {
   //...
   }
   // put it here.
{code}

And also, in {{AbstractYarnScheduler#decreaseContainers}},
It's better to move 
{code}
  boolean hasIncreaseRequest =
  attempt.removeIncreaseRequest(request.getNodeId(),
  request.getPriority(), request.getContainerId());
{code}
Into {{decreaseContainer}}.

After above changes, decrease a container needs to acquire CS lock first. And 
YARN-4136 can directly use {{decreaseContainer}} to rolllback container.

Thoughts?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073076#comment-15073076
 ] 

Wangda Tan commented on YARN-4519:
--

bq. I also think that it is necessary to move the logic of creating 
normalizedDecreaseRequests (i.e. SchedContainerChangeRequest) into the 
decreaseContainer() call (under the scope of CS lock), otherwise there would be 
race condition when creating the delta resources. What do you think?
It seems to me that race could only happens when computing delta resource, 
correct? If so, can we set delta resource of SchedContainerChangeRequest when 
we enter decreaseContainer?
And also, updateIncreaseRequests could have same race condition, does it?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073319#comment-15073319
 ] 

Wangda Tan commented on YARN-4519:
--

[~mding],
I haven't started, please go ahead if you're interested.

We need to make sure following operations are under *same* CS synchronization 
lock:
1. Compute delta resource for increase request and insert to application
2. Compute delta resource for decrease request and call CS.decreaseContainer
3. Rollback action

Race could happen if we compute delta resource under one CS lock but insert 
request under another CS lock.

Agree?

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073316#comment-15073316
 ] 

MENG DING commented on YARN-4519:
-

Yes the race only happens when computing delta resource, and yes it also 
happens to increase request.
bq.  If so, can we set delta resource of SchedContainerChangeRequest when we 
enter decreaseContainer?
I guess whichever way we take, we need to make sure the delta resource is 
computed in the scope of a CS lock (i.e., delta resource for decrease request, 
increase request, and rollback action)

I can take a shot at working out a patch if nobody is working on that yet.

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073321#comment-15073321
 ] 

MENG DING commented on YARN-4519:
-

Agreed :-)

> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate() , first get FiCaSchedulerApp sync lock, and 
> may be get CapacityScheduler's sync lock in decreaseContainer()
> In scheduler thread,  first get CapacityScheduler's sync lock in 
> allocateContainersToNode(), and may get FiCaSchedulerApp sync lock in 
> FicaSchedulerApp.assignContainers(). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)