[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908180#comment-13908180 ]

Hudson commented on YARN-1398:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #488 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/488/])
YARN-1398. Fixed a deadlock in ResourceManager between users requesting queue-acls and completing containers. Contributed by Vinod Kumar Vavilapalli. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570415)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java

Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
---

Key: YARN-1398
URL: https://issues.apache.org/jira/browse/YARN-1398
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
Fix For: 2.4.0
Attachments: YARN-1398-20140220.txt

getQueueInfo in ParentQueue calls child.getQueueInfo(), which tries to acquire the leaf queue lock while already holding the parent queue lock. If a completedContainer call arrives at the same time and acquires the LeafQueue lock, it then waits on ParentQueue's completedContainer call. The two paths take the locks in opposite order, which can lead to deadlock. JCarder reports this as a potential deadlock scenario.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
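The lock-order inversion described in the issue can be illustrated with a small toy model. This is only a sketch: `LockOrderInversionSketch`, `PARENT_LOCK`, `LEAF_LOCK`, `takeInOrder` and `wouldDeadlock` are hypothetical names, not Hadoop classes, and two `ReentrantLock` objects stand in for the ParentQueue and LeafQueue monitors. It uses the non-blocking `tryLock()` so the cycle is reported rather than actually hanging the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the lock-order inversion; not the actual Hadoop code. */
public class LockOrderInversionSketch {
    // Hypothetical stand-ins for the ParentQueue and LeafQueue monitors.
    static final ReentrantLock PARENT_LOCK = new ReentrantLock();
    static final ReentrantLock LEAF_LOCK = new ReentrantLock();

    /**
     * Takes 'first', waits until the other thread holds its own first lock,
     * then tries 'second' without blocking. Returns true if 'second' was
     * unavailable, i.e. a blocking acquire would have waited here.
     */
    static boolean takeInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst,
                               CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();          // both threads now hold one lock
            boolean acquired = second.tryLock();
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();              // keep 'first' until both have tried
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    /** Returns true when both call paths were stuck on their second lock. */
    static boolean wouldDeadlock() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] blocked = new boolean[2];

        // getQueueInfo path: parent lock first, then leaf lock.
        Thread queueInfo = new Thread(() ->
            blocked[0] = takeInOrder(PARENT_LOCK, LEAF_LOCK, bothHoldFirst, bothTried));
        // completedContainer path: leaf lock first, then parent lock.
        Thread completed = new Thread(() ->
            blocked[1] = takeInOrder(LEAF_LOCK, PARENT_LOCK, bothHoldFirst, bothTried));

        queueInfo.start();
        completed.start();
        try {
            queueInfo.join();
            completed.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked[0] && blocked[1];
    }

    public static void main(String[] args) {
        System.out.println("lock-order cycle observed: " + wouldDeadlock());
    }
}
```

With blocking `lock()` calls (or `synchronized`, as in the real queues) in place of `tryLock()`, the same interleaving would leave both threads waiting on each other indefinitely.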
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908319#comment-13908319 ]

Hudson commented on YARN-1398:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1680 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1680/])
YARN-1398. Fixed a deadlock in ResourceManager between users requesting queue-acls and completing containers. Contributed by Vinod Kumar Vavilapalli. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570415)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908397#comment-13908397 ]

Hudson commented on YARN-1398:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1705 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1705/])
YARN-1398. Fixed a deadlock in ResourceManager between users requesting queue-acls and completing containers. Contributed by Vinod Kumar Vavilapalli. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570415)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907709#comment-13907709 ]

Arun C Murthy commented on YARN-1398:
-------------------------------------

+1 lgtm. I think this was an oversight in YARN-569. Let's get this in asap.
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907713#comment-13907713 ]

Jian He commented on YARN-1398:
-------------------------------

Took a look also, lgtm, +1.
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907773#comment-13907773 ]

Hadoop QA commented on YARN-1398:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630180/YARN-1398-20140220.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3138//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3138//console

This message is automatically generated.
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907810#comment-13907810 ]

Hadoop QA commented on YARN-1398:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630180/YARN-1398-20140220.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3140//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3140//console

This message is automatically generated.
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907874#comment-13907874 ]

Vinod Kumar Vavilapalli commented on YARN-1398:
-----------------------------------------------

Tx for the quick reviews, [~jianhe] and [~acmurthy]. I am checking this in now.
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907911#comment-13907911 ]

Hudson commented on YARN-1398:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5201 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5201/])
YARN-1398. Fixed a deadlock in ResourceManager between users requesting queue-acls and completing containers. Contributed by Vinod Kumar Vavilapalli. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570415)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861194#comment-13861194 ]

Sunil G commented on YARN-1398:
-------------------------------

As per YARN-325, this issue was fixed before 2.1.0. But in 2.1.0, ParentQueue.completedContainer is again called while holding a lock on the LeafQueue. This can cause the same issue that is described in YARN-325.

Is there any reason why the ParentQueue.completedContainer call was added back while holding the lock on the leaf queue? The YARN-325 fix was to remove exactly that call, and this is mentioned in the comments too.

Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
---

Key: YARN-1398
URL: https://issues.apache.org/jira/browse/YARN-1398
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Priority: Critical
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861236#comment-13861236 ]

Sunil G commented on YARN-1398:
-------------------------------

During the YARN-569 change, which added a scheduling policy, the code segment below was added back inside the LeafQueue lock section:

    // Inform the parent queue
    getParent().completedContainer(clusterResource, application, node, rmContainer, null, event, this);

Please let me know whether this call is really required inside the synchronized block of the LeafQueue completedContainer call.
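One way to avoid calling the parent while the leaf lock is held, in line with the YARN-325 discussion referenced in this thread, is to do the leaf-local bookkeeping under the leaf monitor and notify the parent only after releasing it. The sketch below is an assumption about the general shape of such a change, not the committed patch, which is not shown in this thread. `ReleaseThenNotifySketch`, `Parent`, `Leaf`, `liveContainers` and the `callerHeldLeafLock` flag are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "update under the leaf lock, notify the parent outside it". */
public class ReleaseThenNotifySketch {

    /** Hypothetical stand-in for ParentQueue. */
    static class Parent {
        final List<String> events = new ArrayList<>();

        // Takes the parent monitor; records whether the caller still held
        // the leaf monitor when it made this call.
        synchronized void completedContainer(String containerId,
                                             boolean callerHeldLeafLock) {
            events.add(containerId + ":leafLockHeld=" + callerHeldLeafLock);
        }
    }

    /** Hypothetical stand-in for LeafQueue. */
    static class Leaf {
        private final Parent parent;
        private int liveContainers;

        Leaf(Parent parent, int liveContainers) {
            this.parent = parent;
            this.liveContainers = liveContainers;
        }

        void completedContainer(String containerId) {
            synchronized (this) {
                // Leaf-local bookkeeping only; no call into the parent here.
                liveContainers--;
            }
            // The leaf monitor is released before the parent monitor is
            // requested, so this path never holds leaf and waits for parent.
            parent.completedContainer(containerId, Thread.holdsLock(this));
        }

        synchronized int getLiveContainers() {
            return liveContainers;
        }
    }

    public static void main(String[] args) {
        Parent parent = new Parent();
        Leaf leaf = new Leaf(parent, 2);
        leaf.completedContainer("container_1");
        System.out.println(parent.events + " live=" + leaf.getLiveContainers());
    }
}
```

The trade-off is that the leaf and parent updates are no longer atomic with respect to each other, so a reader may briefly observe the leaf updated before the parent. Whether that is acceptable depends on invariants of the real scheduler that this sketch does not model.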
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826421#comment-13826421 ]

Rohith Sharma K S commented on YARN-1398:
-----------------------------------------

Hi Sunil,

I think this is the same as https://issues.apache.org/jira/i#browse/YARN-325.