[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065053#comment-15065053 ] Wangda Tan commented on YARN-4416:

Committed to branch-2.8.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.1
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch,
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch,
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I needed to know
> the name of the queue, but every time I tried to inspect the queue the
> debugger hung. The stack showed a deadlock, but on analysis it turned out to
> be caused only by *queue.toString()* being evaluated during debugging, as
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to the OrderingPolicy. Read operations
> don't need to be synchronized, so {{getNumApplications}} doesn't need to be
> synchronized.
> (The first two will be handled in this JIRA and the third in YARN-4443.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
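For illustration, a minimal Java sketch of points 1 and 2 of the description, assuming a heavily simplified queue class. {{QueueSketch}}, {{capacityLock}}, {{setAbsoluteUsedCapacity}} and {{incNumContainers}} are hypothetical names, not the actual AbstractCSQueue code. The getters read state that is guarded by its own read/write lock or held in a volatile field, so {{toString()}} never needs the queue monitor and returns even while another thread (here one parked as if suspended at a breakpoint) holds it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified stand-in for AbstractCSQueue (not the real class).
class QueueSketch {
    private final String queueName;

    // Capacity/usage state guarded by its own read/write lock, not by the queue monitor.
    private final ReentrantReadWriteLock capacityLock = new ReentrantReadWriteLock();
    private float absoluteUsedCapacity = 0f;

    // Only mutated under the queue monitor; volatile lets unsynchronized readers see the latest value.
    private volatile int numContainers = 0;

    QueueSketch(String queueName) {
        this.queueName = queueName;
    }

    // Not synchronized: only the capacity read lock is taken.
    float getAbsoluteUsedCapacity() {
        capacityLock.readLock().lock();
        try {
            return absoluteUsedCapacity;
        } finally {
            capacityLock.readLock().unlock();
        }
    }

    void setAbsoluteUsedCapacity(float value) {
        capacityLock.writeLock().lock();
        try {
            absoluteUsedCapacity = value;
        } finally {
            capacityLock.writeLock().unlock();
        }
    }

    // Not synchronized: a volatile read of a single int is enough.
    int getNumContainers() {
        return numContainers;
    }

    // Writers still hold the queue monitor, so the non-atomic ++ is safe.
    synchronized void incNumContainers() {
        numContainers++;
    }

    @Override
    public String toString() {
        return queueName + ": usedCapacity=" + getAbsoluteUsedCapacity()
            + " numContainers=" + getNumContainers();
    }

    public static void main(String[] args) throws InterruptedException {
        QueueSketch queue = new QueueSketch("root.default");
        queue.setAbsoluteUsedCapacity(0.25f);
        queue.incNumContainers();

        CountDownLatch monitorHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Stands in for a scheduler thread that is suspended while holding the queue monitor.
        Thread holder = new Thread(() -> {
            synchronized (queue) {
                monitorHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        monitorHeld.await();

        // Returns immediately; with synchronized getters this call would block until holder lets go.
        System.out.println(queue.toString());

        release.countDown();
        holder.join();
    }
}
```

Running {{main}} prints {{root.default: usedCapacity=0.25 numContainers=1}} while the monitor is still held by the other thread. The design point is that readers never compete for the coarse queue monitor; only writers do.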
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061875#comment-15061875 ] Naganarasimha G R commented on YARN-4416:

Hi [~wangda], I tested TestWorkPreservingRMRestart and it passes in my local setup. TestClientRMTokens and TestAMAuthorization are pre-existing failures for which JIRAs have already been raised.
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063172#comment-15063172 ] Naganarasimha G R commented on YARN-4416:

Thanks for the review and commit [~wangda] & [~sunilg]
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060936#comment-15060936 ] Hudson commented on YARN-4416:

FAILURE: Integrated in Hadoop-trunk-Commit #8978 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8978/])
YARN-4416. Deadlock due to synchronised get Methods in AbstractCSQueue. (wangda: rev 9b856d9787be5ec88ef34574b9b98755d7b669ea)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059312#comment-15059312 ] Wangda Tan commented on YARN-4416:

Thanks [~Naganarasimha] for the update. The latest patch looks good, +1. Could you run the failed unit tests locally? (Just to be safe.)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056772#comment-15056772 ] Hadoop QA commented on YARN-4416:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 4s | trunk passed |
| +1 | compile | 0m 30s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 31s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 14s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 29s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 29s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 29s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 38s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 25s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 29s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 68m 0s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 66m 8s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 153m 1s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056470#comment-15056470 ] Naganarasimha G R commented on YARN-4416:

Hi [~wangda], YARN-4416.v2.002.patch removed the synchronized lock on getNumApplications. But I suspect {{activateApplications}} could be called in between {{getNumPendingApplications}} and {{getNumActiveApplications}}, in which case the number of applications would be reported wrongly (*more than actual*), since an application read as pending by the first call could be read again as active by the second. Shall I revert this change?
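To make the concern concrete, a minimal Java sketch under a simplified model: {{AppCounts}}, {{submitApplication}} and {{getNumApplicationsUnsafe}} are hypothetical, and pending/active applications are plain sets rather than LeafQueue's ordering policies. Each getter is atomic on its own, but their sum is not. The {{betweenReads}} hook makes the bad interleaving deterministic instead of relying on real thread timing.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplification of the LeafQueue application counters (not the real code).
class AppCounts {
    private final Set<String> pendingApps = new HashSet<>();
    private final Set<String> activeApps = new HashSet<>();

    synchronized void submitApplication(String appId) {
        pendingApps.add(appId);
    }

    // Moves every pending application to active in one atomic step.
    synchronized void activateApplications() {
        activeApps.addAll(pendingApps);
        pendingApps.clear();
    }

    synchronized int getNumPendingApplications() {
        return pendingApps.size();
    }

    synchronized int getNumActiveApplications() {
        return activeApps.size();
    }

    // Unsynchronized composite read. betweenReads stands in for another thread
    // running activateApplications() between the two getter calls.
    int getNumApplicationsUnsafe(Runnable betweenReads) {
        int pending = getNumPendingApplications();
        betweenReads.run();
        int active = getNumActiveApplications();
        return pending + active;
    }

    // Holding the monitor across both reads makes the sum a consistent snapshot.
    synchronized int getNumApplications() {
        return getNumPendingApplications() + getNumActiveApplications();
    }

    public static void main(String[] args) {
        AppCounts counts = new AppCounts();
        counts.submitApplication("app_1");

        int unsafe = counts.getNumApplicationsUnsafe(counts::activateApplications);
        int safe = counts.getNumApplications();
        System.out.println("unsafe=" + unsafe + " safe=" + safe); // prints "unsafe=2 safe=1"
    }
}
```

With a single submitted application, the unsynchronized path reports 2 because {{app_1}} is seen once as pending and once as active. Keeping {{synchronized}} on the composite getter is the simplest way to avoid this double count.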
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056489#comment-15056489 ] Wangda Tan commented on YARN-4416:

[~Naganarasimha], I would suggest reverting the change and deferring all OrderingPolicy-related changes to other JIRAs.
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056454#comment-15056454 ] Wangda Tan commented on YARN-4416:

[~Naganarasimha], getNumApplications calls an ordering policy method. Is there any issue if we remove the synchronized lock?
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054434#comment-15054434 ] Hadoop QA commented on YARN-4416:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 33s | trunk passed |
| +1 | compile | 0m 25s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 9s | trunk passed |
| -1 | javadoc | 0m 21s | hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 36s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 18s | the patch passed |
| -1 | javadoc | 0m 22s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 63m 19s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 64m 47s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings. |
| | | 145m 41s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051679#comment-15051679 ] Hadoop QA commented on YARN-4416:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 1s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 30s | trunk passed |
| +1 | compile | 0m 26s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 29s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 9s | trunk passed |
| -1 | javadoc | 0m 20s | hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 12s | the patch passed |
| +1 | mvnsite | 0m 36s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 1m 20s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. |
| -1 | javadoc | 0m 21s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 27s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 63m 16s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 65m 53s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 146m 48s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getAbsoluteUsedCapacity() is unsynchronized,
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049493#comment-15049493 ] Wangda Tan commented on YARN-4416:

Thanks for sharing your thoughts, [~Naganarasimha]/[~sunilg]. Having looked at the code, I think we need to be very careful with locking changes to OrderingPolicy, since they are likely to cause a ConcurrentModificationException (CME) in the future. I would prefer to split the JIRA into two parts:
- Remove redundant locks, such as getAbsoluteCapacity.
- Improve the locks of OrderingPolicy. Even though it is closely related to LeafQueue, I think we should try our best to decouple it from LeafQueue for a better API design. Potentially we need to rethink the OrderingPolicy API.
I suggest converting both JIRAs to sub-JIRAs of YARN-3091.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.1
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch,
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I needed to know
> the name of the queue, but every time I tried to inspect the queue the
> debugger hung. The stack showed a deadlock, but on analysis it turned out to
> be caused only by *queue.toString()* being evaluated during debugging, as
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized
> and would be better handled through read and write locks.
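A minimal Java sketch of the CME risk and of one way an ordering policy could own its locking, assuming hypothetical names ({{LockedOrderingSketch}}, {{addEntity}}, {{snapshot}}); this is not the YARN OrderingPolicy API. Iterating a fail-fast {{TreeSet}} while it is modified throws {{ConcurrentModificationException}}; handing callers a copy taken under a read lock lets them iterate without holding any lock.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical ordering-policy-like container with its own read/write lock (not the YARN API).
class LockedOrderingSketch {
    private final TreeSet<String> entities = new TreeSet<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void addEntity(String id) {
        lock.writeLock().lock();
        try {
            entities.add(id);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int getNumEntities() {
        lock.readLock().lock();
        try {
            return entities.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Copies under the read lock so callers can iterate without holding any lock.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(entities);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Deliberate misuse: iterates the live set (no lock) and mutates it mid-iteration.
    // Returns true if the fail-fast iterator threw a CME. Single-threaded for determinism;
    // an unlocked reader on another thread can hit the same exception or see inconsistent state.
    boolean iterateLiveWhileAdding() {
        try {
            for (Iterator<String> it = entities.iterator(); it.hasNext(); ) {
                String id = it.next();
                entities.add(id + "-new");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LockedOrderingSketch live = new LockedOrderingSketch();
        live.addEntity("app_1");
        live.addEntity("app_2");
        System.out.println("CME on live iteration: " + live.iterateLiveWhileAdding());

        LockedOrderingSketch snap = new LockedOrderingSketch();
        snap.addEntity("app_1");
        snap.addEntity("app_2");
        for (String id : snap.snapshot()) {
            snap.addEntity(id + "-new"); // safe: we iterate a private copy
        }
        System.out.println("entities after snapshot loop: " + snap.getNumEntities());
    }
}
```

The trade-off of the snapshot approach is one copy per read; the alternative discussed in this thread is to keep the policy lock-free and require every caller to hold the LeafQueue lock.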
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1505#comment-1505 ] Naganarasimha G R commented on YARN-4416:

Thanks [~wangda], I will convert this JIRA and raise a new one under YARN-3091.
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043940#comment-15043940 ] Sunil G commented on YARN-4416: --
A typo: "Almost all api's exposed from LeafQueue is used with Lock from Queue" should read "Almost all APIs exposed from *AbstractComparatorOrderingPolicy* are used with the lock from the queue".
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043909#comment-15043909 ] Sunil G commented on YARN-4416: --
bq. I have added locks for accessing schedulableEntities in AbstractComparatorOrderingPolicy, but I am not completely sure of the modifications since there is already synchronization on entitiesToReorder. So I would like an additional (focused) review of this part in particular.
AbstractComparatorOrderingPolicy (or OrderingPolicy) is accessed under the lock from LeafQueue. This dependency exists today. I feel it is better to access it via the LeafQueue lock.
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043932#comment-15043932 ] Sunil G commented on YARN-4416: --
Sorry, I was not very clear in my earlier comments. Almost all api's exposed from LeafQueue is used with Lock from Queue. Hence, with this new lock, we get a lock hierarchy. Is this intentional? We are going to have a new lock in a major code path.
Also, in LeafQueue#assignContainers:
{code}
for (Iterator<FiCaSchedulerApp> assignmentIterator =
    orderingPolicy.getAssignmentIterator();
    assignmentIterator.hasNext();) {
  FiCaSchedulerApp application = assignmentIterator.next();
  // ...
}
{code}
we access the iterator from the ordering policy under the LeafQueue lock. With this patch, some methods in LeafQueue no longer take the LeafQueue lock and rely only on the new lock from OrderingPolicy. So we need to be careful here and ensure that we never delete an item without holding the LeafQueue lock. (Today deletions are done under the LeafQueue lock, so there are no issues as of now.)
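The lock hierarchy Sunil describes can be sketched as a minimal model. This is not the patch: `Queue` and `Policy` are hypothetical simplifications of LeafQueue and AbstractComparatorOrderingPolicy, with the queue monitor as the outer lock and a policy `ReentrantReadWriteLock` as the inner one. Mutations always take the outer lock first, and iteration over the policy runs entirely under the outer lock, which is what keeps the `assignContainers` loop safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model: queue monitor = outer lock, policy read/write lock = inner lock.
final class LockHierarchySketch {

  static final class Policy {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final TreeSet<String> entities = new TreeSet<>();

    void add(String app) {
      lock.writeLock().lock();
      try {
        entities.add(app);
      } finally {
        lock.writeLock().unlock();
      }
    }

    void remove(String app) {
      lock.writeLock().lock();
      try {
        entities.remove(app);
      } finally {
        lock.writeLock().unlock();
      }
    }

    int getNumEntities() {           // read path needs only the inner lock
      lock.readLock().lock();
      try {
        return entities.size();
      } finally {
        lock.readLock().unlock();
      }
    }

    // The inner lock is NOT held while the caller iterates; callers must hold the
    // outer (queue) lock so that nothing mutates the set during iteration.
    Iterator<String> getAssignmentIterator() {
      return entities.iterator();
    }
  }

  static final class Queue {
    private final Policy policy = new Policy();

    // Mutations: outer lock first, then the inner lock inside add()/remove().
    synchronized void addApp(String app) {
      policy.add(app);
    }

    synchronized void removeApp(String app) {
      policy.remove(app);
    }

    // The whole iteration runs under the outer lock, so removeApp() cannot interleave.
    synchronized List<String> assignContainers() {
      List<String> visited = new ArrayList<>();
      for (Iterator<String> it = policy.getAssignmentIterator(); it.hasNext();) {
        visited.add(it.next());
      }
      return visited;
    }

    // Read-only and outside the queue monitor, so UI/REST readers don't wait on it.
    int getNumApplications() {
      return policy.getNumEntities();
    }
  }
}
```

The weak point is the one raised in the comment: `getAssignmentIterator()` is only safe because every caller of `add`/`remove` also holds the queue monitor; a future caller that mutates the policy without it could break an in-progress iteration.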
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043941#comment-15043941 ] Naganarasimha G R commented on YARN-4416: -
[~sunilg],
bq. Hence with this new lock, we are getting a hierarchy. Is this intentional.?
Yes Sunil, I was skeptical about it too, but went ahead with [~wangda]'s [suggestion|https://issues.apache.org/jira/browse/YARN-4416?focusedCommentId=15038560=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15038560] because queueCapacity and resource-usage already hold similar read/write locks, and some methods were already updating them without the lock on LeafQueue. I was also of the opinion that the ordering policy should not depend on LeafQueue for multithreaded consistency, as it is an independent entity and can be used elsewhere.
bq. we access the iterator from ordering policy under LeafQueue lock, so I could see that, now we have some methods in LeafQueue which is removed with LeafQueue lock and directly used only new lock from OrderingPolicy.
Still, all the methods that modify the ordering policy do so while holding the lock on LeafQueue, and any place that modifies it in the future needs to take the LeafQueue lock first. Also, the TreeSet iterator fails fast when the underlying set is modified.
Anyway, we need to evaluate the performance impact. I plan to run SLS (Scheduler Load Simulator) with and without these changes to validate it.
Further, IMO we could have a read/write lock in LeafQueue, which would avoid all the synchronized locks on LeafQueue for the getters (reads). Thoughts?
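The fail-fast `TreeSet` iterator mentioned in the comment can be observed in a single thread. This sketch (class and method names are hypothetical) makes a structural change to the set behind the iterator's back and catches the resulting `ConcurrentModificationException`; `Iterator.remove()` is the supported way to delete during iteration. The `java.util` Javadoc describes fail-fast behaviour as best-effort, so with unsynchronized concurrent access it may not be detected at all, which is why it surfaces bugs rather than preventing them.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical demo class; app IDs are plain strings for brevity.
final class FailFastIteratorDemo {

  private static TreeSet<String> apps() {
    return new TreeSet<>(Arrays.asList("app_1", "app_2", "app_3"));
  }

  /** Returns true if removing an element behind the iterator's back raised a CME. */
  static boolean directRemoveDuringIteration() {
    TreeSet<String> apps = apps();
    try {
      for (Iterator<String> it = apps.iterator(); it.hasNext();) {
        if (it.next().equals("app_1")) {
          apps.remove("app_3");      // structural change not made through 'it'
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;                   // detected on the following it.next()
    }
  }

  /** Removing through the iterator itself is allowed; returns the remaining size. */
  static int iteratorRemoveDuringIteration() {
    TreeSet<String> apps = apps();
    for (Iterator<String> it = apps.iterator(); it.hasNext();) {
      if (it.next().equals("app_1")) {
        it.remove();
      }
    }
    return apps.size();
  }
}
```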
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044307#comment-15044307 ] Sunil G commented on YARN-4416: --
I also agree that we need to make the ordering policy independent. But a fail-fast iterator will also be a problem, as we have an open loophole that allows changing some contents of a SchedulableEntity. A discussion along the same lines took place with Jian while working on application priority, and we dropped the plan to have locks inside the ordering policy due to its tight coupling with LeafQueue. Looping in [~jianhe] on the thread.
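The "loophole" concerns entities whose sort key can change while they sit inside a sorted set. The sketch below uses a hypothetical `Entity` with a mutable `usage` key, ordered by a `TreeSet` comparator (not the real SchedulableEntity API). After an in-place key change the set searches the wrong branch and can no longer find the entity; removing it under the old key, updating, and re-inserting keeps the set consistent. As far as I understand, this remove/update/re-add step is what the ordering policy's reorder path (via entitiesToReorder) is meant to perform, but treat that mapping as an assumption.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical entity and comparator; not the real SchedulableEntity API.
final class MutableKeyDemo {

  static final class Entity {
    final String id;
    long usage;                      // sort key that can change while inside the set

    Entity(String id, long usage) {
      this.id = id;
      this.usage = usage;
    }
  }

  static final Comparator<Entity> BY_USAGE =
      Comparator.comparingLong((Entity e) -> e.usage).thenComparing(e -> e.id);

  private static TreeSet<Entity> setWith(Entity target) {
    TreeSet<Entity> set = new TreeSet<>(BY_USAGE);
    set.add(target);
    set.add(new Entity("app_2", 20));
    set.add(new Entity("app_3", 30));
    return set;
  }

  /** Changes the key in place: the tree is now searched along the wrong path. */
  static boolean containsAfterInPlaceUpdate() {
    Entity app1 = new Entity("app_1", 10);
    TreeSet<Entity> set = setWith(app1);
    app1.usage = 100;
    return set.contains(app1);
  }

  /** Removes under the old key, updates, then re-inserts under the new key. */
  static boolean containsAfterReinsert() {
    Entity app1 = new Entity("app_1", 10);
    TreeSet<Entity> set = setWith(app1);
    set.remove(app1);
    app1.usage = 100;
    set.add(app1);
    return set.contains(app1) && set.last() == app1;
  }
}
```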
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044316#comment-15044316 ] Naganarasimha G R commented on YARN-4416: -
[~sunilg], hmm, true, but is there any other way to avoid synchronized locks for the get APIs? I feel that is really not good: the web UI, CLI and REST all access the queue to get information, and if there is a problem elsewhere, the main scheduler thread can get stuck. We can also get unexpected deadlocks on read calls, like the one in the attached stack trace. Could read/write locks in the leaf queue be an option?
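The read/write-lock option raised here could look roughly like the following sketch. It is a hypothetical simplification, not LeafQueue: getters share the read lock of a `ReentrantReadWriteLock`, so concurrent UI/CLI/REST readers don't block each other, and the scheduler-side update takes the write lock so a multi-field read sees a consistent snapshot.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical simplification of a leaf queue guarded by a read/write lock instead of 'synchronized'.
final class ReadWriteQueueSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private int numApplications;
  private float usedCapacity;

  // Scheduler-side update: the write lock makes the two-field change atomic for readers.
  void submitApplication(float extraCapacity) {
    rwLock.writeLock().lock();
    try {
      numApplications++;
      usedCapacity += extraCapacity;
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  // Reader-side methods (web UI, CLI, REST) share the read lock.
  int getNumApplications() {
    rwLock.readLock().lock();
    try {
      return numApplications;
    } finally {
      rwLock.readLock().unlock();
    }
  }

  String describe() {
    rwLock.readLock().lock();
    try {
      return "apps=" + numApplications + ", used=" + usedCapacity;
    } finally {
      rwLock.readLock().unlock();
    }
  }

  /** While this thread holds the read lock, checks whether another thread can also get it. */
  boolean secondReaderAdmitted() {
    rwLock.readLock().lock();
    try {
      boolean[] admitted = new boolean[1];
      Thread other = new Thread(() -> {
        if (rwLock.readLock().tryLock()) {
          admitted[0] = true;
          rwLock.readLock().unlock();
        }
      });
      other.start();
      try {
        other.join();                // join() also makes admitted[0] visible here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      return admitted[0];
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
```

One caveat: readers still wait while a writer holds the write lock, so a read/write lock reduces contention but does not by itself rule out lock-ordering deadlocks; the acquisition order across queue and policy locks still has to be consistent.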
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043293#comment-15043293 ] Hadoop QA commented on YARN-4416: -
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 139, now 137). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 30s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 57s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 14s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 28s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getAbsoluteUsedCapacity() is unsynchronized,
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043496#comment-15043496 ] Hadoop QA commented on YARN-4416: -
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 9s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 59s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12775942/YARN-4416.v1.002.patch |
| JIRA
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042779#comment-15042779 ] Naganarasimha G R commented on YARN-4416: -
Also, there is an unused variable in LeafQueue:
{code}
// absolute capacity as a resource (based on cluster resource)
private Resource absoluteCapacityResource = Resources.none();
...
private void updateAbsoluteCapacityResource(Resource clusterResource) {
  absoluteCapacityResource =
      Resources.multiplyAndNormalizeUp(resourceCalculator,
          labelManager.getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
              clusterResource),
          queueCapacities.getAbsoluteCapacity(), minimumAllocation);
}
{code}
I was not sure why this is required. If it is not required, shall I delete it?
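For reference, the quantity `updateAbsoluteCapacityResource` computes can be modelled on memory alone. This is a simplified sketch assuming `DefaultResourceCalculator` semantics (scale by the absolute capacity, round to the nearest MB, then round up to a multiple of the minimum allocation); `NormalizeUpSketch.multiplyAndNormalizeUp` is a hypothetical stand-in, not the Hadoop `Resources` API.

```java
// Hypothetical, memory-only model of Resources.multiplyAndNormalizeUp under
// DefaultResourceCalculator semantics (an assumption, not the Hadoop API).
final class NormalizeUpSketch {

  static long multiplyAndNormalizeUp(long clusterMemoryMb, double absoluteCapacity,
      long minAllocationMb) {
    if (minAllocationMb <= 0) {
      throw new IllegalArgumentException("minAllocationMb must be positive");
    }
    // Scale by the queue's absolute capacity, rounding to the nearest MB.
    long scaled = (long) (clusterMemoryMb * absoluteCapacity + 0.5);
    // Round up to the next multiple of the minimum allocation (ceiling division).
    long steps = (scaled + minAllocationMb - 1) / minAllocationMb;
    return steps * minAllocationMb;
  }
}
```

For example, a 10240 MB cluster, 25% absolute capacity and a 1024 MB minimum allocation give 3072 MB rather than 2560 MB, because of the final round-up.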
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038560#comment-15038560 ] Wangda Tan commented on YARN-4416: --
Thanks for reporting this issue, [~Naganarasimha]. Having looked at the code, none of the methods used by org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#toString need to be synchronized:
- queueCapacity and resource-usage have their own read/write locks.
- numContainers is volatile.
- A read/write lock could be added to OrderingPolicy. Read operations don't need to be synchronized, so getNumApplications doesn't need to be synchronized.
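The first two points can be sketched under simplifying assumptions: `Capacities` is a hypothetical stand-in for QueueCapacities that guards its own fields with a `ReentrantReadWriteLock`, and `Queue` keeps `numContainers` `volatile`. `toString()` then reads everything without taking the queue monitor, so a thread holding that monitor cannot block it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the points above; names loosely mirror, but are not, the CS classes.
final class UnsynchronizedGettersSketch {

  /** Guards its own fields, roughly like QueueCapacities. */
  static final class Capacities {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private float absoluteUsedCapacity;

    float getAbsoluteUsedCapacity() {
      lock.readLock().lock();
      try {
        return absoluteUsedCapacity;
      } finally {
        lock.readLock().unlock();
      }
    }

    void setAbsoluteUsedCapacity(float value) {
      lock.writeLock().lock();
      try {
        absoluteUsedCapacity = value;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  static final class Queue {
    private final String name;
    private final Capacities capacities = new Capacities();
    private volatile int numContainers;   // a single volatile read needs no lock

    Queue(String name) {
      this.name = name;
    }

    // Getters: no 'synchronized'; they rely on the component lock or on volatile.
    float getAbsoluteUsedCapacity() {
      return capacities.getAbsoluteUsedCapacity();
    }

    int getNumContainers() {
      return numContainers;
    }

    // Writers still serialize on the queue monitor; numContainers++ is not atomic
    // on its own, so it is only safe because every writer holds this monitor.
    synchronized void allocateContainer(float newAbsoluteUsedCapacity) {
      numContainers++;
      capacities.setAbsoluteUsedCapacity(newAbsoluteUsedCapacity);
    }

    @Override
    public String toString() {        // never takes the queue monitor
      return name + ": absoluteUsedCapacity=" + getAbsoluteUsedCapacity()
          + ", numContainers=" + getNumContainers();
    }
  }
}
```

The trade-off is that `toString()` may combine values from slightly different moments (for example, a container count read just before a concurrent update to the capacity), which is usually acceptable for diagnostics and UI output.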