[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612962#comment-15612962 ]

Hadoop QA commented on YARN-4673:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 4s | YARN-4673 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-4673 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12789860/YARN-4673.01.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13566/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
>
> Key: YARN-4673
> URL: https://issues.apache.org/jira/browse/YARN-4673
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: sandflee
> Assignee: sandflee
> Labels: oct16-medium
> Attachments: YARN-4673.01.patch
>
> we could add a lock like ApplicationMasterService#allocate

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166923#comment-15166923 ]

sandflee commented on YARN-4673:

Hi [~ozawa], in ResourceTrackerService we may concurrently process two nodeHeartBeat() calls with the same nodeId and responseId. Both may pass the lastResponseId check, which would cause an RM message to be lost. With the node lock, we would process them one by one, and the second call would be caught by this check:

{code}
if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse
    .getResponseId()) {
  LOG.info("Received duplicate heartbeat from node "
      + rmNode.getNodeAddress() + " responseId="
      + remoteNodeStatus.getResponseId());
  return lastNodeHeartbeatResponse;
}
{code}

Actually, I have not encountered a bug caused by this, but it may be a risk.
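To illustrate the race described in this comment, here is a minimal Java sketch of serializing heartbeats per node. It is not the YARN-4673 patch and not the real ResourceTrackerService: the names HeartbeatDedupSketch, NodeState, and processedCount are hypothetical, and the per-node monitor is only an assumed stand-in for whatever lock the patch actually introduces.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch only; not the actual ResourceTrackerService code. */
final class HeartbeatDedupSketch {

    /** Per-node state; its monitor plays the role of the node lock. */
    static final class NodeState {
        int lastResponseId = 0;  // id of the last response sent to the node
        int processedCount = 0;  // heartbeats that were actually processed
    }

    private final ConcurrentMap<String, NodeState> nodes =
        new ConcurrentHashMap<>();

    /**
     * Handles one heartbeat and returns the response id sent back.
     * A heartbeat whose responseId + 1 equals the last response id is
     * treated as a duplicate and answered with the previous response,
     * mirroring the check quoted in the comment.
     */
    int nodeHeartbeat(String nodeId, int remoteResponseId) {
        NodeState state = nodes.computeIfAbsent(nodeId, id -> new NodeState());
        // Check and update happen under one lock, so two heartbeats with
        // the same responseId cannot both pass the duplicate check.
        synchronized (state) {
            if (remoteResponseId + 1 == state.lastResponseId) {
                return state.lastResponseId; // duplicate: resend last response
            }
            if (remoteResponseId != state.lastResponseId) {
                throw new IllegalStateException(
                    "out-of-sync responseId " + remoteResponseId);
            }
            state.processedCount++;
            state.lastResponseId = remoteResponseId + 1;
            return state.lastResponseId;
        }
    }

    /** Number of heartbeats processed (not deduplicated) for a node. */
    int processedCount(String nodeId) {
        NodeState state = nodes.get(nodeId);
        if (state == null) {
            return 0;
        }
        synchronized (state) {
            return state.processedCount;
        }
    }
}
```

With the synchronized block, two threads calling nodeHeartbeat("n1", 0) at the same time are handled one after the other: the first is processed and the second matches the duplicate check. Without it, both could read lastResponseId == 0, both would skip the duplicate check, and both would be processed.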
[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166867#comment-15166867 ]

Tsuyoshi Ozawa commented on YARN-4673:

Hi [~sandflee], thank you for the contribution. Could you explain the cause of the deadlock? That would help us review your patch more quickly and more accurately.
[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166842#comment-15166842 ]

Hadoop QA commented on YARN-4673:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 12s | trunk passed |
| +1 | compile | 0m 29s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 0m 30s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 28s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 27s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 27s | the patch passed |
| +1 | compile | 0m 28s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 28s | the patch passed |
| -1 | checkstyle | 0m 16s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 16 unchanged - 3 fixed = 17 total (was 19) |
| +1 | mvnsite | 0m 36s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| -1 | findbugs | 1m 25s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 22s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 69m 29s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 68m 34s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 17s | Patch does not generate ASF License warnings. |
| | | 155m 36s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| | Unread field:ResourceTrackerService.java:[line 623] |
| JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourc