[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280729#comment-16280729 ]

Arun Suresh commented on YARN-6483:
---

Hmm... something seems to be off with Jenkins. [~rkanter], please go ahead and commit the addendum patch if you are OK with it (given that it is a trivial change).

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Juan Rodríguez Hortalá
> Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, YARN-6483.003.patch, YARN-6483.branch-3.0.addendum.patch
>
> The DECOMMISSIONING node state is currently used as part of the graceful decommissioning mechanism to give time for tasks to complete in a node that is scheduled for decommission, and for reducer tasks to read the shuffle blocks in that node. Also, YARN effectively blacklists nodes in DECOMMISSIONING state by assigning them a capacity of 0, to prevent additional containers from being launched on those nodes, so no more shuffle blocks are written to the node. This blacklisting is not effective for applications like Spark, because a Spark executor running in a YARN container will keep receiving more tasks after the corresponding node has been blacklisted at the YARN level. We would like to propose a modification of the YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are added to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat. This way a Spark application master would be able to blacklist a DECOMMISSIONING node at the Spark level.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
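The behaviour proposed in the YARN-6483 description can be pictured from the application master's side with a minimal sketch. Everything below is an assumption for illustration: `SketchNodeState`, `SketchNodeReport` and `DecommissionAwareTracker` are hypothetical stand-ins, not YARN or Spark classes, and the sketch assumes the AM receives one report per updated node with each heartbeat response and that a node may later return to RUNNING.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the node states relevant to this issue.
enum SketchNodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Hypothetical stand-in for one entry of the "updated nodes" list
// that the Resource Manager returns in the AM heartbeat response.
final class SketchNodeReport {
    final String host;
    final SketchNodeState state;

    SketchNodeReport(String host, SketchNodeState state) {
        this.host = host;
        this.state = state;
    }
}

// Hypothetical AM-side bookkeeping: hosts that should receive no new tasks.
final class DecommissionAwareTracker {
    private final Set<String> excludedHosts = new LinkedHashSet<>();

    // Called once per heartbeat with the updated-nodes list.
    void onUpdatedNodes(List<SketchNodeReport> updatedNodes) {
        for (SketchNodeReport report : updatedNodes) {
            switch (report.state) {
                case DECOMMISSIONING:
                case DECOMMISSIONED:
                    // Stop scheduling new tasks on this host at the application level.
                    excludedHosts.add(report.host);
                    break;
                case RUNNING:
                    // Assumption: the node was recommissioned, so allow it again.
                    excludedHosts.remove(report.host);
                    break;
            }
        }
    }

    boolean isSchedulable(String host) {
        return !excludedHosts.contains(host);
    }
}
```

Without the change discussed in this issue, the DECOMMISSIONING transition would never show up in the updated-nodes list, so a tracker like this one would only react once the node was fully decommissioned.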
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280722#comment-16280722 ]

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18819/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279525#comment-16279525 ]

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18804/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279514#comment-16279514 ]

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18803/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279497#comment-16279497 ]

Robert Kanter commented on YARN-6483:
-

Sounds good.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279480#comment-16279480 ]

Arun Suresh commented on YARN-6483:
---

bq. .. is to simply remove (or update to not rely on XML) just the problematic test in branch-3.0.

Was thinking the same, given that the feature itself is working and is tested by the other test cases in this patch. I vote we add an addendum patch against this JIRA for branch-3.0 where we Ignore the test, and then create a new JIRA to fix this. Thoughts?
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279046#comment-16279046 ]

Robert Kanter commented on YARN-6483:
-

The other option, if we don't want to completely revert this from branch-3.0, is to simply remove (or update to not rely on XML) just the problematic test in branch-3.0.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279006#comment-16279006 ]

Robert Kanter commented on YARN-6483:
-

YARN-7162 is the one that actually removes the XML parsing code. There are more details on YARN-7162, but in a nutshell, we didn't want to get locked into supporting this exact XML formatting for the excludes file, because it could change once YARN-5536 is completed. YARN-5536 aims to add a JSON format and to make the format pluggable. Not shipping the current XML format in 3.0 allows us to do that.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278020#comment-16278020 ]

Arun Suresh commented on YARN-6483:
---

Ah... How much trouble is it to get YARN-7162 into branch-3.0? If it is non-trivial, I will revert YARN-6483 from branch-3.0.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277895#comment-16277895 ]

Robert Kanter commented on YARN-6483:
-

[~asuresh], did you mean to commit this to branch-3.0? The fix version for this JIRA says 3.1.0.

Plus, the {{TestResourceTrackerService#testGracefulDecommissionDefaultTimeoutResolution}} test added here relies on an XML excludes file, which is currently only supported in trunk (YARN-7162). It therefore fails when run in branch-3.0, because branch-3.0 reads each line of XML as a separate host (e.g. {{host1}}, etc):

{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.706 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
testGracefulDecommissionDefaultTimeoutResolution(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)  Time elapsed: 23.913 sec  <<< FAILURE!
java.lang.AssertionError: Node state is not correct (timedout) expected: but was:
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:908)
	at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionDefaultTimeoutResolution(TestResourceTrackerService.java:345)
{noformat}
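The failure mode described in the comment above (an XML excludes file being read one line per host) can be reproduced with a small sketch. This is not the Hadoop hosts-file reader: `PlainTextHostsSketch` is a hypothetical, simplified line-oriented parser, and the XML layout used in the usage note is an assumed example rather than the exact format from YARN-7162.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified reader for a plain-text excludes file:
// every non-blank line is taken to be one host name.
final class PlainTextHostsSketch {
    static Set<String> readHosts(String fileContent) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String line : fileContent.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // No XML awareness: markup is kept as part of the "host name".
                hosts.add(trimmed);
            }
        }
        return hosts;
    }
}
```

Given an assumed three-line XML file such as `<hosts>`, `<host><name>host1</name></host>`, `</hosts>`, this reader yields three "hosts", none of which equals `host1`. The real node name never matches an entry, which is consistent with the test timing out while waiting for the expected node state.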
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267314#comment-16267314 ]

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on the issue:

    https://github.com/apache/hadoop/pull/289

    Pushed in https://github.com/apache/hadoop/commit/b46ca7e73b8bac3fdbff0b13afe009308078acf2
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267316#comment-16267316 ]

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh closed the pull request at:

    https://github.com/apache/hadoop/pull/289
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263778#comment-16263778 ]

Arun Suresh commented on YARN-6483:
---

Actually, closing this for now. The branch-2 version might need more work. Will re-open once we decide whether we really need it for 2.9.x or 2.10.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263759#comment-16263759 ]

Arun Suresh commented on YARN-6483:
---

Committed this to trunk and branch-3.0. Will cherry-pick this to branch-2 and branch-2.9 shortly.
[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263757#comment-16263757 ]

Hudson commented on YARN-6483:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13274 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13274/])
YARN-6483. Add nodes transitioning to DECOMMISSIONING state to the list (arun suresh: rev b46ca7e73b8bac3fdbff0b13afe009308078acf2)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManagerEventType.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestDecommissioningNodesWatcher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeUpdateType.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DefaultAMSProcessor.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/NodeReportPBImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNodeUpdateEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java