[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-06 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280729#comment-16280729
 ] 

Arun Suresh commented on YARN-6483:
---

Hmm... something seems to be off with Jenkins.
[~rkanter], please go ahead and commit the addendum patch if you are ok with it 
(given it is a trivial change)


> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch, YARN-6483.branch-3.0.addendum.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-06 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280722#comment-16280722
 ] 

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18819/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch, YARN-6483.branch-3.0.addendum.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279525#comment-16279525
 ] 

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18804/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch, YARN-6483.branch-3.0.addendum.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279514#comment-16279514
 ] 

genericqa commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| GITHUB PR | https://github.com/apache/hadoop/pull/289 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18803/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-branch-3.0.addendum.patch, YARN-6483-v1.patch, 
> YARN-6483.002.patch, YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279497#comment-16279497
 ] 

Robert Kanter commented on YARN-6483:
-

Sounds good.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279480#comment-16279480
 ] 

Arun Suresh commented on YARN-6483:
---

bq. .. is to simply remove (or update to not rely on XML) just the problematic 
test in branch-3.0.
Was thinking the same - given that the feature itself is working and is tested 
by the other testcases in this patch. I vote we add an addendum patch against 
this JIRA for branch-3.0 where we Ignore the test - and then create a new JIRA 
to fix this. Thoughts ?

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279046#comment-16279046
 ] 

Robert Kanter commented on YARN-6483:
-

The other option if we don't want to completely revert this from branch-3.0, is 
to simply remove (or update to not rely on XML) just the problematic test in 
branch-3.0.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-05 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279006#comment-16279006
 ] 

Robert Kanter commented on YARN-6483:
-

YARN-7162 is the one that actually removes the XML parsing code.  There's more 
details on YARN-7162, but in a nutshell, we didn't want to get locked into 
supporting this exact XML formatting for the excludes file, because it could 
change once YARN-5536 is completed, which aims to add a JSON format, and make 
the format pluggable.  Not shipping the current XML format in 3.0 allows us to 
do that.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278020#comment-16278020
 ] 

Arun Suresh commented on YARN-6483:
---

Ah... How much of trouble is it to get YARN-7162 in branch-3.0 ? If it is 
non-trivial, I will revert it from branch-3.0.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-12-04 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277895#comment-16277895
 ] 

Robert Kanter commented on YARN-6483:
-

[~asuresh], did you mean to commit this to branch-3.0?  The fix version for 
this JIRA says 3.1.0.
Plus, the 
{{TestResourceTrackerService#testGracefulDecommissionDefaultTimeoutResolution}} 
added here is relying on an XML excludes file, which is currently only 
supported in trunk (YARN-7162), so it fails when run in branch-3.0 because it 
reads each line of XML as a separate host (e.g. {{host1}}, etc):
{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.706 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
testGracefulDecommissionDefaultTimeoutResolution(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
  Time elapsed: 23.913 sec  <<< FAILURE!
java.lang.AssertionError: Node state is not correct (timedout) 
expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:908)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionDefaultTimeoutResolution(TestResourceTrackerService.java:345)
{noformat}

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267314#comment-16267314
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on the issue:

https://github.com/apache/hadoop/pull/289
  
Pushed in 
https://github.com/apache/hadoop/commit/b46ca7e73b8bac3fdbff0b13afe009308078acf2


> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267316#comment-16267316
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh closed the pull request at:

https://github.com/apache/hadoop/pull/289


> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Fix For: 3.1.0
>
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-11-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263778#comment-16263778
 ] 

Arun Suresh commented on YARN-6483:
---

Actually - closing this for now. The branch-2 version might need more work. 
Will re-open once we decide if we really need it for 2.9.x or 2.10

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-11-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263759#comment-16263759
 ] 

Arun Suresh commented on YARN-6483:
---

Committed this to trunk and branch-3.0
Will cherry-pick this to branch-2 and branch-2.9 shortly

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers to be launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING at the Spark level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263757#comment-16263757
 ] 

Hudson commented on YARN-6483:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13274 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13274/])
YARN-6483. Add nodes transitioning to DECOMMISSIONING state to the list (arun 
suresh: rev b46ca7e73b8bac3fdbff0b13afe009308078acf2)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManagerEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestDecommissioningNodesWatcher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeUpdateType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DefaultAMSProcessor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/NodeReportPBImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNodeUpdateEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java


> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned to the AM
> 
>