[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431472#comment-16431472 ]

Karthik Palaniappan commented on YARN-8118:
-------------------------------------------

Not sure I understand your use cases (@Jason/@Junping). For jobs that produce 
shuffle data (i.e. all Hadoop-ecosystem jobs?), killing a container is just as 
bad as removing the shuffle data it produced. I can imagine a few reasonable 
scenarios for removing nodes:

1) immediately remove nodes (regular decommissioning)

2) wait for containers to finish, but don't wait until applications finish 
(scenarios where shuffle doesn't matter)

3) wait for apps to finish and let in-progress apps use decommissioning nodes

#1 is regular (forceful) decommissioning. #3 is my proposal, focused on cloud 
environments with potentially drastic scaling events. #2 makes sense for 
non-cloud environments where only a few nodes are removed at a time. It also 
makes sense when running jobs that don't produce shuffle output.

So if you're willing to tolerate a behavioral change, maybe #2 should be the 
default, and #3 should be an additional flag (either an XML property or a flag 
on the graceful decommission request).
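As a sketch of what the XML-property option might look like (the timeout 
property below exists in Hadoop 2.8; the allow-new-containers property is 
invented here purely for illustration and does not exist):

```xml
<!-- yarn-site.xml -->
<!-- Real in Hadoop 2.8+: how long the RM waits for a DECOMMISSIONING
     node before forcefully decommissioning it. -->
<property>
  <name>yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs</name>
  <value>3600</value>
</property>

<!-- Hypothetical property (invented for this sketch) selecting between
     behavior #2 (false: only drain running containers) and #3 (true: keep
     scheduling containers from in-progress apps on DECOMMISSIONING nodes). -->
<property>
  <name>yarn.resourcemanager.decommissioning.allow-new-containers</name>
  <value>true</value>
</property>
```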

However, as currently implemented, graceful decommissioning seems like the 
worst of all worlds: it waits for apps to finish, but doesn't let those apps 
use decommissioning nodes. Am I missing something obvious here? I couldn't find 
anything in the original design docs explaining why it was implemented that way.
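To make the #2/#3 distinction concrete, the difference boils down to a 
scheduling predicate on node state. This is a hypothetical sketch, not the 
attached patch; the enum, class, and method names are invented:

```java
import java.util.Set;

// Invented names for illustration; real YARN uses NodeState in
// org.apache.hadoop.yarn.api.records and scheduler-internal checks.
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

class DecommissioningScheduler {
    // Applications that were already running when decommissioning began.
    private final Set<String> inProgressApps;

    DecommissioningScheduler(Set<String> inProgressApps) {
        this.inProgressApps = inProgressApps;
    }

    // Proposal #3: a DECOMMISSIONING node keeps accepting containers from
    // in-progress applications, but never from applications submitted after
    // decommissioning started. (Under #2, the DECOMMISSIONING branch would
    // simply return false.)
    boolean canSchedule(String appId, NodeState state) {
        if (state == NodeState.RUNNING) {
            return true;
        }
        if (state == NodeState.DECOMMISSIONING) {
            return inProgressApps.contains(appId);
        }
        return false; // DECOMMISSIONED nodes accept nothing
    }
}
```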

> Better utilize gracefully decommissioning node managers
> -------------------------------------------------------
>
>                 Key: YARN-8118
>                 URL: https://issues.apache.org/jira/browse/YARN-8118
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 2.8.2
>         Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>            Reporter: Karthik Palaniappan
>            Priority: Major
>         Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
>
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
>
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
>
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
