[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431472#comment-16431472 ]
Karthik Palaniappan commented on YARN-8118:
-------------------------------------------
Not sure I understand your use cases (@Jason/@Junping). For jobs that produce
shuffle data (i.e. all Hadoop-ecosystem jobs?), killing a container is just as
bad as removing the shuffle data it produced. I can imagine a few reasonable
scenarios around removing nodes:
1) immediately remove nodes (regular decommissioning)
2) wait for containers to finish, but don't wait until applications finish
(scenarios where shuffle doesn't matter)
3) wait for apps to finish and let in-progress apps use decommissioning nodes
#1 is regular (forceful) decommissioning. #3 is my proposal, aimed at cloud
environments with potentially drastic scaling events. #2 makes sense for
non-cloud environments where only a few nodes are removed at a time. It also
makes sense when running jobs that don't produce shuffle output.
So if you're willing to tolerate a behavioral change, maybe #2 should be the
default, and #3 should be an additional flag (either an XML property or a flag
on the graceful decommission request).
However, as currently implemented, it seems like graceful decommissioning is
the worst of all worlds – wait for apps to finish, but don't let apps use
decommissioning nodes. Am I missing something obvious here? I couldn't find
anything in the original design docs discussing why it was implemented that way.
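To make the three scenarios above concrete, here is a rough Java sketch of the
decisions each one implies for a DECOMMISSIONING node. It is only an
illustration under assumptions: the class, enum, and method names are
hypothetical and are not taken from YARN or from the attached patch, and
"in-progress apps" is assumed to mean the applications that were already
running when decommissioning started.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: none of these names exist in YARN or in the patch.
final class DecommissionPolicySketch {

    /** The three node-removal scenarios numbered in the comment above. */
    enum Mode {
        FORCEFUL,            // #1: remove the node immediately
        WAIT_FOR_CONTAINERS, // #2: drain running containers, don't wait for apps
        WAIT_FOR_APPS        // #3: wait for apps, keep serving in-progress apps
    }

    /** May a DECOMMISSIONING node accept a new container for appId? */
    static boolean canSchedule(Mode mode, String appId, Set<String> inProgressApps) {
        // Only #3 lets a draining node take new work, and only from applications
        // that were already running when decommissioning began (assumption).
        return mode == Mode.WAIT_FOR_APPS && inProgressApps.contains(appId);
    }

    /** May the DECOMMISSIONING node shut down now? */
    static boolean canShutDown(Mode mode, int runningContainers, Set<String> inProgressApps) {
        switch (mode) {
            case FORCEFUL:
                return true;
            case WAIT_FOR_CONTAINERS:
                return runningContainers == 0;
            case WAIT_FOR_APPS:
                return runningContainers == 0 && inProgressApps.isEmpty();
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Set<String> inProgress = Collections.singleton("app_1");
        // An in-progress app may still use the draining node under #3.
        System.out.println(canSchedule(Mode.WAIT_FOR_APPS, "app_1", inProgress));
        // A new app may not.
        System.out.println(canSchedule(Mode.WAIT_FOR_APPS, "app_2", inProgress));
        // Under #2 the node can leave once its containers are done.
        System.out.println(canShutDown(Mode.WAIT_FOR_CONTAINERS, 0, inProgress));
    }
}
```

In these terms, graceful decommissioning as I understand it today behaves like
canShutDown under WAIT_FOR_APPS combined with a canSchedule that always
returns false.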
> Better utilize gracefully decommissioning node managers
> -------------------------------------------------------
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
> * Java 8
> * Hadoop 2.8.2 using client-mode graceful decommissioning
> Reporter: Karthik Palaniappan
> Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on
> doc):
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications
> to complete before shutting down, but they cannot run new containers from
> those in-progress applications. This is wasteful, particularly in
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new
> applications. That will make in-progress applications complete faster and let
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real
> lines of code (patch 001). If folks are happy with the proposal, I'll write
> unit tests and also write a patch targeted at trunk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)