[
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431557#comment-16431557
]
Karthik Palaniappan commented on YARN-8118:
-------------------------------------------
Sure – I think I get the use case you're describing – I'm just trying to
understand how it differs from option #2 (wait for running containers to
finish, then decommission the node immediately after).
Is the idea that those 20-minute containers would drain shuffle from
decommissioning nodes faster than the 10-minute timeout? If so, Jason's
comment about gracefully decommissioning on a "sufficiently large cluster"
makes sense. As an admin, you just need to set this timeout long enough to
finish in-progress containers, finish the current stage (e.g. the map stage),
and at least start all tasks in the next stage (e.g. the reduce stage) to drain
shuffle. You don't necessarily need to wait for the entire application to
finish.
I still think option #2 and option #3 are both valid secondary use cases, so
I'm inclined to add an enum parameter for the "graceful decommission strategy".
In terms of plumbing the flag through, XML config is by far the easiest. But
I can see an argument that this should be a parameter on a
per-decommission-RPC basis. Thoughts?
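To make the strategy idea concrete, here is a minimal sketch of what the enum
and the scheduling check it would drive might look like. All names and
semantics below are hypothetical illustrations, not taken from the
YARN-8118 patch or from YARN's actual scheduler code:

```java
import java.util.Set;

// Hypothetical sketch of a "graceful decommission strategy" enum and the
// node-level scheduling check it would drive. Names are illustrative only.
public class DecommissionPolicy {

  enum GracefulDecommissionStrategy {
    WAIT_FOR_APPS,     // current behavior: node stays up until apps finish
    DRAIN_ONLY,        // option #2: finish running containers, schedule nothing new
    IN_PROGRESS_APPS   // option #3 (proposal): keep scheduling containers
                       // from applications that were already in progress
  }

  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  /**
   * Decides whether a container for the given application may be scheduled
   * on a node, given the node's state and the configured strategy.
   * inProgressApps is the set of applications already running in the cluster
   * when decommissioning began.
   */
  static boolean canSchedule(NodeState state,
                             GracefulDecommissionStrategy strategy,
                             String appId,
                             Set<String> inProgressApps) {
    if (state == NodeState.RUNNING) {
      return true;                     // healthy nodes accept everything
    }
    if (state != NodeState.DECOMMISSIONING) {
      return false;                    // decommissioned nodes accept nothing
    }
    switch (strategy) {
      case IN_PROGRESS_APPS:
        // Proposal: allow new containers on DECOMMISSIONING nodes, but only
        // from applications that were already in progress, never new apps.
        return inProgressApps.contains(appId);
      case DRAIN_ONLY:
      case WAIT_FOR_APPS:
      default:
        return false;                  // no new containers at all
    }
  }
}
```

A per-decommission-RPC flag would simply pass the strategy value into this
check instead of reading it from cluster-wide XML config.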
> Better utilize gracefully decommissioning node managers
> -------------------------------------------------------
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
> * Java 8
> * Hadoop 2.8.2 using client-mode graceful decommissioning
> Reporter: Karthik Palaniappan
> Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on
> doc):
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications
> to complete before shutting down, but they cannot run new containers from
> those in-progress applications. This is wasteful, particularly in
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new
> applications. That will make in-progress applications complete faster and let
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real
> lines of code (patch 001). If folks are happy with the proposal, I'll write
> unit tests and also write a patch targeted at trunk.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)