[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers

Robert Kanter (JIRA) Mon, 09 Apr 2018 17:04:30 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431524#comment-16431524
 ]


Robert Kanter commented on YARN-8118:
-------------------------------------

Thanks for your ideas [~Karthik Palaniappan].

Consider this scenario: you want to gracefully decommission a node with a 
timeout of 10 minutes.  Suppose you have a job whose containers normally take 
20 minutes to run.  We wouldn't want to start any of those containers on that 
node, because they won't finish before the decom timeout expires; they'd just 
get killed halfway through.  Starting them on another node instead would be 
faster overall.
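
To make the arithmetic in that scenario concrete, here is a minimal Python 
sketch.  It is not YARN code: the function names are hypothetical, and it 
assumes the container's runtime is known up front, which is an assumption made 
only for illustration.

```python
def finishes_before_timeout(expected_runtime_min, remaining_timeout_min):
    """True if a container started now would finish before the decom timeout."""
    return expected_runtime_min <= remaining_timeout_min


def wasted_minutes(expected_runtime_min, remaining_timeout_min):
    """Minutes of work thrown away if the container is started anyway
    and then killed when the timeout expires."""
    if finishes_before_timeout(expected_runtime_min, remaining_timeout_min):
        return 0
    return remaining_timeout_min


# Numbers from the scenario: 10-minute timeout, 20-minute containers.
print(finishes_before_timeout(20, 10))  # False
print(wasted_minutes(20, 10))           # 10
```

With those numbers the container cannot finish in time, and the 10 minutes it 
spends on the decommissioning node are lost when it is killed.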

I'm fine with adding an option for the behavior you're describing, but I don't 
think we can change the default behavior here.  Contrary to what your design 
doc suggests, it's also not a "bugfix": as [~jlowe], [~djp], and my scenario 
above show, there are valid use cases for the current behavior.
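
As a rough sketch of what such an opt-in could look like, the Python below 
models the scheduling decision with a flag that defaults to the current 
behavior.  Everything here is hypothetical: `can_schedule` and 
`allow_on_decommissioning` are illustrative names, not existing YARN classes 
or configuration properties.

```python
from enum import Enum


class NodeState(Enum):
    RUNNING = "RUNNING"
    DECOMMISSIONING = "DECOMMISSIONING"
    DECOMMISSIONED = "DECOMMISSIONED"


def can_schedule(node_state, app_in_progress, allow_on_decommissioning=False):
    """Decide whether a new container may be placed on a node.

    allow_on_decommissioning is a hypothetical opt-in flag; its default
    (False) keeps the current behavior, where DECOMMISSIONING nodes get
    no new containers at all.
    """
    if node_state is NodeState.RUNNING:
        return True
    if node_state is NodeState.DECOMMISSIONING:
        # Proposed behavior: only containers of already-running
        # applications, and only when explicitly enabled.
        return allow_on_decommissioning and app_in_progress
    return False


# Default: nothing new lands on a DECOMMISSIONING node.
print(can_schedule(NodeState.DECOMMISSIONING, app_in_progress=True))
# Opt-in: in-progress applications may use it, new applications may not.
print(can_schedule(NodeState.DECOMMISSIONING, True, allow_on_decommissioning=True))
print(can_schedule(NodeState.DECOMMISSIONING, False, allow_on_decommissioning=True))
```

Keeping the flag off by default means existing deployments see no change 
unless they explicitly ask for the new behavior.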

> Better utilize gracefully decommissioning node managers
> -------------------------------------------------------
>
>                 Key: YARN-8118
>                 URL: https://issues.apache.org/jira/browse/YARN-8118
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 2.8.2
>         Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>            Reporter: Karthik Palaniappan
>            Priority: Major
>         Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
>
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
>
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
>
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

