[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737631#comment-13737631
 ] 

Carlo Curino commented on YARN-624:
-----------------------------------

Robert,

That makes sense. I think we should have some guidelines for people on what to 
do while we work out the details of how to get gang-scheduling right. 
As I mentioned a few posts above, I have also heard requests from people doing 
machine learning for rather exotic versions of gang-scheduling. 

We can definitely make preemption gang-aware, but it is not trivial to get the 
semantics and corner cases right. In a sense, what we are really discussing is 
a conversion rate between capacity/fairness and cluster efficiency. For example, 
is it worth discarding the progress made by 200 containers over 20 minutes to 
give another application all of its rightful capacity right away? Hard question.
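
As a rough illustration of the scale involved, the arithmetic can be sketched 
as below. This assumes every container has run for the full 20 minutes and 
loses all of its progress when preempted (no checkpointing); the function name 
is hypothetical and not part of any YARN API.

```python
def wasted_container_minutes(num_containers, minutes_run):
    """Work discarded if every container is killed without checkpointing.

    Assumption: each container has run for `minutes_run` minutes and
    none of that progress can be recovered after preemption.
    """
    return num_containers * minutes_run


lost = wasted_container_minutes(200, 20)
print(lost)  # 4000 container-minutes of discarded work
```

Any real policy would have to weigh a figure like this against how long the 
other application would otherwise wait for its capacity.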

Part of the longer-term research I am involved in aims to quantify these 
trade-offs more clearly by projecting both into an economic value space. 
But this is not going to be ready for a long while.

                
> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
> scheduler runs a set of tasks when they can all be run at the same time, 
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they 
> get all the ones they need.  However, this lends itself to deadlocks when 
> different AMs are waiting on the same containers.
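
The deadlock described in the issue can be reproduced with a toy simulation. 
This is only a sketch under simplifying assumptions: containers are handed out 
one at a time in round-robin order, each AM holds everything it receives until 
its whole gang is available, and nothing is ever preempted. The function and 
the AM names are hypothetical and do not correspond to any YARN API.

```python
def hoard_until_complete(total_containers, demands):
    """Simulate AMs that hold containers until their full gang arrives.

    total_containers: number of containers in the (toy) cluster.
    demands: dict mapping AM name -> number of containers it needs at once.
    Returns (held, deadlocked), where held maps AM name -> containers held.
    """
    held = {am: 0 for am in demands}
    free = total_containers
    while free > 0:
        # AMs that still lack part of their gang keep asking for more.
        waiting = [am for am in demands if held[am] < demands[am]]
        if not waiting:
            break
        for am in waiting:
            if free == 0:
                break
            held[am] += 1  # the AM keeps the container instead of using it
            free -= 1
    # Deadlock: nothing is free and no AM has a complete gang, so no AM
    # will ever run, finish, and release what it holds.
    deadlocked = free == 0 and all(held[am] < demands[am] for am in demands)
    return held, deadlocked


held, deadlocked = hoard_until_complete(4, {"am1": 3, "am2": 3})
print(held, deadlocked)  # each AM ends up holding 2 of the 4 containers
```

With 4 containers and two AMs that each need 3, both AMs end up holding 2 and 
neither can start. Granting a gang atomically in the scheduler avoids this 
partial-allocation state.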
