[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2019-07-23 Thread Klaus Ma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891568#comment-16891568
 ] 

Klaus Ma commented on YARN-624:
---

Do we support gang scheduling right now?

> Support gang scheduling in the AM RM protocol
> -
>
> Key: YARN-624
> URL: https://issues.apache.org/jira/browse/YARN-624
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, scheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Major
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
> scheduler runs a set of tasks when they can all be run at the same time, 
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they 
> get all the ones they need.  However, this lends itself to deadlocks when 
> different AMs are waiting on the same containers.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)




[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2015-06-30 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608386#comment-14608386
 ] 

Lei Guo commented on YARN-624:
--

[~john.lil...@redpoint.net], with your use case, do you also have a data locality
preference? If so, could each container have a separate data locality
preference?



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2015-06-30 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608653#comment-14608653
 ] 

john lilley commented on YARN-624:
--

We don't have any data locality needs for these use cases. The machine-learning
case in particular will be CPU bound.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2015-05-15 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545486#comment-14545486
 ] 

john lilley commented on YARN-624:
--

I would like to +1 this feature and illustrate our use cases. Currently there
are two:
-- Finding strongly-connected subgraphs. This is a central step in
data-quality/matching applications, because after record matching is performed
in a distributed fashion, the match pairs (edges) must be turned into match
groups (subgraphs). It is very inefficient to process this using a traditional
independent-task YARN model.
-- Machine-learning model training. Many models lend themselves to distributed
processing, and even those that don't can benefit from a parallel genetic
algorithm that pits multiple models and topologies against each other.

In both cases we are considering a custom AM that works like this:
-- Asks for M containers.
-- Accepts as few as N containers, but only after not getting M for some period
of time (heuristics TBD).
-- Possibly, after holding a non-zero number of containers but fewer than N for
some time, releases them all, sleeps a while, and tries again (deadlock
avoidance).
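The AM-side heuristic above could be sketched as a pure decision function that the AM evaluates on every heartbeat. This is a minimal sketch, not a YARN API: `Action`, `decide`, `accept_after_s` and `release_after_s` are hypothetical names, and the two thresholds stand in for the "heuristics TBD".

```python
from enum import Enum


class Action(Enum):
    START = "start"      # launch work on the containers currently held
    WAIT = "wait"        # keep holding and keep asking for more
    RELEASE = "release"  # give everything back, back off, retry later


def decide(held: int, m: int, n: int, waited_s: float,
           accept_after_s: float, release_after_s: float) -> Action:
    """Hypothetical AM-side gang heuristic (not part of any YARN API).

    held: containers currently allocated to this AM.
    m: containers asked for; n: smallest gang worth running (0 < n <= m).
    waited_s: seconds since the request was made.
    accept_after_s: how long to hold out for all m before settling for >= n.
    release_after_s: how long to sit on a partial (< n) gang before giving it back.
    """
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if held >= m:
        return Action.START
    if held >= n and waited_s >= accept_after_s:
        return Action.START
    if 0 < held < n and waited_s >= release_after_s:
        # Deadlock avoidance: stop hoarding containers that cannot run anything.
        return Action.RELEASE
    return Action.WAIT


# Asked for 10, would settle for 6, holding 7 after 90 s with 60 s of patience.
print(decide(7, 10, 6, 90.0, accept_after_s=60.0, release_after_s=300.0))
```

RELEASE only says that the partial gang should be handed back; the "sleep a while" before re-asking is left to the caller, and choosing the thresholds is exactly the part the RM could do better with a global view.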

This algorithm would be much better run by the RM, because it can:
-- Immediately fail the AM if N containers are impossible.
-- Avoid idle incomplete sets of containers while waiting for a sufficient gang.
-- Avoid deadlock.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2015-03-25 Thread Anubhav Srivastav (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379352#comment-14379352
 ] 

Anubhav Srivastav commented on YARN-624:


Is there any progress on implementing this feature?



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-10-03 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785556#comment-13785556
 ] 

Robert Joseph Evans commented on YARN-624:
--

[~curino] Sorry about the late reply. I have not really tested this much with
Storm on YARN. In most of our experiments the time it takes to get nodes is
negligible. But we have not really done anything serious with it, and adding
new nodes right now is a manual operation.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-10-03 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785570#comment-13785570
 ] 

Carlo Curino commented on YARN-624:
---

Got it, thanks anyway. Please keep us posted if you get some concrete numbers
with Storm or Giraph.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-09-24 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776847#comment-13776847
 ] 

Carlo Curino commented on YARN-624:
---

Hi guys,

I would like to quantify the typical waste of resources while hoarding
containers towards a gang for Giraph or Storm.
Does anyone have an intuition or measurement of the typical delay, and of the
container slot-time wasted while hoarding containers, before the useful part
of the computation starts? Thanks.
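One way to make "container slot-time wasted" concrete, under the assumption that a gang's containers are granted at times t_1 ... t_k and the computation can only start at t_start once the last one has arrived: the idle slot-time is the sum of (t_start - t_i). A minimal sketch of that metric; the function name and inputs are hypothetical.

```python
def hoarding_waste(alloc_times: list[float], gang_start: float) -> float:
    """Hypothetical metric: container-seconds held idle before the gang could start.

    alloc_times: when each container of the gang was granted (seconds).
    gang_start: when the useful computation began; it cannot precede any grant.
    """
    if any(t > gang_start for t in alloc_times):
        raise ValueError("a container was granted after the gang started")
    return sum(gang_start - t for t in alloc_times)


# Three containers granted at t=0, 10 and 30 s; the gang starts at 30 s.
print(hoarding_waste([0.0, 10.0, 30.0], 30.0))  # 50.0
```

The typical delay Carlo asks about would be the separate quantity t_start minus the request time; both would need to be measured on real Giraph or Storm runs.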




[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-12 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736894#comment-13736894
 ] 

Robert Joseph Evans commented on YARN-624:
--

From my perspective it does not really solve the problem for me. It comes
close but is not perfect. I am interested in gang scheduling to support
[storm on yarn|https://github.com/yahoo/storm-yarn/].

The biggest issue I have with this design is knowing the size before the
application is launched. The ultimate goal with Storm is to have a system
where multiple separate, but related, Storm topologies are processing data
using the same application. We would configure the queues so that if Storm
sees a spike in demand it can steal containers from batch processing to grow a
topology, and when the spike goes away it releases those containers back.
If the number of containers changes dynamically, both through new topologies
being submitted and existing ones growing or shrinking, it is impossible to
tell YARN at the beginning what I need.

Gang scheduling is interesting for me because each topology is configured to
need a specific number of containers when it is launched. Without all of those
containers there is no reason to launch any part of the topology. I can see
this working with a modification to your approach where the all-or-nothing
decision happens when the AM submits a request, not when the AM is submitted.

I also have a hard time seeing how this would work well with other advanced
features like preemption. For preemption to work well with gang scheduling it
needs to take into account that if it shoots one container in a gang, it is
likely to get back a lot more resources than just one container. If it is
aware of this, it can still shoot the container but avoid shooting other
containers needlessly, because it knows what it is going to get back.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736970#comment-13736970
 ] 

Carlo Curino commented on YARN-624:
---

Robert, you are right, and you provide a compelling example of an application
that has dynamic resource needs.

There are ways around this, where you dynamically negotiate an
increase/decrease of dedicated resources and keep the AM as it is.
Philosophically, this keeps all AM-RM interaction best-effort (partial
allocations OK), while the client-RM protocol handles binding negotiation for
resources. This would work and match the current preemption mechanics well,
but I am not sure it is the best design (I haven't thought hard about it yet).

If we go with the design where the AM makes gang-like requests, we should make
the preemption policy aware of this and act accordingly. In a sense, this boils
down to a granularity problem, not too different from the current mismatch
between the size of containers to preempt and the needed capacity. But it
stretches the precision issue by a potentially huge factor, making the
tradeoff between under- and over-preempting a more subtle line to walk.

Two ways around this:
* We might want to introduce non-strictly-FIFO preemption within a queue, i.e.,
skip a large gang and preempt containers from the next app if the gang is way
bigger than my preemption needs. This risks breaking reservations and has
possibly funny and gameable semantics. It also seems hard to gain experience
on how to parametrize such heuristics.

* An alternative workaround is to ensure that no gang request is satisfied
with over-capacity containers; this keeps gangs off the preemption radar. A
simple way to enforce this is to set max-capacity equal to guaranteed capacity
for the queues that will serve gang requests. (This might combine nicely with
the dynamic negotiation approach as well.)
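The first option, together with Robert's point that shooting one container of a gang releases the whole gang, can be illustrated with a toy victim-selection loop. This is a hypothetical sketch, not the CapacityScheduler's preemption policy: it assumes preempting any container releases the entire gang, and `pick_victims` and `max_overshoot` (an invented knob for "way bigger than my preemption needs") are illustrative names.

```python
def pick_victims(need: int, apps: list[tuple[str, int]],
                 max_overshoot: float = 2.0) -> list[str]:
    """Hypothetical gang-aware preemption (not the CapacityScheduler's policy).

    need: containers the queue must reclaim.
    apps: (app_id, gang_size) in the queue's usual preemption order.
    Assumes shooting any container of a gang releases the entire gang, so each
    victim frees gang_size containers. A gang larger than max_overshoot times
    the still-unmet need is skipped, which is the non-strict-FIFO variant.
    """
    victims: list[str] = []
    remaining = need
    for app_id, gang_size in apps:
        if remaining <= 0:
            break
        if gang_size > max_overshoot * remaining:
            continue  # skipping this gang departs from strict FIFO order
        victims.append(app_id)
        remaining -= gang_size
    return victims


queue = [("big-gang", 100), ("app-a", 3), ("app-b", 2)]
print(pick_victims(4, queue))                              # ['app-a', 'app-b']
print(pick_victims(4, queue, max_overshoot=float("inf")))  # ['big-gang']
```

The second call shows the granularity problem in raw form: strict FIFO frees 100 containers to cover a need of 4. The first frees 5, at the cost of the FIFO semantics described in the first bullet.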

Another sub-problem of gang scheduling is tracking which containers belong to
which gang (and/or which requests they serve). This also requires the AM to be
consistent in how it uses the containers it receives, and possibly a more
explicit protocol to say "this container I am giving you is part of that gang
request"; otherwise a single preemption might break multiple topologies.
In general, this container-to-request tracking seems a bit too opaque at the
moment (I have heard independent complaints about it from ApplicationMaster
developers before).
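To see why the container-to-gang mapping matters, here is a sketch of the bookkeeping an AM (or a future protocol field) would need. `GangTracker` and its methods are hypothetical; nothing in the thread names such an API. With the mapping, one preemption breaks exactly one gang, and the AM learns which surviving containers have become useless.

```python
from collections import defaultdict


class GangTracker:
    """Hypothetical bookkeeping of which container serves which gang request."""

    def __init__(self) -> None:
        self.gang_of: dict[str, str] = {}                    # container id -> gang id
        self.members: dict[str, set[str]] = defaultdict(set)  # gang id -> container ids

    def assign(self, container_id: str, gang_id: str) -> None:
        if container_id in self.gang_of:
            raise ValueError(f"{container_id} already serves {self.gang_of[container_id]}")
        self.gang_of[container_id] = gang_id
        self.members[gang_id].add(container_id)

    def on_preempted(self, container_id: str) -> set[str]:
        """Forget the preempted container's gang and return its now-useless peers."""
        gang_id = self.gang_of.pop(container_id, None)
        if gang_id is None:
            return set()
        peers = self.members.pop(gang_id)
        peers.discard(container_id)
        for c in peers:
            del self.gang_of[c]
        return peers


t = GangTracker()
for c in ("c1", "c2", "c3"):
    t.assign(c, "topology-A")
t.assign("c4", "topology-B")
print(sorted(t.on_preempted("c2")))  # ['c1', 'c3']: topology-A is broken, topology-B is not
```

What the AM then does with the returned peers (release them, or re-request the missing member) is left open here.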



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735965#comment-13735965
 ] 

Steve Loughran commented on YARN-624:
-

I now think we can't just offload this to the AM, because that can lead to a
dining-philosophers-class deadlock.

For example: 4 containers for service code, and two AMs wanting 3 containers
each, each with two allocated and waiting for the third...

Gang scheduling in the RM would let it satisfy the constraints for one of the
services (choosing based on queues).
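The example can be replayed in a few lines. The toy model is an assumption, not RM code: `incremental` hands out one container per waiting AM per round, roughly how hoarding AMs fill up, while `gang` grants a whole request only when it fits, with list order standing in for the queue-based choice.

```python
def incremental(free: int, demands: list[int]) -> list[int]:
    """Hand out one container per unsatisfied AM per round until none are left.

    Roughly how AMs that hoard containers fill up: nobody waits for a whole gang.
    """
    held = [0] * len(demands)
    progress = True
    while progress:
        progress = False
        for i, want in enumerate(demands):
            if held[i] < want and free > 0:
                held[i] += 1
                free -= 1
                progress = True
    return held


def gang(free: int, demands: list[int]) -> list[int]:
    """Grant each AM its whole request only if it fits right now (all or nothing).

    List order stands in for whatever queue-based choice the RM would make.
    """
    held = [0] * len(demands)
    for i, want in enumerate(demands):
        if want <= free:
            held[i] = want
            free -= want
    return held


demands = [3, 3]                # two AMs, each needing 3 containers
print(incremental(4, demands))  # [2, 2]: both stuck waiting for a third
print(gang(4, demands))         # [3, 0]: one service runs, the other queues
```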

Otherwise, we need a finite, configurable limit on how long an AM can lease a
container that isn't running code. That would make sense anyway if we allow a
container to outlive a single program.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-10 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735967#comment-13735967
 ] 

Carlo Curino commented on YARN-624:
---

Related to this is work we just proposed in YARN-1051. We manage dynamically
negotiated reservations of capacity at admission control. The idea is that if
I want gang scheduling, I can declare this at submission time and the system
accepts me only if it can fit me. At that level we do constraint checking /
knapsack (e.g., we never promise more gang-style reservations than we can
fit).
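A stripped-down version of that admission rule: admit a gang only if it fits in capacity that has not already been promised. This sketch is an assumption-laden simplification, not the YARN-1051 design: `GangAdmission` is hypothetical, capacity is a single container count, and time windows and re-planning are not modelled.

```python
class GangAdmission:
    """Hypothetical admission controller: never promise more gang capacity than exists.

    Capacity is a single container count; time windows and re-planning are not modelled.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.committed: dict[str, int] = {}  # app id -> containers promised to its gang

    def free(self) -> int:
        return self.capacity - sum(self.committed.values())

    def submit(self, app_id: str, gang_size: int) -> bool:
        """Accept the app only if its whole gang fits in the unpromised capacity."""
        if app_id in self.committed:
            raise ValueError(f"{app_id} already holds a reservation")
        if gang_size <= self.free():
            self.committed[app_id] = gang_size
            return True
        return False

    def finish(self, app_id: str) -> None:
        self.committed.pop(app_id, None)


ac = GangAdmission(capacity=10)
print(ac.submit("app-a", 6), ac.submit("app-b", 5), ac.submit("app-c", 4))  # True False True
```

Because the admitted gangs never sum to more than capacity, each admitted AM can hoard its containers without the dining-philosophers deadlock. The sketch also shows where the guarantee breaks: if `capacity` shrinks after admission (nodes lost), the commitments may no longer fit.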

This means that at run time AM hoarding is OK, because we guarantee it to fit.
I am aware of at least two limitations of this approach w.r.t. the dynamic
version you were discussing:
* it doesn't work if the application doesn't know its needs until the AM has
started
* if we lose large chunks of the cluster, our previously checked constraints
no longer hold

Neither seems a great concern, and the second one can be handled with
re-planning in admission control (which we don't have yet, but it's on our
agenda).




[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736015#comment-13736015
 ] 

Alejandro Abdelnur commented on YARN-624:
-

Carlo, I think what you are referring to is starting an app when you can
guarantee its planned capacity. IMO, gang scheduling is a somewhat different
beast.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-10 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736067#comment-13736067
 ] 

Carlo Curino commented on YARN-624:
---

I do refer to that. The argument is that, provided the planned capacity is
tightly guaranteed to be available, AM-side hoarding of containers is
sufficient to achieve gang scheduling.
Again, bar a few limitations that I mentioned (and maybe others I missed), this
gets us close to at least basic deadlock-free gang-scheduling semantics (as we
discussed in person, with no real promises on location).



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-08-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733177#comment-13733177
 ] 

Bikas Saha commented on YARN-624:
-

The RM currently expects something to start in a container within a timeout
after allocation. Either that needs to change, or it will set a maximum time
for which the AM can hold onto containers while waiting for the rest of a gang
to be allocated. The NM could provide an API to launch but not start a
process, so all resource copying etc. could be completed and the process
launched in a suspended state, ready to go. This may help in telling the RM
that the container is actually being used. The NM could then un-suspend and
start the process when told to do so by the AM.
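The proposed lifecycle can be modelled as a small state machine. Everything here is hypothetical (the states, `Container`, `rm_would_reclaim`, `start_gang`); the comment only proposes that a suspended, fully prepared process should count as "in use" for the RM's launch timeout, and that the AM un-suspends the gang when it is ready.

```python
from enum import Enum, auto


class State(Enum):
    ALLOCATED = auto()  # granted by the RM, nothing launched yet
    SUSPENDED = auto()  # resources copied, process created but paused
    RUNNING = auto()    # un-suspended at the AM's request


class Container:
    """Hypothetical container record; not the NodeManager's real state machine."""

    def __init__(self, cid: str, allocated_at: float) -> None:
        self.cid = cid
        self.allocated_at = allocated_at
        self.state = State.ALLOCATED

    def launch_suspended(self) -> None:
        if self.state is not State.ALLOCATED:
            raise RuntimeError(f"{self.cid}: cannot suspend from {self.state.name}")
        self.state = State.SUSPENDED

    def resume(self) -> None:
        if self.state is not State.SUSPENDED:
            raise RuntimeError(f"{self.cid}: cannot resume from {self.state.name}")
        self.state = State.RUNNING


def rm_would_reclaim(c: Container, now: float, launch_timeout: float) -> bool:
    """RM-side check: only containers that never got past ALLOCATED time out."""
    return c.state is State.ALLOCATED and now - c.allocated_at > launch_timeout


def start_gang(members: list[Container], gang_size: int) -> bool:
    """AM-side: resume every member at once, and only when the whole gang is ready."""
    if len(members) < gang_size or any(c.state is not State.SUSPENDED for c in members):
        return False
    for c in members:
        c.resume()
    return True
```

A suspended container still occupies its slot, so this sidesteps the timeout without removing the need for a cap on how long a partial gang may sit suspended.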



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-21 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663125#comment-13663125
 ] 

Carlo Curino commented on YARN-624:
---

I have two levels of comments: the first clarifies the intent of my earlier
messages, and the second complements Robert's description of a use case with
one for ML frameworks.

Intent:
[~vinodkv], I completely agree with you that we should be very deliberate in
choosing what use cases to support, and make sure we only add features that
target concrete and, I would argue, imminent use cases.
Reflecting on a conversation I had with Alejandro, I was trying to help this
conversation take this form:
1) push for a broad discussion of the use cases for gang scheduling we know
of, so that we understand the entire complexity of the problem (hence the
comments around more advanced features such as OR of gangs)
2) let a set of core features emerge from the most concrete short-term needs we
have (the Storm example is a good place to start)
3) try to devise a protocol that supports the core features well but is
amenable to future expansion (inasmuch as we can guess our future needs based
on 1)
So in terms of concrete actions I am totally aligned with your request for
groundedness, but I think it would really benefit us to also spell out some of
the future requirements, so that we have a chance to design for extensibility
(similarly to what you guys pushed for in YARN-45, which I thought was a really
good call).

ML Use Cases:
I asked Markus Weimer (ML/Systems guy in our group) to summarize why he sees
gang scheduling as key for ML frameworks (which I think are going to flock to
YARN in the coming months/years).

Here is his response:
In many iterative algorithms, it is imperative to load all the data into main
memory to minimize execution time. This is true for systems like Giraph,
Mahout and many others that will over time be on YARN. In order to satisfy
their memory requirements, they will block, holding on to idle slots until
YARN has delivered all the resources needed. Exposing that pattern via gang
scheduling seems beneficial.
Furthermore, these systems are often communication-intensive. Hence, they’d
benefit from a gang of containers that are collocated on the network. This is
a gang-wide property of the resource ask that cannot be captured easily
without gang scheduling. The alternatives (e.g. getting a container on each
rack, then expanding from there to see which rack “wins”) are quite wasteful
in comparison.
Lastly, scheduling with alternatives at the gang level would be helpful. If,
e.g., the training data for a machine learning algorithm needs 128GB of RAM,
any combination of containers with that amount of RAM would satisfy the need.
However, preference is given to fewer machines, as that reduces the
communication overhead.
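The "any combination, but prefer fewer machines" alternative can be sketched as a selection over free RAM per host. This illustration rests on assumptions not in the thread: one container per host sized to that host's free RAM, no network co-location constraint, and a hypothetical function name.

```python
from typing import Optional


def fewest_machines(free_gb: dict[str, int], need_gb: int) -> Optional[list[str]]:
    """Hypothetical: choose the fewest hosts whose free RAM together covers need_gb.

    Taking hosts largest-first minimises the count: the k largest free amounts
    have the biggest sum of any k hosts, so if any k hosts suffice, these do.
    Returns None when the whole pool is too small for the gang.
    """
    chosen: list[str] = []
    total = 0
    for host, gb in sorted(free_gb.items(), key=lambda kv: kv[1], reverse=True):
        if total >= need_gb:
            break
        chosen.append(host)
        total += gb
    return chosen if total >= need_gb else None


pool = {"h1": 64, "h2": 32, "h3": 96, "h4": 16}
print(fewest_machines(pool, 128))  # ['h3', 'h1']
print(fewest_machines(pool, 300))  # None
```

Adding the network-locality preference would turn this into a harder placement problem (for instance, trying the selection within each rack first), which is part of why a gang-level ask is easier for the RM to serve than for the AM to emulate.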

While I appreciate that the urgency of what Markus describes is not comparable
to that of Storm, I see ML as an important future use case for YARN. And gang
scheduling seems one of those features that will determine whether people
build on YARN or on something like Mesos.




[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-20 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662101#comment-13662101
 ] 

Robert Joseph Evans commented on YARN-624:
--

I would love to have it right now for Storm too. If you want me to sign up as a
use case, I am happy to.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662081#comment-13662081
 ] 

Alejandro Abdelnur commented on YARN-624:
-

[~vinodkv], my immediate use case for this is an AM that requires a complete 
set (gang) of containers before it can do its work efficiently. To achieve the 
desired locality of the allocations, my use case will use YARN-392. From my 
side, for now, I can drop the OR-gang requests. Also, having a 
fallback-to-normal would let the AM carry out a plan B with a partial gang.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662139#comment-13662139
 ] 

Alejandro Abdelnur commented on YARN-624:
-

[~revans2], would you mind describing Storm's requirements? Also, what is the 
minimum functionality that would make a difference for Storm?



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662173#comment-13662173
 ] 

Vinod Kumar Vavilapalli commented on YARN-624:
--

bq. I would love to have it right now for storm too. If you want me to sign up 
as a use case I am happy to.
This is exactly the kind of concrete use case I was asking for, not some 
made-up use case that *may* make sense. Yes please, it would be great if you 
signed up and laid down the important things that you need.

Like I said, I am not against adding this, but we need to make sure we are 
addressing the right requirements. Otherwise, we'll get lost in the myriad 
scheduler features that we *can* implement but that are of little use.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-20 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662352#comment-13662352
 ] 

Robert Joseph Evans commented on YARN-624:
--

Storm is a real-time stream processing system. We are working on porting it 
to run on YARN. Storm will process one or more streams of data using a logical 
DAG of processing nodes called a topology. This topology runs in spawned 
processes. If there are not enough processes to run a topology, there is no 
point in launching any of them. Hence the need for gang scheduling.

It is currently a very simple gang scheduling use case. When a new topology is 
submitted, we want to request enough resources to run that topology. If a 
node goes down, we are going to request enough resources to replace it, so we 
can get up and running again ASAP. When a topology is killed, we want to 
release those resources.
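The topology lifecycle above can be sketched as AM-side bookkeeping. This is a hypothetical Python model: the `TopologyGang` class and its methods are illustrative names, not a Storm or YARN API. Containers accumulate until the full gang is held, a lost node triggers a replacement request, and killing the topology releases everything.

```python
class TopologyGang:
    """Hypothetical AM-side bookkeeping for one Storm topology (not a real
    Storm or YARN API). Workers launch only once the whole gang is held."""

    def __init__(self, name, size):
        self.name = name
        self.size = size         # containers the topology needs to run at all
        self.containers = set()  # ids of containers currently held
        self.running = False

    def outstanding(self):
        """How many more containers to request from the RM."""
        return self.size - len(self.containers)

    def on_allocated(self, container_id):
        """Record a container from the RM; return False if it is surplus."""
        if self.outstanding() == 0:
            return False         # surplus allocation: caller should release it
        self.containers.add(container_id)
        if self.outstanding() == 0:
            self.running = True  # complete gang: launch all workers now
        return True

    def on_lost(self, container_id):
        """A node went down: drop its container, return how many to re-request."""
        self.containers.discard(container_id)
        self.running = False
        return self.outstanding()

    def kill(self):
        """Topology killed: return every held container id for release."""
        released = sorted(self.containers)
        self.containers.clear()
        self.running = False
        return released
```

Keeping the "launch only when complete" rule inside one object makes the all-or-nothing requirement explicit, but note that this AM-side approach still holds partial gangs while waiting, which is the behavior the issue description warns can deadlock.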

Long term, we would like to make sure that the different containers are close to 
each other from a network topology perspective. We don't care which node or 
rack the containers are on, only that they are all on the same node/rack as 
one another.
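The longer-term locality goal (any node or rack, as long as the whole gang is together) could be expressed as a placement check like the following. It is a sketch under stated assumptions: `place_gang`, the rack -> node -> free-containers map, and best-fit tie-breaking are hypothetical choices, not part of YARN.

```python
def place_gang(cluster, gang_size):
    """Pick one node, or failing that one rack, that can host the whole gang.

    cluster maps rack -> {node: free_containers} (a hypothetical layout).
    Returns ("node", name), ("rack", name), or None if the gang cannot be
    co-located yet and should keep waiting.
    """
    # Prefer a single node: all traffic between containers stays local.
    node_fits = [
        (free, node)
        for nodes in cluster.values()
        for node, free in nodes.items()
        if free >= gang_size
    ]
    if node_fits:
        return ("node", min(node_fits)[1])  # best fit: smallest node that suffices
    # Otherwise accept any single rack with enough total free capacity.
    rack_fits = [
        (sum(nodes.values()), rack)
        for rack, nodes in cluster.items()
        if sum(nodes.values()) >= gang_size
    ]
    if rack_fits:
        return ("rack", min(rack_fits)[1])
    return None
```

For `{"rack1": {"n1": 2, "n2": 2}, "rack2": {"n3": 3, "n4": 1}}`, a gang of 3 fits on node `n3`, a gang of 4 only fits within a rack (`rack1` wins the tie by name), and a gang of 5 returns `None`.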



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661750#comment-13661750
 ] 

Vinod Kumar Vavilapalli commented on YARN-624:
--

While I see a general use for this, and also that having it would be 
'exciting', I'd suggest that we first figure out real use cases, such as an 
application that already needs this. We need to have some of the ML-type 
applications that Carlo mentioned on board before we attempt this. Otherwise, 
we'll just be implementing random features, out of priority order, that no one 
needs. We could implement tons of scheduling features; let's prioritize them 
and implement them in that order.



[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658637#comment-13658637
 ] 

Alejandro Abdelnur commented on YARN-624:
-

As pointed out, supporting gang scheduling at the RM/scheduler level will allow 
detection/avoidance of deadlocks. This would not be trivial (nor efficient) to 
do if gang scheduling is done at the AM level.
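A minimal sketch of why the scheduler is a natural place for this: if it grants a gang only when the whole gang fits, no AM is ever left holding a partial set. This is a hypothetical toy, not a proposed YARN design; `allocate_gangs` and the first-come ordering are assumptions.

```python
def allocate_gangs(free_containers, demands):
    """All-or-nothing gang admission at the scheduler (hypothetical toy model).

    demands maps AM -> gang size, in arrival order. A gang is granted only if
    all of its containers fit at once; otherwise it gets nothing and waits,
    so no AM ever holds a partial gang. Returns (granted, free_left).
    """
    granted = {}
    for am, size in demands.items():
        if size <= free_containers:
            granted[am] = size
            free_containers -= size
    return granted, free_containers
```

With 4 free containers and two AMs that each need 3, the first AM gets all 3 and the second simply waits for them to be released instead of deadlocking. Skipping gangs that do not fit can starve large gangs behind a stream of small ones, so a real scheduler would likely need reservations or aging on top of this.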

Examples of gang request capabilities could be:

* express a set of containers on any nodes. E.g.: 10 containers on any node of 
the cluster.
* express a set of containers on a specified set of nodes. E.g.: 10 containers 
in rack1; 10 containers, one on each of n1...n10.
* express different sets of possible gangs, any of which would satisfy the 
request. E.g.: 10 containers in rack1 or in rack2; 10 containers on n1...n10 
or on n11...n20.
* indicate a timeout/fallback-to-normal for gang requests.

We should decide which gang capabilities we want/need to address in the short 
term.
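The four capabilities listed above could be captured in a request model along these lines. This is a hypothetical Python sketch, not a proposed YARN API: `Placement`, `GangRequest`, `decide`, and the flat location -> free-containers map (where a location may be a node or a rack) are illustrative assumptions. Each `Placement` is one acceptable gang, `alternatives` holds the OR-gangs, and `timeout_s` models fallback-to-normal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """One acceptable way to satisfy a gang (hypothetical model)."""
    count: int
    locations: tuple = ()           # empty tuple means any node in the cluster
    one_per_location: bool = False  # e.g. one container on each of n1...n10


@dataclass
class GangRequest:
    alternatives: list                 # OR-gangs: any single Placement suffices
    timeout_s: Optional[float] = None  # after this, fall back to normal scheduling


def satisfiable(placement, free):
    """free maps a location name (node or rack) to its free containers."""
    if not placement.locations:
        return sum(free.values()) >= placement.count
    if placement.one_per_location:
        return (len(placement.locations) == placement.count
                and all(free.get(loc, 0) >= 1 for loc in placement.locations))
    return sum(free.get(loc, 0) for loc in placement.locations) >= placement.count


def decide(request, free, waited_s):
    """Return ("gang", placement), ("fallback", None) or ("wait", None)."""
    for placement in request.alternatives:
        if satisfiable(placement, free):
            return ("gang", placement)
    if request.timeout_s is not None and waited_s >= request.timeout_s:
        return ("fallback", None)
    return ("wait", None)
```

For example, with 8 free containers in rack1 and 12 in rack2, a request for "10 containers in rack1 or in rack2" is satisfied by the rack2 alternative, while a rack1-only request waits and then falls back once its timeout expires.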

