[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891568#comment-16891568 ]

Klaus Ma commented on YARN-624:
-------------------------------

Do we support gang scheduling right now?

> Support gang scheduling in the AM RM protocol
> ---------------------------------------------
>
>                 Key: YARN-624
>                 URL: https://issues.apache.org/jira/browse/YARN-624
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, scheduler
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Major
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a
> scheduler runs a set of tasks when they can all be run at the same time,
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they
> get all the ones they need. However, this lends itself to deadlocks when
> different AMs are waiting on the same containers.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608386#comment-14608386 ]

Lei Guo commented on YARN-624:
------------------------------

[~john.lil...@redpoint.net], in your use case, do you also have a data locality preference? If so, might each container have its own separate data locality preference?
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608653#comment-14608653 ]

john lilley commented on YARN-624:
----------------------------------

We don't have any data locality needs for these use cases. The machine-learning case in particular will be CPU-bound.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545486#comment-14545486 ]

john lilley commented on YARN-624:
----------------------------------

I would like to +1 this feature and illustrate our use cases. Currently there are two:

-- Finding strongly-connected subgraphs. This is a central step in data-quality/matching applications, because after record-matching is performed in a distributed fashion, the match pairs (edges) must be turned into match groups (subgraphs). It is very inefficient to process this using a traditional independent-task YARN model.
-- Machine-learning model training. There are many models that lend themselves to distributed processing, and even those that don't can benefit from a parallel genetic algorithm that competes multiple models and topologies in parallel.

In both these cases we are considering a custom AM that runs like this:

-- Asks for M containers.
-- Accepts as few as N containers, but only after not getting M for some period of time (heuristics TBD).
-- Possibly, after holding a non-zero number but fewer than N containers for some time, releases them all, sleeps a while, and tries again (deadlock avoidance).

This algorithm would be much better run by the RM, because it can:

-- Immediately fail the AM if N containers are impossible.
-- Avoid idle incomplete sets of containers while waiting for a sufficient gang.
-- Avoid deadlock.
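The custom-AM loop described in the comment above (ask for M, settle for N after a while, release a partial set to avoid deadlock) could be sketched roughly as follows. This is only an illustration: `GangPolicy`, `Action`, and the two timeout parameters are hypothetical names, not part of any YARN API, and the thresholds stand in for the heuristics the comment leaves TBD.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"        # keep what we hold and keep asking
    START = "start"      # enough containers: launch the gang
    RELEASE = "release"  # give everything back, sleep, retry later


@dataclass
class GangPolicy:
    desired: int         # M: containers the AM would like (hypothetical name)
    minimum: int         # N: fewest containers it can work with
    settle_secs: float   # assumed: how long to hold out for M before accepting >= N
    give_up_secs: float  # assumed: how long to hold fewer than N before releasing

    def decide(self, held: int, waited_secs: float) -> Action:
        if held >= self.desired:
            return Action.START
        if held >= self.minimum and waited_secs >= self.settle_secs:
            return Action.START
        if 0 < held < self.minimum and waited_secs >= self.give_up_secs:
            return Action.RELEASE
        return Action.WAIT


policy = GangPolicy(desired=10, minimum=6, settle_secs=30, give_up_secs=120)
print(policy.decide(held=7, waited_secs=30))
print(policy.decide(held=3, waited_secs=120))
```

Note that an AM running this on its own still cannot tell whether N containers are impossible, which is one of the reasons the comment argues for doing it in the RM.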
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379352#comment-14379352 ]

Anubhav Srivastav commented on YARN-624:
----------------------------------------

Is there any progress on implementing this feature?
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785556#comment-13785556 ]

Robert Joseph Evans commented on YARN-624:
------------------------------------------

[~curino] Sorry about the late reply. I have not really tested this much with Storm on YARN. In most of our experiments the amount of time it takes to get nodes is negligible. But we have not really done anything serious with it, and adding new nodes right now is a manual operation.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785570#comment-13785570 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Got it, thanks anyway. Please keep us posted if you get some concrete numbers with Storm or Giraph.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776847#comment-13776847 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Hi guys, I would like to quantify the typical waste of resources while hoarding containers towards a gang for Giraph or Storm. Does anyone have an intuition for, or a measure of, the typical time delay and container slot-time wasted while hoarding containers, before the useful part of the computation starts? Thanks.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736894#comment-13736894 ]

Robert Joseph Evans commented on YARN-624:
------------------------------------------

From my perspective it does not really solve the problem for me. It comes close but is not perfect. I am interested in gang scheduling to support [storm on yarn|https://github.com/yahoo/storm-yarn/].

The biggest issue I have with this design is knowing the size before the application is launched. The ultimate goal with Storm is to have a system where multiple separate, but related, Storm topologies are processing data using the same application. We would configure the queues so that if Storm sees a spike in demand it can steal containers from batch processing to grow a topology, and when the spike goes away it would release those containers back. If the number of containers changes dynamically, both by submitting new topologies and by growing/shrinking existing ones, it is impossible to tell YARN what I need at the beginning.

Gang scheduling is interesting for me because there is a specific number of containers that each topology is configured to need when that topology is launched. Without all of those containers there is no reason to launch a single part of the topology. I can see this happening with a modification to your approach where the all-or-nothing decision happens when the AM submits a request, and not when the AM is submitted.

I also have a hard time seeing how this would work well with other advanced features like preemption. For preemption to work well with gang scheduling, it needs to take into account that if it shoots one container in a gang of containers it is likely going to get back a lot more resources than just one container. If it is aware of this then it can still shoot the container, but avoid shooting other containers needlessly because it knows what it is going to get back.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736970#comment-13736970 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Robert, you are right, and you provide a compelling example of an application that has dynamic needs for resources.

There are ways around this, where you dynamically negotiate an increase/decrease of dedicated resources and keep the AM as it is. Philosophically, this keeps all AM-RM interaction best-effort and partial-ok, while it is the client-RM protocol that handles binding negotiation for resources. This would work and would match the current preemption mechanics well, but I am not sure it is the best design (I haven't thought hard about it yet).

If we go with the design where the AM makes gang-like requests, we should make the preemption policy aware of this and have it act accordingly. In a sense, this boils down to a granularity problem, not too different from the current one of container size to preempt vs. needed capacity. But it stretches the precision issue by potentially a huge factor, making the tradeoff between under- and over-preempting a more subtle line to walk. Two ways around this:

* We might want to introduce non-strictly-FIFO preemption in a queue, i.e., skip a large gang and preempt containers from the next app if the gang is way bigger than my preemption needs. This risks breaking reservations and has possibly odd and gameable semantics. It also seems hard to gain experience on how to parametrize such heuristics.
* An alternative workaround is to ensure that no gang requests are satisfied with over-capacity containers; this keeps the gangs off the preemption radar. A simple way to enforce this is to set max-capacity equal to guaranteed capacity for the queues that will serve gang requests. (This might combine nicely with the dynamic negotiation business as well.)

Another sub-problem of gang scheduling is tracking which containers belong to which gang (and/or which requests they serve). This also requires the AM to be consistent in how it uses the containers it receives, and possibly a more explicit protocol to say "this container I am giving you is part of that gang request"; otherwise a single preemption might break multiple topologies. In general this containers-to-requests tracking seems a bit too opaque at the moment (I have heard independent complaints about this from ApplicationMaster developers before).
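The first workaround in the comment above (skip a gang that is far larger than what preemption needs and move on to the next candidate) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `(name, size)` unit list, the `max_overshoot` knob, and the rule that a gang is always reclaimed whole. It is not how any YARN scheduler actually selects preemption victims.

```python
def pick_victims(units, needed, max_overshoot):
    """Choose what to preempt from `units`, given in preemption order.

    units: list of (name, size); a gang is one unit and is reclaimed whole,
           a plain container is a unit of size 1.
    needed: capacity the preemption policy wants back.
    max_overshoot: how much more than `needed` we tolerate reclaiming
                   from a single unit (hypothetical tuning parameter).
    """
    victims, reclaimed = [], 0
    for name, size in units:
        if reclaimed >= needed:
            break
        remaining = needed - reclaimed
        if size - remaining > max_overshoot:
            continue  # far bigger than what is still needed: skip it
        victims.append(name)
        reclaimed += size
    return victims, reclaimed


queue = [("gang-A", 50), ("c1", 1), ("c2", 1), ("gang-B", 4)]
print(pick_victims(queue, needed=5, max_overshoot=2))
```

With a tolerance of 2 the 50-container gang is skipped and the policy reclaims 6 containers from later candidates; with a very large tolerance it would shoot gang-A and reclaim 50 to satisfy a need of 5. The skipping is exactly the part described above as risky for reservations and potentially gameable.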
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735965#comment-13735965 ]

Steve Loughran commented on YARN-624:
-------------------------------------

I now think we can't just offload this to the AM, because that can lead to a dining-philosophers-style deadlock. For example: 4 containers for service code, two AMs wanting 3 containers each, each with two allocated and waiting for the third.

Gang scheduling in the RM would let it satisfy the constraints for one of the services (chosen based on queues).

Otherwise, we need a finite, configurable limit on how long an AM can lease a container that isn't running code. That would make sense anyway if we allow a container to outlive a single program.
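The 4-container example in the comment above can be reproduced with a toy model. This is a minimal sketch under stated assumptions: containers are handed out one at a time in round-robin order, AMs never release what they hold, and request order stands in for "choose based on queues". The function names are made up for this illustration.

```python
def hoarding_round_robin(capacity, demands):
    """AM-side hoarding: containers trickle out one by one and are never released."""
    held = [0] * len(demands)
    free = capacity
    while free > 0 and any(h < d for h, d in zip(held, demands)):
        for i, d in enumerate(demands):
            if free > 0 and held[i] < d:
                held[i] += 1
                free -= 1
    running = [h == d for h, d in zip(held, demands)]
    return held, running


def gang_allocate(capacity, demands):
    """RM-side gang scheduling: grant a request only if all of it fits."""
    held, free = [], capacity
    for d in demands:
        grant = d if d <= free else 0
        held.append(grant)
        free -= grant
    return held, [h == d for h, d in zip(held, demands)]


print(hoarding_round_robin(4, [3, 3]))  # each AM stuck holding 2 of 3
print(gang_allocate(4, [3, 3]))         # first AM runs, second waits with nothing held
```

In the hoarding case all 4 containers are leased and neither service can start; in the gang case one service runs and the leftover container stays free for other work.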
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13735967#comment-13735967 ]

Carlo Curino commented on YARN-624:
-----------------------------------

Related to this is work we just proposed in YARN-1051. We manage dynamically negotiated reservations of capacity at admission control. The idea is that if I want gang scheduling I can declare this at submission time, and the system accepts me only if it can fit me. At that level we do constraint checking / knapsack (e.g., that we never promise more gang-style reservations than we can fit). This means that at run time AM hoarding is OK, because we guarantee it to fit.

I am aware of at least two limitations of this approach w.r.t. the dynamic version you were discussing:

* It doesn't work if the application doesn't know its needs until the AM has started.
* It breaks if we lose large chunks of the cluster (our previously checked constraints no longer hold).

Neither seems a great concern, and the second one can be handled with re-planning in the admission control (which we don't have yet, but it's on our agenda).
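A very reduced picture of the admission-time check described in the comment above: accept a gang only if the sum of everything already promised still fits. This is a sketch, not the YARN-1051 design. It assumes a single resource dimension counted in containers and ignores time entirely, and `AdmissionController`, `submit`, and `finish` are hypothetical names.

```python
class AdmissionController:
    """Never promise more gang-style capacity than the cluster can hold."""

    def __init__(self, cluster_containers):
        self.capacity = cluster_containers
        self.reserved = {}  # app id -> gang size promised at submission

    def submit(self, app_id, gang_size):
        if gang_size <= 0:
            raise ValueError("gang size must be positive")
        if app_id in self.reserved:
            raise ValueError("application already admitted")
        if sum(self.reserved.values()) + gang_size > self.capacity:
            return False  # cannot guarantee the whole gang: reject
        self.reserved[app_id] = gang_size
        return True

    def finish(self, app_id):
        # Return the promised capacity when the application ends.
        self.reserved.pop(app_id, None)


ac = AdmissionController(cluster_containers=4)
print(ac.submit("app-1", 3))  # fits
print(ac.submit("app-2", 3))  # would over-promise, so it is rejected
```

Because admitted gangs are guaranteed to fit together, an admitted AM can hoard containers without risking the deadlock discussed in this thread. The check says nothing about the second limitation listed above: if capacity shrinks after admission, the promises no longer hold.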
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736015#comment-13736015 ]

Alejandro Abdelnur commented on YARN-624:
-----------------------------------------

Carlo, I think what you are referring to is starting an app when you can guarantee its planned capacity. IMO, gang scheduling is a bit of a different beast.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736067#comment-13736067 ]

Carlo Curino commented on YARN-624:
-----------------------------------

I do refer to that. The argument is that, provided the planned capacity is tightly guaranteed to be available, AM-side hoarding of containers is sufficient to achieve gang scheduling. Again, bar the few limitations that I mentioned (and maybe others I missed), this will get us close to at least basic deadlock-free gang-scheduling semantics (as we discussed in person, with no real promises on location).
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733177#comment-13733177 ]

Bikas Saha commented on YARN-624:
---------------------------------

The RM currently expects something to start in a container within a timeout after allocation. Either that needs to change, or it will set a maximum time for which the AM can hold on to containers while waiting for a gang of them to be allocated.

The NM could provide an API to launch but not start a process. All resource copying etc. could then be completed and the process launched in a suspended state, ready to go. This may help in telling the RM that the container actually is being used. The NM could then un-suspend and start the process after being told by the AM to do so.
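The launch-but-don't-start idea in the comment above can be pictured as one extra container state that is exempt from the allocation timeout. The sketch below is purely illustrative: the `State` values, `launch_suspended`, and `resume` are hypothetical names for the proposed behaviour, not existing NM or RM APIs.

```python
from enum import Enum


class State(Enum):
    ALLOCATED = 1  # granted by the RM, nothing launched yet
    SUSPENDED = 2  # resources localized, process launched but paused
    RUNNING = 3
    EXPIRED = 4    # reclaimed after the allocation timeout


class Container:
    def __init__(self, allocated_at, launch_timeout):
        self.state = State.ALLOCATED
        self.allocated_at = allocated_at
        self.launch_timeout = launch_timeout

    def tick(self, now):
        """Expiry check: only a container with nothing launched can expire."""
        idle = now - self.allocated_at
        if self.state is State.ALLOCATED and idle > self.launch_timeout:
            self.state = State.EXPIRED
        return self.state

    def launch_suspended(self):
        if self.state is not State.ALLOCATED:
            raise RuntimeError("can only launch a freshly allocated container")
        self.state = State.SUSPENDED

    def resume(self):
        if self.state is not State.SUSPENDED:
            raise RuntimeError("can only resume a suspended container")
        self.state = State.RUNNING


c = Container(allocated_at=0, launch_timeout=10)
c.launch_suspended()  # AM parks the container while the gang fills up
print(c.tick(now=1000))
c.resume()            # whole gang is ready: AM tells the NM to start
print(c.state)
```

A suspended container still occupies its slot, so this addresses the timeout but not the idle capacity or the deadlock risk raised elsewhere in the thread.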
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663125#comment-13663125 ]

Carlo Curino commented on YARN-624:
-----------------------------------

I have two levels of comments: the first is to clarify the intent of my earlier messages, and the second is to match Robert's description of a use case with one for ML frameworks.

Intent:

[~vinodkv], I completely agree with you that we should be very deliberate in choosing what use cases to support, and make sure we only add features that target concrete and, I would argue, imminent use cases. Reflecting on a conversation I had with Alejandro, I was trying to help this conversation take this form:

1) Push for a broad discussion of the use cases for gang scheduling we know of, so that we understand the entire complexity of the problem (hence the comments around more advanced features such as OR of gangs).
2) Let a set of core features emerge from the most concrete short-term needs we have (the Storm example is a good place to start for this).
3) Try to devise a protocol that supports the core features well, but that is amenable to future expansion (inasmuch as we can guess our future needs based on 1).

So in terms of concrete actions I am totally aligned with your request for groundedness, but I think it would really benefit us to also spell out some of the future requirements so that we have a chance to design for extensibility (similarly to what you guys pushed for in YARN-45, which I thought was really a good call).

ML Use Cases:

I asked Markus Weimer (ML/Systems guy in our group) to summarize why he sees gang scheduling as key for ML frameworks (which I think are going to flock to YARN in the coming months/years). Here is his response:

In many iterative algorithms, it is imperative to load all the data into main memory to minimize execution time. This is true for systems like Giraph, Mahout and many others that will over time be on YARN. In order to satisfy their memory requirement, they will block holding on to idle slots until YARN has delivered all the resources needed. Exposing that pattern via gang scheduling seems beneficial.

Furthermore, these systems are often communications intensive. Hence, they’d benefit from a gang of containers that are collocated on the network. This is a gang-wide property of the resource ask that cannot be captured easily without gang scheduling. The alternatives (e.g. getting a container on each rack, then expanding from there to see which rack “wins”) are quite wasteful in comparison.

Lastly, scheduling with alternatives at the gang level would be helpful. If e.g. the training data for a machine learning algorithm needs 128GB of RAM, any combination of containers with that amount of RAM would satisfy the need. However, preference is given to fewer machines as that reduces the communication overhead.

While I appreciate that the level of urgency of what Markus describes is not comparable to that of Storm, I see ML as an important future use case for YARN. And gang scheduling seems to be one of those features that will determine whether people build on YARN or on something like Mesos.
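The last point in Markus' summary (any combination of containers adding up to 128GB satisfies the need, with fewer machines preferred) can be illustrated with a small greedy selection. This is a sketch under simplifying assumptions: one container per node that takes all of that node's free memory, no network or rack awareness, and a hypothetical `fewest_nodes` helper with made-up node names. Picking the largest nodes first does yield the smallest possible node count for a given total.

```python
def fewest_nodes(free_gb, need_gb):
    """Return the smallest set of nodes whose free memory covers need_gb.

    free_gb: dict of node name -> free memory in GB.
    Returns a list of node names, or None if the cluster cannot satisfy it.
    """
    chosen, total = [], 0
    # Largest first; ties broken by name so the result is deterministic.
    for node, gb in sorted(free_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        if total >= need_gb:
            break
        chosen.append(node)
        total += gb
    return chosen if total >= need_gb else None


cluster = {"n1": 64, "n2": 32, "n3": 64, "n4": 16}
print(fewest_nodes(cluster, 128))  # two 64GB nodes rather than three or four smaller ones
print(fewest_nodes(cluster, 200))  # not satisfiable: the gang request fails as a whole
```

An AM can only approximate this by asking for containers and seeing what arrives; a scheduler with a gang-level view of the request could make the choice directly, which is the argument made in the comment above.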
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662101#comment-13662101 ]

Robert Joseph Evans commented on YARN-624:
------------------------------------------

I would love to have it right now for storm too. If you want me to sign up as a use case I am happy to.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662081#comment-13662081 ]

Alejandro Abdelnur commented on YARN-624:
-----------------------------------------

[~vinodkv], my immediate use case for this is an AM that requires a complete set (gang) of containers before being able to accomplish its work in an efficient way. To achieve the desired locality of the allocations, my use case will use YARN-392. From my side, for now, I can drop the OR-gangs request. Also, having a fallback-to-normal would help the AM do plan B with a partial gang.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662139#comment-13662139 ] Alejandro Abdelnur commented on YARN-624: - [~revans2], mind describing Storm's requirements? Also, what is the minimum required functionality that would make a difference for Storm?
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662173#comment-13662173 ] Vinod Kumar Vavilapalli commented on YARN-624: -- bq. I would love to have it right now for storm too. If you want me to sign up as a use case I am happy to. This is exactly the kind of concrete use case I was asking for, not some made-up use case which *may* make sense. Yes please, it will be great if you sign up and lay down the important things that you need. Like I said, I am not against adding this, but we need to make sure we are addressing the right requirements. Otherwise, we'll get lost in the myriad scheduler features that we *can* implement but that are of little use.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662352#comment-13662352 ] Robert Joseph Evans commented on YARN-624: -- Storm is a real-time stream processing system. We are working on porting it to run on YARN. Storm processes one or more streams of data using a logical DAG of processing nodes called a topology. This topology runs in spawned processes. If there are not enough processes to run a topology, there is no point in launching any of them. Hence the need for gang scheduling. It is a very simple gang scheduling use case currently. When a new topology is submitted, we want to request enough resources to run that topology. If a node goes down, we are going to request enough resources to replace it, so we can get up and running again ASAP. When a topology is killed, we want to release those resources. Long term, we would like to make sure that the different containers are close to each other from a network topology perspective. We don't care which node or rack the containers are on, but we do care that they are all on the same node/rack as the other containers.
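To make the all-or-nothing requirement concrete, here is a minimal sketch in Python that contrasts AMs hoarding containers one at a time with a scheduler that grants a whole gang or nothing. It is only an illustration under stated assumptions: the function names and the "count of identical containers" model are hypothetical and are not part of YARN or of any patch on this issue.

```python
from typing import List


def allocate_piecemeal(free: int, gang_sizes: List[int]) -> List[int]:
    """Hypothetical model of AMs holding containers until their gang is full.

    Containers are handed out one at a time, round-robin, and never given back.
    Returns how many containers each AM ends up holding.
    """
    held = [0] * len(gang_sizes)
    progress = True
    while free > 0 and progress:
        progress = False
        for i, size in enumerate(gang_sizes):
            if free > 0 and held[i] < size:
                held[i] += 1
                free -= 1
                progress = True
    return held


def allocate_gangs(free: int, gang_sizes: List[int]) -> List[int]:
    """Hypothetical scheduler-side gang allocation: each gang gets all or nothing.

    Gangs are considered in request order. A gang that does not fit is granted
    zero containers, so it never ties up resources while it waits.
    """
    granted = []
    for size in gang_sizes:
        if size <= free:
            free -= size
            granted.append(size)
        else:
            granted.append(0)
    return granted


# Two AMs each need a gang of 6 containers; the cluster has 10 free.
stuck = allocate_piecemeal(10, [6, 6])   # both AMs hold a partial gang
atomic = allocate_gangs(10, [6, 6])      # one AM runs, the other waits holding nothing
```

In the piecemeal case every container is held but neither gang is complete, which is the deadlock described in the issue. In the all-or-nothing case the first gang runs and the second can be granted once those containers are released.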
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661750#comment-13661750 ] Vinod Kumar Vavilapalli commented on YARN-624: -- While I see a general use for this, and having it would be 'exciting', I'd suggest that we first figure out real use cases, such as an application that already needs this. We need to have some of the ML-type applications that Carlo mentioned on board before we attempt this. Otherwise, we'll just be implementing random, low-priority features that no one needs. We could implement tons of scheduling features; let's prioritize and implement in that order.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658637#comment-13658637 ] Alejandro Abdelnur commented on YARN-624: - As pointed out, supporting gangs at the RM/scheduler level will allow detection/avoidance of deadlocks. This would be neither trivial nor efficient to do if gangs are handled at the AM level. Examples of gang request capabilities could be:
* express a set of containers on any nodes, e.g. 10 containers on any node of the cluster.
* express a set of containers on a specified set of nodes, e.g. 10 containers in rack1, or 10 containers, one on each of n1...n10.
* express different sets of possible gangs that would satisfy the request, e.g. 10 containers in rack1 or in rack2; 10 containers in n1...n10 or in n11...n20.
* indicate a timeout/fallback-to-normal for gang requests.
We should decide which gang capabilities we want/need to address in the short term.
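The capabilities listed above could be modelled roughly as follows. This is a sketch in Python for discussion only, not a proposed API: `GangRequest`, `ANY`, `satisfiable` and `choose_alternative` are hypothetical names, node and rack names are assumed to be distinct, and containers are assumed to be identical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ANY = "*"  # hypothetical wildcard: any node in the cluster


@dataclass
class GangRequest:
    # Each alternative maps a location (node name, rack name, or ANY) to a
    # container count. Several alternatives model an OR-gang: satisfying any
    # one of them satisfies the request.
    alternatives: List[Dict[str, int]]
    timeout_s: Optional[float] = None   # how long to wait for a full gang
    fallback_to_normal: bool = False    # after the timeout, accept a partial gang


def satisfiable(alt: Dict[str, int], nodes: Dict[str, Tuple[str, int]]) -> bool:
    """Check one alternative against `nodes`: node name -> (rack, free containers)."""
    free = {name: count for name, (_, count) in nodes.items()}
    rack_of = {name: rack for name, (rack, _) in nodes.items()}
    racks = set(rack_of.values())
    # Unknown locations can never be satisfied.
    if any(loc != ANY and loc not in free and loc not in racks for loc in alt):
        return False
    # Node-level demands are the most specific, so reserve them first.
    for loc, want in alt.items():
        if loc in free:
            if free[loc] < want:
                return False
            free[loc] -= want
    # Rack-level demands take whatever is left on nodes of that rack.
    for loc, want in alt.items():
        if loc in racks:
            for name in free:
                if rack_of[name] == loc:
                    take = min(free[name], want)
                    free[name] -= take
                    want -= take
            if want > 0:
                return False
    # "Any node" demands take whatever is left anywhere.
    return sum(free.values()) >= alt.get(ANY, 0)


def choose_alternative(request: GangRequest,
                       nodes: Dict[str, Tuple[str, int]]) -> Optional[int]:
    """Return the index of the first alternative that fits completely, else None."""
    for index, alt in enumerate(request.alternatives):
        if satisfiable(alt, nodes):
            return index
    return None


# "10 containers in rack1 or in rack2", waiting at most 60 seconds.
cluster = {"n1": ("rack1", 4), "n2": ("rack1", 4), "n3": ("rack2", 10)}
request = GangRequest([{"rack1": 10}, {"rack2": 10}], timeout_s=60.0,
                      fallback_to_normal=True)
chosen = choose_alternative(request, cluster)
```

With this cluster, rack1 has only 8 free containers, so the rack2 alternative is the one selected. The timeout and fallback fields are carried in the request but not acted on here; how a scheduler would honour them is exactly the open question in this comment.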