Tim,

Those are very interesting points. From a scalability point of view I don't think we have really run into those situations yet, but they are coming. YARN currently has some very "simplistic" scheduling in the RM; all of the complexity ends up in the AM. There have been a number of JIRAs to make resource requests richer, to help support more "picky" applications, as the paper puts it. These would shift YARN a bit from a two-level scheduler towards a Monolithic one, reducing some of the scalability of the system but letting it support more complex scheduling patterns.

The largest YARN cluster I know of right now is about 4000 nodes, and on it we are hitting some bottlenecks with the current scheduler. We have looked at some ways to speed it up with more conventional approaches, like allowing the scheduler to be multithreaded. We expect to be able to easily support 4000-6000 nodes through YARN with a few optimizations; going to tens of thousands of nodes would require some more significant changes.
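To make the two-level split concrete, here is roughly what the AM side of a request looks like using the AMRMClient library (exact signatures vary a bit across 2.x releases, and the container size, priority, and missing locality constraints below are made up purely for illustration):

import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TwoLevelSketch {
  public static void main(String[] args) throws Exception {
    // The RM side of the two-level split is a thin, generic allocation API.
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    // Ask the RM for a 1 GB / 1 vcore container.  The size and priority are
    // illustrative; no node or rack locality is requested here.
    Resource capability = Resource.newInstance(1024, 1);
    ContainerRequest ask =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    rm.addContainerRequest(ask);

    // Heartbeat until the RM hands containers back.  Deciding what work runs
    // in each container is entirely the AM's job, not the RM's.
    List<Container> got;
    do {
      AllocateResponse response = rm.allocate(0.0f);
      got = response.getAllocatedContainers();
      Thread.sleep(1000);
    } while (got.isEmpty());

    // ... launch tasks in the allocated containers via NMClient ...
  }
}

The RM only ever sees coarse-grained asks like the one above; all of the application-specific logic stays in the AM. That split is what keeps the RM simple enough to scale, and it is exactly what gets eroded as the requests are made richer.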
As far as utilization is concerned, the presented architecture does raise some very interesting points, but all of that can be addressed with a Monolithic scheduler so long as we don't have to scale very large. It would also probably require a complete redesign of YARN and the MR AM, which is not a small undertaking.

There is also the question of trusted code. In a shared-state system where all of the various schedulers are peers, how would we enforce resource constraints? Each of the schedulers would have to enforce them itself, and as such would have to be trusted code. That makes adding new application types on the fly difficult.

I suppose we could do a hybrid approach, where the RM is a single type of scheduler among many. It would provide the same API that currently exists for YARN applications, but MR applications could have one or more "JobTracker"-like schedulers that share state with the RM and with whatever other "schedulers" are out there (a toy sketch of the kind of optimistic commit that implies is in the P.S. at the bottom of this mail). That would be something fun to try out, but sadly I really don't have time to even start thinking about a proof of concept for something like that, at least not until we hit a significant business use case that would drive it over the architecture we already have: for example, needing tens of thousands of nodes in a cluster, or a huge shift of different types of jobs onto YARN so that we are doing a lot more than just MR on the same cluster.

--Bobby

On 4/19/13 9:47 AM, "Tim St Clair" <[email protected]> wrote:

>I recently read Googles Omega paper, and wondering if any of the YARN
>developers were planning to address some of the items considered as key
>points.
>
>http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
>
>Cheers,
>Tim
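P.S. To make the trusted-code point above concrete, here is the kind of optimistic, shared-state commit I have in mind. This is a toy sketch, not YARN code; every class and method name in it is made up for illustration. The point is that the shared state can only sanity-check raw capacity; anything richer (queues, user limits, preemption policy) has to be enforced by each peer scheduler itself, which is why they would all have to be trusted.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy, Omega-style shared cluster state that peer schedulers commit to
// optimistically.  All names here are hypothetical.
public class SharedStateSketch {

  /** Free memory per node plus a version counter for optimistic commits. */
  static class ClusterState {
    final Map<String, Integer> freeMemMb = new ConcurrentHashMap<String, Integer>();
    final AtomicLong version = new AtomicLong();

    // The commit point.  The shared state only checks that the cell has not
    // changed underneath the scheduler and that the node still has room.
    synchronized boolean tryCommit(long seenVersion, String node, int memMb) {
      if (version.get() != seenVersion) {
        return false;                  // someone else committed first; retry
      }
      Integer free = freeMemMb.get(node);
      if (free == null || free < memMb) {
        return false;                  // node unknown or no longer fits
      }
      freeMemMb.put(node, free - memMb);
      version.incrementAndGet();
      return true;
    }
  }

  /** One peer scheduler (say, a "JobTracker"-like MR scheduler) placing a task. */
  static boolean schedule(ClusterState cell, String node, int memMb) {
    for (int attempt = 0; attempt < 10; attempt++) {
      long seen = cell.version.get();  // snapshot of the cell this scheduler saw
      // ... run whatever placement and policy logic this scheduler wants ...
      if (cell.tryCommit(seen, node, memMb)) {
        return true;                   // transaction applied
      }
      // conflict: fall through, re-read the state, and try again
    }
    return false;
  }

  public static void main(String[] args) {
    ClusterState cell = new ClusterState();
    cell.freeMemMb.put("node1", 4096);
    System.out.println("placed: " + schedule(cell, "node1", 1024));
  }
}

Note that the queue capacities and user limits the RM enforces today never show up in tryCommit(); each peer scheduler would have to implement them on its own, and we would have to trust that it did.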
