Porting YARN to run atop Mesos is quite reasonable. Some folks at eBay have started work on this (https://github.com/mesos/myriad). If you're interested, you should check it out and contribute to the project.
On Tue, Oct 28, 2014 at 5:21 AM, Yaneeve Shekel <[email protected]> wrote:

> To quote John below,
>
> “So excuse my naivety… but…”, I am also confused as to the version/naming
> convention going on at the hadoop project.
>
> I would like to run hadoop over mesos as opposed to over yarn. I would also
> like to use the *“new”* mapreduce packages.
>
> https://github.com/mesos/hadoop mentions that “The pom.xml included is
> configured and tested against CDH5 and MRv1. Hadoop on Mesos does not
> currently support YARN (and MRv2).” Does this all mean that the mapreduce
> package is not available? I think it does not; I think I should be able to
> use the “new” api over any scheduling system just as I could over plain
> vanilla cdh (where I could configure and use any combination of the cross
> product -> (mapred, mapreduce) X (MRv1, YARN)). Could anyone verify this?
>
> Second, has any work been done pertaining to the original thread with regard
> to what John has suggested below?
>
> Thanks a lot,
>
> Yaneeve
>
> On Jul 27, 2014 7:00 PM, "John Omernik" <[email protected]> wrote:
>
> > > So excuse my naivety in this space, but my ignorance has never really
> > > stopped me from asking questions:
> > >
> > > I see YARN (Yet another resource negotiator) as very similar to Mesos.
> > > I.e. something to manage resources on a cluster of machines. So when I
> > > hear talk of running "YARN" on Mesos it seems very redundant indeed, and
> > > I ask myself, what are we actually getting out of this setup?
> > >
> > > So, going to the map/reduce question, I see Map Reduce V1 and Map Reduce
> > > V2 like this: Map Reduce V2 is an application that runs on YARN. I.e. if
> > > you run a job, it creates an application master, that application master
> > > requests resources, and the job gets run.
> > > It differs from Map Reduce V1 in that there is no long-running Job
> > > Tracker (other than the YARN Resource Manager, but that is managing
> > > resources for all applications, not just Map Reduce Applications). Ok, so
> > > Mesos, why can't there be a Mesos Application that is similar to a Map
> > > Reduce V2 Application in YARN? Why do we need to run YARN on Mesos? That
> > > doesn't really make sense. Basically, for M/R V2 vs M/R V1, the only
> > > difference is that to mimic M/R V1 we need task trackers and job trackers
> > > running as Mesos applications (which we have). So in M/R v2, we just need
> > > the equivalent of an application master running on Yarn, requesting
> > > resources across the cluster.
> > >
> > > Fundamentally, YARN is confusing because I think they coupled running Map
> > > Reduce jobs with the resource manager and called it "Hadoop v2". By
> > > coupling the two, people look at YARN as Map Reduce V2, but it's not
> > > really. It's a way to run jobs on a cluster of machines (ala Mesos) with
> > > an "application" that is the equivalent of Map Reduce V1. The names being
> > > given seem to be confusing to me; it makes people who have invested in
> > > Hadoop (Map Reduce V1) be very interested in YARN because it's called
> > > "Hadoop V2", while Mesos is seen as the "Other".
> > >
> > > Just for my sake I summarized a TL;DR form so if someone wants to correct
> > > my understanding they can:
> > >
> > > Mesos = Tool to manage resources
> > >
> > > YARN = Tool to manage resources; it's also called Hadoopv2
> > >
> > > Map Reduce V1 = Job trackers/Task Trackers; it's what we know. It can run
> > > on Hadoop clusters, and Mesos. It's also called Hadoopv1
> > >
> > > Map Reduce V2 = Application that can run on YARN that mimics Map Reduce
> > > V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
> > >
> > > On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <[email protected]> wrote:
> > >
> > >> When I said that running yarn over mesos did not make sense I meant that
> > >> running a resource manager in a resource manager was very sub-optimal.
> > >> You will eventually do static allocation of resources for the Yarn
> > >> framework in Mesos or have complex logic to determine how much resource
> > >> should be given to yarn. You will also have the same burden of managing
> > >> 2 different clusters instead of one, even if yarn is sort of hidden as a
> > >> mesos framework.
> > >>
> > >> However yes I believe it's easier to run yarn on mesos than to run mrv2
> > >> on top of mesos. The solution I was discussing was obviously "ideal" and
> > >> I looked at the MRAppMaster since and it discouraged me :)
> > >>
> > >> On Jul 27, 2014 12:41 AM, "Rick Richardson" <[email protected]> wrote:
> > >>
> > >>> FWIW I also think the fastest approach here is porting Yarn onto Mesos.
> > >>>
> > >>> In a perfect world, writing an implementation layer for the Yarn
> > >>> Interface on Mesos would certainly be the optimal approach, but looking
> > >>> at the MRv2 code, it is very, very coupled to many Yarn modules.
> > >>>
> > >>> If someone wanted to take on the project of making a generic resource
> > >>> scheduler Interface for MRv2, that would be amazing :)
> > >>>
> > >>> On Jul 26, 2014 6:19 PM, "Jie Yu" <[email protected]> wrote:
> > >>>
> > >>>> I am interested in investigating the idea of YARN on top of Mesos. One
> > >>>> of the benefits I can think of is that we can get rid of the static
> > >>>> resource allocation between YARN and Mesos clusters.
> > >>>> In that way, Mesos can allocate those resources that are not used by
> > >>>> YARN to other Mesos frameworks like Aurora, Marathon, etc, to increase
> > >>>> the resource utilization of the entire data center. Also, we could
> > >>>> avoid running each MRv2 job as a framework, which I think might cause
> > >>>> some maintenance complexity (e.g. for framework rate limiting, etc).
> > >>>> Finally, YARN currently does not have good isolation support. It only
> > >>>> supports cpu isolation right now (using cgroups). By porting YARN on
> > >>>> top of Mesos, we might be able to leverage the existing Mesos
> > >>>> containerizer strategy to provide better isolation between tasks.
> > >>>> Maxime, I am curious why you think it does not make sense to run YARN
> > >>>> over Mesos? Since I am not super familiar with YARN, I might be
> > >>>> missing something.
> > >>>>
> > >>>> I have been thinking of making ResourceManager in YARN a Mesos
> > >>>> framework and making NodeManager a Mesos executor. The NodeManager will
> > >>>> launch containers using primitives provided by Mesos so that we have a
> > >>>> consistent containerizer layer. I haven't fully figured out how this
> > >>>> could be done yet (e.g., nested containers, communication between
> > >>>> NodeManager and ResourceManager, etc.), but I would love to explore
> > >>>> this direction. I would like to hear about any feedback/suggestions you
> > >>>> guys have about this direction.
> > >>>>
> > >>>> Thanks,
> > >>>> - Jie
> > >>>>
> > >>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <[email protected]> wrote:
> > >>>>
> > >>>>> We run both mesos and yarn in prod and it does not make sense to run
> > >>>>> yarn over mesos.
> > >>>>>
> > >>>>> However it would be interesting to find a way to run MRv2 jobs on
> > >>>>> mesos with some custom layer to swap yarn with mesos.
> > >>>>> Not sure how to start though... MRv2 contains a yarn application
> > >>>>> master that needs to be rewritten as a mesos framework scheduler. This
> > >>>>> is probably doable. However with MRv2 every map reduce job would be
> > >>>>> mapped as a new framework in Mesos. Not sure how many frameworks mesos
> > >>>>> can run and scale up to. Especially short lived frameworks.
> > >>>>>
> > >>>>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hey Luyi,
> > >>>>>>
> > >>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
> > >>>>>> MRv1. It also doesn't have great support for the HA jobtracker
> > >>>>>> available in newer versions of Hadoop, but I've been working on that
> > >>>>>> the past few weeks.
> > >>>>>>
> > >>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
> > >>>>>> to find out more. Am I correct in thinking MRv2 will only run on top
> > >>>>>> of YARN?
> > >>>>>>
> > >>>>>> I wonder if anyone else on the mailing list is running YARN on top of
> > >>>>>> Mesos...
> > >>>>>>
> > >>>>>> Tom.
> > >>>>>>
> > >>>>>> On Friday, 25 July 2014, Luyi Wang <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Checked the mesos github (https://github.com/mesos/hadoop). It
> > >>>>>>> listed support for MapReduce V1.
> > >>>>>>>
> > >>>>>>> How about the MR V2?
> > >>>>>>>
> > >>>>>>> Right now we are using cloudera to manage hadoop clusters, which use
> > >>>>>>> MRv2. We are planning to migrate all our services to mesos (still in
> > >>>>>>> the initial investigating stage). Good suggestions, advice and
> > >>>>>>> experiences are welcome.
> > >>>>>>>
> > >>>>>>> Thanks a lot!
> > >>>>>>>
> > >>>>>>> -Luyi.

