Thanks for the explanation, John, that's very useful. I wasn't aware that each "job" in MRv2 is treated as its own entity by the scheduler; that's interesting. I think Maxime's idea of some kind of Hadoop-compatible framework could work well. It sounds to me like the Framework <> Executor <> Task flow might be a good fit here, perhaps? Is there any reason an executor couldn't register a framework in Mesos?
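To make the Framework <> Executor <> Task idea a bit more concrete, here is a rough sketch of how one MRv2-style job might map onto that hierarchy. It is a toy model in plain Python, not code against the Mesos API: the class names, `framework_for_job`, and the round-robin placement are all made up for illustration, and real placement would be driven by resource offers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    kind: str  # "map" or "reduce"


@dataclass
class Executor:
    executor_id: str
    node: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Framework:
    name: str
    executors: Dict[str, Executor] = field(default_factory=dict)

    def launch(self, node: str, task: Task) -> None:
        # Reuse the executor already modelled on this node, or create one.
        executor = self.executors.setdefault(
            node, Executor(executor_id=f"{self.name}-exec-{node}", node=node)
        )
        executor.tasks.append(task)


def framework_for_job(job_id: str, maps: int, reduces: int,
                      nodes: List[str]) -> Framework:
    """Model one MapReduce job as one short-lived framework (hypothetical)."""
    framework = Framework(name=f"mr-{job_id}")
    kinds = ["map"] * maps + ["reduce"] * reduces
    for index, kind in enumerate(kinds):
        # Round-robin is only a stand-in for accepting resource offers.
        node = nodes[index % len(nodes)]
        framework.launch(node, Task(task_id=f"{job_id}-{kind}-{index}", kind=kind))
    return framework


if __name__ == "__main__":
    fw = framework_for_job("job42", maps=4, reduces=2, nodes=["node-a", "node-b"])
    for executor in fw.executors.values():
        print(executor.executor_id, [task.task_id for task in executor.tasks])
```

The point of the sketch is only that each job gets its own framework object, which is exactly where the concern about many short-lived frameworks comes from.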

On 28 July 2014 01:44, Luyi Wang <wangluyi1...@gmail.com> wrote:

> I second John's opinion on the confusing part of the different terminology in Hadoop v2. That's the reason I asked whether Mesos supports MRv2. As for Maxime's concern, the decoupling part might be difficult. After reading the Mesos MRv1 implementation, I think an MRv2 migration can possibly be done if we don't touch anything related to the resource manager (YARN). I need more time to investigate this complication.
>
> -Luyi.
>
> On Sun, Jul 27, 2014 at 10:40 AM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>
>> John, I believe that you are 100% correct. Theoretically we should run MRv2 on Mesos, but the current implementation of MRv2 on YARN seems very complex and difficult to decouple from the resource manager/negotiator.
>>
>> It's still something that could be done, I guess, but maybe as a completely independent Hadoop-compatible map reduce framework for Mesos. You could write this from scratch as a custom framework inspired by the MRv2 app master implementation.
>>
>> On Jul 27, 2014 7:00 PM, "John Omernik" <j...@omernik.com> wrote:
>>
>>> So excuse my naivety in this space, but my ignorance has never really stopped me from asking questions:
>>>
>>> I see YARN (Yet Another Resource Negotiator) as very similar to Mesos, i.e. something to manage resources on a cluster of machines. So when I hear talk of running "YARN" on Mesos it seems very redundant indeed, and I ask myself, what are we actually getting out of this setup?
>>>
>>> So, going to the map/reduce question, I see Map Reduce V1 and Map Reduce V2 like this: Map Reduce V2 is an application that runs on YARN. I.e. if you run a job, it creates an application master, that application master requests resources, and the job gets run. It differs from Map Reduce V1 in that there is no long-running Job Tracker (other than the YARN Resource Manager, but that is managing resources for all applications, not just Map Reduce applications). OK, so Mesos: why can't there be a Mesos application that is similar to a Map Reduce V2 application in YARN? Why do we need to run YARN on Mesos? That doesn't really make sense. Basically, for M/R V2 vs M/R V1, the only difference is that to mimic M/R V1 we need task trackers and job trackers running as Mesos applications (which we have). So in M/R V2, we just need the equivalent of an application master running on Yarn, requesting resources across the cluster.
>>>
>>> Fundamentally, YARN is confusing because I think they coupled running Map Reduce jobs with the resource manager and called it "Hadoop v2". By coupling the two, people look at YARN as Map Reduce V2, but it's not really. It's a way of running jobs on a cluster of machines (a la Mesos) with an "application" that is the equivalent of Map Reduce V1. The names being given seem confusing to me; they make people who have invested in Hadoop (Map Reduce V1) very interested in YARN because it's called "Hadoop V2", while Mesos is seen as the "Other".
>>>
>>> Just for my sake I summarized a TL;DR form so if someone wants to correct my understanding they can:
>>>
>>> Mesos = Tool to manage resources
>>>
>>> YARN = Tool to manage resources; it's also called Hadoopv2
>>>
>>> Map Reduce V1 = Job Trackers/Task Trackers; it's what we know. It can run on Hadoop clusters, and Mesos. It's also called Hadoopv1
>>>
>>> Map Reduce V2 = Application that can run on YARN that mimics Map Reduce V1 on a YARN cluster. This + YARN has been called Hadoopv2.
>>>
>>> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>>>
>>>> When I said that running YARN over Mesos did not make sense, I meant that running a resource manager in a resource manager was very sub-optimal. You will eventually do static allocation of resources for the YARN framework in Mesos or have complex logic to determine how much resource should be given to YARN. You will also have the same burden of managing 2 different clusters instead of one, even if YARN is sort of hidden as a Mesos framework.
>>>>
>>>> However, yes, I believe it's easier to run YARN on Mesos than to run MRv2 on top of Mesos. The solution I was discussing was obviously "ideal", and I have looked at the MRAppMaster since and it discouraged me :)
>>>>
>>>> On Jul 27, 2014 12:41 AM, "Rick Richardson" <rick.richard...@gmail.com> wrote:
>>>>
>>>>> FWIW I also think the fastest approach here is porting Yarn onto Mesos.
>>>>>
>>>>> In a perfect world, writing an implementation layer for the Yarn interface on Mesos would certainly be the optimal approach, but looking at the MRv2 code, it is very, very coupled to many Yarn modules.
>>>>>
>>>>> If someone wanted to take on the project of making a generic resource scheduler interface for MRv2, that would be amazing :)
>>>>>
>>>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yujie....@gmail.com> wrote:
>>>>>
>>>>>> I am interested in investigating the idea of YARN on top of Mesos. One of the benefits I can think of is that we can get rid of the static resource allocation between YARN and Mesos clusters. In that way, Mesos can allocate those resources that are not used by YARN to other Mesos frameworks like Aurora, Marathon, etc., to increase the resource utilization of the entire data center. Also, we could avoid running each MRv2 job as a framework, which I think might cause some maintenance complexity (e.g. for framework rate limiting, etc.). Finally, YARN currently does not have good isolation support. It only supports CPU isolation right now (using cgroups). By porting YARN on top of Mesos, we might be able to leverage the existing Mesos containerizer strategy to provide better isolation between tasks. Maxime, I am curious why you think it does not make sense to run YARN over Mesos? Since I am not super familiar with YARN, I might be missing something.
>>>>>>
>>>>>> I have been thinking of making ResourceManager in YARN a Mesos framework and making NodeManager a Mesos executor. The NodeManager will launch containers using primitives provided by Mesos so that we have a consistent containerizer layer. I haven't fully figured out how this could be done yet (e.g., nested containers, communication between NodeManager and ResourceManager, etc.), but I would love to explore this direction. I would like to hear any feedback/suggestions you guys have about this direction.
>>>>>>
>>>>>> Thanks,
>>>>>> - Jie
>>>>>>
>>>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>>>>>>
>>>>>>> We run both Mesos and YARN in prod and it does not make sense to run YARN over Mesos.
>>>>>>>
>>>>>>> However, it would be interesting to find a way to run MRv2 jobs on Mesos with some custom layer to swap YARN with Mesos. Not sure how to start though... MRv2 contains a YARN application master that needs to be rewritten as a Mesos framework scheduler. This is probably doable. However, with MRv2 every map reduce job would be mapped as a new framework in Mesos. Not sure how many frameworks Mesos can run and scale up to, especially short-lived frameworks.
>>>>>>>
>>>>>>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <t...@duedil.com> wrote:
>>>>>>>
>>>>>>>> Hey Luyi,
>>>>>>>>
>>>>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2 MRv1. It also doesn't have great support for the HA jobtracker available in newer versions of Hadoop, but I've been working on that the past few weeks.
>>>>>>>>
>>>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but I'm very interested to find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>>>>>>>
>>>>>>>> I wonder if anyone else on the mailing list is running YARN on top of Mesos...
>>>>>>>>
>>>>>>>> Tom.
>>>>>>>>
>>>>>>>> On Friday, 25 July 2014, Luyi Wang <wangluyi1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Checked the Mesos GitHub (https://github.com/mesos/hadoop). It lists support for MapReduce V1.
>>>>>>>>>
>>>>>>>>> How about MR V2?
>>>>>>>>>
>>>>>>>>> Right now we are using Cloudera to manage Hadoop clusters, which use MRv2. We are planning to migrate all our services to Mesos (still in the initial investigation stage). Good suggestions, advice and experiences are welcome.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> -Luyi.