I second John's opinion that the Hadoop v2 terminology is confusing. That is the reason I asked whether Mesos supports MRv2. As Maxime pointed out, the decoupling part might be difficult. After reading the Mesos MRv1 implementation, I think an MRv2 migration might be possible without touching anything related to the resource manager (YARN). I need more time to investigate this complication.
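
To make the idea a bit more concrete, here is a toy sketch of what a per-job scheduler (the role the MRv2 application master plays on YARN) might look like if it were driven by resource offers instead. This is plain Python, not the real Mesos or Hadoop API: `Offer`, `JobScheduler`, `resource_offers` and the per-task sizes are all names and numbers I made up for illustration, and it ignores everything hard (shuffle, failures, data locality).

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical resource offer from one node (not a real Mesos type)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class JobScheduler:
    """Toy stand-in for a per-job application master driven by offers."""
    pending_tasks: list
    cpus_per_task: float = 1.0      # assumed task size, illustration only
    mem_per_task_mb: int = 1024     # assumed task size, illustration only
    launched: dict = field(default_factory=dict)

    def resource_offers(self, offers):
        """Pack pending tasks into offers; return the nodes whose offers were declined."""
        declined = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            used = False
            # Keep placing tasks on this node while the offer still has room.
            while (self.pending_tasks
                   and cpus >= self.cpus_per_task
                   and mem >= self.mem_per_task_mb):
                task = self.pending_tasks.pop(0)
                self.launched[task] = offer.node
                cpus -= self.cpus_per_task
                mem -= self.mem_per_task_mb
                used = True
            if not used:
                declined.append(offer.node)
        return declined

    def finished(self):
        """True once every task of the job has been placed."""
        return not self.pending_tasks


if __name__ == "__main__":
    job = JobScheduler(pending_tasks=["map-0", "map-1", "map-2"])
    declined = job.resource_offers([
        Offer("node-a", cpus=2.0, mem_mb=4096),
        Offer("node-b", cpus=0.5, mem_mb=512),
        Offer("node-c", cpus=4.0, mem_mb=8192),
    ])
    print(job.launched, declined, job.finished())
```

The point is only that the job-level logic (which tasks are pending, where they were placed) does not have to know who hands out the resources. Whether the real MRAppMaster can be separated that cleanly is exactly what I still need to investigate.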
-Luyi.

On Sun, Jul 27, 2014 at 10:40 AM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:

> John, I believe that you are 100% correct. Theoretically we should run
> MRv2 on Mesos but the current implementation of MRv2 on Yarn seems very
> complex and difficult to decouple from the resource manager/negotiator.
>
> It's still something that could be done I guess, but maybe as a completely
> independent Hadoop-compatible map reduce framework for Mesos. You could
> write this from scratch with a custom framework inspired by the MRv2 app
> master implementation.
> On Jul 27, 2014 7:00 PM, "John Omernik" <j...@omernik.com> wrote:
>
>> So excuse my naivety in this space, but my ignorance has never really
>> stopped me from asking questions:
>>
>> I see YARN (Yet Another Resource Negotiator) as very similar to Mesos,
>> i.e. something to manage resources on a cluster of machines. So when I hear
>> talk of running "YARN" on Mesos it seems very redundant indeed, and I ask
>> myself, what are we actually getting out of this setup?
>>
>> So, going to the map/reduce question, I see Map Reduce V1 and
>> Map Reduce V2 like this: Map Reduce V2 is an application that runs on
>> YARN. I.e. if you run a job, it creates an application master, that
>> application master requests resources, and the job gets run. It differs
>> from Map Reduce V1 in that there is no long running Job Tracker (other than the
>> YARN Resource Manager, but that is managing resources for all applications,
>> not just Map Reduce Applications). Ok, so Mesos, why can't there be a
>> Mesos Application that is similar to a Map Reduce V2 Application in YARN?
>> Why do we need to run YARN on Mesos? That doesn't really make sense.
>> Basically, for M/R V2 vs M/R V1, the only difference is that to mimic M/R V1 we
>> need task trackers and job trackers running as Mesos applications (which we
>> have). So in M/R v2, we just need the equivalent of an application master
>> running on Yarn, requesting resources across the cluster.
>>
>> Fundamentally, YARN is confusing because I think they coupled running Map
>> Reduce jobs with the resource manager and called it "Hadoop v2". By
>> coupling the two, people look at YARN as Map Reduce V2, but it's not
>> really. It's a way of running jobs on a cluster of machines (a la Mesos)
>> with an "application" that is the equivalent of Map Reduce V1. The names
>> being given seem confusing to me; they make people who have invested
>> in Hadoop (Map Reduce V1) very interested in YARN because it's called
>> "Hadoop V2", while Mesos is seen as the "Other".
>>
>> Just for my sake I summarized a TL;DR form so if someone wants to correct
>> my understanding they can:
>>
>> Mesos = Tool to manage resources
>>
>> YARN = Tool to manage resources; it's also called Hadoopv2
>>
>> Map Reduce V1 = Job Trackers/Task Trackers; it's what we know. It can run
>> on Hadoop clusters, and Mesos. It's also called Hadoopv1
>>
>> Map Reduce V2 = Application that can run on YARN that mimics Map Reduce
>> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>>
>> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>>
>>> When I said that running yarn over mesos did not make sense I meant that
>>> running a resource manager in a resource manager was very sub-optimal. You
>>> will eventually do static allocation of resources for the Yarn framework in
>>> Mesos or have complex logic to determine how much resource should be given
>>> to yarn. You will also have the same burden of managing 2 different
>>> clusters instead of one, even if yarn is sort of hidden as a mesos framework.
>>>
>>> However yes, I believe it's easier to run yarn on mesos than to run mrv2
>>> on top of mesos.
>>> The solution I was discussing was obviously "ideal" and I
>>> looked at the MRAppMaster since and it discouraged me :)
>>> On Jul 27, 2014 12:41 AM, "Rick Richardson" <rick.richard...@gmail.com>
>>> wrote:
>>>
>>>> FWIW I also think the fastest approach here is porting Yarn onto
>>>> Mesos.
>>>>
>>>> In a perfect world, writing an implementation layer for the Yarn
>>>> Interface on Mesos would certainly be the optimal approach, but looking at
>>>> the MRv2 code, it is very very coupled to many Yarn modules.
>>>>
>>>> If someone wanted to take on the project of making a generic resource
>>>> scheduler Interface for MRv2, that would be amazing :)
>>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yujie....@gmail.com> wrote:
>>>>
>>>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>>>> of the benefits I can think of is that we can get rid of the static
>>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>>>> allocate those resources that are not used by YARN to other Mesos
>>>>> frameworks like Aurora, Marathon, etc., to increase the resource utilization
>>>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>>>> framework, which I think might cause some maintenance complexity (e.g. for
>>>>> framework rate limiting, etc.). Finally, YARN currently does not have
>>>>> good isolation support. It only supports cpu isolation right now (using
>>>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>>>> existing Mesos containerizer strategy to provide better isolation between
>>>>> tasks. Maxime, I am curious: why do you think it does not make sense to run
>>>>> YARN over Mesos? Since I am not super familiar with YARN, I might be missing
>>>>> something.
>>>>>
>>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>>> framework and making NodeManager a Mesos executor.
>>>>> The NodeManager will
>>>>> launch containers using primitives provided by Mesos so that we have a
>>>>> consistent containerizer layer. I haven't fully figured out how this could
>>>>> be done yet (e.g., nested containers, communication between NodeManager and
>>>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>>>> like to hear about any feedback/suggestions you guys have about this
>>>>> direction.
>>>>>
>>>>> Thanks,
>>>>> - Jie
>>>>>
>>>>>
>>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>>>>>
>>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>>> yarn over mesos.
>>>>>>
>>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to start
>>>>>> though... MRv2 contains a yarn application master that needs to be
>>>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>>>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>>>>> short lived frameworks.
>>>>>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <t...@duedil.com> wrote:
>>>>>>
>>>>>>> Hey Luyi,
>>>>>>>
>>>>>>> That's correct, the Hadoop framework currently only supports Hadoop
>>>>>>> 2 MRv1. It also doesn't have great support for the HA jobtracker available
>>>>>>> in newer versions of Hadoop, but I've been working on that the past few
>>>>>>> weeks.
>>>>>>>
>>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
>>>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>>>> YARN?
>>>>>>>
>>>>>>> I wonder if anyone else on the mailing list is running YARN on top
>>>>>>> of Mesos...
>>>>>>>
>>>>>>> Tom.
>>>>>>>
>>>>>>> On Friday, 25 July 2014, Luyi Wang <wangluyi1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Checked the mesos github (https://github.com/mesos/hadoop). It
>>>>>>>> listed support for MapReduce V1.
>>>>>>>>
>>>>>>>> How about MR V2?
>>>>>>>>
>>>>>>>> Right now we are using cloudera to manage hadoop clusters, which
>>>>>>>> use MRV2. We are planning to migrate all our services to mesos (still in
>>>>>>>> the initial investigation stage). Good suggestions, advice and experiences
>>>>>>>> are welcome.
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>>
>>>>>>>> -Luyi.