Thanks for the explanation John, that's very useful. I wasn't aware each
"job" in MRv2 was considered its own entity by the scheduler, that's
interesting... I think Maxime's point about some kind of Hadoop-compatible
framework would work well. It sounds to me like the
Framework<>Executor<>Task flow might fit well here, perhaps? Is there any
reason an executor couldn't register a framework in Mesos?
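
To make the "one framework per job" idea a bit more concrete, here is a
rough toy model in Python. It is only a sketch and does not use the real
Mesos API: ToyMaster, JobFramework and Offer are hypothetical names made up
for illustration, offers carry CPUs only, and every offer in a round goes to
the first framework that still has pending work.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Offer:
    """Hypothetical, simplified resource offer: CPUs on a single slave."""
    slave: str
    cpus: int


@dataclass
class JobFramework:
    """One MapReduce job modelled as its own short-lived framework."""
    name: str
    pending_tasks: List[str]
    cpus_per_task: int = 1
    launched: Dict[str, str] = field(default_factory=dict)  # task -> slave

    def resource_offers(self, offers: List[Offer]) -> None:
        """Launch pending tasks on the offered CPUs until either runs out."""
        for offer in offers:
            free = offer.cpus
            while self.pending_tasks and free >= self.cpus_per_task:
                task = self.pending_tasks.pop(0)
                self.launched[task] = offer.slave
                free -= self.cpus_per_task

    @property
    def finished(self) -> bool:
        # In this toy model a job is done once every task has been placed.
        return not self.pending_tasks


class ToyMaster:
    """Stand-in for the master: tracks frameworks and hands out offers."""

    def __init__(self) -> None:
        self.frameworks: List[JobFramework] = []

    def register(self, framework: JobFramework) -> None:
        self.frameworks.append(framework)

    def offer_round(self, offers: List[Offer]) -> None:
        # Naive policy: the first framework with pending work gets all offers.
        for framework in self.frameworks:
            if not framework.finished:
                framework.resource_offers(offers)
                break
        # A per-job framework unregisters as soon as its job is finished.
        self.frameworks = [f for f in self.frameworks if not f.finished]


if __name__ == "__main__":
    master = ToyMaster()
    job = JobFramework("wordcount", ["map-0", "map-1", "reduce-0"])
    master.register(job)
    master.offer_round([Offer("slave-1", 2), Offer("slave-2", 2)])
    print(job.launched)
```

The point of the sketch is only that every job registers, takes offers and
then goes away again, which is why the number of short-lived frameworks the
master has to handle is the thing to worry about.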


On 28 July 2014 01:44, Luyi Wang <wangluyi1...@gmail.com> wrote:

> I second John's opinion on the confusing terminology around Hadoop v2.
> That's the reason I asked whether Mesos supports MRv2. As for Maxime's
> concern, the decoupling part might be difficult. After reading the Mesos
> MRv1 implementation, I think the MRv2 migration could possibly be done
> without touching anything related to the resource manager (YARN). I need
> more time to investigate this complication further.
>
>
>
> -Luyi.
>
>
>
>
>
> On Sun, Jul 27, 2014 at 10:40 AM, Maxime Brugidou <
> maxime.brugi...@gmail.com> wrote:
>
>> John, I believe that you are 100% correct. Theoretically we should run
>> MRv2 on Mesos but the current implementation of MRv2 on Yarn seems very
>> complex and difficult to decouple from the resource manager/negotiator.
>>
>> It's still something that could be done I guess but maybe as a completely
>> independent Hadoop-compatible map reduce framework for Mesos. You could
>> write this from scratch with a custom framework inspired by the MRv2 app
>> master implementation.
>>  On Jul 27, 2014 7:00 PM, "John Omernik" <j...@omernik.com> wrote:
>>
>>> So excuse my naivety in this space, but my ignorance has never really
>>> stopped me from asking questions:
>>>
>>> I see YARN (Yet Another Resource Negotiator) as very similar to Mesos,
>>> i.e. something to manage resources on a cluster of machines. So when I hear
>>> talk of running "YARN" on Mesos it seems very redundant indeed, and I ask
>>> myself, what are we actually getting out of this setup?
>>>
>>> So, going to the Map/Reduce question, I see Map Reduce V1 and
>>> Map Reduce V2 like this: Map Reduce V2 is an application that runs on
>>> YARN, i.e. if you run a job, it creates an application master, that
>>> application master requests resources, and the job gets run. It differs
>>> from Map Reduce V1 in that there is no long running Job Tracker (other than
>>> the YARN Resource Manager, but that is managing resources for all
>>> applications, not just Map Reduce applications). OK, so Mesos: why can't
>>> there be a Mesos application that is similar to a Map Reduce V2 application
>>> in YARN? Why do we need to run YARN on Mesos? That doesn't really make
>>> sense. Basically, for M/R V2 vs M/R V1, the only difference is that to mimic
>>> M/R V1 we need task trackers and job trackers running as Mesos applications
>>> (which we have). So in M/R V2, we just need the equivalent of an application
>>> master running on YARN, requesting resources across the cluster.
>>>
>>> Fundamentally, YARN is confusing because I think they coupled running
>>> Map Reduce jobs with the resource manager and called it "Hadoop v2". By
>>> coupling the two, people look at YARN as Map Reduce V2, but it's not
>>> really. It's a way to run jobs on a cluster of machines (à la Mesos)
>>> with an "application" that is the equivalent of Map Reduce V1. The names
>>> being given seem confusing to me; they make people who have invested
>>> in Hadoop (Map Reduce V1) very interested in YARN because it's called
>>> "Hadoop V2", while Mesos is seen as the "Other".
>>>
>>>
>>> Just for my sake I summarized a TL;DR form so if someone wants to
>>> correct my understanding they can
>>>
>>> Mesos = Tool to manage resources
>>>
>>> YARN = Tool to manage resources; it's also called Hadoopv2
>>>
>>> Map Reduce V1 = Job Trackers/Task Trackers; it's what we know. It can run
>>> on Hadoop clusters, and Mesos. It's also called Hadoopv1
>>>
>>> Map Reduce V2 =  Application that can run on YARN that mimics Map Reduce
>>> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>>>
>>>
>>> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <
>>> maxime.brugi...@gmail.com> wrote:
>>>
>>>> When I said that running Yarn over Mesos did not make sense I meant
>>>> that running a resource manager in a resource manager was very sub-optimal.
>>>> You will eventually do static allocation of resources for the Yarn
>>>> framework in Mesos or have complex logic to determine how much resource
>>>> should be given to Yarn. You will also have the same burden of managing 2
>>>> different clusters instead of one, even if Yarn is sort of hidden as a
>>>> Mesos framework.
>>>>
>>>> However, yes, I believe it's easier to run Yarn on Mesos than to run MRv2
>>>> on top of Mesos. The solution I was discussing was obviously "ideal", and I
>>>> have looked at the MRAppMaster since and it discouraged me :)
>>>>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <rick.richard...@gmail.com>
>>>> wrote:
>>>>
>>>>> FWIW I also think the fastest approach here is porting Yarn onto
>>>>> Mesos.
>>>>>
>>>>> In a perfect world, writing an implementation layer for the Yarn
>>>>> interface on Mesos would certainly be the optimal approach, but looking at
>>>>> the MRv2 code, it is very tightly coupled to many Yarn modules.
>>>>>
>>>>> If someone wanted to take on the project of making a generic resource
>>>>> scheduler interface for MRv2, that would be amazing :)
>>>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yujie....@gmail.com> wrote:
>>>>>
>>>>>> I am interested in investigating the idea of YARN on top of Mesos.
>>>>>> One of the benefits I can think of is that we can get rid of the static
>>>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos
>>>>>> can allocate those resources that are not used by YARN to other Mesos
>>>>>> frameworks like Aurora, Marathon, etc., to increase the resource
>>>>>> utilization of the entire data center. Also, we could avoid running each
>>>>>> MRv2 job as a framework, which I think might cause some maintenance
>>>>>> complexity (e.g. for framework rate limiting, etc.). Finally, YARN
>>>>>> currently does not have good isolation support. It only supports CPU
>>>>>> isolation right now (using cgroups). By porting YARN on top of Mesos, we
>>>>>> might be able to leverage the existing Mesos containerizer strategy to
>>>>>> provide better isolation between tasks. Maxime, I am curious why you
>>>>>> think it does not make sense to run YARN over Mesos? Since I am not super
>>>>>> familiar with YARN, I might be missing something.
>>>>>>
>>>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>>>> framework and making NodeManager a Mesos executor. The NodeManager will
>>>>>> launch containers using primitives provided by Mesos so that we have a
>>>>>> consistent containerizer layer. I haven't fully figured out how this
>>>>>> could be done yet (e.g., nested containers, communication between
>>>>>> NodeManager and ResourceManager, etc.), but I would love to explore this
>>>>>> direction. I would like to hear about any feedback/suggestions you guys
>>>>>> have about this direction.
>>>>>>
>>>>>> Thanks,
>>>>>> - Jie
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>>>>> maxime.brugi...@gmail.com> wrote:
>>>>>>
>>>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>>>> yarn over mesos.
>>>>>>>
>>>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to
>>>>>>> start though... MRv2 contains a yarn application master that needs to
>>>>>>> be rewritten as a mesos framework scheduler. This is probably doable.
>>>>>>> However with MRv2 every map reduce job would be mapped as a new
>>>>>>> framework in Mesos. Not sure how many frameworks mesos can run and
>>>>>>> scale up to. Especially short lived frameworks.
>>>>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <t...@duedil.com> wrote:
>>>>>>>
>>>>>>>> Hey Luyi,
>>>>>>>>
>>>>>>>> That's correct, the Hadoop framework currently only supports Hadoop
>>>>>>>> 2 MRv1. It also doesn't have great support for the HA jobtracker
>>>>>>>> available in newer versions of Hadoop, but I've been working on that
>>>>>>>> the past few weeks.
>>>>>>>>
>>>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very
>>>>>>>> interested to find out more. Am I correct in thinking MRv2 will only
>>>>>>>> run on top of YARN?
>>>>>>>>
>>>>>>>> I wonder if anyone else on the mailing list is running YARN on top
>>>>>>>> of Mesos...
>>>>>>>>
>>>>>>>> Tom.
>>>>>>>>
>>>>>>>> On Friday, 25 July 2014, Luyi Wang <wangluyi1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Checked the Mesos GitHub (https://github.com/mesos/hadoop). It
>>>>>>>>> listed support for MapReduce V1.
>>>>>>>>>
>>>>>>>>> How about the MR V2?
>>>>>>>>>
>>>>>>>>> Right now we are using Cloudera to manage Hadoop clusters, which
>>>>>>>>> use MRv2. We are planning to migrate all our services to Mesos
>>>>>>>>> (still in the initial investigation stage). Good suggestions, advice
>>>>>>>>> and experiences are welcome.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Luyi.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>
>
