Re: MapReduce scalability study

Sylvain Gault Thu, 22 May 2014 14:15:24 -0700

I only talk about Hadoop because it is the de facto implementation of
MapReduce. But for the remainder of my thesis, I took a more general
approach and implemented my algorithms in a custom MapReduce
implementation.


I learned yesterday about the existence of YARN. :D And I definitely
can't avoid talking about it, since it's the future and 1.x will be
abandoned. But I mostly know MRv1, so I decided to only briefly discuss
MRv2 where the differences are relevant, i.e., for scalability and the
overall architecture, I guess.

Sylvain

On Thu, May 22, 2014 at 05:39:43PM -0300, Marco Shaw wrote:
> I would consider the timeframe that you are looking for to determine if you 
> should focus on Hadoop 2.x (with YARN) or older. 2.x should scale much better 
> than 1.x. 
> 
> Keep in mind that 2.x was only "officially" released late last year. 
> 
> Marco
> 
> > On May 22, 2014, at 5:17 PM, Sylvain Gault <[email protected]> wrote:
> > 
> > Hello,
> > 
> > I'm new to this mailing list, so forgive me if I don't do everything
> > right.
> > 
> > I wasn't sure whether to ask on this mailing list, on mapreduce-dev,
> > or on yarn-dev, so I'll just start here. ^^
> > 
> > Short story: I'm looking for one or more papers studying the scalability
> > of Hadoop MapReduce, and I have found these extremely hard to find on
> > Google Scholar. Do you know of anything worth citing in a PhD thesis?
> > 
> > Long story: I'm writing my PhD thesis about MapReduce, and when I talk
> > about Hadoop I'd like to say how well it scales. Two years ago I heard
> > some people say that "Yahoo! got it to scale up to 4000 nodes and plans
> > to try 6000 nodes" or something like that. I also heard that
> > YARN/MRv2 should scale better, but I don't plan to talk much about
> > YARN/MRv2. So I'd take anything I could cite as a reference in my
> > manuscript. :)
> > 
> > 
> > Best regards,
> > Sylvain Gault
