I only talk about Hadoop because it is the de facto implementation of MapReduce. But for the remainder of my thesis, I took a more general approach and implemented my algorithms in a custom MapReduce implementation.
I learned yesterday about the existence of YARN. :D And I definitely can't not talk about it, since it's the future and 1.x will be abandoned. But I mostly know MRv1, so I decided to talk about MRv2 only briefly, where the differences are relevant, i.e. for scalability and global architecture, I guess.

Sylvain

On Thu, May 22, 2014 at 05:39:43PM -0300, Marco Shaw wrote:
> I would consider the timeframe that you are looking for to determine if you
> should focus on Hadoop 2.x (with YARN) or older. 2.x should scale much better
> than 1.x.
>
> Keep in mind that 2.x was only "officially" released late last year.
>
> Marco
>
>
> On May 22, 2014, at 5:17 PM, Sylvain Gault <[email protected]> wrote:
> >
> > Hello,
> >
> > I'm new to this mailing list, so forgive me if I don't do everything
> > right.
> >
> > I didn't know whether I should ask on this mailing list or on
> > mapreduce-dev or on yarn-dev. So I'll just start there. ^^
> >
> > Short story: I'm looking for some paper(s) studying the scalability
> > of Hadoop MapReduce. And I found this extremely difficult to find on
> > Google Scholar. Do you have something worth citing in a PhD thesis?
> >
> > Long story: I'm writing my PhD thesis about MapReduce and when I talk
> > about Hadoop I'd like to say "how much it scales". I heard two years
> > ago some people say that "Yahoo! got it to scale up to 4000 nodes and plan
> > to try on 6000 nodes" or something like that. I also heard that
> > YARN/MRv2 should scale better, but I don't plan to talk much about
> > YARN/MRv2. So I'd take anything I could cite as a reference in my
> > manuscript. :)
> >
> >
> > Best regards,
> > Sylvain Gault
