On Thursday, May 22, 2014 10:17:42 PM Sylvain Gault wrote:
> Hello,
>
> I'm new to this mailing list, so forgive me if I don't do everything
> right.
>
> I didn't know whether I should ask on this mailing list or on
> mapreduce-dev or on yarn-dev. So I'll just start there. ^^
>
> Short story: I'm looking for some paper(s) studying the scalability
> of Hadoop MapReduce. And I found this extremely difficult to find on
> google scholar. Do you have something worth citing in a PhD thesis?
>
> Long story: I'm writing my PhD thesis about MapReduce and when I talk
> about Hadoop I'd like to say "how much it scales". I heard two years
> ago some people say that "Yahoo! got it to scale up to 4000 nodes and
> plan to try on 6000 nodes" or something like that. I also heard that
> YARN/MRv2 should scale better, but I don't plan to talk much about
> YARN/MRv2. So I'd take anything I could cite as a reference in my
> manuscript. :)

Hello, Sylvain.
One of the reasons the Hadoop dev team began working on YARN was
precisely to get a more scalable and resource-efficient Hadoop system,
so if you actually want to talk about Hadoop scalability, you should
talk about YARN and MR2.
The paper is here:
https://developer.yahoo.com/blogs/hadoop/next-generation-apache-hadoop-mapreduce-3061.html

and the related JIRA issues here:
https://issues.apache.org/jira/browse/MAPREDUCE-278
https://issues.apache.org/jira/browse/MAPREDUCE-279

You should talk with Arun C Murthy, Chief Architect at Hortonworks,
about all these topics. He could help you much more than I could.

--
Marcos Ortiz[1] (@marcosluis2186[2])
http://about.me/marcosortiz[3]

>
> Best regards,
> Sylvain Gault

--------
[1] http://www.linkedin.com/in/mlortiz
[2] http://twitter.com/marcosluis2186
[3] http://about.me/marcosortiz
