Hot! I want to discuss only the in-memory case. I have heard a lot about in-memory processing.
In fact, the data loading phase will be duplicated in every BSP job that requires in-memory processing. I don't like the idea of implementing MapReduce on top of BSP, but I think we can consider a novel model.

On Fri, Oct 12, 2012 at 4:03 AM, Leonidas Fegaras <[email protected]> wrote:
> OK. Since this is already work in progress by Apurv and it's not a
> high priority for the Hama team, I will not pursue it any further.
> Leonidas
>
>
> On Oct 11, 2012, at 12:57 PM, Thomas Jungblut wrote:
>
>> Thanks, you two, for bringing up this discussion.
>>
>> Personally, I have a very strong opinion on this: I think that building a
>> MapReduce solution on top of BSP is useless.
>> We have had nearly ten years of development in this paradigm, and it has
>> grown and specialized itself very much.
>> You can express MapReduce in BSP; that's totally fine. But that does not
>> mean that every MapReduce algorithm is automagically efficient on BSP.
>> There was (and still is) a lot of development on the MapReduce engine, and
>> you can't keep up with that from a more abstract paradigm.
>>
>> But of course there are things where MapReduce is inefficient (iterative
>> jobs, grouping, no explicit output caching).
>> Grouping in particular: grouping is the main part of reducing, but it is
>> solved inefficiently in Hadoop. You are forced to sort, and (if I recall
>> your paper correctly) that is also a drawback which led you to implement
>> MRQL on BSP, because grouping by hash is in several cases much faster and
>> sometimes also more efficient.
>> It's funny, because the original paper [1] suggested that they added sort
>> simply as a nice feature to build an inverted index and to do binary
>> search on the tokens. So it's more of a nice side effect than the real
>> design of the system.
>>
>> All in all, this does not mean that I am not interested in providing such
>> functionality in Hama, but I'm sure that we should invest our time more
>> carefully in features that bring value to the users (improving message
>> scalability, improving performance, providing more examples and
>> algorithms, doing talks and presentations) than in coding a half-baked
>> solution that is easily outperformed by normal MapReduce.
>> It was never my intention to "kill" Hadoop by developing with Hama, but to
>> improve certain use cases that cannot be done efficiently in MapReduce.
>> So if it's just 1k lines and it is not a half-baked solution, feel free to
>> contribute your stuff.
>>
>> [1] http://research.google.com/archive/mapreduce.html

-- Best Regards, Edward J. Yoon @eddieyoon
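As an aside for readers following the grouping point in the quoted mail: the contrast between Hadoop's sort-based grouping and the hash-based grouping argued for above can be sketched in a few lines of plain Java. This is a hypothetical illustration (none of it is Hama, Hadoop, or MRQL API); it only shows why hashing can skip the O(n log n) sort when key order is not required.

```java
import java.util.*;

// Hypothetical sketch (not Hama/Hadoop code): two ways to group (key, value)
// pairs. Sort-based grouping mirrors what Hadoop's shuffle does: sort all
// pairs by key, then scan runs of equal keys. Hash-based grouping buckets
// pairs directly in one pass, with no ordering guarantee.
public class GroupingSketch {

    // Sort-based: O(n log n) comparison sort; groups come out in key order.
    static Map<String, List<Integer>> sortGroup(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : sorted) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Hash-based: expected O(n), single pass; group order is unspecified.
    static Map<String, List<Integer>> hashGroup(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3), Map.entry("a", 4));
        // Both strategies produce the same groups; only the work done differs.
        System.out.println(sortGroup(pairs)); // prints {a=[2, 4], b=[1, 3]}
        System.out.println(hashGroup(pairs)); // same groups, unspecified order
    }
}
```

The sort variant is what you pay for in a Hadoop shuffle even when the reducer never uses the key order; the hash variant is the shortcut available to an engine (like the quoted MRQL-on-BSP work) that is free to choose its grouping strategy.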
