Thanks, you two, for bringing up this discussion. Personally I have a very strong opinion on this: I think building a MapReduce solution on top of BSP is useless. We have had nearly ten years of development in this paradigm, and it has grown and specialized a great deal. You can express MapReduce in BSP, that's totally fine. But that does not mean that every MapReduce algorithm is automagically efficient on BSP. There was (and still is) a lot of development on the MapReduce engine itself, and you can't keep up with that from a more abstract paradigm.
But of course there are things where MapReduce is inefficient: iterative jobs, grouping, no explicit output caching. Grouping is actually the main part of reducing, yet it is solved inefficiently in Hadoop: you are forced to sort. If I recall your paper correctly, that is also a drawback that led you to implement MRQL on BSP, because grouping by hash is in many cases much faster and sometimes also more efficient. It's funny, because the original paper [1] suggests that sorting was just a nice feature for building an inverted index and doing binary search on the tokens. So it's more of a nice side effect than the real design of the system.

All in all, this does not mean I'm not interested in providing such functionality in Hama, but I'm sure we should invest our time more carefully in features that bring value to users (improving message scalability, improving performance, providing more examples and algorithms, giving talks and presentations) rather than coding a half-baked solution that is easily outperformed by plain MapReduce. It was never my intention to "kill" Hadoop by developing Hama, but to improve certain use cases that cannot be done efficiently in MapReduce. So if it's really just 1k lines and not a half-baked solution, feel free to contribute your stuff.

[1] http://research.google.com/archive/mapreduce.html
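To make the hash-vs-sort point concrete, here is a toy Java sketch (this is not Hama or MRQL code; the class and method names are made up for illustration) of grouping values by key with a hash map in a single pass, which avoids the O(n log n) sort that Hadoop's shuffle imposes when the reducer only needs all values of a key together:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashGrouping {

    // Hash-based grouping: one linear pass, no sort. Keys come out in no
    // particular order, which is fine whenever the reducer only needs all
    // values of a key collected together, not the keys in sorted order.
    static Map<String, List<Integer>> groupByHash(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = Arrays.asList(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
        Map<String, List<Integer>> groups = groupByHash(records);
        System.out.println(groups.get("b")); // [1, 3]
    }
}
```

Sort-based grouping only pays off when the sorted key order itself is needed downstream, e.g. for building an inverted index as in the original paper; for plain aggregation the hash map is the cheaper strategy.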
