On Thu, Nov 17, 2011 at 9:45 AM, Gordon Tillman <[email protected]> wrote: > I'm really interested in being able to implement distributed > reduce phases (specifically to do a partial sort) and then have that output > handle by a final reduce phase that could perform an efficient merge sort > and stream results back to the client. That would be really cool!
Hi, Gordon. I just caught up on the 2i thread, and noticed your comment at the end. As of 1.0 and RiakPipe-based MapReduce, you are able to do something like this with the "pre-reduce" tuning functionality: http://wiki.basho.com/MapReduce.html#Pre-Reduce With pre-reduce enabled, your reduce function is run in two stages. The first stage processes the outputs of the previous map stage in parallel, on the vnode where each output was produced. The second stage is reduce as you know it, processing all of the results of that pre-reduce stage in one place. For example, if you had these vnodes producing these outputs: A: 1,2,3 B: 4,5,6 C: 7,8,9 enabling pre-reduce would cause three parallel reduces: x = reduce(1,2,3) y = reduce(4,5,6) z = reduce(7,8,9) followed by a final all-together reduce of those results: reduce(x,y,z) So, it's the same function evaluated in both stages, but if you've written it to play well with re-reduce, it should "just work". Hope that helps, Bryan _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
