Bryan thanks very much for the info. I am going to investigate this over the weekend. What I am trying to achieve is the ability to sort a potentially very large dataset in the most efficient way possible.
Regards, --g On Nov 18, 2011, at 10:12 , Bryan Fink wrote: > On Thu, Nov 17, 2011 at 9:45 AM, Gordon Tillman <[email protected]> wrote: >> I'm really interested in being able to implement distributed >> reduce phases (specifically to do a partial sort) and then have that output >> handle by a final reduce phase that could perform an efficient merge sort >> and stream results back to the client. That would be really cool! > > Hi, Gordon. I just caught up on the 2i thread, and noticed your > comment at the end. As of 1.0 and RiakPipe-based MapReduce, you are > able to do something like this with the "pre-reduce" tuning > functionality: > > http://wiki.basho.com/MapReduce.html#Pre-Reduce > > With pre-reduce enabled, your reduce function is run in two stages. > The first stage processes the outputs of the previous map stage in > parallel, on the vnode where each output was produced. The second > stage is reduce as you know it, processing all of the results of that > pre-reduce stage in one place. > > For example, if you had these vnodes producing these outputs: > > A: 1,2,3 > B: 4,5,6 > C: 7,8,9 > > enabling pre-reduce would cause three parallel reduces: > > x = reduce(1,2,3) > y = reduce(4,5,6) > z = reduce(7,8,9) > > followed by a final all-together reduce of those results: > > reduce(x,y,z) > > So, it's the same function evaluated in both stages, but if you've > written it to play well with re-reduce, it should "just work". > > Hope that helps, > Bryan _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
