Thank you. -John
________________________________ > Date: Fri, 23 Jul 2010 17:14:51 -0700 > Subject: Re: Reduce phase only on one node? > From: [email protected] > To: [email protected] > CC: [email protected]; [email protected] > > Currently, reduce functions run on the coordinating node. The coordinating > node runs 2 processes per reduce phase to achieve some parallelism. There are > two features requests open to improve the implementation: > > > Allow users to toggle the number of processes used during reduce > https://issues.basho.com/148 > > Distribute reduce phases across the cluster > > https://issues.basho.com/149 > > There is no target release set for either issue but you should be able to add > yourself to the CC list in Bugzilla to receive notifications. > > > Thanks, > Dan > > Daniel Reverri > Developer Advocate > Basho Technologies, Inc. > [email protected] > > > > On Fri, Jul 23, 2010 at 2:34 PM, John D. Rowell> wrote: > > +1 to this, my understanding is that you can use the same reduce funcion to > re-reduce a stream of data and still get the same results. Is this what > actually happens in Riak internally (i.e. the coordinating node only > re-reduces each node's reduce) or does the reduce function only run on the > coordinating node, which receives the raw map data? > > > > 2010/7/23 John Butler> > > > > > > Hello, > > I went through the Riak Fast Track and overall I'm excited about the > possibilities of Riak. However, I did see one thing that concerns me in > regard to MapReduce jobs. From the Riak docs and this page here: > http://seancribbs.com/tech/2010/02/06/why-riak-should-power-your-next-rails-app/ > it seems to suggest that while Map jobs are performed in parallel Reduce > jobs are not. > > > > Wouldn't be better if each node performed the reduce function on it's own set > of data before sending it to the main coordinator? Imagine if you are mapping > over a high number of objects (millions) and are trying to aggregate the data > grouped by some demographic (such as state). If all the mappers send the data > to coordinating node to be reduced then you are going to send millions of > items from each data node all to one coordinating node. If instead each node > called reduce first, then sent it to the coordinating node, then each node > would be at most sending over 50 objects (for 50 states). Not only would this > greatly reduce network traffic, it would be able to execute in parallel > greatly improving performance. Since all reduce jobs are communicative, > associative and idempotent I see no reason why this couldn't happen (of > course I'm not familiar with the Riak internals to strongly assert this). > > > > Maybe I misunderstood and this is already happening. If it's not happening, > why and is that on the roadmap to be added later? > > -John > > _________________________________________________________________ > > Hotmail is redefining busy with tools for the New Busy. Get more from your > inbox. > > http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_2 > > > > _______________________________________________ > > riak-users mailing list > > [email protected] > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > > > _______________________________________________ > > riak-users mailing list > > [email protected] > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > > _________________________________________________________________ Hotmail is redefining busy with tools for the New Busy. Get more from your inbox. http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_2 _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
