A few things that should be mentioned as well:

1) MapReduce amounts to R=1, i.e. it reads only one replica. If you have divergent replicas (siblings, e.g.) on different nodes, they might not appear in your MapReduce results.
2) MapReduce does not invoke read-repair, so divergent replicas will not converge as a result of running it.
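
To illustrate the GET side of both points, here is a minimal, untested sketch. It assumes the riakc PB client from the original question, a bucket with the default n_val of 3, and allow_mult enabled; the module name get_vs_mapred and the function fetch_checked/3 are made up for the example. A GET with an explicit read quorum consults more than one replica, so divergent values can come back as siblings, which a map phase reading a single replica may never show you:

```erlang
-module(get_vs_mapred).
-export([fetch_checked/3]).

%% Hypothetical helper: fetch Bucket/Key over an existing PB connection
%% and report siblings explicitly instead of crashing on them.
%% {r, 2} assumes n_val = 3, so two of the three replicas must answer.
fetch_checked(Conn, Bucket, Key) ->
    case riakc_pb_socket:get(Conn, Bucket, Key, [{r, 2}]) of
        {ok, Object} ->
            case riakc_obj:value_count(Object) of
                1 ->
                    %% Exactly one value: safe to call get_value/1.
                    {ok, riakc_obj:get_value(Object)};
                _ ->
                    %% Divergent replicas surfaced as siblings;
                    %% the caller has to resolve them.
                    {siblings, riakc_obj:get_values(Object)}
            end;
        {error, notfound} ->
            not_found;
        {error, Reason} ->
            {error, Reason}
    end.
```

Calling get_vs_mapred:fetch_checked(Conn, Bucket, Key) in place of the bare get in the original question keeps the single-value case identical and makes the sibling case visible; the GET path is also where read-repair gets a chance to run.
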
On Fri, Jul 29, 2011 at 1:30 PM, Justin Sheehy <[email protected]> wrote:
> Jeremiah,
>
> You were essentially correct. A "targeted" MR does not have to search
> for the data, and does not slow down with database size. It is a
> bucket-sweeping MR that currently has that behavior.
>
> -Justin
>
> On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka
> <[email protected]> wrote:
> > I would have suspected that an MR job where you supply a Bucket, Key pair
> > would be just as fast as a Get request. Shows what I know.
> > ---
> > Jeremiah Peschka
> > Founder, Brent Ozar PLF, LLC
> >
> > On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:
> >
> >> MapReduce ( or a simply Map ) gets really slow when database has a
> >> significant amount of data ( or distributed over several servers ). Get
> >> instead is always faster as Riak doesn't have to search for the key ( you
> >> tell Riak exactly where to GET the data in your url )
> >>
> >> Rohman
> >>
> >> On Thu, 28 Jul 2011 23:43:06 +0400, [email protected] wrote:
> >>
> >>> Hi,
> >>>
> >>> (I looked at various places for the information, however I could not
> >>> find anything that would answer the question. It's not completely ruled
> >>> out that not all places were checked though :))
> >>>
> >>> I use PB erlang interface to access the database. Given a bucket name
> >>> and a key, the value can easily be extracted using:
> >>>
> >>>     {ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
> >>>     Value = riakc_obj:get_value(Object)
> >>>
> >>> Alternatively, a mapred (actually, just map) request could be issued:
> >>>
> >>>     {ok, [{_, Value}]} = riakc_pb_socket:mapred(Conn, [
> >>>         {Bucket, Key}
> >>>     ], [
> >>>         {map, {modfun, riak_kv, map_object_value}, none, true}
> >>>     ])
> >>>
> >>> I would expect that the result is the same while in the second case, the
> >>> amount of data transferred to the client is smaller (which might be good
> >>> for certain situations).
> >>>
> >>> So the [open] question is: are there any reasons for using the first
> >>> approach over the second?
> >>>
> >>> --
> >>> Misha
> >>>
> >> --
> >>
> >> Antonio Rohman Fernandez
> >> CEO, Founder & Lead Engineer
> >> [email protected] Projects
> >> MaruBatsu.es
> >> PupCloud.com
> >> Wedding Album
> >> _______________________________________________
> >> riak-users mailing list
> >> [email protected]
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/
