Well, what do you know: with '0'..'z' the count is 2,565,187. Apologies for the false alarm.
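A likely explanation for the gap, assuming the keys are lowercase hex SHA256 digests: in byte order the digits '0' to '9' (0x30 to 0x39) sort before 'Z' (0x5A), but the hex letters 'a' to 'f' (0x61 to 0x66) sort after it. A '0'..'Z' range therefore only matches keys whose first character is a digit, roughly 10/16 of them. That fits the numbers in this thread: 10/16 of 2,565,187 is about 1,603,242, close to the 1,604,783 reported for the '0'..'Z' job. The Ruby sketch below shows the effect on locally generated keys. It does not talk to Riak; `in_key_range?` is a hypothetical helper, and byte-wise key comparison on the server is an assumption.

```ruby
require 'digest'

# Assumption: object keys are SHA256 digests rendered as lowercase hex,
# i.e. what Digest::SHA256.hexdigest returns.
# Hypothetical helper, not part of riak-client: models an inclusive
# $key range test, assuming the server compares keys byte-wise the way
# Ruby's String#<=> does.
def in_key_range?(key, range)
  range.cover?(key)
end

# 10,000 sample keys generated locally; no Riak cluster involved.
keys = (1..10_000).map { |i| Digest::SHA256.hexdigest("object-#{i}") }

upper_z = keys.count { |k| in_key_range?(k, '0'..'Z') }
lower_z = keys.count { |k| in_key_range?(k, '0'..'z') }

# '0'..'Z' only matches keys whose first character is a digit (about 10/16);
# '0'..'z' also matches keys starting with 'a' to 'f', i.e. all of them.
puts "'0'..'Z' matches #{upper_z} of #{keys.size} keys"
puts "'0'..'z' matches #{lower_z} of #{keys.size} keys"
```

If the assumption about key encoding holds, widening the upper bound to 'z' (as Joe suggested) or to 'f' covers every lowercase hex key.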
On Fri, Jun 21, 2013 at 7:29 AM, Joe Caswell <[email protected]> wrote:
> Elias,
>
> Just for the sake of argument, if you use
> index(bucket_name,"$key",'0'..'z')
> do you get the same result?
>
> Joe
>
> From: Elias Levy <[email protected]>
> Date: Friday, June 21, 2013 1:57 AM
> To: "[email protected]" <[email protected]>
> Subject: Mismatched object counts
>
> I've just inserted some data into a six-node Riak 1.3.1 EE cluster. The
> keys are all SHA256s. The bucket previously had somewhere in the vicinity
> of 1 million objects.
>
> An MR job using the $key 2i with a range of '0' to 'Z', which should cover
> all possible SHA256s, returned a count of around 750K, both when using
> riak_kv_mapreduce:reduce_count_inputs and when streaming the keys with
> reduce_identity and counting client side. That count is now somewhat
> suspect.
>
> The objects I inserted overlap somewhat with the previously existing
> objects, but not completely. Overlapping objects were merged. I
> inserted 2,521,799 objects.
>
> When I execute the MR count job against the bucket now, it reports
> 1,604,783 objects, using both techniques (reduce_count_inputs, and
> reduce_identity plus client-side counting).
>
> Given the discrepancy, I queried the bucket for the 2,521,799 objects I
> thought I had inserted, and verified that the system thinks they are there.
>
> What gives? Why is MR returning an incorrect result? Does the 2i query
> somehow miss some possible keys?
>
> This is what the job looks like in Ruby:
>
>   Riak::MapReduce.new(client).
>     index(bucket_name, "$key", '0'..'Z').
>     reduce(['riak_kv_mapreduce', 'reduce_count_inputs'], :keep => true,
>       :arg => { "reduce_phase_batch_size" => 1000, "do_prereduce" => true }).
>     timeout(86400000).
>     run
>
> As a side question, does do_prereduce here have any effect? I am thinking
> it does not. The docs indicate do_prereduce is a map phase argument, not a
> reduce phase one. That raises the question of how to enable prereduce for
> an MR job without a map phase, other than setting
> mapred_always_prereduce = true in the config file.
>
> Elias
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
