Well, what do you know: 2,565,187.  Apologies for the false alarm.

On Fri, Jun 21, 2013 at 7:29 AM, Joe Caswell <[email protected]> wrote:

> Elias,
>
>   Just for the sake of argument, if you use
>     index(bucket_name,"$key",'0'..'z')
> do you get the same result?
>
> Joe
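The lowercase upper bound likely matters for a simple reason. SHA256 digests rendered as hex are commonly lowercase, and in ASCII 'Z' (0x5A) sorts before 'a' (0x61). A bytewise range of '0'..'Z' therefore stops short of any key that starts with a-f. The Ruby sketch below only simulates the comparison locally with Digest::SHA256 and does not talk to Riak. It assumes lowercase hex keys and assumes that Riak compares the $key range bytewise and inclusively at both ends. The helper in_key_range? is hypothetical.

```ruby
require 'digest'

# Hypothetical helper. It ASSUMES Riak compares $key ranges bytewise and
# inclusively at both ends. Ruby's String comparison is also bytewise for
# ASCII strings.
def in_key_range?(key, lo, hi)
  key >= lo && key <= hi
end

# ASSUMPTION: keys are lowercase hex SHA256 digests, which is what
# Digest::SHA256.hexdigest produces.
keys = (1..10_000).map { |i| Digest::SHA256.hexdigest(i.to_s) }

upper = keys.count { |k| in_key_range?(k, '0', 'Z') } # misses keys starting a-f
lower = keys.count { |k| in_key_range?(k, '0', 'z') } # covers 0-9 and a-f

puts "'0'..'Z' matched #{upper} of #{keys.size}"
puts "'0'..'z' matched #{lower} of #{keys.size}"
puts "'Z'.ord = #{'Z'.ord}, 'a'.ord = #{'a'.ord}"
```

Under these assumptions, the '0'..'z' line should report all 10,000 keys. The '0'..'Z' line should report only the keys that begin with a digit, roughly 62.5% of them.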
>
> From: Elias Levy <[email protected]>
> Date: Friday, June 21, 2013 1:57 AM
> To: "[email protected]" <[email protected]>
> Subject: Mismatched object counts
>
> I've just inserted some data into a six-node Riak 1.3.1 EE cluster.  The
> keys are all SHA256s.  The bucket previously held somewhere in the
> vicinity of 1 million objects.
>
> A MR job using the $key 2i with a range of '0' to 'Z', which should cover
> all possible SHA256s, returned a count of around 750K both when using
> riak_kv_mapreduce:reduce_count_inputs and when streaming the keys with
> reduce_identity and counting client side, but that figure is now somewhat
> suspect.
>
> The objects I inserted overlap somewhat with the previously existing
> objects, but not completely.  Overlapping objects were merged.  I
> inserted 2,521,799 objects.
>
> When I execute the MR count job against it now, it reports 1,604,783
> objects using both techniques (reduce_count_inputs, and reduce_identity plus
> client-side counting).
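The two counts in this thread fit a simple hypothesis, which rests on two assumptions. The first is that the keys are lowercase hex. The second is that the 1,604,783 and 2,565,187 counts were taken over the same data. Under those assumptions, only keys whose first character is 0-9 fall inside '0'..'Z', which is 10 of the 16 hex digits. About 10/16 of all keys should match. This quick Ruby check compares that expected fraction with the observed ratio.

```ruby
# Counts reported in this thread.
total   = 2_565_187 # count reported with the '0'..'z' range
partial = 1_604_783 # count reported with the '0'..'Z' range

# Hypothesis (ASSUMPTION): keys are lowercase hex, so only keys whose first
# character is a digit (10 of the 16 hex digits) fall inside '0'..'Z'.
expected_fraction = 10.0 / 16
observed_fraction = partial.to_f / total

puts format("expected %.4f, observed %.4f", expected_fraction, observed_fraction)
puts format("expected count ~%d", (total * expected_fraction).round)
```

It prints an expected fraction of 0.6250 against an observed 0.6256, or about 1,603,242 expected keys against 1,604,783 reported. That is close to what uniformly distributed digests would give.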
>
> Given the discrepancy, I queried the bucket for the 2,521,799 objects I
> thought I had inserted and verified that the system thinks they are there.
>
> What gives?  Why is MR returning an incorrect result?  Does the 2i query
> somehow miss some possible keys?
>
> This is what the job looks like in Ruby:
>
> Riak::MapReduce.new(client).
>   index(bucket_name, "$key", '0'..'Z').
>   reduce(['riak_kv_mapreduce', 'reduce_count_inputs'], :keep => true,
>          :arg => { "reduce_phase_batch_size" => 1000, "do_prereduce" => true }).
>   timeout(86400000).
>   run
>
> As a side question, does do_prereduce here have any effect?  I am thinking
> it does not.  The docs indicate do_prereduce is a map phase argument, not a
> reduce phase one.  That raises the question of how to enable prereduce for
> a MR job without a map phase, other than setting mapred_always_prereduce =
> true in the config file.
>
> Elias
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>