Elias,

  Just for the sake of argument, if you use
    index(bucket_name, "$key", '0'..'z')
do you get the same result?

Joe
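The difference between the two ranges comes down to byte order. Below is a minimal check. It assumes the keys are lowercase hex digests, which is what Ruby's Digest::SHA256.hexdigest returns; the thread does not say how the keys were encoded. It also assumes the $key range is compared byte by byte, the same way Ruby's String#<=> compares strings. '0'..'9' are bytes 48 to 57 and 'Z' is 90, but 'a'..'f' are 97 to 102, so a key that starts with a lowercase hex letter sorts above 'Z'.

```ruby
require 'digest/sha2'

# Assumption: keys are lowercase hex SHA-256 digests, as produced by
# Digest::SHA256.hexdigest. Ruby's String#<=> compares byte by byte,
# which is assumed here to mirror how the $key range is evaluated.
OLD_RANGE = ('0'..'Z')
NEW_RANGE = ('0'..'z')
HEX_DIGITS = ('0'..'9').to_a + ('a'..'f').to_a

# Byte values of the range bounds and of the lowest/highest hex characters.
%w[0 9 Z a f z].each { |c| puts "#{c.inspect} = #{c.ord}" }

# For each possible leading hex digit, build sample 64-character keys
# and check which range would include them.
missed = HEX_DIGITS.reject { |d| OLD_RANGE.cover?(d + '0' * 63) }
kept_by_new = HEX_DIGITS.all? { |d| NEW_RANGE.cover?(d + 'f' * 63) }

puts "leading digits outside '0'..'Z': #{missed.join(' ')}"
puts "all leading digits inside '0'..'z': #{kept_by_new}"

# A real digest, for illustration; whether it is missed depends on its first character.
key = Digest::SHA256.hexdigest('some object')
puts "#{key[0]}... in '0'..'Z'? #{OLD_RANGE.cover?(key)}"
```

Under those assumptions, 6 of the 16 possible leading digits fall outside '0'..'Z'. By contrast, '0'..'z' ('z' is byte 122) covers all of them.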

From:  Elias Levy <[email protected]>
Date:  Friday, June 21, 2013 1:57 AM
To:  "[email protected]" <[email protected]>
Subject:  Mismatched object counts

I've just inserted some data into a six-node Riak 1.3.1 EE cluster.  The
keys are all SHA256 hashes.  The bucket previously held somewhere in the
vicinity of 1 million objects.

Before the insert, an MR job using the $key 2i index with a range of '0'
to 'Z', which should cover all possible SHA256s, returned a count of around
750K.  I got that figure both with riak_kv_mapreduce:reduce_count_inputs
and by streaming the keys with reduce_identity and counting them client
side, but it now looks somewhat suspect.

The objects I inserted overlap somewhat with the previously existing
objects, but not completely.  Overlapping objects were merged.  I inserted
2,521,799 objects.

When I now run the MR count job against the bucket, it reports 1,604,783
objects with both techniques (reduce_count_inputs, and reduce_identity plus
client-side counting).
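Here is a rough way to see whether that count fits the range explanation. It assumes the keys are lowercase hex SHA-256 digests. That is an assumption, since the encoding isn't stated, but SHA-256 output is close to uniform. Under that assumption, only keys whose first character is 0 to 9 fall inside '0'..'Z', which is 10 of the 16 possible leading digits. The sketch estimates how many of the 2,521,799 newly inserted keys that range would return, then cross-checks the fraction on random digests.

```ruby
require 'digest/sha2'
require 'securerandom'

# Assumption: keys are lowercase hex SHA-256 digests, so their first
# character is (close to) uniformly distributed over 16 hex digits.
INSERTED = 2_521_799
LEADING_DIGITS = 16
DIGITS_BELOW_Z = 10 # '0'..'9' sort below 'Z'; 'a'..'f' sort above it

fraction = DIGITS_BELOW_Z.fdiv(LEADING_DIGITS)
expected_visible = (INSERTED * fraction).round
puts "fraction of keys inside '0'..'Z': #{fraction}"
puts "expected to be returned out of #{INSERTED}: #{expected_visible}"

# Empirical cross-check on random digests (sample size is arbitrary).
sample = Array.new(100_000) { Digest::SHA256.hexdigest(SecureRandom.random_bytes(32)) }
share = sample.count { |k| ('0'..'Z').cover?(k) }.fdiv(sample.size)
puts "share of random digests inside '0'..'Z': #{share.round(3)}"
```

The estimate works out to about 1.58 million (5/8 of 2,521,799), which is close to the 1,604,783 reported. The two numbers are not directly comparable, though, because the bucket also holds pre-existing objects that were not part of this insert. Treat the match as suggestive rather than conclusive.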

Given the discrepancy, I queried the bucket for each of the 2,521,799
objects I thought I had inserted, and the system reports all of them as
present.

What gives?  Why is MR returning an incorrect result?  Does the 2i query
somehow miss some possible keys?

This is what the job looks like in Ruby:

Riak::MapReduce.new(client).
  index(bucket_name, "$key", '0'..'Z').
  reduce(['riak_kv_mapreduce', 'reduce_count_inputs'], :keep => true,
         :arg => { "reduce_phase_batch_size" => 1000, "do_prereduce" => true }).
  timeout(86400000).
  run

As a side question, does do_prereduce have any effect here?  I suspect it
does not.  The docs indicate that do_prereduce is a map phase argument, not
a reduce phase one.  That raises the question of how to enable prereduce
for an MR job without a map phase, other than setting
mapred_always_prereduce = true in the config file.

Elias
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
