What's the underlying goal of getting this count of records in a bucket? Do you want to just have a live count or will you be eventually performing additional filters on the count?
One option might be to use counters [1] to hold these counts, instead of attempting to compute them on the fly.

In direct answer to your question - there's no faster way to make this happen apart from speeding up disks and maybe playing around with some of the MapReduce arguments, like enabling pre-reduce. You're always going to have to scan the cluster to find keys matching your criteria (at least with LevelDB).

[1]: http://basho.com/counters-in-riak-1-4/

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Thu, Aug 1, 2013 at 12:01 PM, Christian Rosnes <[email protected]> wrote:

> On Wed, Jul 31, 2013 at 9:54 AM, Christian Rosnes <[email protected]> wrote:
>
>> I have 4 node Riak 1.4 test cluster on Azure
>> (Large: 4core, 7GB RAM instances).
>
> Ran 7, slightly different, Erlang map-reduce jobs overnight to count the
> 118 million records in the 'entries' bucket. There were no other user
> requests running at the time of testing. Please take the test-results
> with a grain of salt, YMMV.
> Scripts used listed below.
>
> Christian
> @NorSoulx
>
> *Here are the results:*
>
> ----
> Running script *count.all.records.in.bucket.1.sh*
> Counting all records in bucket: entries (Thu Aug 1 09:07:53 UTC 2013)
> [118 553 863]
> real *201m46.355s*
> user 0m0.199s
> sys 0m0.419s
> Done: Thu Aug 1 12:29:39 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.2.sh*
> Counting all records in bucket: entries (Wed Jul 31 19:24:40 UTC 2013)
> [118 553 863]
> real *148m33.854s* (ran this a second time and the result was then *144m*)
> user 0m0.185s
> sys 0m0.423s
> Done: Wed Jul 31 21:53:13 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.3.sh*
> Counting all records in bucket: entries (Wed Jul 31 21:53:13 UTC 2013)
> [118 553 863]
> real *129m51.310s*
> user 0m0.136s
> sys 0m0.327s
> Done: Thu Aug 1 00:03:05 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.4.sh*
> Counting all records in bucket: entries (Thu Aug 1 00:03:05 UTC 2013)
> [118 553 863]
> real *138m29.816s*
> user 0m0.105s
> sys 0m0.464s
> Done: Thu Aug 1 02:21:35 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.5.sh*
> Counting all records in bucket: entries (Thu Aug 1 02:21:35 UTC 2013)
> [118 553 863]
> real *132m10.353s*
> user 0m0.129s
> sys 0m0.337s
> Done: Thu Aug 1 04:33:45 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.6.sh*
> Counting all records in bucket: entries (Thu Aug 1 04:33:45 UTC 2013)
> [118 553 863]
> real *137m16.386s*
> user 0m0.122s
> sys 0m0.363s
> Done: Thu Aug 1 06:51:01 UTC 2013
>
> ----
> Running script *count.all.records.in.bucket.7.sh*
> Counting all records in bucket: entries (Thu Aug 1 06:51:01 UTC 2013)
> [118 553 863]
> real *136m51.149s*
> user 0m0.297s
> sys 0m0.225s
> Done: Thu Aug 1 09:07:53 UTC 2013
>
> =============================
>
> *Scripts:*
>
> count.all.records.in.bucket.1.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d '{
>   "inputs":"entries",
>   "query":[
>     {"map":{"language":"erlang","module":"riak_mapreduce_utils",
>             "function":"map_id","keep":false}},
>     {"reduce" : {"language" : "erlang", "module" : "riak_kv_mapreduce",
>                  "function" : "reduce_count_inputs" }}
>   ],
>   "timeout": 90000000}'
>
> count.all.records.in.bucket.2.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>          "bucket":"entries",
>          "index":"$bucket",
>          "key":"entries"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"reduce_phase_batch_size":1000}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> count.all.records.in.bucket.3.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":"entries",
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"do_prereduce":true}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> count.all.records.in.bucket.4.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":"entries",
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"reduce_phase_batch_size":100000,"do_prereduce":true}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> count.all.records.in.bucket.5.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>          "bucket":"entries",
>          "index":"$bucket",
>          "key":"entries"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"do_prereduce":true}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> count.all.records.in.bucket.6.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>          "bucket":"entries",
>          "index":"$bucket",
>          "key":"entries"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"do_prereduce":false}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> count.all.records.in.bucket.7.sh
> --------------------------------
> time curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>          "bucket":"entries",
>          "index":"$bucket",
>          "key":"entries"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs",
>                            "arg":{"reduce_phase_batch_size":10000}
>                           }
>                 }],
>        "timeout": 90000000}'
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
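
To make the counter suggestion at the top of this message a bit more concrete, here is a rough sketch of keeping the count in a Riak 1.4 counter and reading it back, rather than scanning the bucket. It assumes the HTTP counters endpoint (/buckets/<bucket>/counters/<key>) on the same localhost:8098 used in the quoted scripts. The "stats" bucket, the "entries_total" key and the helper function names are made up for illustration, and the bucket holding the counter is assumed to need allow_mult=true.

```shell
#!/bin/sh
# Sketch only: keep a running record count in a Riak 1.4 counter.
# RIAK_URL, the bucket/key names and these helpers are illustrative assumptions.
RIAK_URL="${RIAK_URL:-http://localhost:8098}"

# Build the counter URL. $1 = counter bucket, $2 = counter key.
counter_url() {
  printf '%s/buckets/%s/counters/%s\n' "$RIAK_URL" "$1" "$2"
}

# One-time setup: counters need allow_mult=true on their bucket. $1 = bucket.
counter_bucket_init() {
  curl -s -XPUT "$RIAK_URL/buckets/$1/props" \
    -H 'Content-Type: application/json' \
    -d '{"props":{"allow_mult":true}}'
}

# Add $3 to the counter (use a negative number when a record is deleted).
counter_add() {
  curl -s -XPOST "$(counter_url "$1" "$2")" -d "$3"
}

# Read the current value of the counter.
counter_get() {
  curl -s "$(counter_url "$1" "$2")"
}
```

Usage would be something like: run `counter_bucket_init stats` once, call `counter_add stats entries_total 1` from whatever writes a new record into 'entries' (and `counter_add stats entries_total -1` on delete), then `counter_get stats entries_total` whenever the count is needed. The number is only as accurate as that bookkeeping, so it may be worth checking it against one of the MapReduce counts now and then.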
