As best I understand the magical $key index, you need to provide a range in order to query anything from it. A RiakBucketKeyInput accepts a bucket/key pair; you can add many of these to an MR input phase if you already know which keys in a bucket need to be acted upon.
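A small in-memory simulation may make the range requirement concrete. The Python sketch below is not CorrugatedIron or Riak client code. It models an index as a sorted list of keys, assumes byte-wise lexicographic ordering of ASCII keys, treats both range bounds as inclusive, and uses hypothetical helper names (`exact_match`, `range_scan`, `covering_range`). `covering_range` derives a one-character range_min equal to the smallest first character and a one-character range_max just above the largest first character, so every key falls inside the range.

```python
from bisect import bisect_left, bisect_right


def exact_match(sorted_keys: list[str], value: str) -> list[str]:
    """Return the keys equal to `value` (at most one, since keys are unique)."""
    i = bisect_left(sorted_keys, value)
    if i < len(sorted_keys) and sorted_keys[i] == value:
        return [value]
    return []


def range_scan(sorted_keys: list[str], range_min: str, range_max: str) -> list[str]:
    """Return every key k with range_min <= k <= range_max (both bounds inclusive)."""
    lo = bisect_left(sorted_keys, range_min)
    hi = bisect_right(sorted_keys, range_max)
    return sorted_keys[lo:hi]


def covering_range(keys: list[str]) -> tuple[str, str]:
    """Pick one-character bounds that enclose every non-empty ASCII key.

    A single character c sorts at or before any key starting with c, and
    strictly after any key whose first character is smaller than c.
    """
    if not keys or any(not k or not k.isascii() for k in keys):
        raise ValueError("expects a non-empty list of non-empty ASCII keys")
    first_codes = [ord(k[0]) for k in keys]
    lowest, highest = min(first_codes), max(first_codes)
    if highest >= 0x7F:
        # DEL (0x7F) is the last ASCII character; nothing ASCII sorts after it.
        raise ValueError("no ASCII character sorts after these keys")
    return chr(lowest), chr(highest + 1)


if __name__ == "__main__":
    keys = sorted(["order-17", "order-2", "product-9", "user-alice"])
    range_min, range_max = covering_range(keys)
    print(range_min, range_max)                            # o v
    print(range_scan(keys, range_min, range_max) == keys)  # True
    print(exact_match(keys, "order-2"))                    # ['order-2']
```

Hard-coding a bound such as `~` instead only works while no key starts with `~` or DEL, which is exactly the kind of fore-knowledge about the data a full range scan relies on. With non-ASCII keys, UTF-8 lead bytes are above 0x7F, so no ASCII bound can cover them.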
Riak's secondary indexes support only two operations: exact match and range scan (see the message description [1] if you're interested). To get a full range scan, pick a range_max value that sorts after your largest key. If you know you're only dealing with ASCII characters, you can easily pick an ASCII character [2] that sorts after everything in your data set; for example, ~ sorts after all letters and digits. This gets trickier if you have to deal with Unicode data.

[1]: http://docs.basho.com/riak/latest/references/apis/protocol-buffers/PBC-Index/
[2]: http://www.asciitable.com/

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Tue, Feb 12, 2013 at 2:24 PM, Kevin Burton <[email protected]> wrote:

> Is there a reason why you selected a range and not just the bucket and key
> (in the example)? My concern is that I don’t want to hard-code any
> dependencies or foreknowledge in the code if possible. Using a range
> assumes that all of the keys are in the range. As I see it, if you just
> specify the bucket and key there is no “assumption”. Right?
>
> *From:* Jeremiah Peschka [mailto:[email protected]]
> *Sent:* Tuesday, February 12, 2013 1:52 PM
> *To:* Kevin Burton
> *Cc:* riak-users
> *Subject:* Re: ListKeys or MapReduce
>
> Oh, and an example can be found at https://gist.github.com/peschkaj/4772825
>
> On Tue, Feb 12, 2013 at 11:44 AM, Jeremiah Peschka <[email protected]> wrote:
>
> ...and fixed!
>
> You can get this right now if you're adventurous and want to build
> CorrugatedIron from source by grabbing the develop branch [1]. We have
> several other issues to clean up and verify before we release CI 1.1.1 in
> the next day or so.
> Or you can download it from [2] if you don't want to build it yourself
> and don't want to wait for NuGet. Once we push 1.1.1 to NuGet, we'll
> respond to this thread or email you directly.
>
> I make no guarantees that the new DLL won't eat your hard drive or turn
> your computer into a killer robot.
>
> [1]: https://github.com/DistributedNonsense/CorrugatedIron/tree/develop
> [2]: http://clientresources.brentozar.com.s3.amazonaws.com/CorrugatedIron-111-alpha.zip
>
> On Tue, Feb 12, 2013 at 11:13 AM, Jeremiah Peschka <[email protected]> wrote:
>
> Good news! You've found a bug in CorrugatedIron. Because of index naming,
> we munge index names to have a suffix of _bin or _int, depending on the
> index type. This shouldn't be happening on $key, but it is. I'll create a
> GitHub issue and get that taken care of.
>
> On Tue, Feb 12, 2013 at 7:56 AM, Kevin Burton <[email protected]> wrote:
>
> I forgot to mention that when I execute this code I get the error:
>
> {not_found,
>  {<<"products">>,
>   <<"$keys">>},
>  undefined}}}:[{mochijson2,json_encode,2,
>                 [{file,"src/mochijson2.erl"},{line,149}]},
>                {mochijson2,'-json_encode_array/2-fun-0-',3,
>                 [{file,"src/mochijson2.erl"},{line,157}]},
>                {lists,foldl,3,
>                 [{file,"lists.erl"},{line,1197}]},
>                {mochijson2,json_encode_array,2,
>                 [{file,"src/mochijson2.erl"},{line,159}]},
>                {riak_kv_pb_mapred,process_stream,3,
>                 [{file,"src/riak_kv_pb_mapred.erl"},{line,97}]},
>                {riak_api_pb_server,process_stream,5,
>                 [{file,"src/riak_api_pb_server.erl"},{line,227}]},
>                {riak_api_pb_server,handle_info,2,
>                 [{file,"src/riak_api_pb_server.erl"},{line,158}]},
>                {gen_server,handle_msg,5,
>                 [{file,"gen_server.erl"},{line,607}]}] - CommunicationError
>
> *From:* riak-users [mailto:[email protected]] *On Behalf Of* Kevin Burton
> *Sent:* Tuesday, February 12, 2013 9:48 AM
> *To:* 'Jeremiah Peschka'
> *Cc:* 'riak-users'
> *Subject:* RE: ListKeys or MapReduce
>
> The name is “$keys”? Something like:
>
>     using (IRiakEndPoint cluster = RiakCluster.FromConfig("riakConfig"))
>     {
>         IRiakClient riakClient = cluster.CreateClient();
>         RiakBucketKeyInput bucketKeyInput = new RiakBucketKeyInput();
>         bucketKeyInput.AddBucketKey(productBucketName, "$keys");
>         RiakMapReduceQuery query = new RiakMapReduceQuery()
>             .Inputs(bucketKeyInput)
>             .MapJs(m => m.Name("Riak.mapValuesJson").Keep(true));
>         RiakResult<RiakMapReduceResult> result = riakClient.MapReduce(query);
>         if (result.IsSuccess)
>         {
>
> *From:* Jeremiah Peschka [mailto:[email protected]]
> *Sent:* Tuesday, February 12, 2013 9:18 AM
> *To:* Kevin Burton
> *Cc:* riak-users
> *Subject:* Re: ListKeys or MapReduce
>
> It would be queried like any other index as an MR input.
> I'll create an issue and will try to get this in sometime in the next few
> days - no promises, though.
>
> On Tue, Feb 12, 2013 at 7:09 AM, Kevin Burton <[email protected]> wrote:
>
> I will read the other URLs that you mentioned. Thank you.
>
> Would you mind giving a short example (preferably using CI) of the $keys
> index?
>
> *From:* Jeremiah Peschka [mailto:[email protected]]
> *Sent:* Tuesday, February 12, 2013 8:52 AM
> *To:* Kevin Burton
> *Cc:* riak-users
> *Subject:* Re: ListKeys or MapReduce
>
> They're both pretty crappy in terms of performance - they read all data
> off of disk. If you're using LevelDB, you can use the $key index to pull
> back just the keys that are in a single bucket.
>
> A better approach is to maintain a separate bucket - e.g. DocumentCount -
> that is used for counting documents. Unfortunately, you can't guarantee
> transactional consistency around counts in Riak today, so you'll want to
> move maintaining the counts out of Riak and into something else. If you
> search the list archives [1], you'll find that Redis has been mentioned as
> a good way to solve this problem: counters are stored in Redis and flushed
> to Riak on a regular schedule.
> Because of the lack of consistency (especially around MapReduce
> operations), Riak isn't the best choice if you require
> counters/aggregations to be stored in the database.
>
> Once CRDTs [2] make it into mainstream Riak, you can make use of those
> data structures to implement distributed counters in Riak.
>
> [1]: http://riak.markmail.org
> [2]: http://vimeo.com/52414903
>
> On Mon, Feb 11, 2013 at 10:30 AM, <[email protected]> wrote:
>
> Say I need to determine how many documents there are in my database. For a
> CorrugatedIron application I can do ListKeys and get the warning that it is
> an expensive operation, or I can do a MapReduce query. Which is less
> expensive? Is there an option that I am missing?
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
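The counter pattern suggested in the thread (increment counters in a fast store such as Redis, then flush them to a Riak bucket like DocumentCount on a regular schedule) can be sketched as follows. This is a minimal, single-process Python illustration under stated assumptions: a lock-guarded dict stands in for Redis, a plain dict stands in for the Riak bucket, and the names `CountingStore`, `flush_to_bucket` and `run_periodic_flush` are hypothetical. No Redis or Riak client calls are made.

```python
import threading


class CountingStore:
    """Hypothetical in-memory stand-in for a counter store such as Redis."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, name: str, amount: int = 1) -> int:
        # Atomic increment (the role INCRBY plays in Redis); returns the new total.
        with self._lock:
            self._counts[name] = self._counts.get(name, 0) + amount
            return self._counts[name]

    def snapshot(self) -> dict[str, int]:
        # Consistent copy of all totals at one instant.
        with self._lock:
            return dict(self._counts)


def flush_to_bucket(counters: CountingStore, bucket: dict[str, int]) -> None:
    """Copy current totals into a dict standing in for a Riak bucket.

    Absolute totals (not deltas) are written, so a repeated or retried
    flush stores the same value instead of double counting.
    """
    for name, total in counters.snapshot().items():
        bucket[name] = total


def run_periodic_flush(
    counters: CountingStore,
    bucket: dict[str, int],
    interval_s: float,
    stop: threading.Event,
) -> None:
    """Flush every `interval_s` seconds until `stop` is set, then flush once more."""
    while not stop.wait(interval_s):
        flush_to_bucket(counters, bucket)
    flush_to_bucket(counters, bucket)


if __name__ == "__main__":
    counters = CountingStore()
    document_count: dict[str, int] = {}  # stands in for a DocumentCount bucket
    stop = threading.Event()
    flusher = threading.Thread(
        target=run_periodic_flush, args=(counters, document_count, 0.05, stop)
    )
    flusher.start()
    for _ in range(10):
        counters.incr("products")
    stop.set()
    flusher.join()
    print(document_count)  # {'products': 10}
```

One reasonable design choice here is to make the flusher the only writer of the count keys, so the counts stored in the bucket never race against each other; the fast store remains the source of truth between flushes.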
