My colleague Erik is working on something that sounds like a solution for your problem, http://polymorphictypist.blogspot.com/2011/04/multi-version-collections-in-riak.html
In combination with a set of commit hooks like riak_link_index, it may be just what you're looking for. I believe that he has implemented some of it, but It hasn't been released yet. He's at github.com/eriksoe. Kresten -- 301 Moved Permanently Email: Kresten Krab Thorup <[email protected]> On 22/04/2011, at 22.46, "Bryan O'Sullivan" <[email protected]> wrote: > In my application, I have a frequent need to store collections of data where > I rarely access the items individually, but much more commonly in aggregate > form. In principle, the mapreduce feature is well suited to this, but it must > be fed a set of {bucket,key} pairs to consume, and maintaining these > collections is painful. > > The naive approach would be to store each collection in its own bucket, and > use listkeys to fetch the keys that I can then mapreduce over. However, > listkeys has red flags all over it, presumably because it's implemented as a > scan+filter of all buckets and keys. > > The slightly less naive approach would be to store the keys for a collection > in a list. Unfortunately, for a collection of even modest size (10,000 > elements), the cost of retrieving and storing this index to make small > changes (e.g. "I'm adding one element to the collection") rapidly becomes > untenable, and that's before such knotty issues as "how do I delete items > from this index?" arise. This is essentially the approach taken by the > riak_link_index project. > > A more subtle approach would be to maintain a hierarchical index, with a > "current" index that is modified whenever items are added, and promoted to > "frozen" once it reaches a certain size. Frozen indices would be maintained > in an index of their own, which itself would be frozen, and so on. > Performance would be maintained by limiting the maximum size of each index > object. This is a similar scheme to the traditional filesystem approach of > direct, single indirect, and double indirect block indexing. It's also quite > a lot of work to implement. > > Does Basho have any plans to address this? It seems likely to be a very > frequently arising need. > _______________________________________________ > riak-users mailing list > [email protected] > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
