Re: Secondary indexing

Kresten Krab Thorup Sat, 23 Apr 2011 02:26:22 -0700

My colleague Erik is working on something that sounds like a solution for your 
problem, 
http://polymorphictypist.blogspot.com/2011/04/multi-version-collections-in-riak.html


In combination with a set of commit hooks like riak_link_index, it may be just 
what you're looking for. 

I believe that he has implemented some of it, but It hasn't been released yet.  
He's at github.com/eriksoe.

Kresten

--
301 Moved Permanently
Email: Kresten Krab Thorup <[email protected]>

On 22/04/2011, at 22.46, "Bryan O'Sullivan" <[email protected]> wrote:

> In my application, I have a frequent need to store collections of data where 
> I rarely access the items individually, but much more commonly in aggregate 
> form. In principle, the mapreduce feature is well suited to this, but it must 
> be fed a set of {bucket,key} pairs to consume, and maintaining these 
> collections is painful.
> 
> The naive approach would be to store each collection in its own bucket, and 
> use listkeys to fetch the keys that I can then mapreduce over. However, 
> listkeys has red flags all over it, presumably because it's implemented as a 
> scan+filter of all buckets and keys.
> 
> The slightly less naive approach would be to store the keys for a collection 
> in a list. Unfortunately, for a collection of even modest size (10,000 
> elements), the cost of retrieving and storing this index to make small 
> changes (e.g. "I'm adding one element to the collection") rapidly becomes 
> untenable, and that's before such knotty issues as "how do I delete items 
> from this index?" arise. This is essentially the approach taken by the 
> riak_link_index project.
> 
> A more subtle approach would be to maintain a hierarchical index, with a 
> "current" index that is modified whenever items are added, and promoted to 
> "frozen" once it reaches a certain size. Frozen indices would be maintained 
> in an index of their own, which itself would be frozen, and so on. 
> Performance would be maintained by limiting the maximum size of each index 
> object. This is a similar scheme to the traditional filesystem approach of 
> direct, single indirect, and double indirect block indexing. It's also quite 
> a lot of work to implement.
> 
> Does Basho have any plans to address this? It seems likely to be a very 
> frequently arising need.
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Secondary indexing

Reply via email to