Ryan Zezeski wrote:
> Mike Oxford wrote:
>> The big "problem" is that you have to have "knowledge of the buckets"
>> to later correlate them. Listing buckets is expensive.
>
> I'm not sure if you realize this but "bucket" is really just a namespace in
> the key. Said another way <REAL KEY>=<BUCKET>/<KEY>. The <REAL KEY> is
> what's hashed and determines the ring position. There are no special
> provisions for a bucket for the most part (one exception I can think of is
> custom properties which get stored in the gossiped ring).
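To make the "bucket is just a namespace" point concrete, here is a minimal sketch of hashing a bucket/key pair onto a consistent-hashing ring. It assumes a 160-bit SHA-1 ring split into 64 equal partitions (Riak's ring is 2^160 and 64 is its usual default size), but it joins the bucket and key as a plain `<BUCKET>/<KEY>` byte string, which is a simplification: Riak's real encoding of the pair differs. The helper names `real_key`, `ring_position` and `partition_index` are hypothetical. Note that nothing in it records which buckets exist; the bucket only feeds the hash input.

```python
import hashlib

RING_BITS = 160            # SHA-1 produces 160-bit digests
RING_SIZE = 2 ** RING_BITS


def real_key(bucket: bytes, key: bytes) -> bytes:
    # Simplified stand-in for <REAL KEY>=<BUCKET>/<KEY>.
    # Assumption: Riak's actual binary encoding of the pair is different.
    return bucket + b"/" + key


def ring_position(bucket: bytes, key: bytes) -> int:
    """Hash the combined key to an integer position on the ring."""
    digest = hashlib.sha1(real_key(bucket, key)).digest()
    return int.from_bytes(digest, "big")


def partition_index(bucket: bytes, key: bytes, num_partitions: int = 64) -> int:
    """Map a ring position to one of num_partitions equal arcs."""
    # A power-of-two partition count divides the ring evenly, so every
    # position maps to an index in [0, num_partitions).
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions must be a power of two")
    arc = RING_SIZE // num_partitions
    return ring_position(bucket, key) // arc
```

For example, `partition_index(b"users", b"alice")` and `partition_index(b"orders", b"alice")` are computed the same way; the two keys differ only in their hash input, and no per-bucket registry is consulted or updated.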
Right. There is no list of buckets. Computing the list is ludicrously
expensive because it involves folding over all of the keys in the backend,
extracting the bucket name from each, and accumulating these in a set.

Listing all the keys in a bucket is similarly expensive. It folds over all
the keys, extracts the bucket name, matches against the desired bucket (if
any), and then accumulates the keys. However, if you specify the bucket when
listing keys, there is an optimization available to key listing that is
impossible for bucket listing.

I've given it some thought because I intend to regularly run MapReduce (MR)
jobs over entire buckets, which involves listing keys. The best solution I've
found so far is to partition the keyspace by using the multi backend. When
you ask for all of the keys in a given bucket, only the backend that stores
that bucket is consulted. Ideally, any bucket that will need to produce its
key list gets its own keyspace (backend).

Andy

_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
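The listing costs and the multi-backend workaround described in the reply above can be sketched with a toy in-memory model. Assumptions: each backend is just a Python list of `(bucket, key)` pairs, and the `routes` dict is a hypothetical stand-in for Riak's per-bucket backend setting; none of these names are Riak's API. Both `list_buckets` and `list_keys` walk every entry in the backend they are given, while `MultiBackend.list_keys` only walks the backend the bucket is routed to.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy backend: every stored object is a (bucket, key) pair (assumption).
Backend = List[Tuple[str, str]]


def list_buckets(backend: Backend) -> Set[str]:
    """Fold over every key, extract its bucket, accumulate into a set."""
    buckets: Set[str] = set()
    for bucket, _key in backend:  # touches every key in the backend
        buckets.add(bucket)
    return buckets


def list_keys(backend: Backend, bucket: Optional[str] = None) -> List[str]:
    """Fold over every key, keeping those whose bucket matches (if given)."""
    return [key for b, key in backend if bucket is None or b == bucket]


class MultiBackend:
    """Route buckets to separate backends so key listing scans only one."""

    def __init__(self, routes: Dict[str, str], default: str = "default"):
        self.routes = routes  # bucket -> backend name (hypothetical config)
        self.default = default
        self.backends: Dict[str, Backend] = {}

    def _backend_for(self, bucket: str) -> Backend:
        name = self.routes.get(bucket, self.default)
        return self.backends.setdefault(name, [])

    def put(self, bucket: str, key: str) -> None:
        self._backend_for(bucket).append((bucket, key))

    def list_keys(self, bucket: str) -> List[str]:
        # Only the backend holding this bucket is folded over; the filter
        # still matters for the shared default backend.
        return list_keys(self._backend_for(bucket), bucket)
```

With `store = MultiBackend(routes={"logs": "logs_backend"})`, listing the keys of `"logs"` scans only `logs_backend`, no matter how many `"users"` objects sit in the default backend. Bucket listing gets no such shortcut, because finding out which buckets exist still means folding over every backend.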
