On Thu, Nov 26, 2009 at 7:12 PM, Anthony Molinaro <antho...@alumni.caltech.edu> wrote:
> Unless you are using order preserving partitioning which might or might not
> be what you want, you won't be able to do a full scan. Instead you should
> probably have two column families, one keyed by primary, one by secondary,
> each with a column for the other, then you can do your operations. It
> uses more space, but disk is cheap so probably not a big deal.

Yes, we thought so. We use the second column family only to keep a list of the keys in the first one, without the data.

> If you
> have to model a many-to-many relationship you can use super columns.

For now we are only storing a single attribute, so we used normal columns instead of super columns. In the end our schema is:

PrimaryCF { 'primary' => {'secondary'=>'data_0'} }
SecondaryCF { 'secondary' => {'primary'=>''} }

I believe that using a SuperColumn in PrimaryCF would be necessary only when storing more than one attribute, or are there other implications I'm not seeing?

As for the secondary, I don't like the idea of storing a dummy value (new byte[0]) when I only need the column name. Is that a smell that I should be using something else?

<snip>

> You do your inserts into both, and for deletes you do a get_slice for the
> secondary id, which will give you all primary ids which contain the
> secondary id. Then you can delete everything.

Yes, we actually did it a bit "smarter": we query first and keep a list of only the diff between the first and second insert.

Thanks a lot for your answer, it's been very useful.
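
P.S. For anyone reading this in the archives, here is a rough sketch of the two-column-family scheme discussed above. It is only an in-memory model using plain Python dicts, not Cassandra client code; the function names (insert, replace_row, delete_secondary) are made up for illustration, and replace_row is just one possible reading of the "diff" approach.

```python
from collections import defaultdict

# In-memory stand-ins for the two column families (NOT a Cassandra client).
primary_cf = defaultdict(dict)    # primary key   -> {secondary: data}
secondary_cf = defaultdict(dict)  # secondary key -> {primary: b""}


def insert(primary, secondary, data):
    """Write the data column and the reverse-index column together."""
    primary_cf[primary][secondary] = data
    secondary_cf[secondary][primary] = b""  # name-only column, empty value


def replace_row(primary, new_columns):
    """Replace a primary row, touching the index only for the diff."""
    old = set(primary_cf.get(primary, {}))
    new = set(new_columns)
    for secondary in old - new:
        # Secondary ids no longer present: drop their index entries.
        secondary_cf[secondary].pop(primary, None)
        if not secondary_cf[secondary]:
            del secondary_cf[secondary]
    for secondary in new - old:
        # Newly added secondary ids: add index entries.
        secondary_cf[secondary][primary] = b""
    primary_cf[primary] = dict(new_columns)


def delete_secondary(secondary):
    """Slice the index row, then remove that column from each primary row."""
    for primary in list(secondary_cf.get(secondary, {})):
        primary_cf[primary].pop(secondary, None)
        if not primary_cf[primary]:
            del primary_cf[primary]
    secondary_cf.pop(secondary, None)
```

The loop in delete_secondary plays the role of the get_slice on the secondary id: it lists every primary id that contains the secondary id before anything is removed.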