On Thu, Nov 26, 2009 at 7:12 PM, Anthony Molinaro
<antho...@alumni.caltech.edu> wrote:


> Unless you are using order preserving partitioning which might or might not
> be what you want, you won't be able to do a full scan.  Instead you should
> probably have two column families, one keyed by primary, one by secondary,
> each with a column for the other, then you can do your operations.  It
> uses more space, but disk is cheap so probably not a big deal.

Yes, that is what we thought: the second column family only keeps a
list of the keys in the first one, without the data.

> If you
> have to model a many-to-many relationship you can use super columns.

For now we are only storing a single attribute, so we used normal
columns instead of super columns. In the end our schema is:
PrimaryCF
{ 'primary' => {'secondary'=>'data_0'} }
SecondaryCF
{ 'secondary' => {'primary'=>''} }

I believe that using a SuperColumn in PrimaryCF would be necessary
only when using more than one attribute, or are there other
implications I'm not seeing?
As for the secondary, I don't like the idea of storing a dummy value
(new byte[0]) when I only need the name, is that a smell that I should
be using something else?
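
To make the layout above concrete, here is a minimal sketch of the two
column families modeled with plain Python dicts. It is only an in-memory
stand-in, not a Cassandra client: the names primary_cf, secondary_cf,
insert and primaries_for are made up for illustration, and the empty
bytes value plays the role of the dummy new byte[0].

```python
# In-memory stand-in for the two column families (hypothetical names,
# plain dicts, no Cassandra client involved).
primary_cf = {}    # row key: primary id   -> {secondary id: data}
secondary_cf = {}  # row key: secondary id -> {primary id: b""}


def insert(primary, secondary, data):
    """Write the data row and its reverse-index row together."""
    primary_cf.setdefault(primary, {})[secondary] = data
    # Only the column name matters in the index; the value is an
    # empty placeholder.
    secondary_cf.setdefault(secondary, {})[primary] = b""


def primaries_for(secondary):
    """Return the column names of one secondary row, i.e. the primary ids."""
    return sorted(secondary_cf.get(secondary, {}))


insert("p1", "s1", "data_0")
insert("p2", "s1", "data_1")
```

Here primaries_for("s1") returns both primary ids without reading any
data from the primary rows, which is the whole point of the second
column family.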

<snip>

> You do your inserts into both, and for deletes you do a get_slice for the
> secondary id, which will give you all primary ids which contain the
> secondary id.  Then you can delete everything.

yes, we actually did it a bit "smarter" by querying first, and keeping
a list of only the diff between the first and second insert. Thanks a
lot for your answer, it's been very useful.
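
For the record, the delete path described above can be sketched like
this. Again it is an in-memory model with plain dicts rather than real
Cassandra calls, and delete_secondary and index_diff are hypothetical
helper names. index_diff is only one possible reading of the "diff"
idea: compare the keys read before a write with the keys being written
and touch only the index entries that changed.

```python
# Sample contents of the two column families (plain dicts, illustrative data).
primary_cf = {
    "p1": {"s1": "data_0"},
    "p2": {"s1": "data_1", "s2": "data_2"},
}
secondary_cf = {
    "s1": {"p1": b"", "p2": b""},
    "s2": {"p2": b""},
}


def delete_secondary(secondary):
    """Remove a secondary id from every primary row that references it."""
    # Stand-in for a get_slice over the secondary row: its column
    # names are the primary ids.
    primaries = list(secondary_cf.get(secondary, {}))
    for primary in primaries:
        row = primary_cf.get(primary, {})
        row.pop(secondary, None)
        if not row:
            # Drop primary rows that are left with no columns.
            primary_cf.pop(primary, None)
    secondary_cf.pop(secondary, None)
    return primaries


def index_diff(old_secondaries, new_secondaries):
    """Return (to_add, to_remove) so only changed index entries are touched."""
    old, new = set(old_secondaries), set(new_secondaries)
    return sorted(new - old), sorted(old - new)
```

Calling delete_secondary("s1") on the sample data removes the s1 column
from both primary rows and drops the s1 index row in one pass.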
