Of course what you really want is this: create table x( id text, timestamp timeuuid, flag boolean, // other fields primary key (flag, id, timestamp) )
Whoops now there are only 2 partition keys! Not good if you have any reasonable number of rows... Faced with a situation like this (although this is extreme) of a limited number of partition keys - and if this access path is important - then you can add shards like this: create table x( id text, timestamp timeuuid, flag boolean, // other fields shard int, primary key ((flag, shard), id, timestamp) ) The last part of the partition key may be used with IN so you can query like this: select * from x where flag=true and shard in (0,1,2,3,4,5,6,...); I monitor partition sizes and shard enough to keep them reasonable in this sort of situation. The C* infrastructure parallelizes a lot of the activity so such queries are quite fast. Oh, and ORDER BY works across shards. But the main point is: drive from your queries. Designing for C* is NOT like SQL - don't expect to develop a normalized set of tables to do it all. Start with how you want to access data and design from there. So - if you need to get a bunch of ids fast given a flag and maybe an id/timestamp range, and your volumes/sizes are such that the number of shards can be kept reasonable, this might be a good design, otherwise its crap. Drive from your own access patterns to derive your (typically denormalized) table defs. ml On Fri, Mar 21, 2014 at 3:34 PM, DuyHai Doan <[email protected]> wrote: > Hello Ben > > Try the following alternative with composite partition key to encode the > dual states of the boolean: > > > create table x( > id text, > flag boolean, > timestamp timeuuid, > // other fields > primary key (*(id,flag)* timestamp) > ) > > Your previous "select * from x where flag = true;" translate into: > > SELECT * FROM x WHERE id=... AND flag = true > > Of course, you'll need to provide the id in any case. > > If you want to query only on the boolean flag, I'm afraid that manual > indexing or secondary index (beware of cardinality !) are your only choices. > > Regards > > Duy Hai DOAN > > > > > On Fri, Mar 21, 2014 at 8:27 PM, Ben Hood <[email protected]> wrote: > >> Hi, >> >> I was wondering what the best way is to lay column families out so >> that you can to query by a boolean attribute. >> >> For example: >> >> create table x( >> id text, >> timestamp timeuuid, >> flag boolean, >> // other fields >> primary key (id, timestamp) >> ) >> >> So that you can query >> >> select * from x where flag = true; >> >> Three approaches spring to mind: >> >> 1) Put a secondary index on the flag column >> 2) Split the column family definition out into two separate CFs, one >> for true and one for false >> 3) Maintain a manual index table >> 4) Something else >> >> Option (1) seems to be the easiest, but I was wondering if that was >> going to put too much load on the secondary index, given the low >> cardinality of this attribute. It seems like what Cassandra would have >> to do internally to achieve (1) is to effectively implement (2) or (3) >> behind the scenes, but I'm just guessing. >> >> Does anybody have any experience with this? >> >> Cheers, >> >> Ben >> > >
