Hello Mark,

On 26/04/2010, at 07:17, Mark Robson wrote:
> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and if possible, modify the keys to split
> them better between the nodes.
>
> For example, if your key has two parts, one of which you want to range scan,
> another which you don't. Say you have customer_id and a timestamp. The
> customer ID does not need to be range scanned, so you can hash it into a hex
> value (say), then append the timestamp (in a lexically sortable way of
> course). So you'd end up with keys like
>
> HHHH-0012345-0001234567890
>
> Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the
> rest is a timestamp.
>
> You'd be able to do a time range scan by using the known prefixes, and
> distributing your nodes equally from 0000 to ffff would result in fairly even
> data (provided you don't have a very small number of very large customers).

How do you ask Cassandra to do a range scan with a prefix? As far as I can
tell, you can't do something like:

    db.get_range('SomeCF', :start => 'HHHH-0012345-*')

...can you?

Regards
-- 
Lucas Di Pentima - Santa Fe, Argentina
Jabber: lu...@di-pentima.com.ar
MSN: ldipent...@hotmail.com
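
For illustration, here is one way the key scheme quoted above could be built, and how a "prefix" scan might be expressed as an explicit start/end key pair instead of a wildcard. This is only a sketch: the helper names (customer_prefix, row_key, prefix_range), the use of MD5 for HHHH, and the 7- and 13-digit zero padding are assumptions chosen to match the example key. It makes no calls to any Cassandra client.

```ruby
require 'digest'

# Hypothetical helper: "HHHH-0012345-" where HHHH is the first four hex
# characters of an MD5 of the zero-padded customer ID (MD5 is an assumption;
# any stable hash that spreads customers over 0000..ffff would do).
def customer_prefix(customer_id)
  id = format('%07d', customer_id)
  hash = Digest::MD5.hexdigest(id)[0, 4]
  "#{hash}-#{id}-"
end

# Full row key: prefix plus a zero-padded timestamp, so that string order
# matches time order for a given customer.
def row_key(customer_id, timestamp)
  customer_prefix(customer_id) + format('%013d', timestamp)
end

# Instead of a wildcard, compute the smallest and largest keys that can
# share the customer's prefix within the wanted time window.
def prefix_range(customer_id, from_ts = 0, to_ts = 9_999_999_999_999)
  [row_key(customer_id, from_ts), row_key(customer_id, to_ts)]
end
```

The two strings returned by prefix_range would then be passed as the start and end keys of a range query, in whatever form your client exposes them. This assumes the cluster stores keys in sorted order (an order-preserving partitioner); otherwise a key range does not correspond to a prefix.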