When starting your Cassandra cluster, please configure the InitialToken for each node so that the key ranges are balanced across the nodes.
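A minimal sketch of picking balanced InitialToken values, assuming the cluster uses OrderPreservingPartitioner (tokens are compared as strings) and that every key begins with a fixed-width lowercase hex prefix, as in the key scheme from the quoted message. The function names are hypothetical helpers, not part of Cassandra. For comparison, it also shows the usual even spacing for RandomPartitioner, whose token space runs from 0 to 2**127.

```python
def balanced_hex_tokens(num_nodes: int, width: int = 4) -> list[str]:
    """Evenly spaced string tokens over the hex space 00..0 .. ff..f.

    Hypothetical helper. Assumes OrderPreservingPartitioner and keys that start
    with a `width`-digit lowercase hex prefix (e.g. "HHHH-...").
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    space = 16 ** width  # number of distinct prefixes, 65536 for width=4
    return [format(i * space // num_nodes, f"0{width}x") for i in range(num_nodes)]


def balanced_random_tokens(num_nodes: int) -> list[int]:
    """Evenly spaced integer tokens for RandomPartitioner (token space 0 .. 2**127)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]


if __name__ == "__main__":
    # Print one InitialToken per node for a 4-node cluster.
    for node, token in enumerate(balanced_hex_tokens(4)):
        print(f"node {node}: InitialToken = {token}")
```

For four nodes this yields 0000, 4000, 8000 and c000, so each node owns roughly a quarter of the prefix space. Each value would then go into the InitialToken setting of the corresponding node before it first joins the ring.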
On Mon, Apr 26, 2010 at 6:17 PM, Mark Robson <mar...@gmail.com> wrote:
> On 26 April 2010 01:18, 刘兵兵 <rucb...@gmail.com> wrote:
>
>> I did some INSERTs. Because I will do some scan operations, I use the
>> OrderPreservingPartitioner.
>>
>> The state of the cluster is shown below.
>>
>> As I predicted, the load is very imbalanced.
>
> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and, if possible, modify the keys to split
> them better between the nodes.
>
> For example, suppose your key has two parts, one of which you want to range
> scan and another which you don't. Say you have customer_id and a timestamp.
> The customer ID does not need to be range scanned, so you can hash it into a
> hex value (say), then append the timestamp (in a lexically sortable way, of
> course). So you'd end up with keys like
>
> HHHH-0012345-0001234567890
>
> where HHHH is a hash of the customer ID, 0012345 is the customer ID, and
> the rest is a timestamp.
>
> You'd be able to do a time range scan by using the known prefixes, and
> distributing your nodes' tokens equally from 0000 to ffff would result in
> fairly even data (provided you don't have a very small number of very large
> customers).
>
> Mark
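A minimal sketch of the key scheme Mark describes, under stated assumptions: the 4-hex-digit prefix is taken from an MD5 of the customer ID (the hash choice is an assumption, any stable hash works), customer IDs are zero-padded to 7 digits and timestamps to 13 digits to match the example key, and `make_key` / `time_range` are hypothetical helper names.

```python
import hashlib

CUSTOMER_WIDTH = 7    # assumed padding, matches "0012345"
TIMESTAMP_WIDTH = 13  # assumed padding, matches "0001234567890"


def hash_prefix(customer_id: int) -> str:
    """First 4 hex digits of an MD5 of the customer ID (hash choice is an assumption)."""
    return hashlib.md5(str(customer_id).encode("ascii")).hexdigest()[:4]


def make_key(customer_id: int, timestamp: int) -> str:
    """Build "HHHH-<customer id>-<timestamp>".

    Zero padding makes lexical order equal numeric order, which is what an
    order-preserving partitioner sorts by. Values too wide for the padding
    would break that ordering, so they are rejected.
    """
    if not 0 <= customer_id < 10 ** CUSTOMER_WIDTH:
        raise ValueError("customer_id does not fit the padded width")
    if not 0 <= timestamp < 10 ** TIMESTAMP_WIDTH:
        raise ValueError("timestamp does not fit the padded width")
    return (f"{hash_prefix(customer_id)}-"
            f"{customer_id:0{CUSTOMER_WIDTH}d}-"
            f"{timestamp:0{TIMESTAMP_WIDTH}d}")


def time_range(customer_id: int, start_ts: int, end_ts: int) -> tuple[str, str]:
    """Inclusive start and end keys for a time range scan over one customer's rows."""
    return make_key(customer_id, start_ts), make_key(customer_id, end_ts)


if __name__ == "__main__":
    start, end = time_range(12345, 1234567890, 1234567999)
    print(start, end)
```

Because the prefix is computed from the customer ID alone, a client can rebuild it and pass the two keys to its key-range query. The hash spreads different customers across the 0000..ffff space, but every row of one customer shares a prefix, so a few very large customers would still concentrate on a few nodes, which is the caveat in the quoted message.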