> From: Joe Pallas

> > Could it be that your row key is not distributing the
> > data well enough?
> > That is, if your key is primarily based on the current
> > date, it will only put the data into a small number of
> > regions.
> 
> This, I have come to realize, is an essential difference
> between the Cassandra approach and the HBase approach. 
> With HBase, your keys can be randomly distributed over the
> entire keyspace, but if all your data fits in a single
> region, then all your requests are going to a single
> regionserver.  

Yes, BigTable == distributed ordered table (DOT); Cassandra == hash-partitioned
ring, typically. (With great simplification.) Because HBase is a DOT, it can
provide strongly consistent and atomic operations on rows, since each row
exists in only one place at a time. This is a feature, or a problem, or both,
depending on your use case.
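
To make the difference concrete, here is a rough sketch in Python. It is not
HBase or Cassandra code. The region start keys, the node count, and the
date-prefixed row keys are all made up for illustration, and the hashing is a
plain modulo rather than a real token ring.

```python
import bisect
import hashlib

# Hypothetical region start keys for an ordered table (not from a real cluster).
REGION_START_KEYS = ["", "2011-01", "2011-04", "2011-07", "2011-10"]
# Hypothetical number of nodes in the hash-partitioned store.
NUM_HASH_NODES = 5


def ordered_region(row_key):
    """Index of the region whose sorted key range contains row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


def hashed_node(row_key):
    """Index of the node picked by hashing row_key (simplified, no ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_HASH_NODES


# Row keys that all start with the same date, as in the quoted question.
keys = ["2011-05-17/event-%04d" % i for i in range(1000)]

ordered_hits = {ordered_region(k) for k in keys}  # regions touched
hashed_hits = {hashed_node(k) for k in keys}      # nodes touched

print("ordered table regions touched:", sorted(ordered_hits))
print("hash-partitioned nodes touched:", sorted(hashed_hits))
```

In the ordered case every key sorts into the same range, so one region (and
one regionserver) takes all the writes. In the hashed case the same keys
scatter, at the cost of losing the sort order across keys.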

> The only ways I know around this are to make the split
> threshold low or to pre-split the table.  If you make
> the split threshold low, you get distribution for smaller
> tables, but if the tables get big, you have the overhead of
> more regions to deal with.

The split threshold is adjustable. It can be set as a table attribute
(MAX_FILESIZE) on a per-table basis. Start small and revise upward after
enough regions have split that the table itself is well distributed. This
assumes the keys used while inserting were consistent with the expected
distribution of the application.
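
As a back-of-the-envelope illustration of why the threshold matters, here is
a small Python sketch. It is a deliberately simplified model, not how the
regionserver actually decides to split: it assumes evenly spread keys, exact
halving, and ignores compactions. The table size and thresholds are
hypothetical numbers.

```python
def simulate_splits(table_mb, split_threshold_mb):
    """Return region sizes (MB) after halving every region over the threshold.

    Simplified model: keys are assumed evenly spread, so a split yields two
    equal halves. Real splits depend on store file sizes and key layout.
    """
    if split_threshold_mb <= 0:
        raise ValueError("split threshold must be positive")
    regions = [float(table_mb)]
    while any(size > split_threshold_mb for size in regions):
        next_regions = []
        for size in regions:
            if size > split_threshold_mb:
                next_regions.extend([size / 2, size / 2])
            else:
                next_regions.append(size)
        regions = next_regions
    return regions


# Hypothetical 1 GB table under a low and a higher split threshold.
low = simulate_splits(1024, 64)
high = simulate_splits(1024, 256)

print("threshold  64 MB ->", len(low), "regions")
print("threshold 256 MB ->", len(high), "regions")
```

With the lower threshold the same small table ends up in more regions, which
is the point of starting small. The same arithmetic shows why you would want
to raise the threshold later, once the table has grown.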

With HBase 0.90, changing the schema requires disabling the table, making the
schema change, then enabling the table again.

With HBase 0.92, attribute changes such as changing the split threshold won't
require a disable/enable.

> If you pre-split the table,
> you're in good shape provided you know the key distribution
> in advance (although I am concerned about possible bugs
> involving empty regions, based on one recent experience).

Empty or underutilized regions can be merged (offline). Disable the table,
use the Merge utility, then enable the table. Online merge is on the roadmap.
It might be in 0.92; if not, then the next release.

   - Andy
