Of hbase key distribution and query scalability, again.

Dmitriy Lyubimov Fri, 25 May 2012 10:31:57 -0700

Hello,

I'd like to collect opinions from HBase experts on query uniformity,
and on whether any advanced technique currently exists in HBase to
cope with non-uniform query load beyond just maintaining a uniform
key distribution.


I know we start with the statement that in order to scale queries, we
need them uniformly distributed over the key space. The next advice
people get is to use uniformly distributed keys. Then, the thinking
goes, the query load will also be uniformly distributed among regions.

For what seems to be an embarrassingly long time I was missing the
point, however, that using uniformly distributed keys does not equate
to a uniform distribution of queries, since it doesn't account for
the skewness of queries over the key space itself. This skewness can
be bad enough under some circumstances to create query hot spots in
the cluster, which could have been avoided had region splits been
balanced based on query load rather than on data size per se (a sort
of dynamic sampling of the query distribution in order to equalize the
load, similar to how TotalOrderPartitioner does random data sampling
to build a distribution of the key skewness in the incoming data).
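
To illustrate what I mean, here is a minimal sketch in plain Python.
It is not HBase code and uses no HBase API; the eight regions, the
workload where 80% of queries hit the lowest 5% of the key space, and
the helper names split_points and region_load are all assumptions made
up for this illustration. It compares region boundaries chosen from
the stored keys (a stand-in for size-based splits) with boundaries
chosen from a sample of the queried keys.

```python
import bisect
import random


def split_points(samples, n_regions):
    """Pick n_regions - 1 boundaries at equal quantiles of the samples."""
    ordered = sorted(samples)
    return [ordered[len(ordered) * i // n_regions] for i in range(1, n_regions)]


def region_load(boundaries, queries):
    """Count how many queries land in each region defined by the boundaries."""
    counts = [0] * (len(boundaries) + 1)
    for key in queries:
        counts[bisect.bisect_right(boundaries, key)] += 1
    return counts


rng = random.Random(42)

# Stored keys: uniform over [0, 1), as the usual advice recommends.
stored_keys = [rng.random() for _ in range(100_000)]

# Hypothetical skewed workload: 80% of queries hit the lowest 5% of keys.
queries = [
    rng.random() * 0.05 if rng.random() < 0.8 else rng.random()
    for _ in range(100_000)
]

# Stand-in for size-based splits: equal amounts of data per region.
by_size = split_points(stored_keys, 8)
# Load-based splits: equal numbers of *sampled queries* per region.
by_load = split_points(rng.sample(queries, 5_000), 8)

size_load = region_load(by_size, queries)
query_load = region_load(by_load, queries)

print("queries per region, size-based splits:", size_load)
print("queries per region, load-based splits:", query_load)
```

With the size-based boundaries, the first region takes most of the
queries even though every region holds about the same amount of data;
with boundaries derived from the query sample, the per-region counts
come out roughly equal.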

To cut a long story short, is region size the only current HBase
technique to balance load, esp. w.r.t. query load? Or perhaps there
are some more advanced techniques to do that?

Thank you very much.
-Dmitriy
