Hello, I'd like to collect opinions from HBase experts on query uniformity, and on whether HBase currently has any advanced technique for coping with non-uniform query load beyond simply keeping the key distribution uniform.
I know we start from the premise that, in order to scale, queries need to be uniformly distributed over the key space. The next piece of advice people get is to use uniformly distributed keys. Then, the thinking goes, the query load will also be uniformly distributed among regions.
For what seems to be an embarrassingly long time, however, I was missing the point that uniformly distributed keys do not imply uniformly distributed queries, since key uniformity does not account for the skew of the queries over the key space itself. Under some circumstances this skew can be bad enough to create query hot spots in the cluster, hot spots that could have been avoided had region splits been balanced on query load rather than on data size per se. (I am thinking of some sort of dynamic sampling of the query distribution in order to equalize the load, similar to how TotalOrderPartitioner uses random data sampling to estimate the key skew in the incoming data.)
To cut a long story short: is region size the only technique HBase currently has for balancing load, especially with respect to query load? Or are there more advanced techniques for doing that?
Thank you very much.
-Dmitriy
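P.S. To make the scenario concrete, here is a toy sketch in plain Python. It is a standalone simulation and does not use any HBase API. The row count, region count, hot range and query weights are all made-up assumptions, and the function names are hypothetical. It compares split points chosen by row count (a stand-in for splitting on data size) with split points chosen from the cumulative query weight (a stand-in for what a query sampler might produce).

```python
import bisect

NUM_KEYS = 10_000     # rows, evenly spread over the key space
NUM_REGIONS = 8
HOT_KEYS = 1_000      # assumption: the first 10% of the key space is "hot"
HOT_WEIGHT = 81       # assumption: a hot row is queried 81x as often as a cold one

# Zero-padded keys sort lexicographically in the same order as numerically.
keys = [f"{i:05d}" for i in range(NUM_KEYS)]
# Per-row query weights; these stand in for sampled query counts.
weights = [HOT_WEIGHT if i < HOT_KEYS else 1 for i in range(NUM_KEYS)]
total = sum(weights)


def size_based_splits():
    """Split points that give every region the same number of rows."""
    step = NUM_KEYS // NUM_REGIONS
    return [keys[j * step] for j in range(1, NUM_REGIONS)]


def load_based_splits():
    """Split points at the 1/NUM_REGIONS quantiles of cumulative query weight."""
    splits, acc = [], 0
    for key, weight in zip(keys, weights):
        next_cut = len(splits) + 1
        # acc is the weight of all rows strictly before `key`; once it reaches
        # the next quantile, `key` becomes the first row of a new region.
        if next_cut < NUM_REGIONS and acc * NUM_REGIONS >= total * next_cut:
            splits.append(key)
        acc += weight
    return splits


def region_load(splits):
    """Fraction of the total query weight that lands on each region."""
    load = [0] * (len(splits) + 1)
    for key, weight in zip(keys, weights):
        # A row equal to a split point belongs to the region that starts there.
        load[bisect.bisect_right(splits, key)] += weight
    return [x / total for x in load]


for name, splits in (("size", size_based_splits()), ("load", load_based_splits())):
    print(name, [round(x, 3) for x in region_load(splits)])
```

With these made-up numbers, the size-based splits leave the first region serving about 90% of the queries, while the load-based splits give each region about 12.5%. The price is that the last region then holds roughly 90% of the rows, so balancing on query load and balancing on data size pull in opposite directions here.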
