hbase hashing algorithm and schema design

Sam Seigal Fri, 03 Jun 2011 00:36:23 -0700

Hi,

I cannot find any information about the algorithm that decides which
region a particular row belongs to in an HBase cluster. Does the algorithm
take into account the number of physical nodes? Where can I find more
details about it?


I went through the HBase book and the OpenTSDB schema examples on schema
design and the problems with monotonically increasing row keys, and I have
a follow-up question.

I want to be able to query on ranges of time in HBase. Following the
OpenTSDB example, I have the following row key format:

<eventid> - <yyyy-mm-dd>
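
To make the key format concrete, here is a minimal Python sketch of how I
build the keys and the bounds for a time-range scan. It assumes the separator
is a single "-" with no surrounding spaces, and it relies on HBase ordering
row keys lexicographically by bytes and on a scan's stop row being exclusive.
The helper names row_key and scan_bounds are hypothetical, and the in-memory
list only stands in for a table; this is not HBase client code.

```python
from datetime import date, timedelta
from typing import Tuple


def row_key(event_id: str, day: date) -> bytes:
    """Build '<eventid>-<yyyy-mm-dd>' as bytes (hypothetical helper)."""
    # ISO dates are zero-padded, so byte order matches chronological order.
    return f"{event_id}-{day.isoformat()}".encode("ascii")


def scan_bounds(event_id: str, start: date, end: date) -> Tuple[bytes, bytes]:
    """Return (start row, stop row) covering start..end inclusive.

    The stop row is the key for the day after `end`, because the stop row
    of a scan is treated as exclusive.
    """
    return row_key(event_id, start), row_key(event_id, end + timedelta(days=1))


# Simulate the sorted key space with a plain sorted list (assumption: this
# only stands in for the table's row ordering).
keys = sorted(row_key(e, date(2011, 6, d)) for e in "BA" for d in (3, 1, 2))

start, stop = scan_bounds("A", date(2011, 6, 1), date(2011, 6, 2))
selected = [k for k in keys if start <= k < stop]
```

With this layout all rows for one event id are contiguous, so a time range
for a single event id is one scan, but a time range across all 12 event ids
needs 12 scans.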

My event id can be one of 12 distinct values (say, A through L), and I
have a 4-node cluster running HBase right now. However, these event id
values are not evenly distributed. I believe this implies that some of the
regions in the cluster are going to grow faster than others, and will
eventually either split automatically or have to be split manually.
Should this be a concern at this point? How does HBase decide which
partition a particular key goes to? I feel that knowing more details
about the algorithm would help me design the schema better.

Your help is appreciated.

Thank you.

Sam
