Hi, I have not been able to find information about the algorithm that decides which region a particular row belongs to in an HBase cluster. Does the algorithm take the number of physical nodes into account? Where can I find more details about it?
I went through the HBase book and the OpenTSDB schema examples covering schema definitions and the problems with monotonically increasing row keys, and I have a follow-up question. I want to be able to query on ranges of time in HBase. Following the OpenTSDB example, I have the following row key format: <eventid> - <yyyy-mm-dd>
My eventId can take one of 12 distinct values (say, A through L), and I currently have a 4-node cluster running HBase. However, these event id values are not evenly distributed. I believe this means that some regions in the cluster will grow faster than others and will eventually either split automatically or have to be split manually. Should this be a concern at this point?
How does HBase decide which region a particular key will go to? I feel that knowing more about the algorithm would help me design the schema better.
Your help is appreciated. Thank you.
Sam
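P.S. In case it helps to make the question concrete, here is a minimal sketch of how I build the row keys and how I expect a time-range query for one event to map onto a start/stop row pair. The helper names (make_row_key, scan_range) are hypothetical, and I am assuming a plain "-" separator with no surrounding spaces and ASCII encoding. It relies only on HBase ordering rows lexicographically by their bytes and on a scan's stop row being exclusive; it does not model how regions are chosen, which is the part I am asking about.

```python
from datetime import date


def make_row_key(event_id: str, day: date) -> bytes:
    """Hypothetical helper: build '<eventid>-<yyyy-mm-dd>' as ASCII bytes."""
    return f"{event_id}-{day.isoformat()}".encode("ascii")


def scan_range(event_id: str, start_day: date, end_day_exclusive: date):
    """Hypothetical helper: (start_row, stop_row) for one event and a date range.

    The stop row is treated as exclusive, matching HBase scan semantics.
    """
    return (
        make_row_key(event_id, start_day),
        make_row_key(event_id, end_day_exclusive),
    )


# A few sample keys, deliberately out of order.
keys = [
    make_row_key("B", date(2012, 1, 5)),
    make_row_key("A", date(2012, 3, 1)),
    make_row_key("A", date(2012, 1, 15)),
    make_row_key("B", date(2011, 12, 31)),
]

# Byte-wise sorting groups rows by event id first, then by date,
# because zero-padded ISO dates sort chronologically as strings.
sorted_keys = sorted(keys)

# Rows for event A in January 2012 form one contiguous slice of the sorted keys.
start_row, stop_row = scan_range("A", date(2012, 1, 1), date(2012, 2, 1))
january_a = [k for k in sorted_keys if start_row <= k < stop_row]
```

With this layout all rows for one event id sit next to each other in sort order, which is why I expect the busier event ids to fill their regions faster.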
