Re: follow up question on row key schema design

Arvind Jayaprakash Mon, 06 Jun 2011 10:35:50 -0700

On Jun 02, Sam Seigal wrote:
><eventid> - <yyyy-mm-dd>
>
>My eventId can be one of 12 distinct values (let us say from A-L) , and I
>have a 4 node cluster running HBase right now.
>
>After doing some research in our OLTP database, I found that the majority
>(about 45% of the data) from the last 6 months written in the OLTP database
>has the event id equal to value "A".


(Disclaimer: HBase n00b pretending to be an expert; I might be grossly
wrong in certain respects)

HBase regions are not organized like a trie. So, a dense clustering for
a given first byte of the row key should not be a problem when it comes
to how the regions are constructed. With the default splitting scheme,
a region gets split once it grows past a size threshold, which roughly
amounts to splitting on the number of keys in a range (assuming
comparable row sizes).
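
A toy sketch of the point above, in plain Python rather than anything
HBase-specific (the function name split_sorted_keys, the row counts and
the per-region limit are all made up for illustration): if regions are
just contiguous ranges of the sorted key space, and a range is halved
whenever it holds too many rows, then a prefix holding 45% of the rows
simply ends up covered by more regions of the same size.

```python
def split_sorted_keys(keys, max_rows_per_region):
    """Toy model: halve a sorted key range until every piece fits the limit."""
    keys = sorted(keys)
    if len(keys) <= max_rows_per_region:
        return [keys]
    mid = len(keys) // 2
    return (split_sorted_keys(keys[:mid], max_rows_per_region)
            + split_sorted_keys(keys[mid:], max_rows_per_region))

# Hypothetical data: event id "A" has 450 rows, "B" to "L" have 50 each.
keys = [f"A-{i:04d}" for i in range(450)]
for event_id in "BCDEFGHIJKL":
    keys += [f"{event_id}-{i:04d}" for i in range(50)]

regions = split_sorted_keys(keys, max_rows_per_region=130)
sizes = [len(r) for r in regions]
# Count the regions that hold at least one row with event id "A".
regions_with_a = sum(1 for r in regions if any(k.startswith("A-") for k in r))

print("region sizes:", sizes)
print("regions holding 'A' rows:", regions_with_a)
```

In this model every region ends up the same size; the dense "A" prefix
only shows up as a larger share of the regions.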

The potential problem that you could run into, and that might be a bit
harder to dodge, occurs when all the regions are of comparable size but
the access pattern is heavily skewed towards certain regions. At that
point, you would have to split the hot regions manually. I am not sure
if HBase can spread hot regions across different physical nodes on the
fly, though.
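
For the manual split, one possible way to choose a split point is to
take the key at which the running total of observed requests first
reaches half of all requests to the hot region. This is only a sketch
under assumptions: the per-key request counts in access_counts are
invented, pick_split_key is a hypothetical helper, and none of this is
an HBase API.

```python
def pick_split_key(access_counts):
    """Return the first key, in sorted order, at which the running request
    total reaches at least half of all requests (None for empty input)."""
    total = sum(access_counts.values())
    running = 0
    for key in sorted(access_counts):
        running += access_counts[key]
        if running * 2 >= total:
            return key
    return None

# Hypothetical request counts per row key inside one hot region.
access_counts = {
    "A-2011-06-01": 10,
    "A-2011-06-02": 20,
    "A-2011-06-03": 300,
    "A-2011-06-04": 500,
    "A-2011-06-05": 170,
}

split_key = pick_split_key(access_counts)
print("candidate split key:", split_key)
```

The resulting key is only a candidate; it would still have to be fed
to whatever manual split mechanism your HBase version provides.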
