Kim:
For specification of split keys at time of table creation, see Luke's comment on Dec 4th in this JIRA: HBASE-4163
Cheers

On Thu, Dec 19, 2013 at 10:05 AM, Kim Chew <[email protected]> wrote:

> Hello Jean-Marc,
>
> I ran into a similar situation and I was using Hannibal to check the
> regions status. My setup is HBase 0.94.8 and a cluster of three Region
> Servers. The table is pre-split into three regions (which matches the
> number of RS). My row key looks like this,
>
> <bucket number><reversed timestamp><random number>
>
> The "bucket number" is the number of regions. After the table is
> created, it looks like this,
>
> RS   start key   end key
> 0                001
> 1    001         002
> 2    002
>
> After many region splits, I checked the regions status, which is sorted by
> host. I could see from the graph that on each host there is one single
> region that "sticks out", i.e. has the biggest size. It is interesting to
> find out that the start keys and end keys for these three regions are
>
> start key   end key
>             000
> 001         001<the rest of the row key>
> 002         002<the rest of the row key>
>
> I am interested to find out why and how that happens. Maybe my row key
> does not make the writes evenly distributed as I thought it would?
> Also, can I specify the start key and the end key when I pre-split the
> table? I am not aware that there is such a way.
>
> Thanks,
>
> Kim
>
>
> On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
> > Hi Kim,
> >
> > The regions on the graph are ordered by size.
> >
> > When you split a region, let's say from 10gb to 2 x 5gb, it doesn't mean the
> > next writes are going to be balanced between the 2 regions. So at some
> > point, one should reach 10gb again, and the other one maybe still only
> > 9gb. So you will have this time 9gb, 5gb, 5gb.
> >
> > And so on.
> >
> > Also, based on the size of the rows, the blocks, etc., HBase might not be
> > able to split right in the middle of the region. So maybe you will get 6gb
> > and 4gb instead of 5 and 5.
> >
> > Now, add some deletes, some compactions, some manual splits, and you will
> > end up with a scenario like the one you sent.
> >
> > hth.
> >
> > JM
> >
> >
> > 2013/12/18 Kim Chew <[email protected]>
> >
> > > Sorry if it sounds like an open-ended question, but I am wondering why
> > > this scenario happened after many region splits,
> > >
> > > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> > >
> > > It seems to me that the writes are concentrated on the first two
> > > bars (regions) after the splits.
> > >
> > > Thanks.
> > >
> > > Kim
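
On the two questions in this thread (why one region per host keeps growing, and whether split points can be chosen up front), a small simulation may help. The sketch below is plain Python and does not talk to HBase. It assumes, hypothetically, that the bucket is a 3-digit prefix, that the reversed timestamp is computed as MAX_TS - timestamp and zero-padded to a fixed width, and that a region splits at its middle key; none of these details are confirmed in the thread. Under those assumptions, every newer row in a bucket sorts before all older rows in that bucket, so after a split the newer writes keep landing in the region whose start key is the bare bucket prefix.

```python
import bisect
import random

# Assumption: 13-digit millisecond timestamps, so reversed values stay fixed-width.
MAX_TS = 10**13 - 1


def split_keys(num_buckets):
    """Explicit split points for a table pre-split into one region per bucket."""
    return [f"{b:03d}" for b in range(1, num_buckets)]


def row_key(bucket, ts_millis, rng):
    """Build <bucket number><reversed timestamp><random number>, zero-padded."""
    reversed_ts = MAX_TS - ts_millis
    return f"{bucket:03d}{reversed_ts:013d}{rng.randrange(1000):03d}"


def region_index(boundaries, key):
    """Index of the region whose [start key, end key) range contains key."""
    return bisect.bisect_right(boundaries, key)


rng = random.Random(0)
boundaries = split_keys(3)  # ['001', '002'] -> three regions, as in the thread

# Phase 1: 300 rows with increasing timestamps, spread round-robin over buckets.
old_keys = [row_key(i % 3, 1_387_000_000_000 + i, rng) for i in range(300)]

# Simulate one split of bucket 001's region at its middle row key.
bucket1 = sorted(k for k in old_keys if k.startswith("001"))
mid = bucket1[len(bucket1) // 2]
bisect.insort(boundaries, mid)  # regions: [,001) [001,mid) [mid,002) [002,)

# Phase 2: 100 newer rows for bucket 001.
new_keys = [row_key(1, 1_387_000_001_000 + i, rng) for i in range(100)]
hits = [region_index(boundaries, k) for k in new_keys]

# Every newer row falls into region 1, i.e. [001, mid); region [mid, 002)
# receives no further writes.
print(sorted(set(hits)))
```

The list returned by split_keys is the kind of input an explicit pre-split takes. In HBase's Java client, the admin createTable call has an overload that accepts split keys as a byte[][], which is not exercised here. If the simulation matches the real key layout, moving the random component ahead of the reversed timestamp, or using more buckets, would be the things to experiment with.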
