Thanks Ted. Is there a Java API for that? :)
Kim

On Thu, Dec 19, 2013 at 10:10 AM, Ted Yu <[email protected]> wrote:

> Kim:
> For specification of split keys at time of table creation, see Luke's
> comment on Dec 4th in this JIRA:
> HBASE-4163
>
> Cheers
>
>
> On Thu, Dec 19, 2013 at 10:05 AM, Kim Chew <[email protected]> wrote:
>
> > Hello Jean-Marc,
> >
> > I ran into a similar situation and I was using Hannibal to check the
> > regions' status. My setup is HBase 0.94.8 on a cluster of three Region
> > Servers. The table is pre-split into three regions (which matches the
> > number of RS). My row key looks like this,
> >
> > <bucket number><reversed timestamp><random number>
> >
> > The "bucket number" is the number of regions. After the table is
> > created, it looks like this,
> >
> > RS   start key   end key
> > 0                001
> > 1    001         002
> > 2    002
> >
> > After many region splits, I checked the regions' status, sorted by
> > host. I could see from the graph that on each host there is one single
> > region that "sticks out", i.e. has the biggest size. It is interesting
> > to find that the start keys and end keys for these three regions are
> >
> > start key end key
> > 000
> > 001 001<the rest of the row key>
> > 002 002<the rest of the row key>
> >
> > I am interested to find out why and how that happens. Maybe my row key
> > does not distribute the writes as evenly as I thought it would?
> > Also, can I specify the start key and the end key when I pre-split the
> > table? I am not aware that there is such a way.
> >
> > Thanks,
> >
> > Kim
> >
> >
> > On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Hi Kim,
> > >
> > > The regions on the graph are ordered by size.
> > >
> > > When you split a region, let's say from 10 GB into 2 x 5 GB, it doesn't
> > > mean the next writes are going to be balanced between the 2 regions.
> > > So at some point one of them will reach 10 GB again and split, while
> > > the other is maybe still at only 9 GB. So this time you will have
> > > 9 GB, 5 GB, 5 GB.
> > >
> > > And so on.
> > >
> > > Also, based on the size of the rows, the blocks, etc., HBase might not
> > > be able to split right in the middle of the region. So maybe you will
> > > get 6 GB and 4 GB instead of 5 and 5.
> > >
> > > Now, add some deletes, some compactions, some manual splits, and you
> > > will end up with a scenario like the one you sent.
> > >
> > > hth.
> > >
> > > JM
> > >
> > >
> > > 2013/12/18 Kim Chew <[email protected]>
> > >
> > > > Sorry if this sounds like an open-ended question, but I am wondering
> > > > why this scenario happened after many region splits,
> > > >
> > > > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> > > >
> > > > It seems to me that the writes are concentrated in the first two
> > > > bars (regions) after the splits.
> > > >
> > > > Thanks.
> > > >
> > > > Kim
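On the Java API question: as far as I know, the 0.94 client has HBaseAdmin.createTable(HTableDescriptor desc, byte[][] splitKeys), which takes the region boundaries explicitly. Below is a minimal sketch of building those boundaries for the three-digit bucket prefix described in the thread. The class name SplitKeys and the helper bucketSplitKeys are hypothetical names made up for illustration, and the HBase call itself appears only in a comment so the snippet compiles without the HBase jars.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class SplitKeys {

    // Hypothetical helper (not part of HBase): returns numBuckets - 1 split
    // keys, "001", "002", ..., one at each boundary between consecutive
    // three-digit bucket prefixes. The first region keeps an empty start key
    // and the last region keeps an empty end key.
    static byte[][] bucketSplitKeys(int numBuckets) {
        if (numBuckets < 1 || numBuckets > 1000) {
            throw new IllegalArgumentException("numBuckets must be in 1..1000");
        }
        byte[][] keys = new byte[numBuckets - 1][];
        for (int i = 1; i < numBuckets; i++) {
            // Locale.ROOT keeps the digits ASCII regardless of the default locale.
            keys[i - 1] = String.format(Locale.ROOT, "%03d", i)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the boundaries for a three-region table.
        for (byte[] key : bucketSplitKeys(3)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        // Assumption: with the HBase 0.94 client on the classpath, the array
        // would be passed at table creation, roughly like
        //   new HBaseAdmin(conf).createTable(tableDescriptor, bucketSplitKeys(3));
        // where conf and tableDescriptor are set up elsewhere.
    }
}
```

For three buckets this yields the two boundaries 001 and 002, which matches the initial region layout shown in the thread.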
