Hello Jean-Marc,
I ran into a similar situation while using Hannibal to check the status
of my regions. My setup is HBase 0.94.8 on a cluster of three Region
Servers. The table is pre-split into three regions (which matches the
number of RS). My row key looks like this:
<bucket number><reversed timestamp><random number>
The "bucket number" is the number of regions. After the table is
created, it looks like this:
RS   start key   end key
0                001
1    001         002
2    002
After many region splits, I checked the region status sorted by host.
From the graph I could see that on each host there is one single
region that "sticks out", i.e. has the biggest size. Interestingly, the
start keys and end keys of these three regions are:
start key end key
000
001 001<the rest of the row key>
002 002<the rest of the row key>
I am interested in finding out why and how that happens. Maybe my row key
does not distribute the writes as evenly as I thought it would?
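
One possible explanation, sketched under assumptions: if the reversed
timestamp is computed as Long.MAX_VALUE minus the write time and
zero-padded to a fixed width, every new row sorts ahead of all older rows
in its bucket. New writes would then always land in the region whose start
key is the bare bucket prefix, which may be why one region per bucket keeps
growing. The class name, the field widths, and the reversal formula below
are hypothetical and not taken from the actual schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReversedTimestampOrder {

    // Hypothetical layout: 3-digit bucket, 19-digit reversed timestamp,
    // 4-digit random suffix. Fixed widths keep string order equal to numeric order.
    static String rowKey(int bucket, long timestampMillis, int random) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return String.format("%03d%019d%04d", bucket, reversed, random);
    }

    public static void main(String[] args) {
        long t0 = 1387400000000L; // an arbitrary instant in December 2013
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            // Five writes into bucket 1, one second apart.
            keys.add(rowKey(1, t0 + i * 1000L, 0));
        }

        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);

        // The most recent write is also the smallest key in the bucket,
        // so it sorts immediately after the bucket prefix "001".
        System.out.println("newest key: " + keys.get(4));
        System.out.println("lowest key: " + sorted.get(0));
    }
}
```

Both printed keys should be identical. If that matches the real key layout,
the bucket prefix spreads writes across buckets, but inside each bucket the
writes are not spread at all.
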
Also, can I specify the start key and the end key when I pre-split the
table? I am not aware of a way to do that.
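
For reference, the 0.94 Java client does accept explicit split points:
HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys) takes the
region boundaries directly, and there is also an overload that takes a
start key, an end key, and a number of regions. The sketch below only
builds the split-key array for the bucket prefixes described above, so it
runs without an HBase dependency. The class and method names are
hypothetical, and the 3-digit prefix width is an assumption.

```java
import java.nio.charset.StandardCharsets;

class BucketSplitKeys {

    // For N buckets, the N-1 split points "001", "002", ... yield N regions:
    // (-inf, 001), [001, 002), ..., [00(N-1), +inf).
    static byte[][] splitKeys(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = String.format("%03d", i).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : splitKeys(3)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
        // With the HBase client on the classpath, the array could be passed as:
        //   admin.createTable(tableDescriptor, splitKeys(3));
        // where admin is an HBaseAdmin and tableDescriptor an HTableDescriptor.
    }
}
```
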
Thanks,
Kim
On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <
[email protected]> wrote:
> Hi Kim,
>
> The regions on the graph are ordered by size.
>
> When you split a region, let's say from 10gb into 2 x 5gb, it doesn't mean
> the next writes are going to be balanced between the 2 regions. So at some
> point, one should reach 10gb again and split, while the other one is maybe
> still only 9gb. So this time you will have 9gb, 5gb, 5gb.
>
> And so on.
>
> Also, based on the size of the rows, the blocks, etc., HBase might not be
> able to split right in the middle of the region. So maybe you will get 6gb
> and 4gb instead of 5 and 5.
>
> Now, add some deletes, some compactions, some manual splits, and you will
> end up with a scenario like the one you sent.
>
> hth.
>
> JM
>
>
> 2013/12/18 Kim Chew <[email protected]>
>
> > Sorry if this sounds like an open-ended question, but I am wondering why
> > this scenario happened after many region splits:
> >
> > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> >
> > It seems to me that the writes are concentrated on the first two
> > bars (regions) after the splits.
> >
> > Thanks.
> >
> > Kim
> >
>