Re: uneven regions size after region split.

Kim Chew Thu, 19 Dec 2013 10:17:59 -0800

Thanks Ted,

Is there a Java API for that? :)
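
For reference, the HBase 0.94 Java client accepts split keys at creation time through HBaseAdmin.createTable(HTableDescriptor desc, byte[][] splitKeys), which creates one region per key range. The sketch below only builds the split keys. It assumes 3-digit zero-padded bucket prefixes like the 001/002 keys in the region table further down this thread. bucketSplitKeys is a hypothetical helper, and the createTable call is left as a comment so the snippet runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;

public class SplitKeys {
    // Builds numBuckets - 1 split keys "001", "002", ... (numBuckets >= 1).
    // The 3-digit zero padding is an assumption; it makes the keys sort
    // byte-wise in the same order as the bucket numbers.
    static byte[][] bucketSplitKeys(int numBuckets) {
        byte[][] keys = new byte[numBuckets - 1][];
        for (int i = 1; i < numBuckets; i++) {
            keys[i - 1] = String.format("%03d", i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] splits = bucketSplitKeys(3);
        for (byte[] k : splits) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
        // With the HBase 0.94 client on the classpath, the keys would be used as:
        //   HBaseAdmin admin = new HBaseAdmin(conf);
        //   admin.createTable(tableDescriptor, splits);
    }
}
```

With three buckets this yields the keys 001 and 002, i.e. the regions [start, 001), [001, 002) and [002, end). HBaseAdmin also has a createTable(desc, startKey, endKey, numRegions) overload that derives the split keys from a key range, but for prefix-based keys like these, explicit split keys are easier to reason about.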


Kim


On Thu, Dec 19, 2013 at 10:10 AM, Ted Yu <[email protected]> wrote:

> Kim:
> For specifying split keys at table creation time, see Luke's
> comment on Dec 4th in this JIRA:
> HBASE-4163
>
> Cheers
>
>
> On Thu, Dec 19, 2013 at 10:05 AM, Kim Chew <[email protected]> wrote:
>
> > Hello Jean-Marc,
> >
> > I ran into a similar situation and I was using Hannibal to check the
> > regions' status. My setup is HBase 0.94.8 on a three-RegionServer
> > cluster. The table is pre-split into three regions (which matches the
> > number of RSs). My row key looks like this:
> >
> >          <bucket number><reversed timestamp><random number>
> >
> > The number of buckets equals the number of regions. After the table is
> > created, it looks like this:
> >
> > RS    start key    end key
> > 0                  001
> > 1     001          002
> > 2     002
> >
> > After many region splits, I checked the regions' status, sorted by
> > host, and could see from the graph that on each host there is one single
> > region that "sticks out", i.e. has the biggest size. Interestingly, the
> > start keys and end keys for these three regions are:
> >
> > start key     end key
> >                   000
> > 001             001<the rest of the row key>
> > 002             002<the rest of the row key>
> >
> > I am interested to find out why and how that happens. Maybe my row key
> > does not distribute the writes as evenly as I thought it would?
> > Also, can I specify the start key and the end key when I pre-split the
> > table? I am not aware of such a way.
> >
> > Thanks,
> >
> > Kim
> >
> >
> > On Wed, Dec 18, 2013 at 6:15 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi Kim,
> > >
> > > The regions on the graph are ordered by size.
> > >
> > > Splitting a region, let's say from 10gb into 2 x 5gb, doesn't mean the
> > > next writes are going to be balanced between the 2 regions. So at some
> > > point, one of them may reach 10gb again (and split into 5gb + 5gb) while
> > > the other one is still only at 9gb. So this time you will have 9gb, 5gb,
> > > 5gb.
> > >
> > > And so on.
> > >
> > > Also, depending on the size of the rows, the blocks, etc., HBase might
> > > not be able to split right in the middle of the region. So you may get
> > > 6gb and 4gb instead of 5gb and 5gb.
> > >
> > > Now add some deletes, some compactions, and some manual splits, and you
> > > will end up with a scenario like the one you sent.
> > >
> > > hth.
> > >
> > > JM
> > >
> > >
> > > 2013/12/18 Kim Chew <[email protected]>
> > >
> > > > Sorry if this sounds like an open-ended question, but I am wondering
> > > > why this scenario happened after many region splits:
> > > >
> > > > https://github.com/sentric/hannibal/wiki/Usage#wiki-region_splits
> > > >
> > > > It seems to me that the writes are concentrated in the first two
> > > > bars (regions) after the splits.
> > > >
> > > > Thanks.
> > > >
> > > > Kim
> > > >
> > >
> >
>
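
One possible explanation for the regions that "stick out", assuming the reversed timestamp is computed as Long.MAX_VALUE - timestamp and written as a fixed-width decimal string (the exact encoding is not given in the thread): within a bucket, every new row sorts before all older rows. Each bucket's newest writes therefore keep landing in the region that holds the low end of that bucket's key range, e.g. the region from 001 to 001<the rest of the row key>, while the regions above it stop growing. The rowKey helper and its field widths are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKey {
    // Hypothetical layout: 3-digit bucket, 19-digit reversed timestamp,
    // 4-digit random suffix (random must be in [0, 10000)).
    static String rowKey(int bucket, long timestampMillis, int random) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return String.format("%03d%019d%04d", bucket, reversed, random);
    }

    public static void main(String[] args) {
        long t = System.currentTimeMillis();
        String earlier = rowKey(1, t, ThreadLocalRandom.current().nextInt(10000));
        String later = rowKey(1, t + 1000, ThreadLocalRandom.current().nextInt(10000));
        // A later write sorts BEFORE an earlier one in the same bucket, so new
        // rows always go to the lowest part of bucket 001's key range.
        System.out.println(later.compareTo(earlier) < 0); // prints "true"
    }
}
```

HBase compares row keys as unsigned bytes; for ASCII digits that order matches String.compareTo, which is what this sketch relies on.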

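Jean-Marc's 10gb into 5gb + 5gb example can be replayed with a toy model. This is a simplification, not HBase's actual split logic: each region receives a given amount of writes and is cut exactly in half once it reaches a maximum size (10 GB here, standing in for the configured hbase.hregion.max.filesize). The applyWrites helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSimulation {
    // Adds writesGb[i] to region i, then splits any region that reached maxGb
    // into two equal halves (real HBase splits at a key, not exactly in half).
    static List<Long> applyWrites(List<Long> sizesGb, long[] writesGb, long maxGb) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < sizesGb.size(); i++) {
            long size = sizesGb.get(i) + writesGb[i];
            if (size >= maxGb) {
                result.add(size / 2);
                result.add(size - size / 2);
            } else {
                result.add(size);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> regions = new ArrayList<>();
        regions.add(0L);
        regions = applyWrites(regions, new long[] {10}, 10);   // [5, 5]
        regions = applyWrites(regions, new long[] {5, 4}, 10); // [5, 5, 9]
        System.out.println(regions); // prints "[5, 5, 9]"
    }
}
```

Sorted by size, as on the Hannibal graph, that is 9gb, 5gb, 5gb: unbalanced writes alone are enough to make one region stand out.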

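The 6gb/4gb case from Jean-Marc's reply can be sketched the same way. Assume the region is made of indivisible chunks (rows or blocks) and the split can only fall on a boundary between them, so the daughters are generally uneven. splitIndex is a hypothetical helper, and the chunk sizes are made up to reproduce the 6/4 example.

```java
public class SplitPoint {
    // Returns how many leading chunks go to the lower daughter region: the
    // split is placed at the first chunk boundary where the cumulative size
    // reaches at least half of the total. Chunks are never cut in two.
    static int splitIndex(long[] chunkSizes) {
        long total = 0;
        for (long s : chunkSizes) {
            total += s;
        }
        long cumulative = 0;
        for (int i = 0; i < chunkSizes.length; i++) {
            cumulative += chunkSizes[i];
            if (2 * cumulative >= total) {
                return i + 1;
            }
        }
        return chunkSizes.length;
    }

    public static void main(String[] args) {
        long[] chunksGb = {4, 2, 4}; // hypothetical sizes, 10 GB in total
        int idx = splitIndex(chunksGb);
        long lower = 0;
        long upper = 0;
        for (int i = 0; i < chunksGb.length; i++) {
            if (i < idx) {
                lower += chunksGb[i];
            } else {
                upper += chunksGb[i];
            }
        }
        System.out.println(lower + " GB / " + upper + " GB"); // prints "6 GB / 4 GB"
    }
}
```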