Thanks Arun, and John,

Both of your scenarios make a lot of sense to me. But for the "sequence-based 
key" case, I am still confused. It is like an append-only operation, so new 
data is always written into the same region, but that region will eventually 
reach hbase.hregion.max.filesize and be split automatically. Why would a manual 
split still be needed? If we set hbase.hregion.max.filesize to a "not too big" 
value, won't that keep any region from growing too big?

And I think I first need to understand how HBase does the auto split internally 
(I am very new to HBase). Given a region with start key A and end key B, how 
does HBase split it internally? Does it split in the middle of the key range?
If the original region covers [A, B], would it be split into [A, A+(B-A)/2] 
and [A+(B-A)/2+1, B]? Then if most of the row keys fall in a small range 
[A, C], where C is very close to that midpoint, I can see a problem with auto 
split.

Is this true? Can HBase do split in other ways?
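
A minimal sketch of the concern in the question above, using plain Python rather than any HBase API. It compares a split at the arithmetic middle of the key range with a split at the median stored key. The integer keys, the region bounds, and all function names are illustrative assumptions. To my understanding, HBase's default behaviour is closer to the second option (it derives the split point from the stored data, roughly the middle row of the largest store file), but please verify that against the reference guide for your version.

```python
from bisect import bisect_left


def key_range_midpoint(start, end):
    """Arithmetic middle of the key range [start, end)."""
    return start + (end - start) // 2


def data_median_key(sorted_keys):
    """Key of the middle row actually stored in the region."""
    return sorted_keys[len(sorted_keys) // 2]


def split_counts(sorted_keys, split_key):
    """Rows landing in the left and right daughter regions for a split key."""
    left = bisect_left(sorted_keys, split_key)
    return left, len(sorted_keys) - left


# Hypothetical region covering [0, 1000): 90 rows packed into [0, 90)
# and 10 rows spread over [900, 1000).
keys = sorted(list(range(0, 90)) + list(range(900, 1000, 10)))

range_split = key_range_midpoint(0, 1000)
median_split = data_median_key(keys)

range_counts = split_counts(keys, range_split)
median_counts = split_counts(keys, median_split)

print("key-range midpoint:", range_split, "->", range_counts)
print("median stored key: ", median_split, "->", median_counts)
```

With skewed keys like these, the key-range midpoint leaves nearly all rows in one daughter region, while a split at the median stored key divides the rows evenly.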

Thanks,
Ming

-----Original Message-----
From: john guthrie [mailto:[email protected]] 
Sent: Wednesday, August 06, 2014 6:01 PM
To: [email protected]
Subject: Re: Why hbase need manual split?

i had a customer with a sequence-based key (yes, he knew all the downsides of 
that). being able to split manually meant he could split a region that got too 
big at the end instead of right down the middle. with a sequentially increasing 
key, splitting the region in half left one region at half the desired size and 
likely never to be written to again.
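
The effect described above can be sketched with a small simulation. This is plain Python, not HBase code, and the row counts, the size threshold, and the `simulate` helper are all hypothetical. Rows arrive with strictly increasing keys, so only the last region ever receives writes. Whenever that region reaches the threshold it is split, and the left part is closed for good.

```python
def simulate(num_rows, max_region_rows, split_percent):
    """Append rows with increasing keys and split the tail region when full.

    split_percent is where the split point falls inside the full region:
    50 mimics a split down the middle, 90 mimics a manual split near the end.
    Returns the final row count of each region, oldest first.
    """
    closed = []      # regions that will never be written to again
    open_rows = 0    # rows in the single region still receiving writes
    for _ in range(num_rows):
        open_rows += 1
        if open_rows >= max_region_rows:
            left = open_rows * split_percent // 100
            closed.append(left)   # left daughter keeps the older keys
            open_rows -= left     # right daughter keeps taking new keys
    return closed + [open_rows]


middle = simulate(1000, 100, 50)
near_end = simulate(1000, 100, 90)

print("split in the middle:", len(middle), "regions, sizes", sorted(set(middle)))
print("split near the end: ", len(near_end), "regions, sizes", sorted(set(near_end)))
```

In this toy model the midpoint split leaves every closed region at half the threshold, so the same data ends up spread over more regions than with a split near the end.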


On Wed, Aug 6, 2014 at 2:44 AM, Arun Allamsetty <[email protected]>
wrote:

> Hi Ming,
>
> The reason we have it is that the user can decide where each 
> key goes. I can think of multiple scenarios off the top of my head where 
> it would be useful, and others can correct me if I am wrong.
>
> 1. Cases where your row keys cannot be evenly distributed lexically, 
> leading to unequal loads on the regions. In such cases, 
> we can assign key ranges to different regions so that we 
> have a more equal distribution.
>
> 2. The second scenario I am thinking of may be wrong, and if it is, 
> correcting it will clear up my misconceptions. Suppose you cannot 
> denormalize your data and have to perform joins on a certain range of 
> row keys which are lexically similar. We split so that those keys are 
> assigned to the same region server (right?) and the join can be 
> performed locally.
>
> Cheers,
> Arun
>
> Sent from a mobile device. Please don't mind the typos.
> On Aug 6, 2014 12:30 AM, "Liu, Ming (HPIT-GADSC)" <[email protected]>
> wrote:
>
> > Hi, all,
> >
> > As I understand it, HBase will automatically split a region when the 
> > region is too big.
> > So in what scenario does a user need to do a manual split? Could someone 
> > kindly give me some examples where a user needs to do the region split 
> > explicitly via HBase Shell or Java API?
> >
> > Thanks very much.
> >
> > Regards,
> > Ming
> >
>
