Thanks Arun and John. Both of your scenarios make a lot of sense to me, but I am still confused about the "sequence-based key" case. It is like an append-only workload: new data is always written into the same (last) region, and that region will eventually reach hbase.hregion.max.filesize and be split automatically. So why is a manual split still needed? If we set hbase.hregion.max.filesize to a "not too big" value, won't that keep every region from growing too big?
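A toy model may help with the append-only question. This is a hypothetical simplification, not HBase code: every row has the same size, a region splits once it holds more than `MAX_ROWS` rows (a stand-in for hbase.hregion.max.filesize), and the automatic split is assumed to cut the region's data roughly in half. As far as I know, HBase's default policy takes the split point from the middle of the largest store file's data rather than bisecting the key range, but treat that as an assumption to check for your version. `near_tail` models the manual "split at the end" John describes for his sequence-key customer.

```python
from typing import Callable, List

# Hypothetical stand-in for hbase.hregion.max.filesize, measured in rows.
MAX_ROWS = 10


def midpoint(rows: List[int]) -> int:
    """Assumed automatic split: cut roughly in the middle of the region's data."""
    return len(rows) // 2


def near_tail(rows: List[int]) -> int:
    """Manual split 'at the end': keep a full region's worth of rows below
    the split key, so only the newest rows move to the upper daughter."""
    return MAX_ROWS


def simulate(n_rows: int, split_index: Callable[[List[int]], int]) -> List[int]:
    """Write keys 0..n_rows-1 in increasing order; return rows per region."""
    regions: List[List[int]] = [[]]
    for key in range(n_rows):
        # A sequential key is always the largest so far, so it lands in
        # the last region (the one with no end key).
        last = regions[-1]
        last.append(key)
        if len(last) > MAX_ROWS:
            i = split_index(last)
            # Replace the parent region with its two daughters.
            regions[-1:] = [last[:i], last[i:]]
    return [len(r) for r in regions]


if __name__ == "__main__":
    print(simulate(100, midpoint))   # [5, 5, ..., 5, 10]: 19 regions
    print(simulate(100, near_tail))  # [10, 10, ..., 10]: 10 regions
```

Because every write hits the last region, each midpoint split strands a half-full region that never receives another row, while splitting near the tail leaves every closed region full. Lowering the max file size only changes the size at which this happens; the closed regions still end up about half full.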
And I think I first need to understand how HBase does the auto split internally (I am very new to HBase). Given a region with start key A and end key B, how does HBase split it? In the middle of the key range? That is, the original region covers [A, B], so is it split into [A, (A+B)/2] and [(A+B)/2 + 1, B]? If most of the row keys fall in a small range [A, C], with C very close to (A+B)/2, then I can see a problem with auto split. Is this true? Can HBase split in other ways?

Thanks,
Ming

-----Original Message-----
From: john guthrie [mailto:[email protected]]
Sent: Wednesday, August 06, 2014 6:01 PM
To: [email protected]
Subject: Re: Why hbase need manual split?

I had a customer with a sequence-based key (yes, he knew all the downsides of that). Being able to split manually meant he could split a region that got too big at the end, instead of right down the middle. With a sequentially increasing key, splitting the region in half left one region at half the desired size and likely never to be added to.

On Wed, Aug 6, 2014 at 2:44 AM, Arun Allamsetty <[email protected]> wrote:
> Hi Ming,
>
> The reason why we have it is that the user can decide where each
> key goes. I can think of multiple scenarios off the top of my head where
> it would be useful, and others can correct me if I am wrong.
>
> 1. Cases where your row keys cannot be evenly distributed
> lexically, leading to unequal loads on the regions. In such cases,
> we can assign key ranges to different regions so that we get a more
> even distribution.
>
> 2. The second scenario I am thinking of may be wrong, and if it is,
> it'll clear up my misconceptions. Suppose you cannot denormalize your
> data and you have to perform joins on a certain range of row keys which
> are lexically similar. So we split them, they would be assigned to
> the same region server (right?), and the join would be performed locally.
>
> Cheers,
> Arun
>
> Sent from a mobile device. Please don't mind the typos.
> On Aug 6, 2014 12:30 AM, "Liu, Ming (HPIT-GADSC)" <[email protected]>
> wrote:
>
> > Hi, all,
> >
> > As I understand it, HBase will automatically split a region when the
> > region is too big.
> > So in what scenarios does a user need to do a manual split? Could someone
> > kindly give me some examples where a user needs to split a region
> > explicitly via the HBase Shell or Java API?
> >
> > Thanks very much.
> >
> > Regards,
> > Ming
> >
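Ming's key-range question and Arun's first scenario both come down to choosing split keys from where the data actually is rather than from the key range. A minimal sketch, assuming a sample of row keys is available: `quantile_split_keys`, `rows_per_region` and `midpoint_split_key` are hypothetical helpers, not HBase APIs, and the keys are 4-byte big-endian integers so that lexicographic byte order matches numeric order. Keys chosen this way are the kind you would then pass to a manual split (or a pre-split) through the HBase Shell or Java API.

```python
from bisect import bisect_right
from typing import List


def midpoint_split_key(lo: int, hi: int) -> bytes:
    """Naive bisection of an integer key range [lo, hi], for comparison."""
    return ((lo + hi) // 2).to_bytes(4, "big")


def quantile_split_keys(sample: List[bytes], n_regions: int) -> List[bytes]:
    """Hypothetical helper: pick n_regions - 1 split keys so each region
    gets roughly the same number of sampled rows."""
    keys = sorted(sample)
    step = len(keys) / n_regions
    return [keys[int(step * i)] for i in range(1, n_regions)]


def rows_per_region(keys: List[bytes], split_keys: List[bytes]) -> List[int]:
    """Count rows per region; start keys are inclusive, end keys exclusive."""
    counts = [0] * (len(split_keys) + 1)
    for k in keys:
        # The number of split keys <= k is the index of the region holding k.
        counts[bisect_right(split_keys, k)] += 1
    return counts


if __name__ == "__main__":
    # Skewed keys: 900 rows packed into [0, 900) and 100 rows in [9000, 9100).
    keys = [k.to_bytes(4, "big") for k in [*range(900), *range(9000, 9100)]]
    print(rows_per_region(keys, [midpoint_split_key(0, 9099)]))  # [900, 100]
    print(rows_per_region(keys, quantile_split_keys(keys, 2)))   # [500, 500]
```

Bisecting the key range leaves 900 of the 1,000 rows in one region, while quantile-based split keys divide them 500 and 500.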
