Currently, regions can only be merged while HBase is offline. An issue for online merging was opened a long time ago: https://issues.apache.org/jira/browse/HBASE-420. Some work was also started to at least allow merging regions in disabled tables: https://issues.apache.org/jira/browse/HBASE-1621, but it requires a lot more engineering.

J-D

On Thu, Sep 16, 2010 at 9:32 AM, Jinsong Hu <[email protected]> wrote:
>
> I did the test, instead of waiting for one day, I manually run major_compact
> and found the old data is indeed removed. For this part , it is working as
> advertised.
>
> However, I found that I end up having several regions that have no data
> inside.
> and the regions are not merged even though they are empty and consecutive.
>
> That means, if we run this in production system and key is chronological
> order, we will end up
> having thousands of regions as time goes on and the number of regions never
> decrease,
> even though old data are compacted away. we don't really mind having several
> empty regions, but the fact that the region number continue to grow
> unlimited without stop as time goes on, is really troublesome. It waste
> hadoop namenode resource, and waste memory resource on regionserver, as each
> region takes some memory to store region info.
>
> can this be added to the compaction task, to merge consecutive empty region
> into single one
> after data is processed ?
>
> Jimmy.
>
> --------------------------------------------------
> From: "Stack" <[email protected]>
> Sent: Thursday, September 16, 2010 8:39 AM
> To: <[email protected]>
> Subject: Re: hbase doesn't delete data older than TTL in old regions
>
>> You could change hbase.hregion.majorcompaction to be less than one day
>> so you don't have to wait so long. Make sure DEBUG is enabled (It
>> should be by default). With DEBUG, you'll be able to see compactions
>> running. Log will include type of compaction run.
>>
>> Thanks for testing,
>> St.Ack
>>
>> On Wed, Sep 15, 2010 at 10:43 PM, Jinsong Hu <[email protected]>
>> wrote:
>>>
>>> Hi, Stack:
>>> Thanks for the explanation. I looked at the code and it seems that the
>>> old
>>> region should get compacted
>>> and data older than TTL will get removed. I will do a test with a table
>>> with
>>> 10 min TTL , and insert several
>>> regions and wait for 1 day, and see if old records will indeed get
>>> removed
>>> or not.
>>>
>>> Jimmy.
>>>
>>> --------------------------------------------------
>>> From: "Stack" <[email protected]>
>>> Sent: Wednesday, September 15, 2010 9:53 PM
>>> To: <[email protected]>
>>> Subject: Re: hbase doesn't delete data older than TTL in old regions
>>>
>>>> On Wed, Sep 15, 2010 at 5:50 PM, Jinsong Hu <[email protected]>
>>>> wrote:
>>>>>
>>>>> One thing I am not clear about major compaction is that for the regions
>>>>> with
>>>>> a single map file,
>>>>> will hbase actually load it and remove the records older than TTL ?
>>>>
>>>> Major compactions will run even if only one file IFF this file is not
>>>> already the product of a major compaction (files that have been major
>>>> compacted get a marker in their metadata so next time a major
>>>> compaction runs we'll skip the file) AND the time since the last major
>>>> compaction is < TTL (See
>>>>
>>>>
>>>> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/Store.html#743).
>>>>
>>>> The RegionServer runs a Major Compaction checking thread... it runs on a
>>>> period.
>>>>
>>>> So, it should be doing what you want (if a little crudely given its
>>>> waiting TTL before rechecking if already major compacted.
>>>>
>>>> We could make improvement by looking at oldest timestamp every time we
>>>> run the major compaction check.
>>>>
>>>> St.Ack
>>>>
>>>
>>
>
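
For readers following the thread, the periodic check Stack describes above can be sketched roughly as follows. This is an illustrative Python sketch, not HBase's actual Java code: `StoreFile`, `needs_major_compaction`, and every parameter name are hypothetical, and the single-file rule reflects one reading of the thread (a file that is already the product of a major compaction is left alone until its age reaches the TTL).

```python
from dataclasses import dataclass
from typing import List, Optional

FOREVER = None  # hypothetical stand-in for "no TTL configured"
MINUTE_MS = 60 * 1000
DAY_MS = 24 * 60 * MINUTE_MS


@dataclass
class StoreFile:
    """Hypothetical stand-in for the metadata of one store file."""
    oldest_timestamp_ms: int  # age reference for the file
    major_compacted: bool     # marker set when a major compaction wrote the file


def needs_major_compaction(files: List[StoreFile], now_ms: int,
                           major_period_ms: int,
                           ttl_ms: Optional[int]) -> bool:
    """Decide whether the periodic check should request a major compaction.

    major_period_ms plays the role of hbase.hregion.majorcompaction;
    ttl_ms is the column family TTL (FOREVER if none is set).
    """
    if not files or major_period_ms <= 0:
        return False
    elapsed = now_ms - min(f.oldest_timestamp_ms for f in files)
    if elapsed < major_period_ms:
        return False  # the configured period has not passed yet
    if len(files) == 1 and files[0].major_compacted:
        # Already major compacted: rewriting it only helps once the data
        # can have outlived the TTL (assumption based on the thread).
        return ttl_ms is not FOREVER and elapsed >= ttl_ms
    return True
```

Under these assumptions, a region holding a single, already major-compacted file with a 10 minute TTL is picked up again once the compaction period has elapsed, while the same file with no TTL is skipped.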
