Currently, merging regions can only be done while HBase is offline. A
long time ago this issue was opened:
https://issues.apache.org/jira/browse/HBASE-420. Some work was also done
to at least be able to merge regions in disabled tables:
https://issues.apache.org/jira/browse/HBASE-1621, but it requires a lot
more engineering.

J-D

On Thu, Sep 16, 2010 at 9:32 AM, Jinsong Hu <[email protected]> wrote:
>
> I did the test. Instead of waiting for one day, I manually ran major_compact
> and found the old data is indeed removed. For this part, it is working as
> advertised.
>
> However, I found that I end up with several regions that have no data
> inside, and the regions are not merged even though they are empty and
> consecutive.
>
> That means that if we run this in a production system with keys in
> chronological order, we will end up having thousands of regions as time
> goes on, and the number of regions never decreases, even though old data
> is compacted away. We don't really mind having several empty regions, but
> the fact that the number of regions continues to grow without limit as
> time goes on is really troublesome. It wastes Hadoop namenode resources
> and memory on the regionservers, as each region takes some memory to
> store region info.
>
> Can this be added to the compaction task, to merge consecutive empty regions
> into a single one after the data is processed?
>
> Jimmy.
>
> --------------------------------------------------
> From: "Stack" <[email protected]>
> Sent: Thursday, September 16, 2010 8:39 AM
> To: <[email protected]>
> Subject: Re: hbase doesn't delete data older than TTL in old regions
>
>> You could change hbase.hregion.majorcompaction to be less than one day
>> so you don't have to wait so long.  Make sure DEBUG is enabled (It
>> should be by default).  With DEBUG, you'll be able to see compactions
>> running.  Log will include type of compaction run.
>>
>> Thanks for testing,
>> St.Ack
>>
>> On Wed, Sep 15, 2010 at 10:43 PM, Jinsong Hu <[email protected]>
>> wrote:
>>>
>>> Hi, Stack:
>>>  Thanks for the explanation. I looked at the code and it seems that the
>>> old region should get compacted and data older than TTL will get removed.
>>> I will do a test with a table with a 10 min TTL, insert several regions,
>>> wait for 1 day, and see whether old records do indeed get removed.
>>>
>>> Jimmy.
>>>
>>> --------------------------------------------------
>>> From: "Stack" <[email protected]>
>>> Sent: Wednesday, September 15, 2010 9:53 PM
>>> To: <[email protected]>
>>> Subject: Re: hbase doesn't delete data older than TTL in old regions
>>>
>>>> On Wed, Sep 15, 2010 at 5:50 PM, Jinsong Hu <[email protected]>
>>>> wrote:
>>>>>
>>>>> One thing I am not clear about with major compaction is whether, for
>>>>> regions with a single map file, hbase will actually load it and remove
>>>>> the records older than TTL?
>>>>
>>>> Major compactions will run even if there is only one file, unless this
>>>> file is already the product of a major compaction (files that have been
>>>> major compacted get a marker in their metadata so the next time a major
>>>> compaction runs we'll skip the file) and the time since the last major
>>>> compaction is < TTL (see
>>>> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/Store.html#743).
>>>>
>>>> The RegionServer runs a Major Compaction checking thread... it runs on a
>>>> period.
>>>>
>>>> So, it should be doing what you want (if a little crudely, given it's
>>>> waiting TTL before rechecking if already major compacted).
>>>>
>>>> We could make an improvement by looking at the oldest timestamp every
>>>> time we run the major compaction check.
>>>>
>>>> St.Ack
>>>>
>>>
>>
>
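
For reference, the single-file check Stack describes in the quoted thread
can be sketched roughly as follows. This is an illustrative sketch only, not
the actual Store code: the class name, method name, parameters, and the
FOREVER sentinel are all hypothetical, and it assumes the reading that a
store holding one already-major-compacted file is skipped until the time
since the last major compaction reaches the TTL.

```java
// Hypothetical sketch of the decision described in the thread.
// Not the real HBase Store implementation; all names are illustrative.
final class MajorCompactionCheck {
    // Assumed sentinel meaning "this column family has no TTL".
    static final long FOREVER = Long.MAX_VALUE;

    /**
     * @param fileCount                  number of store files in the store
     * @param singleFileIsMajorCompacted whether the only file carries the
     *                                   "product of a major compaction" marker
     * @param elapsedMs                  time since the last major compaction
     * @param periodMs                   configured major compaction period
     *                                   (0 is treated here as "disabled")
     * @param ttlMs                      column family TTL in milliseconds
     */
    static boolean shouldMajorCompact(int fileCount,
                                      boolean singleFileIsMajorCompacted,
                                      long elapsedMs,
                                      long periodMs,
                                      long ttlMs) {
        // Nothing to compact, or periodic major compaction switched off.
        if (fileCount == 0 || periodMs == 0) {
            return false;
        }
        // The period has not elapsed yet, so the checker does nothing.
        if (elapsedMs < periodMs) {
            return false;
        }
        // A lone file that is already major compacted is skipped unless
        // enough time has passed for its contents to have outlived the TTL.
        boolean skip = fileCount == 1
                && singleFileIsMajorCompacted
                && (ttlMs == FOREVER || elapsedMs < ttlMs);
        return !skip;
    }
}
```

Under these assumptions, a table with a 10 minute TTL and a one-day period
would have its single major-compacted file rewritten at the next check after
the period elapses, while the same file with no TTL would be skipped.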
