I ran the test. Instead of waiting for one day, I ran major_compact manually
and found that the old data was indeed removed. This part works as advertised.

However, I ended up with several regions that have no data in them,
and these regions are not merged even though they are empty and consecutive.

That means that if we run this on a production system where the keys are in chronological order, we will end up with thousands of regions as time goes on, and the number of regions will never decrease even though the old data is compacted away. We don't really mind having a few empty regions, but a region count that keeps growing without bound is really troublesome. It wastes Hadoop NameNode resources and memory on the regionservers, since each region takes some memory to store its region info.

Can this be added to the compaction task, so that consecutive empty regions
are merged into a single one after the data is processed?
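
To make the request concrete, here is a minimal sketch of the coalescing step, assuming the regions are already sorted by start key. The Region class, its fields, and mergeEmptyRuns are hypothetical names used only for illustration; this is not HBase API and says nothing about how a real merge would be carried out.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyRegionMerge {
    // Hypothetical stand-in for a region: key range [startKey, endKey) and stored size.
    static final class Region {
        final String startKey;
        final String endKey;
        final long sizeBytes;

        Region(String startKey, String endKey, long sizeBytes) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.sizeBytes = sizeBytes;
        }

        boolean isEmpty() {
            return sizeBytes == 0;
        }
    }

    // Collapses each run of adjacent empty regions into one empty region that
    // spans the whole run. Non-empty regions are passed through unchanged.
    static List<Region> mergeEmptyRuns(List<Region> sortedByStartKey) {
        List<Region> out = new ArrayList<>();
        for (Region r : sortedByStartKey) {
            Region prev = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean adjacentEmpties = prev != null
                    && prev.isEmpty()
                    && r.isEmpty()
                    && prev.endKey.equals(r.startKey);
            if (adjacentEmpties) {
                // Extend the previous empty region to cover this one as well.
                out.set(out.size() - 1, new Region(prev.startKey, r.endKey, 0L));
            } else {
                out.add(r);
            }
        }
        return out;
    }
}
```

With chronological keys, the empty regions pile up at the old end of the key space, so a pass like this would keep at most one empty region in front of the live data.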

Jimmy.

--------------------------------------------------
From: "Stack" <[email protected]>
Sent: Thursday, September 16, 2010 8:39 AM
To: <[email protected]>
Subject: Re: hbase doesn't delete data older than TTL in old regions

You could change hbase.hregion.majorcompaction to be less than one day
so you don't have to wait so long.  Make sure DEBUG is enabled (it
should be by default).  With DEBUG, you'll be able to see compactions
running.  The log will include the type of compaction run.
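
As a sketch of what that change looks like, the property goes into hbase-site.xml on the region servers and its value is in milliseconds. The one-hour value below is only an illustrative assumption for a test setup, not a recommendation.

```xml
<!-- hbase-site.xml: interval between major compactions, in milliseconds. -->
<!-- 3600000 ms = 1 hour; illustrative test value only. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>3600000</value>
</property>
```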

Thanks for testing,
St.Ack

On Wed, Sep 15, 2010 at 10:43 PM, Jinsong Hu <[email protected]> wrote:
Hi, Stack:
Thanks for the explanation. I looked at the code, and it seems that the old
regions should get compacted and data older than the TTL will be removed.
I will run a test on a table with a 10-minute TTL, insert enough data to
fill several regions, wait for one day, and see whether the old records are
indeed removed.

Jimmy.

--------------------------------------------------
From: "Stack" <[email protected]>
Sent: Wednesday, September 15, 2010 9:53 PM
To: <[email protected]>
Subject: Re: hbase doesn't delete data older than TTL in old regions

On Wed, Sep 15, 2010 at 5:50 PM, Jinsong Hu <[email protected]> wrote:

One thing I am not clear about with major compaction: for regions with a
single map file, will HBase actually load the file and remove the records
older than the TTL?

Major compactions will run even if there is only one file, unless that file
is already the product of a major compaction (files that have been major
compacted get a marker in their metadata so the next time a major
compaction runs we'll skip the file) and the time since that major
compaction is still less than the TTL (see
http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/Store.html#743).
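
The rule for the single-file case can be sketched as a small predicate. This is a simplified paraphrase, not the code in Store.java. It reads the rule together with the later remark about waiting TTL before rechecking: a lone file that is already major compacted is skipped until a full TTL has passed since it was written. The class name, method name, parameters, and the FOREVER marker are all hypothetical.

```java
final class MajorCompactionCheck {
    // Hypothetical marker meaning "no TTL configured".
    static final long FOREVER = Long.MAX_VALUE;

    // Called once the periodic check has decided a major compaction is due.
    // Returns false only for the one case that is skipped: a single store file
    // that already carries the major-compaction marker and is younger than the TTL.
    static boolean shouldMajorCompact(int fileCount,
                                      boolean alreadyMajorCompacted,
                                      long elapsedSinceLastMajorMs,
                                      long ttlMs) {
        boolean singleCompactedFile = fileCount == 1 && alreadyMajorCompacted;
        boolean nothingCanHaveExpired =
                ttlMs == FOREVER || elapsedSinceLastMajorMs < ttlMs;
        return !(singleCompactedFile && nothingCanHaveExpired);
    }
}
```

Under this reading, a table with a 10-minute TTL has its lone, already-compacted file rewritten on the first check that falls more than 10 minutes after the previous major compaction.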

The RegionServer runs a major compaction checking thread, which runs
periodically.

So, it should be doing what you want (if a little crudely, given it waits
a full TTL before rechecking a file that is already major compacted).

We could improve this by looking at the oldest timestamp every time we
run the major compaction check.
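
A minimal sketch of that proposed check, assuming the oldest cell timestamp of a store file is available when the check runs. The class, method, and parameter names are hypothetical, and this is only an illustration of the idea, not existing HBase code.

```java
final class ExpiredDataCheck {
    // True when the oldest cell in the file has outlived the TTL, meaning a
    // major compaction would actually drop some data.
    static boolean hasExpiredData(long nowMs, long oldestCellTimestampMs, long ttlMs) {
        return nowMs - oldestCellTimestampMs > ttlMs;
    }
}
```

Compared with waiting a full TTL after the last major compaction, this asks directly whether anything in the file is old enough to be removed.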

St.Ack


