I updated the ticket with our discussion and added the following comments:

What I suggest is to make the sweep part of major_compact. Basically, it needs to merge each run of consecutive empty regions into the neighboring region that is not empty. It needs to merge the corresponding records in the .META. table and delete the empty directories in HDFS for the empty regions. It should then instruct the region servers to unload the original regions and load the merged regions.
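A minimal sketch of just the planning step described above, in script form, under stated assumptions: `Region` and `plan_merges` are hypothetical names (not HBase APIs), a region counts as "empty" when it has zero store files, and a run of empty regions is folded into the non-empty region just before it (or, for a run at the very start of the table, the one just after it). It does not touch .META., HDFS, or the region servers; those steps are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a .META. row (not an HBase class)."""
    name: str
    start_key: str   # "" is the first region of the table
    store_files: int

    @property
    def empty(self) -> bool:
        # Assumption: "empty" means no store files left after compaction.
        return self.store_files == 0


def plan_merges(regions: List[Region]) -> List[Tuple[str, List[str]]]:
    """Return (target, regions_to_absorb) pairs for every run of empty regions."""
    ordered = sorted(regions, key=lambda r: r.start_key)  # key order keeps runs contiguous
    plan: Dict[str, List[str]] = {}  # insertion-ordered (Python 3.7+)
    i, n = 0, len(ordered)
    while i < n:
        if not ordered[i].empty:
            i += 1
            continue
        j = i
        while j < n and ordered[j].empty:  # extend to the end of the empty run
            j += 1
        run = [r.name for r in ordered[i:j]]
        if i > 0:
            target = ordered[i - 1].name   # preceding non-empty neighbour
        elif j < n:
            target = ordered[j].name       # leading run: following neighbour
        else:
            target, run = run[0], run[1:]  # whole table empty: keep one region
        if run:
            plan.setdefault(target, []).extend(run)
        i = j
    return list(plan.items())


if __name__ == "__main__":
    table = [
        Region("r1", "", 0),
        Region("r2", "b", 3),
        Region("r3", "d", 0),
        Region("r4", "f", 0),
        Region("r5", "h", 2),
    ]
    print(plan_merges(table))  # → [('r2', ['r1', 'r3', 'r4'])]
```

Preferring the preceding neighbour is only one choice; any adjacent non-empty region works, as long as each merge covers a contiguous key range.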

Jimmy.

--------------------------------------------------
From: "Stack" <[email protected]>
Sent: Thursday, September 16, 2010 9:49 AM
To: <[email protected]>
Subject: Re: hbase doesn't delete data older than TTL in old regions

On Thu, Sep 16, 2010 at 9:32 AM, Jinsong Hu <[email protected]> wrote:
That means that if we run this in a production system where keys are in
chronological order, we will end up with thousands of regions as time goes
on, and the number of regions never decreases, even though the old data are
compacted away. We don't really mind having several empty regions, but the
fact that the number of regions keeps growing without limit over time is
really troublesome. It wastes Hadoop NameNode resources and regionserver
memory, since each region takes some memory to store its region info.


Agreed.

It'd be easy enough to write a script to do this, run out of cron, but
yeah, we should have a facility to sweep hbase and, in particular, merge
regions that are empty of store files into a neighbour.

Would you mind updating hbase-2999 to make it clear what is needed to
satisfy the issue? The clearer the stipulation, the easier it is on
the implementor (patches are also accepted if you'd like to have a go at
this yourself).

St.Ack
