I ended up writing a tool which helps merge a table's regions down to a target number of regions. For example, if you want to go from N regions to N/8, the tool figures out the grouping and merges them in one pass. I will put it up in a GitHub repo soon and share it here.
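To give an idea of the grouping step, here is a rough sketch. It is not the actual tool, just an illustration under some assumptions: the regions are listed in start-key order, only adjacent regions are merged together, and the function name `group_adjacent` and the region names are made up for the example.

```python
def group_adjacent(regions, target):
    """Split an ordered list of regions into `target` contiguous groups.

    Assumes `regions` is sorted by start key, so every group contains
    only neighbouring regions. Group sizes differ by at most one.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    n = len(regions)
    if n == 0:
        return []
    # Cannot produce more groups than there are regions.
    target = min(target, n)
    base, extra = divmod(n, target)
    groups = []
    start = 0
    for i in range(target):
        # The first `extra` groups take one additional region each.
        size = base + (1 if i < extra else 0)
        groups.append(regions[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical region names, 16 regions merged down to 2 (N --> N/8).
    regions = ["region-%02d" % i for i in range(16)]
    for group in group_adjacent(regions, len(regions) // 8):
        print(group[0], "..", group[-1], "(%d regions)" % len(group))
```

Each resulting group would then be merged into a single region; how that merge is carried out depends on the HBase version and is outside this sketch.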
The sad part of this approach is the downtime required. It's taking over 2 hours on my test cluster, which holds less than 30% of the production table size. In absolute terms, the table has over 100 regions, which I am merging down to 20 or so, and it has 20GB of compressed (LZO) data.

Is there a better way to achieve this? If not, should I open a JIRA to explore the possibility of running the Merge util on a disabled table rather than having to shut down the entire cluster? It would also be great to skip compaction when merging the table and do it as a later step, since that can happen online.

Just throwing some ideas out here.

Thanks,
Viral

On Tue, Jul 2, 2013 at 11:22 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> Hi Viral,
>
> It was working fine when I did it. I'm not sure you can still apply it
> to a recent HBase version because some code has changed. But I can take a
> look to see if I can rebase it...
>
> JM
