Thanks, Sean, for the detailed reply. This is very useful. I think we can afford to shut down the cluster for a few hours at night and probably do this in batches. A few questions to draw on your experience:
1. By shutting down HBase, do you mean bringing down the RS, Master, and ZooKeeper? I was getting some errors when I tested merging with all three down.
2. Just to make sure, is this the merge utility you used?
   $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.Merge <tbl_name> <region_1> <region_2>

On Wed, Dec 1, 2010 at 6:09 AM, Sean Sechrist <[email protected]> wrote:

> Hey Abhijit,
>
> We ran into this same issue a while back, and here is what we did (and it
> seemed to work ok):
>
> 1. Went onto the HBase web UI for our biggest table, and grabbed all of
>    the region names (they appear in order on that page). Saved the region
>    names to a text file.
> 2. Wrote a shell script to run the hbase merge tool on every pair of
>    regions in that file.
> 3. Shut down HBase.
> 4. Ran that shell script. It went at about 50 merges/hour on our 5-node
>    cluster.
> 5. Started HBase. When it went back up we saw that our region count was
>    about 1500 regions, down from almost 3000.
>
> So this would only work if you can take down HBase for a decent amount of
> time.
>
> I wonder if you could alternatively run an Export job and an Import job of
> your table. Do those preserve the regions, or could you use them to bring
> down the number of regions?
>
> -Sean
>
> On Wed, Dec 1, 2010 at 2:24 AM, Abhijit Pol <[email protected]> wrote:
>
> > We have an HBase cluster which was peacefully (acceptable throughput and
> > latencies) serving for about a month (we are using version 0.89.20100926).
> >
> > This morning we wanted to set the TTL to a value smaller than the default,
> > and Mr. Murphy struck.
> >
> > (A) We disabled the table and altered it with the desired TTL value (using
> > the shell). Altering one property (in this case TTL) reset all other table
> > properties to their default values. For example, version was reset to 3,
> > compression was reset to NONE, etc. [I think this is a known issue with an
> > open Jira]
> >
> > (B) We wanted to go back to the previous table properties. Now, after
> > multiple retries, we were not able to disable the table (even restarting
> > the cluster didn't help). Most likely, if clients are hitting an HBase
> > table hard (in this case ~10k qps), it takes forever to disable the table.
> >
> > So we stopped all clients and then were able to disable the table and
> > alter the table properties to the desired values.
> >
> > (C) Because compression was reset to NONE and version was reset to 3 for a
> > good 10-12 hrs, the total number of regions tripled and the load
> > (#regions/RS) increased from 100 to 300. After the first major_compaction,
> > compression, version 1, and the new lower TTL became effective, and we
> > were back to the original HDFS footprint with a bunch of small regions.
> >
> > What will trigger merging of these regions? The tool for merging does not
> > seem to work, or even if it does, it can only do two regions at a time.
> >
> > Any suggestions on how we can reduce the number of regions and bring the
> > load back to where it was before?
> >
> > Thanks,
> > --Abhi
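
For reference, the script in step 2 of Sean's procedure (run the merge tool on every pair of regions in the saved file) might look roughly like the sketch below. It is not Sean's actual script. It assumes that regions.txt holds one full region name per line in key order, that HBase is already stopped, and that the Merge tool returns a nonzero exit status on failure (not verified here). The MERGE_CMD variable and the merge_pairs function are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: merge adjacent regions pairwise with the offline Merge utility.
# Assumptions (not from the thread): the region file has one full region name
# per line, in key order, and HBase is shut down before this runs.

# Hypothetical override so the loop can be dry-run with e.g. MERGE_CMD=echo.
MERGE_CMD=${MERGE_CMD:-"$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.Merge"}

merge_pairs() {
  local table="$1" region_file="$2" first second
  # Read two lines per iteration so each region is merged at most once.
  # With an odd number of regions, the last one is left untouched.
  while IFS= read -r first && IFS= read -r second; do
    echo "Merging: $first + $second"
    # MERGE_CMD is left unquoted on purpose so it splits into command + args.
    # </dev/null keeps the tool from consuming the rest of the region list.
    $MERGE_CMD "$table" "$first" "$second" </dev/null || {
      echo "Merge failed for: $first" >&2
      return 1
    }
  done < "$region_file"
}

# Usage (table and file names are placeholders):
#   merge_pairs my_table regions.txt
```

A single pass like this roughly halves the region count, which matches the drop from almost 3000 to about 1500 that Sean reports. Running it with MERGE_CMD=echo first prints the commands without touching the table, and splitting regions.txt into smaller files is one way to do the work in nightly batches.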
