Related to this discussion, Jimmy added a function for checking compaction state in HBASE-6033, but it is only available in 0.95.

On Thu, Mar 21, 2013 at 10:49 AM, Jean-Daniel Cryans <[email protected]> wrote:

> On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church <[email protected]> wrote:
> > Hello all,
> >
> > As I understand it, a common performance tweak is to disable major
> > compactions so that you don't end up with storms taking things out at
> > inconvenient times. I'm thinking that I should just write a quick script to
> > rotate through all of our regions, one at a time, and compact them. Again,
> > if I'm understanding this correctly we should not end up with storms as
> > they'll only happen one at a time, and each one doesn't run for long. Does
> > that seem reasonable, or am I missing something? My hope is to run the
> > script regularly.
>
> FWIW major compacting isn't even needed if you don't update or delete
> cells so do consider that too.
>
> The problem with scheduling major compactions yourself is that, since
> the command is async, you can still end up with a storm of compactions
> if you just blindly issue major_compact for all your regions. Things
> like adding wait time works but then let's say you want the
> compactions to run only between 2 and 4AM then you can run out of
> time. What I have seen to circumvent this is to only do a subset of
> the regions at a time. You can also use JMX to monitor the compaction
> queue on each RS and make sure you are not just piling them up, but
> this requires some more work.
>
> > Corollary question... I recently added drives to our nodes and since I did
> > this while they were all still running, basically just restarting the
> > datanode underneath to pick up the new spindles, I'm fairly sure I've thrown
> > data locality out the window, based on the changed pattern of network
> > traffic.
>
> Interesting but unlikely. Even restarting HBase shouldn't do that
> unless it was wrongly restarted. Each RS publishes a locality index
> (hdfsBlocksLocalityIndex) that you can find via JMX or in their web
> UI, are they close to 100% or way down? Also which version are you on?
>
> > If I'm right, manually running major compactions against all of
> > the regions should resolve that, as the underlying data would all get
> > written locally. Again, does that make sense?
>
> Major compacting would do that yes, but first check if you need it at
> all I think.
>
> J-D
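
For what it's worth, the "subset of the regions at a time" approach J-D describes, combined with watching the compaction queue, could look roughly like the sketch below. This is only an illustration under assumptions: `compact_in_batches`, `issue_major_compact` and `queue_size` are hypothetical names, not HBase APIs. In practice `issue_major_compact` might wrap the shell's `major_compact` command and `queue_size` might read the compaction queue metric over JMX, but neither is implemented here.

```python
import time
from typing import Callable, Iterable, List


def compact_in_batches(
    regions: Iterable[str],
    batch_size: int,
    issue_major_compact: Callable[[str], None],  # hypothetical: fires one async compaction
    queue_size: Callable[[], int],               # hypothetical: current compaction queue length
    max_queue: int = 0,
    deadline: float = float("inf"),              # e.g. epoch seconds for 4AM
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Issue major compactions one batch at a time.

    Returns the regions that were not reached before the deadline,
    so the next run can pick up where this one stopped.
    """
    pending = list(regions)
    while pending and clock() < deadline:
        # The command is async, so wait for the queue to drain
        # before piling more work on top of it.
        while queue_size() > max_queue:
            if clock() >= deadline:
                return pending
            sleep(poll_seconds)
        batch, pending = pending[:batch_size], pending[batch_size:]
        for region in batch:
            issue_major_compact(region)
    return pending
```

Returning the leftover regions is a deliberate choice here: if the 2 to 4AM window runs out, the next night's run can start from that list instead of from the beginning.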
