To start a major compaction of tablename from the CLI, run:

echo "major_compact 'tablename'" | hbase shell
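Here is a minimal sketch of wrapping that command in a bash function. It assumes bash and the hbase launcher on PATH. It sends several tables through one HBase shell session, so the shell's JVM startup cost is paid once rather than once per table. The names major_compact_tables and HBASE_CMD are hypothetical and introduced only for this example. Setting HBASE_CMD=cat prints the generated commands instead of sending them to HBase.

```shell
#!/usr/bin/env bash
# Sketch: request major compactions for one or more tables in a single
# non-interactive HBase shell session.
# HBASE_CMD is a hypothetical override (default: "hbase shell") so the
# generated commands can be inspected without a cluster, e.g. HBASE_CMD=cat.
set -euo pipefail

major_compact_tables() {
  local table
  for table in "$@"; do
    # The HBase shell is JRuby, so table names must be quoted strings.
    # Note: names containing a single quote are not escaped here.
    printf "major_compact '%s'\n" "$table"
  done | ${HBASE_CMD:-hbase shell}
}
```

For example, `major_compact_tables t1 t2` requests major compactions for both tables in one shell invocation.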
I do this after bulk loading into the table. FYI, to avoid surprises, I also turn off the load balancer and rebalance regions manually. The CLI command to turn off the balancer is:

echo "balance_switch false" | hbase shell

To rebalance regions after a bulk load or other changes, run:

echo "balancer" | hbase shell

You can run these commands remotely over ssh. I use Ansible for this. Assuming you have defined hbase_master in your hosts file, you can run (the shell module is needed because the default command module does not support pipes):

ansible -i hosts hbase_master -m shell -a "echo \"major_compact 'tablename'\" | hbase shell"

Behdad Forghani

On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges wrote:
> Hi,
>
> What's the best way to automate major compactions without enabling it
> during off peak period?
>
> What I was testing is simple script which runs on every node in cluster,
> checks if there is major compaction already running on that node, if not
> picks one region for compaction and run compaction on that one region.
>
> It's running for some time and it helped us get our data to much better
> shape, but now I'm not quite sure how to choose anymore which region to
> compact. So far I was reading for that node rs-status#regionStoreStats and
> first choosing the one with biggest amount of storefiles, and then those
> with biggest storefile sizes.
>
> Is there maybe something more intelligent I could/should do?
>
> Thanks a lot!
>
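The selection heuristic in the question picks the region with the most store files first, then the one with the largest store file size. That amounts to a two-key descending sort. The sketch below assumes you have already collected per-region stats as tab-separated lines, by whatever means you use, for example from the region server status page.

Each line holds three fields: region name, store file count, and store file size in MB. That input format, the function names, and HBASE_CMD are hypothetical. The sketch also omits the question's check for a major compaction already running on the node.

```shell
#!/usr/bin/env bash
# Sketch of the question's heuristic: pick the region with the most store
# files, breaking ties by total store file size, then major-compact it.
# Hypothetical input on stdin, one region per line:
#   <region_name> TAB <storefile_count> TAB <storefile_size_mb>
# HBASE_CMD is a hypothetical override (default: "hbase shell").
set -euo pipefail

pick_region() {
  # Sort descending by count (field 2), then by size (field 3).
  # awk reads all input (unlike head), so sort never gets SIGPIPE.
  sort -t $'\t' -k2,2nr -k3,3nr | awk -F '\t' 'NR == 1 { print $1 }'
}

compact_busiest_region() {
  local region
  region="$(pick_region)"
  # Empty input: nothing to compact.
  [ -n "$region" ] || return 0
  # major_compact also accepts a region name instead of a table name.
  printf "major_compact '%s'\n" "$region" | ${HBASE_CMD:-hbase shell}
}
```

Passing a region name instead of a table name to major_compact limits the compaction to that single region. That keeps the one-region-at-a-time approach from the question.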
