Just missing the ColumnFamily at the end of the path. Your memory is pretty good.
JM

2015-07-08 16:39 GMT-04:00 Vladimir Rodionov <[email protected]>:

> You can find this info yourself, Dejan
>
> 1. Locate table dir on HDFS
> 2. List all regions (directories)
> 3. Iterate files in each directory and find the oldest one (creation time)
> 4. The region with the oldest file is your candidate for major compaction
>
> /HBASE_ROOT/data/namespace/table/region (If my memory serves me right :))
>
> -Vlad
>
> On Wed, Jul 8, 2015 at 1:07 PM, Dejan Menges <[email protected]> wrote:
>
> > Hi Mikhail,
> >
> > Actually, reason is quite stupid on my side - to avoid compacting one
> > region over and over again while others are waiting in line (reading HTML
> > and sorting only on number of store files gets you at some point having
> > bunch of regions having exactly the same number of store files).
> >
> > Thanks for this hint - this is exactly something I was looking for. Was
> > trying previously to figure out if it's possible to query meta for this
> > information (using currently 0.98.0, 0.98.4 and waiting for HDP 2.3 from
> > Hortonworks to upgrade immediately) but for our current version didn't
> > found that possible, that's why I decided going this way.
> >
> > On Wed, Jul 8, 2015 at 10:02 PM Mikhail Antonov <[email protected]> wrote:
> >
> > > I totally understand the reasoning behind compacting regions with
> > > biggest number of store files, but didn't follow why it's best to
> > > compact regions which have biggest store files, maybe I'm missing
> > > something? I'd maybe compact regions which have the smallest avg
> > > storefile size?
> > >
> > > You may also want to take a look at
> > > https://issues.apache.org/jira/browse/HBASE-12859, and compact regions
> > > for which MC was last run longer time ago.
> > >
> > > -Mikhail
> > >
> > > On Wed, Jul 8, 2015 at 10:30 AM, Dejan Menges <[email protected]> wrote:
> > > > Hi Behdad,
> > > >
> > > > Thanks a lot, but this part I do already. My question was more what
> > > > to use to most intelligently (what exposed or not exposed metrics)
> > > > figure out where major compaction is needed the most.
> > > >
> > > > Currently, choosing the region which has biggest number of store
> > > > files + the biggest amount of store files is doing the job, but
> > > > wasn't sure if there's maybe something better so far to choose from.
> > > >
> > > > Cheers,
> > > > Dejan
> > > >
> > > > On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani <[email protected]> wrote:
> > > >
> > > >> To start major compaction for tablename from cli, you need to run:
> > > >> echo major_compact tablename | hbase shell
> > > >>
> > > >> I do this after bulk loading to the table.
> > > >>
> > > >> FYI, to avoid surprises, I also turn off load balancer and rebalance
> > > >> regions manually.
> > > >>
> > > >> The cli command to turn off balancer is:
> > > >> echo balance_switch false | hbase shell
> > > >>
> > > >> To rebalance regions after a bulk load or other changes, run:
> > > >> echo balance | hbase shell
> > > >>
> > > >> You can run these two command using ssh. I use Ansible to do these.
> > > >> Assuming you have defined hbase_master in your hosts file, you can run:
> > > >> ansible -i hosts hbase_master -a "echo major_compact tablename | hbase shell"
> > > >>
> > > >> Behdad Forghani
> > > >>
> > > >> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges <[email protected]> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > What's the best way to automate major compactions without enabling
> > > >> > it during off peak period?
> > > >> >
> > > >> > What I was testing is simple script which runs on every node in
> > > >> > cluster, checks if there is major compaction already running on
> > > >> > that node, if not picks one region for compaction and run
> > > >> > compaction on that one region.
> > > >> >
> > > >> > It's running for some time and it helped us get our data to much
> > > >> > better shape, but now I'm not quite sure how to choose anymore
> > > >> > which region to compact. So far I was reading for that node
> > > >> > rs-status#regionStoreStats and first choosing the one with biggest
> > > >> > amount of storefiles, and then those with biggest storefile sizes.
> > > >> >
> > > >> > Is there maybe something more intelligent I could/should do?
> > > >> >
> > > >> > Thanks a lot!
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
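
For reference, Vlad's four steps (with the column family added to the path) could be sketched roughly as below. This is only a minimal sketch under a few assumptions, not a tested tool: the function name and the sample paths are made up for illustration, the (path, mtime) pairs are assumed to come from a recursive listing of the table directory that you parse yourself, and HDFS modification time is used as a stand-in for creation time, on the assumption that store files are not rewritten after they are created.

```python
def region_with_oldest_store_file(table_dir, entries):
    """Return (region, mtime) for the region holding the oldest store file.

    table_dir: the table directory on HDFS (hypothetical example below).
    entries:   iterable of (hdfs_path, mtime) pairs for files under table_dir.
    Returns None when no store file is found.
    """
    prefix = table_dir.rstrip("/") + "/"
    oldest = None  # (mtime, region) of the oldest store file seen so far
    for path, mtime in entries:
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # Keep only paths shaped like region/column_family/file.
        if len(parts) != 3:
            continue
        region, family, _name = parts
        # Assumption: hidden/temporary directories and recovered edits
        # do not hold store files, so they are skipped.
        if region.startswith(".") or family.startswith("."):
            continue
        if family == "recovered.edits":
            continue
        if oldest is None or mtime < oldest[0]:
            oldest = (mtime, region)
    return None if oldest is None else (oldest[1], oldest[0])


# Hypothetical listing: region names and timestamps are illustrative only.
sample = [
    ("/hbase/data/default/t1/r1/cf/a", 300),
    ("/hbase/data/default/t1/r1/cf/b", 100),
    ("/hbase/data/default/t1/r2/cf/c", 50),
    ("/hbase/data/default/t1/r2/.tmp/x", 1),
    ("/hbase/data/default/t1/r3/.regioninfo", 0),
]
candidate = region_with_oldest_store_file("/hbase/data/default/t1", sample)
```

The region returned is the candidate for the next major compaction; running this once per pass should also avoid picking the same region over and over, since a freshly compacted region ends up with the newest files.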
