Thanks a lot to everyone - very nice point about looking for the oldest file and taking locality into consideration. Going to implement it now :)
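Bryan's three criteria below (locality, store-file count, age of the oldest file) can be combined into a single lexicographic ranking. A minimal sketch in Python - the input format and field names here are hypothetical, not anything HBase exposes under these names; you'd populate them from the HDFS getBlockLocations APIs and the region server metrics yourself:

```python
# Sketch of the selection policy Bryan describes: rank regions by
# locality first, then store-file count, then age of the oldest HFile.
# The dict keys below are made up for illustration.

def pick_compaction_candidate(regions):
    """Return the region most in need of a major compaction.

    `regions` is a list of dicts with hypothetical keys:
      name             - region encoded name
      locality         - fraction of HDFS blocks local to the RS (0.0-1.0)
      storefiles       - number of HFiles across the region's CFs
      oldest_file_age  - age in seconds of the region's oldest HFile
    """
    return min(
        regions,
        key=lambda r: (
            r["locality"],          # worst locality first (lowest fraction)
            -r["storefiles"],       # then most store files
            -r["oldest_file_age"],  # then longest since a compaction
        ),
    )
```

Because the key is lexicographic, store-file count only breaks ties between regions with equal locality, matching the priority order Bryan gives.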
On Wed, Jul 8, 2015 at 10:57 PM Bryan Beaudreault <[email protected]> wrote:

> Our automation uses a combination of the following to determine what to
> compact:
>
> - Which regions have bad locality (% of blocks that are local vs remote,
>   using HDFS getBlockLocations APIs)
> - Which regions have the most HFiles (most files per region/cf directory)
> - Which regions have gone the longest since a compaction (oldest file)
>
> The order here is the priority we have given each, but YMMV. We run in
> EC2, so we value locality over almost everything, to avoid network
> latencies on reads.
>
> On Wed, Jul 8, 2015 at 4:48 PM Jean-Marc Spaggiari
> <[email protected]> wrote:
>
> > Just missing the ColumnFamily at the end of the path. Your memory is
> > pretty good.
> >
> > JM
> >
> > 2015-07-08 16:39 GMT-04:00 Vladimir Rodionov <[email protected]>:
> >
> > > You can find this info yourself, Dejan:
> > >
> > > 1. Locate the table dir on HDFS
> > > 2. List all regions (directories)
> > > 3. Iterate over the files in each directory and find the oldest one
> > >    (creation time)
> > > 4. The region with the oldest file is your candidate for major
> > >    compaction
> > >
> > > /HBASE_ROOT/data/namespace/table/region (if my memory serves me
> > > right :))
> > >
> > > -Vlad
> > >
> > > On Wed, Jul 8, 2015 at 1:07 PM, Dejan Menges <[email protected]>
> > > wrote:
> > >
> > > > Hi Mikhail,
> > > >
> > > > Actually, the reason is quite stupid on my side - to avoid
> > > > compacting one region over and over again while others are waiting
> > > > in line (reading the HTML and sorting only on the number of store
> > > > files gets you, at some point, a bunch of regions having exactly
> > > > the same number of store files).
> > > >
> > > > Thanks for this hint - this is exactly something I was looking for.
> > > > Was trying previously to figure out if it's possible to query meta
> > > > for this information (currently using 0.98.0 and 0.98.4, and
> > > > waiting for HDP 2.3 from Hortonworks to upgrade immediately), but
> > > > for our current version I didn't find that possible, which is why
> > > > I decided to go this way.
> > > >
> > > > On Wed, Jul 8, 2015 at 10:02 PM Mikhail Antonov
> > > > <[email protected]> wrote:
> > > >
> > > > > I totally understand the reasoning behind compacting regions with
> > > > > the biggest number of store files, but I didn't follow why it's
> > > > > best to compact regions which have the biggest store files -
> > > > > maybe I'm missing something? I'd maybe compact regions which have
> > > > > the smallest avg storefile size?
> > > > >
> > > > > You may also want to take a look at
> > > > > https://issues.apache.org/jira/browse/HBASE-12859, and compact
> > > > > regions for which MC was last run the longest time ago.
> > > > >
> > > > > -Mikhail
> > > > >
> > > > > On Wed, Jul 8, 2015 at 10:30 AM, Dejan Menges
> > > > > <[email protected]> wrote:
> > > > > > Hi Behdad,
> > > > > >
> > > > > > Thanks a lot, but this part I do already. My question was more
> > > > > > about what to use (which metrics, exposed or not) to most
> > > > > > intelligently figure out where major compaction is needed the
> > > > > > most.
> > > > > >
> > > > > > Currently, choosing the region which has the biggest number of
> > > > > > store files + the biggest total store file size is doing the
> > > > > > job, but I wasn't sure if there's maybe something better to
> > > > > > choose from.
> > > > > >
> > > > > > Cheers,
> > > > > > Dejan
> > > > > >
> > > > > > On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > >> To start major compaction for tablename from the CLI, you need
> > > > > >> to run:
> > > > > >> echo major_compact tablename | hbase shell
> > > > > >>
> > > > > >> I do this after bulk loading into the table.
> > > > > >>
> > > > > >> FYI, to avoid surprises, I also turn off the load balancer and
> > > > > >> rebalance regions manually.
> > > > > >>
> > > > > >> The CLI command to turn off the balancer is:
> > > > > >> echo balance_switch false | hbase shell
> > > > > >>
> > > > > >> To rebalance regions after a bulk load or other changes, run:
> > > > > >> echo balancer | hbase shell
> > > > > >>
> > > > > >> You can run these two commands using ssh. I use Ansible to do
> > > > > >> these. Assuming you have defined hbase_master in your hosts
> > > > > >> file, you can run:
> > > > > >> ansible -i hosts hbase_master -a "echo major_compact tablename
> > > > > >> | hbase shell"
> > > > > >>
> > > > > >> Behdad Forghani
> > > > > >>
> > > > > >> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges
> > > > > >> <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > What's the best way to automate major compactions without
> > > > > >> > enabling them during the off-peak period?
> > > > > >> >
> > > > > >> > What I was testing is a simple script which runs on every
> > > > > >> > node in the cluster, checks if a major compaction is already
> > > > > >> > running on that node, and if not, picks one region and runs
> > > > > >> > a compaction on that one region.
> > > > > >> >
> > > > > >> > It's been running for some time and it helped us get our
> > > > > >> > data into much better shape, but now I'm not quite sure
> > > > > >> > anymore how to choose which region to compact.
> > > > > >> > So far I was reading rs-status#regionStoreStats for that
> > > > > >> > node, first choosing the region with the biggest number of
> > > > > >> > storefiles, and then those with the biggest storefile sizes.
> > > > > >> >
> > > > > >> > Is there maybe something more intelligent I could/should do?
> > > > > >> >
> > > > > >> > Thanks a lot!
> > > > > >> >
> > > > > --
> > > > > Thanks,
> > > > > Michael Antonov
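Vlad's four steps (with JM's correction that the ColumnFamily directory sits at the end of the path) boil down to: for each region, find its oldest HFile, then pick the region whose oldest file is oldest overall. A rough sketch, assuming you have already collected HFile paths and times - e.g. by parsing `hdfs dfs -ls -R` output on the table directory, which is left out here:

```python
# Sketch of Vlad's procedure: under /HBASE_ROOT/data/<namespace>/<table>/,
# each region directory holds <cf>/<hfile> entries. Given a map of HFile
# paths to file times, the region whose oldest file is oldest overall is
# the candidate for major compaction.

def oldest_file_region(hfile_times):
    """hfile_times: {"/hbase/data/<ns>/<table>/<region>/<cf>/<hfile>": time}
    Returns (region, time of its oldest HFile) for the best candidate."""
    oldest_per_region = {}
    for path, ctime in hfile_times.items():
        region = path.split("/")[-3]  # .../<region>/<cf>/<hfile>
        if region not in oldest_per_region or ctime < oldest_per_region[region]:
            oldest_per_region[region] = ctime
    # Candidate = region whose oldest HFile has the smallest timestamp
    return min(oldest_per_region.items(), key=lambda kv: kv[1])
```

One caveat: `hdfs dfs -ls` reports modification time rather than creation time, but since HFiles are write-once, that effectively serves as the creation time Vlad refers to.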
