I wanted to do some more investigation before posting to the list, but it seems relevant to this conversation...
Is it possible that major compactions don't always localize the data
blocks? Our cluster had a bunch of regions full of historical analytics
data that were already major compacted, then we added a new
datanode/regionserver. We have a job that triggers a major compaction of
each region at least once per week by hashing the region name and
assigning it a time slot. It's been several weeks, and the original nodes
each have ~480 GB used in HDFS, while the new node has only 240 GB.
Regions are scattered fairly randomly and evenly among the regionservers.
The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
(a rough sketch of this job appears below the quoted thread).

My guess is that if a region is already major compacted and no new data
has been added to it, then the compaction is skipped. That's definitely an
essential feature during typical operation, but it's a problem if you're
relying on major compaction to balance the cluster.

Matt

On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:

> I had asked the question about how he created random keys... Hadn't seen a
> response.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
>
> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]>
> > wrote:
> >> All the DNs almost have the same number of blocks. Major compaction
> >> makes no difference.
> >>
> >
> > I would expect major compaction to even the number of blocks across
> > the cluster, and it'd move the data for each region local to the
> > regionserver.
> >
> > The only explanation that I can see is that the hot DNs must be
> > carrying the hot blocks (the client queries are not random). I do not
> > know what else it could be.
> >
> > St.Ack
> >
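
For reference, here is a minimal sketch of the kind of weekly compaction
job described above, assuming the 0.90-era client API
(HTable.getRegionsInfo(), HBaseAdmin.majorCompact(byte[])). The class
name, the hourly slot granularity, and the table name passed on the
command line are made up for illustration; only the majorCompact call on
the region name comes from the message itself.

    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerAddress;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WeeklyMajorCompactor {
      // One slot per hour over a week: each region's name hashes into
      // exactly one slot, so every region gets major compacted once a week.
      private static final int SLOTS_PER_WEEK = 7 * 24;

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, args[0]);  // table name as first argument

        // The hourly slot we are in right now, within the weekly cycle.
        int currentSlot =
            (int) ((System.currentTimeMillis() / (3600 * 1000L)) % SLOTS_PER_WEEK);

        // Map of region -> hosting server for this table (0.90-era API).
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        for (HRegionInfo hRegionInfo : regions.keySet()) {
          // Hash the region name into a slot (kept non-negative), and only
          // compact the regions whose slot matches the current hour.
          int hash = Bytes.hashCode(hRegionInfo.getRegionName());
          int slot = (hash % SLOTS_PER_WEEK + SLOTS_PER_WEEK) % SLOTS_PER_WEEK;
          if (slot == currentSlot) {
            admin.majorCompact(hRegionInfo.getRegionName());
          }
        }
        table.close();
      }
    }

Run hourly from cron, a job like this spreads the compactions across the
week instead of issuing them all at once; note that majorCompact() only
queues the request, it does not wait for the compaction to finish.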
