that's right
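
For anyone curious about the time-slot scheme mentioned in my quoted message below, here is a rough sketch of the idea. It is not our actual job: the class name, the one-slot-per-hour granularity, and the use of Arrays.hashCode are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: assign each region a fixed weekly time slot by
// hashing its name, so every region comes due once per week.
class CompactionSlots {
    // Assumption: one slot per hour of the week (7 * 24 = 168 slots).
    static final int SLOTS_PER_WEEK = 7 * 24;
    static final long MILLIS_PER_HOUR = 3_600_000L;

    // Map a region name to a stable slot in [0, SLOTS_PER_WEEK).
    // floorMod keeps the result non-negative for negative hash codes.
    static int slotFor(byte[] regionName) {
        return Math.floorMod(Arrays.hashCode(regionName), SLOTS_PER_WEEK);
    }

    // Slot that a given wall-clock time falls into.
    static int currentSlot(long epochMillis) {
        long hoursSinceEpoch = epochMillis / MILLIS_PER_HOUR;
        return (int) (hoursSinceEpoch % SLOTS_PER_WEEK);
    }

    // True when the region's slot matches the current hour of the cycle.
    static boolean isDue(byte[] regionName, long epochMillis) {
        return slotFor(regionName) == currentSlot(epochMillis);
    }
}
```

A scheduler running hourly would loop over the regions and, for each one where isDue(...) is true, make the hBaseAdmin.majorCompact(hRegionInfo.getRegionName()) call quoted below.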

On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[email protected]> wrote:

> Am I right to assume that all of your data is in HBase, i.e. you don't
> keep anything in just HDFS files?
>
> -Joey
>
> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
> > I wanted to do some more investigation before posting to the list, but it
> > seems relevant to this conversation...
> >
> > Is it possible that major compactions don't always localize the data
> > blocks? Our cluster had a bunch of regions full of historical analytics
> > data that were already major compacted, then we added a new
> > datanode/regionserver. We have a job that triggers major compactions at
> > a minimum of once per week by hashing the region name and giving it a
> > time slot. It's been several weeks and the original nodes each have
> > ~480gb used in hdfs, while the new node has only 240gb. Regions are
> > scattered pretty randomly and evenly among the regionservers.
> >
> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
> >
> > My guess is that if a region is already major compacted and no new data
> > has been added to it, then compaction is skipped. That's definitely an
> > essential feature during typical operation, but it's a problem if you're
> > relying on major compaction to balance the cluster.
> >
> > Matt
> >
> >
> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
> >
> >> I had asked the question about how he created random keys... Hadn't
> >> seen a response.
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
> >>
> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
> >> >> All the DNs almost have the same number of blocks. Major compaction
> >> >> makes no difference.
> >> >>
> >> >
> >> > I would expect major compaction to even the number of blocks across
> >> > the cluster and it'd move the data for each region local to the
> >> > regionserver.
> >> >
> >> > The only explanation that I can see is that the hot DNs must be
> >> > carrying the hot blocks (The client queries are not random).  I do not
> >> > know what else it could be.
> >> >
> >> > St.Ack
> >> >
> >>
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
