I'm surprised the major compactions didn't balance the cluster better. I wonder if you've stumbled upon a bug in HBase that's causing it to leak old HFiles.
Is the total amount of data in HDFS what you expect?

-Joey

On Thu, May 19, 2011 at 8:35 AM, Matt Corgan <[email protected]> wrote:
> that's right
>
> On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[email protected]> wrote:
>
>> Am I right to assume that all of your data is in HBase, ie you don't
>> keep anything in just HDFS files?
>>
>> -Joey
>>
>> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
>> > I wanted to do some more investigation before posting to the list, but it
>> > seems relevant to this conversation...
>> >
>> > Is it possible that major compactions don't always localize the data blocks?
>> > Our cluster had a bunch of regions full of historical analytics data that
>> > were already major compacted, then we added a new datanode/regionserver. We
>> > have a job that triggers major compactions at a minimum of once per week by
>> > hashing the region name and giving it a time slot. It's been several weeks
>> > and the original nodes each have ~480gb used in hdfs, while the new node has
>> > only 240gb. Regions are scattered pretty randomly and evenly among the
>> > regionservers.
>> >
>> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>> >
>> > My guess is that if a region is already major compacted and no new data has
>> > been added to it, then compaction is skipped. That's definitely an
>> > essential feature during typical operation, but it's a problem if you're
>> > relying on major compaction to balance the cluster.
>> >
>> > Matt
>> >
>> >
>> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
>> >
>> >> I had asked the question about how he created random keys... Hadn't seen a
>> >> response.
>> >>
>> >> Sent from a remote device. Please excuse any typos...
>> >>
>> >> Mike Segel
>> >>
>> >> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
>> >>
>> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
>> >> >> All the DNs almost have the same number of blocks. Major compaction
>> >> >> makes no difference.
>> >> >>
>> >> >
>> >> > I would expect major compaction to even the number of blocks across
>> >> > the cluster and it'd move the data for each region local to the
>> >> > regionserver.
>> >> >
>> >> > The only explanation that I can see is that the hot DNs must be
>> >> > carrying the hot blocks (The client querys are not random). I do not
>> >> > know what else it could be.
>> >> >
>> >> > St.Ack
>> >> >
>> >>
>> >
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
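
For reference, the scheduling scheme Matt describes in the quoted thread (hash the region name, give it a weekly time slot) might look roughly like the sketch below. This is only an illustration under stated assumptions: the class name `CompactionSlots`, the 168 one-hour slots, the sample region name, and the use of `Arrays.hashCode` are all hypothetical and do not come from the thread. The only call taken from the thread is `hBaseAdmin.majorCompact(hRegionInfo.getRegionName())`, which is left as a comment so the sketch runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CompactionSlots {
    // Assumption: one-hour slots across a 7-day week (168 slots in total).
    static final int SLOTS_PER_WEEK = 7 * 24;

    /** Maps a region name to a stable slot in [0, SLOTS_PER_WEEK). */
    static int slotFor(byte[] regionName) {
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(Arrays.hashCode(regionName), SLOTS_PER_WEEK);
    }

    /** Slot for a given time: whole hours since the epoch, wrapped to one week. */
    static int currentSlot(long nowMillis) {
        long hours = nowMillis / 3_600_000L;
        return (int) Math.floorMod(hours, (long) SLOTS_PER_WEEK);
    }

    /** True when the region's assigned slot matches the slot for nowMillis. */
    static boolean isDue(byte[] regionName, long nowMillis) {
        return slotFor(regionName) == currentSlot(nowMillis);
    }

    public static void main(String[] args) {
        // Hypothetical region name, used only to exercise the helpers.
        byte[] name = "analytics,row-0001,1305800000000"
                .getBytes(StandardCharsets.UTF_8);
        long now = System.currentTimeMillis();
        System.out.println("slot=" + slotFor(name) + " due=" + isDue(name, now));
        // In the job described in the thread, a due region would be passed to:
        //   hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
    }
}
```

A scheduler like this only decides when the compaction request is sent. Whether the region server then rewrites the files of an already-compacted region is the open question in the thread, and the sketch says nothing about that.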
