I think I traced this to a bug in my compaction scheduler that was skipping roughly half the regions, hence the 240GB vs 480GB. To confirm the underlying behavior: major compaction will always run when asked, even if the region is already major compacted, the table settings haven't changed, and it was last major compacted on that same server. [Potential HBase optimization here for clusters with many cold regions.] So my theory that major compaction wasn't localizing the blocks is false.
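In case it helps anyone else doing the same thing, this is roughly the shape of the scheduling job, boiled down to a sketch (class and variable names here are made up, it assumes the 0.90-era HBaseAdmin/HTable client API, and the real job has more plumbing):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class MajorCompactionScheduler {
    // illustrative: one slot per hour for a week
    private static final int NUM_SLOTS = 7 * 24;

    public static void compactRegionsInSlot(Configuration conf, String tableName,
            int currentSlot) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, tableName);
        try {
            for (HRegionInfo region : table.getRegionsInfo().keySet()) {
                // mask the sign bit so the slot is always in [0, NUM_SLOTS);
                // a plain % on a negative hash would give a negative slot
                // that never matches, silently skipping those regions
                int hash = Arrays.hashCode(region.getRegionName());
                int slot = (hash & Integer.MAX_VALUE) % NUM_SLOTS;
                if (slot == currentSlot) {
                    // same call the real job makes
                    admin.majorCompact(region.getRegionName());
                }
            }
        } finally {
            table.close();
        }
    }
}

A scheduler like this can silently skip about half the regions if it takes a plain % of a hash that can go negative, since the slot then never matches the current one. I'm not claiming that's exactly my bug, just that this is the kind of thing to watch for.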
Weihua - why do you think your throughput doubled when you went from user+month to month+user keys? Are your queries using an even distribution of months? I'm not exactly clear on your schema or query pattern. (A rough sketch of the two key orderings is below, after the quoted thread.)

On Thu, May 19, 2011 at 8:39 AM, Joey Echeverria <[email protected]> wrote:
> I'm surprised the major compactions didn't balance the cluster better.
> I wonder if you've stumbled upon a bug in HBase that's causing it to
> leak old HFiles.
>
> Is the total amount of data in HDFS what you expect?
>
> -Joey
>
> On Thu, May 19, 2011 at 8:35 AM, Matt Corgan <[email protected]> wrote:
> > that's right
> >
> > On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[email protected]> wrote:
> >
> >> Am I right to assume that all of your data is in HBase, ie you don't
> >> keep anything in just HDFS files?
> >>
> >> -Joey
> >>
> >> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
> >> > I wanted to do some more investigation before posting to the list, but it
> >> > seems relevant to this conversation...
> >> >
> >> > Is it possible that major compactions don't always localize the data blocks?
> >> > Our cluster had a bunch of regions full of historical analytics data that
> >> > were already major compacted, then we added a new datanode/regionserver. We
> >> > have a job that triggers major compactions at a minimum of once per week by
> >> > hashing the region name and giving it a time slot. It's been several weeks
> >> > and the original nodes each have ~480gb used in hdfs, while the new node has
> >> > only 240gb. Regions are scattered pretty randomly and evenly among the
> >> > regionservers.
> >> >
> >> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
> >> >
> >> > My guess is that if a region is already major compacted and no new data has
> >> > been added to it, then compaction is skipped. That's definitely an
> >> > essential feature during typical operation, but it's a problem if you're
> >> > relying on major compaction to balance the cluster.
> >> >
> >> > Matt
> >> >
> >> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
> >> >
> >> >> I had asked the question about how he created random keys... Hadn't seen a
> >> >> response.
> >> >>
> >> >> Sent from a remote device. Please excuse any typos...
> >> >>
> >> >> Mike Segel
> >> >>
> >> >> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
> >> >>
> >> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
> >> >> >> All the DNs almost have the same number of blocks. Major compaction
> >> >> >> makes no difference.
> >> >> >>
> >> >> >
> >> >> > I would expect major compaction to even the number of blocks across
> >> >> > the cluster and it'd move the data for each region local to the
> >> >> > regionserver.
> >> >> >
> >> >> > The only explanation that I can see is that the hot DNs must be
> >> >> > carrying the hot blocks (The client querys are not random). I do not
> >> >> > know what else it could be.
> >> >> >
> >> >> > St.Ack
> >> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
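For concreteness, here's a hypothetical sketch of the two row-key orderings I'm asking about; your actual schema, field names, and formats (userId, yyyymm) are assumptions on my part, so correct me where I'm off:

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySketch {
    // user-first: each user's months sit next to each other, so a workload
    // skewed toward one month still fans out across many regions
    static byte[] userMonthKey(String userId, String yyyymm) {
        return Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(yyyymm));
    }

    // month-first: one month's rows for all users are contiguous, so locality
    // and cache behavior depend on how evenly queries hit months
    static byte[] monthUserKey(String yyyymm, String userId) {
        return Bytes.add(Bytes.toBytes(yyyymm), Bytes.toBytes(userId));
    }
}

If most of your queries land on the current month, month-first keys would pack the hot rows together and could change cache and locality behavior quite a bit, but that's pure guesswork without knowing your pattern, hence the question.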
