Am I right to assume that all of your data is in HBase, i.e., you don't keep anything only in plain HDFS files?

-Joey

On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
> I wanted to do some more investigation before posting to the list, but it
> seems relevant to this conversation...
>
> Is it possible that major compactions don't always localize the data blocks?
> Our cluster had a bunch of regions full of historical analytics data that
> were already major compacted, then we added a new datanode/regionserver. We
> have a job that triggers major compactions at a minimum of once per week by
> hashing the region name and giving it a time slot. It's been several weeks
> and the original nodes each have ~480gb used in hdfs, while the new node has
> only 240gb. Regions are scattered pretty randomly and evenly among the
> regionservers.
>
> The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>
> My guess is that if a region is already major compacted and no new data has
> been added to it, then compaction is skipped. That's definitely an
> essential feature during typical operation, but it's a problem if you're
> relying on major compaction to balance the cluster.
>
> Matt
>
>
> On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
>
>> I had asked the question about how he created random keys... Hadn't seen a
>> response.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
>>
>> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
>> >> All the DNs almost have the same number of blocks. Major compaction
>> >> makes no difference.
>> >>
>> >
>> > I would expect major compaction to even the number of blocks across
>> > the cluster and it'd move the data for each region local to the
>> > regionserver.
>> >
>> > The only explanation that I can see is that the hot DNs must be
>> > carrying the hot blocks (The client queries are not random). I do not
>> > know what else it could be.
>> >
>> > St.Ack
>> >

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
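
For anyone following the thread, the scheduling scheme Matt describes in the quoted message (hash the region name, give it a time slot, compact at least once per week) might look roughly like the sketch below. This is only an illustration under assumptions, not Matt's actual job: the class name CompactionSlots, the choice of 168 hourly slots, and the Consumer<String> callback are all hypothetical. The callback stands in for the hBaseAdmin.majorCompact(hRegionInfo.getRegionName()) call quoted above, so the sketch runs without an HBase dependency.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: assigns each region a fixed hourly slot within the week.
final class CompactionSlots {
    // Assumption: one slot per hour, so every region comes up once per week.
    static final int SLOTS_PER_WEEK = 7 * 24;

    // Hash the region name into a stable slot in [0, SLOTS_PER_WEEK).
    static int slotFor(String regionName) {
        byte[] bytes = regionName.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Arrays.hashCode(bytes), SLOTS_PER_WEEK);
    }

    // Map a point in time to the slot that is currently active.
    static int currentSlot(Instant now) {
        long hoursSinceEpoch = Math.floorDiv(now.getEpochSecond(), 3600L);
        return (int) Math.floorMod(hoursSinceEpoch, (long) SLOTS_PER_WEEK);
    }

    // Invoke the compactor for every region whose slot matches the current hour.
    // In a real job the compactor would wrap the majorCompact call from the thread.
    static int compactDueRegions(List<String> regionNames, Instant now,
                                 Consumer<String> compactor) {
        int slot = currentSlot(now);
        int triggered = 0;
        for (String name : regionNames) {
            if (slotFor(name) == slot) {
                compactor.accept(name);
                triggered++;
            }
        }
        return triggered;
    }
}
```

Run hourly, a job like this requests a major compaction for each region exactly once per week. Note that it only issues the request; whether an already-compacted region is actually rewritten (and its blocks localized) is the open question raised in the thread.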
