that's right
On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[email protected]> wrote:
> Am I right to assume that all of your data is in HBase, ie you don't
> keep anything in just HDFS files?
>
> -Joey
>
> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
> > I wanted to do some more investigation before posting to the list, but it
> > seems relevant to this conversation...
> >
> > Is it possible that major compactions don't always localize the data blocks?
> > Our cluster had a bunch of regions full of historical analytics data that
> > were already major compacted, then we added a new datanode/regionserver. We
> > have a job that triggers major compactions at a minimum of once per week by
> > hashing the region name and giving it a time slot. It's been several weeks
> > and the original nodes each have ~480gb used in hdfs, while the new node has
> > only 240gb. Regions are scattered pretty randomly and evenly among the
> > regionservers.
> >
> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
> >
> > My guess is that if a region is already major compacted and no new data has
> > been added to it, then compaction is skipped. That's definitely an
> > essential feature during typical operation, but it's a problem if you're
> > relying on major compaction to balance the cluster.
> >
> > Matt
> >
> >
> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
> >
> >> I had asked the question about how he created random keys... Hadn't seen a
> >> response.
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
> >>
> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
> >> >> All the DNs almost have the same number of blocks. Major compaction
> >> >> makes no difference.
> >> >>
> >> >
> >> > I would expect major compaction to even the number of blocks across
> >> > the cluster and it'd move the data for each region local to the
> >> > regionserver.
> >> >
> >> > The only explanation that I can see is that the hot DNs must be
> >> > carrying the hot blocks (The client querys are not random). I do not
> >> > know what else it could be.
> >> >
> >> > St.Ack
> >> >
> >> >
> >
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
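
For anyone following along, the scheduling part of the job described above (hash the region name, give it a time slot, compact at least once per week) could look roughly like the sketch below. This is only an illustration of the idea, not the actual job: the class name CompactionSlots, the one-hour slot size, and the use of Arrays.hashCode are all assumptions. It deliberately leaves out the HBase call; a region that comes up as due would be handed to hBaseAdmin.majorCompact(...) as quoted in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompactionSlots {
    // Assumption: one-hour slots, so a week has 7 * 24 = 168 of them.
    public static final int SLOTS_PER_WEEK = 7 * 24;
    private static final long MILLIS_PER_HOUR = 3_600_000L;

    /** Maps a region name to a stable slot in [0, slotsPerWeek). */
    public static int slotFor(byte[] regionName, int slotsPerWeek) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Arrays.hashCode(regionName), slotsPerWeek);
    }

    /** True when the given time falls inside the region's weekly slot. */
    public static boolean isDue(byte[] regionName, long epochMillis) {
        long hourOfWeek = Math.floorMod(epochMillis / MILLIS_PER_HOUR, (long) SLOTS_PER_WEEK);
        return slotFor(regionName, SLOTS_PER_WEEK) == hourOfWeek;
    }

    public static void main(String[] args) {
        // Hypothetical region name, for illustration only.
        byte[] name = "analytics,row-0001,1305800000000".getBytes(StandardCharsets.UTF_8);
        System.out.println("slot = " + slotFor(name, SLOTS_PER_WEEK));
        // In the real job, a due region would be passed to majorCompact here.
        System.out.println("due now = " + isDue(name, System.currentTimeMillis()));
    }
}
```

Because the slot depends only on the region name, each region lands in the same hour every week, and the regions spread roughly evenly across the 168 slots. Note that this only decides when a compaction is requested; it says nothing about whether the request actually rewrites the files.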
