Re: How to speedup Hbase query throughput

Joey Echeverria Thu, 19 May 2011 08:40:16 -0700

I'm surprised the major compactions didn't balance the cluster better.
I wonder if you've stumbled upon a bug in HBase that's causing it to
leak old HFiles.


Is the total amount of data in HDFS what you expect?
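One rough way to answer that: take your own estimate of how much data the tables should hold, multiply it by the replication factor, and compare the result with the raw DFS Used total from `hadoop dfsadmin -report`, which counts every replica. With nothing but HBase in HDFS, the two should be close. WALs and other HBase housekeeping files add a little on top, but a ratio well above 1.0 would point at files that should have been deleted. The sketch below is hypothetical arithmetic only. The class name and the example figures are made up and do not come from Matt's cluster.

```java
/**
 * Hypothetical sanity check: does the raw space HDFS reports match what the
 * HBase data should occupy? Inputs are placeholders; use your own figures.
 */
final class HdfsUsageCheck {

    /** Raw bytes HDFS should use: expected logical data times the replication factor. */
    static long expectedRawBytes(long logicalBytes, int replication) {
        return logicalBytes * replication;
    }

    /** Observed raw usage divided by expected raw usage; well above 1.0 suggests leftover files. */
    static double overheadRatio(long observedRawBytes, long logicalBytes, int replication) {
        return (double) observedRawBytes / expectedRawBytes(logicalBytes, replication);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Made-up example: you expect ~500 GB of table data, replication is 3,
        // and the datanode report shows 1,680 GB of DFS Used in total.
        double ratio = overheadRatio(1_680 * gb, 500 * gb, 3);
        System.out.printf("observed/expected = %.2f%n", ratio);
    }
}
```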

-Joey

On Thu, May 19, 2011 at 8:35 AM, Matt Corgan <[email protected]> wrote:
> That's right.
>
>
> On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[email protected]> wrote:
>
>> Am I right to assume that all of your data is in HBase, i.e., you don't
>> keep anything in plain HDFS files?
>>
>> -Joey
>>
>> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[email protected]> wrote:
>> > I wanted to do some more investigation before posting to the list, but it
>> > seems relevant to this conversation...
>> >
>> > Is it possible that major compactions don't always localize the data
>> > blocks? Our cluster had a bunch of regions full of historical analytics
>> > data that were already major compacted; then we added a new
>> > datanode/regionserver. We have a job that triggers major compactions at
>> > least once per week by hashing the region name and assigning it a time
>> > slot. It's been several weeks, and the original nodes each have ~480 GB
>> > used in HDFS, while the new node has only 240 GB. Regions are scattered
>> > fairly randomly and evenly among the regionservers.
>> >
>> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>> >
>> > My guess is that if a region is already major compacted and no new data
>> > has been added to it, then compaction is skipped. That's definitely an
>> > essential feature during typical operation, but it's a problem if you're
>> > relying on major compaction to balance the cluster.
>> >
>> > Matt
>> >
>> >
>> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[email protected]> wrote:
>> >
>> >> I had asked how he created the random keys... Haven't seen a
>> >> response.
>> >>
>> >> Sent from a remote device. Please excuse any typos...
>> >>
>> >> Mike Segel
>> >>
>> >> On May 18, 2011, at 11:27 PM, Stack <[email protected]> wrote:
>> >>
>> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[email protected]> wrote:
>> >> >> All the DNs have almost the same number of blocks. Major compaction
>> >> >> makes no difference.
>> >> >>
>> >> >
>> >> > I would expect major compaction to even out the number of blocks
>> >> > across the cluster and to move each region's data onto the
>> >> > regionserver hosting it.
>> >> >
>> >> > The only explanation I can see is that the hot DNs must be carrying
>> >> > the hot blocks (the client queries are not random). I do not know
>> >> > what else it could be.
>> >> >
>> >> > St.Ack
>> >> >
>> >>
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
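For reference, a job like the one Matt describes (hash each region name into a weekly time slot, then call majorCompact on regions whose slot has come up) might look roughly like the sketch below. This is a hypothetical reconstruction, not Matt's code: the 168 hourly slots, the use of Arrays.hashCode, the example region name, and the class and method names are all assumptions, and the actual HBase call is left as a comment so the sketch runs on its own. It only decides when to request a compaction. Whether the regionserver actually rewrites, and so relocalizes, the files is the separate question Matt raises.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical weekly major-compaction scheduler: each region name hashes to
 * one of 168 hourly slots, so every region is requested once per week and the
 * requests are spread across the week.
 */
final class CompactionSlots {
    static final int SLOTS_PER_WEEK = 7 * 24; // one slot per hour (assumption)

    /** Stable slot for a region name; Arrays.hashCode(byte[]) is deterministic. */
    static int slotFor(byte[] regionName) {
        return Math.floorMod(Arrays.hashCode(regionName), SLOTS_PER_WEEK);
    }

    /** Slot for a wall-clock time, counting whole hours since the Unix epoch. */
    static int currentSlot(long epochMillis) {
        long hours = epochMillis / 3_600_000L;
        return (int) Math.floorMod(hours, (long) SLOTS_PER_WEEK);
    }

    /** True when the region's slot matches the current hour's slot. */
    static boolean isDue(byte[] regionName, long epochMillis) {
        return slotFor(regionName) == currentSlot(epochMillis);
    }

    public static void main(String[] args) {
        // Hypothetical region name, for illustration only.
        byte[] name = "example_table,rowkey,1234.abcdef".getBytes(StandardCharsets.UTF_8);
        long now = System.currentTimeMillis();
        if (isDue(name, now)) {
            // The real job would issue the request here, e.g.
            // hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
            System.out.println("request major compaction");
        } else {
            System.out.println("slot " + slotFor(name) + ", current slot " + currentSlot(now));
        }
    }
}
```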



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
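On Michel's question about how the keys were generated: one quick way to test Stack's theory is to bucket a sample of the query keys by region start key and look at the per-region counts. The following is a minimal sketch, assuming zero-padded ASCII row keys (so Java string order matches HBase's byte order) and evenly spaced split points. The key format, the split points, and the 90/10 skew are invented for illustration. Uniformly random keys should give roughly equal counts, while the skewed set piles most hits onto the first region, and therefore onto whichever regionserver and local datanode host it.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical check of how evenly a set of query keys spreads across regions.
 * Keys are zero-padded so lexicographic order matches numeric order.
 */
final class KeySpread {

    /** Index of the region whose [start, nextStart) range holds key; startKeys sorted, startKeys[0] = "". */
    static int regionFor(String[] startKeys, String key) {
        int i = Arrays.binarySearch(startKeys, key);
        return i >= 0 ? i : -i - 2; // insertion point minus one = containing region
    }

    /** Number of keys falling into each region. */
    static int[] histogram(String[] startKeys, String[] keys) {
        int[] counts = new int[startKeys.length];
        for (String k : keys) counts[regionFor(startKeys, k)]++;
        return counts;
    }

    /** Hypothetical row-key format: "row" plus a 10-digit zero-padded number. */
    static String rowKey(long n) {
        return String.format(Locale.ROOT, "row%010d", n);
    }

    public static void main(String[] args) {
        int regions = 8;
        long keySpace = 80_000_000L;
        String[] starts = new String[regions];
        starts[0] = "";
        for (int r = 1; r < regions; r++) starts[r] = rowKey(keySpace / regions * r);

        Random rnd = new Random(42);
        String[] uniform = new String[100_000];
        String[] skewed = new String[100_000];
        for (int i = 0; i < uniform.length; i++) {
            uniform[i] = rowKey((long) (rnd.nextDouble() * keySpace));
            // Skewed: 90% of queries hit the first 10% of the key space.
            skewed[i] = rnd.nextDouble() < 0.9
                    ? rowKey((long) (rnd.nextDouble() * keySpace / 10))
                    : rowKey((long) (rnd.nextDouble() * keySpace));
        }
        System.out.println("uniform: " + Arrays.toString(histogram(starts, uniform)));
        System.out.println("skewed:  " + Arrays.toString(histogram(starts, skewed)));
    }
}
```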
