Sure - I can create a minimal testcase and send it along.

Gurjeet

On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[email protected]> wrote:
> That's interesting.
> Could you share your old and new schema? I would like to track down the 
> performance problems you saw.
> (If you had a demo program that populates your rows with 200.000 columns in a 
> way where you saw the performance issues, that'd be even better, but not 
> necessary).
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Gurjeet Singh <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Thursday, August 16, 2012 11:26 AM
> Subject: Re: Slow full-table scans
>
> Sorry for the delay, guys.
>
> Here are a few results:
>
> 1. Regions in the table = 11
> 2. The region servers don't appear to be very busy with the query: ~5%
> CPU (but with parallelization, they are all busy)
>
> Finally, I changed the format of my data so that each cell in HBase
> contains a chunk of a row instead of the single value it had. Stuffing
> each HBase cell with 500 columns of a row gave me a
> performance boost of 1000x. It seems that the underlying issue was IO
> overhead per byte of actual data stored.
>
>
> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[email protected]> wrote:
>> Yeah... It looks OK.
>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
>>
>>
>> If you can I'd like to know how busy your regionservers are during these 
>> operations. That would be an indication of whether the parallelization is 
>> good or not.
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Stack <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Wednesday, August 15, 2012 3:13 PM
>> Subject: Re: Slow full-table scans
>>
>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[email protected]> wrote:
>>> I am beginning to think that this is a configuration issue on my
>>> cluster. Do the following configuration files seem sane ?
>>>
>>> hbase-env.sh    https://gist.github.com/3345338
>>>
>>
>> Nothing wrong w/ this (Remove the -ea, you don't want asserts in
>> production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
>>
>>
>>> hbase-site.xml    https://gist.github.com/3345356
>>>
>>
>> This is effectively all defaults.  I don't see any of the configs
>> recommended by the performance section of the reference guide and/or
>> those suggested by the GBIF blog.
>>
>> You don't answer LarsH's query about where you see the 4% difference.
>>
>> How many regions in your table?  What does the HBase Master UI look like
>> when this scan is running?
>> St.Ack
>>
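
For readers of this thread, the layout change Gurjeet describes (one cell per
500 columns of a row instead of one cell per value) can be sketched roughly as
below. This is only an illustration under assumptions: the thread does not show
the actual schema, so the qualifier naming (chunk_00000, ...), the use of
8-byte big-endian doubles, and the helper names pack_row/unpack_row are all
hypothetical. It uses no HBase client calls; it only shows how a 200,000-column
row collapses into 400 cells.

```python
import struct
from typing import Dict, List, Sequence

CHUNK_SIZE = 500  # columns per cell, the figure quoted in the thread


def pack_row(values: Sequence[float], chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Group a wide row into cells holding `chunk_size` consecutive values each.

    Returns a mapping of (hypothetical) column qualifier -> packed bytes.
    """
    cells: Dict[str, bytes] = {}
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # Zero-padded index so qualifiers sort in the original column order.
        qualifier = f"chunk_{start // chunk_size:05d}"
        # Assumption: every value is stored as an 8-byte big-endian double.
        cells[qualifier] = struct.pack(f">{len(chunk)}d", *chunk)
    return cells


def unpack_row(cells: Dict[str, bytes]) -> List[float]:
    """Rebuild the original wide row from the packed cells."""
    values: List[float] = []
    for qualifier in sorted(cells):
        raw = cells[qualifier]
        values.extend(struct.unpack(f">{len(raw) // 8}d", raw))
    return values
```

With this layout a scan reads 400 key/values per row rather than 200,000, so
the fixed per-cell overhead (row key, family, qualifier, timestamp) is paid far
less often per byte of payload, which matches the "IO overhead per byte"
explanation given in the message above.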
