Re: Slow Inserts on EC2 Cluster

Ryan Rawson Wed, 01 Sep 2010 18:00:22 -0700

Yes exactly, column families have the same performance profile as
tables.  12 CF = 12 tables.


-ryan

On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens
<[email protected]> wrote:
> Good call JD!  We've gone from 20k inserts/minute to 200k. Much
> better! I still think it's about one order of magnitude (OOM) slower
> than I'd like, but it's progress.
>
> Since we're populating 12 families, I guess we're seeking across 12 files
> on each write. Not pretty. I'll look at the customer's data and see if
> any of it is really sparse enough to benefit from its own
> ColumnFamily. Probably not.
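One way to check for sparse fields is to sample the raw input and measure how often each field is actually populated. This is only an illustrative sketch: the tab delimiter, the sample rows and the 50% cut-off (`SPARSE_THRESHOLD`) are hypothetical choices, not HBase settings.

```java
import java.util.List;

/**
 * Hypothetical sketch: measures how often each field is populated across a
 * sample of tab-delimited lines. Rarely populated fields are candidates for
 * their own column family; dense fields can share one family.
 * The cut-off below is an arbitrary assumption, not an HBase rule.
 */
public class FillRate {

    static final double SPARSE_THRESHOLD = 0.5; // hypothetical cut-off

    /** Fraction of sample lines in which each field is non-empty. */
    static double[] fillRates(List<String> lines, int fieldCount) {
        int[] filled = new int[fieldCount];
        for (String line : lines) {
            // -1 keeps trailing empty fields so positions stay aligned
            String[] parts = line.split("\t", -1);
            for (int i = 0; i < fieldCount && i < parts.length; i++) {
                if (!parts[i].isEmpty()) {
                    filled[i]++;
                }
            }
        }
        double[] rates = new double[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            rates[i] = lines.isEmpty() ? 0.0 : (double) filled[i] / lines.size();
        }
        return rates;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "alice\tSeattle\t",
                "bob\tPortland\t",
                "carol\tBoise\tVIP");
        double[] rates = fillRates(sample, 3);
        for (int i = 0; i < rates.length; i++) {
            System.out.println("field " + i + ": " + rates[i]
                    + (rates[i] < SPARSE_THRESHOLD ? " (sparse)" : ""));
        }
    }
}
```

If almost every field comes out dense, that supports keeping everything in one family.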
>
> Cheers,
> B
>
> On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens
> <[email protected]> wrote:
>> Yeah, those families are all needed -- but I didn't realize the files
>> were so small. That's odd -- and you're right, that'd certainly throw
>> it off. I'll merge them all and see if that helps.
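Merging families usually means moving the old family name into the column qualifier so nothing is lost. A minimal sketch, assuming cells are modeled as plain `"family:qualifier" -> value` strings instead of HBase client objects, and assuming a hypothetical merged family named `d`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: folds cells spread over many column families into a
 * single family by prefixing each qualifier with its old family name.
 * Cells are modeled as "family:qualifier" -> value strings, not real HBase
 * classes, so this runs without any HBase dependency.
 */
public class MergeFamilies {

    // Hypothetical name for the single merged family.
    static final String MERGED_FAMILY = "d";

    /** Rewrites "fam:qual" keys to "d:fam_qual", keeping values and order. */
    static Map<String, String> merge(Map<String, String> cells) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : cells.entrySet()) {
            String key = e.getKey();
            int colon = key.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected family:qualifier, got " + key);
            }
            String family = key.substring(0, colon);
            String qualifier = key.substring(colon + 1);
            merged.put(MERGED_FAMILY + ":" + family + "_" + qualifier, e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("user:name", "alice");
        row.put("geo:city", "Seattle");
        row.put("stats:clicks", "42");
        System.out.println(merge(row));
    }
}
```

The `_` separator is an arbitrary choice and can collide (`a_b:c` and `a:b_c` both become `d:a_b_c`), so pick a separator that cannot occur in the old family names.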
>>
>> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans <[email protected]> 
>> wrote:
>>> Took a quick look at your RS log, it looks like you are using a lot of
>>> families and loading them pretty much at the same rate. Look at lines
>>> that start with:
>>>
>>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>>
>>> And you will see that you are dumping very small files on the
>>> filesystem, 5MB each on average, that together account for ~64MB,
>>> which is the default flush size (and then it generates tons of
>>> compactions, which makes it even worse). Do you really need all those families? Try
>>> merging them and see the difference.
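The ~5MB figure can be reproduced with simple arithmetic. This sketch assumes, as a simplification, that the region flushes every family at once when its combined memstore reaches the flush size, and that writes are spread evenly across families; the 1 GB load is a hypothetical figure.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope model of the small-file problem.
 * Simplifying assumptions: all families flush together when the region's
 * combined memstore hits the flush size, and data is split evenly per family.
 */
public class FlushMath {

    static final long MB = 1024L * 1024L;

    /** Approximate size (in MB) of each store file written by one flush. */
    static double fileSizeMb(long flushSizeBytes, int families) {
        return (double) flushSizeBytes / families / MB;
    }

    /** Store files created after writing totalBytes into one region. */
    static long filesWritten(long totalBytes, long flushSizeBytes, int families) {
        long flushes = totalBytes / flushSizeBytes; // only full flushes counted
        return flushes * families;                  // one file per family per flush
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB;  // default flush size mentioned in the thread
        long loaded = 1024 * MB;   // hypothetical: 1 GB written to one region
        for (int families : new int[] {1, 12}) {
            System.out.println(String.format(Locale.ROOT,
                    "%2d families: ~%.1f MB per file, %d files per GB loaded",
                    families, fileSizeMb(flushSize, families),
                    filesWritten(loaded, flushSize, families)));
        }
    }
}
```

Under those assumptions it prints ` 1 families: ~64.0 MB per file, 16 files per GB loaded` and `12 families: ~5.3 MB per file, 192 files per GB loaded`: twelve times as many files, each one-twelfth the size, all of which compactions later have to rewrite.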
>>>
>>> J-D
>>>
>>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
>>> <[email protected]> wrote:
>>>> 'allo,
>>>>
>>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>>> but nowhere near what I'm used to, even from vague memories of old
>>>> clusters on EC2.
>>>>
>>>> I also stripped all the Cascading from my code and have a very basic
>>>> raw MR job -- we're basically reading raw text, splitting it into
>>>> fields, and adding those rows to HBase. About the simplest task you
>>>> could do.
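The parsing step of such a job might look roughly like this. It is a sketch under stated assumptions: tab-delimited input whose first field is the row key, a hypothetical three-field schema (`FIELDS`), and a single hypothetical family `d`. Cells are returned as a plain map so it runs without HBase on the classpath; in the real job each row would become an HBase `Put`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the map-side parsing: one line of raw text is split
 * into fields; the first field becomes the row key and the rest become cells
 * in a single column family. Delimiter, schema and family are assumptions.
 */
public class LineToRow {

    static final String FAMILY = "d";                          // hypothetical family
    static final String[] FIELDS = {"name", "city", "clicks"}; // hypothetical schema

    /** A row key plus its cells, keyed "family:qualifier". */
    static final class Row {
        final String key;
        final Map<String, String> cells;

        Row(String key, Map<String, String> cells) {
            this.key = key;
            this.cells = cells;
        }
    }

    static Row parse(String line) {
        // -1 keeps trailing empty fields so column positions stay aligned
        String[] parts = line.split("\t", -1);
        if (parts.length != FIELDS.length + 1) {
            throw new IllegalArgumentException("expected " + (FIELDS.length + 1)
                    + " fields, got " + parts.length);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            String value = parts[i + 1];
            if (!value.isEmpty()) { // skip empty values so sparse rows stay sparse
                cells.put(FAMILY + ":" + FIELDS[i], value);
            }
        }
        return new Row(parts[0], cells);
    }

    public static void main(String[] args) {
        Row row = parse("row-001\talice\tSeattle\t42");
        System.out.println(row.key + " -> " + row.cells);
    }
}
```

Keeping every field in one family means each row touches a single store, which is the point of merging the families.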
>>>>
>>>> Ideas for next steps? What other info could I share?
>>>>
>>>> Cheers,
>>>> B
>>>>
>>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <[email protected]> 
>>>> wrote:
>>>>>> From: Gary Helmling
>>>>>>
>>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>>> there's a known kernel issue that seems to be causing
>>>>>> high loads while idle.  More info here:
>>>>>>
>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>>
>>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>>
>>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI 
>>>>> (with updates). See http://github.com/apurtell/hbase-ec2
>>>>>
>>>>>  - Andy
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bradford Stephens,
>>>> Founder, Drawn to Scale
>>>> drawntoscalehq.com
>>>> 727.697.7528
>>>>
>>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>>> solution. Process, store, query, search, and serve all your data.
>>>>
>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>> Media, and Computer Science
>>>>
>>>
>>
>>
>
>
