Re: Slow Inserts on EC2 Cluster

Bradford Stephens Wed, 01 Sep 2010 18:59:27 -0700

On the full data set (10 reducers), we're at about 100k inserts/minute
(WAL disabled). Still much slower than I'd like, but I'll take it over
what we had before :)


On Wed, Sep 1, 2010 at 5:59 PM, Ryan Rawson <[email protected]> wrote:
> Yes exactly, column families have the same performance profile as
> tables.  12 CF = 12 tables.
>
> -ryan
>
> On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens
> <[email protected]> wrote:
>> Good call JD!  We've gone from 20k inserts/minute to 200k. Much
>> better! I still think it's slower than I'd want by about one order of
>> magnitude, but it's progress.
>>
>> Since we're populating 12 families, I guess we're seeking in 12 files
>> on each write. Not pretty. I'll check with the customer and see if they
>> really have any sparse data that would benefit from its own
>> ColumnFamily. Probably not.
>>
>> Cheers,
>> B
>>
>> On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens
>> <[email protected]> wrote:
>>> Yeah, those families are all needed -- but I didn't realize the files
>>> were so small. That's odd -- and you're right, that'd certainly throw
>>> it off. I'll merge them all and see if that helps.
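
For anyone following along, a rough sketch of what "merging" the families could look like on the client side: fold the old family name into the column qualifier so every column lands in one family. This is plain Python, not HBase client code, and the merged family name "d", the "_" separator, and the sample row are all made up for illustration; nothing here is taken from the actual job.

```python
from typing import Dict


def merge_families(row: Dict[str, Dict[str, bytes]],
                   merged_family: str = "d",
                   sep: str = "_") -> Dict[str, Dict[str, bytes]]:
    """Collapse {family: {qualifier: value}} into a single family.

    The old family name becomes a prefix of the qualifier, so columns that
    shared a qualifier in different families do not overwrite each other.
    Assumes family names do not themselves contain the separator.
    """
    merged: Dict[str, bytes] = {}
    for family, columns in row.items():
        for qualifier, value in columns.items():
            merged[f"{family}{sep}{qualifier}"] = value
    return {merged_family: merged}


# Hypothetical row with two families that both use the qualifier "name".
row = {
    "profile": {"name": b"alice"},
    "stats": {"name": b"daily", "count": b"42"},
}
print(merge_families(row))
```

The prefix keeps the original grouping recoverable if a family ever needs to be split back out later.
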
>>>
>>> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans
>>> <[email protected]> wrote:
>>>> Took a quick look at your RS log, it looks like you are using a lot of
>>>> families and loading them pretty much at the same rate. Look at lines
>>>> that start with:
>>>>
>>>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>>>
>>>> And you will see that you are dumping very small files on the
>>>> filesystem, on average 5MB, that together account for ~64MB which is
>>>> the default flush size (and then it generates tons of compactions
>>>> which makes it even worse). Do you really need all those families? Try
>>>> merging them and see the difference.
>>>>
>>>> J-D
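
A back-of-the-envelope sketch of the arithmetic J-D describes above, assuming the ~64MB flush threshold covers all of a region's families together and that the families fill at the same rate (both are simplifications). The function name is hypothetical; this is not HBase code.

```python
def flushed_file_size_mb(flush_size_mb: float, num_families: int) -> float:
    """Approximate size of each file written when a region flushes.

    Assumes the flush threshold applies to the combined in-memory data of
    all families and that every family receives data at the same rate.
    """
    if num_families < 1:
        raise ValueError("num_families must be at least 1")
    return flush_size_mb / num_families


# 12 families sharing a 64 MB threshold versus one merged family.
print(f"12 families: ~{flushed_file_size_mb(64, 12):.1f} MB per file")  # ~5.3
print(f" 1 family:   ~{flushed_file_size_mb(64, 1):.1f} MB per file")   # ~64.0
```

Under those assumptions, 12 families give roughly the 5MB files seen in the log, and each flush adds 12 small files that later have to be compacted.
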
>>>>
>>>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
>>>> <[email protected]> wrote:
>>>>> 'allo,
>>>>>
>>>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>>>> but nowhere near what I'm used to, even from vague memories of old
>>>>> clusters on EC2.
>>>>>
>>>>> I also stripped all the Cascading from my code and have a very basic
>>>>> raw MR job -- we're basically reading raw text, splitting it into
>>>>> fields, and adding those rows to HBase. About the simplest task you
>>>>> could do.
>>>>>
>>>>> Ideas for next steps? What other info could I share?
>>>>>
>>>>> Cheers,
>>>>> B
>>>>>
>>>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell
>>>>> <[email protected]> wrote:
>>>>>>> From: Gary Helmling
>>>>>>>
>>>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>>>> there's a known kernel issue that seems to be causing
>>>>>>> high loads while idle.  More info here:
>>>>>>>
>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>>>
>>>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>>>
>>>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8
>>>>>> AMI (with updates). See http://github.com/apurtell/hbase-ec2
>>>>>>
>>>>>>  - Andy
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bradford Stephens,
>>>>> Founder, Drawn to Scale
>>>>> drawntoscalehq.com
>>>>> 727.697.7528
>>>>>
>>>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>>>> solution. Process, store, query, search, and serve all your data.
>>>>>
>>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>>> Media, and Computer Science
>>>>>
>>>>
>>>
>>
>



