Re: Slow Inserts on EC2 Cluster

Jean-Daniel Cryans Wed, 01 Sep 2010 17:24:54 -0700

Took a quick look at your region server (RS) log; it looks like you are
using a lot of column families and loading them all at pretty much the
same rate. Look at the lines that start with:

INFO org.apache.hadoop.hbase.regionserver.Store: Added ...

You will see that you are writing very small files to the filesystem,
about 5MB each on average, which together add up to ~64MB, the default
flush size. When the region's memstore reaches that size, every family
is flushed at once, so each family only gets a small slice of it. All
those small files then generate tons of compactions, which makes it even
worse. Do you really need all those families? Try merging them and see
the difference.
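A back-of-the-envelope sketch of why many families hurt, assuming the
64MB default for hbase.hregion.memstore.flush.size, that every family
receives writes at the same rate, and that a flush writes one file per
family. It ignores compactions entirely. The family counts below are
hypothetical; 13 is only what 64MB / ~5MB per file would suggest.

```python
import math

# Assumption: 64 MB default region flush size
# (hbase.hregion.memstore.flush.size) in HBase releases of this era.
DEFAULT_FLUSH_SIZE_MB = 64


def file_size_per_family_mb(num_families, flush_size_mb=DEFAULT_FLUSH_SIZE_MB):
    """Approximate size of each store file written by one region flush.

    Assumes all families are written at the same rate, so each one
    holds an equal share of the memstore when the flush is triggered.
    """
    if num_families < 1:
        raise ValueError("need at least one column family")
    return flush_size_mb / num_families


def store_files_written(data_mb, num_families, flush_size_mb=DEFAULT_FLUSH_SIZE_MB):
    """Approximate number of store files one region writes while absorbing
    data_mb of edits: one file per family per flush (compactions ignored)."""
    flushes = math.ceil(data_mb / flush_size_mb)
    return flushes * num_families


if __name__ == "__main__":
    # Hypothetical family counts, for comparison only.
    for families in (1, 4, 13):
        print(
            f"{families:2d} families: "
            f"~{file_size_per_family_mb(families):.1f} MB per file, "
            f"{store_files_written(1024, families)} files per GB loaded"
        )
```

Under these assumptions, 13 evenly loaded families give files of about
4.9MB and 208 store files per GB loaded into a region, versus 64MB files
and 16 store files with a single family; every one of those extra files
is something a compaction later has to rewrite.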

J-D

On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
<[email protected]> wrote:
> 'allo,
>
> I changed the cluster from m1.large to c1.xlarge -- we're getting
> about 4k inserts/node/minute instead of 2k. A small improvement,
> but nowhere near what I'm used to, even from vague memories of old
> clusters on EC2.
>
> I also stripped all the Cascading from my code and have a very basic
> raw MR job -- we're basically reading raw text, splitting it into
> fields, and adding those rows to HBase. About the simplest task you
> could do.
>
> Ideas for next steps? What other info could I share?
>
> Cheers,
> B
>
> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <[email protected]> wrote:
>>> From: Gary Helmling
>>>
>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>> there's a known kernel issue that seems to be causing
>>> high loads while idle.  More info here:
>>>
>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>
>> Seems best to avoid using Lucid on EC2 for now, then.
>>
>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI 
>> (with updates). See http://github.com/apurtell/hbase-ec2
>>
>>  - Andy
>>
>
> --
> Bradford Stephens,
> Founder, Drawn to Scale
> drawntoscalehq.com
> 727.697.7528
>
> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
> solution. Process, store, query, search, and serve all your data.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>
