Took a quick look at your RS log; it looks like you are using a lot of families and loading them at pretty much the same rate. Look at the lines that start with:

INFO org.apache.hadoop.hbase.regionserver.Store: Added ...

You will see that you are dumping very small files on the filesystem, on average 5MB each, that together account for ~64MB, which is the default flush size (and that then generates tons of compactions, which makes it even worse). Do you really need all those families? Try merging them and see the difference.

J-D

On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens <[email protected]> wrote:
> 'allo,
>
> I changed the cluster from m1.large to c1.xlarge -- we're getting
> about 4k inserts /node / minute instead of 2k. A small improvement,
> but nowhere near what I'm used to, even from vague memories of old
> clusters on EC2.
>
> I also stripped all the Cascading from my code and have a very basic
> raw MR job -- we're basically reading raw text, splitting it into
> fields, and adding those rows to HBase. About the simplest task you
> could do.
>
> Ideas for next steps? What other info could I share?
>
> Cheers,
> B
>
> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <[email protected]> wrote:
>>> From: Gary Helmling
>>>
>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>> there's a known kernel issue that seems to be causing
>>> high loads while idle. More info here:
>>>
>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>
>> Seems best to avoid using Lucid on EC2 for now, then.
>>
>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI
>> (with updates). See http://github.com/apurtell/hbase-ec2
>>
>> - Andy
>>
>
> --
> Bradford Stephens,
> Founder, Drawn to Scale
> drawntoscalehq.com
> 727.697.7528
>
> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
> solution. Process, store, query, search, and serve all your data.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
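
P.S. A rough back-of-the-envelope sketch of the flush arithmetic in the reply at the top of this message, in case it helps. It assumes the ~64MB flush threshold is shared by all families of a region and that every family receives data at the same rate; the function names are made up for illustration and are not part of any HBase API. The family count it prints is only what the 5MB and 64MB figures imply under those assumptions, not something read from the log.

```python
def avg_flushed_file_mb(flush_size_mb: float, num_families: int) -> float:
    """Approximate size of each file written by one flush.

    Assumption: the flush threshold covers the whole region, and the
    families fill up evenly, so each one gets an equal share of it.
    """
    if num_families < 1:
        raise ValueError("num_families must be at least 1")
    return flush_size_mb / num_families


def implied_family_count(flush_size_mb: float, avg_file_mb: float) -> int:
    """Number of evenly loaded families implied by an observed file size."""
    if avg_file_mb <= 0:
        raise ValueError("avg_file_mb must be positive")
    return round(flush_size_mb / avg_file_mb)


FLUSH_SIZE_MB = 64      # default flush size mentioned in the reply
OBSERVED_FILE_MB = 5    # average flushed file size mentioned in the reply

if __name__ == "__main__":
    families = implied_family_count(FLUSH_SIZE_MB, OBSERVED_FILE_MB)
    print(f"implied number of families: about {families}")

    # Fewer families means fewer, larger files per flush, and so less
    # compaction work for the same amount of data.
    for n in (families, 4, 1):
        size = avg_flushed_file_mb(FLUSH_SIZE_MB, n)
        print(f"{n:>2} families -> {n} files of ~{size:.1f}MB per flush")
```

The point is only that, under these assumptions, the per-file size shrinks in proportion to the number of families, which is why merging families should produce fewer and larger files per flush.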
