On the full data set (10 reducers), we're seeing about 100k inserts/minute with the WAL disabled. Still much slower than I'd like, but I'll take it over what we had before :)

On Wed, Sep 1, 2010 at 5:59 PM, Ryan Rawson wrote:
> Yes exactly, column families have the same performance profile as
> tables. 12 CF = 12 tables.
>
> -ryan
>
> On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens wrote:
>> Good call JD! We've gone from 20k inserts/minute to 200k. Much
>> better! I still think it's slower than I'd want by about one OOM, but
>> it's progress.
>>
>> Since we're populating 12 families, I guess we're seeking for 12 files
>> on each write. Not pretty. I'll look at the customer and see if they
>> really have any sparse data that would benefit from its own
>> ColumnFamily. Probably not.
>>
>> Cheers,
>> B
>>
>> On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens wrote:
>>> Yeah, those families are all needed -- but I didn't realize the files
>>> were so small. That's odd -- and you're right, that'd certainly throw
>>> it off. I'll merge them all and see if that helps.
>>>
>>> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans wrote:
>>>> Took a quick look at your RS log, it looks like you are using a lot of
>>>> families and loading them pretty much at the same rate. Look at lines
>>>> that start with:
>>>>
>>>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>>>
>>>> And you will see that you are dumping very small files on the
>>>> filesystem, on average 5MB, that together account for ~64MB which is
>>>> the default flush size (and then it generates tons of compactions
>>>> which makes it even worse). Do you really need all those families? Try
>>>> merging them and see the difference.
>>>>
>>>> J-D
>>>>
>>>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens wrote:
>>>>> 'allo,
>>>>>
>>>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>>>> but nowhere near what I'm used to, even from vague memories of old
>>>>> clusters on EC2.
>>>>>
>>>>> I also stripped all the Cascading from my code and have a very basic
>>>>> raw MR job -- we're basically reading raw text, splitting it into
>>>>> fields, and adding those rows to HBase. About the simplest task you
>>>>> could do.
>>>>>
>>>>> Ideas for next steps? What other info could I share?
>>>>>
>>>>> Cheers,
>>>>> B
>>>>>
>>>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell wrote:
>>>>>>> From: Gary Helmling
>>>>>>>
>>>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>>>> there's a known kernel issue that seems to be causing
>>>>>>> high loads while idle. More info here:
>>>>>>>
>>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>>>
>>>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>>>
>>>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI
>>>>>> (with updates). See http://github.com/apurtell/hbase-ec2
>>>>>>
>>>>>> - Andy
>>>>>
>>>>> --
>>>>> Bradford Stephens,
>>>>> Founder, Drawn to Scale
>>>>> drawntoscalehq.com
>>>>> 727.697.7528
>>>>>
>>>>> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
>>>>> solution. Process, store, query, search, and serve all your data.
>>>>>
>>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>>> Media, and Computer Science
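
As a rough illustration of J-D's point in the thread about small flush files, here is a back-of-the-envelope sketch in Python. It assumes the 64MB flush size applies to the region as a whole and that every column family receives about the same share of the writes; the function names are made up for this example and are not part of any HBase API.

```python
# Back-of-the-envelope model of the flush behaviour described in the thread.
# Assumptions (not taken from HBase itself): the flush threshold covers the
# whole region, and writes are spread evenly across all column families.

FLUSH_SIZE_MB = 64  # default flush size quoted in the thread


def flush_file_size_mb(num_families, flush_size_mb=FLUSH_SIZE_MB):
    """Approximate size of each store file written by a single flush."""
    if num_families < 1:
        raise ValueError("need at least one column family")
    return flush_size_mb / num_families


def files_per_gb(num_families, flush_size_mb=FLUSH_SIZE_MB):
    """Approximate number of store files created per GB of flushed data."""
    flushes = 1024 / flush_size_mb  # flushes needed to write 1 GB
    return int(flushes * num_families)  # one file per family per flush


for families in (12, 1):
    size = round(flush_file_size_mb(families), 1)
    print(f"{families} families: ~{size} MB per file, "
          f"~{files_per_gb(families)} files per GB")
```

Under these assumptions, 12 families give files of roughly 5MB each, which lines up with the sizes J-D saw in the RS log, and about twelve times as many files for the compactions to work through as a single merged family would.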
