Yes exactly, column families have the same performance profile as tables. 12 CF = 12 tables.
-ryan

On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens <[email protected]> wrote:
> Good call JD! We've gone from 20k inserts/minute to 200k. Much
> better! I still think it's slower than I'd want by about one OOM, but
> it's progress.
>
> Since we're populating 12 families, I guess we're seeking for 12 files
> on each write. Not pretty. I'll look at the customer and see if they
> really have any sparse data that would benefit from its own
> ColumnFamily. Probably not.
>
> Cheers,
> B
>
> On Wed, Sep 1, 2010 at 5:37 PM, Bradford Stephens
> <[email protected]> wrote:
>> Yeah, those families are all needed -- but I didn't realize the files
>> were so small. That's odd -- and you're right, that'd certainly throw
>> it off. I'll merge them all and see if that helps.
>>
>> On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans <[email protected]>
>> wrote:
>>> Took a quick look at your RS log, it looks like you are using a lot of
>>> families and loading them pretty much at the same rate. Look at lines
>>> that start with:
>>>
>>> INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
>>>
>>> And you will see that you are dumping very small files on the
>>> filesystem, on average 5MB, that together account for ~64MB which is
>>> the default flush size (and then it generates tons of compactions
>>> which makes it even worse). Do you really need all those families? Try
>>> merging them and see the difference.
>>>
>>> J-D
>>>
>>> On Wed, Sep 1, 2010 at 5:03 PM, Bradford Stephens
>>> <[email protected]> wrote:
>>>> 'allo,
>>>>
>>>> I changed the cluster from m1.large to c1.xlarge -- we're getting
>>>> about 4k inserts/node/minute instead of 2k. A small improvement,
>>>> but nowhere near what I'm used to, even from vague memories of old
>>>> clusters on EC2.
>>>>
>>>> I also stripped all the Cascading from my code and have a very basic
>>>> raw MR job -- we're basically reading raw text, splitting it into
>>>> fields, and adding those rows to HBase. About the simplest task you
>>>> could do.
>>>>
>>>> Ideas for next steps? What other info could I share?
>>>>
>>>> Cheers,
>>>> B
>>>>
>>>> On Wed, Sep 1, 2010 at 10:55 AM, Andrew Purtell <[email protected]>
>>>> wrote:
>>>>>> From: Gary Helmling
>>>>>>
>>>>>> If you're using AMIs based on the latest Ubuntu (10.04),
>>>>>> there's a known kernel issue that seems to be causing
>>>>>> high loads while idle. More info here:
>>>>>>
>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
>>>>>
>>>>> Seems best to avoid using Lucid on EC2 for now, then.
>>>>>
>>>>> FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI
>>>>> (with updates). See http://github.com/apurtell/hbase-ec2
>>>>>
>>>>> - Andy
>>>>>
>>>>
>>>> --
>>>> Bradford Stephens,
>>>> Founder, Drawn to Scale
>>>> drawntoscalehq.com
>>>> 727.697.7528
>>>>
>>>> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
>>>> solution. Process, store, query, search, and serve all your data.
>>>>
>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>> Media, and Computer Science
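
For anyone reading this thread later, here is a rough back-of-the-envelope check on the numbers J-D quotes. It is only a sketch, and it assumes that a flush is triggered once the families together hold about 64MB and that all families are loaded at the same rate, as described in the thread. The function name and the family counts other than 12 are made up for illustration.

```python
# Assumption: a flush happens when all families together reach ~64 MB
# (the default flush size mentioned in the thread), and every family
# receives data at the same rate.
FLUSH_SIZE_MB = 64


def flushed_file_size_mb(num_families, flush_size_mb=FLUSH_SIZE_MB):
    """Approximate size of each file written per flush (hypothetical helper)."""
    return flush_size_mb / num_families


# Compare a merged layout with the 12-family layout from the thread.
for families in (1, 3, 12):
    size = flushed_file_size_mb(families)
    print(f"{families:>2} families: {families} file(s) of ~{size:.1f} MB per flush")
```

With 12 families this gives files of a little over 5MB each, which lines up with the "on average 5MB" figure from the RS log, while a single merged family would write one file close to the full flush size.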
