Say you are doing 100 inserts at RF=1 on two nodes. That is 50 inserts per node. If you go to RF=2, that is 100 inserts per node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down.
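To make that arithmetic concrete, here is a rough sketch. The function names are made up for illustration, and it assumes writes are spread evenly across the nodes; it also ignores the extra read that counters do on increment, so real numbers would likely be worse.

```python
def per_node_writes(client_inserts, replication_factor, node_count):
    """Average writes each node handles, assuming an evenly balanced ring."""
    return client_inserts * replication_factor / node_count


def utilization_after(rf_before, rf_after, utilization_before):
    """Scale per-node utilization by the change in replication factor.

    Assumes the node count and the client write rate stay the same.
    """
    return utilization_before * rf_after / rf_before


print(per_node_writes(100, 1, 2))     # 50.0 writes per node at RF=1
print(per_node_writes(100, 2, 2))     # 100.0 writes per node at RF=2
print(utilization_after(1, 2, 0.75))  # 1.5, i.e. 150% of capacity
```

Anything above 1.0 from utilization_after means the nodes are being asked for more than they can do, which is where the queueing and the slowdown come from.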
To figure out what is going on we would need to see nodetool tpstats, iostat, and top information. I think you're looking at the performance the wrong way. Starting off at RF=1 is not the way to understand Cassandra performance. You do not get the benefits of "scale out" until you fix your RF and increase your node count. I.e., 5 nodes at RF=3 is fast; 10 nodes at RF=3 is even better.

On Tuesday, November 27, 2012, Sergey Olefir <solf.li...@gmail.com> wrote:
> I already do a lot of in-memory aggregation before writing to Cassandra.
>
> The question here is what is wrong with Cassandra (or its configuration)
> that causes huge performance drop when moving from 1-replication to
> 2-replication for counters -- and more importantly how to resolve the
> problem. 2x-3x drop when moving from 1-replication to 2-replication on two
> nodes is reasonable. 6x is not. Like I said, with this kind of performance
> degradation it makes more sense to run two clusters with replication=1 in
> parallel rather than rely on Cassandra replication.
>
> And yes, Rainbird was the inspiration for what we are trying to do here :)
>
>
>
> Edward Capriolo wrote
>> Cassandra's counters read on increment. Additionally they are distributed
>> so that can be multiple reads on increment. If they are not fast enough and
>> you have avoided all tuning options add more servers to handle the load.
>>
>> In many cases incrementing the same counter n times can be avoided.
>>
>> Twitter's rainbird did just that. It avoided multiple counter increments by
>> batching them.
>>
>> I have done a similar thing using cassandra and Kafka.
>>
>> https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
>>
>>
>> On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@> wrote:
>>> Hi, thanks for your suggestions.
>>>
>>> Regarding replicate=2 vs replicate=1 performance: I expected that below
>>> configurations will have similar performance:
>>> - single node, replicate = 1
>>> - two nodes, replicate = 2 (okay, this probably should be a bit slower due
>>> to additional overhead).
>>>
>>> However what I'm seeing is that second option (replicate=2) is about THREE
>>> times slower than single node.
>>>
>>>
>>> Regarding replicate_on_write -- it is, in fact, a dangerous option. As JIRA
>>> discusses, if you make changes to your ring (moving tokens and such) you
>>> will *silently* lose data. That is on top of whatever data you might end up
>>> losing if you run replicate_on_write=false and the only node that got the
>>> data fails.
>>>
>>> But what is much worse -- with replicate_on_write being false the data will
>>> NOT be replicated (in my tests) ever unless you explicitly request the cell.
>>> Then it will return the wrong result. And only on subsequent reads it will
>>> return adequate results. I haven't tested it, but documentation states that
>>> range query will NOT do 'read repair' and thus will not force replication.
>>> The test I did went like this:
>>> - replicate_on_write = false
>>> - write something to node A (which should in theory replicate to node B)
>>> - wait for a long time (longest was on the order of 5 hours)
>>> - read from node B (and here I was getting null / wrong result)
>>> - read from node B again (here you get what you'd expect after read repair)
>>>
>>> In essence, using replicate_on_write=false with rarely read data will
>>> practically defeat the purpose of having replication in the first place
>>> (failover, data redundancy).
>>>
>>>
>>> Or, in other words, this option doesn't look to be applicable to my
>>> situation.
>>>
>>> It looks like I will get much better performance by simply writing to two
>>> separate clusters rather than using single cluster with replicate=2.
>>> Which is kind of stupid :) I think something's fishy with counters and
>>> replication.
>>>
>>>
>>>
>>> Edward Capriolo wrote
>>>> I mispoke really. It is not dangerous you just have to understand what it
>>>> means. this jira discusses it.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-3868
>>>>
>>>> On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay <scottm@> wrote:
>>>>
>>>>> We're having a similar performance problem. Setting 'replicate_on_write:
>>>>> false' fixes the performance issue in our tests.
>>>>>
>>>>> How dangerous is it? What exactly could go wrong?
>>>>>
>>>>> On 12-11-27 01:44 PM, Edward Capriolo wrote:
>>>>>
>>>>> The difference between Replication factor =1 and replication factor > 1 is
>>>>> significant. Also it sounds like your cluster is 2 node so going from RF=1
>>>>> to RF=2 means double the load on both nodes.
>>>>>
>>>>> You may want to experiment with the very dangerous column family
>>>>> attribute:
>>>>>
>>>>> - replicate_on_write: Replicate every counter update from the leader to the
>>>>> follower replicas. Accepts the values true and false.
>>>>>
>>>>> Edward
>>>>> On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <mkjellman@> wrote:
>>>>>
>>>>>> Are you writing with QUORUM consistency or ONE?
>>>>>>
>>>>>> On 11/27/12 9:52 AM, "Sergey Olefir" <solf.lists@> wrote:
>>>>>>
>>>>>> >Hi Juan,
>
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584014.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.