Re: counters + replication = awful performance?

Edward Capriolo Tue, 27 Nov 2012 17:26:31 -0800

Say you are doing 100 inserts rf1 on two nodes. That is 50 inserts a node.
If you go to rf2 that is 100 inserts a node.  If you were at 75 % capacity
on each mode your now at 150% which is not possible so things bog down.


To figure out what is going on we would need to see tpstat, iostat , and
top information.

I think your looking at the performance the wrong way. Starting off at rf 1
is not the way to understand cassandra performance.

You do not get the benefits of "scala out" don't happen until you fix your
rf and increment your nodecount. Ie 5 nodes at rf 3 is fast 10 nodes at rf
3 even better.
On Tuesday, November 27, 2012, Sergey Olefir <solf.li...@gmail.com> wrote:
> I already do a lot of in-memory aggregation before writing to Cassandra.
>
> The question here is what is wrong with Cassandra (or its configuration)
> that causes huge performance drop when moving from 1-replication to
> 2-replication for counters -- and more importantly how to resolve the
> problem. 2x-3x drop when moving from 1-replication to 2-replication on two
> nodes is reasonable. 6x is not. Like I said, with this kind of performance
> degradation it makes more sense to run two clusters with replication=1 in
> parallel rather than rely on Cassandra replication.
>
> And yes, Rainbird was the inspiration for what we are trying to do here :)
>
>
>
> Edward Capriolo wrote
>> Cassandra's counters read on increment. Additionally they are distributed
>> so that can be multiple reads on increment. If they are not fast enough
>> and
>> you have avoided all tuning options add more servers to handle the load.
>>
>> In many cases incrementing the same counter n times can be avoided.
>>
>> Twitter's rainbird did just that. It avoided multiple counter increments
>> by
>> batching them.
>>
>> I have done a similar think using cassandra and Kafka.
>>
>>
https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
>>
>>
>> On Tuesday, November 27, 2012, Sergey Olefir &lt;
>
>> solf.lists@
>
>> &gt; wrote:
>>> Hi, thanks for your suggestions.
>>>
>>> Regarding replicate=2 vs replicate=1 performance: I expected that below
>>> configurations will have similar performance:
>>> - single node, replicate = 1
>>> - two nodes, replicate = 2 (okay, this probably should be a bit slower
>>> due
>>> to additional overhead).
>>>
>>> However what I'm seeing is that second option (replicate=2) is about
>>> THREE
>>> times slower than single node.
>>>
>>>
>>> Regarding replicate_on_write -- it is, in fact, a dangerous option. As
>> JIRA
>>> discusses, if you make changes to your ring (moving tokens and such) you
>>> will *silently* lose data. That is on top of whatever data you might end
>> up
>>> losing if you run replicate_on_write=false and the only node that got
the
>>> data fails.
>>>
>>> But what is much worse -- with replicate_on_write being false the data
>> will
>>> NOT be replicated (in my tests) ever unless you explicitly request the
>> cell.
>>> Then it will return the wrong result. And only on subsequent reads it
>>> will
>>> return adequate results. I haven't tested it, but documentation states
>> that
>>> range query will NOT do 'read repair' and thus will not force
>>> replication.
>>> The test I did went like this:
>>> - replicate_on_write = false
>>> - write something to node A (which should in theory replicate to node B)
>>> - wait for a long time (longest was on the order of 5 hours)
>>> - read from node B (and here I was getting null / wrong result)
>>> - read from node B again (here you get what you'd expect after read
>> repair)
>>>
>>> In essence, using replicate_on_write=false with rarely read data will
>>> practically defeat the purpose of having replication in the first place
>>> (failover, data redundancy).
>>>
>>>
>>> Or, in other words, this option doesn't look to be applicable to my
>>> situation.
>>>
>>> It looks like I will get much better performance by simply writing to
two
>>> separate clusters rather than using single cluster with replicate=2.
>>> Which
>>> is kind of stupid :) I think something's fishy with counters and
>>> replication.
>>>
>>>
>>>
>>> Edward Capriolo wrote
>>>> I mispoke really. It is not dangerous you just have to understand what
>>>> it
>>>> means. this jira discusses it.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-3868
>>>>
>>>> On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay &lt;
>>>
>>>> scottm@
>>>
>>>> &gt;wrote:
>>>>
>>>>>  We're having a similar performance problem.  Setting
>>>>> 'replicate_on_write:
>>>>> false' fixes the performance issue in our tests.
>>>>>
>>>>> How dangerous is it?  What exactly could go wrong?
>>>>>
>>>>> On 12-11-27 01:44 PM, Edward Capriolo wrote:
>>>>>
>>>>> The difference between Replication factor =1 and replication factor >
1
>>>>> is
>>>>> significant. Also it sounds like your cluster is 2 node so going from
>>>>> RF=1
>>>>> to RF=2 means double the load on both nodes.
>>>>>
>>>>>  You may want to experiment with the very dangerous column family
>>>>> attribute:
>>>>>
>>>>>  - replicate_on_write: Replicate every counter update from the leader
>>>>> to
>>>>> the
>>>>> follower replicas. Accepts the values true and false.
>>>>>
>>>>>  Edward
>>>>>  On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <
>>>>>
>>>
>>>> mkjellman@
>>>
>>>>> wrote:
>>>>>
>>>>>> Are you writing with QUORUM consistency or ONE?
>>>>>>
>>>>>> On 11/27/12 9:52 AM, "Sergey Olefir" &lt;
>>>
>>>> solf.lists@
>>>
>>>> &gt; wrote:
>>>>>>
>>>>>> >Hi Juan,
>>>> cassandra-user@.apache
>
>>  mailing list archive at
>> Nabble.com.
>>>
>
>
>
>
>
> --
> View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584014.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.
>

Re: counters + replication = awful performance?

Reply via email to