The difference between replication_factor=1 and replication_factor>1 is
significant. Also, it sounds like your cluster is two nodes, so going
from RF=1 to RF=2 means double the load on both nodes.

You may want to experiment with the very dangerous column family
attribute:

- replicate_on_write: Replicate every counter update from the leader to
  the follower replicas. Accepts the values true and false.
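A minimal sketch of toggling it with cassandra-cli (the column family
name "stats" here is illustrative):

    update column family stats with replicate_on_write = false;

To be clear about why it is dangerous: as I understand it, with
replicate_on_write=false each counter increment is applied on only one
replica at write time, so losing a node can silently lose counts. It
trades safety for write throughput.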
Edward

On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman
<mkjell...@barracuda.com> wrote:

> Are you writing with QUORUM consistency or ONE?
>
> On 11/27/12 9:52 AM, "Sergey Olefir" <solf.li...@gmail.com> wrote:
>
> >Hi Juan,
> >
> >thanks for your input!
> >
> >In my case, however, I doubt this is the case -- clients are able to
> >push many more updates than I need to saturate the
> >replication_factor=2 case (e.g. I'm doing as many as 6x more
> >increments when testing the 2-node cluster with
> >replication_factor=1), so bandwidth between clients and server
> >should be sufficient.
> >
> >Bandwidth between nodes in the cluster should also be quite
> >sufficient, since they are both in the same DC. But it is something
> >to check, thanks!
> >
> >Best regards,
> >Sergey
> >
> >
> >Juan Valencia wrote
> >> Hi Sergey,
> >>
> >> I know I've had similar issues with counters which were
> >> bottlenecked by network throughput. You might be seeing a problem
> >> with throughput between the clients and Cass, or between the two
> >> Cass nodes. It might not be your case, but that was what happened
> >> to me :-)
> >>
> >> Juan
> >>
> >>
> >> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@...>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have a serious problem with counter performance and I can't
> >>> seem to figure it out.
> >>>
> >>> Basically I'm building a system for accumulating some statistics
> >>> "on the fly" via Cassandra distributed counters. For this I need
> >>> counter updates to work "really fast", and herein lies my
> >>> problem -- as soon as I enable replication_factor = 2, the
> >>> performance goes down the drain. This happens in my tests using
> >>> both 1.0.x and 1.1.6.
> >>>
> >>> Let me elaborate:
> >>>
> >>> I have two boxes (virtual servers on top of physical servers
> >>> rented specifically for this purpose, i.e. it's not a cloud, nor
> >>> is it shared; the virtual servers are managed by our admins as a
> >>> way to limit damage, I suppose :)). The Cassandra partitioner is
> >>> set to ByteOrderedPartitioner because I want to be able to do
> >>> some range queries.
> >>>
> >>> First, I set up Cassandra individually on each box (not in a
> >>> cluster) and test counter increment performance (exclusively
> >>> increments, no reads). For the tests I use code that is intended
> >>> to somewhat resemble the expected load pattern -- in particular,
> >>> the majority of increments create new counters, with some
> >>> updating (adding) to already existing counters. In this test
> >>> each single node exhibits respectable performance -- something
> >>> on the order of 70k (seventy thousand) increments per second.
> >>>
> >>> I then join both of these nodes into a single cluster (using
> >>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the
> >>> same test using replication_factor=1. The performance is on the
> >>> order of 120k increments per second -- which seems to be a
> >>> reasonable increase over the single-node performance.
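(For reference: on 1.0/1.1 a keyspace like the one described above
could be created as follows in CQL 2 -- the keyspace name "stats" is
illustrative, and replication_factor is the value being varied between
the two tests:

    CREATE KEYSPACE stats
      WITH strategy_class = 'SimpleStrategy'
      AND strategy_options:replication_factor = 1;

For the second test, the same definition would be used with
strategy_options:replication_factor = 2.)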
> >>> HOWEVER, I then rerun the same test on the two-node cluster
> >>> using replication_factor=2 -- which is the least I'll need in
> >>> actual production for redundancy purposes. And the performance I
> >>> get is absolutely horrible -- much, MUCH worse than even
> >>> single-node performance -- something on the order of less than
> >>> 25k increments per second. In addition to clients not being able
> >>> to push updates fast enough, I also see a lot of 'messages
> >>> dropped' messages in the Cassandra log under this load.
> >>>
> >>> Could anyone advise what could be causing such a drastic
> >>> performance drop under replication_factor=2? I was expecting
> >>> something on the order of single-node performance, not
> >>> approximately 3x less.
> >>>
> >>> When testing replication_factor=2 on 1.1.6 I can see that CPU
> >>> usage goes through the roof. On 1.0.x I think it looked more
> >>> like disk overload, but I'm not sure (being on a virtual server
> >>> I apparently can't see true iostats).
> >>>
> >>> I do have the Cassandra data on a separate disk; the commit log
> >>> and cache are currently on the same disk as the system. I
> >>> experimented with commit log flush modes and even with disabling
> >>> the commit log entirely -- but it doesn't seem to have a
> >>> noticeable impact on performance under replication_factor=2.
> >>>
> >>> Any suggestions and hints will be much appreciated :) And please
> >>> let me know if I need to share additional information about the
> >>> configuration I'm running on.
> >>>
> >>> Best regards,
> >>> Sergey
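(The commit log flush modes mentioned above are the commitlog_sync
settings in cassandra.yaml; a sketch of the two modes, assuming the
stock 1.x defaults:

    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

    # or, to fsync before acknowledging each write:
    # commitlog_sync: batch
    # commitlog_sync_batch_window_in_ms: 50

With periodic sync the commit log is fsynced in the background every
commitlog_sync_period_in_ms, which is why disabling it entirely often
changes little when the bottleneck is elsewhere, e.g. CPU.)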