It's definitely counters I'm using, and some of the rows I'm inserting are long-ish, if 1.3 MB counts as long. (A minimal cli illustration of what my "inserts" actually are is at the bottom of this message.)

Maybe it would help if I said I was using counter super columns. I'm also writing to only a handful of rows at a time, until each is full. It looks like the counter super column code in addReadCommandFromColumnFamily, in the ReplicateOnWrite handling, may not be optimized for this kind of load. It also looks like I can turn off replicate_on_write for these CFs, as long as I stick to CL.ONE, to get around this in the short term, but I'd like to know what actually happens when replicate_on_write is true versus false so I can make a more informed choice.
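Concretely, the workaround I have in mind looks something like this from the 0.8 cassandra-cli (the keyspace and CF names are placeholders, and the comment reflects my unverified understanding of the attribute, so please correct me if I have it backwards):

    $ ./cassandra-cli -h localhost -p 9160

    use MyKeyspace;
    -- my (unverified) understanding: with this off, an add is only applied by
    -- the replica that coordinates it, which is why reading anything back at
    -- a level above CL.ONE would no longer be safe
    update column family MyCounterCF with replicate_on_write = false;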
On Jul 12, 2011, at 12:20 AM, Yang wrote:

> interesting,
>
> first just to make sure: since replicateOnWrite is for counters, you
> are using counters (you use the word "insert" instead of
> "add/increment"), right?
>
> if you are using counters, supposedly the leader runs
> replicateOnWrite; the fact that all your adds find the one box as
> leader is probably something worth investigating.
>
> finally, you could try debugging into those replicateOnWrite tasks and
> see what they are doing exactly; maybe there is something immediately
> wrong
>
>
> On Mon, Jul 11, 2011 at 11:57 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
>> This is now my fourth attempt to get this message through. Apologies if
>> you see multiple copies.
>>
>> I've tried to give as much relevant data as I can think of, but please
>> let me know if you need any other info. I spent the day getting jmxtrans
>> to talk to statsd with the cassandra JMX data, so I can provide graphs
>> of just about anything you want. I can also re-run the tests to
>> reproduce the problem if need be. As always, thanks for taking a look!
>>
>> I have a 4-host test cluster running 0.8.1 that I'm writing counters
>> into with hector at CL.ONE or CL.QUORUM (the test was run with each, and
>> both gave the same results). The partitioner is the RandomPartitioner,
>> and RF=3. nodetool ring shows that the actual data is well balanced:
>>
>> Address    DC           Rack   Status  State   Load     Owns    Token
>>                                                                 127605887595351923798765477786913079296
>> 10.0.0.57  datacenter1  rack1  Up      Normal  1.59 GB  25.00%  0
>> 10.0.0.56  datacenter1  rack1  Up      Normal  1.39 GB  25.00%  42535295865117307932921825928971026432
>> 10.0.0.55  datacenter1  rack1  Up      Normal  1.24 GB  25.00%  85070591730234615865843651857942052864
>> 10.0.0.54  datacenter1  rack1  Up      Normal  1.34 GB  25.00%  127605887595351923798765477786913079296
>>
>> I also tested against a 1-box test cluster with RF=1.
>>
>> Both show the same results: high performance for a while, and then
>> ReplicateOnWrite backs way up (I've seen it as high as a million
>> pending), the cassandra process becomes unresponsive, and the hector
>> client starts throwing exceptions. High performance in this case
>> equates to about 5-20k inserts/sec on both the 4-box and the 1-box
>> cluster (I'm inserting one long row at a time, so throughput is bounded
>> by the one box that owns the row). The cassandra process never seems to
>> recover, even if left alone for 24 hours; it just keeps chewing through
>> those ReplicateOnWrite pending tasks.
>>
>> GC doesn't seem to be a factor: logs show GCs completing in < 1s on all
>> 4 boxes in the test cluster, as well as on the 1 standalone server.
>>
>> I'm also seeing highly erratic performance from the cluster in general:
>> inserts/sec usually start at around 3k, sometimes creep up towards 6k,
>> then drop back down below 1k for an extended period; maybe 1% of the
>> time they'll spike up to 8k or even 15k, but that never lasts more than
>> 30 seconds.
>>
>> Here's the tpstats output from all 4 boxes.
>>
>> The slowest/most problematic/highest-CPU-usage one (this output is also
>> representative of the one-box cluster about 20 minutes into my insert
>> run):
>>
>> 4 $ ./nodetool -h localhost tpstats
>> Pool Name                 Active   Pending    Completed
>> ReadStage                      0         0            0
>> RequestResponseStage           0         0     59095840
>> MutationStage                  0         0     31182942
>> ReadRepairStage                0         0            0
>> ReplicateOnWriteStage         32    928042     28246716
>> GossipStage                    1         1       449464
>> AntiEntropyStage               0         0            0
>> MigrationStage                 0         0            4
>> MemtablePostFlusher            0         0          445
>> StreamStage                    0         0            0
>> FlushWriter                    0         0          445
>> FILEUTILS-DELETE-POOL          0         0           38
>> MiscStage                      0         0            0
>> FlushSorter                    0         0            0
>> InternalResponseStage          0         0            4
>> HintedHandoff                  0         0            0
>>
>> The one with the second-highest CPU usage:
>>
>> 3 $ ./nodetool -h localhost tpstats
>> Pool Name                 Active   Pending    Completed
>> ReadStage                      0         0            0
>> RequestResponseStage           0         0     27413910
>> MutationStage                  0         0     27523094
>> ReadRepairStage                0         0            0
>> ReplicateOnWriteStage          0         0     11111971
>> GossipStage                    0         0       335422
>> AntiEntropyStage               0         0            0
>> MigrationStage                 0         0            4
>> MemtablePostFlusher            0         0           66
>> StreamStage                    0         0            0
>> FlushWriter                    0         0           66
>> FILEUTILS-DELETE-POOL          0         0           31
>> MiscStage                      0         0            0
>> FlushSorter                    0         0            0
>> InternalResponseStage          0         0            4
>> HintedHandoff                  0         0            0
>>
>> The other two, which see minimal CPU usage:
>>
>> 2 $ ./nodetool -h localhost tpstats
>> Pool Name                 Active   Pending    Completed
>> ReadStage                      0         0            0
>> RequestResponseStage           0         0      9213999
>> MutationStage                  0         0     33233114
>> ReadRepairStage                0         0            0
>> ReplicateOnWriteStage          0         0            0
>> GossipStage                    0         0       336016
>> AntiEntropyStage               0         0            0
>> MigrationStage                 0         0            4
>> MemtablePostFlusher            0         0           77
>> StreamStage                    0         0            0
>> FlushWriter                    0         0           77
>> FILEUTILS-DELETE-POOL          0         0           40
>> MiscStage                      0         0            0
>> FlushSorter                    0         0            0
>> InternalResponseStage          0         0            4
>> HintedHandoff                  0         0            0
>>
>> 1 $ ./nodetool -h localhost tpstats
>> Pool Name                 Active   Pending    Completed
>> ReadStage                      0         0            0
>> RequestResponseStage           0         0     13796875
>> MutationStage                  0         0     36116662
>> ReadRepairStage                0         0            0
>> ReplicateOnWriteStage          0         0            0
>> GossipStage                    0         0       336496
>> AntiEntropyStage               0         0            0
>> MigrationStage                 0         0            4
>> MemtablePostFlusher            0         0           91
>> StreamStage                    0         0            0
>> FlushWriter                    0         0           91
>> FILEUTILS-DELETE-POOL          0         0           43
>> MiscStage                      0         0            0
>> FlushSorter                    0         0            0
>> InternalResponseStage          0         0            4
>> HintedHandoff                  0         0            0
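PS, to Yang's first question: yes, these are genuinely counter adds rather than regular column inserts; "insert" was just sloppy wording on my part. In cassandra-cli terms, each write the test makes boils down to something like the following (all names are made up, and I'm going from memory on the super column form of incr, so treat it as a sketch):

    use MyKeyspace;
    -- one add against a counter super column; the test fires these as fast
    -- as it can at a small set of hot rows until each row is "full"
    incr MyCounterCF['row-0001']['super-1']['counter-1'] by 1;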