Re: Replicate On Write behavior

Sylvain Lebresne Fri, 02 Sep 2011 00:31:49 -0700

On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
> I'm curious... digging through the source, it looks like replicate on write 
> triggers a read of the entire row, and not just the columns/supercolumns that 
> are affected by the counter update.  Is this the case?  It would certainly 
> explain why my inserts/sec decay over time and why the average insert latency 
> increases over time.  The strange thing is that I'm not seeing disk read IO 
> increase over that same period, but that might be due to the OS buffer 
> cache...


It does not. It only reads the columns/supercolumns affected by the
counter update.
In the source, this happens in CounterMutation.java. If you look at
addReadCommandFromColumnFamily you'll see that it does a query by name
only for the column involved in the update (the update is basically
the content of the columnFamily parameter there).
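
To illustrate the difference between a by-name read and a full-row read, here is a minimal toy sketch. It is not the code in CounterMutation.java and uses no Cassandra classes: the names ByNameReadSketch, readByName and columnsToRead are hypothetical, and a row is modelled as a plain sorted map from column name to counter value. It only shows the idea that the set of columns read is derived from the update itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: none of these types or methods are Cassandra classes.
final class ByNameReadSketch {

    // Return only the columns of `row` whose names appear in `names`.
    // Columns that are not named are never looked at.
    static SortedMap<String, Long> readByName(SortedMap<String, Long> row, Set<String> names) {
        SortedMap<String, Long> result = new TreeMap<>();
        for (String name : names) {
            Long value = row.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    // The column names to read are exactly those present in the update
    // (modelled here as a map from column name to counter delta).
    static Set<String> columnsToRead(Map<String, Long> update) {
        return update.keySet();
    }

    public static void main(String[] args) {
        SortedMap<String, Long> row = new TreeMap<>();
        row.put("clicks", 10L);
        row.put("signups", 3L);
        row.put("views", 250L);

        // A counter update touching a single column.
        Map<String, Long> update = new LinkedHashMap<>();
        update.put("clicks", 1L);

        // Only "clicks" is read; "signups" and "views" are left alone.
        System.out.println(readByName(row, columnsToRead(update)));
    }
}
```

In this model the cost of the read depends on the number of columns in the update, not on the width of the row.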

And Cassandra does *not* always read a full row. Never has, never will.

> On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
> ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that normal? 
>  I'm using RandomPartitioner...
>
> Address      DC           Rack   Status  State   Load       Owns    Token
>                                                                     136112946768375385385349842972707284580
> 10.0.0.57    datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
> 10.0.0.56    datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
> 10.0.0.55    datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
> 10.0.0.54    datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
> 10.0.0.72    datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>
> The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and 
> last node both have a count of 0.  This is a clean cluster, and I've been 
> doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours.  
> The last time this test ran, it went all the way down to 500 inserts/sec 
> before I killed it.

Could be due to https://issues.apache.org/jira/browse/CASSANDRA-2890.
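
As a side check on the quoted ring output: the five tokens are evenly spaced multiples of floor(2^127 / 5), which is consistent with each node owning 20.00% under RandomPartitioner. A small sketch of that arithmetic follows; the class and method names (EvenTokensSketch, evenTokens) are made up for illustration and are not part of Cassandra or nodetool.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Cassandra API: computes evenly spaced tokens
// over the range [0, 2^127) as i * floor(2^127 / nodeCount).
final class EvenTokensSketch {

    static List<BigInteger> evenTokens(int nodeCount) {
        BigInteger range = BigInteger.ONE.shiftLeft(127);            // 2^127
        BigInteger step = range.divide(BigInteger.valueOf(nodeCount)); // integer division
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line for a 5-node ring.
        for (BigInteger token : evenTokens(5)) {
            System.out.println(token);
        }
    }
}
```

For nodeCount = 5 this yields the same five values as the Token column above, so the ring itself is balanced and is unlikely to be why only three nodes show ReplicateOnWrite tasks.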

--
Sylvain
