Re: Data not fully replicated with 2 nodes and replication factor 2

Wei Zhu Thu, 20 Jun 2013 09:54:57 -0700

I don't think you can fully trust hinted handoff; it's more like "we are trying 
our best to deliver it" but with no guarantee. Even if the hints were guaranteed 
to be delivered, there would still be a delay, which is supposed to be part of 
the "eventual consistency" paradigm. If you want to enforce real consistency, 
change your consistency level. Or do a repair.
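
A rough way to see why changing the consistency level helps, assuming the 
commonly cited rule that a read overlaps the latest acknowledged write when 
(replicas acked on write) + (replicas consulted on read) > replication factor. 
The function names below are made up for illustration and are not part of any 
Cassandra API.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM at replication factor rf."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Map a consistency level name to the number of replicas it waits for."""
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    return table[level]


def read_overlaps_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read must touch at least one replica that acked the write."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return w + r > rf


if __name__ == "__main__":
    # Hypothetical combinations for a 2-node cluster with replication factor 2.
    for w, r in [("ONE", "ONE"), ("QUORUM", "ONE"), ("ALL", "ONE")]:
        print(w, r, read_overlaps_write(w, r, rf=2))
```

With replication factor 2, ONE/ONE gives 1 + 1 = 2, which is not greater than 
2, so a read can land on the replica that missed the write. QUORUM (2 of 2) or 
ALL on either side closes that gap, at the cost of availability when a node is 
down.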


Thanks. 
-Wei 

----- Original Message -----

From: "James Lee" <james....@metaswitch.com> 
To: user@cassandra.apache.org, "Wei Zhu" <wz1...@yahoo.com>, 
rc...@eventbrite.com 
Sent: Thursday, June 20, 2013 3:21:30 AM 
Subject: RE: Data not fully replicated with 2 nodes and replication factor 2 

Rob, Wei, thank you both for your responses - from what Rob says below my test 
is a valid one. 

I've run some additional tests and observed the following: 
-- I mentioned before that some of the initial writes failed at first and then 
succeeded when the test tool retried them. I've checked that there's no 
correlation between the keys for writes which required a retry and the keys for 
the failed reads (i.e. the reads are failing for keys that were written fine at 
the first attempt). 
-- I've rerun this test with the rate of initial writes limited to a much lower 
value (from 8000/s down to 2000/s). This makes the problem go away completely: 
no more read failures. 
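
For anyone wanting to reproduce the lower write rate, a minimal sketch of 
client-side pacing is below. It is not the test tool used in this thread: 
`paced_writes` and its `write` callback are hypothetical names, and the clock 
and sleep functions are injectable only so the pacing can be checked without 
real delays.

```python
import time


def paced_writes(keys, write, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
    """Call write(key) for each key, at no more than rate_per_sec calls per second."""
    interval = 1.0 / rate_per_sec
    next_slot = clock()  # earliest time the next write may be issued
    for key in keys:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait until this write's slot opens
        write(key)  # hypothetical callback that performs the actual insert
        next_slot += interval


if __name__ == "__main__":
    # Stand-in for a real client call; prints instead of writing to a cluster.
    paced_writes(range(10), lambda k: print("write", k), rate_per_sec=2000)
```

If a write takes longer than one interval, the loop skips the sleep and catches 
up, so the figure is an upper bound on the average rate rather than a strict 
per-request gap.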

So it seems like I have exposed a genuine bug in Cassandra replication which 
manifests under high write load. What's the best next step - should I be filing 
a bug report, and if so what diagnostics are likely to be useful? 

Thanks, 
James Lee 


-----Original Message----- 
From: Robert Coli [mailto:rc...@eventbrite.com] 
Sent: 19 June 2013 20:59 
To: user@cassandra.apache.org; Wei Zhu 
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 

On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu <wz1...@yahoo.com> wrote: 
> I think hints are only stored when the other node is down, not on the 
> dropped mutations. (Correct me if I am wrong, actually it's not a bad 
> idea to store hints for dropped mutations and replay them later?) 

This used to be the way it worked pre-1.0... 

https://issues.apache.org/jira/browse/CASSANDRA-2034 

In modern Cassandra, anything but a successful ack from a coordinated write 
results in a hint on the coordinator. 

> To solve your issue, as I mentioned, either do nodetool repair, or 
> increase your consistency level. By the way, you probably write 
> faster than your cluster can handle if you see that many dropped mutations. 

If his hints are ultimately delivered, OP should not "need" repair to be 
consistent. 

=Rob
