inconsistent/corrupt counters w/ broken shards never converge -------------------------------------------------------------
Key: CASSANDRA-3641 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the *same* node_id, *same* clock id, but *different* counts. The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards. In our case for example, we would see the value of the counter randomly fluctuating on a CL.ALL read, but we would get consistent (whatever the node had) on CL.ONE (submitted to one of the nodes in the replica set for the key). In addition, read repair would not work despite digest mismatches because the diffing algorithm also did not care about the counts when determining the differences to send. I'm attaching patches that fixes this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption. The other patch is against trunk, and contains the same change. What the patch does is: * On diffing, treat as DISJOINT if there is a count discrepancy. * On reconciliation, look at the count and *deterministically* pick the higher one, and: ** log the fact that we detected a corrupt counter ** increment a JMX observable counter for monitoring purposes A cluster which is subject to such corruption and has this patch, will fix itself with and AES + compact (or just repeated compactions assuming the replicate-on-compact is able to deliver correctly). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira