Colleen, to your question, yes, there is a difference between 2.x and 3.x that
would impact repairs. The Merkle tree computation changed to use a greater
default tree depth. That can cause significant memory pressure, to the point
that nodes sometimes even OOM. This has been fixed in 4.x by making the setting
tunable, and I believe 3.11.5 now contains the same fix as a back-patch.

From: Reid Pinchback <rpinchb...@tripadvisor.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, December 10, 2019 at 11:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Seeing tons of DigestMismatchException exceptions after upgrading 
from 2.2.13 to 3.11.4

Carl, your speculation matches our observations, and we have a use case with 
that unfortunate usage pattern.  Write-then-immediately-read is not friendly to 
eventually-consistent data stores. It makes the reading pay a tax that really 
is associated with writing activity.

From: Carl Mueller <carl.muel...@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, December 9, 2019 at 3:18 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Seeing tons of DigestMismatchException exceptions after upgrading 
from 2.2.13 to 3.11.4

My speculation on rapidly churning/fast reads of recently written data:

- data written at QUORUM (RF3): the write is confirmed after two replicas reply
- data read very soon afterwards (possibly a code antipattern), and let's assume
the third replica hasn't applied the update yet (e.g. AWS network "variance").
The read picks one replica for the data, and there is then a 50% chance that the
second replica chosen for the quorum read is the stale node, which triggers a
DigestMismatch read repair (rough numbers below).
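
Back-of-the-envelope, assuming the coordinator picks the two replicas for a
QUORUM read uniformly at random and exactly one of the three replicas is still
stale:

    P(stale replica among the 2 contacted) = 1 - C(2,2)/C(3,2) = 1 - 1/3 = 2/3

The 50% above is the case where the data read already landed on a fresh
replica; counting the case where the stale node serves the data read pushes the
overall mismatch chance, for reads that race the third write, to roughly 2 in 3.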

Is that plausible?

The code seems to log the exception for every read repair instance, so it
doesn't seem to be an ERROR worthy of red blaring klaxons; maybe it should be a
WARN?
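
If the debug.log churn itself is the main pain point, one possible stopgap
(just a sketch, not something we have tried here) would be to raise the log
level for that one class at runtime, or permanently in logback.xml:

    nodetool setlogginglevel org.apache.cassandra.service.ReadCallback INFO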

On Mon, Nov 25, 2019 at 11:12 AM Colleen Velo <cmv...@gmail.com> wrote:
Hello,

As part of the final stages of our 2.2 --> 3.11 upgrades, one of our clusters
(on AWS, 18 nodes, m4.2xlarge) produced some post-upgrade fits. We started
getting spikes of Cassandra read and write timeouts despite the fact that
overall traffic volumes were unchanged. As part of the upgrade process there
was a TWCS table for which we used a facade implementation to change the
namespace of the compaction class, but that table has very low query volume.

The DigestMismatchException error messages (based on sampling the hash keys
and finding which tables have partitions for those hash keys) seem to be
occurring on the heaviest-volume table (approximately 4,000 reads and 1,600
writes per second per node), and that table has semi-medium row widths with
about 10-40 column keys (or at least the digest-mismatch partitions have that
sort of width). The keyspace is RF3 using NetworkTopologyStrategy, and the CL
is QUORUM for both reads and writes.

We have experienced the DigestMismatchException errors on all 3 of the
production clusters that we have upgraded (all of them single-DC, in the
us-east-1/eu-west-1/ap-northeast-2 AWS regions), and in all three cases those
DigestMismatchException errors were not present in either the 2.1.x or 2.2.x
versions of Cassandra.
Does anyone know of changes from 2.2 to 3.11 that would produce additional
timeout problems, such as heavier blocking read-repair logic?

Also, we ran repairs (via Reaper v1.4.8, which is much nicer in 3.11 than 2.1)
on all of the tables and across all of the nodes, and our timeouts seem to have
disappeared, but we continue to see a rapid stream of the digest mismatch
exceptions, so much so that our Cassandra debug logs are rolling over every 15
minutes. There is a mailing list post from 2018 indicating that some
DigestMismatchException error messages are natural if you are reading while
writing, but the sheer volume that we are getting is very concerning:
 - https://www.mail-archive.com/user@cassandra.apache.org/msg56078.html

Is that level of DigestMismatchException unusual? Or can that volume of
mismatches appear if semi-wide rows simply require a lot of resolution, because
flurries of quorum reads/writes (RF3) on recent partitions have a decent chance
of hitting replicas that have not fully synced yet? Does the digest mismatch
error get debug-logged on every chance read repair?
Also, why are these DigestMismatchExceptions only occurring once the upgrade to
3.11 has occurred?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sample DigestMismatchException error message:
    DEBUG [ReadRepairStage:13] 2019-11-22 01:38:14,448 ReadCallback.java:242 - Digest mismatch:
    org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-6492169518344121155, 66306139353831322d323064382d313037322d663965632d636565663165326563303965) (be2c0feaa60d99c388f9d273fdc360f7 vs 09eaded2d69cf2dd49718076edf56b36)
        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.4.jar:3.11.4]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]

Cluster(s) setup:
    * AWS region: eu-west-1:
        — Nodes: 18
        — single DC
        — keyspace: RF3 using NetworkTopology

    * AWS region: us-east-1:
        — Nodes: 20
        — single DC
        — keyspace: RF3 using NetworkTopology

    * AWS region: ap-northeast-2:
        — Nodes: 30
        — single DC
        — keyspace: RF3 using NetworkTopology

Thanks for any insight into this issue.

--
Colleen Velo
email: cmv...@gmail.com
