My speculation on rapidly churning/fast reads of recently written data:

- data is written at QUORUM (RF=3): the write is acknowledged once two
replicas reply
- the data is read back very soon afterwards (possibly a code antipattern),
and let's assume the update on the third replica hasn't completed yet
(e.g. AWS network "variance"). The read will pick one replica, and there is
then a 50% chance that the second replica chosen for the quorum read is the
stale node, which triggers a DigestMismatch read repair.

Is that plausible?
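
To sanity-check those odds, here's a tiny standalone simulation (my own toy
model of replica selection, not Cassandra internals; the class and numbers
are just illustrative):

    import java.util.Random;

    // Toy model: RF=3, QUORUM write acked by replicas 0 and 1, replica 2
    // still behind. A QUORUM read touches 2 of the 3 replicas; if replica 2
    // is among them, the digests won't match.
    public class StaleReplicaOdds {
        public static void main(String[] args) {
            Random rng = new Random();
            int trials = 1_000_000;
            int uniformHits = 0;    // read pair picked uniformly from the 3
            int secondPickHits = 0; // first pick assumed fresh, second picked from the rest

            for (int i = 0; i < trials; i++) {
                // Case A: read set is a uniformly random pair of the 3 replicas.
                int r1 = rng.nextInt(3);
                int r2 = rng.nextInt(3);
                while (r2 == r1) r2 = rng.nextInt(3);
                if (r1 == 2 || r2 == 2) uniformHits++;

                // Case B: first replica contacted is one of the two that acked
                // the write (0 or 1); the second is either of the remaining two.
                int fresh = rng.nextInt(2);
                int second = rng.nextInt(3);
                while (second == fresh) second = rng.nextInt(3);
                if (second == 2) secondPickHits++;
            }

            System.out.printf("stale replica in read set, uniform pair: %.2f%n",
                    (double) uniformHits / trials);    // ~0.67
            System.out.printf("stale replica as second pick only:       %.2f%n",
                    (double) secondPickHits / trials); // ~0.50
        }
    }

With a uniformly random pair of the three replicas, the stale node is in the
read set about 2/3 of the time; if the first replica contacted is assumed
fresh, the second pick hits the stale node about half the time. Either way,
at ~4,000 reads/sec per node even a modest per-read probability would produce
a steady stream of mismatch log lines.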

The code seems to log the exception on every read-repair instance, so it
doesn't look like an ERROR with red blaring klaxons; maybe it should be a
WARN?
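
If the debug.log churn itself is the immediate pain, one option (assuming the
stock logback configuration) is to raise just that one logger at runtime,
e.g.:

    nodetool setlogginglevel org.apache.cassandra.service.ReadCallback INFO

which should stop the per-mismatch DEBUG lines without a restart.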

On Mon, Nov 25, 2019 at 11:12 AM Colleen Velo <cmv...@gmail.com> wrote:

> Hello,
>
> As part of the final stages of our 2.2 --> 3.11 upgrades, one of our
> clusters (on AWS, 18 nodes, m4.2xlarge) produced some post-upgrade fits. We
> started getting spikes of Cassandra read and write timeouts despite the
> fact that the overall metrics volumes were unchanged. As part of the
> upgrade process, there was a TWCS table for which we used a facade
> implementation to help change the namespace of the compaction class, but
> that table has very low query volume.
>
> The DigestMismatchException error messages (based on sampling the hash
> keys and finding which tables have partitions for those hash keys) seem to
> be occurring on the heaviest-volume table (approximately 4,000 reads and
> 1,600 writes per second per node), and that table has semi-medium row
> widths with about 10-40 column keys (or at least the digest-mismatch
> partitions have that type of width). The keyspace is RF 3 using
> NetworkTopology, and the CL is QUORUM for both reads and writes.
>
> We have experienced the DigestMismatchException errors on all three of the
> production clusters that we have upgraded (all of them are single-DC, in
> the us-east-1/eu-west-1/ap-northeast-2 AWS regions), and in all three
> cases those DigestMismatchException errors were not there in either the
> 2.1.x or 2.2.x versions of Cassandra.
> Does anyone know of changes from 2.2 to 3.11 that would produce additional
> timeout problems, such as heavier blocking read-repair logic?
>
> Also, we ran repairs (via Reaper v1.4.8, which is much nicer in 3.11 than
> in 2.1) on all of the tables and across all of the nodes, and our timeouts
> seem to have disappeared, but we continue to see a rapid stream of Digest
> mismatch exceptions, so much so that our Cassandra debug logs are rolling
> over every 15 minutes. There is a mailing list post from 2018 that
> indicates that some DigestMismatchException error messages are natural if
> you are reading while writing, but the sheer volume that we are getting is
> very concerning:
>  - https://www.mail-archive.com/user@cassandra.apache.org/msg56078.html
>
> Is that level of DigestMismatchException unusual? Or can that volume of
> mismatches appear if semi-wide rows simply require a lot of resolution,
> because flurries of quorum reads/writes (RF 3) on recent partitions have a
> decent chance of not having fully synced data on the replicas read? Does
> the digest mismatch error get debug-logged on every chance read repair?
>
> Also, why are these DigestMismatchExceptions only appearing now that the
> upgrade to 3.11 has occurred?
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Sample DigestMismatchException error message:
>     DEBUG [ReadRepairStage:13] 2019-11-22 01:38:14,448
> ReadCallback.java:242 - Digest mismatch:
>     org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(-6492169518344121155,
> 66306139353831322d323064382d313037322d663965632d636565663165326563303965)
> (be2c0feaa60d99c388f9d273fdc360f7 vs 09eaded2d69cf2dd49718076edf56b36)
>         at
> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
> ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
> ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_77]
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_77]
>         at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> [apache-cassandra-3.11.4.jar:3.11.4]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
>
> Cluster(s) setup:
>     * AWS region: eu-west-1:
>         — Nodes: 18
>         — single DC
>         — keyspace: RF3 using NetworkTopology
>
>     * AWS region: us-east-1:
>         — Nodes: 20
>         — single DC
>         — keyspace: RF3 using NetworkTopology
>
>     * AWS region: ap-northeast-2:
>         — Nodes: 30
>         — single DC
>         — keyspace: RF3 using NetworkTopology
>
> Thanks for any insight into this issue.
>
> --
>
> Colleen Velo
> email: cmv...@gmail.com <cmv...@gmail.com>
>
