Re: Why a READ REPAIR ?

Tobias Eriksson Tue, 11 Aug 2020 01:59:08 -0700

If a READ triggers a READ REPAIR and we then do an additional READ, would 
that second READ block until the first READ REPAIR is done?
-Tobias

From: Jeff Jirsa <jji...@gmail.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, 11 August 2020 at 07:30
To: cassandra <user@cassandra.apache.org>
Subject: Re: Why a READ REPAIR ?

Your schema may have background (non-blocking) read repair set to 10% 
(dclocal_read_repair_chance = 0.1).
You may have GC pauses causing writes (or reads) to be delayed.
You may be hitting a Cassandra bug.

Would need the `TRACING` output to know for sure.
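
In cqlsh, tracing is switched on with `TRACING ON` before running the query. For a read issued from application code, a minimal sketch along the following lines may help, assuming the DataStax Python driver (`cassandra-driver`). The helper name `collect_trace_events` and the contact point, keyspace, table, and key in the commented usage are hypothetical.

```python
def collect_trace_events(session, statement, max_wait_sec=5.0):
    """Run `statement` with tracing enabled and return its trace events.

    `session` is expected to behave like a cassandra.cluster.Session and
    `statement` like a cassandra.query.SimpleStatement (or a query string).
    Returns a list of (source, source_elapsed, description) tuples.
    """
    result = session.execute(statement, trace=True)
    trace = result.get_query_trace(max_wait_sec=max_wait_sec)
    if trace is None:
        # No trace was recorded for this request.
        return []
    return [(e.source, e.source_elapsed, e.description) for e in trace.events]


# Usage sketch (requires a reachable cluster; all names are placeholders):
#
#   from cassandra import ConsistencyLevel
#   from cassandra.cluster import Cluster
#   from cassandra.query import SimpleStatement
#
#   session = Cluster(["10.0.0.1"]).connect("my_keyspace")
#   stmt = SimpleStatement(
#       "SELECT * FROM my_table WHERE id = 42",
#       consistency_level=ConsistencyLevel.LOCAL_QUORUM,
#   )
#   for source, elapsed, description in collect_trace_events(session, stmt):
#       print(source, elapsed, description)
```

The events list which replicas the coordinator contacted and in what order, so it should show whether the replicas disagreed and whether a repair was sent. The exact wording of the event descriptions varies between Cassandra versions.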

On Mon, Aug 10, 2020 at 10:10 PM Tobias Eriksson 
<tobias.eriks...@qvantel.com> wrote:
Hi,
We have a Cassandra solution with 2 DCs where each DC has >30 nodes.
From time to time we see problems with READ REPAIR, but I am stuck with the 
analysis.
We have a pattern for these faults where we do:

  1.  INSERT with Local Quorum (2 out of 3)
  2.  Wait 0.5 to 1 second
  3.  READ with Local Quorum (2 out of 3)
      *   Triggers a read repair
  4.  Then we do an UPDATE …

The replication factor is 3.
In my world, in (1) we for sure store the data in 2 out of 3 places, and I would 
be surprised if we did not also reach the 3rd node within 0.5 sec.
So how come in (3) the read can't get a proper response from 2 out of 3?
Some are saying the problem started occurring when we added DC2, but I can't 
understand how that could be, as our query is Local Quorum and will involve only 
DC1.
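
The quorum reasoning above can be checked with a small enumeration. This is a simplified model, not Cassandra code: the replica names are hypothetical, and it assumes the write has reached exactly the replicas in `acked` at the moment of the read and that the read contacts exactly two replicas. It classifies each possible quorum read by whether it sees the write and whether the contacted replicas disagree, which is the condition under which a read repair would be expected.

```python
from itertools import combinations

REPLICAS = ("A", "B", "C")        # RF = 3 in the local DC (hypothetical names)
QUORUM = len(REPLICAS) // 2 + 1   # LOCAL_QUORUM = 2


def classify_reads(acked):
    """Classify every possible quorum read given the replicas holding the write.

    Returns a list of (contacted, sees_write, mismatch) tuples, where
    `mismatch` means the contacted replicas disagree with each other.
    """
    outcomes = []
    for contacted in combinations(REPLICAS, QUORUM):
        have = [replica in acked for replica in contacted]
        sees_write = any(have)          # at least one contacted replica has it
        mismatch = len(set(have)) > 1   # one has it, the other does not
        outcomes.append((contacted, sees_write, mismatch))
    return outcomes


if __name__ == "__main__":
    # Case: the write has been applied on only two replicas so far.
    for acked in combinations(REPLICAS, QUORUM):
        for contacted, sees_write, mismatch in classify_reads(set(acked)):
            print(f"write on {acked}, read from {contacted}: "
                  f"sees_write={sees_write}, mismatch={mismatch}")
```

In this model every quorum read sees the write, since any two of three replicas overlap with the two that acknowledged it. However, while the third replica is still missing the write, two of the three possible read pairs include it and therefore disagree. So a read repair in step (3) does not imply the read failed to get a correct answer; it suggests one contacted replica had not yet applied the write, for example because of the delays mentioned in the reply above.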

How can I debug this fault ?
How can I track if the data has reached all 3 nodes ?

All ideas are welcome
-Tobias
