If a READ triggers a READ REPAIR, and we then issue an additional READ, would that second READ BLOCK until the “first” READ REPAIR is done?

-Tobias

From: Jeff Jirsa <jji...@gmail.com>
Reply to: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, 11 August 2020 at 07:30
To: cassandra <user@cassandra.apache.org>
Subject: Re: Why a READ REPAIR ?

Your schema may have read repair (non-blocking, background) set to 10% (0.1, for dclocal).
You may have GC pauses causing writes (or reads) to be delayed.
You may be hitting a cassandra bug.

Would need the `TRACING` output to know for sure.

On Mon, Aug 10, 2020 at 10:10 PM Tobias Eriksson <tobias.eriks...@qvantel.com> wrote:

Hi

We have a Cassandra solution with 2 DCs where each DC has >30 nodes.

From time to time we see problems with READ REPAIR, but I am stuck with the analysis.

We have a pattern for these faults where we do

1. INSERT with Local Quorum (2 out of 3)
2. Wait for 0.5 - 1 seconds time window
3. READ with Local Quorum (2 out of 3)
   * Triggers a read repair
4. Then we do an UPDATE …

The replication factor is 3.

In my world in (1) we for sure store the data in 2 out of 3 places, and I would be surprised if we would not also reach the 3rd node within 0.5 sec.
So how come in (3) the read can’t get a proper response from 2 out of 3?

Some are saying the problem started occurring when we added DC2, but I can’t understand how it could be, as our query is Local Quorum and will involve only DC1.

How can I debug this fault?
How can I track if the data has reached all 3 nodes?

All ideas are welcome

-Tobias
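
The quorum arithmetic in the question (replication factor 3, write and read both at 2 out of 3) can be checked with a small sketch. This is a toy model in plain Python, not Cassandra code; the names `quorum` and `read_outcomes` are made up for illustration, and it assumes a read simply compares whatever the contacted replicas hold. It suggests that the read always overlaps the write, but can still see a mismatch whenever one of the two contacted replicas is the one that has not yet received the row.

```python
from itertools import combinations

RF = 3  # replication factor per DC, as stated in the thread


def quorum(rf: int) -> int:
    """Number of replicas a quorum operation must hear from."""
    return rf // 2 + 1


def read_outcomes(rf: int, holders: set) -> tuple:
    """Enumerate every replica set a quorum read could contact.

    holders: indexes of the replicas that already hold the new row.
    Returns (total read sets, sets that see the new row,
             sets where the contacted replicas disagree).
    """
    total = sees_new = mismatch = 0
    for read_set in combinations(range(rf), quorum(rf)):
        have = [replica in holders for replica in read_set]
        total += 1
        sees_new += any(have)                     # at least one replica has the row
        mismatch += any(have) and not all(have)   # contacted replicas differ
    return total, sees_new, mismatch


# Row acknowledged by 2 of 3 replicas, third one still lagging.
print(read_outcomes(RF, {0, 1}))
# Row present on all 3 replicas.
print(read_outcomes(RF, {0, 1, 2}))
```

Under these assumptions, every possible read set sees the new row, yet two of the three possible read sets include the lagging replica. Whether that is what happens in the cluster described here can only be confirmed from the `TRACING` output.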