Thanks Sam - a couple of subtleties there that we missed in our review.

Cheers
Ben

On Tue, 30 Aug 2016 at 19:42 Sam Tunnicliffe <s...@beobal.com> wrote:

> Just to clarify a little further, it's true that read repair queries are
> performed at CL ALL, but this is slightly different to a regular,
> user-initiated query at that CL.
>
> Say you have RF=5 and you issue a read at CL ALL: the coordinator will send
> requests to all 5 replicas and block until it receives a response from each
> (or a timeout occurs) before replying to the client. This is the
> straightforward and intuitive case.
>
> If instead you read at CL QUORUM, the # of replicas required for CL is 3,
> so the coordinator only contacts 3 nodes. In the case where a speculative
> retry is activated, an additional replica is added to the initial set. The
> coordinator will still only wait for 3 out of the 4 responses before
> proceeding, but if a digest mismatch occurs the read repair queries are
> sent to all 4. It's this follow-up query that the coordinator executes at
> CL ALL, i.e. it requires all 4 replicas to respond to the read repair query
> before merging their results to figure out the canonical, latest data.
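>
> To make the counting concrete, here's a minimal, illustrative sketch of the
> replica arithmetic for this example (RF=5, QUORUM, plus one speculative
> replica). It's a sketch of the numbers only, not the actual read path code:
>
>     // Illustrative only: models the counts described above, not Cassandra's read path.
>     public class ReadRepairCountsSketch
>     {
>         // how many responses the coordinator blocks for at a given CL
>         static int blockFor(int rf, String cl)
>         {
>             switch (cl)
>             {
>                 case "ALL":    return rf;
>                 case "QUORUM": return rf / 2 + 1;
>                 case "ONE":    return 1;
>                 default:       throw new IllegalArgumentException(cl);
>             }
>         }
>
>         public static void main(String[] args)
>         {
>             int rf = 5;
>             int blockedOn = blockFor(rf, "QUORUM");  // 3
>             int contacted = blockedOn + 1;           // 4: one extra replica for speculative retry
>             // On a digest mismatch the repair round goes to every contacted
>             // replica and waits for all of them (effectively CL ALL over the
>             // contacted set), which is still fewer than the full 5 replicas.
>             int repairWaitsFor = contacted;          // 4, not rf (5)
>             System.out.printf("blockFor=%d contacted=%d repairWaitsFor=%d%n",
>                               blockedOn, contacted, repairWaitsFor);
>         }
>     }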
>
> You can see that the number of replicas queried/required for read repair
> is different from what a client-requested read at CL ALL would use (i.e.
> here it's 4, not 5); it's the behaviour of waiting for all *contacted*
> replicas to respond that is significant here.
>
> There are additional considerations when constructing that initial replica
> set (which you can follow in
> o.a.c.service.AbstractReadExecutor::getReadExecutor), involving the table's
> read_repair_chance, dclocal_read_repair_chance and speculative_retry
> options. The main gotcha is global read repair (via read_repair_chance),
> which will trigger cross-dc repairs at CL ALL in the case of a digest
> mismatch, even if the requested CL is DC-local.
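>
> For what it's worth, the chance-based part of that decision is roughly the
> following (a sketch of the idea only, not the exact implementation; the
> real logic is reached from getReadExecutor):
>
>     import java.util.concurrent.ThreadLocalRandom;
>
>     // Sketch of how read_repair_chance / dclocal_read_repair_chance pick
>     // between a global, a DC-local, or no extra read repair round.
>     enum ReadRepairDecision { NONE, DC_LOCAL, GLOBAL }
>
>     class ReadRepairDecisionSketch
>     {
>         static ReadRepairDecision decide(double readRepairChance, double dcLocalReadRepairChance)
>         {
>             double roll = ThreadLocalRandom.current().nextDouble();
>             if (roll < readRepairChance)
>                 return ReadRepairDecision.GLOBAL;   // may pull in remote-DC replicas
>             if (roll < dcLocalReadRepairChance)
>                 return ReadRepairDecision.DC_LOCAL; // stays within the local DC
>             return ReadRepairDecision.NONE;
>         }
>     }
>
> This is why a non-zero read_repair_chance can pull remote-DC replicas into a
> repair round that then blocks on all of them at CL ALL, even for a DC-local
> request.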
>
>
> On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater <ben.sla...@instaclustr.com>
> wrote:
>
>> In case anyone else is interested - we figured this out. When C* decides
>> it needs to do a repair based on a digest mismatch from the initial reads
>> for the consistency level, it does actually try to do a read at CL=ALL in
>> order to get the most up-to-date data to use for the repair.
>>
>> This led to an interesting issue in our case where we had one node in an
>> RF3 cluster down for maintenance (to correct data that became corrupted due
>> to a severe write overload) and started getting occasional “timeout during
>> read query at consistency LOCAL_QUORUM” failures. We believe this was due to
>> the case where data for a read was only available on one of the two up
>> replicas, which then triggered an attempt to repair and a failed read at
>> CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
>> that C* reports a failure at the originally requested level even when it was
>> actually the attempted repair read at CL=ALL which could not read from
>> sufficient replicas - a bit confusing (although I can also see how getting
>> CL=ALL errors when you thought you were reading at QUORUM or ONE would be
>> confusing).
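>>
>> A tiny hypothetical sketch of that reporting behaviour (made-up names and
>> values, not the actual Cassandra code): the read that times out is the
>> internal repair read, which blocks on every contacted replica, but the
>> error surfaced to the client carries the originally requested consistency
>> level:
>>
>>     // Hypothetical illustration only: the internal repair read blocks on all
>>     // contacted replicas (effectively CL ALL), but per CASSANDRA-7947 the
>>     // timeout is reported at the client's requested consistency level.
>>     class RepairReadTimeoutSketch
>>     {
>>         public static void main(String[] args)
>>         {
>>             String requestedCL = "LOCAL_QUORUM";
>>             int contacted = 2;          // the two live replicas out of RF=3
>>             int repairResponses = 1;    // repair read heard back from only one in time
>>
>>             if (repairResponses < contacted)
>>                 System.out.println("timeout during read query at consistency " + requestedCL);
>>         }
>>     }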
>>
>> Cheers
>> Ben
>>
>> On Sun, 28 Aug 2016 at 10:52 kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Looking at the wiki for the read path (
>>> http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom
>>> diagram for reading with a read repair, it states the following when
>>> "reading from all replica nodes" after there is a hash mismatch:
>>>
>>>> If hashes do not match, do conflict resolution. First step is to read
>>>> all data from all replica nodes excluding the fastest replica (since
>>>> CL=ALL)
>>>
>>> In the bottom left of the diagram it also states:
>>>
>>>> In this example:
>>>> RF>=2
>>>> CL=ALL
>>>
>>> The "(since CL=ALL)" implies that the CL for the read during the read
>>> repair is based on the CL of the query. However, I don't think that makes
>>> sense at other CLs. Anyway, I just want to clarify what CL the read for the
>>> read repair occurs at in cases where the overall query CL is not ALL.
>>>
>>> Thanks,
>>> Kurt.
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>> --
>> ————————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
> --
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
