Mahdi, the issue in your code is here:

else // we lost LWT, fetch the winning value
    existing_id = SELECT id FROM hash_id WHERE hash=computed_hash
                  consistency = ONE

If you lost the LWT, it means that a concurrent LWT has won the Paxos round,
and its value is applied using QUORUM/SERIAL.

In the best case, the winning LWT value has already been applied to at
least 2 of the 3 replicas (assuming RF=3).
In the worst case, the winning LWT value has not been applied to any
replica yet, or is still in the middle of being applied.

Now, if you immediately read with CL=ONE, you may:

1) Read a stale value from the 3rd replica, which has not yet received the
winning LWT value
2) Or worse, read a stale value because the winning LWT value is still being
applied when the read is made
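To make the two cases concrete, here is a toy model (hypothetical, not Cassandra code): three replicas, each holding either nothing or a (value, write timestamp) cell, and a coordinator that keeps the cell with the newest timestamp. The value "id-42" and the timestamps are made up. It enumerates every result a read touching 1 replica (CL=ONE) or 2 replicas (CL=QUORUM) could return.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Toy model (hypothetical): a replica holds either nothing or (value, write_timestamp).
Cell = Optional[Tuple[str, int]]

def reconcile(cells: List[Cell]) -> Optional[str]:
    """Coordinator merge: the cell with the highest write timestamp wins."""
    present = [c for c in cells if c is not None]
    if not present:
        return None  # row not found on any contacted replica
    return max(present, key=lambda c: c[1])[0]

def possible_reads(replicas: Dict[str, Cell], contacted: int) -> Set[Optional[str]]:
    """Every result a read that contacts `contacted` replicas could return."""
    return {
        reconcile([replicas[name] for name in combo])
        for combo in combinations(sorted(replicas), contacted)
    }

# Best case: the winning value is already on 2 of the 3 replicas.
best_case = {"r1": ("id-42", 100), "r2": ("id-42", 100), "r3": None}
print(possible_reads(best_case, 1) == {"id-42", None})  # True: CL=ONE can miss it
print(possible_reads(best_case, 2))                       # {'id-42'}: CL=QUORUM sees it

# Worst case: the winning value has not reached any replica yet.
worst_case = {"r1": None, "r2": None, "r3": None}
print(possible_reads(worst_case, 2))                      # {None}: CL=QUORUM misses it too
```

The rule behind it: a plain read is only guaranteed to see a value if (replicas read) + (replicas holding the value) > RF. With the value on 2 replicas, CL=ONE gives 1 + 2 = 3, which is not > 3, while CL=QUORUM gives 2 + 2 > 3. In the worst case the value is on 0 replicas, so no plain read level is guaranteed to see it.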

That's the main reason reading with CL=SERIAL is recommended (CL=QUORUM is
not sufficient).

Reading with CL=SERIAL will:

a. like QUORUM, contact a strict majority of replicas
b. unlike QUORUM, look for a validated (but not yet applied) value from a
previous Paxos round and force-apply it before actually reading the value
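A very simplified sketch of point b, under stated assumptions: each replica keeps a toy Paxos state (the field names `committed`, `committed_ballot`, `accepted` and `promised` are hypothetical), ballots are plain integers, and the "majority" is just the first two replicas. Cassandra's real implementation has considerably more bookkeeping; this only illustrates why a SERIAL read can return a value that a QUORUM read misses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Replica:
    committed: Optional[str] = None               # value visible to ordinary reads
    committed_ballot: int = 0                     # ballot of the last committed proposal
    accepted: Optional[Tuple[int, str]] = None    # (ballot, value) accepted, not committed
    promised: int = 0                             # highest ballot promised in a prepare

def majority_of(replicas: List[Replica]) -> List[Replica]:
    # Toy choice: always contact the first strict majority.
    return replicas[: len(replicas) // 2 + 1]

def quorum_read(replicas: List[Replica]) -> Optional[str]:
    """Plain QUORUM read: only looks at committed data (no timestamps in this toy)."""
    values = [r.committed for r in majority_of(replicas) if r.committed is not None]
    return values[0] if values else None

def serial_read(replicas: List[Replica], ballot: int) -> Optional[str]:
    """Toy SERIAL read: prepare on a majority, finish any in-progress proposal, then read."""
    majority = majority_of(replicas)
    in_progress = []
    for r in majority:
        r.promised = max(r.promised, ballot)  # phase 1: promise the new ballot
        # Report a proposal that was accepted but is newer than the last commit.
        if r.accepted and r.accepted[0] > r.committed_ballot:
            in_progress.append(r.accepted)
    if in_progress:
        b, value = max(in_progress)  # most recent unfinished proposal
        for r in majority:           # force-apply (commit) it before reading
            r.committed, r.committed_ballot, r.accepted = value, b, None
    return quorum_read(replicas)

# Winner's proposal (ballot 5) was accepted by r2 and r3, a majority,
# but its commit has not been applied anywhere yet.
replicas = [Replica(), Replica(accepted=(5, "id-42")), Replica(accepted=(5, "id-42"))]
print(quorum_read(replicas))            # None: QUORUM misses the winning value
print(serial_read(replicas, ballot=6))  # id-42: the accepted proposal is finished first
```

Note that the majority the SERIAL read contacts (r1, r2) is not the majority that accepted the proposal (r2, r3); any two majorities of 3 replicas share at least one replica, which is what lets the read find the validated value.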




On Sun, Feb 11, 2018 at 5:36 PM, Mahdi Ben Hamida <ma...@signalfx.com>
wrote:

> Totally understood that it's not worth (or it's rather incorrect) to mix
> serial and non-serial operations for LWT tables. It would be highly
> satisfying to my engineer mind if someone can explain why that would cause
> issues in this particular situation. The only explanation I have is that a
> non-serial read may cause a read repair to happen and that could interfere
> with a concurrent serial write, although I still can't explain how that
> would cause two different "insert if not exist" transactions to both
> succeed.
>
> --
> Mahdi.
>
> On 2/9/18 2:40 PM, Jonathan Haddad wrote:
>
> If you want consistent reads you have to use the CL that enforces it.
> There’s no way around it.
> On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida <ma...@signalfx.com>
> wrote:
>
>> In this case, we only write using CAS (code guarantees that). We also
>> never update, just insert if not exist. Once a hash exists, it never
>> changes (it may get deleted later and that'll be a CAS delete as well).
>>
>> --
>> Mahdi.
>>
>> On 2/9/18 1:38 PM, Jeff Jirsa wrote:
>>
>>
>>
>> On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida <ma...@signalfx.com>
>> wrote:
>>
>>> Under what circumstances would we be reading inconsistent results? Is
>>> there a case where we end up reading a value that actually ends up not being
>>> written?
>>>
>>>
>>>
>>
>> If you ever write the same value with CAS and without CAS (different code
>> paths both updating the same value), you're using CAS wrong, and
>> inconsistencies can happen.
>>
>>
>>
>>
>
