Hi,
In my opinion, the guarantee provided by Cassandra is:
if your write request at Quorum *succeeds*, then any subsequent read
request at Quorum (issued after the write response, and that also succeeds)
will be consistent
(more generally, whenever CL.Write + CL.Read > RF).
Of course, until you have received a valid response to your Quorum write
request, the cluster is in an inconsistent state, and you have *to retry
your write request.*
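As a minimal sketch of why CL.Write + CL.Read > RF is what matters (plain Python, the function name is mine): whenever the write-ack set and the read set are each large enough, they must share at least one replica, so the read sees the write.

```python
from itertools import combinations

# With RF replicas, a write acked by W nodes and a read answered by R nodes
# must share at least one replica whenever W + R > RF.
def quorum_sets_overlap(rf, w, r):
    replicas = range(rf)
    # Exhaustively check every possible write-ack set against every
    # possible read-response set.
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )

# RF=3, QUORUM write (2 acks) + QUORUM read (2 responses): 2 + 2 > 3.
print(quorum_sets_overlap(3, 2, 2))   # True: every pair of sets overlaps
# CL.ONE write + QUORUM read: 1 + 2 = 3, so overlap is NOT guaranteed.
print(quorum_sets_overlap(3, 1, 2))   # False
```

This is only the overlap argument; it says nothing about the window before the write response arrives, which is the inconsistent state discussed above.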
That said, Cassandra provides some other important behaviors that tend to
reduce the duration of this inconsistent state:
   - the coordinator does not send the request only to the nodes needed to
   satisfy the CL, but to all nodes that should hold the data (of course,
   with RF=3, only A, B & C are involved)
   - during read requests, Cassandra asks one node for the data and the
   other nodes involved in the CL for a digest; if the digests do not all
   match, it asks those nodes for the full data, merges the results, and
   finally triggers a background repair on those nodes. Your write may have
   succeeded in the meantime.
   - with some configurable probability, Cassandra will *sometimes* send a
   read to all nodes holding the data, not only the ones involved in the CL,
   and execute background repairs
- you have to schedule repairs regularly
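The digest-read path described above can be sketched roughly like this (plain Python, all names hypothetical; this is a toy model of the coordinator behavior, not Cassandra's actual code):

```python
import hashlib

# Each replica stores a cell as (value, timestamp).
def digest(cell):
    value, ts = cell
    return hashlib.md5(f"{value}:{ts}".encode()).hexdigest()

def coordinator_read(replicas):
    """replicas: the CL-many nodes involved, each a dict with a 'cell'."""
    data = replicas[0]["cell"]                       # full data from one node
    digests = [digest(r["cell"]) for r in replicas[1:]]  # digests from the rest
    if all(d == digest(data) for d in digests):
        return data
    # Digest mismatch: fetch full data from every node, keep the cell with
    # the newest timestamp, and repair the stale nodes (in real Cassandra
    # this repair happens in the background).
    newest = max((r["cell"] for r in replicas), key=lambda c: c[1])
    for r in replicas:
        r["cell"] = newest
    return newest

# Node A already has the new write (v2, ts=2); node B still has the old value.
a = {"cell": ("v2", 2)}
b = {"cell": ("v1", 1)}
print(coordinator_read([a, b]))  # ('v2', 2), and b gets repaired to v2
```

The point of the sketch: even when a read lands on a stale replica, the timestamp merge returns the newest value the contacted nodes hold, and the mismatch itself shrinks the inconsistency window by repairing the stale node.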
I'd add that if some nodes fail to process write requests in time, they may
be under pressure, and there is only a small chance that they will succeed
on a read request :)
And finally, what is time? Measured from where, and when? You may issue one
read after another but receive the results in the opposite order. Writing at
Quorum is not writing within a transaction; you'll certainly have to make
some tradeoffs.
Regards,
--
Nicolas
Le mer. 14 sept. 2016 à 21:14, Alexander Dejanovski <[email protected]>
a écrit :
> My understanding of the described scenario is that the write hasn't
> succeeded when reads are fired, as B and C haven't processed the mutation
> yet.
>
> There would be 3 clients here and not 2 : C1 writes, C2 and C3 read.
>
> So the race condition could still happen in this particular case.
>
> Le mer. 14 sept. 2016 21:07, Work <[email protected]> a écrit :
>
>> Hi Alex:
>>
>> Hmmm ... Assuming clock skew is eliminated ... and assuming nodes are up
>> and available ... and assuming quorum writes and quorum reads, with
>> everyone waiting for success (which is NOT the OP's scenario), two
>> different clients will be guaranteed to see all successful writes, or be
>> told that the read failed.
>>
>> C1 writes at quorum to A,B
>> C2 reads at quorum.
>> So it tries to read from ALL nodes, A,B, C.
>> If A,B respond --> success
>> If A,C respond --> conflict
>> If B, C respond --> conflict
>> Because a quorum (2 nodes) responded, the coordinator will return the
>> latest time stamp and may issue read repair depending on YAML settings.
>>
>> So where do you see only one client having this guarantee?
>>
>> Regards,
>>
>> James
>>
>> On Sep 14, 2016, at 4:00 AM, Alexander DEJANOVSKI <[email protected]>
>> wrote:
>>
>> Hi,
>>
>> the analysis is valid, and strong consistency the Cassandra way means
>> that one client writing at quorum, then reading at quorum, will always see
>> its previous write.
>> Two different clients have no guarantee to see the same data when using
>> quorum, as illustrated in your example.
>>
>> The only options here are to route requests to specific clients based on
>> some id, so that the sequence of operations is guaranteed outside of
>> Cassandra (the same client is always responsible for a given set of ids),
>> or to raise the CL to ALL at the expense of availability (you should not
>> do that).
>>
>>
>> Cheers,
>>
>> Alex
>>
>> Le mer. 14 sept. 2016 à 11:47, Qi Li <[email protected]> a écrit :
>>
>>> hi all,
>>>
>>> we are using quorum consistency, and we *suspect* there may be a race
>>> condition during the write. Let's say RF is 3, so a write will wait for
>>> at least 2 nodes to ack. Suppose only 1 node has acked (node A), while
>>> the other 2 nodes (B and C) are still waiting to apply the update. Now
>>> two read requests come in:
>>> one read gets its responses from nodes B and C, so version 1 is
>>> returned;
>>> the other read gets its responses from nodes A and B, so the latest
>>> version 2 is returned.
>>>
>>> So the clients are getting different data at the same time. Is this a
>>> valid analysis? If so, are there any options we can set to deal with this
>>> issue?
>>>
>>> thanks
>>> Ken
>>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>