Hi Anthony,

I am not talking about the case of CL ANY. I am talking about the case where your consistency levels satisfy R + W > N and you want to write to W nodes but only succeed in writing to X (where X < W) nodes, and hence fail the write to the client.
thanks,
Ritesh

On Wed, Feb 23, 2011 at 2:48 PM, Anthony John <chirayit...@gmail.com> wrote:
> Ritesh,
>
> At CL ANY - if all endpoints are down - a hinted handoff (HH) is written. And it
> is a successful write - not a failed write.
>
> Now that does not guarantee a READ of the value just written - but that is
> a risk that you take when you use the ANY CL!
>
> HTH,
>
> -JA
>
> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <tijoriwala.rit...@gmail.com> wrote:
>> Hi Anthony,
>> While you stated the facts right, I don't see how it relates to the
>> question I asked. Can you elaborate specifically on what happens in the
>> case I mentioned above to Dave?
>>
>> thanks,
>> Ritesh
>>
>> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John <chirayit...@gmail.com> wrote:
>>> Seems to me that the explanations are getting incredibly complicated -
>>> while I submit the real issue is not!
>>>
>>> Salient points here:
>>> 1. To be guaranteed data consistency, the writes and reads have to be
>>> at QUORUM CL or higher.
>>> 2. Any write/read at a lesser CL means that the application has to
>>> handle the inconsistency, or has to be tolerant of it.
>>> 3. Writing at ANY CL - a special case - means that writes will always
>>> go through (as long as any node is up), even if the destination nodes
>>> are not up. This is done via hinted handoff. But this can result in
>>> inconsistent reads - and yes, that is a problem, but refer to point 2
>>> above.
>>> 4. At QUORUM CL reads/writes - after quorum is met, hinted handoffs are
>>> used to handle the case where a particular node is down and the write
>>> needs to be replicated to it. But this will not cause inconsistent
>>> reads, as the hinted handoff (in this case) only applies after quorum
>>> is met - so a quorum read is not dependent on the down node being up
>>> and having got the hint.
>>>
>>> Hope I state this appropriately!
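The overlap guarantee behind point 1 can be sketched with a quick brute-force check (a toy illustration only - the function and replica model here are invented for this sketch, not Cassandra code):

```python
# Sketch of the quorum-overlap argument: if R + W > N, every set of R
# replicas contacted on a read must intersect every set of W replicas
# that acked a write, so a quorum read always overlaps the latest
# quorum write. Names and counts are illustrative.
from itertools import combinations

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-subset intersects every W-subset of N replicas."""
    replicas = set(range(n))
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

# R + W > N: overlap guaranteed (e.g. QUORUM reads and writes, N=3)
assert quorums_overlap(3, 2, 2)
# R + W <= N: a read can miss the write entirely (e.g. ONE/ONE, N=3)
assert not quorums_overlap(3, 1, 1)
```

This is only the pigeonhole argument made executable; it says nothing about what happens when a write acks fewer than W replicas, which is the crux of the thread.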
>>>
>>> HTH,
>>>
>>> -JA
>>>
>>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <tijoriwala.rit...@gmail.com> wrote:
>>>> > Read repair will probably occur at that point (depending on your
>>>> > config), which would cause the newest value to propagate to more
>>>> > replicas.
>>>>
>>>> Is the newest value the "quorum" value - meaning the old value will be
>>>> written back to the nodes holding the "newer non-quorum" value - or is
>>>> the newest value the real new value? :) If the latter, then this seems
>>>> kind of odd to me, and how would it be useful to any application? A bug?
>>>>
>>>> Thanks,
>>>> Ritesh
>>>>
>>>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell <d...@meebo-inc.com> wrote:
>>>>> Ritesh,
>>>>>
>>>>> You have seen the problem. Clients may read the newly written value
>>>>> even though the client performing the write saw it as a failure. When
>>>>> the client reads, it will use the correct number of replicas for the
>>>>> chosen CL, then return the newest value seen at any replica. This
>>>>> "newest value" could be the result of a failed write.
>>>>>
>>>>> Read repair will probably occur at that point (depending on your
>>>>> config), which would cause the newest value to propagate to more
>>>>> replicas.
>>>>>
>>>>> R + W > N guarantees serial order of operations: any read at CL=R that
>>>>> occurs after a write at CL=W will observe the write. I don't think
>>>>> this property is relevant to your current question, though.
>>>>>
>>>>> Cassandra has no mechanism to "roll back" the partial write, other
>>>>> than to simply write again. This may also fail.
>>>>>
>>>>> Best,
>>>>> Dave
>>>>>
>>>>> On Wed, Feb 23, 2011 at 10:12 AM, <tijoriwala.rit...@gmail.com> wrote:
>>>>>> Hi Dave,
>>>>>> Thanks for your input. In the steps you mention, what happens when
>>>>>> the client tries to read the value at step 6? Is it possible that the
>>>>>> client may see the new value?
>>>>>> My understanding was that if R + W > N, then the client will not see
>>>>>> the new value, as quorum nodes will not agree on the new value. If
>>>>>> that is the case, then it's alright to return failure to the client.
>>>>>> However, if not, then it is difficult to program against, as after
>>>>>> every failure you, as a client, are not sure whether the failure is a
>>>>>> pseudo-failure with some side effects or a real failure.
>>>>>>
>>>>>> Thanks,
>>>>>> Ritesh
>>>>>>
>>>>>> <quote author='Dave Revell'>
>>>>>>
>>>>>> Ritesh,
>>>>>>
>>>>>> There is no commit protocol. Writes may be persisted on some replicas
>>>>>> even though the quorum fails. Here's a sequence of events that shows
>>>>>> the "problem":
>>>>>>
>>>>>> 1. Some replica R fails, but recently, so its failure has not yet
>>>>>> been detected
>>>>>> 2. A client writes with consistency > 1
>>>>>> 3. The write goes to all replicas; all replicas except R persist the
>>>>>> write to disk
>>>>>> 4. Replica R never responds
>>>>>> 5. Failure is returned to the client, but the new value is still in
>>>>>> the cluster, on all replicas except R.
>>>>>>
>>>>>> Something very similar could happen for CL QUORUM.
>>>>>>
>>>>>> This is a conscious design decision because a commit protocol would
>>>>>> constitute tight coupling between nodes, which goes against the
>>>>>> Cassandra philosophy. But unfortunately you do have to write your app
>>>>>> with this case in mind.
>>>>>>
>>>>>> Best,
>>>>>> Dave
>>>>>>
>>>>>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <tijoriwala.rit...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> > I wanted to get details on how Cassandra does synchronous writes to
>>>>>> > W replicas (out of N). Does it do a 2PC? If not, how does it deal
>>>>>> > with failures of nodes before it gets to write to W replicas?
>>>>>> > If the orchestrating node cannot write to W nodes successfully, I
>>>>>> > guess it will fail the write operation, but what happens to the
>>>>>> > completed writes on X (W > X) nodes?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Ritesh
>>>>>> > --
>>>>>> > View this message in context:
>>>>>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>>>>>> > Sent from the cassandra-u...@incubator.apache.org mailing list
>>>>>> > archive at Nabble.com.
>>>>>>
>>>>>> </quote>
>>>>>> Quoted from:
>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
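The failure sequence Dave describes - a write that fails at the coordinator yet leaves the new value on some replicas, where a later read can see it - can be simulated with a toy model. All class and variable names here are invented for this sketch; real Cassandra adds timestamps, read repair, and hinted handoff on top of this.

```python
# Toy simulation of a partial write: the quorum write "fails" because
# only 1 of 3 replicas acks, yet the value persists on that replica
# and a subsequent low-CL read surfaces it. Illustrative only.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.up = True

    def write(self, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.value = value  # persisted even if the quorum later fails


class Cluster:
    def __init__(self, n=3):
        self.replicas = [Replica(f"r{i}") for i in range(n)]

    def write(self, value, w):
        acks = 0
        for rep in self.replicas:
            try:
                rep.write(value)
                acks += 1
            except ConnectionError:
                pass  # no rollback of replicas that already persisted
        if acks < w:
            raise TimeoutError(f"only {acks}/{w} acks; write failed")

    def read(self, r):
        live = [rep for rep in self.replicas if rep.up][:r]
        if len(live) < r:
            raise TimeoutError("not enough replicas for read")
        # return the "newest" value seen at any contacted replica
        values = [rep.value for rep in live if rep.value is not None]
        return values[0] if values else None


cluster = Cluster(n=3)
cluster.replicas[1].up = False  # two replicas never respond
cluster.replicas[2].up = False
try:
    cluster.write("new-value", w=2)  # step 5: failure returned to client
except TimeoutError:
    pass
# step 6: the "failed" write is still visible to a CL=ONE read
assert cluster.read(r=1) == "new-value"
```

The point of the sketch is the absence of any rollback in `Cluster.write`: once a replica has persisted the value, a coordinator-level failure does not undo it, which is exactly the "pseudo-failure with side effects" Ritesh is asking about.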