Hi Anthony,

I am not talking about the case of CL ANY. I am talking about the case where your consistency levels satisfy R + W > N and you want to write to W nodes but only succeed in writing to X (where X < W) nodes, and hence fail the write to the client.
thanks,
Ritesh

On Wed, Feb 23, 2011 at 2:48 PM, Anthony John <chirayit...@gmail.com> wrote:
> Ritesh,
>
> At CL ANY - if all endpoints are down - a hinted handoff (HH) is written. And it
> is a successful write - not a failed write.
>
> Now that does not guarantee a READ of the value just written - but that is
> a risk that you take when you use the ANY CL!
>
> HTH,
>
> -JA
>
> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <tijoriwala.rit...@gmail.com> wrote:
>> Hi Anthony,
>> While you stated the facts right, I don't see how it relates to the
>> question I asked. Can you elaborate specifically on what happens in the
>> case I mentioned above to Dave?
>>
>> thanks,
>> Ritesh
>>
>> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John <chirayit...@gmail.com> wrote:
>>> Seems to me that the explanations are getting incredibly complicated -
>>> while I submit the real issue is not!
>>>
>>> Salient points here:
>>> 1. To be guaranteed data consistency, the writes and reads have to be
>>> at QUORUM CL or higher.
>>> 2. Any write/read at a lesser CL means that the application has to
>>> handle the inconsistency, or has to be tolerant of it.
>>> 3. Writing at ANY CL - a special case - means that writes will always
>>> go through (as long as any node is up), even if the destination nodes
>>> are not up. This is done via hinted handoff. But this can result in
>>> inconsistent reads - and yes, that is a problem, but refer to point 2
>>> above.
>>> 4. At QUORUM CL reads/writes - after quorum is met, hinted handoffs are
>>> used to handle the case where a particular node is down and the write
>>> needs to be replicated to it. But this will not cause inconsistent
>>> reads, as the hinted handoff (in this case) only applies after quorum
>>> is met - so a quorum read is not dependent on the down node being up
>>> and having got the hint.
>>>
>>> Hope I state this appropriately!
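The overlap guarantee behind point 1 can be sketched with a quick brute-force check (a toy illustration only - the function and replica model here are invented for this sketch, not Cassandra code):

```python
# Sketch of the quorum-overlap argument: if R + W > N, every set of R
# replicas contacted on a read must intersect every set of W replicas
# that acked a write, so a quorum read always overlaps the latest
# quorum write. Names and counts are illustrative.
from itertools import combinations

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-subset intersects every W-subset of N replicas."""
    replicas = set(range(n))
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

# R + W > N: overlap guaranteed (e.g. QUORUM reads and writes, N=3)
assert quorums_overlap(3, 2, 2)
# R + W <= N: a read can miss the write entirely (e.g. ONE/ONE, N=3)
assert not quorums_overlap(3, 1, 1)
```

This is only the pigeonhole argument made executable; it says nothing about what happens when a write acks fewer than W replicas, which is the crux of the thread.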
>>>
>>> HTH,
>>>
>>> -JA
>>>
>>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <tijoriwala.rit...@gmail.com> wrote:
>>>> > Read repair will probably occur at that point (depending on your
>>>> > config), which would cause the newest value to propagate to more
>>>> > replicas.
>>>>
>>>> Is the newest value the "quorum" value - meaning the old value will be
>>>> written back to the nodes holding the "newer non-quorum" value - or is
>>>> the newest value the real new value? :) If the latter, then this seems
>>>> kind of odd to me, and how would it be useful to any application? A bug?
>>>>
>>>> Thanks,
>>>> Ritesh
>>>>
>>>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell <d...@meebo-inc.com> wrote:
>>>>> Ritesh,
>>>>>
>>>>> You have seen the problem. Clients may read the newly written value
>>>>> even though the client performing the write saw it as a failure. When
>>>>> the client reads, it will use the correct number of replicas for the
>>>>> chosen CL, then return the newest value seen at any replica. This
>>>>> "newest value" could be the result of a failed write.
>>>>>
>>>>> Read repair will probably occur at that point (depending on your
>>>>> config), which would cause the newest value to propagate to more
>>>>> replicas.
>>>>>
>>>>> R + W > N guarantees serial order of operations: any read at CL=R that
>>>>> occurs after a write at CL=W will observe the write. I don't think
>>>>> this property is relevant to your current question, though.
>>>>>
>>>>> Cassandra has no mechanism to "roll back" the partial write, other
>>>>> than to simply write again. This may also fail.
>>>>>
>>>>> Best,
>>>>> Dave
>>>>>
>>>>> On Wed, Feb 23, 2011 at 10:12 AM, <tijoriwala.rit...@gmail.com> wrote:
>>>>>> Hi Dave,
>>>>>> Thanks for your input. In the steps you mention, what happens when
>>>>>> the client tries to read the value at step 6? Is it possible that the
>>>>>> client may see the new value?
>>>>>> My understanding was that if R + W > N, then the client will not see
>>>>>> the new value, as quorum nodes will not agree on the new value. If
>>>>>> that is the case, then it's alright to return failure to the client.
>>>>>> However, if not, then it is difficult to program against, as after
>>>>>> every failure you, as a client, are not sure whether the failure is a
>>>>>> pseudo-failure with some side effects or a real failure.
>>>>>>
>>>>>> Thanks,
>>>>>> Ritesh
>>>>>>
>>>>>> <quote author='Dave Revell'>
>>>>>>
>>>>>> Ritesh,
>>>>>>
>>>>>> There is no commit protocol. Writes may be persisted on some replicas
>>>>>> even though the quorum fails. Here's a sequence of events that shows
>>>>>> the "problem":
>>>>>>
>>>>>> 1. Some replica R fails, but recently, so its failure has not yet
>>>>>> been detected
>>>>>> 2. A client writes with consistency > 1
>>>>>> 3. The write goes to all replicas; all replicas except R persist the
>>>>>> write to disk
>>>>>> 4. Replica R never responds
>>>>>> 5. Failure is returned to the client, but the new value is still in
>>>>>> the cluster, on all replicas except R.
>>>>>>
>>>>>> Something very similar could happen for CL QUORUM.
>>>>>>
>>>>>> This is a conscious design decision because a commit protocol would
>>>>>> constitute tight coupling between nodes, which goes against the
>>>>>> Cassandra philosophy. But unfortunately you do have to write your app
>>>>>> with this case in mind.
>>>>>>
>>>>>> Best,
>>>>>> Dave
>>>>>>
>>>>>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <tijoriwala.rit...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> > I wanted to get details on how Cassandra does synchronous writes to
>>>>>> > W replicas (out of N). Does it do a 2PC? If not, how does it deal
>>>>>> > with failures of nodes before it gets to write to W replicas?
>>>>>> > If the orchestrating node cannot write to W nodes successfully, I
>>>>>> > guess it will fail the write operation, but what happens to the
>>>>>> > completed writes on X (W > X) nodes?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Ritesh
>>>>>> > --
>>>>>> > View this message in context:
>>>>>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>>>>>> > Sent from the cassandra-u...@incubator.apache.org mailing list
>>>>>> > archive at Nabble.com.
>>>>>>
>>>>>> </quote>
>>>>>> Quoted from:
>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
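The failure sequence Dave describes - a write that fails at the coordinator yet leaves the new value on some replicas, where a later read can see it - can be simulated with a toy model. All class and variable names here are invented for this sketch; real Cassandra adds timestamps, read repair, and hinted handoff on top of this.

```python
# Toy simulation of a partial write: the quorum write "fails" because
# only 1 of 3 replicas acks, yet the value persists on that replica
# and a subsequent low-CL read surfaces it. Illustrative only.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.up = True

    def write(self, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.value = value  # persisted even if the quorum later fails


class Cluster:
    def __init__(self, n=3):
        self.replicas = [Replica(f"r{i}") for i in range(n)]

    def write(self, value, w):
        acks = 0
        for rep in self.replicas:
            try:
                rep.write(value)
                acks += 1
            except ConnectionError:
                pass  # no rollback of replicas that already persisted
        if acks < w:
            raise TimeoutError(f"only {acks}/{w} acks; write failed")

    def read(self, r):
        live = [rep for rep in self.replicas if rep.up][:r]
        if len(live) < r:
            raise TimeoutError("not enough replicas for read")
        # return the "newest" value seen at any contacted replica
        values = [rep.value for rep in live if rep.value is not None]
        return values[0] if values else None


cluster = Cluster(n=3)
cluster.replicas[1].up = False  # two replicas never respond
cluster.replicas[2].up = False
try:
    cluster.write("new-value", w=2)  # step 5: failure returned to client
except TimeoutError:
    pass
# step 6: the "failed" write is still visible to a CL=ONE read
assert cluster.read(r=1) == "new-value"
```

The point of the sketch is the absence of any rollback in `Cluster.write`: once a replica has persisted the value, a coordinator-level failure does not undo it, which is exactly the "pseudo-failure with side effects" Ritesh is asking about.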