[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023151#comment-13023151 ]

Peter Schuller commented on CASSANDRA-2494:
-------------------------------------------

I don't think anyone is claiming otherwise, unless I'm misunderstanding. The problem is that while the "if successfully written to quorum, subsequent quorum reads will see it" guarantee is indeed maintained, it is possible for quorum reads to see data go backwards (on a timeline) in the event of a *failed* attempted quorum write. This includes the possibility of reads seeing data that then permanently vanishes, even though you only lost, say, 1 node that you designed your cluster to survive (RF = 3, QUORUM). ("lost 1 node" can be substituted with "killed 1 node" in periodic commit mode.)

I still don't think this is a violation of what was promised, but I can see how making the further guarantee would make for more useful consistency semantics in some cases.

With respect to implicit write: An alternative is to adjust reconciliation logic when applied as part of reads (as opposed to AES, hinted hand-off, writes) to take consistency level into account and only consider columns whose timestamp is <= the greatest timestamp that has quorum (off the top of my head I think that should be correct in all cases, but I didn't think this through terribly).

Quorum reads are not consistent
-------------------------------

                Key: CASSANDRA-2494
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
            Project: Cassandra
         Issue Type: Bug
           Reporter: Sean Bridges

As discussed in this thread,
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not Y and Z, then a read from X should not return N unless the read is committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
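The scenario in the issue description can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `replicas` dictionary, the `quorum_read` helper, and the (timestamp, value) tuples are hypothetical stand-ins, not Cassandra code, and the model deliberately leaves out read repair, hinted hand-off, and AES so that the partially replicated write stays where it landed.

```python
from itertools import combinations

# Hypothetical model: each replica holds one (timestamp, value) pair and
# the pair with the highest timestamp wins reconciliation.
replicas = {"X": (1, "old"), "Y": (1, "old"), "Z": (1, "old")}


def quorum_read(replicas, contacted):
    """Return the newest (timestamp, value) among the contacted replicas."""
    return max(replicas[name] for name in contacted)


# A write of N reaches X only, e.g. a write at ONE or a failed QUORUM write.
replicas["X"] = (2, "N")

# With RF = 3, a quorum read contacts 2 of the 3 replicas.
results = {pair: quorum_read(replicas, pair) for pair in combinations("XYZ", 2)}

for pair, value in results.items():
    print(pair, value)
```

In this model the reads that touch X return N while the read served by Y and Z alone still returns the old value, so a client that alternates between coordinators can observe N and then the older value again.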
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023154#comment-13023154 ]

Peter Schuller commented on CASSANDRA-2494:
-------------------------------------------

Ok, so my last suggestion is in fact broken. A counterexample is:

A: column @ t1
B: column @ t2
C: column @ t3

If A + B are participating, A's column @ t1 has timestamp quorum and would be selected. If B + C are participating, B's column is picked. Thus, a read where B + C participate will see data that will be reverted once A + B happen to be picked.

Note to self: Think before posting.
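The counterexample can be traced with a few lines of code. This is a sketch of one possible reading of the suggested rule, not an implementation from the ticket: `timestamp_with_quorum` and `reconcile` are hypothetical names, and "the greatest timestamp that has quorum" is assumed to mean the greatest timestamp that at least a quorum of the participating replicas have reached.

```python
def timestamp_with_quorum(timestamps, quorum):
    """Greatest timestamp that at least `quorum` participants have reached.

    Assumption: a replica holding a column at time t counts towards every
    timestamp <= t, so this is the quorum-th largest timestamp.
    """
    return sorted(timestamps, reverse=True)[quorum - 1]


def reconcile(columns, quorum):
    """Pick the newest column whose timestamp is <= the quorum timestamp."""
    limit = timestamp_with_quorum([ts for ts, _ in columns], quorum)
    return max(column for column in columns if column[0] <= limit)


# Columns are (timestamp, value); t1 < t2 < t3 as in the counterexample.
A = (1, "value@t1")
B = (2, "value@t2")
C = (3, "value@t3")

print(reconcile([B, C], quorum=2))  # read served by B + C
print(reconcile([A, B], quorum=2))  # later read served by A + B
```

Under this reading the B + C read returns the column at t2 and the A + B read returns the column at t1, which is the backwards step described in the comment.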
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023233#comment-13023233 ]

Sean Bridges commented on CASSANDRA-2494:
-----------------------------------------

I think the guarantee of quorum reads not seeing old writes once a quorum read sees a new write is very useful. I suspect most people already think that this guarantee holds, including, it seems, Jonathan Ellis, whose quote can be found in the email thread linked to in the bug:

"The important guarantee this gives you is that once one quorum read sees the new value, all others will too. You can't see the newest version, then see an older version on a subsequent write [sic, I assume he meant read], which is the characteristic of non-strong consistency"
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023237#comment-13023237 ]

Sylvain Lebresne commented on CASSANDRA-2494:
---------------------------------------------

The problem is you are considering the consistency of reads but not of writes. The guarantee is: quorum reads will not see an old quorum write once a quorum read sees a new quorum write. Period.

If you don't consider the consistency of a write, consider the case of a CL.ANY write. In this case, the update may not be on any replica at all. How can we ensure the quorum read property that you want? Do we query all nodes for quorum reads in case there is a hint somewhere?

If you look at the Consistency part of http://wiki.apache.org/cassandra/ArchitectureOverview, it seems to me that it is pretty clear that the consistency of reads *and* writes is involved to achieve strong consistency. So I would hope 'most people' are aware of that.
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023239#comment-13023239 ]

Peter Schuller commented on CASSANDRA-2494:
-------------------------------------------

The issue is that of *failed* QUORUM writes. I.e., you design your system to use QUORUM writes and QUORUM reads, and expect that once a QUORUM read sees a given piece of data, a subsequent QUORUM read will also see it (or later data). A *failed* QUORUM write that was replicated to fewer than a QUORUM of replicas would be visible to QUORUM reads that happen to touch one of those replicas, but there is no guarantee that subsequent reads see it.

I was under the impression this was never an intended guarantee. Apparently I may be wrong about that given the jbellis quote above. In either case, it is certainly not an *actual* guarantee given by the current implementation.

The guarantee that a *successful* QUORUM write is seen by a subsequent QUORUM read is, as far as I can tell, not in question here.
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023242#comment-13023242 ]

Sean Bridges commented on CASSANDRA-2494:
-----------------------------------------

To be clear, this is a new guarantee. The current guarantee is that R+W>N gives you consistency. This bug is asking that a quorum read of A means that A has been committed to a quorum of nodes.

"How can we ensure the quorum read property that you want ?"

If, when reading at quorum, no quorum can be found that agrees on a particular value, then the coordinator (?) will wait for acks of read repair writes (or perhaps just do normal writes) to be returned from a sufficient number of nodes to ensure that the value has been committed to a quorum of nodes.

Without this new guarantee it is hard for readers to function correctly. The reader does not know that the quorum write failed, or is still in progress, so without reading at ALL, the R+W>N guarantee does not help the reader.
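The proposal in this comment, returning from a quorum read only after the chosen value has itself reached a quorum, can be sketched on a toy model. Everything here is an assumption made for illustration: `read_with_blocking_repair` is a hypothetical helper, the repair write and its ack are collapsed into a single dictionary assignment, and concurrent writers, timeouts, and node failures during the repair are ignored.

```python
from itertools import combinations


def read_with_blocking_repair(replicas, contacted):
    """Return the newest value only after every contacted replica holds it.

    `replicas` maps a node name to a (timestamp, value) pair. With RF = 3
    and two contacted replicas, the returned value is on a quorum of nodes
    by the time the read completes.
    """
    newest = max(replicas[name] for name in contacted)
    for name in contacted:
        if replicas[name] < newest:
            # Stand-in for sending a read repair write and waiting for its ack.
            replicas[name] = newest
    return newest


# N reached X only, as in the issue description.
replicas = {"X": (2, "N"), "Y": (1, "old"), "Z": (1, "old")}

first = read_with_blocking_repair(replicas, ("X", "Y"))
later = [read_with_blocking_repair(replicas, pair)
         for pair in combinations("XYZ", 2)]

print(first, later)
```

Once the first read has returned N, every later quorum read in this model returns N as well, because any two replicas now include at least one that holds it.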
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023116#comment-13023116 ]

Stu Hood commented on CASSANDRA-2494:
-------------------------------------

W plus R must be _greater than_ N for consistency.
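The reason the inequality has to be strict can be checked by brute force. This sketch uses hypothetical helper names and models replica sets as plain integer indices; it confirms that a write set of size W and a read set of size R are guaranteed to share a replica exactly when W + R > N.

```python
from itertools import combinations


def satisfies_rule(w, r, n):
    """The rule from the comment: W plus R must be greater than N."""
    return w + r > n


def always_intersect(w, r, n):
    """True if every W-sized write set overlaps every R-sized read set."""
    nodes = range(n)
    return all(set(write_set) & set(read_set)
               for write_set in combinations(nodes, w)
               for read_set in combinations(nodes, r))


# RF = 3: QUORUM (2) + QUORUM (2) overlaps, ONE (1) + QUORUM (2) does not.
print(satisfies_rule(2, 2, 3), always_intersect(2, 2, 3))
print(satisfies_rule(1, 2, 3), always_intersect(1, 2, 3))
```

With W + R equal to N, a write set and a read set can be exact complements of each other, which is why equality is not enough.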
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021242#comment-13021242 ]

Jeremiah Jordan commented on CASSANDRA-2494:
--------------------------------------------

I would think that reads at QUORUM should never go backwards, even if the write was at ZERO. If there were writes to the cluster of a=1 time=5, a=2 time=10, a=3 time=15, and I do a read at QUORUM which tells me a=3 time=15, I should not be able to do another read at QUORUM and get a=2 time=10.
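The property described here is easy to state as a check over a sequence of read results, which is roughly what the reader thread in the proposed test would do. This is a minimal sketch: `first_regression` is a hypothetical helper and the reads are assumed to arrive as (time, value) pairs in the order the client observed them.

```python
def first_regression(reads):
    """Return the index of the first read older than its predecessor, or None.

    `reads` is a list of (time, value) pairs in the order they were observed.
    """
    for i in range(1, len(reads)):
        if reads[i][0] < reads[i - 1][0]:
            return i
    return None


# The sequence from the comment: a=3 time=15 is followed by a=2 time=10.
observed = [(5, 1), (10, 2), (15, 3), (10, 2)]
print(first_regression(observed))

# A sequence that only moves forward (repeats are allowed) passes the check.
print(first_regression([(5, 1), (10, 2), (10, 2), (15, 3)]))
```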
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020867#comment-13020867 ]

Peter Schuller commented on CASSANDRA-2494:
-------------------------------------------

As far as I can tell the consistency being asked for was never promised by Cassandra and is in fact not expected. The expected behavior of writes is that they propagate; the difference between ONE and QUORUM is just how many replicas are required to receive a write prior to a return to the client with a success code. For reads, that means you may get lucky at ONE or you may get lucky at QUORUM; the positive guarantee is in the case of a *completing* QUORUM write followed by a QUORUM read.

So just to be clear, although I don't think this is what is being asked for: As far as I know, it has never been the case, nor the intent to promise, that a write which fails is guaranteed not to eventually complete. Simply fixing reads is not enough; by design the data will be replicated during read-repair and AES - this is how consistency is achieved in Cassandra.

However, it sounds like what is being asked for is not that they don't propagate in the event of a write failure, but just that reads don't see the writes until they are sufficiently propagated to guarantee that any future QUORUM read will also see the data. I can understand that this is desirable, in the sense of achieving monotonically forward-moving data as the benchmark/test from the e-mail thread does.

Another way to look at it is that maybe you never want to read data successfully prior to achieving a certain level of replication, in order to avoid a client ever seeing data that may suddenly go away due to e.g. a node failure, in spite of said failure not exceeding the number of failures the cluster was designed to survive.

So the key point would be the bit about guaranteeing that any future QUORUM read will see the data (or data that subsequently overwrote it), and actively read-repairing and waiting for it to happen would take care of that. The important part is the act of ensuring that a quorum of nodes has seen the data; one should not wait for a quorum to agree on the *current* version of the data, as that would create potentially unbounded round-trips on hotly contended data.

Thing to consider: One might think about cases where read-repair is currently not done, like range slices, and how an implementation that requires read repair for consistency affects that.
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020898#comment-13020898 ]

Sean Bridges commented on CASSANDRA-2494:
-----------------------------------------

Peter Schuller wrote:

"However, it sounds like what is being asked for is not that they don't propagate in the event of a write failure, but just that reads don't see the writes until they are sufficiently propagated to guarantee that any future QUORUM read will also see the data."

Yes, that is the issue. The comment in the bug about writing at ONE and reading at QUORUM is just a way of testing this new guarantee in a distributed test, if Cassandra has those.