[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023151#comment-13023151
 ] 

Peter Schuller commented on CASSANDRA-2494:
---

I don't think anyone is claiming otherwise, unless I'm misunderstanding. The 
problem is that while the "if successfully written to quorum, subsequent quorum 
reads will see it" guarantee is indeed maintained, it is possible for quorum 
reads to see data go backwards (on a timeline) in the event of a *failed* 
attempted quorum write. This includes the possibility of reads seeing data that 
then permanently vanishes, even though you only lost, say, 1 node, which you 
designed your cluster to survive (RF = 3, QUORUM). ("Lost 1 node" can be 
substituted with "killed 1 node" in periodic commit mode.)
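
A minimal sketch of the scenario above, assuming a toy in-memory model of three 
replicas in which a read simply returns the newest column among the replicas it 
touches. The replica names, the `quorum_read` helper, and the absence of read 
repair are all assumptions made for illustration; this is not Cassandra code.

```python
def quorum_read(replicas, participants):
    """Return the (timestamp, value) pair with the highest timestamp among participants."""
    return max(replicas[name] for name in participants)

# RF = 3: every replica holds the old value, written at timestamp 1.
replicas = {"X": (1, "old"), "Y": (1, "old"), "Z": (1, "old")}

# A QUORUM write of "new" at timestamp 2 fails: only X applied it.
replicas["X"] = (2, "new")

# A QUORUM read that happens to touch X sees the new value.
first = quorum_read(replicas, ("X", "Y"))

# X is lost before the write reaches any other replica (assumed: no repair yet).
del replicas["X"]

# The only remaining quorum returns the old value: the data went backwards.
second = quorum_read(replicas, ("Y", "Z"))
print(first, second)
```

In this model the first read returns the new value and the second returns the 
old one, even though only one node was lost.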

I still don't think this is a violation of what was promised, but I can see how 
making the further guarantee would make for more useful consistency semantics 
in some cases.

With respect to "implicit write": An alternative is to adjust reconciliation 
logic when applied as part of reads (as opposed to AES, hinted hand-off, 
writes) to take consistency level into account and only consider columns whose 
timestamp is <= the greatest timestamp that has quorum (off the top of my head 
I think that should be correct in all cases, but I didn't think this through 
terribly).


 Quorum reads are not consistent
 ---

 Key: CASSANDRA-2494
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
 Project: Cassandra
  Issue Type: Bug
Reporter: Sean Bridges

 As discussed in this thread,
 http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html
 Quorum reads should be consistent.  Assume we have a cluster of 3 nodes 
 (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but 
 not Y and Z, then a read from X should not return N unless the read is 
 committed to at  least two nodes.  To ensure this, a read from X should wait 
 for an ack of the read repair write from either Y or Z before returning.
 Are there system tests for cassandra?  If so, there should be a test similar 
 to the original post in the email thread.  One thread should write 1,2,3... 
 at consistency level ONE.  Another thread should read at consistency level 
 QUORUM from a random host, and verify that each read is >= the last read.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023154#comment-13023154
 ] 

Peter Schuller commented on CASSANDRA-2494:
---

Ok, so my last suggestion is in fact broken. A counterexample is:

 A: column @ t1
 B: column @ t2
 C: column @ t3

If A + B are participating, A's column @ t1 has timestamp quorum and would be 
selected. If B + C are participating, B's column is picked. Thus, a read where B 
+ C participate will see data that will be reverted once A + B happen to be 
picked.
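
The counterexample can be replayed with a small sketch. It assumes one reading 
of the suggestion: a replica holding timestamp t counts toward the quorum of 
every timestamp <= t, so with two participants the greatest timestamp that has 
quorum is the smaller of the two. The function name and the integer timestamps 
are hypothetical.

```python
def read_with_quorum_cap(replicas, participants):
    """Return the greatest timestamp that every participant has reached (assumed rule)."""
    return min(replicas[name] for name in participants)

# A holds the column @ t1, B @ t2, C @ t3.
replicas = {"A": 1, "B": 2, "C": 3}

read_bc = read_with_quorum_cap(replicas, ("B", "C"))  # B's column is picked
read_ab = read_with_quorum_cap(replicas, ("A", "B"))  # A's column is picked
print(read_bc, read_ab)
```

Under that reading, the later A + B read returns an older timestamp than the 
earlier B + C read did.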

Note to self: Think before posting.




[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Sean Bridges (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023233#comment-13023233
 ] 

Sean Bridges commented on CASSANDRA-2494:
-

I think the guarantee of quorum reads not seeing old writes once a quorum read 
sees a new write is very useful. I suspect most people already think that 
this guarantee holds, including, it seems, Jonathan Ellis, whose quote can be 
found in the email thread linked to in the bug:

"The important guarantee this gives you is that once one quorum read sees the 
new value, all others will too. You can't see the newest version, then see an 
older version on a subsequent write [sic, I assume he meant read], which is the 
characteristic of non-strong consistency"







[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023237#comment-13023237
 ] 

Sylvain Lebresne commented on CASSANDRA-2494:
-

The problem is you are considering the consistency of reads but not of writes. 
The guarantee is: quorum reads will not see an old quorum write once a quorum 
read sees a new quorum write. Period. If you don't consider the consistency of 
a write, consider the case of a CL.ANY write. In this case, the update may not 
be on any replica at all. How can we ensure the quorum read property that you 
want? Do we query all nodes for quorum reads in case there is a hint somewhere?

If you look at the Consistency part of 
http://wiki.apache.org/cassandra/ArchitectureOverview, it seems to me that it 
is pretty clear that the consistency of reads *and* writes is involved to 
achieve strong consistency. So I would hope 'most people' are aware of that.



[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023239#comment-13023239
 ] 

Peter Schuller commented on CASSANDRA-2494:
---

The issue is that of *failed* QUORUM writes. I.e., you design your system to 
use QUORUM writes and QUORUM reads, and expect that once a QUORUM read sees a 
given piece of data, a subsequent QUORUM read will also see it (or later 
data). A *failed* QUORUM write that was replicated to fewer than a QUORUM of 
replicas would be visible to QUORUM reads that happen to touch one of those 
replicas, but there is no guarantee that subsequent reads see it.

I was under the impression this was never an intended guarantee. Apparently I 
may be wrong about that given the jbellis quote above. In either case, it is 
certainly not an *actual* guarantee given by the current implementation.

The guarantee that a *successful* QUORUM write is seen by a subsequent QUORUM 
read is, as far as I can tell, not in question here.





[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-22 Thread Sean Bridges (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023242#comment-13023242
 ] 

Sean Bridges commented on CASSANDRA-2494:
-

To be clear, this is a new guarantee.  The current guarantee is that R+W>N 
gives you consistency.  This bug is asking that a quorum read of A means that 
A has been committed to a quorum of nodes.

"How can we ensure the quorum read property that you want ?"

If, when reading at quorum, no quorum can be found which agrees on a 
particular value, then the coordinator (?) will wait for acks of read repair 
writes (or perhaps just do normal writes) to be returned from a sufficient 
number of nodes to ensure that the value has been committed to a quorum of 
nodes.
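
A rough sketch of that read path, under the assumption of a toy in-memory model 
where assigning to a replica stands in for sending a read repair write and 
receiving its ack. The function name, its parameters, and the 
(timestamp, value) representation are hypothetical and are not taken from the 
Cassandra code base.

```python
def quorum_read_with_blocking_repair(replicas, participants, quorum):
    """Hypothetical coordinator: return the newest value only once a quorum holds it."""
    newest = max(replicas[name] for name in participants)
    holders = sum(1 for name in participants if replicas[name] == newest)
    for name in participants:
        if holders >= quorum:
            break
        if replicas[name] < newest:
            # Read repair write; the assignment stands in for waiting on the ack.
            replicas[name] = newest
            holders += 1
    if holders < quorum:
        raise RuntimeError("value not confirmed on a quorum of replicas")
    return newest

# N was written to X only; Y and Z still hold the old value.
replicas = {"X": (2, "N"), "Y": (1, "old"), "Z": (1, "old")}
result = quorum_read_with_blocking_repair(replicas, ("X", "Y"), quorum=2)
print(result, replicas)
```

After this call, X and Y both hold N in the model, so any later pair of 
replicas includes at least one that has it.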

Without this new guarantee it is hard for readers to function correctly.  The 
reader does not know that the quorum write failed, or is still in progress, so 
without reading at ALL, the R+W>N guarantee does not help the reader.







[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-21 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023116#comment-13023116
 ] 

Stu Hood commented on CASSANDRA-2494:
-

W plus R must be _greater than_ N for consistency.
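
As a quick illustration of why the inequality has to be strict, the sketch 
below enumerates every possible read set and write set and checks whether they 
always share a replica. The helper name `always_overlap` is made up for this 
example.

```python
from itertools import combinations

def always_overlap(n, r, w):
    """True if every read set of size r shares a replica with every write set of size w."""
    nodes = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(nodes, r)
               for write_set in combinations(nodes, w))

print(always_overlap(3, 2, 2))  # QUORUM reads and QUORUM writes at N = 3: R + W = 4
print(always_overlap(3, 2, 1))  # QUORUM reads and ONE writes at N = 3: R + W = 3
```

With R + W equal to N, a read set and a write set can be disjoint, so a read 
may miss the write entirely.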



[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-18 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021242#comment-13021242
 ] 

Jeremiah Jordan commented on CASSANDRA-2494:


I would think that reads at QUORUM should never go backwards, even if the 
write was at ZERO.  If there were writes to the cluster of a=1 time=5, a=2 
time=10, a=3 time=15, and I do a read at QUORUM which tells me a=3 time=15, I 
should not be able to do another read at QUORUM and get a=2 time=10.
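
The property can be stated as a check over a sequence of observed reads. The 
sketch below is only an illustration; it assumes each read is recorded as a 
(value, timestamp) pair, and the helper name `find_regressions` is hypothetical.

```python
def find_regressions(reads):
    """Return index pairs (i, i + 1) where a read's timestamp is older than the previous read's."""
    return [(i, i + 1)
            for i in range(len(reads) - 1)
            if reads[i + 1][1] < reads[i][1]]

# (value of a, timestamp) as seen by successive QUORUM reads
reads = [(1, 5), (3, 15), (2, 10)]
print(find_regressions(reads))
```

An empty result means the observed reads never went backwards.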



[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-17 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020867#comment-13020867
 ] 

Peter Schuller commented on CASSANDRA-2494:
---

As far as I can tell the consistency being asked for was never promised by 
Cassandra and is in fact not expected.

The expected behavior of writes is that they propagate; the difference between 
ONE and QUORUM is just how many replicas are required to receive a write prior 
to a return to the client with a success code. For reads, that means you 
may get lucky at ONE or you may get lucky at QUORUM; the positive guarantee is 
in the case of a *completing* QUORUM write followed by a QUORUM read.

So just to be clear, although I don't think this is what is being asked for: As 
far as I know, it has never been the case, nor the intent to promise, that a 
write which fails is guaranteed not to eventually complete. Simply fixing 
reads is not enough; by design the data will be replicated during read-repair 
and AES - this is how consistency is achieved in Cassandra.

However, it sounds like what is being asked for is not that they don't 
propagate in the event of a write failure, but just that reads don't see the 
writes until they are sufficiently propagated to guarantee that any future 
QUORUM read will also see the data. I can understand that is desirable, in the 
sense of achieving monotonically forward-moving data as the benchmark/test from 
the e-mail thread does. Another way to look at it is that maybe you never want to 
read data successfully prior to achieving a certain level of replication, in 
order to avoid a client ever seeing data that may suddenly go away due to e.g. 
a node failure in spite of said failure not exceeding the number of failures 
the cluster was designed to survive.

So the key point would be the bit about guaranteeing that any future QUORUM 
read will see the data, or data that subsequently overwrote it, and actively 
read-repairing and waiting for that to happen would take care of it. The 
important part is ensuring that a quorum of nodes has seen the data; one 
should not wait for a quorum to agree on the *current* version of the data, 
as that would create potentially unbounded round-trips on hotly contended data.

Thing to consider: One might think about cases where read-repair is currently 
not done, like range slices, and how an implementation that requires read 
repair for consistency affects that.





[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent

2011-04-17 Thread Sean Bridges (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020898#comment-13020898
 ] 

Sean Bridges commented on CASSANDRA-2494:
-

Peter Schuller wrote:

"However, it sounds like what is being asked for is not that they don't 
propagate in the event of a write failure, but just that reads don't see the 
writes until they are sufficiently propagated to guarantee that any future 
QUORUM read will also see the data."

Yes, that is the issue.  The comment in the bug about writing at ONE and 
reading at QUORUM is just a way of testing this new guarantee in a distributed 
test, if Cassandra has those.
