[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2021-04-06 Thread Vytenis Silgalis (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17315894#comment-17315894
 ] 

Vytenis Silgalis edited comment on CASSANDRA-12126 at 4/6/21, 11:53 PM:


Just a note for people looking for reasons why SERIAL or LOCAL_SERIAL reads are 
seeing timeouts on versions >3.11.10: the bug that this fixes usually pops up as 
the following timeout. Setting the flag to the opt-out option will `fix` it, but 
you probably shouldn't be reading at this level if you run into this.
{code:java}
! com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_SERIAL (2 responses were required but only 0 replica responded)
{code}
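For anyone decoding that message: the "2 responses were required" figure is just a Paxos quorum of the replicas involved. A minimal sketch of the arithmetic, assuming RF=3 in the local datacenter (the comment does not state the replication factor) and ignoring pending replicas during range movements; `QuorumMath` is a hypothetical helper, not driver or Cassandra code:

```java
/** Quorum arithmetic behind "N responses were required" (illustrative sketch). */
final class QuorumMath {
    /** A majority of n replicas: floor(n / 2) + 1. */
    static int quorum(int replicas) {
        if (replicas <= 0) {
            throw new IllegalArgumentException("replicas must be positive");
        }
        return replicas / 2 + 1;
    }

    public static void main(String[] args) {
        // Assumption: RF=3 in the local DC, as LOCAL_SERIAL would count it.
        System.out.println("LOCAL_SERIAL, RF=3: " + quorum(3) + " responses required");
        // Assumption: two DCs at RF=3 each, as SERIAL would count them.
        System.out.println("SERIAL, RF=3+3: " + quorum(6) + " responses required");
    }
}
```

Under those assumptions SERIAL would count all six replicas and need 4 responses, whereas LOCAL_SERIAL only needs 2 from the local datacenter.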



> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.24, 3.11.10, 4.0, 4.0-beta4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in its accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> namely how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct to propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more of them, but not a majority, has something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will ensure that the value written in step 1 is 
> never visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next serial 
> read or never see it, which is what we want. 
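The steps above can be replayed in a toy model. This is a hedged sketch under loud assumptions, not Cassandra's implementation: `CasReadScenario`, `Proposal`, `Acceptor`, `failedWrite` and `serialRead` are hypothetical names, ballots are plain counters, there are no messages or failures, and each read contacts exactly the two replicas passed to it. An accepted proposal is treated as in flight only if its ballot is newer than the most recent commit the quorum reports.

```java
/**
 * Toy single-partition model of the RF=3 scenario (hypothetical names).
 * Not Cassandra's code: ballots are plain longs, there is no messaging or
 * failure handling, and a quorum is simply the two acceptors passed in.
 */
final class CasReadScenario {
    /** A Paxos proposal; value == null models an empty proposal (no update). */
    static final class Proposal {
        final long ballot;
        final String value;
        Proposal(long ballot, String value) { this.ballot = ballot; this.value = value; }
    }

    /** Per-replica Paxos state for one partition. */
    static final class Acceptor {
        long promised;       // highest ballot promised
        Proposal accepted;   // most recent accepted proposal, possibly empty
        long commitBallot;   // ballot of the most recent commit, empty or not
        Proposal data;       // most recent committed non-empty proposal
    }

    final Acceptor a = new Acceptor(), b = new Acceptor(), c = new Acceptor();
    private long nextBallot = 1;
    private final boolean fixed;

    CasReadScenario(boolean fixed) { this.fixed = fixed; }

    /** Step 1: a CAS write whose proposal is accepted by a single replica only. */
    void failedWrite(String value, Acceptor only) {
        long ballot = nextBallot++;
        only.promised = ballot;
        only.accepted = new Proposal(ballot, value);
    }

    /** A SERIAL read through two acceptors (a quorum when RF=3). */
    String serialRead(Acceptor x, Acceptor y) {
        long ballot = nextBallot++;
        Acceptor[] quorum = {x, y};
        long mostRecentCommit = 0;
        Proposal inProgress = null;
        for (Acceptor r : quorum) {  // prepare phase (promises always succeed here)
            r.promised = Math.max(r.promised, ballot);
            mostRecentCommit = Math.max(mostRecentCommit, r.commitBallot);
            if (r.accepted != null && (inProgress == null || r.accepted.ballot > inProgress.ballot)) {
                inProgress = r.accepted;
            }
        }
        boolean mustRepair = inProgress != null && inProgress.value != null
                && inProgress.ballot > mostRecentCommit;
        if (mustRepair) {
            commit(quorum, new Proposal(ballot, inProgress.value)); // finish the in-flight write
        } else if (fixed) {
            commit(quorum, new Proposal(ballot, null));             // proposed fix: empty proposal
        }
        Proposal newest = null;  // read the newest committed value in the quorum
        for (Acceptor r : quorum) {
            if (r.data != null && (newest == null || r.data.ballot > newest.ballot)) newest = r.data;
        }
        return newest == null ? null : newest.value;
    }

    /** Propose-and-commit on every acceptor of the quorum (no failures modelled). */
    private static void commit(Acceptor[] quorum, Proposal p) {
        for (Acceptor r : quorum) {
            r.accepted = p;
            r.commitBallot = p.ballot;
            if (p.value != null) r.data = p;
        }
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            CasReadScenario s = new CasReadScenario(fixed);
            s.failedWrite("x", s.a);                // step 1
            String step2 = s.serialRead(s.b, s.c);  // step 2
            String step3 = s.serialRead(s.a, s.b);  // step 3
            System.out.println("fixed=" + fixed + ": step2=" + step2 + ", step3=" + step3);
        }
    }
}
```

Without the fix the model returns nothing in step 2 and then `x` in step 3; with the empty proposal in step 2, step 3 also returns nothing, so the value from step 1 is never observed.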



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227379#comment-17227379
 ] 

Benjamin Lerer edited comment on CASSANDRA-12126 at 11/6/20, 12:44 PM:
---

It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. 
The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived with this 
issue for a long time and can probably wait a bit longer for a proper solution.
# Commit the patch as is, fixing the correctness issue but potentially 
introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior, but allow people to 
enable the new one if correctness is a problem for them.

Maybe we should start a discussion on the mailing list and see what other 
people think.

I can take care of that next week if you think it is a good idea.



> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-05 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226668#comment-17226668
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 11/5/20, 11:56 AM:
---

So, before we commit this I wanted to share that some experimentation found 
that this can lead to a significant increase in timeouts, particularly for 
read-heavy workloads whose requests previously would not have competed with 
each other. 
I think committing this to a patch release is honestly problematic, as it could 
surprise users with a service outage. At the very least, there should be HUGE 
warnings in {{NEWS.txt}}, but honestly I would prefer to have users opt in for 
patch releases.

As much as I agree that it is problematic to provide the wrong semantics, I 
think it is also problematic to force a decision between stability and 
correctness onto our users without their informed and positive consent.

I hope that I will be able to provide the community with an alternative 
solution in the near future, without these (and many other existing) pitfalls. 
However, I'm not sure how that should affect this decision.


was (Author: benedict):
So, before we commit this I wanted to share that some internal experimentation 
found that this can lead to a significant increase in timeouts, particularly 
for read-heavy workloads whose requests previously would not have competed with 
each other. I think committing this to a patch release is honestly problematic, as 
it could surprise users with a service outage. At the very least, there should 
be HUGE warnings in {{NEWS.txt}}, but honestly I would prefer to have users 
opt in for patch releases.

As much as I agree that it is problematic to provide the wrong semantics, I 
think it is also problematic to force a decision between stability and 
correctness onto our users without their informed and positive consent.

I hope that I will be able to provide the community with an alternative 
solution in the near future, without these (and many other existing) pitfalls. 
However, I'm not sure how that should affect this decision.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138503#comment-17138503
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:42 PM:
--

I'm not proposing we do anything different for your patch, just clarifying that 
this isn't strictly necessary: it is quite possible to modify the algorithm to 
never commit empty proposals. The problem today is that we:

# "Refresh" a quorum with the MRC (most recent commit) if it was not witnessed 
by all promisers
# Filter out empty proposals when deciding if we have an in-progress proposal 
({{mostRecentInProgressCommit}} vs {{mostRecentInProgressCommitWithUpdate}})

If instead we did not refresh empty commits, and did not filter out empty 
proposals when _updating_ {{mostRecentInProgressCommitWithUpdate}}, but also did 
not _complete_ any empty proposals we found, then everything would be fine.

{{mostRecentInProgressCommitWithUpdate}} confuses matters because it is poorly 
named, and is updated according to its name rather than its intent. I think it 
is _meant_ to be {{mostRecentInProgressProposal}}, whereas 
{{mostRecentInProgressCommit}} should be e.g. 
{{mostRecentInProgressPromiseOrProposal}}; {{mostRecentInProgressProposal}} 
would then include empty proposals as well as non-empty ones, and correctly 
discount the older in-progress proposal that was invalidated by the newer read 
that did not witness it.
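A minimal sketch of the second point, assuming each prepare response carries at most one accepted proposal and that an empty proposal is modelled as a `null` update. `InProgressSelection` and its methods are hypothetical; they only loosely mirror the roles of {{mostRecentInProgressCommitWithUpdate}} and the suggested {{mostRecentInProgressProposal}}, and the MRC refresh from the first point is not modelled:

```java
/**
 * Hypothetical sketch of picking the in-progress proposal from prepare
 * responses. Names are illustrative; they are not Cassandra's fields.
 */
final class InProgressSelection {
    static final class Proposal {
        final long ballot;
        final String update;  // null models an empty proposal
        Proposal(long ballot, String update) { this.ballot = ballot; this.update = update; }
        boolean isEmpty() { return update == null; }
    }

    /** Behaviour described as today's: empty proposals are skipped when merging. */
    static Proposal newestIgnoringEmpty(Proposal... accepted) {
        Proposal best = null;
        for (Proposal p : accepted) {
            if (p != null && !p.isEmpty() && (best == null || p.ballot > best.ballot)) best = p;
        }
        return best;
    }

    /** Suggested behaviour: empty proposals take part in the merge. */
    static Proposal newestIncludingEmpty(Proposal... accepted) {
        Proposal best = null;
        for (Proposal p : accepted) {
            if (p != null && (best == null || p.ballot > best.ballot)) best = p;
        }
        return best;
    }

    /** Only a non-empty winner needs to be re-proposed and committed. */
    static String toComplete(Proposal winner) {
        return winner == null || winner.isEmpty() ? null : winner.update;
    }

    public static void main(String[] args) {
        Proposal fromA = new Proposal(1, "x");   // stale, partially accepted write
        Proposal fromB = new Proposal(2, null);  // later empty proposal from a read
        System.out.println(toComplete(newestIgnoringEmpty(fromA, fromB)));
        System.out.println(toComplete(newestIncludingEmpty(fromA, fromB)));
    }
}
```

With a stale write `x` accepted by A at ballot 1 and an empty proposal accepted by B at ballot 2, skipping empties selects `x` and would complete it, while merging empties selects the empty proposal and completes nothing.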

To be clear, I'm mostly participating in this discussion for my own benefit and 
for the benefit of future work, not trying to solicit changes to your work.




> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138464#comment-17138464
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:12 PM:
--

The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying that I am fairly confident there's no need 
to update the paxos state table with the "committed" status of this empty 
proposal so long as it remains in the table _as an accepted proposal_, and so 
long as this accepted proposal continues to override earlier in-progress 
proposals.



was (Author: benedict):
The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying there's no need to update the paxos state 
table with the "committed" status of this empty proposal so long as it remains 
in the table _as an accepted proposal_, and so long as this accepted proposal 
continues to override earlier in-progress proposals.


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138464#comment-17138464
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 6/17/20, 2:12 PM:
--

The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which (seems to) assume that an empty 
update is for a higher promise (since the meaning is overloaded in the response 
message) rather than an "incomplete" proposal. If the empty proposal were to be 
correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it would 
override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying there's no need to update the paxos state 
table with the "committed" status of this empty proposal so long as it remains 
in the table _as an accepted proposal_, and so long as this accepted proposal 
continues to override earlier in-progress proposals.



was (Author: benedict):
The problem here stems only from the overload of 
{{mostRecentInProgressCommitWithUpdate}}, which assumes an empty update is for 
a higher promise rather than an "incomplete" proposal. If the empty proposal 
were to be correctly merged with {{mostRecentInProgressCommitWithUpdate}}, it 
would override the earlier non-empty incomplete proposal.

Which is a long-winded way of saying there's no need to update the paxos state 
table with the "committed" status of this empty proposal so long as it remains 
in the table _as an accepted proposal_, and so long as this accepted proposal 
continues to override earlier in-progress proposals.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-27 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117855#comment-17117855
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 5/27/20, 3:39 PM:
--

The test cases I provided demonstrate several consistency violations during 
range movements. I've just thought of another one, and am writing a test case 
for it.  Perhaps we could claim that range movements are always (potentially) 
consistency violations, but they are particularly keenly felt when you claim a 
linearisable history.

There are also (more debatably) issues with TTL on {{system.paxos}}, 
particularly when mixed with non-global commit; perhaps we could claim this is 
the user's problem, but it's not clear why we support global consensus that can 
be lost through local commit, and I don't think we communicate the consistency 
implications clearly enough for this not to count as a bug.

Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe, and even supporting 
them both is arguably a consistency violation without mechanisms to safely 
transition from one level to another.
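To illustrate why mixing the two levels is risky: Paxos safety depends on any two quorums sharing at least one replica. A brute-force sketch, assuming two datacenters with RF=3 each (replicas 0-2 in DC1, 3-5 in DC2) and ignoring pending replicas; `QuorumOverlap` is a hypothetical helper, not Cassandra code:

```java
/** Brute-force quorum-intersection check over replica bitmasks (illustrative only). */
final class QuorumOverlap {
    /**
     * True if some subset A of poolA with |A| == sizeA and some subset B of
     * poolB with |B| == sizeB share no replica, i.e. the quorums can miss.
     */
    static boolean canBeDisjoint(int n, int poolA, int sizeA, int poolB, int sizeB) {
        for (int a = 0; a < (1 << n); a++) {
            if ((a & ~poolA) != 0 || Integer.bitCount(a) != sizeA) continue;
            for (int b = 0; b < (1 << n); b++) {
                if ((b & ~poolB) == 0 && Integer.bitCount(b) == sizeB && (a & b) == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int n = 6;                // assumed: RF=3 in each of two DCs
        int all = (1 << n) - 1;   // every replica
        int dc1 = 0b000111;       // assumed: replicas 0-2 live in DC1
        // LOCAL_SERIAL quorum in DC1 (2 of 3) vs SERIAL quorum (4 of 6)
        System.out.println(canBeDisjoint(n, dc1, 2, all, 4));
        // SERIAL quorum vs SERIAL quorum (4 of 6 each)
        System.out.println(canBeDisjoint(n, all, 4, all, 4));
    }
}
```

Under these assumptions a LOCAL_SERIAL quorum in DC1 can be disjoint from a SERIAL quorum (for example replicas {0, 1} versus {2, 3, 4, 5}), whereas two SERIAL quorums always overlap.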


was (Author: benedict):
The test cases I provided demonstrate several consistency violations during 
range movements.  I've just thought of another one, and am writing a test case 
for it.  Perhaps we could claim that range movements are always consistency 
violations, but they are particularly keenly felt when you claim a linearisable 
history.

There are also (more debatably) issues with TTL on {{system.paxos}}, 
particularly when mixed with non-global commit; perhaps we could claim this is 
the user's problem, but it's not clear why we support global consensus that can 
be lost through local commit, and I don't think we communicate the consistency 
implications clearly enough for this not to count as a bug.

Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe, and even supporting 
them both is arguably a consistency violation without mechanisms to safely 
transition from one level to another.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> which is how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors and at least one acceptor, but not a majority, has something in 
> flight, we have no way of knowing whether it was accepted by a majority. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit; that will make the write from step 1 invisible from 
> then on. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To 

[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-28 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631226#comment-16631226
 ] 

Benedict edited comment on CASSANDRA-12126 at 9/28/18 9:42 AM:
---

{quote}We read nothing in node Y, yet node Z read something in the next request.
{quote}
I think the problem here is that, at the API level, there isn't enough 
information to say that X didn't simply 'occur' *after* both Y and Z. That is, 
unless the rejection of Y occurs after X's timeout. In this case, it would seem 
to be an API-visible error, as at the point of timeout the indeterminacy should 
be fixed. Timeouts should not ‘live forever’ as the bogeyman, ready to mess 
with history.

I think, though, that the suggested mechanism could result in this.

Take three nodes (RF=3) A, B and C; and any three CAS operations X, Y and Z 
such that:
 * X and Y can always succeed
 * Z can only succeed if X has succeeded

Setup:
 # Prepare _and_ Propose X with ballot 1; proposal accepted only by A
 ** this will be the only acceptance of X’s proposal by any node
 # Prepare Y with ballot 2; reach B and C before ballot 1, so they do not accept
 # Now, lock X and Y in battle, always failing to proceed to the propose step 
before the other reaches the prepare step again
 # X and Y both time out, having failed to apply cleanly

Part 2:
 # Z is now attempted; it prepares to only B and C, seeing no in-progress 
proposal
 # As a result, it does not see X; it is rejected, so there is no new 
proposal/commit
 # Z is attempted again; this time, A is consulted
 # Suddenly, a wild X appears. From nowhere.  Z succeeds, despite no 
intervening operations.

It does seem, in essence, to be an incidence of the bug (or a very similar one) 
described in the ticket.
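
The disagreement turns on when a timed-out operation is allowed to take effect. The following is a hypothetical brute-force helper (`TimeoutHistoryCheck` is not part of Cassandra or any testing library). It assumes a strictly sequential history, with no overlapping requests, of conditional writes whose conditions only test whether earlier writes took effect, and asks whether some placement of the timed-out operations explains every success and rejection the clients saw.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Brute-force check of a tiny, strictly sequential history of conditional writes (hypothetical).
final class TimeoutHistoryCheck {
    enum Outcome { SUCCESS, REJECTED, TIMEOUT }

    // A conditional write: it takes effect (records `name`) only if every op in `requires` has.
    static final class Op {
        final String name;
        final Set<String> requires;
        final Outcome outcome;

        Op(String name, Set<String> requires, Outcome outcome) {
            this.name = name;
            this.requires = requires;
            this.outcome = outcome;
        }
    }

    // Part 2 of the scenario: X and Y timed out, Z was rejected, then Z succeeded.
    static List<Op> benedictPart2() {
        return List.of(
                new Op("X", Set.of(), Outcome.TIMEOUT),
                new Op("Y", Set.of(), Outcome.TIMEOUT),
                new Op("Z", Set.of("X"), Outcome.REJECTED),
                new Op("Z", Set.of("X"), Outcome.SUCCESS));
    }

    // True if some placement of the timed-out ops explains every SUCCESS/REJECTED outcome.
    // allowLateApply=false: a timed-out op takes effect during its own request or never.
    // allowLateApply=true: it may also take effect after any later request.
    static boolean explainable(List<Op> history, boolean allowLateApply) {
        List<Integer> timeouts = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            if (history.get(i).outcome == Outcome.TIMEOUT) timeouts.add(i);
        }
        return search(history, allowLateApply, timeouts, new int[timeouts.size()], 0);
    }

    // slot[t] == -1: never applied; == own index: applied during its request;
    // == j > own index: applied just before event j (j == history.size(): after the last event).
    private static boolean search(List<Op> history, boolean allowLateApply,
                                  List<Integer> timeouts, int[] slot, int t) {
        if (t == timeouts.size()) return replay(history, timeouts, slot);
        int own = timeouts.get(t);
        int last = allowLateApply ? history.size() : own;
        List<Integer> choices = new ArrayList<>();
        choices.add(-1);
        for (int s = own; s <= last; s++) choices.add(s);
        for (int s : choices) {
            slot[t] = s;
            if (search(history, allowLateApply, timeouts, slot, t + 1)) return true;
        }
        return false;
    }

    // Replays one placement; late ops landing in the same gap are applied in history order.
    private static boolean replay(List<Op> history, List<Integer> timeouts, int[] slot) {
        Set<String> applied = new HashSet<>();
        for (int k = 0; k <= history.size(); k++) {
            for (int t = 0; t < timeouts.size(); t++) {
                if (slot[t] == k && k > timeouts.get(t)) {
                    applyIfAllowed(history.get(timeouts.get(t)), applied);
                }
            }
            if (k == history.size()) break;
            Op op = history.get(k);
            boolean conditionHolds = applied.containsAll(op.requires);
            if (op.outcome == Outcome.SUCCESS) {
                if (!conditionHolds) return false;
                applied.add(op.name);
            } else if (op.outcome == Outcome.REJECTED) {
                if (conditionHolds) return false;
            } else if (slot[timeouts.indexOf(k)] == k) {
                applyIfAllowed(op, applied); // timed out, but took effect during its request
            }
        }
        return true;
    }

    private static void applyIfAllowed(Op op, Set<String> applied) {
        if (applied.containsAll(op.requires)) applied.add(op.name);
    }
}
```

Under these assumptions, `explainable(benedictPart2(), false)` is false and `explainable(benedictPart2(), true)` is true: Part 2 is only acceptable if X, which timed out before Z's first attempt, may take effect after that attempt already observed its absence. Trying every slot is exponential in the number of timeouts, which is fine for hand-written histories like this one but not for recorded workloads.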


was (Author: benedict):
bq. We read nothing in node Y, yet node Z read something in the next request.

I think the problem here is that, at the API level, there isn't enough 
information to say that X didn't simply 'occur' *after* both Y and Z.  That is, 
unless the rejection of Y occurs after X's timeout.  In this case, it would 
seem to be an API-visible error, as at the point of timeout the indeterminacy 
should be fixed.  Timeouts should not ‘live forever’ as the bogeyman, ready to 
mess with history.

I think, though, that the suggested mechanism could result in this.

Take three nodes (RF=3) A, B and C; and any three CAS operations X, Y and Z 
such that:
* X and Y can always succeed
* Z can only succeed if X has succeeded

Setup:
# Prepare _and_ Propose X with ballot 1; proposal accepted only by A 
#* this will be the only acceptance of X’s proposal by any node
# Prepare Y with ballot 2; reach B and C before ballot 1, so they do not accept
# Now, lock X and Y in battle, always failing to proceed to the propose step 
before the other reaches the prepare step again
# X and Y both time out, having failed to apply cleanly

Part 2:
# Z is now attempted; it prepares to only B and C, seeing no in-progress 
proposal
# As a result, it does not see X; it is rejected, so there is no new 
proposal/commit 
# Read at SERIAL is performed; this time, A is consulted
# Suddenly, a wild X appears.  From nowhere.

It does seem, in essence, to be an incidence of the bug (or a very similar one) 
described in the ticket.


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>

[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630700#comment-16630700
 ] 

Jeremiah Jordan edited comment on CASSANDRA-12126 at 9/27/18 4:31 PM:
--

bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a read at SERIAL (or doing another INSERT that 
succeeds, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency. The client received 
"timed out" exceptions. A timeout means "your query may or may not have been 
applied; the server doesn't know; you should retry it if you want to ensure it 
goes through". In this case, request-1 was successful and request-3 failed. So 
{value_1='A', value_2=null, value_3=null} is a valid state and not inconsistent.
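
In client terms, this means a timed-out CAS should be treated as unknown and resolved before it is retried or reported. The following is a minimal sketch of that control flow. It assumes a hypothetical `CasClient` interface (it is not the DataStax driver API) and an in-memory stand-in for the cluster, and it assumes no other client writes the same value, so seeing the new value at SERIAL is taken as proof that the update applied.

```java
import java.util.HashMap;
import java.util.Map;

// Client-side handling of a CAS timeout (hypothetical interface, not a real driver API).
final class CasTimeoutResolution {
    enum Outcome { SUCCESS, REJECTED, TIMEOUT }

    // Models: UPDATE ... SET column = value WHERE ... IF ifColumn = ifValue
    interface CasClient {
        Outcome casUpdate(String column, String value, String ifColumn, String ifValue);

        Map<String, String> serialRead(); // the row as returned by a read at SERIAL
    }

    // Retries the update until its fate is known or attempts run out. After a timeout, a SERIAL
    // read decides whether the update landed (assumes nobody else writes the same value).
    static Outcome casWithResolution(CasClient client, String column, String value,
                                     String ifColumn, String ifValue, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Outcome outcome = client.casUpdate(column, value, ifColumn, ifValue);
            if (outcome != Outcome.TIMEOUT) return outcome;
            if (value.equals(client.serialRead().get(column))) return Outcome.SUCCESS;
            // Not visible at SERIAL, so retry. CASSANDRA-12126 is exactly the case where an
            // update that one SERIAL read missed can still surface later.
        }
        return Outcome.TIMEOUT; // fate still unknown
    }

    // In-memory stand-in: the first `timeouts` calls time out; `applyOnTimeout` decides whether
    // those timed-out calls nevertheless took effect.
    static final class FakeClient implements CasClient {
        final Map<String, String> row = new HashMap<>();
        final boolean applyOnTimeout;
        int remainingTimeouts;
        int casCalls = 0;

        FakeClient(int timeouts, boolean applyOnTimeout) {
            this.remainingTimeouts = timeouts;
            this.applyOnTimeout = applyOnTimeout;
        }

        @Override
        public Outcome casUpdate(String column, String value, String ifColumn, String ifValue) {
            casCalls++;
            boolean conditionHolds = ifValue.equals(row.get(ifColumn));
            if (remainingTimeouts > 0) {
                remainingTimeouts--;
                if (applyOnTimeout && conditionHolds) row.put(column, value);
                return Outcome.TIMEOUT;
            }
            if (!conditionHolds) return Outcome.REJECTED;
            row.put(column, value);
            return Outcome.SUCCESS;
        }

        @Override
        public Map<String, String> serialRead() {
            return new HashMap<>(row);
        }
    }
}
```

For example, request-2 from the reported scenario would map to `casWithResolution(client, "value_2", "B", "value_1", "A", 3)`. A timed-out attempt that did land resolves to SUCCESS without a second CAS, while one that did not land is simply retried.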


was (Author: jjordan):
bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a read at SERIAL (or doing another INSERT that 
succeeds).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency. The client received 
"timed out" exceptions. A timeout means "your query may or may not have been 
applied; the server doesn't know; you should retry it if you want to ensure it 
goes through". In this case, request-1 was successful and request-3 failed. So 
{value_1='A', value_2=null, value_3=null} is a valid state and not inconsistent.





[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630700#comment-16630700
 ] 

Jeremiah Jordan edited comment on CASSANDRA-12126 at 9/27/18 4:31 PM:
--

bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a read at SERIAL (or doing another INSERT that 
succeeds, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency. The client received 
"timed out" exceptions. A timeout means "your query may or may not have been 
applied; the server doesn't know; you should retry it if you want to ensure it 
goes through". In this case, request-1 was successful and request-3 failed. So 
{{value_1='A', value_2=null, value_3=null}} is a valid state and not 
inconsistent.


was (Author: jjordan):
bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a read at SERIAL (or doing another INSERT that 
succeeds, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency. The client received 
"timed out" exceptions. A timeout means "your query may or may not have been 
applied; the server doesn't know; you should retry it if you want to ensure it 
goes through". In this case, request-1 was successful and request-3 failed. So 
{value_1='A', value_2=null, value_3=null} is a valid state and not inconsistent.





[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeffrey F. Lukman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629685#comment-16629685
 ] 

Jeffrey F. Lukman edited comment on CASSANDRA-12126 at 9/27/18 3:19 PM:


To complete our scenario, here is our Cassandra setup:
We run the scenario with Cassandra v2.0.15.
Here is the schema that we use:
 * CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
 * CREATE TABLE tests (name text PRIMARY KEY, owner text, value_1 text, value_2 text, value_3 text);

Here are the queries that we submit:
 * client request to node X (1st): UPDATE test.tests SET value_1 = 'A' WHERE 
name = 'testing' IF owner = 'user_1';
 * client request to node Y (2nd): UPDATE test.tests SET value_2 = 'B' WHERE 
name = 'testing' IF value_1 = 'A';
 * client request to node Z (3rd): UPDATE test.tests SET value_3 = 'C' WHERE 
name = 'testing' IF value_1 = 'A';

To confirm, when the bug is manifested, the end result will be: value_1='A', 
value_2=null, value_3=null

[~jjirsa], regarding our tool: at this point, it is not open to the public. 


was (Author: jeffreyflukman):
To complete our scenario, here is our Cassandra setup:
We run the scenario with Cassandra v2.0.15.
Here is the schema that we use:
 * CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
 * CREATE TABLE tests (name text PRIMARY KEY, owner text, value_1 text, value_2 text, value_3 text);

Here are the queries that we submit:
 * client request to node X (1st): UPDATE test.tests SET value_1 = 'A' WHERE 
name = 'testing' IF owner = 'user_1';
 * client request to node Y (2nd): UPDATE test.tests SET value_2 = 'B' WHERE 
name = 'testing' IF value_1 = 'A';
 * client request to node Z (3rd): UPDATE test.tests SET value_3 = 'C' WHERE 
name = 'testing' IF value_1 = 'A';

To confirm, when the bug is manifested, the end result will be: value_1='A', 
value_2=null, value_3=null



[~jjirsa], regarding our tool: at this point, it is not open to the public. 






[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629950#comment-16629950
 ] 

Benedict edited comment on CASSANDRA-12126 at 9/27/18 10:12 AM:


[~jeffreyflukman] it would help if you could explicitly state the client 
response returned for each of your operations. The options are: timed out, 
rejected (condition not met), or success (condition met and mutation applied).

For completeness, as with CASSANDRA-12438, it would help to know which read 
queries you are performing, against which nodes, at what point, and at which 
consistency levels. Most specifically, are you verifying the state with a SERIAL 
read after the last query? Also, can we assume that the table began in the state 
{name:'testing', owner:'user_1', value_1:null, value_2:null, value_3:null}?


was (Author: benedict):
I'm not sure I'm following, but it seems the bug report is suggesting that 
operation #3 is returned to the client as successful, but only #1's state is 
visible. However, if #1 was successful and its state was the state of the 
cluster prior to #3 succeeding, then #3 should also have modified the cluster 
state, since its IF condition should have evaluated to true.

As with CASSANDRA-12438, it would help to know which read queries you are 
performing, against which nodes, at what point, and at which consistency levels. 
Also, can we assume that the table began in the state {name:'testing', 
owner:'user_1', value_1:null, value_2:null, value_3:null}?







[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-26 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629542#comment-16629542
 ] 

Jeff Jirsa edited comment on CASSANDRA-12126 at 9/26/18 11:47 PM:
--

[~jeffreyflukman] thanks for this report. I suspect that most of the folks who 
are interested in this are already cc'd and received an email notification of 
your response, but I'm explicitly tagging [~benedict], [~iamaleksey] and 
[~bdeggleston] as people who aren't yet watching it but may be interested.

Also, I'm very much interested in the model you mentioned: is that available 
publicly at this point? 


was (Author: jjirsa):
[~jeffreyflukman] thanks for this report. I suspect that most of the folks who 
are interested in this are already cc'd and received an email notification of 
your response, but I'm explicitly tagging [~benedict], [~iamaleksey] and 
[~bdeggleston] as people who aren't yet watching it but may be interested.








[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974834#comment-15974834
 ] 

Jonathan Ellis edited comment on CASSANDRA-12126 at 4/19/17 2:56 PM:
-

I see.  So you are saying that

1: Write
2: Read -> Nothing
3: Read -> Something

is broken because, to go from Nothing to Something [in a linearized system], 
there needs to be a write in between.


was (Author: jbellis):
I see.  So you are saying that

1: Write
2: Read -> Nothing
3: Read -> Something

is broken because, to go from Nothing to Something, there needs to be a write 
in between.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>


