Re: lightweight transactions with potential problem?

2015-08-27 Thread ibrahim El-sanosi
Hi Sylvain and all folks,

I have another scenario in my mind where *linearizable consistency (CAS,
Compare-and-Set) *can fail as we *the following round-trips:*

*1.*  *Prepare/promise*

*2.*  *Read/result*

*3.*  *Propose/accept*

4.  *Commit/acknowledgment *

Assume we have an application for resistering new account, I want to make
sure I only allow exactly one user to claim a given account. For example,
we do not allow two users having the same username.

Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
have two concurrent clients C1 and C2. We have replication factor 3 and the
partitioner has determined the primary and the replicas nodes of the INSERT
example are N3, N4, and N5.



The scenario happens in following order:

1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
username, not resister before)

2.  N1 sends PREPARE message with ballot 1 (highest ballot have seen)
to N3, N4 and N5. Note that this prepare for C1 and V1.

3. Now C2 connects to coordinator N2 and sends INSERT  V1.

4. N2 sends PREPARE message with ballot 2 (highest ballot after re-prepare
because first time, N2 does not know about ballot 1, then eventual it
solves and have ballot 2) to N3, N4 and N5. Note that this prepare for C2
and V1.

*5.**N1  sends READ message to N3, N4 and N5 to read V1.*

*6.**N3, N4 and N5 send RESULT message to N1, informing that V1 not
exist which results in N1 will go forward to next round.*

*7.* * N2  sends READ message to N3, N4 and N5 to read V1.*

*8.*   *N3, N4 and N5 send RESULT message to N2, informing that V1 not
exist which results in N2 will go forward to next round.*

9.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

10.  N3, N4 and N5 send ACCEPT message to N1.

11.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

12.  N3, N4 and N5 send ACCEPT message to N2.

13.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

14.   N3, N4 and N5 send ACK message to N1.

15.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

16.  N3, N4 and N5 send ACK message to N2.



As result, both V1 from client C1 and V1 from client C2 have written to
replicas N3, N4, and N5. Which I think it does not achieve the goal of
*linearizable
consistency and CAS. *



*Is that true and such scenario could be occurred?*



I look forward to hearing from you.



Regards,


Ibrahim

On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
ibrahimsaba...@gmail.com wrote:

 Thank you lot

 Ibrahim

 On Wed, Aug 26, 2015 at 12:15 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 Yes

 On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 OK. I see what the purpose of acknowledgment round here. So
 acknowledgment is optional here, depend on CL setting as we normally do in
 Cassandra.
 So we can say that acknowledgment is not really related to Paxos phase,
 it depends on CL in Cassandra?

 Ibrahim

 On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne sylv...@datastax.com
  wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does 
 not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit 
 phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).








Re: lightweight transactions with potential problem?

2015-08-27 Thread DuyHai Doan
Step 10 is wrong

N3, N4 and N5 have accepted ballot from N2 which is higher than ballot from
N1 so N3, N4 and N5 should reject request at step 9)

The fact that there is an extra round trip for Read/Result at steps 5) to
8) does not matter and does not interfere with the correctness of the Paxos
Propose/Accept phase


On Thu, Aug 27, 2015 at 12:08 PM, ibrahim El-sanosi 
ibrahimsaba...@gmail.com wrote:





 Hi Sylvain and all folks,

 I have another scenario in my mind where *linearizable consistency (CAS,
 Compare-and-Set) *can fail as we *the following round-trips:*

 *1.*  *Prepare/promise*

 *2.*  *Read/result*

 *3.*  *Propose/accept*

 4.  *Commit/acknowledgment *

 Assume we have an application for resistering new account, I want to make
 sure I only allow exactly one user to claim a given account. For example,
 we do not allow two users having the same username.

 Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.



 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
 username, not resister before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have seen)
 to N3, N4 and N5. Note that this prepare for C1 and V1.

 3. Now C2 connects to coordinator N2 and sends INSERT  V1.

 4. N2 sends PREPARE message with ballot 2 (highest ballot after re-prepare
 because first time, N2 does not know about ballot 1, then eventual it
 solves and have ballot 2) to N3, N4 and N5. Note that this prepare for C2
 and V1.

 *5.**N1  sends READ message to N3, N4 and N5 to read V1.*

 *6.**N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.*

 *7.* * N2  sends READ message to N3, N4 and N5 to read V1.*

 *8.*   *N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.*

 9.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 10.  N3, N4 and N5 send ACCEPT message to N1.

 11.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 12.  N3, N4 and N5 send ACCEPT message to N2.

 13.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 14.   N3, N4 and N5 send ACK message to N1.

 15.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 16.  N3, N4 and N5 send ACK message to N2.



 As result, both V1 from client C1 and V1 from client C2 have written to
 replicas N3, N4, and N5. Which I think it does not achieve the goal of 
 *linearizable
 consistency and CAS. *



 *Is that true and such scenario could be occurred?*



 I look forward to hearing from you.



 Regards,


 Ibrahim

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Thank you lot

 Ibrahim

 On Wed, Aug 26, 2015 at 12:15 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 Yes

 On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 OK. I see what the purpose of acknowledgment round here. So
 acknowledgment is optional here, depend on CL setting as we normally do in
 Cassandra.
 So we can say that acknowledgment is not really related to Paxos phase,
 it depends on CL in Cassandra?

 Ibrahim

 On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne 
 sylv...@datastax.com wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does 
 not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit 
 phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal 
 consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).









Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
 By the way, I do not understand why in lightweight transactions in
 Cassandra has round-trip commit/acknowledgment?

 For me, I think we can commit the value within phase propose/accept. Do
 you agree? If not agree can you explain why we need commit/acknowledgment?


No, value cannot be committed during the propose/accept phase, because
nothing should be committed *before* the coordinator of the round has
received a QUORUM of accepts. In other words, you have to wait on the
result of the propose/accept to know if you can commit or not. This is not
at all specific to Cassandra btw: you'll find this in most Paxos
presentation, generally named as the learning phase.

 --
Sylvain


Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
Yes, Sylvain, your answer makes more sense. The phase is in Paxos protocol
sometimes called learning or decide phase, BUT this phase does not have
acknowledgment round, just learning or decide message from the proposer to
learners. So why we need acknowledgment round with commit phase in
lightweight transactions?


Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos protocol
 sometimes called learning or decide phase, BUT this phase does not have
 acknowledgment round, just learning or decide message from the proposer to
 learners. So why we need acknowledgment round with commit phase in
 lightweight transactions?


It's not _needed_ as far as Paxos is concerned. But it's useful in the
context of Cassandra. The commit phase is about actually persisting to
replica the update decided by the Paxos algorithm and thus making that
update visible to non paxos reads. Being able to apply normal consistencies
to this phase is thus useful, since it allows user to get visibility
guarantees even for non-paxos reads if they so wish, and that's exactly
what we do and why we optionally wait on acknowledgments (and I say
optionally because how many acks we wait on depends on the user provided
consistency level and if that's CL.ANY then the whole Paxos operation
actually return without waiting on any of those acks).


Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
Thank you lot

Ibrahim

On Wed, Aug 26, 2015 at 12:15 PM, Sylvain Lebresne sylv...@datastax.com
wrote:

 Yes

 On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 OK. I see what the purpose of acknowledgment round here. So
 acknowledgment is optional here, depend on CL setting as we normally do in
 Cassandra.
 So we can say that acknowledgment is not really related to Paxos phase,
 it depends on CL in Cassandra?

 Ibrahim

 On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).







Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
Yes

On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi ibrahimsaba...@gmail.com
 wrote:

 OK. I see what the purpose of acknowledgment round here. So
 acknowledgment is optional here, depend on CL setting as we normally do in
 Cassandra.
 So we can say that acknowledgment is not really related to Paxos phase, it
 depends on CL in Cassandra?

 Ibrahim

 On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).






Re: lightweight transactions with potential problem?

2015-08-26 Thread ibrahim El-sanosi
OK. I see what the purpose of acknowledgment round here. So
acknowledgment is optional here, depend on CL setting as we normally do in
Cassandra.
So we can say that acknowledgment is not really related to Paxos phase, it
depends on CL in Cassandra?

Ibrahim

On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne sylv...@datastax.com
wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).





Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
What an excellent explanation!!, thank you a lot.

By the way, I do not understand why in lightweight transactions in
Cassandra has round-trip commit/acknowledgment?

For me, I think we can commit the value within phase propose/accept. Do you
agree? If not agree can you explain why we need commit/acknowledgment?



Regards,



ibrahim


Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne


 So you meant that the older ballot will not only reject in round-trip1
 (prepare/promise), it also can be reject in propose/accept round-trips2, Is
 that correct?


Yes.



 You Said : Or more precisely, you got step 8 wrong: when a replica
 PROMISE, the promise is not that they won't promise a ballot older than
 2,it's that they won't accept a ballot older than 2

 Why step 8 wrong? I think replicas can accept any highest ballot, so
 ballot 2 is the highest in step 8? what do you think?
  Do you also mean replica can promise older ballot.


I shouldn't have said wrong. What I meant is that your description of
what a PROMISE meant was incomplete. It's true that in practice replicas
won't promise older ballots, but it's not the important property in this
case, the important property is that they also promise to not accept any
older ballot.



 I wish you could make it more clear.

 Thank you a lot Sylvain

 Ibrahim


 On Tue, Aug 25, 2015 at 1:40 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 That scenario cannot happen. More specifically, your step 12 cannot
 happen if
 step 8 has happen. Or more precisely, you got step 8 wrong: when a replica
 PROMISE, the promise is not that they won't promise a ballot older than
 2,
 it's that they won't accept a ballot older than 2. Therefore, after
 step 8,
 the accept from N1 will be reject in step 12 and the insert from N1 will
 be
 rejected (that is, N1 will restart the whole algorithm with a new ballot).


 On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Hi folks,


 Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
 using Paxos 4 round-trips as following*

 *1.  **Prepare/promise*

 *2.  **Read/result*

 *3.  **Propose/accept*

 *4.  **Commit/acknowledgment *

 Assume we have an application for resistering new account, I want to
 make sure I only allow exactly one user to claim a given account. For
 example, we do not allow two users having the same username.

 Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.


 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1
 is username, not resister before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have
 seen) to N3, N4 and N5. Note that this prepare for C1 and V1.

 3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any
 with older than ballot 1.

 4.N1  sends READ message to N3, N4 and N5 to read V1.

 5.N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.

 6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

 7.  N2 sends PREPARE message with ballot 2 (highest ballot after
 re-prepare because first time, N2 does not know about ballot 1, then
 eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
 prepare for C2 and V1.

 8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any
 with older than ballot 2.

 9.  N2  sends READ message to N3, N4 and N5 to read V1.

 10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.

 11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 12.  N3, N4 and N5 send ACCEPT message to N1.

 13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 14.  N3, N4 and N5 send ACCEPT message to N2.

 15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 16.   N3, N4 and N5 send ACK message to N1.

 17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 18.  N3, N4 and N5 send ACK message to N2.


 As result, both V1 from client C1 and V1 from client C2 have written to
 replicas N3, N4, and N5. Which I think it does not achieve the goal of 
 *linearizable
 consistency and CAS. *



 *Is that true and such scenario could be occurred?*



 I look forward to hearing from you.


 Regards,






Re: lightweight transactions with potential problem?

2015-08-25 Thread DuyHai Doan
The rationale of the last commit/ack phase is to set the chosen value (here
the mutation) in a durable storage (here into Cassandra) and reset this
value to allow another round of Paxos.

More explanation in this blog post:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

For a detailed explanation of different Paxos phases, look at those slides:
http://www.slideshare.net/doanduyhai/distributed-algorithms-for-big-data-geecon/53


On Tue, Aug 25, 2015 at 6:07 PM, ibrahim El-sanosi ibrahimsaba...@gmail.com
 wrote:





 What an excellent explanation!!, thank you a lot.

 By the way, I do not understand why in lightweight transactions in
 Cassandra has round-trip commit/acknowledgment?

 For me, I think we can commit the value within phase propose/accept. Do
 you agree? If not agree can you explain why we need commit/acknowledgment?



 Regards,



 ibrahim



lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
Hi folks,


Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
using Paxos 4 round-trips as following*

*1.  **Prepare/promise*

*2.  **Read/result*

*3.  **Propose/accept*

*4.  **Commit/acknowledgment *

Assume we have an application for resistering new account, I want to make
sure I only allow exactly one user to claim a given account. For example,
we do not allow two users having the same username.

Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
have two concurrent clients C1 and C2. We have replication factor 3 and the
partitioner has determined the primary and the replicas nodes of the INSERT
example are N3, N4, and N5.


The scenario happens in following order:

1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
username, not resister before)

2.  N1 sends PREPARE message with ballot 1 (highest ballot have seen)
to N3, N4 and N5. Note that this prepare for C1 and V1.

3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any with
older than ballot 1.

4.N1  sends READ message to N3, N4 and N5 to read V1.

5.N3, N4 and N5 send RESULT message to N1, informing that V1 not exist
which results in N1 will go forward to next round.

6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

7.  N2 sends PREPARE message with ballot 2 (highest ballot after
re-prepare because first time, N2 does not know about ballot 1, then
eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
prepare for C2 and V1.

8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any with
older than ballot 2.

9.  N2  sends READ message to N3, N4 and N5 to read V1.

10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not exist
which results in N2 will go forward to next round.

11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

12.  N3, N4 and N5 send ACCEPT message to N1.

13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

14.  N3, N4 and N5 send ACCEPT message to N2.

15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

16.   N3, N4 and N5 send ACK message to N1.

17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

18.  N3, N4 and N5 send ACK message to N2.


As result, both V1 from client C1 and V1 from client C2 have written to
replicas N3, N4, and N5. Which I think it does not achieve the goal of
*linearizable
consistency and CAS. *



*Is that true and such scenario could be occurred?*



I look forward to hearing from you.


Regards,


Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne
That scenario cannot happen. More specifically, your step 12 cannot happen
if
step 8 has happen. Or more precisely, you got step 8 wrong: when a replica
PROMISE, the promise is not that they won't promise a ballot older than 2,
it's that they won't accept a ballot older than 2. Therefore, after step
8,
the accept from N1 will be reject in step 12 and the insert from N1 will be
rejected (that is, N1 will restart the whole algorithm with a new ballot).


On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi ibrahimsaba...@gmail.com
 wrote:

 Hi folks,


 Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
 using Paxos 4 round-trips as following*

 *1.  **Prepare/promise*

 *2.  **Read/result*

 *3.  **Propose/accept*

 *4.  **Commit/acknowledgment *

 Assume we have an application for resistering new account, I want to make
 sure I only allow exactly one user to claim a given account. For example,
 we do not allow two users having the same username.

 Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.


 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
 username, not resister before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have seen)
 to N3, N4 and N5. Note that this prepare for C1 and V1.

 3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any
 with older than ballot 1.

 4.N1  sends READ message to N3, N4 and N5 to read V1.

 5.N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.

 6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

 7.  N2 sends PREPARE message with ballot 2 (highest ballot after
 re-prepare because first time, N2 does not know about ballot 1, then
 eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
 prepare for C2 and V1.

 8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any
 with older than ballot 2.

 9.  N2  sends READ message to N3, N4 and N5 to read V1.

 10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.

 11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 12.  N3, N4 and N5 send ACCEPT message to N1.

 13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 14.  N3, N4 and N5 send ACCEPT message to N2.

 15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 16.   N3, N4 and N5 send ACK message to N1.

 17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 18.  N3, N4 and N5 send ACK message to N2.


 As result, both V1 from client C1 and V1 from client C2 have written to
 replicas N3, N4, and N5. Which I think it does not achieve the goal of 
 *linearizable
 consistency and CAS. *



 *Is that true and such scenario could be occurred?*



 I look forward to hearing from you.


 Regards,



Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
OK, I see.

So you meant that the older ballot will not only reject in round-trip1
(prepare/promise), it also can be reject in propose/accept round-trips2, Is
that correct?

You Said : Or more precisely, you got step 8 wrong: when a replica PROMISE,
the promise is not that they won't promise a ballot older than 2,it's
that they won't accept a ballot older than 2

Why step 8 wrong? I think replicas can accept any highest ballot, so ballot
2 is the highest in step 8? what do you think?
 Do you also mean replica can promise older ballot.

I wish you could make it more clear.

Thank you a lot Sylvain

Ibrahim


On Tue, Aug 25, 2015 at 1:40 PM, Sylvain Lebresne sylv...@datastax.com
wrote:

 That scenario cannot happen. More specifically, your step 12 cannot happen
 if
 step 8 has happen. Or more precisely, you got step 8 wrong: when a replica
 PROMISE, the promise is not that they won't promise a ballot older than
 2,
 it's that they won't accept a ballot older than 2. Therefore, after step
 8,
 the accept from N1 will be reject in step 12 and the insert from N1 will be
 rejected (that is, N1 will restart the whole algorithm with a new ballot).


 On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Hi folks,


 Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
 using Paxos 4 round-trips as following*

 *1.  **Prepare/promise*

 *2.  **Read/result*

 *3.  **Propose/accept*

 *4.  **Commit/acknowledgment *

 Assume we have an application for resistering new account, I want to make
 sure I only allow exactly one user to claim a given account. For example,
 we do not allow two users having the same username.

 Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.


 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
 username, not resister before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have
 seen) to N3, N4 and N5. Note that this prepare for C1 and V1.

 3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any
 with older than ballot 1.

 4.N1  sends READ message to N3, N4 and N5 to read V1.

 5.N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.

 6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

 7.  N2 sends PREPARE message with ballot 2 (highest ballot after
 re-prepare because first time, N2 does not know about ballot 1, then
 eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
 prepare for C2 and V1.

 8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any
 with older than ballot 2.

 9.  N2  sends READ message to N3, N4 and N5 to read V1.

 10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.

 11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 12.  N3, N4 and N5 send ACCEPT message to N1.

 13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 14.  N3, N4 and N5 send ACCEPT message to N2.

 15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 16.   N3, N4 and N5 send ACK message to N1.

 17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 18.  N3, N4 and N5 send ACK message to N2.


 As result, both V1 from client C1 and V1 from client C2 have written to
 replicas N3, N4, and N5. Which I think it does not achieve the goal of 
 *linearizable
 consistency and CAS. *



 *Is that true and such scenario could be occurred?*



 I look forward to hearing from you.


 Regards,