Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-27 Thread Benjamin Lerer
Just wanted to mention that I am super eager to see your proposal
implemented, Benedict. Thanks for pushing this forward :-)

On Wed, 25 Aug 2021 at 10:29, bened...@apache.org wrote:

> I’ll move this to a vote in a day or so, assuming no further discussion.
>
> From: Jeff Jirsa 
> Date: Monday, 23 August 2021 at 06:46
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
>
>
> > On Aug 22, 2021, at 7:25 PM, Miles Garnsey 
> wrote:
> >
> > 
> >>
> >> The problem is that today there’s no way to reliably exclude the new DC
> from serving reads, that I know of? If you can, then yes you would only
> need to ensure repair were run prior to activating reads from this DC.
> >
> > We think we have a way to do this using certain settings in the Java
> driver.
> >
> > Agree on your other points!
>
> I don’t see how
>
> Your best chance is with snitch games
>
> And those don’t guarantee correctness if a single replica GC pauses and
> forces a speculative retry
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-25 Thread bened...@apache.org
I’ll move this to a vote in a day or so, assuming no further discussion.

From: Jeff Jirsa 
Date: Monday, 23 August 2021 at 06:46
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements


> On Aug 22, 2021, at 7:25 PM, Miles Garnsey  wrote:
>
> 
>>
>> The problem is that today there’s no way to reliably exclude the new DC from 
>> serving reads, that I know of? If you can, then yes you would only need to 
>> ensure repair were run prior to activating reads from this DC.
>
> We think we have a way to do this using certain settings in the Java driver.
>
> Agree on your other points!

I don’t see how

Your best chance is with snitch games

And those don’t guarantee correctness if a single replica GC pauses and forces 
a speculative retry




Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-25 Thread Miles Garnsey
Jeff, Benedict, we’ve taken your thoughts on this on board and will probably 
plan to repair in between scaling. The point about speculative retry did get me 
thinking. Many thanks for the tip and the explanation!


> On 23 Aug 2021, at 3:46 pm, Jeff Jirsa  wrote:
> 
> 
> 
>> On Aug 22, 2021, at 7:25 PM, Miles Garnsey  
>> wrote:
>> 
>> 
>>> 
>>> The problem is that today there’s no way to reliably exclude the new DC 
>>> from serving reads, that I know of? If you can, then yes you would only 
>>> need to ensure repair were run prior to activating reads from this DC.
>> 
>> We think we have a way to do this using certain settings in the Java driver.
>> 
>> Agree on your other points!
> 
> I don’t see how
> 
> Your best chance is with snitch games
> 
> And those don’t guarantee correctness if a single replica GC pauses and 
> forces a speculative retry
> 
> 
> 



Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-22 Thread Jeff Jirsa



> On Aug 22, 2021, at 7:25 PM, Miles Garnsey  wrote:
> 
> 
>> 
>> The problem is that today there’s no way to reliably exclude the new DC from 
>> serving reads, that I know of? If you can, then yes you would only need to 
>> ensure repair were run prior to activating reads from this DC.
> 
> We think we have a way to do this using certain settings in the Java driver.
> 
> Agree on your other points!

I don’t see how

Your best chance is with snitch games

And those don’t guarantee correctness if a single replica GC pauses and forces 
a speculative retry





Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-22 Thread Miles Garnsey
> The problem is that today there’s no way to reliably exclude the new DC from 
> serving reads, that I know of? If you can, then yes you would only need to 
> ensure repair were run prior to activating reads from this DC.

We think we have a way to do this using certain settings in the Java driver.

Agree on your other points!



> On 20 Aug 2021, at 10:02 pm, bened...@apache.org wrote:
> 
>> My initial testing suggested it was not required (when the new DC is not 
>> serving reads).
> 
> The problem is that today there’s no way to reliably exclude the new DC from 
> serving reads, that I know of? If you can, then yes you would only need to 
> ensure repair were run prior to activating reads from this DC.
> 
>> Perhaps the CL mechanism could be pluggable
> 
> I think this is unlikely, particularly as we start to consider things like 
> consensus - at least any time soon. Quorums are quite intricately woven into 
> any implementation, and it would be quite hard to fully generalise them. In 
> practice we can probably accommodate any simple vote threshold quorums  
> (those where some electorate each have a vote, and each vote has an equal 
> weight that reaches consensus once a threshold is crossed) and support at 
> least one level of nesting (so that DCs may logically vote as a block based 
> on some quorum within a DC) in any topology without a plugin system, and I 
> suspect this will be more than enough for any system in the foreseeable 
> future.
> 
>> I wonder if it should be a ‘default CL’ which can additionally be overridden 
>> by queries?
> 
> There are some practicalities that probably prohibit us from eliminating user 
> provided CLs, but I would like to see them phased out as far as possible as 
> they are very hard to verify. To support this flexibility more generally I’d 
> prefer to see tables offer potentially multiple consensus schemes with 
> potentially different qualities (that can perhaps even be named by the user) 
> for these cases, such as (for instance) fast-and-inconsistent-reads. This 
> still permits their properties to be vetted by the database while offering 
> flexibility to the user, and for them to declare at the operator level what 
> meeting this concept requires. It also means the database can maintain these 
> properties through any topology change.
> 
> But we’ll probably have people using legacy CLs for another decade, so we’re 
> going to have to support people querying with those CLs, but we might want to 
> encourage people to disable them on their clusters and migrate to safer 
> setups.
> 
> From: Miles Garnsey 
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing the 
> forthcoming proposals in relation to schema change safety when LWTs are in 
> use.
> 
> We have been following almost the scale-by-one workaround you described - I 
> am grateful for the additional validation. The only divergence is that we 
> have not been advising a repair in between each node addition. My initial 
> testing suggested it was not required (when the new DC is not serving reads). 
> But if you are aware of issues that arise at scale then I’d love to hear your 
> experience, as we are still in the planning phase for that project.
> 
> Regarding CLs (off topic)
> 
>> To respond to Mick: we could introduce an EACH_SERIAL which would permit 
>> this to be done in one go. This isn’t a super complicated piece of work, and 
>> I’d be happy to help review a contribution here. However, in my view we 
>> should be reconsidering how quorums are decided more comprehensively. This 
>> is very off-topic, but there are other more sensible quorums for 
>> multi-region setups (such as quorum-of-quorums), but also there’s a wide 
>> range of useful quorums we don’t support, particularly heterogenous ones 
>> supporting lower write failure tolerance than read failure tolerance (for 
>> instance). Today we support only the most extreme versions of this, and all 
>> of our quorums must be mixed manually by clients which is error prone. In my 
>> opinion we should be moving towards specifying quorums on a per-table basis 
>> for reads and writes, so that clients do not specify their consistency 
>> levels. This way the database can configure arbitrary quorums, and also 
>> guarantee that these quorums provide the desired consistency.
> 
> I agree with your points here. I’d add that the geographical location of DCs 
> can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z 
> both are) so that we can experiment in this are

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread Joshua McKenzie
>
> In my opinion we should be moving towards specifying quorums on a
> per-table basis for reads and writes, so that clients do not specify their
> consistency levels.

This stood out to me: I'm a strong +1 on this. The less clients have to
know about their powerful and complex distributed database while still
gaining its benefits, the better.

~Josh

On Fri, Aug 20, 2021 at 8:41 AM bened...@apache.org 
wrote:

> > My initial testing suggested it was not required (when the new DC is not
> serving reads).
>
> The problem is that today there’s no way to reliably exclude the new DC
> from serving reads, that I know of? If you can, then yes you would only
> need to ensure repair were run prior to activating reads from this DC.
>
> > Perhaps the CL mechanism could be pluggable
>
> I think this is unlikely, particularly as we start to consider things like
> consensus - at least any time soon. Quorums are quite intricately woven
> into any implementation, and it would be quite hard to fully generalise
> them. In practice we can probably accommodate any simple vote threshold
> quorums  (those where some electorate each have a vote, and each vote has
> an equal weight that reaches consensus once a threshold is crossed) and
> support at least one level of nesting (so that DCs may logically vote as a
> block based on some quorum within a DC) in any topology without a plugin
> system, and I suspect this will be more than enough for any system in the
> foreseeable future.
>
> > I wonder if it should be a ‘default CL’ which can additionally be
> overridden by queries?
>
> There are some practicalities that probably prohibit us from eliminating
> user provided CLs, but I would like to see them phased out as far as
> possible as they are very hard to verify. To support this flexibility more
> generally I’d prefer to see tables offer potentially multiple consensus
> schemes with potentially different qualities (that can perhaps even be
> named by the user) for these cases, such as (for instance)
> fast-and-inconsistent-reads. This still permits their properties to be
> vetted by the database while offering flexibility to the user, and for them
> to declare at the operator level what meeting this concept requires. It
> also means the database can maintain these properties through any topology
> change.
>
> But we’ll probably have people using legacy CLs for another decade, so
> we’re going to have to support people querying with those CLs, but we might
> want to encourage people to disable them on their clusters and migrate to
> safer setups.
>
> From: Miles Garnsey 
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing
> the forthcoming proposals in relation to schema change safety when LWTs are
> in use.
>
> We have been following almost the scale-by-one workaround you described -
> I am grateful for the additional validation. The only divergence is that we
> have not been advising a repair in between each node addition. My initial
> testing suggested it was not required (when the new DC is not serving
> reads). But if you are aware of issues that arise at scale then I’d love to
> hear your experience, as we are still in the planning phase for that
> project.
>
> Regarding CLs (off topic)
>
> > To respond to Mick: we could introduce an EACH_SERIAL which would permit
> this to be done in one go. This isn’t a super complicated piece of work,
> and I’d be happy to help review a contribution here. However, in my view we
> should be reconsidering how quorums are decided more comprehensively. This
> is very off-topic, but there are other more sensible quorums for
> multi-region setups (such as quorum-of-quorums), but also there’s a wide
> range of useful quorums we don’t support, particularly heterogenous ones
> supporting lower write failure tolerance than read failure tolerance (for
> instance). Today we support only the most extreme versions of this, and all
> of our quorums must be mixed manually by clients which is error prone. In
> my opinion we should be moving towards specifying quorums on a per-table
> basis for reads and writes, so that clients do not specify their
> consistency levels. This way the database can configure arbitrary quorums,
> and also guarantee that these quorums provide the desired consistency.
>
> I agree with your points here. I’d add that the geographical location of
> DCs can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z
> both are) so that we can experiment in this area at higher velocity? (I
> appreciate this is an invasive change.)
> A colleague a

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread bened...@apache.org
> My initial testing suggested it was not required (when the new DC is not 
> serving reads).

The problem is that today there’s no way to reliably exclude the new DC from 
serving reads, that I know of? If you can, then yes you would only need to 
ensure repair were run prior to activating reads from this DC.

> Perhaps the CL mechanism could be pluggable

I think this is unlikely, particularly as we start to consider things like 
consensus - at least any time soon. Quorums are quite intricately woven into 
any implementation, and it would be quite hard to fully generalise them. In 
practice we can probably accommodate any simple vote threshold quorums  (those 
where some electorate each have a vote, and each vote has an equal weight that 
reaches consensus once a threshold is crossed) and support at least one level 
of nesting (so that DCs may logically vote as a block based on some quorum 
within a DC) in any topology without a plugin system, and I suspect this will 
be more than enough for any system in the foreseeable future.
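
As a rough illustration of the "simple vote threshold" quorums with one level of nesting described above, here is a minimal Python sketch. It is not Cassandra code: the function names, the three-DC topology, and the choice of a simple majority as the threshold are all assumptions made for the example.

```python
from typing import Dict, Set


def threshold_met(votes: int, electorate: int) -> bool:
    """Simple majority threshold: strictly more than half of the electorate."""
    return votes > electorate // 2


def quorum_of_quorums(dc_members: Dict[str, Set[str]], acks: Set[str]) -> bool:
    """One level of nesting: a DC votes as a block once a majority of its own
    replicas have acknowledged, and the operation succeeds once a majority of
    DC blocks have voted."""
    dc_votes = sum(
        1
        for members in dc_members.values()
        if threshold_met(len(members & acks), len(members))
    )
    return threshold_met(dc_votes, len(dc_members))


# Hypothetical topology: three DCs with three replicas each.
topology = {
    "DC1": {"a1", "a2", "a3"},
    "DC2": {"b1", "b2", "b3"},
    "DC3": {"c1", "c2", "c3"},
}

# DC1 and DC2 each reach an internal majority, so two of three blocks vote.
print(quorum_of_quorums(topology, {"a1", "a2", "b2", "b3"}))  # True
# Five acks overall, but only DC1 reaches its own majority.
print(quorum_of_quorums(topology, {"a1", "a2", "a3", "b1", "c1"}))  # False
```

The second call shows why the nested form differs from a flat count: more replicas acknowledged, yet only one DC block voted.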

> I wonder if it should be a ‘default CL’ which can additionally be overridden 
> by queries?

There are some practicalities that probably prohibit us from eliminating user 
provided CLs, but I would like to see them phased out as far as possible as 
they are very hard to verify. To support this flexibility more generally I’d 
prefer to see tables offer potentially multiple consensus schemes with 
potentially different qualities (that can perhaps even be named by the user) 
for these cases, such as (for instance) fast-and-inconsistent-reads. This still 
permits their properties to be vetted by the database while offering 
flexibility to the user, and for them to declare at the operator level what 
meeting this concept requires. It also means the database can maintain these 
properties through any topology change.

But we’ll probably have people using legacy CLs for another decade, so we’re 
going to have to support people querying with those CLs, but we might want to 
encourage people to disable them on their clusters and migrate to safer setups.

From: Miles Garnsey 
Date: Friday, 20 August 2021 at 12:51
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
Many thanks for this detailed response Benedict. I look forward to seeing the 
forthcoming proposals in relation to schema change safety when LWTs are in use.

We have been following almost the scale-by-one workaround you described - I am 
grateful for the additional validation. The only divergence is that we have not 
been advising a repair in between each node addition. My initial testing 
suggested it was not required (when the new DC is not serving reads). But if you 
are aware of issues that arise at scale then I’d love to hear your experience, 
as we are still in the planning phase for that project.

Regarding CLs (off topic)

> To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
> to be done in one go. This isn’t a super complicated piece of work, and I’d 
> be happy to help review a contribution here. However, in my view we should be 
> reconsidering how quorums are decided more comprehensively. This is very 
> off-topic, but there are other more sensible quorums for multi-region setups 
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums 
> we don’t support, particularly heterogenous ones supporting lower write 
> failure tolerance than read failure tolerance (for instance). Today we 
> support only the most extreme versions of this, and all of our quorums must 
> be mixed manually by clients which is error prone. In my opinion we should be 
> moving towards specifying quorums on a per-table basis for reads and writes, 
> so that clients do not specify their consistency levels. This way the 
> database can configure arbitrary quorums, and also guarantee that these 
> quorums provide the desired consistency.

I agree with your points here. I’d add that the geographical location of DCs 
can be relevant.
Perhaps the CL mechanism could be pluggable (in the same way that authn/z both 
are) so that we can experiment in this area at higher velocity? (I appreciate 
this is an invasive change.)
A colleague and I are considering whether we might be able to look at the 
EACH_QUORUM idea in the shorter term. We will share more if we have the 
bandwidth to undertake the work.
I also agree that CLs defined for tables is a worthy enhancement; I wonder if 
it should be a ‘default CL’ which can additionally be overridden by queries?

In any event I feel I’ve hijacked your thread enough, but thank you again for 
the warm welcome and the interesting discussion!

> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
>
> Hello and welcome!
>
> So this is a really complicated topic, unfortunately, but the simple answer 
> is that as currently formulated this work won’t addre

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread Miles Garnsey
Many thanks for this detailed response Benedict. I look forward to seeing the 
forthcoming proposals in relation to schema change safety when LWTs are in use.

We have been following almost the scale-by-one workaround you described - I am 
grateful for the additional validation. The only divergence is that we have not 
been advising a repair in between each node addition. My initial testing 
suggested it was not required (when the new DC is not serving reads). But if you 
are aware of issues that arise at scale then I’d love to hear your experience, 
as we are still in the planning phase for that project.

Regarding CLs (off topic)

> To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
> to be done in one go. This isn’t a super complicated piece of work, and I’d 
> be happy to help review a contribution here. However, in my view we should be 
> reconsidering how quorums are decided more comprehensively. This is very 
> off-topic, but there are other more sensible quorums for multi-region setups 
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums 
> we don’t support, particularly heterogenous ones supporting lower write 
> failure tolerance than read failure tolerance (for instance). Today we 
> support only the most extreme versions of this, and all of our quorums must 
> be mixed manually by clients which is error prone. In my opinion we should be 
> moving towards specifying quorums on a per-table basis for reads and writes, 
> so that clients do not specify their consistency levels. This way the 
> database can configure arbitrary quorums, and also guarantee that these 
> quorums provide the desired consistency.

I agree with your points here. I’d add that the geographical location of DCs 
can be relevant.
Perhaps the CL mechanism could be pluggable (in the same way that authn/z both 
are) so that we can experiment in this area at higher velocity? (I appreciate 
this is an invasive change.)
A colleague and I are considering whether we might be able to look at the 
EACH_QUORUM idea in the shorter term. We will share more if we have the 
bandwidth to undertake the work.
I also agree that CLs defined for tables is a worthy enhancement; I wonder if 
it should be a ‘default CL’ which can additionally be overridden by queries? 

In any event I feel I’ve hijacked your thread enough, but thank you again for 
the warm welcome and the interesting discussion!

> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
> 
> Hello and welcome!
> 
> So this is a really complicated topic, unfortunately, but the simple answer 
> is that as currently formulated this work won’t address this particular case. 
> The slightly longer answer is that this problem will be a thing of the past 
> soon either way - there’s work incoming to address every possible category of 
> this kind of problem, but it might take a little longer.
> 
> The full answer is that membership of a keyspace in Cassandra is a mess, and 
> is derived from the intersection of two things: schema and gossip. The 
> electorate verification addresses _gossip_ inconsistencies, that is, 
> inconsistencies about what nodes are perceived to be a member of the ring. 
> Schema generates the issue you are discussing here. In particular the lack of 
> any state machine that transitions from one topology to another when a new 
> schema implies a new topology. This is its own distinct problem, that others 
> I work with plan to file a CEP for in the coming weeks or months.
> 
> In the meantime, the correct way to do this (painful though it might be) is 
> to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1 
> and wait for that to settle, *run repair* and then bump to RF=2, etc.
> 
> To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
> to be done in one go. This isn’t a super complicated piece of work, and I’d 
> be happy to help review a contribution here. However, in my view we should be 
> reconsidering how quorums are decided more comprehensively. This is very 
> off-topic, but there are other more sensible quorums for multi-region setups 
> (such as quorum-of-quorums), but also there’s a wide range of useful quorums 
> we don’t support, particularly heterogenous ones supporting lower write 
> failure tolerance than read failure tolerance (for instance). Today we 
> support only the most extreme versions of this, and all of our quorums must 
> be mixed manually by clients which is error prone. In my opinion we should be 
> moving towards specifying quorums on a per-table basis for reads and writes, 
> so that clients do not specify their consistency levels. This way the 
> database can configure arbitrary quorums, and also guarantee that these 
> quorums provide the desired consistency.
> 
> 
> Fro

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread bened...@apache.org
Hello and welcome!

So this is a really complicated topic, unfortunately, but the simple answer is 
that as currently formulated this work won’t address this particular case. The 
slightly longer answer is that this problem will be a thing of the past soon 
either way - there’s work incoming to address every possible category of this 
kind of problem, but it might take a little longer.

The full answer is that membership of a keyspace in Cassandra is a mess, and is 
derived from the intersection of two things: schema and gossip. The electorate 
verification addresses _gossip_ inconsistencies, that is, inconsistencies about 
what nodes are perceived to be a member of the ring. Schema generates the issue 
you are discussing here. In particular the lack of any state machine that 
transitions from one topology to another when a new schema implies a new 
topology. This is its own distinct problem, that others I work with plan to 
file a CEP for in the coming weeks or months.

In the meantime, the correct way to do this (painful though it might be) is to 
add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1 and 
wait for that to settle, *run repair* and then bump to RF=2, etc.
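
The scale-by-one sequence above can be written out as an operator checklist. The following is a minimal Python sketch that only generates the steps as text. The keyspace and DC names are placeholders, and running repair after the final bump as well (rather than only between bumps) is an assumption based on the advice to repair before the new DC serves reads. Which nodes to run repair on, and with which options, is left out deliberately.

```python
from typing import List


def scale_by_one_plan(keyspace: str, existing_dc: str, existing_rf: int,
                      new_dc: str, target_rf: int) -> List[str]:
    """Return the ordered steps for raising new_dc's RF one replica at a time."""
    steps: List[str] = []
    for rf in range(1, target_rf + 1):
        # Bump the new DC's replication factor by exactly one.
        steps.append(
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', "
            f"'{existing_dc}': {existing_rf}, '{new_dc}': {rf}}};"
        )
        # Manual step: no automation is implied here.
        steps.append(f"# wait for {new_dc} to settle at RF={rf}")
        # Repair before the next bump (or before enabling reads).
        steps.append(f"nodetool repair {keyspace}")
    return steps


# Placeholder names: keyspace "ks", existing DC1 at RF=3, new DC2 up to RF=3.
for step in scale_by_one_plan("ks", "DC1", 3, "DC2", 3):
    print(step)
```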

To respond to Mick: we could introduce an EACH_SERIAL which would permit this 
to be done in one go. This isn’t a super complicated piece of work, and I’d be 
happy to help review a contribution here. However, in my view we should be 
reconsidering how quorums are decided more comprehensively. This is very 
off-topic, but there are other more sensible quorums for multi-region setups 
(such as quorum-of-quorums), but also there’s a wide range of useful quorums we 
don’t support, particularly heterogenous ones supporting lower write failure 
tolerance than read failure tolerance (for instance). Today we support only the 
most extreme versions of this, and all of our quorums must be mixed manually by 
clients which is error prone. In my opinion we should be moving towards 
specifying quorums on a per-table basis for reads and writes, so that clients 
do not specify their consistency levels. This way the database can configure 
arbitrary quorums, and also guarantee that these quorums provide the desired 
consistency.


From: Miles Garnsey 
Date: Friday, 20 August 2021 at 00:47
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
Long time listener, first time caller here - hello!

I am very interested in this part "Better safety among range movements: 
Electorate verification during range movements provides a stronger assertion of 
linearizability via assurance of the set of instances voting on a transaction.”

I have seen issues in the wild where people want to add/remove DCs. I think 
that there may be a risk of consistency violations due to transactions 
circumventing the locks held by in-progress transactions. Will electorate 
verification help in the below scenario?
Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
DC2 is added, and once all nodes are in UN the schema is adjusted so that DC2’s 
RF=3.
While the new schema propagates, there is a transitional state, in which some 
potential coordinators have the new schema S2, and others are operating on the 
old schema S1.
In this state, S2 coordinators form consensus from 4/6 nodes, while S1 coordinators form 
consensus from 2/3 nodes.
A query issued from an S1 coordinator can form a valid consensus which will 
circumvent the lock held by an S2 coordinator.
I was thinking of proposing an EACH_QUORUM serial CL, but if electorate 
verification solves the problem then that may be the better solution.

Miles


> On 19 Aug 2021, at 9:18 am, Scott Andreas  wrote:
>
> Benedict, thank you for sharing this CEP!
>
> Adding some notes on why I support this proposal:
>
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
>
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
>
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
>
> - Better safety among range movements: Electorate verification during range 
> movements provides a stron

Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread Mick Semb Wever



On 2021/08/20 07:07:00, Mick Semb Wever  wrote: 
> >  e.g. mixing SERIAL with LOCAL_SERIAL, which is not safe unless you
> > perform a really intricate dance, but we can distinguish this case from
> > real bugs.
> >
> >
> >
> 
> Benedict, possibly off-topic, but are there any plans or thoughts around
> adding EACH_SERIAL ?
> 
> A number of users have enquired about this, having to deal with edge cases
> when changing replication between two DCs, for example when migrating to
> and decommissioning DCs.
> 


Apologies, this repeats Miles' question, which I didn't see until now.





Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-20 Thread Mick Semb Wever
>  e.g. mixing SERIAL with LOCAL_SERIAL, which is not safe unless you
> perform a really intricate dance, but we can distinguish this case from
> real bugs.
>
>
>

Benedict, possibly off-topic, but are there any plans or thoughts around
adding EACH_SERIAL ?

A number of users have enquired about this, having to deal with edge cases
when changing replication between two DCs, for example when migrating to
and decommissioning DCs.



Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread Miles Garnsey
Long time listener, first time caller here - hello! 

I am very interested in this part "Better safety among range movements: 
Electorate verification during range movements provides a stronger assertion of 
linearizability via assurance of the set of instances voting on a transaction.”

I have seen issues in the wild where people want to add/remove DCs. I think 
that there may be a risk of consistency violations due to transactions 
circumventing the locks held by in-progress transactions. Will electorate 
verification help in the below scenario?
Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
DC2 is added, and once all nodes are in UN the schema is adjusted so that DC2’s 
RF=3.
While the new schema propagates, there is a transitional state, in which some 
potential coordinators have the new schema S2, and others are operating on the 
old schema S1.
In this state, S2 coordinators form consensus from 4/6 nodes, while S1 coordinators form 
consensus from 2/3 nodes.
A query issued from an S1 coordinator can form a valid consensus which will 
circumvent the lock held by an S2 coordinator.
I was thinking of proposing an EACH_QUORUM serial CL, but if electorate 
verification solves the problem then that may be the better solution.
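
To make the transitional state concrete, here is a minimal Python sketch that enumerates the quorums each kind of coordinator could form and counts the pairs that share no replica. The node names are made up, and it assumes SERIAL is a plain majority over whichever electorate the coordinator's schema implies, which matches the 2/3 and 4/6 figures above.

```python
from itertools import combinations

# Hypothetical replicas for one partition.
dc1 = ["a1", "a2", "a3"]
dc2 = ["b1", "b2", "b3"]


def majority(n: int) -> int:
    """Smallest number of votes that is strictly more than half of n."""
    return n // 2 + 1


# S1 coordinators only know about DC1; S2 coordinators see both DCs.
s1_electorate = dc1
s2_electorate = dc1 + dc2

s1_quorums = [set(q) for q in combinations(s1_electorate, majority(len(s1_electorate)))]
s2_quorums = [set(q) for q in combinations(s2_electorate, majority(len(s2_electorate)))]

# Pairs of quorums with no replica in common: neither side can see the
# other's promise, so both transactions could proceed.
disjoint = [(q1, q2) for q1 in s1_quorums for q2 in s2_quorums if not (q1 & q2)]

print(len(disjoint))  # 3
for q1, q2 in disjoint:
    print(sorted(q1), sorted(q2))
```

Each disjoint pair is a 2/3 quorum in DC1 alongside a 4/6 quorum made of the remaining DC1 replica plus all of DC2, which is exactly the case where an S1 coordinator would not observe the lock held by an S2 coordinator.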

Miles


> On 19 Aug 2021, at 9:18 am, Scott Andreas  wrote:
> 
> Benedict, thank you for sharing this CEP!
> 
> Adding some notes on why I support this proposal:
> 
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
> 
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
> 
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
> 
> - Better safety among range movements: Electorate verification during range 
> movements provides a stronger assertion of linearizability via assurance of 
> the set of instances voting on a transaction.
> 
> – Scott
> 
> 
> From: bened...@apache.org 
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
> 
> RE: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> 
> I’m proposing this CEP for approval by the project. The goal is to both 
> improve the performance of LWTs and to ensure their correctness across a 
> range of scenarios such as range movements. This work builds upon the Simulator 
> CEP that has been recently adopted, and patches will follow in the coming 
> weeks.
> 
> If you have any concerns or questions please raise them here for discussion.
> 



Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread bened...@apache.org
> Why not throw an exception?

So this is essentially just a reporting mechanism for when an operation 
encounters state that should be impossible – this will have been left behind by 
prior operations, so the damage is already done and there’s no reason to throw 
an exception and fail the current one.

I should also make clear this _isn’t_ a guarantee of spotting violations, but 
it’s quite sensitive and much better than nothing. In a real system the most 
likely cause of this kind of impossible state would be e.g. mixing SERIAL with 
LOCAL_SERIAL, which is not safe unless you perform a really intricate dance, 
but we can distinguish this case from real bugs.

> Also, way to sell the next discussion Benedict :D

:D


From: Patrick McFadin 
Date: Thursday, 19 August 2021 at 21:48
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
I'm curious about this: "We will introduce mechanisms to spot and log
linearizability violations for the user to file as bug reports" Why not
throw an exception? Maybe it's just I don't quite see how this will be
detected. I think this is very interesting though.

Also, way to sell the next discussion Benedict :D

Patrick

On Thu, Aug 19, 2021 at 1:50 AM bened...@apache.org 
wrote:

> Hi Jeremy,
>
> That’s a great question, and the answer is that we shouldn’t compare the
> two as they aren’t in conflict. The goal of this work is only to improve
> the existing Paxos implementation – the characteristics are identical
> besides being faster, so this is a simple and safe upgrade route for users
> in the short to medium term.
>
> Watch this space for a follow up discussion very soon about what we can do
> to modernise transactions in Cassandra more generally, and what this might
> mean for how we perform consensus. A comparative discussion of EPaxos and
> other related work is very well suited to that topic, in my opinion.
>
>
> From: Jeremy Hanna 
> Date: Thursday, 19 August 2021 at 00:58
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> It sounds like a great improvement!
>
> Just for those who had followed the development of ePaxos* that Blake and
> others worked on but was never committed, it would be nice to briefly
> compare the two.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6246
>
> > On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
> >
> > Benedict, thank you for sharing this CEP!
> >
> > Adding some notes on why I support this proposal:
> >
> > - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x
> > on reads is a huge improvement. This latency reduction may be sufficient to
> > allow many users of Cassandra who operate in a single datacenter,
> > availability zone, or region to migrate to a multi-region topology.
> >
> > - The Cluster Simulation work described in CEP-10 provides a toolchain
> > for probabilistically-exhaustive validation and simulation of transactional
> > correctness, allowing assertion of linearizability in the presence of
> > adversarial thread scheduling and message ordering over an unbounded number
> > of simulated clusters and transactions.
> >
> > - Some use cases may see a superlinear increase in LWT performance due
> > to a reduction in contention afforded by fewer message round-trips. E.g.,
> > halving latency shortens the interval during which competing transactions
> > may conflict, reducing contention and improving throughput beyond a level
> > that would be afforded by the latency reduction alone.
> >
> > - Better safety among range movements: Electorate verification during
> > range movements provides a stronger assertion of linearizability via
> > assurance of the set of instances voting on a transaction.
> >
> > – Scott
> >
> > 
> > From: bened...@apache.org 
> > Sent: Wednesday, August 18, 2021 2:31 PM
> > To: dev@cassandra.apache.org
> > Subject: [DISCUSS] CEP 14: Paxos Improvements
> >
> > RE:
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> >
> > I’m proposing this CEP for approval by the project. The goal is to both
> > improve the performance of LWTs and to ensure their correctness across a
> > range of scenarios such as range movements. This work builds upon the Simulator
> > CEP that has been recently adopted, and patches will follow in the coming
> > weeks.
> >
> > If you have any concerns or questions please raise them here for
> > discussion.
> >
>


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread Patrick McFadin
I'm curious about this: "We will introduce mechanisms to spot and log
linearizability violations for the user to file as bug reports" Why not
throw an exception? Maybe it's just I don't quite see how this will be
detected. I think this is very interesting though.

Also, way to sell the next discussion Benedict :D

Patrick

On Thu, Aug 19, 2021 at 1:50 AM bened...@apache.org 
wrote:

> Hi Jeremy,
>
> That’s a great question, and the answer is that we shouldn’t compare the
> two as they aren’t in conflict. The goal of this work is only to improve
> the existing Paxos implementation – the characteristics are identical
> besides being faster, so this is a simple and safe upgrade route for users
> in the short to medium term.
>
> Watch this space for a follow up discussion very soon about what we can do
> to modernise transactions in Cassandra more generally, and what this might
> mean for how we perform consensus. A comparative discussion of EPaxos and
> other related work is very well suited to that topic, in my opinion.
>
>
> From: Jeremy Hanna 
> Date: Thursday, 19 August 2021 at 00:58
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> It sounds like a great improvement!
>
> Just for those who had followed the development of ePaxos* that Blake and
> others worked on but was never committed, it would be nice to briefly
> compare the two.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6246
>
> > On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
> >
> > Benedict, thank you for sharing this CEP!
> >
> > Adding some notes on why I support this proposal:
> >
> > - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x
> > on reads is a huge improvement. This latency reduction may be sufficient to
> > allow many users of Cassandra who operate in a single datacenter,
> > availability zone, or region to migrate to a multi-region topology.
> >
> > - The Cluster Simulation work described in CEP-10 provides a toolchain
> > for probabilistically-exhaustive validation and simulation of transactional
> > correctness, allowing assertion of linearizability in the presence of
> > adversarial thread scheduling and message ordering over an unbounded number
> > of simulated clusters and transactions.
> >
> > - Some use cases may see a superlinear increase in LWT performance due
> > to a reduction in contention afforded by fewer message round-trips. E.g.,
> > halving latency shortens the interval during which competing transactions
> > may conflict, reducing contention and improving throughput beyond a level
> > that would be afforded by the latency reduction alone.
> >
> > - Better safety among range movements: Electorate verification during
> > range movements provides a stronger assertion of linearizability via
> > assurance of the set of instances voting on a transaction.
> >
> > – Scott
> >
> > 
> > From: bened...@apache.org 
> > Sent: Wednesday, August 18, 2021 2:31 PM
> > To: dev@cassandra.apache.org
> > Subject: [DISCUSS] CEP 14: Paxos Improvements
> >
> > RE:
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> >
> > I’m proposing this CEP for approval by the project. The goal is to both
> > improve the performance of LWTs and to ensure their correctness across a
> > range of scenarios such as range movements. This work builds upon the Simulator
> > CEP that has been recently adopted, and patches will follow in the coming
> > weeks.
> >
> > If you have any concerns or questions please raise them here for
> > discussion.
> >
>


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-19 Thread bened...@apache.org
Hi Jeremy,

That’s a great question, and the answer is that we shouldn’t compare the two as 
they aren’t in conflict. The goal of this work is only to improve the existing 
Paxos implementation – the characteristics are identical besides being faster, 
so this is a simple and safe upgrade route for users in the short to medium 
term.

Watch this space for a follow up discussion very soon about what we can do to 
modernise transactions in Cassandra more generally, and what this might mean 
for how we perform consensus. A comparative discussion of EPaxos and other 
related work is very well suited to that topic, in my opinion.


From: Jeremy Hanna 
Date: Thursday, 19 August 2021 at 00:58
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
It sounds like a great improvement!

Just for those who had followed the development of ePaxos* that Blake and 
others worked on but was never committed, it would be nice to briefly compare 
the two.

https://issues.apache.org/jira/browse/CASSANDRA-6246

> On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
>
> Benedict, thank you for sharing this CEP!
>
> Adding some notes on why I support this proposal:
>
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
>
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
>
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
>
> - Better safety among range movements: Electorate verification during range 
> movements provides a stronger assertion of linearizability via assurance of 
> the set of instances voting on a transaction.
>
> – Scott
>
> 
> From: bened...@apache.org 
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
>
> RE: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>
> I’m proposing this CEP for approval by the project. The goal is to both 
> improve the performance of LWTs and to ensure their correctness across a 
> range of scenarios such as range movements. This work builds upon the Simulator 
> CEP that has been recently adopted, and patches will follow in the coming 
> weeks.
>
> If you have any concerns or questions please raise them here for discussion.
>


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-18 Thread Jeremy Hanna
It sounds like a great improvement!

Just for those who had followed the development of ePaxos* that Blake and 
others worked on but was never committed, it would be nice to briefly compare 
the two. 

https://issues.apache.org/jira/browse/CASSANDRA-6246

> On Aug 19, 2021, at 9:18 AM, Scott Andreas  wrote:
> 
> Benedict, thank you for sharing this CEP!
> 
> Adding some notes on why I support this proposal:
> 
> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
> reads is a huge improvement. This latency reduction may be sufficient to 
> allow many users of Cassandra who operate in a single datacenter, 
> availability zone, or region to migrate to a multi-region topology.
> 
> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
> probabilistically-exhaustive validation and simulation of transactional 
> correctness, allowing assertion of linearizability in the presence of 
> adversarial thread scheduling and message ordering over an unbounded number 
> of simulated clusters and transactions.
> 
> - Some use cases may see a superlinear increase in LWT performance due to a 
> reduction in contention afforded by fewer message round-trips. E.g., halving 
> latency shortens the interval during which competing transactions may 
> conflict, reducing contention and improving throughput beyond a level that 
> would be afforded by the latency reduction alone.
> 
> - Better safety among range movements: Electorate verification during range 
> movements provides a stronger assertion of linearizability via assurance of 
> the set of instances voting on a transaction.
> 
> – Scott
> 
> 
> From: bened...@apache.org 
> Sent: Wednesday, August 18, 2021 2:31 PM
> To: dev@cassandra.apache.org
> Subject: [DISCUSS] CEP 14: Paxos Improvements
> 
> RE: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
> 
> I’m proposing this CEP for approval by the project. The goal is to both 
> improve the performance of LWTs and to ensure their correctness across a 
> range of scenarios such as range movements. This work builds upon the Simulator 
> CEP that has been recently adopted, and patches will follow in the coming 
> weeks.
> 
> If you have any concerns or questions please raise them here for discussion.
> 


Re: [DISCUSS] CEP 14: Paxos Improvements

2021-08-18 Thread Scott Andreas
Benedict, thank you for sharing this CEP!

Adding some notes on why I support this proposal:

- Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
reads is a huge improvement. This latency reduction may be sufficient to allow 
many users of Cassandra who operate in a single datacenter, availability zone, 
or region to migrate to a multi-region topology.

- The Cluster Simulation work described in CEP-10 provides a toolchain for 
probabilistically-exhaustive validation and simulation of transactional 
correctness, allowing assertion of linearizability in the presence of 
adversarial thread scheduling and message ordering over an unbounded number of 
simulated clusters and transactions.

- Some use cases may see a superlinear increase in LWT performance due to a 
reduction in contention afforded by fewer message round-trips. E.g., halving 
latency shortens the interval during which competing transactions may conflict, 
reducing contention and improving throughput beyond a level that would be 
afforded by the latency reduction alone.

- Better safety among range movements: Electorate verification during range 
movements provides a stronger assertion of linearizability via assurance of the 
set of instances voting on a transaction.
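
As a rough illustration of the round-trip and contention points above, here is a toy model. It is a sketch under stated assumptions, not a measurement and not taken from the CEP: the 50 ms round-trip time, the arrival rate of competing transactions, and the Poisson arrival assumption are all hypothetical, as are the function names.

```python
import math


def lwt_latency_ms(round_trips, rtt_ms):
    """Latency of one operation if each round trip costs one full RTT."""
    return round_trips * rtt_ms


def uncontended_probability(arrivals_per_s, latency_ms):
    """P(no competing transaction arrives while ours is in flight),
    assuming Poisson arrivals on the same partition (an assumption)."""
    return math.exp(-arrivals_per_s * latency_ms / 1000.0)


def goodput_per_s(arrivals_per_s, latency_ms):
    """Uncontended completions per second for one sequential client."""
    p = uncontended_probability(arrivals_per_s, latency_ms)
    return p / (latency_ms / 1000.0)


rtt_ms = 50.0  # hypothetical cross-region round-trip time
rate = 5.0     # hypothetical competing transactions per second on one partition

before = goodput_per_s(rate, lwt_latency_ms(4, rtt_ms))  # 4 round trips per write
after = goodput_per_s(rate, lwt_latency_ms(2, rtt_ms))   # 2 round trips per write

print("write latency before/after (ms):",
      lwt_latency_ms(4, rtt_ms), lwt_latency_ms(2, rtt_ms))
print("goodput ratio after/before:", after / before)
```

Under these assumptions, halving the round trips halves latency, but the goodput ratio comes out above 2 because the shorter window also makes a conflicting arrival less likely. That is the sense in which the gain can be superlinear; real workloads will of course behave differently.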

– Scott


From: bened...@apache.org 
Sent: Wednesday, August 18, 2021 2:31 PM
To: dev@cassandra.apache.org
Subject: [DISCUSS] CEP 14: Paxos Improvements

RE: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements

I’m proposing this CEP for approval by the project. The goal is to both improve 
the performance of LWTs and to ensure their correctness across a range of 
scenarios such as range movements. This work builds upon the Simulator CEP that 
has been recently adopted, and patches will follow in the coming weeks.

If you have any concerns or questions please raise them here for discussion.




[DISCUSS] CEP 14: Paxos Improvements

2021-08-18 Thread bened...@apache.org
RE: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements

I’m proposing this CEP for approval by the project. The goal is to both improve 
the performance of LWTs and to ensure their correctness across a range of 
scenarios such as range movements. This work builds upon the Simulator CEP that 
has been recently adopted, and patches will follow in the coming weeks.

If you have any concerns or questions please raise them here for discussion.