Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread [email protected]
> valuing community over code

“Community” involves treating others with respect: following the norms of 
conversation by acknowledging and responding to the points and queries of 
others, accepting when you have a minority position, and stepping aside when 
your goals are not clearly in conflict with others.

A negative vote is also expected to be accompanied by an alternative proposal: 
https://www.apache.org/foundation/how-it-works.html#decision-making

During the discussion, despite weeks of exhortation, no alternative proposal 
was made.

I understand that you prefer no conflict, Mick, but consensus rests on these 
norms being followed, and they clearly were not in this case.

I agree with Leif this was a bug in our process. I will be bringing forward a 
proposal to make clear the expectations of people involved in the CEP process, 
so there is greater clarity in future and so that failures to behave in an 
appropriate manner cannot unduly prevent progress.




Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Henrik Ingo
On Fri, Oct 15, 2021 at 5:54 PM Dinesh Joshi 
wrote:

> Thank you for clarifying the terminology. I honestly haven’t heard anybody
> call these interactive transactions. Therefore it is crucial that
> we lay out things systematically so everyone is on the same page. You’re
> talking about bundling several statements into a single SQL transaction
> block.
>
>
Well, it's more complicated than that. Systems like Calvin and VoltDB have
introduced concepts where you can bundle several statements into a single
transaction block, but that block is executed server side, and it's not
possible to have any additional round trips to the client. The term
"interactive transactions" is meant to distinguish the traditional kind from
those. But you're right that I may have invented the term. Historically such
transactions were the norm, so no additional qualifier has been needed.
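To make the server-side style concrete, here is a minimal sketch. Everything in it is hypothetical: `OneShotServer` is an in-process stand-in, not Calvin's or VoltDB's actual API. The client declares the keys it will touch and ships the decision logic with the request, so the whole transaction completes in a single round trip. The client never gets to look at `x` and then decide what to send next.

```python
import threading
from typing import Callable, Dict, Iterable

Rows = Dict[str, int]


class OneShotServer:
    """Hypothetical in-process stand-in for a node that runs whole
    transactions server side, in the style of Calvin or VoltDB."""

    def __init__(self, data: Rows) -> None:
        self._data = dict(data)
        self._lock = threading.Lock()  # runs one transaction at a time
        self.round_trips = 0

    def get(self, key: str) -> int:
        return self._data[key]

    def execute(self, keys: Iterable[str], logic: Callable[[Rows], Rows]) -> Rows:
        """One request carries both the key set and the logic to run."""
        with self._lock:
            self.round_trips += 1
            snapshot = {k: self._data[k] for k in keys}
            writes = logic(snapshot)  # no further client interaction is possible
            self._data.update(writes)
            return writes


def set_y_if_x_is_5(rows: Rows) -> Rows:
    # The `if x == 5` branch from Henrik's interactive example, shipped up front.
    return {"y": 6} if rows["x"] == 5 else {}


server = OneShotServer({"x": 5, "y": 0})
server.execute(["x", "y"], set_y_if_x_is_5)
print(server.get("y"), server.round_trips)  # prints: 6 1
```

Declaring the key set in advance is what lets a deterministic system order and lock the transaction before running it. When the keys are not known up front (the indexed-query case), Calvin-style systems need extra machinery such as OLLP.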

henrik


-- 

Henrik Ingo

+358 40 569 7354


Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Bowen Song
I'm worried that by the time a consensus is reached, the people who
originally proposed the CEP may have long lost their passion for it
and may no longer be willing to contribute.





-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Benjamin Lerer
Reaching consensus is hard but we will get there :-)



Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Mick Semb Wever
>
> I have reviewed CEP-15 and I must say, I'm excited to see its inclusion
> into mainline Cassandra, and I'm disheartened to see what appears to be an
> unsubstantiated veto of the proposal from the committee's leadership.
>


Leif,
the Accord paper and CEP-15 have indeed generated a lot of excitement in the
community.

But please don't misinterpret what vetoes are. Cassandra 4.0 (from RCs) was
vetoed four times before it got released; every veto was important, was in
support of getting 4.0.0 out, and was appreciated by all. No one doubted
that 4.0.0 was about to come out.

The ASF community has a precedent for seeking consensus and valuing
community over code. The latter point is a touchy topic, wide open to
different opinions about what constitutes a healthy and inclusive community
both in the short and long term. In my opinion, rushing people never helps;
bear with us and we will get there, and get there together. And I believe
we will have some valuable retrospectives from current threads to help us
become even better at what we do.

kind regards,
Mick


Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Dinesh Joshi
Thank you for clarifying the terminology. I honestly haven’t heard anybody call
these interactive transactions. Therefore it is crucial that we lay out
things systematically so everyone is on the same page. You’re talking about
bundling several statements into a single SQL transaction block.

Dinesh




RE: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Leif Walsh
Hi all,

I'm not an active member of the c* developer community, but I'm a user of
c* at my day job, and I have a healthy background in distributed storage
systems and consensus protocols (my previous job and university training).

I have reviewed CEP-15 and I must say, I'm excited to see its inclusion
into mainline Cassandra, and I'm disheartened to see what appears to be an
unsubstantiated veto of the proposal from the committee's leadership.

Accord is a solid, well documented, and well tested addition to c* that I
believe will improve the performance of many c* workloads, including those
important to my team, and at a fundamental level it's an advancement in the
state of the art for distributed database technology.

The argument that Accord needs to provide a transactional model and
guarantees for CQL users before it can be included in mainline Cassandra
misses the point. Adding Accord provides real value to c* users today. It
can be extended to provide more SQL-like semantics in the future, which
others have demonstrated, but even if it couldn't, that should not block
non-CQL users from taking advantage of it now.

The Cassandra project home page claims that its values include openness,
empathy, and collaboration. Merging CEP-15 is in line with those values.
Reappearing from years of ignoring the project and wielding one's committee
membership to veto an important and useful addition to the project does
not, to me, seem to align with stepping down considerately.

Jonathan, please reconsider your role and voice in this matter and try to
put c* users' needs first. Committee, please discuss whether it is in your
project's best interest to maintain the collaboration structure in place
that has caused this proposal to end up in the position it is in.

On 2021/10/09 16:54:10 Jonathan Ellis wrote:
> Hi all,
>
> After calling several times for a broader discussion of goals and
> tradeoffs around transaction management in the CEP-15 thread, I’ve put
> together a short analysis to kick that off.
>
> Here is a table that summarizes the state of the art for distributed
> transactions that offer serializability, i.e., a superset of what you can
> get with LWT. (The most interesting option that this eliminates is RAMP.)
> Since I'm not sure how this will render outside gmail, I've also uploaded
> it here: https://imgur.com/a/SCZ8jex
>
> The columns are Spanner, Cockroach, Calvin/Fauna and SLOG (see below).
>
> Write latency
>   - Spanner: Global Paxos, plus 2pc for multi-partition. For
>     intercontinental replication this is 100+ms. Cloud Spanner does not
>     allow truly global deployments for this reason.
>   - Cockroach: Single-region Paxos, plus 2pc. I’m not very clear on how
>     this works but it results in non-strict serializability. I didn’t find
>     actual numbers for CR other than “2ms in a single AZ” which is not a
>     typical scenario.
>   - Calvin/Fauna: Global Raft. Fauna posts actual numbers of ~70ms in
>     production which I assume corresponds to a multi-region deployment
>     with all regions in the USA. SLOG paper says true global Calvin is
>     200+ms.
>   - SLOG: Single-region Paxos (common case) with fallback to multi-region
>     Paxos. Under 10ms.
>
> Scalability bottlenecks
>   - Spanner: Locks held during cross-region replication
>   - Cockroach: Same as Spanner
>   - Calvin/Fauna: OLLP approach required when PKs are not known in
>     advance (mostly for indexed queries) -- results in retries under
>     contention
>   - SLOG: Same as Calvin
>
> Read latency at serial consistency
>   - Spanner: Timestamp from Paxos leader (may be cross-region), then read
>     from local replica.
>   - Cockroach: Same as Spanner, I think
>   - Calvin/Fauna: Same as writes
>   - SLOG: Same as writes
>
> Maximum serializability flavor
>   - Spanner: Strict
>   - Cockroach: Un-strict
>   - Calvin/Fauna: Strict
>   - SLOG: Strict
>
> Support for other isolation levels?
>   - Spanner: Snapshot
>   - Cockroach: No
>   - Calvin/Fauna: Snapshot (in Fauna)
>   - SLOG: Paper mentions dropping from strict-serializable to only
>     serializable. Probably could also support Snapshot like Fauna.
>
> Interactive transaction support (req’d for SQL)
>   - Spanner: Yes
>   - Cockroach: Yes
>   - Calvin/Fauna: No
>   - SLOG: No
>
> Potential for grafting onto C*
>   - Spanner: Nightmare
>   - Cockroach: Nightmare
>   - Calvin/Fauna: Reasonable, Calvin is relatively simple and the storage
>     assumptions it makes are minimal
>   - SLOG: I haven’t thought about this enough. SLOG may require versioned
>     storage, e.g. see this comment
>     <http://dbmsmusings.blogspot.com/2019/10/introducing-slog-cheating-low-latency.html?showComment=1570497003296#c5976719429355924873>.
>
> (I have not included Accord here because it’s not sufficiently clear to me
> how to create a full transaction manager from the Accord protocol, so I
> can’t analyze many of the properties such a system would have. The most
> obvious solution would be “Calvin but with Accord instead of Raft”, but
> since Accord already does some Calvin-like things that seems like it would
> result in some suboptimal redundancy.)
>
> After putting the above together it seems to me that the two main areas of
> tradeoff are,
>
> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use LOCAL_SERIAL. While all of the above have more
> efficient designs than LWT, it’s still true that global serialization will
> require 100+ms in the general case due to physical transmission latency.
> So
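The "100+ms" figure for global serialization follows from propagation delay alone. Here is a quick sketch of that arithmetic. It assumes light in optical fibre covers roughly 200 km per millisecond, and it uses a hypothetical 10,000 km path chosen for illustration. Real routes are longer than great-circle distances and add processing time, so these numbers are lower bounds.

```python
# Assumption: light in optical fibre travels at roughly 200,000 km/s
# (about two thirds of its speed in a vacuum), i.e. ~200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float, round_trips: int = 1) -> float:
    """Lower bound for `round_trips` request/response exchanges over a path
    of `distance_km`; ignores routing detours, queuing and processing."""
    one_way_ms = distance_km / FIBRE_KM_PER_MS
    return 2 * one_way_ms * round_trips


print(min_round_trip_ms(10_000))  # prints: 100.0
```

Any protocol that must hear from a quorum spread across such distances pays at least this much per round trip. That is why keeping the common case within a single region matters so much for latency.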

Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Henrik Ingo
On Fri, Oct 15, 2021 at 3:37 AM Dinesh Joshi 
wrote:

> On 10/14/21 6:54 AM, Jonathan Ellis wrote:
>
> > I think I've also been clear that I want a path to supporting (1) local
> > latencies (SLOG is a more elegant solution but "let's just let people
> give
> > up global serializability like LWT" is also reasonable) and (2) SQL with
> > interactive transactions.
>
>
> 99% of the transactions in a system will not be performed as interactive
> SQL transactions by a human. We should be optimizing for the 99%.
>
>
"Interactive" here does not mean that it's a human typing the queries. It
rather means that there is more than one round trip between the client
and server.

Any application doing:

    BEGIN
    x = SELECT x FROM ...
    if x == 5:
        UPDATE t SET y=6
    COMMIT

...would be an interactive transaction. And this is traditionally the
common case, even if recent NewSQL and NoSQL databases have introduced some
intriguing outside of the box thinking in this area.
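Here is a runnable version of the example above. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a remote SQL server. The `CountingClient` wrapper is a hypothetical helper that treats each statement as one network round trip. The key point is that the `UPDATE` is only sent after the client has seen the result of the `SELECT`, and that is what makes the transaction interactive.

```python
import sqlite3


class CountingClient:
    """Wraps a connection and counts each statement as one round trip,
    as a naive client talking to a remote server would incur."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.round_trips = 0

    def send(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.round_trips += 1
        return self.conn.execute(sql, params)


# isolation_level=None lets us issue BEGIN and COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
conn.execute("INSERT INTO t (id, x, y) VALUES (1, 5, 0)")

client = CountingClient(conn)
client.send("BEGIN")
(x,) = client.send("SELECT x FROM t WHERE id = ?", (1,)).fetchone()
if x == 5:  # decided on the client, between two statements
    client.send("UPDATE t SET y = 6 WHERE id = ?", (1,))
client.send("COMMIT")

y = conn.execute("SELECT y FROM t WHERE id = 1").fetchone()[0]
print(y, client.round_trips)  # prints: 6 4
```

Because the branch runs on the client, the server cannot know the transaction's full read/write set until the client has finished talking to it. This is why deterministic designs such as Calvin do not support this style directly.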

henrik



Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Jordan West
Hi All,

First off, thank you for the very interesting technical discussions on this
topic. It's been great to see some back and forth on it. I haven't been
involved mainly because my research on this topic is relatively stale. I
did however want to chime in to encourage us to step back and take a look
at the topic of whether SQL support is the direction we want to be going
with Cassandra. For some context, I now work on and operate both Cassandra
and CockroachDB at a relatively large scale. In this case, CockroachDB is
not positioned as a potential replacement for Cassandra but as an
additional choice to meet different needs. Meeting those needs necessitates
different tradeoffs. Tradeoffs that have concrete impacts on how the
database performs, how production support works, how the user can break the
database, and what can be accomplished successfully by the user. When I
look at what my users need from Cassandra, it's not to have a competing
solution to CockroachDB -- a solution that exists and is becoming more and
more production proven every day. They do however need things like
scalable, consistent secondary indexing -- a feature I envision Accord
could unlock with its multi-partition CAS/transactions -- or better
performing single-partition LWTs -- ones that take significantly fewer round
trips and work over the WAN. I would encourage those pushing for SQL
support to consider that and to start a discussion first with the community
on whether SQL support is the direction we should be heading in the best
interest of the project.
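The point about round trips translates directly into latency. Below is a back-of-the-envelope sketch. The four round trips used for LWT are the figure commonly cited for Cassandra's Paxos-based lightweight transactions (prepare, read, propose, commit). The Accord counts are the one-or-two round trips claimed for it elsewhere in this thread. The per-round-trip times are assumed values, not measurements.

```python
# Assumed round-trip times in milliseconds; illustrative, not measured.
RTT_MS = {"same region": 2.0, "cross region": 100.0}

# Round trips per operation: LWT as commonly cited for Cassandra's Paxos
# (prepare, read, propose, commit); Accord as claimed in this thread
# (one round trip, or two under contention and clock skew).
ROUND_TRIPS = {"LWT": 4, "Accord fast path": 1, "Accord slow path": 2}


def estimate_ms(protocol: str, placement: str) -> float:
    """Network-only estimate: round trips times round-trip time."""
    return ROUND_TRIPS[protocol] * RTT_MS[placement]


for placement in RTT_MS:
    for protocol in ROUND_TRIPS:
        print(f"{protocol:<17} {placement:<12} {estimate_ms(protocol, placement):6.1f} ms")
```

Over a WAN the round-trip count dominates. With these assumed numbers, a cross-region LWT costs four times what a fast-path Accord transaction does.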

The technical understanding I do have of both Accord and CockroachDB leads
me to believe that holding up CEP-15 for that decision, regardless of
whether we decide SQL support is the direction to go or not, is not
necessary. It was stated earlier in the thread, I believe, that if Accord
provides similar or better guarantees than Raft, then a similar distributed
transaction protocol can be built on top of it to support interactive SQL.
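The argument here is that any protocol providing Raft-like agreed ordering can host an interactive transaction layer. Below is a minimal sketch under that assumption. It swaps in optimistic concurrency control (read versions are validated at commit), which is not the read/write intent design used by CockroachDB or YugaByte. `Store` and `InteractiveTxn` are hypothetical names, and the "consensus log" is just a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Versions = Dict[str, int]
Writes = Dict[str, int]


@dataclass
class Store:
    """Versioned key-value state built by applying an ordered commit log.

    `log` is a plain list standing in for whatever consensus protocol agrees
    the order of commit records. In a replicated system each replica would
    run `_apply` over the same agreed log; here there is one in-process copy.
    """
    values: Dict[str, int] = field(default_factory=dict)
    versions: Versions = field(default_factory=dict)
    log: List[Tuple[Versions, Writes]] = field(default_factory=list)

    def read(self, key: str) -> Tuple[int, int]:
        return self.values.get(key, 0), self.versions.get(key, 0)

    def submit(self, read_set: Versions, writes: Writes) -> bool:
        self.log.append((read_set, writes))  # stand-in for agreeing its position
        return self._apply(read_set, writes)

    def _apply(self, read_set: Versions, writes: Writes) -> bool:
        # Optimistic validation: abort if any key read has changed since.
        if any(self.versions.get(k, 0) != v for k, v in read_set.items()):
            return False
        for k, val in writes.items():
            self.values[k] = val
            self.versions[k] = self.versions.get(k, 0) + 1
        return True


class InteractiveTxn:
    """Reads go to the store one at a time (interactive round trips);
    writes are buffered on the client until commit."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_set: Versions = {}
        self.writes: Writes = {}

    def get(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)  # remember the first version seen
        return value

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        return self.store.submit(self.read_set, self.writes)


store = Store()
t1, t2 = InteractiveTxn(store), InteractiveTxn(store)
t1.put("balance", t1.get("balance") + 10)
t2.put("balance", t2.get("balance") + 20)  # conflicts with t1
ok1, ok2 = t1.commit(), t2.commit()
print(ok1, ok2, store.values["balance"])  # prints: True False 10
```

Only the ordering of commit records comes from the consensus layer. Everything interactive happens before that point, which is why the choice of ordering protocol and the choice of transaction semantics can be made somewhat independently.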

Jordan


On Tue, Oct 12, 2021 at 8:21 PM Jonathan Ellis  wrote:

> Blake (and Benedict), I’ll ask for your patience here.  We don’t have a
> precedent of pushing through major initiatives in this project in a matter
> of weeks.  We [members of the PMC that weren’t involved in creating Accord]
> need time to do thorough research and make sure both that we understand
> what is being proposed and that we have evaluated reasonable alternatives.
>
> One of the difficulties in evaluating Accord is that it combines a
> state-of-the-art consensus/ordering protocol with a fairly limited
> transaction manager.  So it may be useful to decouple the consensus and
> transaction processing components, which would both allow non-Cassandra
> usage of the consensus piece, and also make explicit the boundaries with
> transaction processing with the consequence of making it easier to evolve
> independently.
>
> In the meantime, it’s very important to me to understand on which
> dimensions the transaction manager can be improved easily, and which
> dimensions resist such improvement.  I get that Accord is your [plural]
> baby and it’s awkward for me to come along and start pointing at its
> limitations, but that’s part of creating a complete understanding of any
> system.
>
> If I keep coming back to the subject of SQL support and interactive
> transactions, that’s because it’s becoming table stakes in the distributed
> database space. People are using Cockroach or Yugabyte or Cloud Spanner for
> use cases where a couple years ago they would have used Cassandra. We can
> expect this trend to continue and strengthen.
>
> On Mon, Oct 11, 2021 at 11:39 PM Blake Eggleston
>  wrote:
>
> > Let’s get back on topic.
> >
> > Jonathan, in your opening email you stated that, in your view, the 2 main
> > areas of tradeoff were:
> >
> > > 1. Is it worth giving up local latencies to get full global consistency?
> >
> > Now we’ve established that we don’t need to give up local latencies with
> > Accord, which leaves:
> >
> > > 2. Is it worth giving up the possibility of SQL support, to get the
> > > benefits of deterministic transaction design?
> >
> > I pointed out that this was a false dilemma and that, in the worst case, a
> > hypothetical SQL feature could have its own consensus system. I hope that
> > won’t be necessary, but as I later pointed out (and you did not address,
> > although maybe I should have phrased it as a question), if we’re going to
> > weigh Accord against a hypothetical SQL feature that lacks design goals, or
> > any clear ideas about how it might be implemented, how can we rule that
> > out?
> >
> > So Jonathan, how can we rule that out? How can we have a productive
> > discussion about a feature you yourself are unable to describe in any
> > meaningful detail?
> >
> > > On Oct 11, 2021, at 6:34 PM, Jonathan Ellis  wrote:
> > >
> > > On Mon, Oct 11, 2021 at 5:11 PM [email protected] <
> [email protected]
> > >
> > > wrote:
> > >
> > >> If we want to full

Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Dinesh Joshi
On 10/14/21 6:54 AM, Jonathan Ellis wrote:

> I think I've also been clear that I want a path to supporting (1) local
> latencies (SLOG is a more elegant solution but "let's just let people give
> up global serializability like LWT" is also reasonable) and (2) SQL with
> interactive transactions.


99% of the transactions in a system will not be performed as interactive
SQL transactions by a human. We should be optimizing for the 99%.

Dinesh




Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Jonathan Ellis
On Thu, Oct 14, 2021 at 4:01 PM [email protected] 
wrote:

> The only TPC-C New Order transaction I recall you linking was interactive,
> which as far as I am aware is not supported by Calvin.
>

The SQLite version I linked was interactive, but it can be implemented
non-interactively, which is what the Calvin team did to benchmark it.
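Here is a heavily simplified, hypothetical sketch of what "implemented non-interactively" means. The transaction body receives a snapshot of pre-declared keys and returns its writes, with no client round trips in between. The key names, the flattened state layout and the omission of most New Order steps (customer, tax, discount, order lines) are assumptions made for illustration. This is not the Calvin team's implementation. One wrinkle is sidestepped deliberately: the new order's id is read from the district row, so orders are kept in a per-district entry whose key is known in advance.

```python
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]
Lines = List[Tuple[int, int]]  # (item_id, quantity) pairs


def new_order_keys(w_id: int, d_id: int, lines: Lines) -> List[str]:
    """In this simplified model every key the transaction touches follows
    from its inputs, so the read/write set can be declared up front."""
    keys = [f"district:{w_id}:{d_id}", f"orders:{w_id}:{d_id}"]
    for item_id, _qty in lines:
        keys += [f"item:{item_id}", f"stock:{w_id}:{item_id}"]
    return keys


def new_order(snapshot: State, w_id: int, d_id: int, lines: Lines) -> State:
    """Deterministic body: reads only the declared keys and returns writes."""
    district_key = f"district:{w_id}:{d_id}"
    orders_key = f"orders:{w_id}:{d_id}"
    o_id = snapshot[district_key]  # the district's next order id
    writes: State = {district_key: o_id + 1}
    total = 0
    for item_id, qty in lines:
        stock_key = f"stock:{w_id}:{item_id}"
        remaining = writes.get(stock_key, snapshot[stock_key]) - qty
        # TPC-C style restock: top up by 91 if fewer than 10 would remain.
        writes[stock_key] = remaining if remaining >= 10 else remaining + 91
        total += qty * snapshot[f"item:{item_id}"]
    orders = dict(snapshot[orders_key])  # copy rather than mutate the input
    orders[o_id] = total
    writes[orders_key] = orders
    return writes


db: State = {"district:1:1": 3001, "orders:1:1": {}, "item:7": 250, "stock:1:7": 15}
lines = [(7, 8)]
snapshot = {k: db[k] for k in new_order_keys(1, 1, lines)}
db.update(new_order(snapshot, 1, 1, lines))
print(db["district:1:1"], db["stock:1:7"], db["orders:1:1"])  # prints: 3002 98 {3001: 2000}
```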

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread [email protected]
The only TPC-C New Order transaction I recall you linking was interactive, 
which as far as I am aware is not supported by Calvin.

Are we then settling on Calvin for your preferred system semantics? It does
not support your preferred interactive transactions. To continue this
discussion I must insist you specify your goal criteria, which I have
requested six times already.

Please then specify a Calvin-compatible transaction and an explanation of why 
you believe Accord does not support it. To continue this discussion I must 
insist on concrete problems that you have invested the time to state clearly, 
with reasoned explanations using your present understanding of Accord to 
explain why you believe it does not work. This should ideally reference the 
actual protocol specified in the whitepaper. You should be able to demonstrate 
that you have invested the time to understand the proposal, and the problem 
case you perceive, in some reasonable level of detail.

Since you have asked no clarifying questions about the whitepaper in the past 
six weeks, I can only assume you believe yourself to understand it already, but 
in case any confusion has arisen your detailed explanation of the problem case 
will help me better understand what needs to be stated in response to your 
query.




Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Jonathan Ellis
I already linked a description of the TPC-C New Order transaction, and an
implementation.  This is the most-benchmarked OLTP transaction in the
world.  I look forward to your explanation of how Accord can handle this.

Since your claim is that "[Accord] is equivalent to Calvin," please limit
the discussion to Accord as it is today instead of engaging in
hypotheticals around how "we could enhance Accord with X."

On Thu, Oct 14, 2021 at 12:46 PM [email protected] 
wrote:

> > Calvin supports arbitrarily complex transactions (including dependent
> statements and indexed reads and writes), executed in parallel, with
> locking as necessary to enable that parallelism.
>
> By CAS I mean to include any arbitrary state mapping function for the
> involved keys. This is equivalent to Calvin. The locks for execution are
> isomorphic with any multi-shard distributed consensus protocol that applies
> its operations in the agreed partial order on each replica. If you want to
> continue this thread of discussion, please provide a counter example you
> believe disproves this statement.
>
>
> From: Jonathan Ellis 
> Date: Thursday, 14 October 2021 at 14:55
> To: dev 
> Subject: Re: Tradeoffs for Cassandra transaction management
> Hi Benedict,
>
> I'm not sure how to reconcile your statement that "your request to separate
> consensus from execution is [nonsensical]" with your earlier claims that we
> could build whatever additional transactional semantics we want on top of
> Accord.  The Accord whitepaper specifically separates out the consensus and
> the execution algorithms, but if we can't use the former to create
> execution timestamps for a different transaction manager then it doesn't
> sound as flexible as you're claiming.
>
> To your other points, it looks like the core problem is that you believe
> that "multi-shard CAS semantics" is the same as "Calvin semantics" which is
> not the case.  Calvin supports arbitrarily complex transactions (included
> dependent statements and indexed reads and writes), executed in parallel,
> with locking as necessary to enable that parallelism.
>
> I think I've also been clear that I want a path to supporting (1) local
> latencies (SLOG is a more elegant solution but "let's just let people give
> up global serializability like LWT" is also reasonable) and (2) SQL with
> interactive transactions.
>
> I'd prefer to keep the discussion on the mailing list, thanks.
>
>
> On Wed, Oct 13, 2021 at 3:04 AM [email protected] 
> wrote:
>
> > Jonathan,
> >
> > Your request to separate consensus from execution is about as sensical as
> > asking for this separation in Paxos, or any other distributed consensus
> > protocol. I have made these statements repeatedly, so let me break it down
> > step by step.
> >
> > 1. Accord is an optimal leaderless distributed consensus protocol,
> > offering multi-shard CAS semantics in one round-trip (or two under
> > contention and clock skew).
> > 2. By simple virtue of this property, it already achieves Calvin semantics
> > with no other work. It remains a distributed consensus protocol, and the
> > whitepaper compares to these as peers.
> > 3. To build distributed transactions with more complex semantics, the
> > remaining candidates are the CockroachDB or YugaByte approach. These must
> > utilise a distributed consensus protocol. They do so using Raft today.
> > Accord is as optimal as Raft, therefore, Accord may be used to implement
> > this technique *without penalty*. Through its multi-shard consensus it has
> > the added advantage of supporting stronger isolation (but not requiring it
> > – a read/write intent design may choose weaker isolation).
> >
> > You continue to refuse to engage with these and other points. Please
> > respond directly to ALL of the below, that I have been asking you to answer
> > now for several weeks.
> >
> > 1. Since Accord supports all of your mooted transaction systems without
> > penalty the conversation about which semantics to pursue may be conducted
> > in parallel with its development. What about this claim do you not yet
> > understand? If you understand, why should a vote on CEP-15 be delayed?
> > 2. Which SPECIFIC transaction semantics do you want to achieve? You are
> > all over the shop today, demanding Cockroach/YugaByte interactive
> > semantics, but also LOCAL_SERIAL operation and proposing SLOG. These are
> > conflicting demands.
> > 3. Why do you think Accord cannot support your preferred semantics?
> > 4. Will you accept a video call so we may discuss

Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread [email protected]
> Calvin supports arbitrarily complex transactions (including dependent 
> statements and indexed reads and writes), executed in parallel, with locking 
> as necessary to enable that parallelism.

By CAS I mean to include any arbitrary state mapping function for the involved 
keys. This is equivalent to Calvin. The locks for execution are isomorphic with 
any multi-shard distributed consensus protocol that applies its operations in 
the agreed partial order on each replica. If you want to continue this thread 
of discussion, please provide a counter example you believe disproves this 
statement.
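Benedict's claim is that "CAS" here covers any deterministic function from the current values of the involved keys to new values, and that replicas applying such functions in the agreed order end up in the same state. A minimal sketch follows. It assumes a total order for simplicity, whereas Accord only needs to order conflicting transactions. `Replica` and `change_email` are hypothetical names. The example borrows Jordan's secondary-index use case: a row and its index entries are updated in one transaction.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Data = Dict[str, str]
StateFn = Callable[[Data], Data]
Txn = Tuple[Sequence[str], StateFn]


class Replica:
    def __init__(self, data: Data) -> None:
        self.data = dict(data)

    def apply(self, txn: Txn) -> None:
        """Run the mapping function over the current values of its keys."""
        keys, fn = txn
        current = {k: self.data.get(k, "") for k in keys}
        self.data.update(fn(current))


def change_email(user: str, old: str, new: str) -> Txn:
    """Hypothetical multi-key CAS: move a user's email and its secondary-index
    entries together, but only if the stored email still equals `old` and
    `new` is not already taken."""
    row, old_idx, new_idx = f"user:{user}:email", f"idx:{old}", f"idx:{new}"

    def fn(cur: Data) -> Data:
        if cur[row] != old or cur[new_idx]:
            return {}  # compare failed: write nothing
        # An empty string marks a removed index entry in this sketch.
        return {row: new, old_idx: "", new_idx: user}

    return (row, old_idx, new_idx), fn


initial = {"user:alice:email": "a@x", "idx:a@x": "alice"}
replicas = [Replica(initial), Replica(initial)]
agreed_order: List[Txn] = [
    change_email("alice", "a@x", "b@x"),
    change_email("alice", "a@x", "c@x"),  # loses the race: email is now b@x
]
for replica in replicas:
    for txn in agreed_order:
        replica.apply(txn)

print(replicas[0].data == replicas[1].data, replicas[0].data["user:alice:email"])  # prints: True b@x
```

Because each function reads only its declared keys and is deterministic, no per-replica coordination beyond the agreed order is needed to reach identical results.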


Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread [email protected]
Hi Jonathan,

This conversation has been circular for some time. I think it is time to 
separate out your reasons for blocking progress on the CEP as part of a vote, 
so that the PMC may express its view on this justification for preventing the 
CEP’s adoption.


Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Jeff Jirsa
Do I read this email as "Jonathan will vote against any improvement to
transactions that doesn't guarantee local latencies and interactive SQL,
even though no such proposal exists, thereby blocking any improvement over
the current status quo?"



On Thu, Oct 14, 2021 at 6:55 AM Jonathan Ellis  wrote:

> I think I've also been clear that I want a path to supporting (1) local
> latencies (SLOG is a more elegant solution but "let's just let people give
> up global serializability like LWT" is also reasonable) and (2) SQL with
> interactive transactions.

Re: Tradeoffs for Cassandra transaction management

2021-10-14 Thread Jonathan Ellis
Hi Benedict,

I'm not sure how to reconcile your statement that "your request to separate
consensus from execution is [nonsensical]" with your earlier claims that we
could build whatever additional transactional semantics we want on top of
Accord.  The Accord whitepaper specifically separates out the consensus and
the execution algorithms, but if we can't use the former to create
execution timestamps for a different transaction manager then it doesn't
sound as flexible as you're claiming.

To your other points, it looks like the core problem is that you believe
that "multi-shard CAS semantics" is the same as "Calvin semantics" which is
not the case.  Calvin supports arbitrarily complex transactions (including
dependent statements and indexed reads and writes), executed in parallel,
with locking as necessary to enable that parallelism.

I think I've also been clear that I want a path to supporting (1) local
latencies (SLOG is a more elegant solution but "let's just let people give
up global serializability like LWT" is also reasonable) and (2) SQL with
interactive transactions.

I'd prefer to keep the discussion on the mailing list, thanks.


On Wed, Oct 13, 2021 at 3:04 AM [email protected] 
wrote:

> Jonathan,
>
> Your request to separate consensus from execution is about as sensical as
> asking for this separation in Paxos, or any other distributed consensus
> protocol. I have made these statements repeatedly, so let me break it down
> step by step.
>
> 1. Accord is an optimal leaderless distributed consensus protocol,
> offering multi-shard CAS semantics in one round-trip (or two under
> contention and clock skew).
> 2. By simple virtue of this property, it already achieves Calvin semantics
> with no other work. It remains a distributed consensus protocol, and the
> whitepaper compares to these as peers.
> 3. To build distributed transactions with more complex semantics, the
> remaining candidates are the CockroachDB or YugaByte approach. These must
> utilise a distributed consensus protocol. They do so using Raft today.
> Accord is as optimal as Raft, therefore, Accord may be used to implement
> this technique *without penalty*. Through its multi-shard consensus it has
> the added advantage of supporting stronger isolation (but not requiring it
> – a read/write intent design may choose weaker isolation).
>
> You continue to refuse to engage with these and other points. Please
> respond directly to ALL of the below, that I have been asking you to answer
> now for several weeks.
>
> 1. Since Accord supports all of your mooted transaction systems without
> penalty the conversation about which semantics to pursue may be conducted
> in parallel with its development. What about this claim do you not yet
> understand? If you understand, why should a vote on CEP-15 be delayed?
> 2. Which SPECIFIC transaction semantics do you want to achieve? You are
> all over the shop today, demanding Cockroach/YugaByte interactive
> semantics, but also LOCAL_SERIAL operation and proposing SLOG. These are
> conflicting demands.
> 3. Why do you think Accord cannot support your preferred semantics?
> 4. Will you accept a video call so we may discuss this with you in detail,
> so we may understand your difficulty understanding these points I keep
> repeating?
>
> After several weeks of back and forth you should already be able to answer
> these questions. If you cannot invest the time to answer them now, I
> perceive this as obstructive and I will escalate this to a PMC vote to
> break the deadlock.
>
>
>
> From: Jonathan Ellis 
> Date: Wednesday, 13 October 2021 at 04:21
> To: dev 
> Subject: Re: Tradeoffs for Cassandra transaction management
> Blake (and Benedict), I’ll ask for your patience here.  We don’t have a
> precedent of pushing through major initiatives in this project in a matter
> of weeks.  We [members of the PMC that weren’t involved in creating Accord]
> need time to do thorough research and make sure both that we understand
> what is being proposed and that we have evaluated reasonable alternatives.
>
> One of the difficulties in evaluating Accord is that it combines a
> state-of-the-art consensus/ordering protocol with a fairly limited
> transaction manager.  So it may be useful to decouple the consensus and
> transaction processing components, which would both allow non-Cassandra
> usage of the consensus piece, and also make explicit the boundaries with
> transaction processing with the consequence of making it easier to evolve
> independently.
>
> In the meantime, it’s very important to me to understand on which
> dimensions the transaction manager can be improved easily, and which
> dimensions resist such improvement.  I get that A

Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Henrik Ingo
Sorry Jonathan, didn't see this reply earlier today.

That would be common behaviour for many MVCC databases, including MongoDB,
MySQL Galera Cluster, PostgreSQL...

https://www.postgresql.org/docs/9.5/transaction-iso.html

*"Applications using this level must be prepared to retry transactions due
to serialization failures."*
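
To illustrate both the retry requirement quoted above and Jonathan's point that only whoever holds the transaction logic can re-run it, here is a generic optimistic-concurrency sketch in Python. `VersionedStore`, `Txn`, `SerializationFailure` and `run_with_retries` are hypothetical names; this is not how PostgreSQL, MongoDB, Galera or Accord are implemented. Reads record the version they observed, commit validates those versions, and on conflict the client re-executes its own logic from scratch.

```python
import threading
from typing import Callable, Dict, Tuple


class SerializationFailure(Exception):
    """Raised at commit when a value the transaction read has since changed."""


class VersionedStore:
    """Hypothetical single-node store: each key maps to (version, value)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}
        self._lock = threading.Lock()

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def commit(self, read_versions: Dict[str, int], writes: Dict[str, int]) -> None:
        with self._lock:
            # Validate: every key read must still be at the version observed.
            for key, version in read_versions.items():
                if self._data.get(key, (0, 0))[0] != version:
                    raise SerializationFailure(key)
            for key, value in writes.items():
                self._data[key] = (self._data.get(key, (0, 0))[0] + 1, value)


class Txn:
    """Buffers writes and records the first version seen for each key read."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.read_versions: Dict[str, int] = {}
        self.writes: Dict[str, int] = {}

    def get(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value


def run_with_retries(store: VersionedStore, logic: Callable[[Txn], None],
                     max_attempts: int = 5) -> int:
    """Re-run the client's own logic until commit validates; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        txn = Txn(store)
        logic(txn)
        try:
            store.commit(txn.read_versions, txn.writes)
            return attempt
        except SerializationFailure:
            continue  # state changed under us: re-execute from scratch
    raise SerializationFailure(f"gave up after {max_attempts} attempts")


store = VersionedStore()
store.commit({}, {"counter": 0})
interfered = []


def increment(txn: Txn) -> None:
    value = txn.get("counter")
    if not interfered:  # simulate one concurrent commit after our read
        interfered.append(True)
        store.commit({}, {"counter": 100})
    txn.put("counter", value + 1)


attempts = run_with_retries(store, increment)
print(attempts, store.read("counter"))  # 2 (3, 101)
```

The conflict is simulated by committing from inside `increment` on its first run, so the first attempt fails validation and the second, re-executed with fresh reads, succeeds.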

On Wed, Oct 13, 2021 at 3:19 AM Jonathan Ellis  wrote:

> Hi Henrik,
>
> I don't see how this resolves the fundamental problem that I outlined to
> start with, namely, that without having the entire logic of the transaction
> available to it, the server cannot retry the transaction when concurrent
> changes are found to have been applied after the reconnaissance reads (what
> you call the conversational phase).
>
> On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo 
> wrote:
>
> > Hi all
> >
> > I was expecting to stay out of the way while a vote on CEP-15 seemed
> > imminent. But discussing this tradeoffs thread with Jonathan, he encouraged
> > me to say these points in my own words, so here we are.
> >
> >
> > On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
> >  wrote:
> >
> > > 1. Is it worth giving up local latencies to get full global consistency?
> > > Most LWT use cases use LOCAL_SERIAL.
> > >
> > > This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> > > that prevents performing consensus in one DC and replicating the writes to
> > > others. That’s not in scope for the initial work, but there’s no reason it
> > > couldn’t be handled as a follow-on if needed. I agree with Jeff that
> > > LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> > > implications, but there are some valid use cases. For instance, you can
> > > enable an OLAP service to operate against another DC without impacting the
> > > primary, assuming the service can tolerate inconsistency for data written
> > > since the last repair, and there are some others.
> > >
> > >
> > Let's start with the stated goal that CEP-15 is intended to be a better
> > version of LWT.
> >
> > Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
> > LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
> > improvement over LWT. I don't agree that Accord will just be so much faster
> > anyway that it would compensate for a single network round trip around the
> > world. Four LWT round trips with LOCAL_SERIAL will still only be on the
> > order of 10 ms, but global latencies for just a single round trip are
> > hundreds of ms.
> >
> > So, my suggestion to resolve this discussion would be that "local quorum
> > latency experience" should be included in CEP-15 to meet its stated goal.
> > If I have understood the CEP process correctly, this merely means that we
> > agree this is a valid and significant use case in the Cassandra ecosystem.
> > It doesn't mean that everything in the CEP must be released in a single v1
> > release. At least personally I don't necessarily need to see a very
> > detailed design for the implementation. But I'm optimistic it would resolve
> > one open discussion if it was codified in the CEP that this is a use case
> > that needs to be addressed.
> >
> >
> > > 2. Is it worth giving up the possibility of SQL support, to get the
> > > benefits of deterministic transaction design?
> > >
> > > This is a false dilemma. Today, we’re proposing a deterministic
> > > transaction design that addresses some very common user pain points. SQL
> > > addresses a different user pain point. If someone wants to add an SQL
> > > implementation in the future they can a) build it on top of Accord, b)
> > > extend or improve Accord, or c) implement a separate system. The right
> > > choice will depend on their goals, but Accord won’t prevent work on it, the
> > > same way the original LWT design isn’t preventing work on multi-partition
> > > transactions. In the worst case, if the goals of a hypothetical SQL project
> > > are different enough to make them incompatible with Accord, I don’t see any
> > > reason why we couldn’t have 2 separate consensus systems, so long as people
> > > are willing to maintain them and the use cases and available technologies
> > > justify it.
> > >
> >
> >
> >
> > The part of the discussion that's hard to deal with is "SQL support",
> > "interactive transactions", or "complex transactions". Even if this is out
> > of scope for CEP-15, it's a valid question to ask whether Accord would
> > possibly help, or at least not prevent, such future work. (The context
> > being, Jonathan and myself both think of this as an important long-term
> > goal. You may have figured this out already!)
> >
> > There are various ways we can get more insight into this question, but
> > realistically writing a complete CEP (or a dozen CEPs) on "full SQL
> > support" isn't one of them. On the other hand it seems CEP-15 itself
> > proposes a conservative approach of developing first version(s) in a

Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Alex Miller
On Wed, Oct 13, 2021 at 3:52 AM Henrik Ingo  wrote:
> Aren't you actually pointing out a limitation in any "single shot"
> transactional algorithm? Including Accord itself, without any interactive
> part?
>
> What you are saying is that an Accord transaction is limited by the need
> for both the client, and coordinator, to be able to keep the entire
> transaction in memory and process it?

I'm under the belief as well that any single-shot transaction protocol
would require some limits on transaction size and/or duration, and
those limits would then be imposed on SQL in a way users coming from a
standard RDBMS (e.g. Postgres) wouldn't expect.  The closest that I've
seen databases get away with is having a distributed layer in the
database that serves as an in-memory lock manager.  Both Spanner and
leanXcale maintain locks in memory in the database while clients
execute transactions, which provides a much higher limit of what one
can do in a transaction, but still presents a degree of complexity to
manage to make sure that clients can't drive servers out of memory.

One could just state that the particular SQL implementation *is*
limited to whatever the constraints of the single-shot transaction
protocol are, and deliver clear documentation of what those limits are
to users, along with being loud about the fact that there are limits.
This has gone okay in other non-SQL systems.  My personal experience
in this subject comes from FoundationDB, which offers a rather
conservative 5 second transaction duration limit and 10MB transaction
size limit.  When presenting a raw key-value API and a database
specifically geared towards supporting OLTP workloads, it works out in
most situations, as users need to write their transactions from
scratch utilizing the database's documentation already.  OLTP is
characterized by short and small transactions, and so things tend to
align anyway.  Some users still tried to implement workloads which
weren't strictly OLTP, and ran into problems.  Offering SQL carries
with it a set of expectations for supported workloads, and I don't
have a concrete example that I can think of for a SQL system with
strict and conservative limits on queries.  My only notes of wisdom
here come from an ex-AWS person I once spoke to, who maintained a
system with partial SQL support, and commented that it was a mistake
due to the support load and customer confusion (but that was more
about a restricted SQL feature set than transaction limitations).
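
The documented-limits approach can be sketched as a guard that rejects a transaction before it exceeds its budget. This is a hypothetical Python sketch, not FoundationDB's API: the 5 second and 10MB figures are the ones quoted above, while `LimitedTxn`, the exception names, and the byte accounting (key plus value bytes, overwrites counted once) are simplifying assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_DURATION_S = 5.0         # duration limit quoted for FoundationDB
MAX_SIZE_BYTES = 10_000_000  # size limit quoted for FoundationDB (10MB)


class TransactionTooOld(Exception):
    pass


class TransactionTooLarge(Exception):
    pass


@dataclass
class LimitedTxn:
    """Hypothetical single-shot transaction buffer that enforces fixed limits."""
    started: float = field(default_factory=time.monotonic)
    writes: Dict[bytes, bytes] = field(default_factory=dict)
    size: int = 0

    def _check_age(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if now - self.started > MAX_DURATION_S:
            raise TransactionTooOld(f"open for {now - self.started:.1f}s")

    def set(self, key: bytes, value: bytes) -> None:
        self._check_age()
        old = self.writes.get(key)
        # Simplified accounting: key + value bytes; an overwrite only adds the
        # difference in value size.
        added = len(value) - len(old) if old is not None else len(key) + len(value)
        if self.size + added > MAX_SIZE_BYTES:
            raise TransactionTooLarge(f"{self.size + added} bytes")
        self.writes[key] = value
        self.size += added

    def commit(self, now: Optional[float] = None) -> Dict[bytes, bytes]:
        self._check_age(now)
        return dict(self.writes)


rejected: List[str] = []
txn = LimitedTxn()
txn.set(b"user/1", b"alice")
try:
    txn.set(b"blob", b"x" * MAX_SIZE_BYTES)
except TransactionTooLarge as err:
    rejected.append(f"too large: {err}")
try:
    txn.commit(now=txn.started + 6.0)  # pretend six seconds have passed
except TransactionTooOld as err:
    rejected.append(f"too old: {err}")
print(rejected)
```

Failing fast with a clear error is the point: users learn the limit from the rejection rather than from a server running out of memory.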

That's not to say that single-shot transaction algorithms aren't
useful, even in the context of SQL.  CockroachDB uses a 3 phase
transaction protocol, which is reduced to only 1 phase when it's a
single partition transaction and Raft may perform the atomic
commitment on its own.  A 1RTT transaction protocol would allow one to
extend that optimized 1 phase protocol to a handful of partitions.
Instead of only supporting 1 phase execution of a point insert, one
could support 1 phase execution of point-ish queries, such as an
insert into a table along with a handful of indexes on that table.  I
think there would still need to be a way to degrade into some other
transaction protocol to support extremely large or long-running
queries, but any single-shot multi-partition transaction protocol
(Accord or otherwise) would likely offer ways to optimize your slow
path transaction protocol.  Maybe it's not really surprising though
that protocols designed for "let me transact my entire database at
once" versus "let me transact a few related keys together" turn out to
be relatively different sorts of protocols...
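
The tiered arrangement described above (one-phase commit for a single partition, a single-shot protocol for a handful of partitions, and a fallback for large or long-running work) could be sketched as a path selector. Every name and threshold here is hypothetical; neither CockroachDB nor Accord exposes such a function, and a real planner would weigh far more than partition count and payload size.

```python
from enum import Enum
from typing import Callable, Hashable, Iterable


class Path(Enum):
    ONE_PHASE = "single partition: the replication protocol commits atomically"
    SINGLE_SHOT = "a handful of partitions: one-round-trip multi-partition commit"
    SLOW_PATH = "large, long-running, or interactive: general protocol"


# Hypothetical thresholds chosen for illustration only.
MAX_SINGLE_SHOT_PARTITIONS = 8
MAX_SINGLE_SHOT_BYTES = 1 << 20


def choose_path(keys: Iterable[str], partition_of: Callable[[str], Hashable],
                payload_bytes: int, interactive: bool) -> Path:
    """Pick the cheapest protocol that can handle the transaction."""
    if interactive or payload_bytes > MAX_SINGLE_SHOT_BYTES:
        return Path.SLOW_PATH
    partitions = {partition_of(k) for k in keys}
    if len(partitions) == 1:
        return Path.ONE_PHASE
    if len(partitions) <= MAX_SINGLE_SHOT_PARTITIONS:
        return Path.SINGLE_SHOT
    return Path.SLOW_PATH


def by_table(key: str) -> str:
    """Hypothetical partitioner: one partition per table or index name."""
    return key.split("/", 1)[0]


point_insert = ["orders/42"]
insert_with_indexes = ["orders/42", "orders_by_customer/7", "orders_by_date/2021-10-13"]
print(choose_path(point_insert, by_table, 200, interactive=False))         # Path.ONE_PHASE
print(choose_path(insert_with_indexes, by_table, 600, interactive=False))  # Path.SINGLE_SHOT
print(choose_path(insert_with_indexes, by_table, 600, interactive=True))   # Path.SLOW_PATH
```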

On Wed, Oct 13, 2021 at 3:52 AM Henrik Ingo  wrote:
> I responded to Blake's similar comment on this topic. Out of respect for
> his request to move the discussion to a newly created thread, I will not
> elaborate here, and will instead just reference my reply to Blake.

Oh!  I missed the new thread.  Thanks!  More transaction processing~~!




Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread [email protected]
I just realised my email client hid a lot of your email, so I must have 
misunderstood your statement: you must have meant per-statement snapshot 
isolation. However, I believe that MVCC is an optimisation for such an 
isolation level, not a requirement – it is possible (as in Calvin, and as in 
distributed consensus protocols) to serialize statement execution instead, 
unless I’m missing something.

Otherwise I agree entirely with your email, now I’ve read it 😊
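
To make the point about serialization concrete, here is a hypothetical Python sketch of a single-version store (no MVCC) in which every statement runs atomically under one lock. Each statement sees a consistent state, yet two statements in the same transaction can observe different committed values: the non-repeatable read that READ COMMITTED permits. `SingleVersionStore` and its methods are illustrative names; the sketch illustrates Benedict's reading only and does not settle the disagreement with Alex about what common implementations do.

```python
import threading
from typing import Dict, List


class SingleVersionStore:
    """Hypothetical store holding one version per key (no MVCC).

    Every statement and every commit takes the same lock, so statements are
    serialized: each one sees a consistent state without keeping old versions.
    """

    def __init__(self, rows: Dict[str, int]) -> None:
        self._rows = dict(rows)
        self._lock = threading.Lock()

    def select(self, keys: List[str]) -> Dict[str, int]:
        # One statement: all keys are read atomically with respect to commits.
        with self._lock:
            return {k: self._rows[k] for k in keys}

    def commit(self, writes: Dict[str, int]) -> None:
        # Writers buffer their changes and publish them only here, so a
        # statement never observes uncommitted data (no dirty reads).
        with self._lock:
            self._rows.update(writes)


store = SingleVersionStore({"a": 50, "b": 50})

first = store.select(["a", "b"])   # T1, statement 1
store.commit({"a": 40, "b": 60})   # T2 moves 10 from a to b and commits
second = store.select(["a", "b"])  # T1, statement 2

print(first, second)  # {'a': 50, 'b': 50} {'a': 40, 'b': 60}
```

Each statement sees a total of 100, so neither observes a half-applied transfer, but T1's two reads of `a` differ. That anomaly is allowed at READ COMMITTED and ruled out from REPEATABLE READ upwards.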


From: [email protected] 
Date: Wednesday, 13 October 2021 at 08:52
To: [email protected] 
Subject: Re: Tradeoffs for Cassandra transaction management
Hi Alex,

I hugely value and respect your input here, but I think in this case you may be 
mistaken.

Postgres[1] makes explicit that subsequent SELECT statements may see different 
data, and SQL Server[2] does the same. I believe the Oracle documents you 
reference do the same, but are more obtuse. They say that read committed is a 
statement-level isolation level, i.e. “read committed isolation level, this 
point is the time at which the statement was opened” though for read only 
transactions they upgrade this to transaction level isolation. Indeed, the ANSI 
SQL document you reference also supports this meaning: “Non-repeatable read” is 
defined only to be used later in a table on page 68 that defines READ COMMITTED 
as permitting these to occur.

Accord offers READ COMMITTED out of the box, essentially (modulo 
read-your-writes).

[1] https://www.postgresql.org/docs/7.2/xact-read-committed.html
[2] https://docs.microsoft.com/en-us/sql/connect/jdbc/understanding-isolation-levels?view=sql-server-ver15


From: Alex Miller 
Date: Wednesday, 13 October 2021 at 08:07
To: [email protected] 
Subject: Re: Tradeoffs for Cassandra transaction management
On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> We define READ COMMITTED as "whatever is returned by Cassandra when
> executing the query (with QUORUM consistency)". In other words, this
> functionality doesn't require any changes to the storage engine or other
> fundamental changes to Cassandra. The Accord commit is guaranteed to
> succeed per design and the READ COMMITTED transaction doesn't add any
> additional checks for conflicts. As such, this functionality remains
> abort-free.
> [snip]
> Future work: A motivation for the above proposal is that the same scheme
> could be extended to support SNAPSHOT ISOLATION transactions. This would
> require MVCC support from the storage engine.

These two pieces together seem to imply that your claim is that Read
Committed may read whatever the most recently committed data is during
the execution of the statement and does not require MVCC.  Though I
agree that the standard[1] is very unclear as to what a "read" means
when defining a non-repeatable read:

>  2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-
>   transaction T2 then modifies or deletes that row and performs
>   a COMMIT. If T1 then attempts to reread the row, it may receive
>   the modified value or discover that the row has been deleted.

The common implementation is that Read Committed reads from a snapshot
of the database state.  The documentation of various database
implementations is much clearer about this.  See, for example,
Oracle[2] or MySQL[3] on the subject.

So I believe Read Committed would also require MVCC support in the
storage engine the same way that Snapshot Isolation would.

[1]: https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
[2]: https://docs.oracle.com/cd/E25054_01/server./e25789/consist.htm, see section "Read Consistency in the Read Committed Isolation Level"
[3]: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html


On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> Approach: The conversational part of the transaction is a sequence of
> regular Cassandra reads and writes. Mutations are however executed as
> read-queries toward the database nodes. Database state isn't modified
> during the conversational phase, rather the primary keys of the
> to-be-mutated rows are stored for later use. Accord is essentially the
> commit phase of the transaction. All primary keys to be updated are the
> write set of  the Accord transaction. There's no need to re-execute the
> reads, so the read set is empty.
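
The conversational approach quoted above can be sketched as a client-side buffer whose only output is a write set for the commit. This is a hypothetical Python sketch: `ConversationalTxn`, `CommitRequest` and the `scan` callback are illustrative names, not Cassandra or Accord APIs, and the sketch deliberately leaves out read-your-writes. Because the read set is empty, nothing re-validates the conversational reads at commit, which is the gap Jonathan raises elsewhere in this thread about concurrent changes after the reconnaissance reads.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional


@dataclass
class CommitRequest:
    """Hypothetical single-shot request handed to the consensus layer at commit."""
    read_set: FrozenSet[str]
    write_set: Dict[str, Optional[str]]  # primary key -> new value (None = delete)


@dataclass
class ConversationalTxn:
    """Hypothetical client-side transaction for the conversational phase."""
    scan: Callable[[], Dict[str, str]]  # a regular read query, e.g. at QUORUM
    pending: Dict[str, Optional[str]] = field(default_factory=dict)

    def select(self, predicate: Callable[[str], bool]) -> Dict[str, str]:
        # Plain reads go straight to the database; buffered writes are not
        # overlaid (read-your-writes is out of scope for this sketch).
        return {pk: v for pk, v in self.scan().items() if predicate(v)}

    def update_where(self, predicate: Callable[[str], bool], value: str) -> None:
        # The mutation runs as a read that only locates primary keys; the
        # database is not modified until commit.
        for pk in self.select(predicate):
            self.pending[pk] = value

    def delete_where(self, predicate: Callable[[str], bool]) -> None:
        for pk in self.select(predicate):
            self.pending[pk] = None

    def to_commit_request(self) -> CommitRequest:
        # Reads are not re-executed at commit, so the read set is empty.
        return CommitRequest(read_set=frozenset(), write_set=dict(self.pending))


rows = {"u1": "active", "u2": "inactive", "u3": "inactive"}
txn = ConversationalTxn(scan=lambda: dict(rows))
txn.update_where(lambda status: status == "inactive", "archived")
request = txn.to_commit_request()
print(request.write_set)  # {'u2': 'archived', 'u3': 'archived'}
print(rows["u2"])         # inactive: nothing has been written yet
```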

As I've pondered this over time, I personally specifically fault
read-your-uncommitted-writes as the reason why NewSQL databases are
essentially a design monoculture.  Every database persists uncommitted
writes in the database itself during execution.  Doing so encourages
those writes to be re-used for concurrency control (i.e. write
intents), and then that places you in the exact client-driven 3PC
protocol which Cockroach, TiDB, and YugaByte all implement.  Even if
you fin

Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Henrik Ingo
Thank you Alex for your feedback, I greatly value these thoughts and always
enjoy learning new details in this space.

On Wed, Oct 13, 2021 at 10:07 AM Alex Miller  wrote:

> These two pieces together seem to imply that your claim is that Read
> Committed may read whatever data was most recently committed during
> the execution of the statement and does not require MVCC.  Though I
> agree that the standard[1] is very unclear as to what a "read" means
> when defining a non-repeatable read:
>

I responded to Blake's similar comment on this topic. Out of respect for
his request to move the discussion to a newly created thread, I will not
elaborate here rather just reference my reply to Blake.

The following observation seems more relevant for Accord itself and the
discussion on trade-offs, so I'll allow myself to continue within this
thread:


>
> On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo 
> wrote:
> > Approach: The conversational part of the transaction is a sequence of
> > regular Cassandra reads and writes. Mutations are however executed as
> > read-queries toward the database nodes. Database state isn't modified
> > during the conversational phase, rather the primary keys of the
> > to-be-mutated rows are stored for later use. Accord is essentially the
> > commit phase of the transaction. All primary keys to be updated are the
> > write set of the Accord transaction. There's no need to re-execute the
> > reads, so the read set is empty.
>
> As I've pondered this over time, I personally specifically fault
> read-your-uncommitted-writes as the reason why NewSQL databases are
> essentially a design monoculture.  Every database persists uncommitted
> writes in the database itself during execution.  Doing so encourages
> those writes to be re-used for concurrency control (i.e. write
> intents), and then that places you in the exact client-driven 3PC
> protocol which Cockroach, TiDB, and YugaByte all implement.  Even if
> you find some radically different database for SQL, like leanXcale, it
> _still_ persists uncommitted writes into the database.
>
>
This is an interesting point of view... I had simply assumed this approach
was inherited from single-server RDBMS implementations, which are built
this way, since NewSQL solutions reuse well-known designs for the database
engine.


> And every time I've thought through this, I tend to agree.  It's
> exceedingly easy to write a SQL query which will exceed any local
> limit imposed by memory, and it's too easy to write a query which runs
> fine in production for a while, until it hits a memory limit and
> begins to immediately fail.  There's a tremendous implementation
> difference between `DELETE FROM Table` and `TRUNCATE Table`, and it's
> relatively hard to explain that to naive users.
>
> Memory constraints aside, merging a local write cache into remote data
> for execution seems like it'd be quite a task.   Any desire for
> efficient distributed query execution would push for a design where
> query fragments can be pushed down to the nodes holding the data.


Reading this I realize...

Aren't you actually pointing out a limitation of any "single shot"
transactional algorithm, including Accord itself without any interactive
part?

What you are saying is that an Accord transaction is limited by the need
for both the client and the coordinator to be able to keep the entire
transaction in memory and process it?

Given where Cassandra is coming from, I'm not particularly alarmed by this
limitation, as I would expect operations on a Cassandra database to be fast
and small, but it's an important limitation to call out for sure. Indeed,
those who have been worried Accord will not be able to serve well all
possible future use cases may have found their first meaningful concrete
example to add to the list?

henrik
-- 

Henrik Ingo

+358 40 569 7354



Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread [email protected]
Jonathan,

Your request to separate consensus from execution is about as sensical as 
asking for this separation in Paxos, or any other distributed consensus 
protocol. I have made these statements repeatedly, so let me break it down step 
by step.

1. Accord is an optimal leaderless distributed consensus protocol, offering 
multi-shard CAS semantics in one round-trip (or two under contention and clock 
skew).
2. By simple virtue of this property, it already achieves Calvin semantics with 
no other work. It remains a distributed consensus protocol, and the whitepaper 
compares to these as peers.
3. To build distributed transactions with more complex semantics, the remaining 
candidates are the CockroachDB or YugaByte approach. These must utilise a 
distributed consensus protocol. They do so using Raft today. Accord is as 
optimal as Raft; therefore, Accord may be used to implement this technique 
*without penalty*. Through its multi-shard consensus it has the added advantage 
of supporting stronger isolation (but not requiring it – a read/write intent 
design may choose weaker isolation).
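To illustrate only the semantics described in point 1 (not the protocol), here is a toy, single-process sketch of a multi-key compare-and-set: every condition on every shard is checked, and either all writes apply or none do. The function and key names are hypothetical; no round trips, replicas, contention, or clock skew are modelled.

```python
def multi_key_cas(store, expected, updates):
    """Apply all of `updates` only if every key in `expected` currently holds
    its expected value; otherwise leave `store` untouched.

    store:    dict mapping (shard, key) -> value, standing in for many shards
    expected: dict of (shard, key) -> value that must currently be present
    updates:  dict of (shard, key) -> new value
    Returns True if the writes were applied.
    """
    if all(store.get(k) == v for k, v in expected.items()):
        store.update(updates)
        return True
    return False


store = {("shard1", "alice"): 100, ("shard2", "bob"): 0}

# Move 30 from alice to bob, conditional on both balances being as read.
ok = multi_key_cas(
    store,
    expected={("shard1", "alice"): 100, ("shard2", "bob"): 0},
    updates={("shard1", "alice"): 70, ("shard2", "bob"): 30},
)

# A second attempt based on the same (now stale) reads fails and changes nothing.
stale = multi_key_cas(
    store,
    expected={("shard1", "alice"): 100, ("shard2", "bob"): 0},
    updates={("shard1", "alice"): 40, ("shard2", "bob"): 60},
)
```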

You continue to refuse to engage with these and other points. Please respond 
directly to ALL of the below, which I have been asking you to answer for 
several weeks now.

1. Since Accord supports all of your mooted transaction systems without penalty 
the conversation about which semantics to pursue may be conducted in parallel 
with its development. What about this claim do you not yet understand? If you 
understand, why should a vote on CEP-15 be delayed?
2. Which SPECIFIC transaction semantics do you want to achieve? You are all 
over the shop today, demanding Cockroach/YugaByte interactive semantics, but 
also LOCAL_SERIAL operation and proposing SLOG. These are conflicting demands.
3. Why do you think Accord cannot support your preferred semantics?
4. Will you accept a video call so we may discuss this with you in detail, and 
so we may understand your difficulty with these points I keep repeating?

After several weeks of back and forth you should already be able to answer 
these questions. If you cannot invest the time to answer them now, I perceive 
this as obstructive and I will escalate this to a PMC vote to break the 
deadlock.



From: Jonathan Ellis 
Date: Wednesday, 13 October 2021 at 04:21
To: dev 
Subject: Re: Tradeoffs for Cassandra transaction management
Blake (and Benedict), I’ll ask for your patience here.  We don’t have a
precedent of pushing through major initiatives in this project in a matter
of weeks.  We [members of the PMC that weren’t involved in creating Accord]
need time to do thorough research and make sure both that we understand
what is being proposed and that we have evaluated reasonable alternatives.

One of the difficulties in evaluating Accord is that it combines a
state-of-the-art consensus/ordering protocol with a fairly limited
transaction manager.  So it may be useful to decouple the consensus and
transaction processing components, which would both allow non-Cassandra
usage of the consensus piece and make explicit the boundary with
transaction processing, making it easier for each to evolve
independently.

In the meantime, it’s very important to me to understand on which
dimensions the transaction manager can be improved easily, and which
dimensions resist such improvement.  I get that Accord is your [plural]
baby and it’s awkward for me to come along and start pointing at its
limitations, but that’s part of creating a complete understanding of any
system.

If I keep coming back to the subject of SQL support and interactive
transactions, that’s because it’s becoming table stakes in the distributed
database space. People are using Cockroach or Yugabyte or Cloud Spanner for
use cases where a couple of years ago they would have used Cassandra. We can
expect this trend to continue and strengthen.

On Mon, Oct 11, 2021 at 11:39 PM Blake Eggleston
 wrote:

> Let’s get back on topic.
>
> Jonathan, in your opening email you stated that, in your view, the 2 main
> areas of tradeoff were:
>
> > 1. Is it worth giving up local latencies to get full global consistency?
>
> Now we’ve established that we don’t need to give up local latencies with
> Accord, which leaves:
>
> > 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> I pointed out that this was a false dilemma and that, in the worst case, a
> hypothetical SQL feature could have its own consensus system. I hope that
> won’t be necessary, but as I later pointed out (and you did not address,
> although maybe I should have phrased it as a question), if we’re going to
> weigh accord against a hypothetical SQL feature that lacks design goals, or
> any clear ideas about how it might be implemented, how can we rule that out?
>
> So Jonathan, how can we rule that out? How can we have

Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread [email protected]
Hi Alex,

I hugely value and respect your input here, but I think in this case you may be 
mistaken.

Postgres[1] makes explicit that subsequent SELECT statements may see different 
data, and SQL Server[2] does the same. I believe the Oracle documents you 
reference do the same, but less directly. They say that read committed is a
statement-level isolation level, i.e. “read committed isolation level, this 
point is the time at which the statement was opened” though for read only 
transactions they upgrade this to transaction level isolation. Indeed, the ANSI 
SQL document you reference also supports this meaning: “Non-repeatable read” is 
defined only to be used later in a table on page 68 that defines READ COMMITTED 
as permitting these to occur.

Accord offers READ COMMITTED out of the box, essentially (modulo 
read-your-writes).

[1] https://www.postgresql.org/docs/7.2/xact-read-committed.html
[2] https://docs.microsoft.com/en-us/sql/connect/jdbc/understanding-isolation-levels?view=sql-server-ver15


From: Alex Miller 
Date: Wednesday, 13 October 2021 at 08:07
To: [email protected] 
Subject: Re: Tradeoffs for Cassandra transaction management
On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> We define READ COMMITTED as "whatever is returned by Cassandra when
> executing the query (with QUORUM consistency)". In other words, this
> functionality doesn't require any changes to the storage engine or other
> fundamental changes to Cassandra. The Accord commit is guaranteed to
> succeed per design and the READ COMMITTED transaction doesn't add any
> additional checks for conflicts. As such, this functionality remains
> abort-free.
> [snip]
> Future work: A motivation for the above proposal is that the same scheme
> could be extended to support SNAPSHOT ISOLATION transactions. This would
> require MVCC support from the storage engine.

These two pieces together seem to imply that your claim is that Read
Committed may read whatever data was most recently committed during
the execution of the statement and does not require MVCC.  Though I
agree that the standard[1] is very unclear as to what a "read" means
when defining a non-repeatable read:

>  2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-
>   transaction T2 then modifies or deletes that row and performs
>   a COMMIT. If T1 then attempts to reread the row, it may receive
>the modified value or discover that the row has been deleted.

The common implementation is that Read Committed reads from a snapshot
of the database state.  The documentation of various database
implementations is much clearer about this.  See, for example,
Oracle[2] or MySQL[3] on the subject.

So I believe Read Committed would also require MVCC support in the
storage engine the same way that Snapshot Isolation would.

[1]: https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
[2]: https://docs.oracle.com/cd/E25054_01/server./e25789/consist.htm (see section "Read Consistency in the Read Committed Isolation Level")
[3]: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html


On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> Approach: The conversational part of the transaction is a sequence of
> regular Cassandra reads and writes. Mutations are however executed as
> read-queries toward the database nodes. Database state isn't modified
> during the conversational phase, rather the primary keys of the
> to-be-mutated rows are stored for later use. Accord is essentially the
> commit phase of the transaction. All primary keys to be updated are the
> write set of the Accord transaction. There's no need to re-execute the
> reads, so the read set is empty.

As I've pondered this over time, I personally specifically fault
read-your-uncommitted-writes as the reason why NewSQL databases are
essentially a design monoculture.  Every database persists uncommitted
writes in the database itself during execution.  Doing so encourages
those writes to be re-used for concurrency control (i.e. write
intents), and then that places you in the exact client-driven 3PC
protocol which Cockroach, TiDB, and YugaByte all implement.  Even if
you find some radically different database for SQL, like leanXcale, it
_still_ persists uncommitted writes into the database.

And every time I've thought through this, I tend to agree.  It's
exceedingly easy to write a SQL query which will exceed any local
limit imposed by memory, and it's too easy to write a query which runs
fine in production for a while, until it hits a memory limit and
begins to immediately fail.  There's a tremendous implementation
difference between `DELETE FROM Table` and `TRUNCATE Table`, and it's
relatively hard to explain that to naive users.

Memory constraints aside, merging a l

Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Alex Miller
On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> We define READ COMMITTED as "whatever is returned by Cassandra when
> executing the query (with QUORUM consistency)". In other words, this
> functionality doesn't require any changes to the storage engine or other
> fundamental changes to Cassandra. The Accord commit is guaranteed to
> succeed per design and the READ COMMITTED transaction doesn't add any
> additional checks for conflicts. As such, this functionality remains
> abort-free.
> [snip]
> Future work: A motivation for the above proposal is that the same scheme
> could be extended to support SNAPSHOT ISOLATION transactions. This would
> require MVCC support from the storage engine.

These two pieces together seem to imply that your claim is that Read
Committed may read whatever data was most recently committed during
the execution of the statement and does not require MVCC.  Though I
agree that the standard[1] is very unclear as to what a "read" means
when defining a non-repeatable read:

>  2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-
>   transaction T2 then modifies or deletes that row and performs
>   a COMMIT. If T1 then attempts to reread the row, it may receive
>the modified value or discover that the row has been deleted.

The common implementation is that Read Committed reads from a snapshot
of the database state.  The documentation of various database
implementations is much clearer about this.  See, for example,
Oracle[2] or MySQL[3] on the subject.

So I believe Read Committed would also require MVCC support in the
storage engine the same way that Snapshot Isolation would.
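The distinction being debated can be sketched with a toy multi-version store (all names are hypothetical; this is not Cassandra's storage engine or an Accord API). With a statement-level snapshot, a commit that lands while a statement is running is invisible to that statement but visible to the next one, which is the non-repeatable read (P2) that READ COMMITTED permits. Reading the latest committed value row by row instead lets a single statement observe part of another transaction's writes, which the snapshot avoids.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store: key -> list of (commit_ts, value), oldest first."""
    versions: dict = field(default_factory=dict)
    clock: int = 0

    def commit(self, writes):
        """Install all of a transaction's writes atomically at one new timestamp."""
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read_at(self, key, ts):
        """Newest value committed at or before ts, or None if there is none."""
        visible = [v for commit_ts, v in self.versions.get(key, []) if commit_ts <= ts]
        return visible[-1] if visible else None


def run_statement(store, keys, snapshot, concurrent_commit=None):
    """Read `keys` in order. If given, `concurrent_commit` is committed by
    another transaction right after the first read. With snapshot=True every
    read uses the timestamp taken when the statement started; otherwise each
    read sees the latest committed value."""
    start_ts = store.clock
    results = []
    for i, key in enumerate(keys):
        if i == 1 and concurrent_commit is not None:
            store.commit(concurrent_commit)
        ts = start_ts if snapshot else store.clock
        results.append(store.read_at(key, ts))
    return results


snap = VersionedStore()
snap.commit({"a": 1, "b": 1})
first = run_statement(snap, ["a", "b"], snapshot=True, concurrent_commit={"a": 2, "b": 2})
second = run_statement(snap, ["a", "b"], snapshot=True)
print(first, second)  # [1, 1] [2, 2]: consistent per statement, not repeatable

latest = VersionedStore()
latest.commit({"a": 1, "b": 1})
mixed = run_statement(latest, ["a", "b"], snapshot=False, concurrent_commit={"a": 2, "b": 2})
print(mixed)  # [1, 2]: one statement sees half of the other commit
```

Whether Cassandra could provide the per-statement read timestamp without MVCC in the storage engine is exactly the open question in this exchange; the sketch simply assumes versions are kept.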

[1]: https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
[2]: https://docs.oracle.com/cd/E25054_01/server./e25789/consist.htm (see section "Read Consistency in the Read Committed Isolation Level")
[3]: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html


On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo  wrote:
> Approach: The conversational part of the transaction is a sequence of
> regular Cassandra reads and writes. Mutations are however executed as
> read-queries toward the database nodes. Database state isn't modified
> during the conversational phase, rather the primary keys of the
> to-be-mutated rows are stored for later use. Accord is essentially the
> commit phase of the transaction. All primary keys to be updated are the
> write set of the Accord transaction. There's no need to re-execute the
> reads, so the read set is empty.

As I've pondered this over time, I personally specifically fault
read-your-uncommitted-writes as the reason why NewSQL databases are
essentially a design monoculture.  Every database persists uncommitted
writes in the database itself during execution.  Doing so encourages
those writes to be re-used for concurrency control (i.e. write
intents), and then that places you in the exact client-driven 3PC
protocol which Cockroach, TiDB, and YugaByte all implement.  Even if
you find some radically different database for SQL, like leanXcale, it
_still_ persists uncommitted writes into the database.

And every time I've thought through this, I tend to agree.  It's
exceedingly easy to write a SQL query which will exceed any local
limit imposed by memory, and it's too easy to write a query which runs
fine in production for a while, until it hits a memory limit and
begins to immediately fail.  There's a tremendous implementation
difference between `DELETE FROM Table` and `TRUNCATE Table`, and it's
relatively hard to explain that to naive users.
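To make the memory concern concrete, here is a hypothetical sketch (none of these names are Cassandra APIs) of a client-side buffer of uncommitted writes, as an interactive transaction without write intents would need. It assumes a row-by-row `DELETE FROM` must buffer one tombstone per matching row, so its footprint grows with the data, whereas a `TRUNCATE` can be recorded as a single table-level operation.

```python
class WriteBufferFull(Exception):
    """Raised when a transaction's pending writes exceed the local limit."""


class TxnWriteBuffer:
    """Hypothetical in-memory buffer of one transaction's uncommitted writes."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.pending = {}              # (table, key) -> new row, or None for a delete
        self.truncated_tables = set()

    def _put(self, table, key, row):
        if (table, key) not in self.pending and len(self.pending) >= self.max_entries:
            raise WriteBufferFull(
                f"{len(self.pending)} pending writes, limit {self.max_entries}")
        self.pending[(table, key)] = row

    def delete_from(self, table, matching_keys):
        """Row-by-row DELETE FROM: one buffered tombstone per matching row."""
        for key in matching_keys:
            self._put(table, key, None)

    def truncate(self, table):
        """TRUNCATE: one table-level marker, regardless of how many rows exist."""
        self.truncated_tables.add(table)


def delete_all(row_count, limit=1_000):
    """Buffer `DELETE FROM orders` against a table holding row_count rows."""
    buf = TxnWriteBuffer(max_entries=limit)
    buf.delete_from("orders", range(row_count))
    return buf


delete_all(500)          # works while the table is small
try:
    delete_all(5_000)    # the same statement once the table has grown
except WriteBufferFull as err:
    print("transaction failed:", err)  # transaction failed: 1000 pending writes, limit 1000
```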

Memory constraints aside, merging a local write cache into remote data
for execution seems like it'd be quite a task.   Any desire for
efficient distributed query execution would push for a design where
query fragments can be pushed down to the nodes holding the data.  I
imagine that one would then need to distribute all writes out to each
partition along with the query fragment for them to execute, so that
they can merge the pending writes in with the existing data, but such
a solution also places a significant overhead burden on the database.
Clients need to resend all potentially relevant writes to servers on
each statement, and servers need to hold all of a client’s writes in
memory as they execute.  The alternative of trying to calculate a
subset of the result affected by the local writes and union it into
the query executed without the local writes feels prohibitively
complex.  Both directions seem fraught with peril.
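A rough sketch of the first option described above, under stated assumptions: the client ships its pending writes along with a pushed-down query fragment, and each node overlays them on its committed rows before applying the predicate. All names are hypothetical, and a real system would also have to deal with ordering, limits, aggregation, and routing each write only to the nodes that need it.

```python
def execute_fragment(stored_rows, pending_writes, predicate):
    """Evaluate a pushed-down filter on one node, as seen by a transaction
    that still has uncommitted writes buffered on the client.

    stored_rows:    dict primary key -> committed row on this node
    pending_writes: dict primary key -> row the transaction wrote, or None
                    if it deleted the row (shipped with every statement)
    predicate:      function row -> bool, i.e. the WHERE clause
    """
    merged = dict(stored_rows)       # never mutate the node's committed data
    for key, row in pending_writes.items():
        if row is None:
            merged.pop(key, None)    # deleted inside the transaction
        else:
            merged[key] = row        # inserted or updated inside the transaction
    return {key: row for key, row in merged.items() if predicate(row)}


node_rows = {1: {"qty": 5}, 2: {"qty": 0}, 3: {"qty": 7}}
txn_writes = {2: {"qty": 9}, 3: None, 4: {"qty": 1}}
print(execute_fragment(node_rows, txn_writes, lambda row: row["qty"] > 0))
# {1: {'qty': 5}, 2: {'qty': 9}, 4: {'qty': 1}}
```

Note that `txn_writes` has to accompany every statement, and the node holds it in memory while executing, which is the overhead pointed out above.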

But the write intent approach makes this conveniently easy, as any
server that has a row of data, also has all the uncommitted rows from
currently in progress transactions, and thus can easily filter to the
correct row as part of its MVCC implementation.  Faced with these two
options, I understand why the world has decided that write intents are
a great solution, but the monoculture does make me a bit sad.


On Tue, Oct 12, 2021 

Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Jonathan Ellis
Blake (and Benedict), I’ll ask for your patience here.  We don’t have a
precedent of pushing through major initiatives in this project in a matter
of weeks.  We [members of the PMC that weren’t involved in creating Accord]
need time to do thorough research and make sure both that we understand
what is being proposed and that we have evaluated reasonable alternatives.

One of the difficulties in evaluating Accord is that it combines a
state-of-the-art consensus/ordering protocol with a fairly limited
transaction manager.  So it may be useful to decouple the consensus and
transaction processing components, which would both allow non-Cassandra
usage of the consensus piece and make explicit the boundary with
transaction processing, making it easier for each to evolve
independently.

In the meantime, it’s very important to me to understand on which
dimensions the transaction manager can be improved easily, and which
dimensions resist such improvement.  I get that Accord is your [plural]
baby and it’s awkward for me to come along and start pointing at its
limitations, but that’s part of creating a complete understanding of any
system.

If I keep coming back to the subject of SQL support and interactive
transactions, that’s because it’s becoming table stakes in the distributed
database space. People are using Cockroach or Yugabyte or Cloud Spanner for
use cases where a couple of years ago they would have used Cassandra. We can
expect this trend to continue and strengthen.

On Mon, Oct 11, 2021 at 11:39 PM Blake Eggleston
 wrote:

> Let’s get back on topic.
>
> Jonathan, in your opening email you stated that, in your view, the 2 main
> areas of tradeoff were:
>
> > 1. Is it worth giving up local latencies to get full global consistency?
>
> Now we’ve established that we don’t need to give up local latencies with
> Accord, which leaves:
>
> > 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> I pointed out that this was a false dilemma and that, in the worst case, a
> hypothetical SQL feature could have its own consensus system. I hope that
> won’t be necessary, but as I later pointed out (and you did not address,
> although maybe I should have phrased it as a question), if we’re going to
> weigh accord against a hypothetical SQL feature that lacks design goals, or
> any clear ideas about how it might be implemented, how can we rule that out?
>
> So Jonathan, how can we rule that out? How can we have a productive
> discussion about a feature you yourself are unable to describe in any
> meaningful detail?
>
> > On Oct 11, 2021, at 6:34 PM, Jonathan Ellis  wrote:
> >
> > On Mon, Oct 11, 2021 at 5:11 PM [email protected]
> > wrote:
> >
> >> If we want to fully unpack this particular point, as far as I can tell
> >> claiming ANSI SQL would indeed require interactive transactions in which
> >> arbitrary conditional work may be performed by a client within a
> >> transaction in response to other actions within that transaction.
> >>
> >> However:
> >>
> >>  1.  The ANSI SQL standard permits these transactions to fail and
> >> roll back (e.g. in the event that your optimistic transaction fails). So if
> >> you want to be pedantic, you may modify my statement to “SQL does not
> >> necessitate support for abort-free interactive transactions” and we can
> >> leave it there.
> >>
> >>  2.  I would personally consider “SQL support” to include the capability
> >> of defining arbitrary SQL stored procedures that may be executed by
> >> clients in an interactive session
> >
> >
> > I note your personal preference and I further note that this is not the
> > common understanding of "SQL support" in the industry.  If you tell 100
> > developers that your database supports SQL, then at least 99 of them are
> > going to assume that you can work with APIs like JDBC that expose
> > interactive transactions as a central feature, and hence that you will be
> > reasonably compatible with the vast array of SQL-based applications out
> > there.
> >
> > Historical side note: VoltDB tried to convince people that stored
> > procedures were good enough.  It didn't work, and VoltDB had to add
> > interactive transactions as fast as they could.
> >
> >>  3.  Most importantly, as I pointed out in the previous email, Accord is
> >> compatible with a YugaByte/Cockroach-like approach, and indeed both makes
> >> this approach easier to accomplish and enables stronger isolation than
> >> the equivalent Raft-based approach. These approaches are able to reduce
> >> the number of conflicts, at a cost of significantly higher transaction
> >> management burden.
> >>
> >
> > If you're saying that you could use Accord instead of Raft or Paxos, and
> > layer 2PC on top of that as in Spanner, then I agree, but I don't think
> > that is a very good design, as you would no longer get any of the
> > benefits of the deterministic approach you started with.  If you m

Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Jonathan Ellis
Hi Henrik,

I don't see how this resolves the fundamental problem that I outlined to
start with, namely, that without having the entire logic of the transaction
available to it, the server cannot retry the transaction when concurrent
changes are found to have been applied after the reconnaissance reads (what
you call the conversational phase).
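A toy illustration of the retry problem, assuming a simplified model of the proposal in which reads happen during the conversational phase and the commit carries only a write set (hypothetical names; this does not model Accord itself). Because the read set is empty, the commit cannot notice that the value the client read was changed by another transaction, and since the logic that computed the new value lives in the client, the server has nothing it could re-execute. Whether this outcome is acceptable depends on the isolation level promised; the point here is only that it goes undetected.

```python
class BlindCommitStore:
    """Committed state only; a commit applies its write set unconditionally."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data[key]

    def commit(self, write_set):
        # Empty read set: there is nothing to validate, so this never aborts.
        self.data.update(write_set)


def conversational_increment(store, key, amount):
    """Conversational phase: read, compute on the client, buffer the write.
    Only the resulting write set is handed to the commit."""
    current = store.read(key)
    return {key: current + amount}


store = BlindCommitStore({"balance": 100})
t1 = conversational_increment(store, "balance", 10)  # T1 reads 100
t2 = conversational_increment(store, "balance", 20)  # T2 also reads 100
store.commit(t1)
store.commit(t2)               # overwrites T1's result without noticing
print(store.read("balance"))   # 120, not 130
```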

On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo 
wrote:

> Hi all
>
> I was expecting to stay out of the way while a vote on CEP-15 seemed
> imminent. But when I discussed this tradeoffs thread with Jonathan, he
> encouraged me to say these points in my own words, so here we are.
>
>
> On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
>  wrote:
>
> > 1. Is it worth giving up local latencies to get full global consistency?
> > Most LWT use cases use LOCAL_SERIAL.
> >
> > This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> > that prevents performing consensus in one DC and replicating the writes to
> > others. That’s not in scope for the initial work, but there’s no reason it
> > couldn’t be handled as a follow on if needed. I agree with Jeff that
> > LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> > implications, but there are some valid use cases. For instance, you can
> > enable an OLAP service to operate against another DC without impacting the
> > primary, assuming the service can tolerate inconsistency for data written
> > since the last repair, and there are some others.
> >
> >
> Let's start with the stated goal that CEP-15 is intended to be a better
> version of LWT.
>
> Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
> LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
> improvement over LWT. I don't agree that Accord will just be so much faster
> anyway that it would compensate for a single network round trip around the
> world. Four LWT round-trips with LOCAL_SERIAL will still only be on the
> order of 10 ms, but global latencies for just a single round trip are
> hundreds of ms.
>
> So, my suggestion to resolve this discussion would be that "local quorum
> latency experience" should be included in CEP-15 to meet its stated goal.
> If I have understood the CEP process correctly, this merely means that we
> agree this is a valid and significant use case in the Cassandra ecosystem.
> It doesn't mean that everything in the CEP must be released in a single v1
> release. At least personally I don't necessarily need to see a very
> detailed design for the implementation. But I'm optimistic it would resolve
> one open discussion if it was codified in the CEP that this is a use case
> that needs to be addressed.
>
>
> > 2. Is it worth giving up the possibility of SQL support, to get the
> > benefits of deterministic transaction design?
> >
> > This is a false dilemma. Today, we’re proposing a deterministic
> > transaction design that addresses some very common user pain points. SQL
> > addresses a different user pain point. If someone wants to add an sql
> > implementation in the future they can a) build it on top of accord b)
> > extend or improve accord or c) implement a separate system. The right
> > choice will depend on their goals, but accord won’t prevent work on it, the
> > same way the original lwt design isn’t preventing work on multi-partition
> > transactions. In the worst case, if the goals of a hypothetical sql project
> > are different enough to make them incompatible with accord, I don’t see any
> > reason why we couldn’t have 2 separate consensus systems, so long as people
> > are willing to maintain them and the use cases and available technologies
> > justify it.
> >
>
>
>
> The part of the discussion that's hard to deal with is "SQL support",
> "interactive transactions", or "complex transactions". Even if this is out
> of scope for CEP-15, it's a valid question to ask whether Accord would
> help, or at least not prevent, such future work. (The context
> being, Jonathan and myself both think of this as an important long term
> goal. You may have figured this out already!)
>
> There are various ways we can get more insight into this question, but
> realistically writing a complete CEP (or a dozen CEPs) on "full SQL
> support" isn't one of them. On the other hand it seems CEP-15 itself
> proposes a conservative approach of developing first version(s) in a
> separate repository, from where it could then prove its usefulness! I feel
> like the authors have already proposed a conservative approach there that
> we can probably work with even without perfect knowledge of the future.
>
>
>
> An idea I've been thinking about for a few days is, what would it take to
> implement interactive READ COMMITTED transactions on top of Accord? Now,
> this may not be an isolation level we want to market as the cool flagship
> feature. BUT this exercise does feel meaningful in a few ways:
>
> * First of all, READ COMMITTED *is* a real isolation level in the SQL

Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Blake Eggleston
Hi Henrik,

I would agree that the local serial experience for valid use cases should be 
supported in some form before legacy LWT is replaced by Accord.

Regarding your read committed proposal, I think this CEP discussion has already 
spent too much time talking about hypothetical SQL implementations, and I’d 
like to avoid veering off course again. However, since you’ve asked a well 
thought out question with concrete goals and implementation ideas, I’m happy to 
answer it. I just ask that if you want to discuss it beyond my reply, you start 
a separate ‘[IDEA] Read committed transaction with Accord’ thread where we 
could talk about it a bit more without it feeling like we need to delay a vote.

So I think it could work with some modifications. 

First you’d need to perform your select statements as accord reads, not quorum 
reads. Otherwise you may not see writes that have been (or could have been) 
committed. A multi-partition write could also appear to become undone, if a 
write commit has not reached one of the keys or needs to be recovered.
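A toy model of the window described above, under simplifying assumptions: a committed multi-key write is applied to each key separately, a plain read returns whatever has been applied so far, and an "ordered" read (loosely standing in for an Accord read; the names are hypothetical and this is not Accord's API or protocol) first finishes applying any committed writes that touch the keys it reads.

```python
class ToyCluster:
    """Hypothetical model: a committed multi-key write is applied to each key
    separately, so for a while only some keys reflect it."""

    def __init__(self, data):
        self.applied = dict(data)  # what a plain read of each key returns
        self.unapplied = []        # committed write sets not yet applied everywhere

    def commit(self, writes, apply_now):
        """The write is decided (committed); only keys in apply_now see it yet."""
        for key in apply_now:
            self.applied[key] = writes[key]
        rest = {k: v for k, v in writes.items() if k not in apply_now}
        if rest:
            self.unapplied.append(rest)

    def plain_read(self, keys):
        """Like a non-transactional quorum read: returns what has been applied."""
        return {k: self.applied[k] for k in keys}

    def ordered_read(self, keys):
        """Like a read ordered by the consensus protocol: first finish applying
        committed writes that touch the keys being read."""
        for pending in self.unapplied:
            for key in [key for key in pending if key in keys]:
                self.applied[key] = pending.pop(key)
        self.unapplied = [p for p in self.unapplied if p]
        return {k: self.applied[k] for k in keys}


cluster = ToyCluster({"a": 0, "b": 0})
cluster.commit({"a": 1, "b": 1}, apply_now={"a"})
print(cluster.plain_read(["a", "b"]))    # {'a': 1, 'b': 0}: a fractured view
print(cluster.ordered_read(["a", "b"]))  # {'a': 1, 'b': 1}
```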

Second, when you talk about transforming mutations, I’m assuming you’re talking 
about confirming primary keys do or do not exist, and supporting 
auto-incrementing primary keys. To confirm primary keys do or do not exist, 
you’d also need to perform an accord read. For auto-incrementing primary 
keys, you’d need to do an accord read/write operation to increment a counter 
somewhere (or just use uuids).

Finally, read committed does lock rows, so you’d still need to perform a read 
on commit to confirm that the rows being written to haven’t been modified since 
the transaction began.
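A minimal sketch of the commit-time check described above, assuming each row carries a version number (an assumption made for illustration; this is ordinary optimistic validation, not a description of Accord's internals). The transaction records the versions of the rows it intends to write, and the commit aborts if any of them changed; because the logic lives on the client, the retry has to happen on the client too.

```python
class ConflictAbort(Exception):
    """A row the transaction writes changed after the transaction read it."""


class VersionedTable:
    """Hypothetical table in which every row carries a version bumped on write."""

    def __init__(self, rows):
        self.rows = {key: (value, 0) for key, value in rows.items()}  # key -> (value, version)

    def read(self, key):
        return self.rows[key]

    def commit(self, write_set, observed_versions):
        """Apply write_set only if every row in observed_versions still has the
        version the transaction saw. Validation and apply are assumed to run as
        one atomic step (e.g. a single consensus operation)."""
        for key, seen in observed_versions.items():
            if self.rows[key][1] != seen:
                raise ConflictAbort(key)
        for key, value in write_set.items():
            _, version = self.rows.get(key, (None, -1))
            self.rows[key] = (value, version + 1)


def increment(table, key, amount, race=None):
    """Client-side read-modify-write that retries on conflict. `race`, if given,
    simulates another transaction committing during the first attempt."""
    while True:
        value, version = table.read(key)
        if race is not None:
            race()
            race = None
        try:
            table.commit({key: value + amount}, {key: version})
            return
        except ConflictAbort:
            continue  # re-run the client logic against fresh reads


table = VersionedTable({"balance": 100})
increment(table, "balance", 10,
          race=lambda: table.commit({"balance": 120}, {"balance": 0}))
print(table.read("balance"))  # (130, 2)
```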

Thanks,

Blake


> On Oct 12, 2021, at 1:54 PM, Henrik Ingo  wrote:
> 
> Hi all
> 
> I was expecting to stay out of the way while a vote on CEP-15 seemed
> imminent. But when I discussed this tradeoffs thread with Jonathan, he
> encouraged me to say these points in my own words, so here we are.
> 
> 
> On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
> wrote:
> 
>> 1. Is it worth giving up local latencies to get full global consistency?
>> Most LWT use cases use LOCAL_SERIAL.
>> 
>> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
>> that prevents performing consensus in one DC and replicating the writes to
>> others. That’s not in scope for the initial work, but there’s no reason it
>> couldn’t be handled as a follow on if needed. I agree with Jeff that
>> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
>> implications, but there are some valid use cases. For instance, you can
>> enable an OLAP service to operate against another DC without impacting the
>> primary, assuming the service can tolerate inconsistency for data written
>> since the last repair, and there are some others.
>> 
>> 
> Let's start with the stated goal that CEP-15 is intended to be a better
> version of LWT.
> 
> Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
> LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
> improvement over LWT. I don't agree that Accord will just be so much faster
> anyway, that it would compensate a single network roundtrip around the
> world. Four LWT round-trips with LOCAL_SERIAL will still only be on the
> order of 10 ms, but global latencies for just a single round trip are
> hundreds of ms.
> 
> So, my suggestion to resolve this discussion would be that "local quorum
> latency experience" should be included in CEP-15 to meet its stated goal.
> If I have understood the CEP process correctly, this merely means that we
> agree this is a valid and significant use case in the Cassandra ecosystem.
> It doesn't mean that everything in the CEP must be released in a single v1
> release. At least personally I don't necessarily need to see a very
> detailed design for the implementation. But I'm optimistic it would resolve
> one open discussion if it was codified in the CEP that this is a use case
> that needs to be addressed.
> 
> 
>> 2. Is it worth giving up the possibility of SQL support, to get the
>> benefits of deterministic transaction design?
>> 
>> This is a false dilemma. Today, we’re proposing a deterministic
>> transaction design that addresses some very common user pain points. SQL
>> addresses different user pain point. If someone wants to add an sql
>> implementation in the future they can a) build it on top of accord b)
>> extend or improve accord or c) implement a separate system. The right
>> choice will depend on their goals, but accord won’t prevent work on it, the
>> same way the original lwt design isn’t preventing work on multi-partition
>> transactions. In the worst case, if the goals of a hypothetical sql project
>> are different enough to make them incompatible with accord, I don’t see any
>> reason why we couldn’t have 2 separate consensus systems, so long as people
>> are willing to maintain them and the use cases and available tec

Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Henrik Ingo
On Tue, Oct 12, 2021 at 11:54 PM Henrik Ingo 
wrote:

> Secondary indexes are supported without any additional work needed.

Correction: The "transaction reads its own writes" feature would also
require storing secondary index keys in the transaction state. These of
course needn't be part of the write set in the commit.

-- 

Henrik Ingo

+358 40 569 7354


Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Henrik Ingo
Hi all

I was expecting to stay out of the way while a vote on CEP-15 seemed
imminent. But discussing this tradeoffs thread with Jonathan, he encouraged
me to say these points in my own words, so here we are.


On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
 wrote:

> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use
> LOCAL_SERIAL.
>
> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> that prevents performing consensus in one DC and replicating the writes to
> others. That’s not in scope for the initial work, but there’s no reason it
> couldn’t be handled as a follow on if needed. I agree with Jeff that
> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> implications, but there are some valid use cases. For instance, you can
> enable an OLAP service to operate against another DC without impacting the
> primary, assuming the service can tolerate inconsistency for data written
> since the last repair, and there are some others.
>
>
Let's start with the stated goal that CEP-15 is intended to be a better
version of LWT.

Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
improvement over LWT. I don't agree that Accord will just be so much faster
anyway that it would compensate for a single network round trip around the
world. Four LWT round-trips with LOCAL_SERIAL will still only be on the
order of 10 ms, but global latencies for just a single round trip are
hundreds of ms.
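The arithmetic behind this point can be made concrete. The sketch below treats each round trip as sequential and assumes a 2.5 ms round trip inside one datacenter and a 200 ms round trip between continents. Both figures are illustrative assumptions, not measurements.

```python
def sequential_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Lower bound on latency when round trips must happen one after another."""
    return round_trips * rtt_ms


# Assumed round-trip times, for illustration only (not measurements):
LOCAL_RTT_MS = 2.5     # between replicas in one datacenter
GLOBAL_RTT_MS = 200.0  # between datacenters on different continents

lwt_local_serial = sequential_latency_ms(4, LOCAL_RTT_MS)
one_global_round_trip = sequential_latency_ms(1, GLOBAL_RTT_MS)

print(f"LWT with LOCAL_SERIAL (4 round trips): {lwt_local_serial} ms")
print(f"Any protocol needing 1 global round trip: {one_global_round_trip} ms")
print(f"Ratio: {one_global_round_trip / lwt_local_serial:.0f}x")
```

Under these assumptions a single global round trip costs 20 times as much as four local ones. No amount of protocol efficiency can remove that physical transmission latency.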

So, my suggestion to resolve this discussion would be that "local quorum
latency experience" should be included in CEP-15 to meet its stated goal.
If I have understood the CEP process correctly, this merely means that we
agree this is a valid and significant use case in the Cassandra ecosystem.
It doesn't mean that everything in the CEP must be released in a single v1
release. At least personally I don't necessarily need to see a very
detailed design for the implementation. But I'm optimistic it would resolve
one open discussion if it was codified in the CEP that this is a use case
that needs to be addressed.


> 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> This is a false dilemma. Today, we’re proposing a deterministic
> transaction design that addresses some very common user pain points. SQL
> addresses a different user pain point. If someone wants to add an sql
> implementation in the future they can a) build it on top of accord b)
> extend or improve accord or c) implement a separate system. The right
> choice will depend on their goals, but accord won’t prevent work on it, the
> same way the original lwt design isn’t preventing work on multi-partition
> transactions. In the worst case, if the goals of a hypothetical sql project
> are different enough to make them incompatible with accord, I don’t see any
> reason why we couldn’t have 2 separate consensus systems, so long as people
> are willing to maintain them and the use cases and available technologies
> justify it.
>



The part of the discussion that's hard to deal with is "SQL support",
"interactive transactions", or "complex transactions". Even if this is out
of scope for CEP-15, it's a valid question to ask whether Accord would
help such future work, or at least not prevent it. (The context
being, Jonathan and I both think of this as an important long-term
goal. You may have figured this out already!)

There are various ways we can get more insight into this question, but
realistically writing a complete CEP (or a dozen CEPs) on "full SQL
support" isn't one of them. On the other hand, CEP-15 itself proposes a
conservative approach of developing the first version(s) in a separate
repository, where it could then prove its usefulness. I feel like that is
an approach we can probably work with even without perfect knowledge of the
future.



An idea I've been thinking about for a few days is, what would it take to
implement interactive READ COMMITTED transactions on top of Accord? Now,
this may not be an isolation level we want to market as the cool flagship
feature. BUT this exercise does feel meaningful in a few ways:

* First of all, READ COMMITTED *is* a real isolation level in the SQL
standard. So arguably this would be an existence proof of interactive SQL
transactions built on top of Accord.

* It's even still the default isolation level in PostgreSQL today.

* An implementation of such transactions could even be benchmarked, which
would give an approximation of how well Accord is suited for this task.
This performance would be "best case" in the sense that I would expect
Snapshot and Serializable to have worse performance, but that overhead can
be considered inherent in the isolation level rather than a fa

Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread [email protected]
Hi Jonathan,

You are missing the woods for the trees here. You outlined several transaction 
systems, and I have demonstrated that Accord brings them *all* closer.

The immediate context of this discussion is that you are unhappy with CEP-15 
due to its impact on a future transaction system. Given the above, it remains 
unclear why this is still an issue.

I’m happy to continue a long-term roadmap discussion, but without specific 
further criticisms of CEP-15 we are long overdue a vote.



From: Jonathan Ellis 
Date: Tuesday, 12 October 2021 at 02:35
To: dev 
Subject: Re: Tradeoffs for Cassandra transaction management
On Mon, Oct 11, 2021 at 5:11 PM [email protected] 
wrote:

> If we want to fully unpack this particular point, as far as I can tell
> claiming ANSI SQL would indeed require interactive transactions in which
> arbitrary conditional work may be performed by a client within a
> transaction in response to other actions within that transaction.
>
> However:
>
>   1.  The ANSI SQL standard permits these transactions to fail and
> rollback (e.g. in the event that your optimistic transaction fails). So if
> you want to be pedantic, you may modify my statement to “SQL does not
> necessitate support for abort-free interactive transactions” and we can
> leave it there.
>
>   2.  I would personally consider “SQL support” to include the capability
> of defining arbitrary SQL stored procedures that may be executed by clients
> in an interactive session


I note your personal preference and I further note that this is not the
common understanding of "SQL support" in the industry.  If you tell 100
developers that your database supports SQL, then at least 99 of them are
going to assume that you can work with APIs like JDBC that expose
interactive transactions as a central feature, and hence that you will be
reasonably compatible with the vast array of SQL-based applications out
there.

Historical side note: VoltDB tried to convince people that stored
procedures were good enough.  It didn't work, and VoltDB had to add
interactive transactions as fast as they could.

>   3.  Most importantly, as I pointed out in the previous email, Accord is
> compatible with a YugaByte/Cockroach-like approach, and indeed makes this
> approach both easier to accomplish and enables stronger isolation than the
> equivalent Raft-based approach. These approaches are able to reduce the
> number of conflicts, at a cost of significantly higher transaction
> management burden.
>

If you're saying that you could use Accord instead of Raft or Paxos, and
layer 2PC on top of that as in Spanner, then I agree, but I don't think
that is a very good design, as you would no longer get any of the benefits
of the deterministic approach you started with.  If you mean something
else, then perhaps an example would help clarify.

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Blake Eggleston
Let’s get back on topic.

Jonathan, in your opening email you stated that, in your view, the 2 main areas 
of tradeoff were:

> 1. Is it worth giving up local latencies to get full global consistency? 

Now we’ve established that we don’t need to give up local latencies with 
Accord, which leaves:

> 2. Is it worth giving up the possibility of SQL support, to get the benefits 
> of deterministic transaction design?

I pointed out that this was a false dilemma and that, in the worst case, a 
hypothetical SQL feature could have its own consensus system. I hope that 
won’t be necessary, but as I later pointed out (and you did not address, 
although maybe I should have phrased it as a question), if we’re going to weigh 
accord against a hypothetical SQL feature that lacks design goals, or any clear 
ideas about how it might be implemented, how can we rule that out?

So Jonathan, how can we rule that out? How can we have a productive discussion 
about a feature you yourself are unable to describe in any meaningful detail?





Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2021 at 5:11 PM [email protected] 
wrote:

> If we want to fully unpack this particular point, as far as I can tell
> claiming ANSI SQL would indeed require interactive transactions in which
> arbitrary conditional work may be performed by a client within a
> transaction in response to other actions within that transaction.
>
> However:
>
>   1.  The ANSI SQL standard permits these transactions to fail and
> rollback (e.g. in the event that your optimistic transaction fails). So if
> you want to be pedantic, you may modify my statement to “SQL does not
> necessitate support for abort-free interactive transactions” and we can
> leave it there.
>
>   2.  I would personally consider “SQL support” to include the capability
> of defining arbitrary SQL stored procedures that may be executed by clients
> in an interactive session


I note your personal preference and I further note that this is not the
common understanding of "SQL support" in the industry.  If you tell 100
developers that your database supports SQL, then at least 99 of them are
going to assume that you can work with APIs like JDBC that expose
interactive transactions as a central feature, and hence that you will be
reasonably compatible with the vast array of SQL-based applications out
there.

Historical side note: VoltDB tried to convince people that stored
procedures were good enough.  It didn't work, and VoltDB had to add
interactive transactions as fast as they could.

>   3.  Most importantly, as I pointed out in the previous email, Accord is
> compatible with a YugaByte/Cockroach-like approach, and indeed makes this
> approach both easier to accomplish and enables stronger isolation than the
> equivalent Raft-based approach. These approaches are able to reduce the
> number of conflicts, at a cost of significantly higher transaction
> management burden.
>

If you're saying that you could use Accord instead of Raft or Paxos, and
layer 2PC on top of that as in Spanner, then I agree, but I don't think
that is a very good design, as you would no longer get any of the benefits
of the deterministic approach you started with.  If you mean something
else, then perhaps an example would help clarify.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread [email protected]
Hi Jonathan,

I would appreciate it if you would respond to all of my email(s), as (at your 
insistence) I spend a great deal of time responding to you. Cherry-picking 
makes these conversations very difficult.

If we want to fully unpack this particular point, as far as I can tell claiming 
ANSI SQL would indeed require interactive transactions in which arbitrary 
conditional work may be performed by a client within a transaction in response 
to other actions within that transaction.

However:

  1.  The ANSI SQL standard permits these transactions to fail and rollback 
(e.g. in the event that your optimistic transaction fails). So if you want to 
be pedantic, you may modify my statement to “SQL does not necessitate support 
for abort-free interactive transactions” and we can leave it there.
  2.  I would personally consider “SQL support” to include the capability of 
defining arbitrary SQL stored procedures that may be executed by clients in an 
interactive session, or interactive sessions where the client must submit 
transactional scripts that may be arbitrarily complex and contingent on prior 
responses, but where each script must be executed within its own transaction. 
For many use cases this would constitute SQL support (and, indeed, I think it 
would cover every SQL use case in my career).
  3.  Most importantly, as I pointed out in the previous email, Accord is 
compatible with a YugaByte/Cockroach-like approach, and indeed makes this 
approach both easier to accomplish and enables stronger isolation than the 
equivalent Raft-based approach. These approaches are able to reduce the number 
of conflicts, at a cost of significantly higher transaction management burden.
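Point 2 above describes sessions in which the client submits complete transactional scripts. Each script runs as its own transaction, and a later script may depend on earlier responses. The sketch below illustrates that model in a single process. `ScriptExecutor`, `open_account`, and `make_withdrawal` are hypothetical names. Serial copy-then-swap execution stands in for whatever ordering a real deterministic layer would provide, and the sketch is not thread-safe.

```python
from typing import Callable, Dict, List

Store = Dict[str, int]
Script = Callable[[Store], object]


class ScriptExecutor:
    """Toy server that executes each submitted script as one transaction.

    Hypothetical illustration only; not a Cassandra or Accord API.
    """

    def __init__(self) -> None:
        self.store: Store = {}
        self.history: List[str] = []  # order in which scripts committed

    def submit(self, script: Script) -> object:
        # The whole script is known before it runs, so the server never waits
        # on the client mid-transaction. Running against a copy and swapping it
        # in afterwards makes each script all-or-nothing.
        working = dict(self.store)
        result = script(working)
        self.store = working
        self.history.append(script.__name__)
        return result


def open_account(store: Store) -> int:
    store.setdefault("alice", 100)
    return store["alice"]


def make_withdrawal(amount: int) -> Script:
    def withdraw(store: Store) -> bool:
        # Re-check inside the script: state may have changed since the client
        # saw the previous response.
        if store.get("alice", 0) >= amount:
            store["alice"] -= amount
            return True
        return False
    return withdraw


server = ScriptExecutor()
balance = server.submit(open_account)  # first transaction
# Client-side logic between scripts decides what to submit next.
ok = server.submit(make_withdrawal(30)) if balance >= 30 else False
```

In this model the client's conditional logic runs between transactions rather than inside one. That is the distinction from interactive transactions that the rest of this thread turns on.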

In summary we have all options on the table. Not only does CEP-15 not close any 
doors, it brings them all a step closer. If you have a strong opinion about 
which (if any) of these approaches we pursue post CEP-15, I would love to have 
this conversation. However, this should not block the adoption of CEP-15, since 
they are not in conflict.


From: Jonathan Ellis 
Date: Monday, 11 October 2021 at 22:20
To: dev 
Subject: Re: Tradeoffs for Cassandra transaction management
Hi Benedict,

Yes, interactive transactions are a necessary part of SQL support (as
opposed to a tiny subset of SQL that matches CQL semantics; I don't know
any other way to make sense of your claim that "SQL does not necessitate
support for interactive transactions").

I still don't understand how you're saying we could implement interactive
transactions on top of a deterministic transaction manager.  In the other
thread you said that "Interactive transactions are possible on top of
Accord, as are transactions with an unknown read/write set. In each case
the only cost is that they would use optimistic concurrency control, which
is no worse than spanner derivatives anyway" but this is not correct,
interactive transactions are substantially more difficult to support than
transactions with unknown read/write set, as I outlined in the email to
kick off this thread.

On Sun, Oct 10, 2021 at 4:05 AM [email protected] 
wrote:

> Hi Jonathan,
>
> I will summarise my position below, that I have outlined at various points
> in the other thread, and then I would be interested to hear how you propose
> we move forwards. I will commit to responding the same day to any email I
> receive before 7pm GMT, and to engaging with each of your points. I would
> appreciate it if you could make similar commitments so that we may conclude
> this discussion in a reasonable time frame and conduct a vote on CEP-15.
>
> I also reiterate my standing invitation to an open video chat, to discuss
> anything you like, for as long as you like. Please nominate a suitable time
> and day.
>
> ==TL;DR==
> CEP-15 does not narrow our future options, it only broadens them. Accord
> is a distributed consensus protocol, so these techniques may build upon it
> without penalty. Alternatively, these approaches may simply live alongside
> Accord.
>
> Since these alternative approaches do not achieve the goals of the CEP,
> and this CEP only enhances your ability to pursue them, it seems hard to
> conclude it should not proceed.
>
> ==Goals==
> Our goals are first order principles: we want strict serializable
> cross-shard isolation that is highly available and can be scaled while
> maintaining optimal and predictable latency. Anything less, and the CEP is
> not achieved.
>
> As outlined already (except SLOG, which I address below), these
> alternative approaches do not achieve these goals.
>
> ==Compatibility with other approaches==
> 0. In general, research systems are not irreducible - they are an assembly
> of ideas that can be mixed together. Accord is a distributed consensus
> protocol. These other protocols may utilise it without penalty for
> consensus, in many cases

Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
xities of re-homing data, thus
> avoiding these unpredictable performance characteristics.
>
> For those use cases that do not require high availability, it would be
> possible to implement a “home” region setup with Accord, as with SLOG. This
> is not an idea that is exclusive to this particular system. We even
> discussed this briefly in the call, as some use cases do indeed prefer this
> trade-off.
>
> SLOG additionally offers a kind of “home group” multi-home optimisation
> for clusters with many regions, that accept availability loss if fewer than
> half of their regions fail (e.g. in the paper 6 regions in pairs of 2 for
> availability). This is also exploitable by Accord, and something we can
> pursue as a future optimisation, as users explore such topologies in the
> real world.
>
> ==Responding to specific points==
>
> >because it was asserted in the CEP-15 thread that Accord could support
> SQL by applying known techniques on top. This is mistaken. Deterministic
> systems like Calvin or SLOG or Accord can support queries where the rows
> affected are not known in advance using a technique that Abadi calls OLLP
>
> Language is hard and it is easy to conflate things. Here you seem to be
> discussing abort-free interactive transactions, not SQL. SQL does not
> necessitate support for interactive transactions, let alone abort-free
> ones. The technique you mention can support SQL scripts, and also
> interactive client transactions that may be aborted by the server. However,
> see [1] which may support all of these properties.
>
>
>
> From: Blake Eggleston 
> Date: Sunday, 10 October 2021 at 05:17
> To: [email protected] 
> Subject: Re: Tradeoffs for Cassandra transaction management
> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use
> LOCAL_SERIAL.
>
> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> that prevents performing consensus in one DC and replicating the writes to
> others. That’s not in scope for the initial work, but there’s no reason it
> couldn’t be handled as a follow on if needed. I agree with Jeff that
> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> implications, but there are some valid use cases. For instance, you can
> enable an OLAP service to operate against another DC without impacting the
> primary, assuming the service can tolerate inconsistency for data written
> since the last repair, and there are some others.
>
> 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> This is a false dilemma. Today, we’re proposing a deterministic
> transaction design that addresses some very common user pain points. SQL
> addresses a different user pain point. If someone wants to add an sql
> implementation in the future they can a) build it on top of accord b)
> extend or improve accord or c) implement a separate system. The right
> choice will depend on their goals, but accord won’t prevent work on it, the
> same way the original lwt design isn’t preventing work on multi-partition
> transactions. In the worst case, if the goals of a hypothetical sql project
> are different enough to make them incompatible with accord, I don’t see any
> reason why we couldn’t have 2 separate consensus systems, so long as people
> are willing to maintain them and the use cases and available technologies
> justify it.
>
> -Blake
>
> > On Oct 9, 2021, at 9:54 AM, Jonathan Ellis  wrote:
> >
> > Hi all,
> >
> > After calling several times for a broader discussion of goals and
> > tradeoffs around transaction management in the CEP-15 thread, I’ve put
> > together a short analysis to kick that off.
> >
> > Here is a table that summarizes the state of the art for distributed
> > transactions that offer serializability, i.e., a superset of what you
> > can get with LWT.  (The most interesting option that this eliminates is
> > RAMP.)
> >
> > Since I'm not sure how this will render outside gmail, I've also
> > uploaded it here: https://imgur.com/a/SCZ8jex
> >
> > Columns: Spanner | Cockroach | Calvin/Fauna | SLOG (see below)
> >
> > Write latency:
> > * Spanner: Global Paxos, plus 2pc for multi-partition. For
> >   intercontinental replication this is 100+ms.  Cloud Spanner does not
> >   allow truly global deployments for this reason.
> > * Cockroach: Single-region Paxos, plus 2pc.  I’m not very clear on how
> >   this works but it results in non-strict serializability.  I didn’t
> >   find actual numbers for CR other than “2ms in a single AZ” which is
> >   not a typical scenario.
> > * Calvin/Fauna: Global Raft.  Fauna posts actual numbers of ~70ms in
> >   production which I a

Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2021 at 12:31 PM Blake Eggleston
 wrote:

> > Come on Blake, you have all been developing software long enough to know
> > that "there's nothing about Accord that prevents this" is close to
> > meaningless.
> >
> > If it's so easy to address an overwhelmingly popular use case, then let's
> > add it to the initial work.
>
> This is moving the goal posts. The concern I was addressing implied this
> wasn’t possible with Accord and asked if we should prefer “a design that
> allows local serialization with EC between regions”. Accord is a design
> that allows this, and support for it is an implementation detail. Whether
> or not it’s in scope for the initial work is a project planning discussion,
> not a transaction management protocol tradeoff discussion.
>

I didn't think I had, but I went back to check and you're right, I did
imply that this wasn't possible with Accord.  I stand corrected, thank you.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Blake Eggleston
> Come on Blake, you have all been developing software long enough to know
> that "there's nothing about Accord that prevents this" is close to
> meaningless.
> 
> If it's so easy to address an overwhelmingly popular use case, then let's
> add it to the initial work.


This is moving the goal posts. The concern I was addressing implied this wasn’t 
possible with Accord and asked if we should prefer “a design that allows local 
serialization with EC between regions”. Accord is a design that allows this, 
and support for it is an implementation detail. Whether or not it’s in scope 
for the initial work is a project planning discussion, not a transaction 
management protocol tradeoff discussion.

> I think this is the crux of our disagreement, I very much want to avoid a
> future where we have to maintain two separate consensus systems.


I want to avoid it also, but if we’re going to compare Accord against a 
hypothetical SQL feature that seems to lack design goals, or any clear ideas 
about how it might be implemented, I don’t think we can rule it out.


> On Oct 11, 2021, at 6:02 AM, Jonathan Ellis  wrote:
> 
> On Sat, Oct 9, 2021 at 11:23 PM Blake Eggleston
> wrote:
> 
>> 1. Is it worth giving up local latencies to get full global consistency?
>> Most LWT use cases use
>> LOCAL_SERIAL.
>> 
>> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
>> that prevents performing consensus in one DC and replicating the writes to
>> others. That’s not in scope for the initial work, but there’s no reason it
>> couldn’t be handled as a follow on if needed. I agree with Jeff that
>> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
>> implications, but there are some valid use cases. For instance, you can
>> enable an OLAP service to operate against another DC without impacting the
>> primary, assuming the service can tolerate inconsistency for data written
>> since the last repair, and there are some others.
>> 
> 
> Come on Blake, you have all been developing software long enough to know
> that "there's nothing about Accord that prevents this" is close to
> meaningless.
> 
> If it's so easy to address an overwhelmingly popular use case, then let's
> add it to the initial work.
> 
>> 2. Is it worth giving up the possibility of SQL support, to get the
>> benefits of deterministic transaction design?
>> 
>> This is a false dilemma. Today, we’re proposing a deterministic
>> transaction design that addresses some very common user pain points. SQL
>> addresses a different user pain point. If someone wants to add an sql
>> implementation in the future they can a) build it on top of accord b)
>> extend or improve accord or c) implement a separate system. The right
>> choice will depend on their goals, but accord won’t prevent work on it, the
>> same way the original lwt design isn’t preventing work on multi-partition
>> transactions. In the worst case, if the goals of a hypothetical sql project
>> are different enough to make them incompatible with accord, I don’t see any
>> reason why we couldn’t have 2 separate consensus systems, so long as people
>> are willing to maintain them and the use cases and available technologies
>> justify it.
>> 
> 
> I think this is the crux of our disagreement, I very much want to avoid a
> future where we have to maintain two separate consensus systems.



Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
Thanks for flagging that, Alex.  Here it is without trying to include the
inline table:

After calling several times for a broader discussion of goals and
tradeoffs around transaction management in the CEP-15 thread, I’ve put
together a short analysis to kick that off.

Here is a table that summarizes the state of the art for distributed
transactions that offer serializability, i.e., a superset of what you can
get with LWT. (The most interesting option that this eliminates is RAMP.)

https://imgur.com/a/SCZ8jex

(I have not included Accord here because it’s not sufficiently clear to me
how to create a full transaction manager from the Accord protocol, so I
can’t analyze many of the properties such a system would have. The most
obvious solution would be “Calvin but with Accord instead of Raft”, but
since Accord already does some Calvin-like things, that seems like it would
result in some suboptimal redundancy.)

After putting the above together, it seems to me that the two main areas of
tradeoff are:

1. Is it worth giving up local latencies to get full global consistency?
Most LWT use cases use LOCAL_SERIAL. While all of the above have more
efficient designs than LWT, it’s still true that global serialization will
require 100+ms in the general case due to physical transmission latency.
So a design that allows local serialization with EC between regions, or a
design (like SLOG) that automatically infers a “home” region that can do
local consensus in the common case without giving up global
serializability, is desirable.

2. Is it worth giving up the possibility of SQL support to get the benefits
of deterministic transaction design? To be clear, these benefits include
very significant ones around simplicity of design, higher write throughput,
and (in SLOG) lower read and write latencies.

I’ll doubleclick on #2 because it
was asserted in the CEP-15 thread that Accord could support SQL by applying
known techniques on top.  This is mistaken.  Deterministic systems like
Calvin or SLOG or Accord can support queries where the rows affected are
not known in advance using a technique that Abadi calls OLLP (Optimistic
Lock Location Prediction), but this does not help when the transaction
logic is not known in advance.

Here is Daniel Abadi’s explanation of OLLP from “An Overview of
Deterministic Database Systems”:

    In practice, deterministic database systems that use ordered locking do
    not wait until runtime for transactions to determine their access-sets.
    Instead, they use a technique called OLLP where if a transaction does
    not know its access-sets in advance, it is not inserted into the input
    log. Instead, it is run in a trial mode that does not write to the
    database state, but determines what it would have read or written to if
    it was actually being processed. It is then annotated with the
    access-sets determined during the trial run, and submitted to the input
    log for actual processing. In the actual run, every replica processes
    the transaction deterministically, acquiring locks for the transaction
    based on the estimate from the trial run. In some cases, database state
    may have changed in a way that the access sets estimates are now
    incorrect. Since a transaction cannot read or write data for which it
    does not have a lock, it must abort as soon as it realizes that it
    acquired the wrong set of locks. But since the transaction is being
    processed deterministically at this point, every replica will
    independently come to the same conclusion that the wrong set of locks
    were acquired, and will all independently decide to abort the
    transaction. The transaction then gets resubmitted to the input log with
    the new access-set estimates annotated.

Clearly this does not work if the server-visible logic changes between
runs. For instance, consider this simple interactive transaction:

    cursor.execute("BEGIN TRANSACTION")
    count = cursor.execute("SELECT count FROM inventory WHERE id = 1").result[0]
    if count > 0:
        cursor.execute("UPDATE inventory SET count = count - 1 WHERE id = 1")
    cursor.execute("COMMIT TRANSACTION")

The first problem is that it’s far from clear how to do a “trial run” of a
transaction that the server only knows pieces of at a time. But even worse,
the server only knows that it got either a SELECT, or a SELECT followed by
an UPDATE. It doesn’t know anything about the logic that would drive a
change in those statements. So if the value read changes between the trial
run and execution, there is no possibility of transparently retrying; you’re
just screwed and have to report failure.

So Abadi concludes:

    [A]ll recent [deterministic database] implementations have limited or no
    support for interactive transactions, thereby preventing their use in
    many existing deployments. If the advantages of deterministic database
    systems will be realized in the coming years, one of two th
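To make the contrast with the interactive inventory example concrete: in a deterministic design the whole transaction body reaches the server before execution, so a changed value is handled by rerunning the logic on the server rather than by the client. A minimal Python sketch under that assumption; `OneShotTxn`, `decrement_if_positive`, and `execute` are hypothetical names, not part of Accord, Calvin, Fauna, or any Cassandra API, and the "store" is a toy dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Store = Dict[int, int]  # toy state: inventory id -> count


@dataclass(frozen=True)
class OneShotTxn:
    keys: FrozenSet[int]                      # declared up front, so they can be locked
    logic: Callable[[Store], Dict[int, int]]  # whole body shipped to the server


def decrement_if_positive(item_id: int) -> OneShotTxn:
    """The inventory example, with the client's `if` moved into the transaction."""
    def logic(snapshot: Store) -> Dict[int, int]:
        count = snapshot[item_id]
        return {item_id: count - 1} if count > 0 else {}
    return OneShotTxn(frozenset({item_id}), logic)


def execute(store: Store, txn: OneShotTxn) -> Dict[int, int]:
    """Run on each replica in log order; returns the writes applied."""
    snapshot = {k: store[k] for k in txn.keys}
    writes = txn.logic(snapshot)  # branch decided server-side, no client round trip
    store.update(writes)
    return writes


store = {1: 1}
txn = decrement_if_positive(1)
print(execute(store, txn))  # {1: 0}
print(execute(store, txn))  # {}
print(store)                # {1: 0}
```

The server can rerun `logic` as often as it needs to; in the interactive version the equivalent of `logic` lives in the client, which is why a changed read cannot be retried transparently.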

Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Oleksandr Petrov
I realise this is not contributing to this discussion, but this email is
very difficult to read because it seems like something has happened with
formatting. For me it gets displayed as a single paragraph with no line
breaks.

There seems to be some overlap between the image uploaded to imgur and this
email, but some things are only present in the email and not on the image.

On Sat, Oct 9, 2021 at 6:54 PM Jonathan Ellis wrote:

> Hi all,
>
> After calling several times for a broader discussion of goals and
> tradeoffs around transaction management in the CEP-15 thread, I’ve put
> together a short analysis to kick that off.
>
> Here is a table that summarizes the state of the art for distributed
> transactions that offer serializability, i.e., a superset of what you can
> get with LWT. (The most interesting option that this eliminates is RAMP.)
> Since I'm not sure how this will render outside gmail, I've also uploaded
> it here: https://imgur.com/a/SCZ8jex
>
> Systems compared: Spanner, Cockroach, Calvin/Fauna, SLOG (see below)
>
> Write latency
>   - Spanner: Global Paxos, plus 2pc for multi-partition. For
>     intercontinental replication this is 100+ms. Cloud Spanner does not
>     allow truly global deployments for this reason.
>   - Cockroach: Single-region Paxos, plus 2pc. I’m not very clear on how
>     this works but it results in non-strict serializability. I didn’t find
>     actual numbers for CR other than “2ms in a single AZ” which is not a
>     typical scenario.
>   - Calvin/Fauna: Global Raft. Fauna posts actual numbers of ~70ms in
>     production which I assume corresponds to a multi-region deployment
>     with all regions in the USA. SLOG paper says true global Calvin is
>     200+ms.
>   - SLOG: Single-region Paxos (common case) with fallback to multi-region
>     Paxos. Under 10ms.
>
> Scalability bottlenecks
>   - Spanner: Locks held during cross-region replication
>   - Cockroach: Same as Spanner
>   - Calvin/Fauna: OLLP approach required when PKs are not known in advance
>     (mostly for indexed queries) -- results in retries under contention
>   - SLOG: Same as Calvin
>
> Read latency at serial consistency
>   - Spanner: Timestamp from Paxos leader (may be cross-region), then read
>     from local replica.
>   - Cockroach: Same as Spanner, I think
>   - Calvin/Fauna: Same as writes
>   - SLOG: Same as writes
>
> Maximum serializability flavor
>   - Spanner: Strict
>   - Cockroach: Un-strict
>   - Calvin/Fauna: Strict
>   - SLOG: Strict
>
> Support for other isolation levels?
>   - Spanner: Snapshot
>   - Cockroach: No
>   - Calvin/Fauna: Snapshot (in Fauna)
>   - SLOG: Paper mentions dropping from strict-serializable to only
>     serializable. Probably could also support Snapshot like Fauna.
>
> Interactive transaction support (req’d for SQL)
>   - Spanner: Yes
>   - Cockroach: Yes
>   - Calvin/Fauna: No
>   - SLOG: No
>
> Potential for grafting onto C*
>   - Spanner: Nightmare
>   - Cockroach: Nightmare
>   - Calvin/Fauna: Reasonable, Calvin is relatively simple and the storage
>     assumptions it makes are minimal
>   - SLOG: I haven’t thought about this enough. SLOG may require versioned
>     storage, e.g. see this comment
>     <http://dbmsmusings.blogspot.com/2019/10/introducing-slog-cheating-low-latency.html?showComment=1570497003296#c5976719429355924873>.
>
> (I have not included Accord here because it’s not sufficiently clear to me
> how to create a full transaction manager from the Accord protocol, so I
> can’t analyze many of the properties such a system would have. The most
> obvious solution would be “Calvin but with Accord instead of Raft”, but
> since Accord already does some Calvin-like things, that seems like it would
> result in some suboptimal redundancy.)
>
> After putting the above together, it seems to me that the two main areas
> of tradeoff are:
>
> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use LOCAL_SERIAL. While all of the above have more
> efficient designs than LWT, it’s still true that global serialization will
> require 100+ms in the general case due to physical transmission latency.
> So a design that allows local serialization with EC between regions, or a
> design (like SLOG) that automatically infers a “home” region that can do
> local consensus in the common case without giving up global
> serializability, is desirable.
>
> 2. Is it worth giving up the possibility of SQL support to get the
> benefits of deterministic transaction design? To be clear, these benefits
> include very significant ones around simplicity of design, higher write
> throughput, and (in SLOG) lower read and write latencies.
>
> I’ll doubleclick on #2 because it was asserted in the CEP-15 thread that
> Accord could support SQL by applying known techniques on top. This is
> mistaken. Deterministic systems like Calvin or SLOG or Accord can support
> queries where the rows affected are not known in advance using a
> technique that Abadi calls OLLP (Optimistic Lock Location Prediction), but
> this does not help when the transaction logic is not known in advance.
>
> Here is Daniel Abadi’s explanation of OLLP from “An Overview of
> Deterministic Database Systems”
> <https://cacm.acm.org/magazines/2018/9/230601-an-overview-of-deterministic-database-systems/fulltext?mobile=false>:
>
>     In practice, deterministic database systems that use ordered locking
>     do not wait until runtime for transactions to determine their
>     access-sets. Instead, they use a technique called OLLP where

Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
On Sat, Oct 9, 2021 at 11:23 PM Blake Eggleston wrote:

> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use LOCAL_SERIAL.
>
> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> that prevents performing consensus in one DC and replicating the writes to
> others. That’s not in scope for the initial work, but there’s no reason it
> couldn’t be handled as a follow on if needed. I agree with Jeff that
> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> implications, but there are some valid use cases. For instance, you can
> enable an OLAP service to operate against another DC without impacting the
> primary, assuming the service can tolerate inconsistency for data written
> since the last repair, and there are some others.
>

Come on, Blake, you have all been developing software long enough to know
that "there's nothing about Accord that prevents this" is close to
meaningless.

If it's so easy to address an overwhelmingly popular use case, then let's
add it to the initial work.

> 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> This is a false dilemma. Today, we’re proposing a deterministic
> transaction design that addresses some very common user pain points. SQL
> addresses a different user pain point. If someone wants to add an sql
> implementation in the future they can a) build it on top of accord b)
> extend or improve accord or c) implement a separate system. The right
> choice will depend on their goals, but accord won’t prevent work on it, the
> same way the original lwt design isn’t preventing work on multi-partition
> transactions. In the worst case, if the goals of a hypothetical sql project
> are different enough to make them incompatible with accord, I don’t see any
> reason why we couldn’t have 2 separate consensus systems, so long as people
> are willing to maintain them and the use cases and available technologies
> justify it.
>

I think this is the crux of our disagreement, I very much want to avoid a
future where we have to maintain two separate consensus systems.


Re: Tradeoffs for Cassandra transaction management

2021-10-11 Thread Jonathan Ellis
On Sat, Oct 9, 2021 at 7:20 PM Jeff Jirsa wrote:

> Most LWT use cases use LOCAL_SERIAL because the difference in latency is
> huge today (given the 4x RTTs) AND almost none of the users actually
> understand how Cassandra replication or consistency works, so they
> misunderstand the guarantees provided by the choice they make. When
> informed of the actual tradeoffs, a LOT of those users switch to SERIAL.
>

This doesn't match my experience.  I know of exactly two DataStax customers
using SERIAL; I remember them because they're so unusual.  On the other
hand, I've talked to a dozen plus using LOCAL_SERIAL.

I could try to get more exact numbers if it would help, but back of the
envelope, 5:1 in favor of LOCAL is about right.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Tradeoffs for Cassandra transaction management

2021-10-10 Thread [email protected]
necessitate support for interactive transactions, let alone abort-free ones. 
The technique you mention can support SQL scripts, and also interactive client 
transactions that may be aborted by the server. However, see [1] which may 
support all of these properties.
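A minimal sketch of the OLLP loop from the Abadi quote earlier in the thread, for a transaction whose logic the server already holds but whose keys depend on data (the indexed-query case). Everything here is hypothetical: `owned_by`, `deterministic_run`, `submit_with_ollp`, and the in-memory store are illustrative names; "locks" are simulated by comparing key sets, and a real system would push the resubmission back through the ordered input log instead of looping locally.

```python
from typing import Callable, Dict, FrozenSet, Optional

Store = Dict[str, str]                      # toy state: row key -> owner
AccessFn = Callable[[Store], FrozenSet[str]]


def owned_by(owner: str) -> AccessFn:
    """Access set depends on data, like a query on an indexed 'owner' column."""
    return lambda store: frozenset(k for k, v in store.items() if v == owner)


def deterministic_run(store: Store, access: AccessFn, locked: FrozenSet[str],
                      new_owner: str) -> Optional[FrozenSet[str]]:
    """Actual run, in log order. Returns None on commit, else the corrected set."""
    actual = access(store)
    if actual != locked:
        # Every replica sees the same state here, so every replica aborts.
        return actual
    for key in actual:
        store[key] = new_owner
    return None


def submit_with_ollp(store: Store, access: AccessFn, new_owner: str,
                     concurrent_write: Optional[Callable[[Store, int], None]] = None,
                     max_attempts: int = 5) -> int:
    """Returns the attempt on which the transaction committed."""
    estimate = access(store)                  # trial run: reads only, writes nothing
    for attempt in range(1, max_attempts + 1):
        if concurrent_write is not None:
            concurrent_write(store, attempt)  # state may change before the actual run
        corrected = deterministic_run(store, access, estimate, new_owner)
        if corrected is None:
            return attempt
        estimate = corrected                  # resubmit with the new estimate
    raise RuntimeError("too many OLLP aborts")


def sneak_in(store: Store, attempt: int) -> None:
    """Hypothetical concurrent writer landing between trial and actual run."""
    if attempt == 1:
        store["row:2"] = "alice"


store = {"row:1": "alice", "row:2": "bob"}
print(submit_with_ollp(store, owned_by("alice"), "carol", sneak_in))  # 2
print(store)  # {'row:1': 'carol', 'row:2': 'carol'}
```

The retry is transparent only because the server holds `owned_by("alice")`; for the interactive inventory example in Jonathan's message there is no server-side equivalent of `access` to rerun, so the abort surfaces to the client.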



From: Blake Eggleston 
Date: Sunday, 10 October 2021 at 05:17
To: [email protected] 
Subject: Re: Tradeoffs for Cassandra transaction management
1. Is it worth giving up local latencies to get full global consistency? Most
LWT use cases use LOCAL_SERIAL.

This isn’t a tradeoff that needs to be made. There’s nothing about Accord that 
prevents performing consensus in one DC and replicating the writes to others. 
That’s not in scope for the initial work, but there’s no reason it couldn’t be 
handled as a follow on if needed. I agree with Jeff that LOCAL_SERIAL and LWTs 
are not usually done with a full understanding of the implications, but there 
are some valid use cases. For instance, you can enable an OLAP service to 
operate against another DC without impacting the primary, assuming the service 
can tolerate inconsistency for data written since the last repair, and there 
are some others.

2. Is it worth giving up the possibility of SQL support, to get the benefits of 
deterministic transaction design?

This is a false dilemma. Today, we’re proposing a deterministic transaction 
design that addresses some very common user pain points. SQL addresses 
a different user pain point. If someone wants to add an sql implementation in the 
future they can a) build it on top of accord b) extend or improve accord or c) 
implement a separate system. The right choice will depend on their goals, but 
accord won’t prevent work on it, the same way the original lwt design isn’t 
preventing work on multi-partition transactions. In the worst case, if the 
goals of a hypothetical sql project are different enough to make them 
incompatible with accord, I don’t see any reason why we couldn’t have 2 
separate consensus systems, so long as people are willing to maintain them and 
the use cases and available technologies justify it.

-Blake


Re: Tradeoffs for Cassandra transaction management

2021-10-09 Thread Blake Eggleston
1. Is it worth giving up local latencies to get full global consistency? Most
LWT use cases use LOCAL_SERIAL.

This isn’t a tradeoff that needs to be made. There’s nothing about Accord that 
prevents performing consensus in one DC and replicating the writes to others. 
That’s not in scope for the initial work, but there’s no reason it couldn’t be 
handled as a follow on if needed. I agree with Jeff that LOCAL_SERIAL and LWTs 
are not usually done with a full understanding of the implications, but there 
are some valid use cases. For instance, you can enable an OLAP service to 
operate against another DC without impacting the primary, assuming the service 
can tolerate inconsistency for data written since the last repair, and there 
are some others.
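One way to picture "consensus in one DC, replicating the writes to others", and the SLOG-style “home” region from Jonathan's message, is as a per-transaction routing decision. A minimal sketch, assuming every key has exactly one home region recorded in a hypothetical `home_of` map; `Path` and `route` are illustrative names, and neither the names nor the policy come from Accord or Cassandra.

```python
from enum import Enum
from typing import Dict, Iterable


class Path(Enum):
    LOCAL = "consensus within the home region; other regions replicate"
    GLOBAL = "consensus across regions"


def route(keys: Iterable[str], home_of: Dict[str, str]) -> Path:
    """Choose a consensus path from the home regions of the keys touched."""
    homes = {home_of[k] for k in keys}
    if not homes:
        raise ValueError("transaction touches no keys")
    # Single-home transactions can be ordered by local consensus only.
    return Path.LOCAL if len(homes) == 1 else Path.GLOBAL


home_of = {"user:1": "us-east", "user:2": "us-east", "user:3": "eu-west"}
print(route(["user:1", "user:2"], home_of).name)  # LOCAL
print(route(["user:1", "user:3"], home_of).name)  # GLOBAL
```

The parts this leaves out, namely what readers in other regions see before replication catches up and how multi-home transactions are ordered against single-home ones, are where the designs discussed in this thread differ.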

2. Is it worth giving up the possibility of SQL support, to get the benefits of 
deterministic transaction design? 

This is a false dilemma. Today, we’re proposing a deterministic transaction 
design that addresses some very common user pain points. SQL addresses 
a different user pain point. If someone wants to add an sql implementation in the 
future they can a) build it on top of accord b) extend or improve accord or c) 
implement a separate system. The right choice will depend on their goals, but 
accord won’t prevent work on it, the same way the original lwt design isn’t 
preventing work on multi-partition transactions. In the worst case, if the 
goals of a hypothetical sql project are different enough to make them 
incompatible with accord, I don’t see any reason why we couldn’t have 2 
separate consensus systems, so long as people are willing to maintain them and 
the use cases and available technologies justify it.
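The property that makes a deterministic transaction design attractive is that once replicas agree on the order of transactions, each one can execute that order independently and end in the same state, with no extra coordination at commit time. A toy sketch under that assumption: the ordering step (the part Raft, Paxos, or Accord would provide) is taken as given, and `transfer` and `replay` are hypothetical names.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]  # must be deterministic: no clocks, randomness, or I/O


def transfer(src: str, dst: str, amount: int) -> Txn:
    def apply(state: State) -> None:
        if state[src] >= amount:  # the decision depends only on replicated state
            state[src] -= amount
            state[dst] += amount
    return apply


def replay(log: List[Txn], initial: State) -> State:
    state = dict(initial)  # each replica starts from the same snapshot
    for txn in log:        # ...and applies the agreed order
        txn(state)
    return state


log = [transfer("a", "b", 70), transfer("a", "b", 50), transfer("b", "a", 20)]
replica_1 = replay(log, {"a": 100, "b": 0})
replica_2 = replay(log, {"a": 100, "b": 0})
print(replica_1 == replica_2, replica_1)  # True {'a': 50, 'b': 50}
```

The second transfer is skipped identically on both replicas because its condition depends only on state the log already fixed; that is also why the full transaction logic has to be known before execution, which is the SQL question raised in this thread.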

-Blake


Re: Tradeoffs for Cassandra transaction management

2021-10-09 Thread Jeff Jirsa
I'll read more of this in a bit, I want to make sure I fully digest it
before commenting on the rest, but this block here deserves a few words:


On Sat, Oct 9, 2021 at 9:54 AM Jonathan Ellis wrote:

> After putting the above together it seems to
> me that the two main areas of tradeoff are, 1. Is it worth giving up local
> latencies to get full global consistency?  Most LWT use cases use
> LOCAL_SERIAL.


Most LWT use cases use LOCAL_SERIAL because the difference in latency is
huge today (given the 4x RTTs) AND almost none of the users actually
understand how Cassandra replication or consistency works, so they
misunderstand the guarantees provided by the choice they make. When
informed of the actual tradeoffs, a LOT of those users switch to SERIAL.
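A back-of-envelope sketch of why the latency gap Jeff describes is so large, and of where the "100+ms" figure for global serialization comes from. These are assumptions, not measurements: light in fiber covers roughly 200 km per millisecond, an intra-DC round trip is taken as 1 ms, the inter-DC distance as 6,000 km, and the LWT path as the four round trips Jeff mentions. Real RTTs sit above this propagation-only bound.

```python
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber travels ~2/3 c, ~200 km/ms


def min_rtt_ms(distance_km: float) -> float:
    """Propagation-only lower bound for one round trip (no routing or queuing)."""
    return 2 * distance_km / FIBER_KM_PER_MS


def lwt_latency_ms(rtt_ms: float, round_trips: int = 4) -> float:
    """Latency if every round trip of the LWT path crosses the same link."""
    return round_trips * rtt_ms


local_rtt = 1.0                # assumed intra-DC RTT, in ms
cross_rtt = min_rtt_ms(6_000)  # assumed ~6,000 km between DCs
print(cross_rtt)                                              # 60.0
print(lwt_latency_ms(local_rtt), lwt_latency_ms(cross_rtt))  # 4.0 240.0
print(min_rtt_ms(10_000))                                     # 100.0
```

Roughly, LOCAL_SERIAL keeps the Paxos rounds inside one DC and pays something like the first number, while SERIAL across distant regions pays something like the second.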