Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread [email protected]
So, if we track per-key timestamps we are able to perform stale reads to 
assemble our transaction, and validate them on commit. This likely leads to 
much faster transactions than any other approach, as the interactive part all 
remains local.

If we instead perform Accord operations for every read operation within the 
transaction then I believe it would be safe to use the initiating timestamp, 
though this might also result in some additional aborts that would have been 
unnecessary (where a later read encounters a write that is newer than the 
initiating timestamp, but that might well be serializable/strict serializable).

If we perform only local reads and use the initiating timestamp only then we 
cannot be certain that we did not miss an earlier write than our timestamp that 
had not been replicated to us on a key that was not read by the initial 
operation.
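
A minimal sketch of the first option above (per-key timestamps, stale local reads, validation on commit). Every name here (`InteractiveTxn`, `validate_and_commit`, the dict-based replica) is hypothetical and only illustrates the idea; it is not Accord's API, and a real implementation would validate through consensus rather than against an in-memory map.

```python
from dataclasses import dataclass, field


@dataclass
class InteractiveTxn:
    """Hypothetical client-side state for one interactive transaction."""
    witnessed: dict = field(default_factory=dict)  # key -> timestamp seen by our read
    writes: dict = field(default_factory=dict)     # key -> value to write on commit

    def read(self, replica, key):
        # A local, possibly stale read; remember which version we saw.
        value, ts = replica.get(key, (None, 0))
        self.witnessed[key] = ts
        return value


def validate_and_commit(txn, latest, commit_ts):
    """Apply txn.writes to `latest` only if every witnessed version is still current."""
    for key, seen_ts in txn.witnessed.items():
        _, latest_ts = latest.get(key, (None, 0))
        if latest_ts != seen_ts:
            return False  # a newer write exists: abort, the caller may retry
    for key, value in txn.writes.items():
        latest[key] = (value, commit_ts)
    return True


# The local replica has not yet received the write to "b" at timestamp 12.
latest = {"a": (1, 10), "b": (2, 12)}
local_replica = {"a": (1, 10), "b": (0, 5)}

txn = InteractiveTxn()
txn.read(local_replica, "a")
txn.read(local_replica, "b")
txn.writes["c"] = 3
committed = validate_and_commit(txn, latest, commit_ts=20)  # False: "b" was stale
```

The interactive part touches only the local replica; the single validation step is where the missed write to "b" is caught.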

From: Henrik Ingo 
Date: Wednesday, 13 October 2021 at 15:50
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Oct 13, 2021 at 2:54 PM [email protected] 
wrote:

> I think this is a blurring of lines of systems however. I _think_ the
> point Alex is making (correct me if I’m wrong) is that the transaction
> system will need to track the transaction timestamps that were witnessed by
> each read for each key, in order to verify that they remain valid on
> commit.


Isn't it sufficient to simply verify that there were no conflicting writes
between a start timestamp of the transaction and the commit timestamp?

I can imagine verifying the timestamp of each row or cell could result in
"finer grained" dependency checking and therefore cause fewer aborts due to
OCC.

henrik
--

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 2:54 PM [email protected] 
wrote:

> I think this is a blurring of lines of systems however. I _think_ the
> point Alex is making (correct me if I’m wrong) is that the transaction
> system will need to track the transaction timestamps that were witnessed by
> each read for each key, in order to verify that they remain valid on
> commit.


Isn't it sufficient to simply verify that there were no conflicting writes
between a start timestamp of the transaction and the commit timestamp?

I can imagine verifying the timestamp of each row or cell could result in
"finer grained" dependency checking and therefore cause fewer aborts due to
OCC.

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread [email protected]
> In the context of Cassandra, I had actually assumed the Accord timestamp will 
> be used as the cell timestamp for each value? Isn't something like this 
> needed for compaction to work correctly too?

Yes, though we are likely to apply some kind of compression to the timestamp, 
as global timestamps may not fit in a single long and I would prefer not to 
burden the storage system with that complexity. So, probably, when multiple 
transactions are agreed with the same wall clock but different global 
timestamps we are likely to increment the timestamp that is applied to the 
local node. That is to say, the storage timestamp will be derived from the 
transaction timestamp and the transaction timestamps of its dependencies. In 
reality this will come into play very rarely, of course.
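
One way to picture the derivation, as a sketch only: the rule below (take the transaction's wall clock, but never less than one past the highest storage timestamp among its dependencies) is an assumption made for illustration, and `storage_timestamp` is a hypothetical name rather than anything in Accord or Cassandra.

```python
def storage_timestamp(txn_wall_clock, dependency_storage_timestamps):
    """Derive a timestamp that fits in one long from the transaction's wall
    clock and the storage timestamps already given to its dependencies
    (assumed rule, for illustration only)."""
    highest_dep = max(dependency_storage_timestamps, default=-1)
    # Usually the wall clock wins; we only bump when a dependency already
    # occupies the same (or a later) storage timestamp.
    return max(txn_wall_clock, highest_dep + 1)


no_clash = storage_timestamp(100, [90, 95])  # 100: wall clock is already unique
clash = storage_timestamp(100, [100])        # 101: same wall clock, so increment
```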

I think this is a blurring of lines of systems however. I _think_ the point 
Alex is making (correct me if I’m wrong) is that the transaction system will 
need to track the transaction timestamps that were witnessed by each read for 
each key, in order to verify that they remain valid on commit. These might both 
be fetched from the storage system on each round (or might be from Accord’s 
non-interactive transaction bookkeeping), but the _interactive_ transaction 
bookkeeping will need to maintain these values separately as part of the 
interactive transaction state (perhaps on the client).

> Alternatively … some backpressure mechanism seems necessary to throttle new
> transactions while previously committed ones are still being applied

Yes, this is something I envisage being desirable even without complex 
transactions to prevent DOS problems. We likely want to prevent new 
transactions from being started if the dependency set they would adopt is too 
large, and I think this is relatively straightforward.
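
As a rough sketch of that admission check, assuming we can enumerate in-flight transactions and the keys they touch (`admit`, `TooManyDependencies` and `max_dependencies` are hypothetical names, not part of the proposal):

```python
class TooManyDependencies(Exception):
    """Raised when a new transaction would adopt too large a dependency set."""


def admit(keys, in_flight, max_dependencies):
    """Return the dependency set a new transaction over `keys` would adopt,
    or refuse to start it if that set is too large.

    `in_flight` maps transaction id -> set of keys it touches."""
    deps = {txn_id for txn_id, txn_keys in in_flight.items() if txn_keys & keys}
    if len(deps) > max_dependencies:
        raise TooManyDependencies(f"{len(deps)} > {max_dependencies}")
    return deps


in_flight = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"c"}}
deps = admit({"a"}, in_flight, max_dependencies=2)  # {"t1", "t2"}
```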


From: Henrik Ingo 
Date: Wednesday, 13 October 2021 at 11:25
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Oct 13, 2021 at 12:25 AM Alex Miller  wrote:

> I have, purely out of laziness, been engaging on this topic on ASF Slack as
> opposed to dev@[1].  Benedict has been overly generous in answering
> questions and considering future optimizations there, but it means that I
> inadvertently forked the conversation on this topic.  To bring the
> highlights of that conversation back to the dev list:
>
>
Thanks for contributing to the discussion Alex! Your points and experience
seem rather valuable.




> == Interactive Transactions
>
>
Heh, it seems we sent these almost concurrently :-) Thanks for contributing
this. I think for many readers debating concrete examples is easier, even
if we are talking about future opportunities that are not in scope for the
CEP. It helps to see a path forward.


> We also had a bit of discussion over implementation constraints on the
> conflict checking.  Without supporting optimistic transactions, Accord only
> needs to keep track of the read/write sets of transactions which are still
> in flight.  To support optimistic transactions, Accord would need to
> bookkeep the most recent timestamp at which the key was modified, for every
> key.  There's some databases (e.g. CockroachDB, FoundationDB) which have a
> similar need, and use similar data structures which could be copied.
>
>
In the context of Cassandra, I had actually assumed the Accord timestamp
will be used as the cell timestamp for each value? Isn't something like
this needed for compaction to work correctly too?

> Committing a transaction before execution means the database is committed
> to performing the deferred work of transaction execution.  In some fashion,
> the expressiveness and complexity of the query language needs to be
> constrained to place limitations on the execution time or resources. Fauna
> invented FQL with a specific set of limitations for a presumable reason.
> CQL seems to already be a reasonably limited query language that doesn't
> easily lend itself to succinctly expressing an inordinate amount of work,
> which would make it already reasonably suited as a query language for
> Accord.
>
>
Alternatively - in a future where the query language evolves to be more
complex - some backpressure mechanism seems necessary to throttle new
transactions while previously committed ones are still being applied. (For
those of you that started reading up on Galera from my previous email, see
"flow control")




> Any query which can't pre-declare its read and write sets must attempt to
> pre-execute enough of the query to determine them, and then submit the
> transaction as optimistic on all values read during the partial execution
> still being untouched.  Most notably, all workloads that utilize secondary
> indexes are affected, and degrade from being guaranteed to commit, to being
> optimistic and potentially requiri

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 12:25 AM Alex Miller  wrote:

> I have, purely out of laziness, been engaging on this topic on ASF Slack as
> opposed to dev@[1].  Benedict has been overly generous in answering
> questions and considering future optimizations there, but it means that I
> inadvertently forked the conversation on this topic.  To bring the
> highlights of that conversation back to the dev list:
>
>
Thanks for contributing to the discussion Alex! Your points and experience
seem rather valuable.




> == Interactive Transactions
>
>
Heh, it seems we sent these almost concurrently :-) Thanks for contributing
this. I think for many readers debating concrete examples is easier, even
if we are talking about future opportunities that are not in scope for the
CEP. It helps to see a path forward.


> We also had a bit of discussion over implementation constraints on the
> conflict checking.  Without supporting optimistic transactions, Accord only
> needs to keep track of the read/write sets of transactions which are still
> in flight.  To support optimistic transactions, Accord would need to
> bookkeep the most recent timestamp at which the key was modified, for every
> key.  There's some databases (e.g. CockroachDB, FoundationDB) which have a
> similar need, and use similar data structures which could be copied.
>
>
In the context of Cassandra, I had actually assumed the Accord timestamp
will be used as the cell timestamp for each value? Isn't something like
this needed for compaction to work correctly too?

> Committing a transaction before execution means the database is committed
> to performing the deferred work of transaction execution.  In some fashion,
> the expressiveness and complexity of the query language needs to be
> constrained to place limitations on the execution time or resources. Fauna
> invented FQL with a specific set of limitations for a presumable reason.
> CQL seems to already be a reasonably limited query language that doesn't
> easily lend itself to succinctly expressing an inordinate amount of work,
> which would make it already reasonably suited as a query language for
> Accord.
>
>
Alternatively - in a future where the query language evolves to be more
complex - some backpressure mechanism seems necessary to throttle new
transactions while previously committed ones are still being applied. (For
those of you that started reading up on Galera from my previous email, see
"flow control")




> Any query which can't pre-declare its read and write sets must attempt to
> pre-execute enough of the query to determine them, and then submit the
> transaction as optimistic on all values read during the partial execution
> still being untouched.  Most notably, all workloads that utilize secondary
> indexes are affected, and degrade from being guaranteed to commit, to being
> optimistic and potentially requiring retries.  This transformed Calvin into
> an optimistic protocol, and one that's significantly less efficient than
> classic execute-then-commit designs.  Accord is similarly affected, though
> the window of optimism would likely be smaller.  However, it seems like
> most common ways to end up in this situation are already discouraged or
> prevented.  CQL's own restrictions prevent many forms of queries which
> result in unclear read and write sets.  In my highly limited Cassandra
> experience, I've generally seen Secondary Indexes be cautioned against
> already.
>
>
See CEP-7 which independently is proposing a new set of secondary indexes
that we hope to be usable.

Rather than needing to re-execute anything, in my head I had thought that
for Accord to support secondary indexes, the write set is extended to also
cover the secondary index keys read or modified. Essentially this is like
thinking of a secondary index as its own primary key. Mutations that change
indexed columns, would add both their PK to the write set, as well as the
secondary index keys it modified. A read query would then check its
dependencies against whatever indexes (PK, or secondary) it uses to execute
itself, and nothing more.

The above is saying that for a given snapshot/timestamp, the result of a
statement is equally well defined by the secondary index keys used as it is
by the primary keys returned from those secondary index keys.
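
To make the idea concrete, here is a small sketch in which index entries are treated as conflict keys alongside primary keys. The tuple encoding and the function names are made up for illustration; nothing here reflects how CEP-7 or Accord would actually represent keys.

```python
def write_conflict_keys(table, pk, old_row, new_row, indexed_columns):
    """Keys a mutation adds to its write set: its primary key plus every
    secondary index entry it removes or creates (hypothetical encoding)."""
    keys = {("pk", table, pk)}
    for col in indexed_columns:
        old, new = old_row.get(col), new_row.get(col)
        if old != new:
            if old is not None:
                keys.add(("idx", table, col, old))
            if new is not None:
                keys.add(("idx", table, col, new))
    return keys


def read_conflict_keys(table, index_column, value):
    """Keys an index lookup depends on: only the index entry it used."""
    return {("idx", table, index_column, value)}


# A user moves from Oslo to Turku; "city" is an indexed column.
write_keys = write_conflict_keys(
    "users", 42, {"city": "Oslo"}, {"city": "Turku"}, indexed_columns=["city"]
)
conflicts_turku = bool(write_keys & read_conflict_keys("users", "city", "Turku"))
conflicts_helsinki = bool(write_keys & read_conflict_keys("users", "city", "Helsinki"))
```

A query for users in Turku conflicts with the mutation, while a query for users in Helsinki does not, without either query having to be pre-executed to discover primary keys.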

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-12 Thread [email protected]
Thanks Alex! I’ve hugely appreciated our exploration of the optimisation space 
of Accord, and for you to have taken the time to summarise it for everyone is 
particularly decent of you.

FWIW, I think there are likely some easy optimisations for providing snapshot 
isolation without an initial WAN round-trip for many transactions. Replicas 
will be tracking their progress with respect to the global log, and so may 
maintain a high watermark for applied transactions, so that if the latest 
timestamps on all replicas for the keys occur below their high watermark then 
the local replicas are consistent as of any timestamp we may select earlier 
than this (and depending how (or if) MVCC is implemented we may prefer to pick 
the latest timestamp, or an earlier one). If we later involve a shard that is 
not consistent up to this timestamp then we may need a WAN round-trip to ensure 
it is consistent (but this might not need to be global, only to the nearest DC 
that has a sufficiently high watermark).

I could imagine using this mechanism to guarantee serializable reads over the 
LAN by ensuring shards maintain MVCC history that goes far enough back to 
intersect with the lowest high watermark, so that we may always pick a 
consistent timestamp.
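
A sketch of the watermark check described above, under simplifying assumptions: each replica is a plain dict exposing a `high_watermark` and the latest write timestamp per key, and we pick the lowest watermark as the snapshot timestamp. These names and the choice of the minimum are illustrative, not part of the proposal.

```python
def pick_snapshot_timestamp(replicas, keys):
    """Return a timestamp at which the local replicas are consistent for
    `keys`, or None if a WAN round-trip would be needed first."""
    candidate = min(r["high_watermark"] for r in replicas)
    for r in replicas:
        for key in keys:
            # A write at or above the replica's watermark may not yet be
            # fully ordered with respect to the global log.
            if r["latest_ts"].get(key, 0) >= r["high_watermark"]:
                return None
    return candidate


replicas = [
    {"high_watermark": 50, "latest_ts": {"a": 40}},
    {"high_watermark": 45, "latest_ts": {"a": 40, "b": 44}},
]
snapshot_ts = pick_snapshot_timestamp(replicas, ["a", "b"])  # 45
```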


From: Alex Miller 
Date: Tuesday, 12 October 2021 at 22:25
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I have, purely out of laziness, been engaging on this topic on ASF Slack as
opposed to dev@[1].  Benedict has been overly generous in answering
questions and considering future optimizations there, but it means that I
inadvertently forked the conversation on this topic.  To bring the
highlights of that conversation back to the dev list:

[1]: https://the-asf.slack.com/archives/CK23JSY2K/p1631611705108600

== Reduced Conflict Tracking

The Accord whitepaper specifies a transaction conflict as:

> We say that two transactions γ and τ conflict (γ ∼ τ) if their execution
> is not commutative, so that either their response or the database state
> would differ if their execution order were reversed.

Which means that all conflicts the protocol is subsequently tracking are
the full set of read-after-write, write-after-write, and write-after-read
conflicts.  This is a superset of what is required for correctness.

write-after-read conflicts may be ignored when the underlying storage is
multi-version, and I'm told the plan is that Accord would be implemented on
top of multiversioned storage.  A read submitted to a multi-versioned
database is unaffected by writes that occur later, and as such,
write-after-read conflicts don't need to be tracked.

Write-after-write conflicts may be ignored, as Accord assigns a single
write timestamp to all writes, and all writes appear atomically at a single
consistent version.  This means that Accord implements write snapshots[2],
and thus it is impossible to cause a cycle of transaction conflicts with
only writes, so they don't need to be recorded as conflicts.

Thus, Accord only needs to track read-after-write conflicts, which is a
nice reduction to the metadata overhead involved in tracking and
propagating transaction conflicts.
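
As an illustration of the reduction, the sketch below records a dependency only when the new transaction reads a key that an earlier transaction writes. The dict-based transaction shape and `tracked_dependencies` are hypothetical, and the sketch assumes multi-versioned storage and a single write timestamp per transaction, as argued above.

```python
def tracked_dependencies(new_txn, earlier_txns):
    """Ids of earlier transactions the new one must depend on when only
    read-after-write conflicts are tracked."""
    return {
        t["id"]
        for t in earlier_txns
        if t["writes"] & new_txn["reads"]  # new txn reads what t wrote
    }


earlier = [
    {"id": "t1", "reads": set(), "writes": {"a"}},  # read-after-write on "a"
    {"id": "t2", "reads": {"a"}, "writes": {"b"}},  # write-after-write on "b": ignored
    {"id": "t3", "reads": {"c"}, "writes": set()},  # write-after-read on "c": ignored
]
new_txn = {"id": "t4", "reads": {"a"}, "writes": {"b", "c"}}
deps = tracked_dependencies(new_txn, earlier)  # {"t1"}
```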

[2]: Maysam Yabandeh and Daniel Gómez Ferro. 2012. A critique of snapshot
isolation. In Proceedings of the 7th ACM European conference on Computer
Systems (EuroSys ’12). Association for Computing Machinery, New York, NY,
USA, 155–168. DOI:https://doi.org/10.1145/2168836.2168853

== Read-Only Transaction Optimizations

As previously mentioned in this list, Calvin-derived designs end up in an
uncomfortable situation where strictly-serializable reads need to be
committed to disk as part of a batch to be assigned a serialization order,
and then wait for all previously scheduled transactions to finish before
performing the reads.  This brings me sadness in two different ways:
strictly serializable reads have high latency, and read-only transactions
involve writes to disk.  Stale read snapshots are offered as a way to avoid
downsides, but require being able to tolerate staleness.

The Accord whitepaper specifies journaling a read-only transaction to disk
as part of PreAccept to record both the existence of the transaction and
its conflicts.  As read-only transactions don't affect the database state,
it's okay to not have durable consensus on if they committed or not.
Read-only transactions have no side effects by definition, and one may rely
on clients to retry if the read-only transaction failed, thus PreAccept
doesn't need to durably record the existence of Read-Only transactions.
Nor does it need to track them for dependencies/conflict reasons, as such
information would only be needed to track write-after-read conflicts, which
we may omit as discussed above.

Additionally, as read-only transactions will always be aborted during
recovery, they may treat a majority quorum as a fastpath quorum, and never
need to proceed into a second round in o

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-12 Thread Alex Miller
tations on what higher-level workloads could
be supported in the future.

So I'm +1 the work, as it seems to be a general purpose and interesting
transaction protocol, but I'm also just here because I thought Benedict was
nice enough that I could trick him into discussing transaction processing
with me. ;)

On Mon, Oct 11, 2021 at 9:08 AM Aleksey Yeschenko 
wrote:

> Lacking the most basic support for multi-partition transactions is a
> serious handicap. The CEP offers a concrete solution.
>
> It’s possible to solve multi-partition transactions in a myriad of other
> ways, I’m sure, but CEP-15 is what’s on offer for Cassandra at the moment,
> and I’m not seeing any alternative CEPs with folks lined up to implement
> them.
>
> The CEP is a clear and meaningful improvement over status quo. The
> engineers behind it are committed to doing the implementation work and can
> be trusted to stick around for maintenance. It’s been a month now, please,
> let’s get this going.
>
> > On 11 Oct 2021, at 13:43, [email protected] wrote:
> >
> > For those who missed it, my talk discussing this CEP at ApacheCon is now
> available to view:  https://www.youtube.com/watch?v=YAE7E-QEAvk
> >
> >
> >
> > From: Oleksandr Petrov 
> > Date: Monday, 11 October 2021 at 10:11
> > To: dev 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> I support this proposal. From what I can understand, this proposal moves
> >> us towards having the building blocks we need to correctly deliver some of
> >> the most often requested features in Cassandra.
> >
> > Same here. I also support this proposal and believe it opens up many new
> > opportunities (while not limiting us / not narrowing our future options),
> > can help us implement features we've all wanted to have implemented for
> > years, and make significant improvements in the subsystems that were a
> > source of issues for a long time.
> >
> > I think it's also good to start with CAS batches: it's a great way to make
> > the feature available and work incrementally. After this lands, people will
> > be able to use Accord/MPT in different subsystems and get busy
> > implementing all sorts of other features and improvements on top of it.
> >
> >
> >
> >
> > On Sat, Oct 9, 2021 at 4:18 PM Joseph Lynch 
> wrote:
> >
> >>> With the proposal hitting the one-month mark, the contributors are
> >>> interested in gauging the developer community's response to the proposal.
> >>
> >> I support this proposal. From what I can understand, this proposal
> >> moves us towards having the building blocks we need to correctly
> >> deliver some of the most often requested features in Cassandra. For
> >> example it seems to unlock: batches that actually work, registers that
> >> offer fast compare and swap, global secondary indices that can be
> >> correctly maintained, and more. Therefore, given the benefit to the
> >> community, I support working towards that foundation that will allow
> >> us to build solutions in Cassandra that pay consensus closer to
> >> mutation instead of lazily at read/repair time.
> >>
> >> I think the feedback in this thread around interface (what statements
> >> will this facilitate and how will the library integrate with Cassandra
> >> itself), performance (how fast will these transactions be, will we
> >> offer bounded stale reads, etc ...), and implementation (how does this
> >> compare/contrast with other consensus approaches) has been
> >> informative, but at this point I think it makes sense to start trying
> >> to make incremental progress towards a functional integration to
> >> discover any remaining areas for improvement.
> >>
> >> Cheers and thank you!
> >> -Joey
> >>
> >>
> >>
> >> On Thu, Oct 7, 2021 at 10:51 AM C. Scott Andreas 
> >> wrote:
> >>>
> >>> Hi Jonathan,
> >>>
> >>> Following up on my message yesterday as it looks like our replies may
> >> have crossed en route.
> >>>
> >>> Thanks for bumping your message from earlier in our discussion. I
> >> believe we have addressed most of these questions on the thread, in
> >> addition to offering a presentation on this and related work at
> ApacheCon,
> >> a discussion hosted following that presentation at ApacheCon, and in ASF
> >> Slack. Contributors have further offered an opportunity to discuss
> >> specific questions via videoc

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-11 Thread Aleksey Yeschenko
Lacking the most basic support for multi-partition transactions is a serious 
handicap. The CEP offers a concrete solution.

It’s possible to solve multi-partition transactions in a myriad of other ways, 
I’m sure, but CEP-15 is what’s on offer for Cassandra at the moment, and I’m 
not seeing any alternative CEPs with folks lined up to implement them.

The CEP is a clear and meaningful improvement over status quo. The engineers 
behind it are committed to doing the implementation work and can be trusted to 
stick around for maintenance. It’s been a month now, please, let’s get this 
going.

> On 11 Oct 2021, at 13:43, [email protected] wrote:
> 
> For those who missed it, my talk discussing this CEP at ApacheCon is now 
> available to view:  https://www.youtube.com/watch?v=YAE7E-QEAvk
> 
> 
> 
> From: Oleksandr Petrov 
> Date: Monday, 11 October 2021 at 10:11
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> I support this proposal. From what I can understand, this proposal moves
>> us towards having the building blocks we need to correctly deliver some of
>> the most often requested features in Cassandra.
> 
> Same here. I also support this proposal and believe it opens up many new
> opportunities (while not limiting us / not narrowing our future options),
> can help us implement features we've all wanted to have implemented for
> years, and make significant improvements in the subsystems that were a
> source of issues for a long time.
> 
> I think it's also good to start with CAS batches: it's a great way to make
> the feature available and work incrementally. After this lands, people will
> be able to use Accord/MPT in different subsystems and get busy
> implementing all sorts of other features and improvements on top of it.
> 
> 
> 
> 
> On Sat, Oct 9, 2021 at 4:18 PM Joseph Lynch  wrote:
> 
>>> With the proposal hitting the one-month mark, the contributors are
>> interested in gauging the developer community's response to the proposal.
>> 
>> I support this proposal. From what I can understand, this proposal
>> moves us towards having the building blocks we need to correctly
>> deliver some of the most often requested features in Cassandra. For
>> example it seems to unlock: batches that actually work, registers that
>> offer fast compare and swap, global secondary indices that can be
>> correctly maintained, and more. Therefore, given the benefit to the
>> community, I support working towards that foundation that will allow
>> us to build solutions in Cassandra that pay consensus closer to
>> mutation instead of lazily at read/repair time.
>> 
>> I think the feedback in this thread around interface (what statements
>> will this facilitate and how will the library integrate with Cassandra
>> itself), performance (how fast will these transactions be, will we
>> offer bounded stale reads, etc ...), and implementation (how does this
>> compare/contrast with other consensus approaches) has been
>> informative, but at this point I think it makes sense to start trying
>> to make incremental progress towards a functional integration to
>> discover any remaining areas for improvement.
>> 
>> Cheers and thank you!
>> -Joey
>> 
>> 
>> 
>> On Thu, Oct 7, 2021 at 10:51 AM C. Scott Andreas 
>> wrote:
>>> 
>>> Hi Jonathan,
>>> 
>>> Following up on my message yesterday as it looks like our replies may
>> have crossed en route.
>>> 
>>> Thanks for bumping your message from earlier in our discussion. I
>> believe we have addressed most of these questions on the thread, in
>> addition to offering a presentation on this and related work at ApacheCon,
>> a discussion hosted following that presentation at ApacheCon, and in ASF
>> Slack. Contributors have further offered an opportunity to discuss
>> specific questions via videoconference if it helps to speak live. I'd be
>> happy to do so as well.
>>> 
>>> Since your original message, discussion has covered a lot of ground on
>> the related databases you've mentioned:
>>> – Henrik has shared expertise related to MongoDB and its implementation.
>>> – You've shared an overview of Calvin.
>>> – Alex Miller has helped us review the work relative to other Paxos
>> algorithms and identified a few great enhancements to incorporate.
>>> – The paper discusses related approaches in FoundationDB, CockroachDB,
>> and Yugabyte.
>>> – Subsequent discussion has contrasted the implementation to DynamoDB,
>> Google Cloud BigTable, and Google Cloud Spanner (noting specifically t

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-11 Thread [email protected]
For those who missed it, my talk discussing this CEP at ApacheCon is now 
available to view:  https://www.youtube.com/watch?v=YAE7E-QEAvk



From: Oleksandr Petrov 
Date: Monday, 11 October 2021 at 10:11
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I support this proposal. From what I can understand, this proposal moves
> us towards having the building blocks we need to correctly deliver some of
> the most often requested features in Cassandra.

Same here. I also support this proposal and believe it opens up many new
opportunities (while not limiting us / not narrowing our future options),
can help us implement features we've all wanted to have implemented for
years, and make significant improvements in the subsystems that were a
source of issues for a long time.

I think it's also good to start with CAS batches: it's a great way to make
the feature available and work incrementally. After this lands, people will
be able to use Accord/MPT in different subsystems and get busy
implementing all sorts of other features and improvements on top of it.




On Sat, Oct 9, 2021 at 4:18 PM Joseph Lynch  wrote:

> > With the proposal hitting the one-month mark, the contributors are
> interested in gauging the developer community's response to the proposal.
>
> I support this proposal. From what I can understand, this proposal
> moves us towards having the building blocks we need to correctly
> deliver some of the most often requested features in Cassandra. For
> example it seems to unlock: batches that actually work, registers that
> offer fast compare and swap, global secondary indices that can be
> correctly maintained, and more. Therefore, given the benefit to the
> community, I support working towards that foundation that will allow
> us to build solutions in Cassandra that pay consensus closer to
> mutation instead of lazily at read/repair time.
>
> I think the feedback in this thread around interface (what statements
> will this facilitate and how will the library integrate with Cassandra
> itself), performance (how fast will these transactions be, will we
> offer bounded stale reads, etc ...), and implementation (how does this
> compare/contrast with other consensus approaches) has been
> informative, but at this point I think it makes sense to start trying
> to make incremental progress towards a functional integration to
> discover any remaining areas for improvement.
>
> Cheers and thank you!
> -Joey
>
>
>
> On Thu, Oct 7, 2021 at 10:51 AM C. Scott Andreas 
> wrote:
> >
> > Hi Jonathan,
> >
> > Following up on my message yesterday as it looks like our replies may
> have crossed en route.
> >
> > Thanks for bumping your message from earlier in our discussion. I
> believe we have addressed most of these questions on the thread, in
> addition to offering a presentation on this and related work at ApacheCon,
> a discussion hosted following that presentation at ApacheCon, and in ASF
> Slack. Contributors have further offered an opportunity to discuss
> specific questions via videoconference if it helps to speak live. I'd be
> happy to do so as well.
> >
> > Since your original message, discussion has covered a lot of ground on
> the related databases you've mentioned:
> > – Henrik has shared expertise related to MongoDB and its implementation.
> > – You've shared an overview of Calvin.
> > – Alex Miller has helped us review the work relative to other Paxos
> algorithms and identified a few great enhancements to incorporate.
> > – The paper discusses related approaches in FoundationDB, CockroachDB,
> and Yugabyte.
> > – Subsequent discussion has contrasted the implementation to DynamoDB,
> Google Cloud BigTable, and Google Cloud Spanner (noting specifically that
> the protocol achieves Spanner's 1x round-trip without requiring specialized
> hardware).
> >
> > In my reply yesterday, I've attempted to crystallize what becomes
> possible via CQL: one-shot multi-partition transactions in the first
> implementation and a 4x latency reduction on writes / 2x latency reduction
> on reads relative to today; along with the ability to build upon this work
> to enable interactive transactions in the future.
> >
> > I believe we've exercised the questions you've raised and am grateful
> for the ground we've covered. If you have further questions that are
> difficult to exercise via email, please let me know if you'd like to
> arrange a call (open-invite); we'd be happy to discuss live as well.
> >
> > With the proposal hitting the one-month mark, the contributors are
> interested in gauging the developer community's response to the proposal.
> We war

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-11 Thread Oleksandr Petrov
nt in Calvin
> since
> > the coordination is handled up front in the sequencing step. Glass half
> > empty: even single-row reads and writes have to pay the full coordination
> > cost. Fauna has optimized this away for reads but I am not aware of a
> > description of how they changed the design to allow this.
> >
> > Functionality and limitations: since the entire transaction must be known
> > in advance to allow coordination-less execution at the replicas, Calvin
> > cannot support interactive transactions at all. FaunaDB mitigates this by
> > allowing server-side logic to be included, but a Calvin approach will
> never
> > be able to offer SQL compatibility.
> >
> > Guarantees: Calvin transactions are strictly serializable. There is no
> > additional complexity or performance hit to generalizing to multiple
> > regions, apart from the speed of light. And since Calvin is already
> paying
> > a batching latency penalty, this is less painful than for other systems.
> >
> > Application to Cassandra: B-. Distributed transactions are handled by the
> > sequencing and scheduling layers, which are leaderless, and Calvin’s
> > requirements for the storage layer are easily met by C*. But Calvin also
> > requires a global consensus protocol and LWT is almost certainly not
> > sufficiently performant, so this would require ZK or etcd (reasonable
> for a
> > library approach but not for replacing LWT in C* itself), or an
> > implementation of Accord. I don’t believe Calvin would require additional
> > table-level metadata in Cassandra.
> >
> > On Wed, Oct 6, 2021 at 9:53 AM [email protected] 
> > wrote:
> >
> > The problem with dropping a patch on Jira is that there is no opportunity
> > to point out problems, either with the fundamental approach or with the
> > specific implementation. So please point out some problems I can engage
> > with!
> >
> >
> > From: Jonathan Ellis 
> > Date: Wednesday, 6 October 2021 at 15:48
> > To: dev 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > On Wed, Oct 6, 2021 at 9:21 AM [email protected] 
> > wrote:
> >
> > > The goals of the CEP are stated clearly, and these were the goals we
> had
> > > going into the (multi-month) research project we undertook before
> > proposing
> > > this CEP. These goals are necessarily value judgements, so we cannot
> > expect
> > > that everyone will agree that they are optimal.
> > >
> >
> > Right, so I'm saying that this is exactly the most important thing to get
> > consensus on, and creating a CEP for a protocol to achieve goals that you
> > have not discussed with the community is the CEP equivalent of dropping a
> > patch on Jira without discussing its goals either.
> >
> > That's why our conversations haven't gone anywhere, because I keep saying
> > "we need to discuss the goals and tradeoffs", and I'll give an example of
> what
> > I mean, and you keep addressing the examples (sometimes very shallowly,
> "it
> > would be possible to X" or "Y could be done as an optimization") while
> > ignoring the request to open a discussion around the big picture.
> >
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >
> >
> >
>
>

-- 
alex p


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-09 Thread Joseph Lynch
 to achieve those goals (and that it will impose on future work around
> transactions) are the right ones for Cassandra long term.
>
> At this point I'm done repeating myself. For the convenience of anyone
> following this thread intermittently, I'll quote my first reply on this
> thread to illustrate the kind of discussion I'd like to have.
>
> -
>
> The whitepaper here is a good description of the consensus algorithm itself
> as well as its robustness and stability characteristics, and its comparison
> with other state-of-the-art consensus algorithms is very useful. In the
> context of Cassandra, where a consensus algorithm is only part of what will
> be implemented, I'd like to see a more complete evaluation of the
> transactional side of things as well, including performance characteristics
> as well as the types of transactions that can be supported and at least a
> general idea of what it would look like applied to Cassandra. This will
> allow the PMC to make a more informed decision about what tradeoffs are
> best for the entire long-term project of first supplementing and ultimately
> replacing LWT.
>
> (Allowing users to mix LWT and AP Cassandra operations against the same
> rows was probably a mistake, so in contrast with LWT we’re not looking for
> something fast enough for occasional use but rather something within a
> reasonable factor of AP operations, appropriate to being the only way to
> interact with tables declared as such.)
>
> Besides Accord, this should cover
>
> - Calvin and FaunaDB
> - A Spanner derivative (no opinion on whether that should be Cockroach or
> Yugabyte, I don’t think it’s necessary to cover both)
> - A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
> there is more public information about MongoDB)
> - RAMP
>
> Here’s an example of what I mean:
>
> =Calvin=
>
> Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
> transactions, then replicas execute the transactions independently with no
> further coordination. No SPOF. Transactions are batched by each sequencer
> to keep this from becoming a bottleneck.
>
> Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
> New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
> with 7GB ram and 8 virtual cores). Note that TPC-C New Order is composed
> of four reads and four writes, so this is effectively 2M reads and 2M
> writes as we normally measure them in C*.
>
> Calvin supports mixed read/write transactions, but because the transaction
> execution logic requires knowing all partition keys in advance to ensure
> that all replicas can reproduce the same results with no coordination,
> reads against non-PK predicates must be done ahead of time (transparently,
> by the server) to determine the set of keys, and this must be retried if
> the set of rows affected is updated before the actual transaction executes.
>
> Batching and global consensus add latency -- 100ms in the Calvin paper and
> apparently about 50ms in FaunaDB. Glass half full: all transactions
> (including multi-partition updates) are equally performant in Calvin since
> the coordination is handled up front in the sequencing step. Glass half
> empty: even single-row reads and writes have to pay the full coordination
> cost. Fauna has optimized this away for reads but I am not aware of a
> description of how they changed the design to allow this.
>
> Functionality and limitations: since the entire transaction must be known
> in advance to allow coordination-less execution at the replicas, Calvin
> cannot support interactive transactions at all. FaunaDB mitigates this by
> allowing server-side logic to be included, but a Calvin approach will never
> be able to offer SQL compatibility.
>
> Guarantees: Calvin transactions are strictly serializable. There is no
> additional complexity or performance hit to generalizing to multiple
> regions, apart from the speed of light. And since Calvin is already paying
> a batching latency penalty, this is less painful than for other systems.
>
> Application to Cassandra: B-. Distributed transactions are handled by the
> sequencing and scheduling layers, which are leaderless, and Calvin’s
> requirements for the storage layer are easily met by C*. But Calvin also
> requires a global consensus protocol and LWT is almost certainly not
> sufficiently performant, so this would require ZK or etcd (reasonable for a
> library approach but not for replacing LWT in C* itself), or an
> implementation of Accord. I don’t believe Calvin would require additional
> table-level metadata in Cassandra.
>
> On Wed, Oct 6, 2021 at 9:53 AM [email protected] 
> wrote:
>
> The problem with dropping

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-08 Thread Jonathan Ellis
 decision about what tradeoffs are
> best for the entire long-term project of first supplementing and ultimately
> replacing LWT.
>
> (Allowing users to mix LWT and AP Cassandra operations against the same
> rows was probably a mistake, so in contrast with LWT we’re not looking for
> something fast enough for occasional use but rather something within a
> reasonable factor of AP operations, appropriate to being the only way to
> interact with tables declared as such.)
>
> Besides Accord, this should cover
>
> - Calvin and FaunaDB
> - A Spanner derivative (no opinion on whether that should be Cockroach or
> Yugabyte, I don’t think it’s necessary to cover both)
> - A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
> there is more public information about MongoDB)
> - RAMP
>
> Here’s an example of what I mean:
>
> =Calvin=
>
> Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
> transactions, then replicas execute the transactions independently with no
> further coordination. No SPOF. Transactions are batched by each sequencer
> to keep this from becoming a bottleneck.
>
> Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
> New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
> with 7GB ram and 8 virtual cores). Note that TPC-C New Order is composed
> of four reads and four writes, so this is effectively 2M reads and 2M
> writes as we normally measure them in C*.
>
> Calvin supports mixed read/write transactions, but because the transaction
> execution logic requires knowing all partition keys in advance to ensure
> that all replicas can reproduce the same results with no coordination,
> reads against non-PK predicates must be done ahead of time (transparently,
> by the server) to determine the set of keys, and this must be retried if
> the set of rows affected is updated before the actual transaction executes.
>
> Batching and global consensus add latency -- 100ms in the Calvin paper and
> apparently about 50ms in FaunaDB. Glass half full: all transactions
> (including multi-partition updates) are equally performant in Calvin since
> the coordination is handled up front in the sequencing step. Glass half
> empty: even single-row reads and writes have to pay the full coordination
> cost. Fauna has optimized this away for reads but I am not aware of a
> description of how they changed the design to allow this.
>
> Functionality and limitations: since the entire transaction must be known
> in advance to allow coordination-less execution at the replicas, Calvin
> cannot support interactive transactions at all. FaunaDB mitigates this by
> allowing server-side logic to be included, but a Calvin approach will never
> be able to offer SQL compatibility.
>
> Guarantees: Calvin transactions are strictly serializable. There is no
> additional complexity or performance hit to generalizing to multiple
> regions, apart from the speed of light. And since Calvin is already paying
> a batching latency penalty, this is less painful than for other systems.
>
> Application to Cassandra: B-. Distributed transactions are handled by the
> sequencing and scheduling layers, which are leaderless, and Calvin’s
> requirements for the storage layer are easily met by C*. But Calvin also
> requires a global consensus protocol and LWT is almost certainly not
> sufficiently performant, so this would require ZK or etcd (reasonable for a
> library approach but not for replacing LWT in C* itself), or an
> implementation of Accord. I don’t believe Calvin would require additional
> table-level metadata in Cassandra.
>
> On Wed, Oct 6, 2021 at 9:53 AM [email protected] 
> wrote:
>
> The problem with dropping a patch on Jira is that there is no opportunity
> to point out problems, either with the fundamental approach or with the
> specific implementation. So please point out some problems I can engage
> with!
>
>
> From: Jonathan Ellis 
> Date: Wednesday, 6 October 2021 at 15:48
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Oct 6, 2021 at 9:21 AM [email protected] 
> wrote:
>
> > The goals of the CEP are stated clearly, and these were the goals we had
> > going into the (multi-month) research project we undertook before
> proposing
> > this CEP. These goals are necessarily value judgements, so we cannot
> expect
> > that everyone will agree that they are optimal.
> >
>
> Right, so I'm saying that this is exactly the most important thing to get
> consensus on, and creating a CEP for a protocol to achieve goals that you
> have not discussed with the community is the CEP equivalent of dropping a
> patch on Jira without discussing its goals either.
>
> That's why our conversations haven't gone anywhere, because I keep saying
> "we need to discuss the goals and tradeoffs", and I'll give an example of what
> I mean, and you keep addressing the examples (sometimes very shallowly, "it
> would be possible to X" or "Y could be done as an optimization") while
> ignoring the request to open a discussion around the big picture.
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-07 Thread C. Scott Andreas

Hi Jonathan,

Following up on my message yesterday as it looks like our replies may have
crossed en route.

Thanks for bumping your message from earlier in our discussion. I believe we
have addressed most of these questions on the thread, in addition to offering a
presentation on this and related work at ApacheCon, a discussion hosted
following that presentation at ApacheCon, and in ASF Slack. Contributors have
further offered an opportunity to discuss specific questions via
videoconference if it helps to speak live. I'd be happy to do so as well.

Since your original message, discussion has covered a lot of ground on the
related databases you've mentioned:
– Henrik has shared expertise related to MongoDB and its implementation.
– You've shared an overview of Calvin.
– Alex Miller has helped us review the work relative to other Paxos
algorithms and identified a few great enhancements to incorporate.
– The paper discusses related approaches in FoundationDB, CockroachDB, and
Yugabyte.
– Subsequent discussion has contrasted the implementation to DynamoDB, Google
Cloud BigTable, and Google Cloud Spanner (noting specifically that the
protocol achieves Spanner's 1x round-trip without requiring specialized
hardware).

In my reply yesterday, I've attempted to crystallize what becomes possible via
CQL: one-shot multi-partition transactions in the first implementation and a
4x latency reduction on writes / 2x latency reduction on reads relative to
today; along with the ability to build upon this work to enable interactive
transactions in the future.

I believe we've exercised the questions you've raised and am grateful for the
ground we've covered. If you have further questions that are difficult to
exercise via email, please let me know if you'd like to arrange a call
(open-invite); we'd be happy to discuss live as well.

With the proposal hitting the one-month mark, the contributors are interested
in gauging the developer community's response to the proposal. We warrant our
ability to focus durably on the project; execute this development on ASF JIRA
in collaboration with other contributors; engage with members of the developer
and user community on feedback, enhancements, and bugs; and intend to deliver
it to completion at a standard of readiness suitable for production
transactional systems of record.

Thanks,

– Scott

On Oct 6, 2021, at 8:25 AM, C. Scott Andreas wrote:

Hi folks,

Thanks for discussion on this proposal, and also to Benedict who’s been
fielding questions on the list!

I’d like to restate the goals and problem statement captured by this proposal
and frame context.

Today, lightweight transactions limit users to transacting over a single
partition. This unit of atomicity has a very low upper limit in terms of the
amount of data that can be CAS’d over; and doing so leads many to design
contorted data models to cram different types of data into one partition for
the purposes of being able to CAS over it. We propose that Cassandra can and
should be extended to remove this limit, enabling users to issue one-shot
transactions that CAS over multiple keys – including CAS batches, which may
modify multiple keys.

To enable this, the CEP authors have designed a novel, leaderless paxos-based
protocol unique to Cassandra, offered a proof of its correctness, a whitepaper
outlining it in detail, along with a prototype implementation to incubate
development, and integrated it with Maelstrom from jepsen.io to validate
linearizability as more specific test infrastructure is developed. This rigor
is remarkable, and I’m thrilled to see such a degree of investment in the area.

Even users who do not require the capability to transact across partition
boundaries will benefit. The protocol reduces message/WAN round-trips by 4x on
writes (4 → 1) and 2x on reads (2 → 1) in the common case against today’s
baseline. These latency improvements coupled with the enhanced flexibility of
what can be transacted over in Cassandra enable new classes of applications to
use the database.

In particular, 1xRTT read/write transactions across partitions enable
Cassandra to be thought of not just as a strongly consistent database, but
even a transactional database - a mode many may even prefer to use by default.
Given this capability, Apache Cassandra has an opportunity to become one of –
or perhaps the only – database in the industry that can store multiple
petabytes of data in a single database; replicate it across many regions; and
allow users to transact over any subset of it. These are capabilities that can
be met by no other system I’m aware of on the market. Dynamo’s transactions
are single-DC. Google Cloud BigTable does not support transactions. Spanner,
Aurora, CloudSQL, and RDS have far lower scalability limits or require
specialized hardware, etc.

This is an incredible opportunity for Apache Cassandra - to surpass the
scalability and transactional capability of some of the most advanced systems
in our

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread C. Scott Andreas

Hi folks,

Thanks for discussion on this proposal, and also to Benedict who’s been
fielding questions on the list!

I’d like to restate the goals and problem statement captured by this proposal
and frame context.

Today, lightweight transactions limit users to transacting over a single
partition. This unit of atomicity has a very low upper limit in terms of the
amount of data that can be CAS’d over; and doing so leads many to design
contorted data models to cram different types of data into one partition for
the purposes of being able to CAS over it. We propose that Cassandra can and
should be extended to remove this limit, enabling users to issue one-shot
transactions that CAS over multiple keys – including CAS batches, which may
modify multiple keys.

To enable this, the CEP authors have designed a novel, leaderless paxos-based
protocol unique to Cassandra, offered a proof of its correctness, a whitepaper
outlining it in detail, along with a prototype implementation to incubate
development, and integrated it with Maelstrom from jepsen.io to validate
linearizability as more specific test infrastructure is developed. This rigor
is remarkable, and I’m thrilled to see such a degree of investment in the area.

Even users who do not require the capability to transact across partition
boundaries will benefit. The protocol reduces message/WAN round-trips by 4x on
writes (4 → 1) and 2x on reads (2 → 1) in the common case against today’s
baseline. These latency improvements coupled with the enhanced flexibility of
what can be transacted over in Cassandra enable new classes of applications to
use the database.

In particular, 1xRTT read/write transactions across partitions enable
Cassandra to be thought of not just as a strongly consistent database, but
even a transactional database - a mode many may even prefer to use by default.
Given this capability, Apache Cassandra has an opportunity to become one of –
or perhaps the only – database in the industry that can store multiple
petabytes of data in a single database; replicate it across many regions; and
allow users to transact over any subset of it. These are capabilities that can
be met by no other system I’m aware of on the market. Dynamo’s transactions
are single-DC. Google Cloud BigTable does not support transactions. Spanner,
Aurora, CloudSQL, and RDS have far lower scalability limits or require
specialized hardware, etc.

This is an incredible opportunity for Apache Cassandra - to surpass the
scalability and transactional capability of some of the most advanced systems
in our industry - and to do so in open source, where anyone can download and
deploy the software to achieve this without cost; and for students and
researchers to learn from and build upon as well (a team from UT-Austin has
already reached out to this effect).

As Benedict and Blake noted, the scope of what’s captured in this proposal is
also not terminal. While the first implementation may extend today’s CAS
semantics to multiple partitions with lower latency, the foundation is
suitable to build interactive transactions as well, which would be remarkable
and is something that I hadn’t considered myself at the outset of this project.

To that end, the CEP proposes the protocol, offers a validated implementation,
and the initial capability of extending today’s single-partition transactions
to multi-partition; while providing the flexibility to build upon this work
further.

A simple example of what becomes possible when this work lands and is
integrated might be:

–––
BEGIN BATCH
UPDATE tbl1 SET value1 = newValue1 WHERE partitionKey = k1
UPDATE tbl2 SET value2 = newValue2 WHERE partitionKey = k2 AND conditionValue = someCondition
APPLY BATCH
–––

I understand that this query is present in the CEP and my intent isn’t to
recommend that folks reread it if they’ve given a careful reading already. But
I do think it’s important to elaborate upon what becomes possible when this
query can be issued.

Users of Cassandra who have designed data models that cram many types of data
into a single partition for the purposes of atomicity no longer need to. They
can design their applications with appropriate schemas that wouldn’t leave
Codd holding his nose. They’re no longer pushed into antipatterns that result
in these partitions becoming huge and potentially unreadable. Cassandra
doesn’t become fully relational in this CEP - but it becomes possible and even
easy to design applications that transact across tables that mimic a large
amount of relational functionality. And for users who are content to transact
over a single table, they’ll find those transactions become up to 4x faster
today due to the protocol’s reduction in round-trips. The library’s loose
coupling to Apache Cassandra and ability to be incubated out-of-tree also
enables other applications to take advantage of the protocol and is a nice
step toward bringing modularity to the project. There are a lot of good things
happ

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread [email protected]
Jonathan,

This work will only determine Cassandra’s future if no other contributors 
choose to take a different route in future. If in future the community decides 
this work is incompatible with its direction, it remains in the community’s 
power to remove the facility, or to make it optional.

OSS is a living thing, and this CEP will shape the future of the community only by 
virtue of the work that I and others will do. You are equally capable of 
investing this time and effort.

Today, this is the only CEP of the kind on offer. If another competing proposal 
were to be made, we could either work to reconcile them, or to ensure they may 
co-exist. You cannot, however, expect to impose your _goals_ on the work that I 
and others will undertake. That is not how the community works.

Since we are going around in circles, I propose a simple majority vote to 
establish if the community endorses the stated goals of the CEP.


From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 16:05
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
The problem that I keep pointing out is that you've created this CEP for
Accord without first getting consensus that the goals and the tradeoffs it
makes to achieve those goals (and that it will impose on future work around
transactions) are the right ones for Cassandra long term.

At this point I'm done repeating myself.  For the convenience of anyone
following this thread intermittently, I'll quote my first reply on this
thread to illustrate the kind of discussion I'd like to have.

-

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spanner derivative (no opinion on whether that should be Cockroach or
Yugabyte, I don’t think it’s necessary to cover both)
- A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
there is more public information about MongoDB)
- RAMP

Here’s an example of what I mean:

=Calvin=

Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
transactions, then replicas execute the transactions independently with no
further coordination.  No SPOF.  Transactions are batched by each sequencer
to keep this from becoming a bottleneck.
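
As an illustration of the approach described above, here is a minimal Python
sketch, not Calvin's actual implementation. A sequencer fixes one global order
for batched transactions, and each replica then applies that same log on its
own. All names (`Sequencer`, `Replica`, `transfer`, `demo`) are hypothetical,
and the in-process list is only a stand-in for a consensus-replicated log.

```python
from typing import Callable, Dict, List

# A transaction is modelled as a deterministic function over key/value state.
Txn = Callable[[Dict[str, int]], None]


class Sequencer:
    """Stand-in for the consensus layer: batches transactions into one order."""

    def __init__(self) -> None:
        self.log: List[Txn] = []
        self._pending: List[Txn] = []

    def submit(self, txn: Txn) -> None:
        self._pending.append(txn)

    def flush_batch(self) -> None:
        # Assumption: a real system would agree on this batch via consensus;
        # here the batch is simply appended to a shared list.
        self.log.extend(self._pending)
        self._pending.clear()


class Replica:
    """Applies the agreed log in order, without talking to other replicas."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self.state = dict(initial)
        self.applied = 0

    def catch_up(self, log: List[Txn]) -> None:
        for txn in log[self.applied:]:
            txn(self.state)
        self.applied = len(log)


def transfer(src: str, dst: str, amount: int) -> Txn:
    """Build a conditional transfer whose outcome depends only on state."""
    def run(state: Dict[str, int]) -> None:
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
    return run


def demo() -> List[Dict[str, int]]:
    seq = Sequencer()
    seq.submit(transfer("a", "b", 30))
    seq.submit(transfer("b", "c", 50))
    seq.flush_batch()
    replicas = [Replica({"a": 100, "b": 40, "c": 0}) for _ in range(3)]
    for replica in replicas:
        replica.catch_up(seq.log)
    return [replica.state for replica in replicas]
```

Because every replica sees the same order and the transactions are
deterministic, all three replicas in `demo()` end in the same state without
exchanging any messages during execution.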

Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
with 7GB ram and 8 virtual cores).  Note that TPC-C New Order is composed
of four reads and four writes, so this is effectively 2M reads and 2M
writes as we normally measure them in C*.
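
The conversion above can be checked with a few lines of Python. The inputs are
the figures quoted in the paragraph; the per-machine number is derived here by
simple division and is not a figure from the paper.

```python
def ops_per_second(txns_per_second: int, ops_per_txn: int) -> int:
    """Convert transaction throughput into individual operations per second."""
    return txns_per_second * ops_per_txn


TXNS_PER_SECOND = 500_000  # New Order throughput quoted above
READS_PER_TXN = 4          # reads per transaction, as stated above
WRITES_PER_TXN = 4         # writes per transaction, as stated above
MACHINES = 100             # cluster size quoted above

reads = ops_per_second(TXNS_PER_SECOND, READS_PER_TXN)
writes = ops_per_second(TXNS_PER_SECOND, WRITES_PER_TXN)

print(f"reads/s:  {reads:,}")                       # 2,000,000
print(f"writes/s: {writes:,}")                      # 2,000,000
print(f"reads/s per machine: {reads // MACHINES:,}")  # 20,000 (derived)
```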

Calvin supports mixed read/write transactions, but because the transaction
execution logic requires knowing all partition keys in advance to ensure
that all replicas can reproduce the same results with no coordination,
reads against non-PK predicates must be done ahead of time (transparently,
by the server) to determine the set of keys, and this must be retried if
the set of rows affected is updated before the actual transaction executes.
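
The retry behaviour described above can be sketched roughly as follows. This
is a simplified single-process illustration under stated assumptions, not how
Calvin or FaunaDB implement it: `recon_keys`, `run_with_recon` and the
`before_execute` hook are hypothetical names, and the hook only simulates a
concurrent write landing between the reconnaissance read and execution.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Row = Dict[str, str]
Table = Dict[str, Row]  # partition key -> row


def recon_keys(table: Table, predicate: Callable[[Row], bool]) -> FrozenSet[str]:
    """Reconnaissance read: resolve a non-PK predicate to a concrete key set."""
    return frozenset(k for k, row in table.items() if predicate(row))


def run_with_recon(
    table: Table,
    predicate: Callable[[Row], bool],
    update: Callable[[Row], None],
    before_execute: Optional[Callable[[int], None]] = None,
    max_attempts: int = 5,
) -> int:
    """Declare keys up front, execute only if they are still valid, else retry.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        declared = recon_keys(table, predicate)
        if before_execute is not None:
            before_execute(attempt)  # simulated concurrent activity
        # At execution time the declared key set must still match the predicate.
        if recon_keys(table, predicate) != declared:
            continue  # key set changed underneath us: redo the reconnaissance
        for key in declared:
            update(table[key])
        return attempt
    raise RuntimeError("key set kept changing; giving up")


def demo() -> Tuple[int, Table]:
    table: Table = {"k1": {"status": "open"}, "k2": {"status": "closed"}}

    def interfere(attempt: int) -> None:
        if attempt == 1:
            table["k2"]["status"] = "open"  # a new row starts matching

    def is_open(row: Row) -> bool:
        return row["status"] == "open"

    def close(row: Row) -> None:
        row["status"] = "closed"

    attempts = run_with_recon(table, is_open, close, before_execute=interfere)
    return attempts, table
```

In `demo()` the first attempt is abandoned because a second row begins to match
the predicate after the key set was declared; the second attempt declares both
keys and executes.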

Batching and global consensus add latency -- 100ms in the Calvin paper and
apparently about 50ms in FaunaDB.  Glass half full: all transactions
(including multi-partition updates) are equally performant in Calvin since
the coordination is handled up front in the sequencing step.  Glass half
empty: even single-row reads and writes have to pay the full coordination
cost.  Fauna has optimized this away for reads but I am not aware of a
description of how they changed the design to allow this.

Functionality and limitations: since the entire transaction must be known
in advance to allow coordination-less execution at the replicas, Calvin
cannot support interactive transactions at all.  FaunaDB mitigates this by
allowing server-side logic to be included, but a Calvin approach will never
be able to offer SQL compatibility.

Guarantees: Calvin transactions are strictly serializable.  There is no
additional complex

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread Jonathan Ellis
an Ellis 
> Date: Wednesday, 6 October 2021 at 15:48
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Oct 6, 2021 at 9:21 AM [email protected] 
> wrote:
>
> > The goals of the CEP are stated clearly, and these were the goals we had
> > going into the (multi-month) research project we undertook before
> proposing
> > this CEP. These goals are necessarily value judgements, so we cannot
> expect
> > that everyone will agree that they are optimal.
> >
>
> Right, so I'm saying that this is exactly the most important thing to get
> consensus on, and creating a CEP for a protocol to achieve goals that you
> have not discussed with the community is the CEP equivalent of dropping a
> patch on Jira without discussing its goals either.
>
> That's why our conversations haven't gone anywhere, because I keep saying
> "we need to discuss the goals and tradeoffs", and I'll give an example of what
> I mean, and you keep addressing the examples (sometimes very shallowly, "it
> would be possible to X" or "Y could be done as an optimization") while
> ignoring the request to open a discussion around the big picture.
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread [email protected]
The problem with dropping a patch on Jira is that there is no opportunity to 
point out problems, either with the fundamental approach or with the specific 
implementation. So please point out some problems I can engage with!


From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 15:48
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Oct 6, 2021 at 9:21 AM [email protected] 
wrote:

> The goals of the CEP are stated clearly, and these were the goals we had
> going into the (multi-month) research project we undertook before proposing
> this CEP. These goals are necessarily value judgements, so we cannot expect
> that everyone will agree that they are optimal.
>

Right, so I'm saying that this is exactly the most important thing to get
consensus on, and creating a CEP for a protocol to achieve goals that you
have not discussed with the community is the CEP equivalent of dropping a
patch on Jira without discussing its goals either.

That's why our conversations haven't gone anywhere, because I keep saying
"we need to discuss the goals and tradeoffs", and I'll give an example of what
I mean, and you keep addressing the examples (sometimes very shallowly, "it
would be possible to X" or "Y could be done as an optimization") while
ignoring the request to open a discussion around the big picture.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread Jonathan Ellis
On Wed, Oct 6, 2021 at 9:21 AM [email protected] 
wrote:

> The goals of the CEP are stated clearly, and these were the goals we had
> going into the (multi-month) research project we undertook before proposing
> this CEP. These goals are necessarily value judgements, so we cannot expect
> that everyone will agree that they are optimal.
>

Right, so I'm saying that this is exactly the most important thing to get
consensus on, and creating a CEP for a protocol to achieve goals that you
have not discussed with the community is the CEP equivalent of dropping a
patch on Jira without discussing its goals either.

That's why our conversations haven't gone anywhere, because I keep saying
"we need to discuss the goals and tradeoffs", and I'll give an example of what
I mean, and you keep addressing the examples (sometimes very shallowly, "it
would be possible to X" or "Y could be done as an optimization") while
ignoring the request to open a discussion around the big picture.


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread [email protected]
The goals of the CEP are stated clearly, and these were the goals we had going 
into the (multi-month) research project we undertook before proposing this CEP. 
These goals are necessarily value judgements, so we cannot expect that everyone 
will agree that they are optimal.

So far you have not engaged with these goals to state any specific 
disagreement. I have engaged with all of the trade-offs you imagined, and every 
specific concern you have raised. Despite a month having elapsed and a great 
deal of time spent answering your emails, this is the first confirmation I have 
that you are dissatisfied with my responses to you.

The role of the CEP is to advertise a project, allowing people to register 
their interest in collaborating, and for technical concerns to be stated in 
advance. So far you have expressed no specific technical concerns that I have 
not engaged with, and yet I have received no response to my engagements.

The role of the CEP is *not* to permit members of the community to dictate 
their preferences on the proposers, or to declare that the CEP is inadequate 
because it doesn’t meet their goals, or to demand additional work to explore 
others’ preferred research avenues on the topic.

You have to do some of the work here, Jonathan.

If you have an alternative approach, I continue to ask you to propose it so we 
may compare and contrast in a specific and technical manner.  If you have any 
specific technical concerns I exhort you to raise them, so we may discuss them. 
If you dispute the goals, please make an argument as to why. If our goals are 
irreconcilable, file another CEP.



From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 14:41
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I've repeatedly explained why I'm unhappy: instead of starting with a
discussion of what API and tradeoffs we should make to get that, this CEP
starts with a protocol and asks us to figure out what API we can build with
it.

Of course by API I mean, what kinds of CQL and SQL operations we can
perform, with what kinds of ACID semantics and what kinds of performance,
not "Result perform(Transaction transaction)".  And it's not simply SQL
syntax, either.  I realize that this could sound a little vague, but that's
why I gave an example of the kind of analysis I'm talking about in my first
reply.  Your responses have been to attempt to avoid the discussion
entirely ("the relevant goals are [mine]") or to declare it to be out of
scope.

The CEP process is intended to help get to alignment across the community
of PMC members, committers, and contributors on goals and outcomes before
starting to write code, not simply to bless a completed design.  That's
why we're going in circles here.

On Wed, Oct 6, 2021 at 2:12 AM [email protected] 
wrote:

> We have discussed the API at length in this thread. The API primarily
> involves the semantics of the transactions, as besides this the API of a
> transaction is simply:
>
> Result perform(Transaction transaction)
>
> As discussed in follow-up to that email, a prototype API is specified
> alongside the prototype protocol. I am unsure what more you want than this,
> or the above, or the prior semantic discussions.
>
> It seems clear that you’re unhappy with the proposal, but it remains
> ambiguous as to why. Your emails are terse, infrequent and unclear. My
> responses receive no follow up from you, even to clarify if I have answered
> your query. Sometime later I seem to be able to expect a new unrelated
> problem that you are unhappy about. You have not yet responded to even one
> of my repeated offers to hop on a call to hash out any of your concerns,
> even if only to decline.
>
> This does not feel like constructive and respectful engagement to me, and
> I am losing interest.
>
>
>
> From: Jonathan Ellis 
> Date: Wednesday, 6 October 2021 at 00:02
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I honestly can't understand the perspective that on the one hand, you're
> asking for approval of a specific protocol as part of the CEP, but on the
> other, you think discussion of the APIs this will enable is not warranted.
> Surely we need agreement on what APIs we're trying to build, before we
> discuss the protocols and architectures with which to build them.
>
> On Fri, Oct 1, 2021 at 9:34 AM [email protected] 
> wrote:
>
> > > The current document details thoroughly the protocol but in my view
> > > lacks to illustrate what specific API, methods, modules will become
> > > available to developers
> >
> > With respect to this, in my view this kind of detail is not warranted
> > within a CEP. Software development is an exploratory process with respect
> > to structure, and these decisions will be made as the CEP progresses. If
> > these need to be specified upfront, then the purpose of a CEP – seeking
> > buy in – is invalidated, because the work must be complete before you know
> > the answers.
> >


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread Jonathan Ellis
I've repeatedly explained why I'm unhappy: instead of starting with a
discussion of what API and tradeoffs we should make to get that, this CEP
starts with a protocol and asks us to figure out what API we can build with
it.

Of course by API I mean, what kinds of CQL and SQL operations we can
perform, with what kinds of ACID semantics and what kinds of performance,
not "Result perform(Transaction transaction)".  And it's not simply SQL
syntax, either.  I realize that this could sound a little vague, but that's
why I gave an example of the kind of analysis I'm talking about in my first
reply.  Your responses have been to attempt to avoid the discussion
entirely ("the relevant goals are [mine]") or to declare it to be out of
scope.

The CEP process is intended to help get to alignment across the community
of PMC members, committers, and contributors on goals and outcomes before
starting to write code, not simply to bless a completed design.  That's
why we're going in circles here.

On Wed, Oct 6, 2021 at 2:12 AM [email protected] 
wrote:

> We have discussed the API at length in this thread. The API primarily
> involves the semantics of the transactions, as besides this the API of a
> transaction is simply:
>
> Result perform(Transaction transaction)
>
> As discussed in follow-up to that email, a prototype API is specified
> alongside the prototype protocol. I am unsure what more you want than this,
> or the above, or the prior semantic discussions.
>
> It seems clear that you’re unhappy with the proposal, but it remains
> ambiguous as to why. Your emails are terse, infrequent and unclear. My
> responses receive no follow up from you, even to clarify if I have answered
> your query. Sometime later I seem to be able to expect a new unrelated
> problem that you are unhappy about. You have not yet responded to even one
> of my repeated offers to hop on a call to hash out any of your concerns,
> even if only to decline.
>
> This does not feel like constructive and respectful engagement to me, and
> I am losing interest.
>
>
>
> From: Jonathan Ellis 
> Date: Wednesday, 6 October 2021 at 00:02
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I honestly can't understand the perspective that on the one hand, you're
> asking for approval of a specific protocol as part of the CEP, but on the
> other, you think discussion of the APIs this will enable is not warranted.
> Surely we need agreement on what APIs we're trying to build, before we
> discuss the protocols and architectures with which to build them.
>
> On Fri, Oct 1, 2021 at 9:34 AM [email protected] 
> wrote:
>
> > > The current document details thoroughly the protocol but in my view
> > > lacks to illustrate what specific API, methods, modules will become
> > > available to developers
> >
> > With respect to this, in my view this kind of detail is not warranted
> > within a CEP. Software development is an exploratory process with respect
> > to structure, and these decisions will be made as the CEP progresses. If
> > these need to be specified upfront, then the purpose of a CEP – seeking
> > buy in – is invalidated, because the work must be complete before you know
> > the answers.
> >


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-06 Thread [email protected]
We have discussed the API at length in this thread. The API primarily involves 
the semantics of the transactions, as besides this the API of a transaction is 
simply:

Result perform(Transaction transaction)

As discussed in follow-up to that email, a prototype API is specified alongside 
the prototype protocol. I am unsure what more you want than this, or the above, 
or the prior semantic discussions.

It seems clear that you’re unhappy with the proposal, but it remains ambiguous 
as to why. Your emails are terse, infrequent and unclear. My responses receive 
no follow up from you, even to clarify if I have answered your query. Sometime 
later I seem to be able to expect a new unrelated problem that you are unhappy 
about. You have not yet responded to even one of my repeated offers to hop on a 
call to hash out any of your concerns, even if only to decline.

This does not feel like constructive and respectful engagement to me, and I am 
losing interest.



From: Jonathan Ellis 
Date: Wednesday, 6 October 2021 at 00:02
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I honestly can't understand the perspective that on the one hand, you're
asking for approval of a specific protocol as part of the CEP, but on the
other, you think discussion of the APIs this will enable is not warranted.
Surely we need agreement on what APIs we're trying to build, before we
discuss the protocols and architectures with which to build them.

On Fri, Oct 1, 2021 at 9:34 AM [email protected] 
wrote:

> > The current document details thoroughly the protocol but in my view
> > lacks to illustrate what specific API, methods, modules will become
> > available to developers
>
> With respect to this, in my view this kind of detail is not warranted
> within a CEP. Software development is an exploratory process with respect
> to structure, and these decisions will be made as the CEP progresses. If
> these need to be specified upfront, then the purpose of a CEP – seeking buy
> in – is invalidated, because the work must be complete before you know the
> answers.
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 15:31
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> From the CEP:
>
> Batches (including unconditional batches) on transactional tables will
> receive ACID properties, and grammatically correct conditional batch
> operations that would be rejected for operating over multiple CQL
> partitions will now be supported
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:30
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Can you just answer what palpable feature will be available once this CEP
> lands because this is still not clear to me (and perhaps to others) from
> the current CEP structure. The current document details thoroughly the
> protocol but in my view lacks to illustrate what specific API, methods,
> modules will become available to developers, how it fits into the larger
> picture and interacts with existing modules if at all and perhaps a few
> examples of how it can be used to build features on top.
>
> On Fri, 1 Oct 2021 at 11:10, [email protected] <
> [email protected]> wrote:
>
> > I’m not, though it might seem that way. I disagree with your views about
> > how CEP should be structured. Since the CEP process was itself codified
> > via the CEP process, if you want to recodify how CEP work, the correct way
> > is via the CEP process itself.
> >
> > The discussion is being drawn in multiple directions away from the CEP
> > itself, and I am trying to keep this particular thread focused on the
> > business at hand, not meta discussions around CEP structure that will no
> > doubt be unproductive given likely irreconcilable views about the topic,
> > nor discussions about other CEP that could have been.
> >
> > If you want to start a separate exploratory discussion thread about CEP
> > structure without filing a CEP feel free to do so.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 15:04
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > If you want to impose your views on CEP structure on others, please
> > > file a CEP with the additional restrictions and guidance you want to
> > > impose and start a discussion thread. I can then respond in detail to why
> > > I perceive this approach to be flawed, in a dedicated context.
> >
> > This sounds very kafkaesque. You know I won't file a meta-CEP to change
> > the structure of CEP so you're just using this as an excuse to just shut the
> > discu

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-05 Thread Jonathan Ellis
I honestly can't understand the perspective that on the one hand, you're
asking for approval of a specific protocol as part of the CEP, but on the
other, you think discussion of the APIs this will enable is not warranted.
Surely we need agreement on what APIs we're trying to build, before we
discuss the protocols and architectures with which to build them.

On Fri, Oct 1, 2021 at 9:34 AM [email protected] 
wrote:

> > The current document details thoroughly the protocol but in my view
> > lacks to illustrate what specific API, methods, modules will become
> > available to developers
>
> With respect to this, in my view this kind of detail is not warranted
> within a CEP. Software development is an exploratory process with respect
> to structure, and these decisions will be made as the CEP progresses. If
> these need to be specified upfront, then the purpose of a CEP – seeking buy
> in – is invalidated, because the work must be complete before you know the
> answers.
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 15:31
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> From the CEP:
>
> Batches (including unconditional batches) on transactional tables will
> receive ACID properties, and grammatically correct conditional batch
> operations that would be rejected for operating over multiple CQL
> partitions will now be supported
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:30
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Can you just answer what palpable feature will be available once this CEP
> lands because this is still not clear to me (and perhaps to others) from
> the current CEP structure. The current document details thoroughly the
> protocol but in my view lacks to illustrate what specific API, methods,
> modules will become available to developers, how it fits into the larger
> picture and interacts with existing modules if at all and perhaps a few
> examples of how it can be used to build features on top.
>
> On Fri, 1 Oct 2021 at 11:10, [email protected] <
> [email protected]> wrote:
>
> > I’m not, though it might seem that way. I disagree with your views about
> > how CEP should be structured. Since the CEP process was itself codified
> > via the CEP process, if you want to recodify how CEP work, the correct way
> > is via the CEP process itself.
> >
> > The discussion is being drawn in multiple directions away from the CEP
> > itself, and I am trying to keep this particular thread focused on the
> > business at hand, not meta discussions around CEP structure that will no
> > doubt be unproductive given likely irreconcilable views about the topic,
> > nor discussions about other CEP that could have been.
> >
> > If you want to start a separate exploratory discussion thread about CEP
> > structure without filing a CEP feel free to do so.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 15:04
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > If you want to impose your views on CEP structure on others, please
> > > file a CEP with the additional restrictions and guidance you want to
> > > impose and start a discussion thread. I can then respond in detail to why
> > > I perceive this approach to be flawed, in a dedicated context.
> >
> > This sounds very kafkaesque. You know I won't file a meta-CEP to change
> > the structure of CEP so you're just using this as an excuse to just shut the
> > discussion on the lack of clarity on what actual palpable feature will be
> > available once the CEP lands. :-)
> >
> > I'm just providing my humble feedback on how a CEP could be more
> > digestible and easier to consume from an external point of view, and this
> > seems like an appropriate and contextualized place to voice this opinion
> > which is perhaps shared by others.
> >
> > On Fri, 1 Oct 2021 at 10:55, [email protected] <
> > [email protected]> wrote:
> >
> > > I disagree with you. However, this is the wrong forum to have a meta
> > > discussion about how CEP should be structured.
> > >
> > > If you want to impose your views on CEP structure on others, please
> > > file a CEP with the additional restrictions and guidance you want to
> > > impose and start a discussion thread. I can then respond in detail to why
> > > I perceive this approach to be flawed, in a dedicated context.
> > >

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-04 Thread Paulo Motta
I don’t have any objection to call a vote, I think we had a good time to
discuss and I’m satisfied with the clarifications to my questions.

Thanks Benedict, Blake and Scott for detailing the proposal and answering
questions.

I think everyone is excited and looking forward to this groundbreaking work
that will enable the next generation of features and improvements in
Cassandra! :-)

On Mon, 4 Oct 2021 at 03:03 [email protected]  wrote:

> Hi everyone,
>
> It’s been a month since I brought this proposal forward. I think we’re
> ready for a vote, and I’d like to get a show of hands to see if others
> agree.
>
> I don’t intend for this to curtail any further questions or suggestions.
> I’m grateful for the continued healthy discussion, but from my point of
> view the topics we are now covering are not core to the proposal’s adoption.
>
> If anyone thinks this proposal is not ready for a vote, I would really
> appreciate it if that sentiment could be accompanied by a brief statement
> of what is wrong with the substance of the proposal, so that we can address
> these issues directly to move things forward.
>
> Thanks!
>
>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-03 Thread [email protected]
Hi everyone,

It’s been a month since I brought this proposal forward. I think we’re ready 
for a vote, and I’d like to get a show of hands to see if others agree.

I don’t intend for this to curtail any further questions or suggestions. I’m 
grateful for the continued healthy discussion, but from my point of view the 
topics we are now covering are not core to the proposal’s adoption.

If anyone thinks this proposal is not ready for a vote, I would really 
appreciate it if that sentiment could be accompanied by a brief statement of 
what is wrong with the substance of the proposal, so that we can address these 
issues directly to move things forward.

Thanks!



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 7:20 PM [email protected] 
wrote:

> I haven’t encountered Galera – do you have any technical papers to hand?
>
>
Yes, but it's a whole thesis :-)

https://www.inf.usi.ch/faculty/pedone/Paper/199x/These-2090-Pedone.pdf

I guess parts of that were presented in conference papers.

Pedone's work implements a protocol with Snapshot Isolation. More recent
work from down under describes a similar system providing Serializable
Snapshot Isolation:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.185&rep=rep1&type=pdf


The best known implementation of Pedone's work would be Galera Cluster,
which hooks the "Galera" replication library into MySQL. It's also included
with MariaDB Cluster and Percona XtraDB Cluster. Oracle later did an
independent implementation (for IPR ownership reasons) which is known as
InnoDB Cluster.

This page in the Galera docs has a great diagram to get you started:
https://galeracluster.com/library/documentation/certification-based-replication.html

For an end user oriented beginner lecture, search conference video
recordings for Seppo Jaakola:
https://www.youtube.com/watch?v=5e3unwy_OVs


Worth calling out that we are in RDBMS land now, and the above is just a
replication solution; there is no sharding anywhere. For the Serializable
paper, I struggle to even imagine how it could scale to multiple shards.
For SI it's kind of easier as only write conflicts need to be checked.
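
To make the write-conflict check concrete, here is a minimal Python sketch of a
first-committer-wins certification test under Snapshot Isolation. It is an
illustration only, not Galera's or Pedone's actual algorithm: the `Certifier`
class, its fields and the integer sequence numbers are hypothetical names
chosen for this example, and it assumes every write set reaches the certifier
in one agreed total order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Certifier:
    """Toy certifier: aborts a transaction if any key it writes was
    committed by someone else after the transaction took its snapshot."""

    commit_seq: int = 0
    # key -> commit sequence number of the latest committed write to that key
    last_write: Dict[str, int] = field(default_factory=dict)

    def certify(self, snapshot_seq: int, write_keys: Iterable[str]) -> bool:
        keys = list(write_keys)
        # Only write-write conflicts are checked; reads are never inspected.
        for key in keys:
            if self.last_write.get(key, 0) > snapshot_seq:
                return False  # a newer committed write exists: abort
        # No conflict: commit and record the new version of every written key.
        self.commit_seq += 1
        for key in keys:
            self.last_write[key] = self.commit_seq
        return True


certifier = Certifier()
first = certifier.certify(snapshot_seq=0, write_keys={"a"})        # commits
second = certifier.certify(snapshot_seq=0, write_keys={"a", "b"})  # conflicts on "a"
```

Because only the written keys are compared, two transactions that read the same
row but write different rows both pass, which is why this gives Snapshot
Isolation rather than serializability.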

henrik





Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
> If I'm reading you correctly, then Accord does / could do exactly what I was 
> asking for: two round trips in a single DC cluster, and one roundtrip + 
> SkewMax when network roundtrips are >> SkewMax.

Yes, in fact it’s even better than that. Even in this setup *most* transactions 
will still take only one round-trip, and at worst case (under conflicts) two 
round-trips.

> assuming I got it correct...

As far as I can tell your understanding is correct, yes - though worth noting 
of course that the WAN round-trip on write is asynchronous.

I haven’t encountered Galera – do you have any technical papers to hand?

From: Henrik Ingo 
Date: Friday, 1 October 2021 at 16:24
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Fri, Oct 1, 2021 at 5:30 PM [email protected] 
wrote:

> > Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
> > discussions = 7 ms
>
> I think skew max is likely to be much lower than this, even on commodity
> hardware. Bear in mind that unlike Cockroach and Spanner correctness does
> not depend on this value, only performance. So we can pick the real number,
> not some p100 outlier value.
>
> Also bear in mind that this is an optimisation. In clusters where it makes
> no sense we can simply use the raw protocol and accept transactions will
> very infrequently take two round-trips (which is fine, because in this
> scenario round-trips are cheap).
>
>
Oh, this was not at all obvious :-D

If I'm reading you correctly, then Accord does / could do exactly what I
was asking for: two round trips in a single DC cluster, and one roundtrip +
SkewMax when network roundtrips are >> SkewMax.



> > A known optimization for the hot rows problem is to "hint" or manually
> > force clients to direct all updates to the hot row to the same node
>
> So, with a leaderless protocol like Accord the ordering decisions are
> never really bottlenecked - no matter how many are in-flight, a new
> transaction will experience no additional latency determining its execution
> order. The only bottleneck will be execution. For this it is absolutely
> possible to funnel everything to a single coordinator, but I don’t know
> that this would in practice achieve much – the important bottleneck would
> be that the coordinators are all within the same DC, so that the _replicas_
> may all respond to them with their data dependencies with minimal delay.
> This is something we discussed in the ApacheCon call as it happens. If a
> significant number of transactions are pending, and they are in different
> DCs, it would be quite straightforward to nominate a coordinator within the
> DC serving the majority of operations to serve the remainder, and to forward
> the results to the original coordinators.
>
>
Thanks for explaining. This is really interesting. I now reread section 2.2
of the paper and realize it says exactly this.

So in Accord:

Step 1: One network round trip + SkewMax to establish a global ordering.

Step 2: a) One (local) network round trip for the read phase, one (WAN) round
           trip for writes.
        b) In addition, before either reading or writing, the node must first
           commit and apply all previous transactions that are in the "deps"
           set of this transaction.

In addition, if we implement interactive transactions, or support for
secondary indexes, or other "complex" transactions, then that work would
happen before Step 1.

Ok, now that I spelled this out... assuming I got it correct... Then this
actually resembles Galera more than Spanner. The wall clock time is not
actually the transaction id, it's just a step in the consensus dialogue
where nodes agree on a global ordering.



> I don’t anticipate this optimisation being a high priority until we have
> user reports of this bottleneck in the wild, however. Since clients for
> many workloads will naturally be geo-partitioned so that related state is
> being updated from the same region, it might simply not be needed – at
> least any time soon.
>
>
For sure. I think we're all just trying to understand the landscape of what we
are talking about here, not trying to say everything should be implemented
in v1.


henrik



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 5:30 PM [email protected] 
wrote:

> > Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
> > discussions = 7 ms
>
> I think skew max is likely to be much lower than this, even on commodity
> hardware. Bear in mind that unlike Cockroach and Spanner correctness does
> not depend on this value, only performance. So we can pick the real number,
> not some p100 outlier value.
>
> Also bear in mind that this is an optimisation. In clusters where it makes
> no sense we can simply use the raw protocol and accept transactions will
> very infrequently take two round-trips (which is fine, because in this
> scenario round-trips are cheap).
>
>
Oh, this was not at all obvious :-D

If I'm reading you correctly, then Accord does / could do exactly what I
was asking for: two round trips in a single DC cluster, and one roundtrip +
SkewMax when network roundtrips are >> SkewMax.
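
The trade-off between the two modes can be put into rough numbers. The sketch
below is a deliberately simplified worst-case model built only from the figures
mentioned in this thread (a 7 ms SkewMax, two round trips without the
optimisation). The function names are hypothetical, and real latency would also
depend on conflict rates and quorum topology, which are ignored here.

```python
def ordering_latency_ms(rtt_ms: float, skew_max_ms: float, wait_for_skew: bool) -> float:
    """Worst-case time to agree an execution order in this simplified model."""
    if wait_for_skew:
        # Optimised mode: one round trip plus a wait covering the clock skew.
        return rtt_ms + skew_max_ms
    # Raw protocol: a second round trip is needed in the worst case.
    return 2 * rtt_ms


def prefer_skew_wait(rtt_ms: float, skew_max_ms: float) -> bool:
    """True when waiting out SkewMax beats paying for a second round trip."""
    return ordering_latency_ms(rtt_ms, skew_max_ms, True) < ordering_latency_ms(
        rtt_ms, skew_max_ms, False
    )


wan_choice = prefer_skew_wait(rtt_ms=80.0, skew_max_ms=7.0)  # cross-DC cluster
lan_choice = prefer_skew_wait(rtt_ms=0.5, skew_max_ms=7.0)   # single-DC cluster
```

In this model the wait only pays off when the round trip is longer than
SkewMax, which matches the "round trips >> SkewMax" condition above.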



> > A known optimization for the hot rows problem is to "hint" or manually
> > force clients to direct all updates to the hot row to the same node
>
> So, with a leaderless protocol like Accord the ordering decisions are
> never really bottlenecked - no matter how many are in-flight, a new
> transaction will experience no additional latency determining its execution
> order. The only bottleneck will be execution. For this it is absolutely
> possible to funnel everything to a single coordinator, but I don’t know
> that this would in practice achieve much – the important bottleneck would
> be that the coordinators are all within the same DC, so that the _replicas_
> may all respond to them with their data dependencies with minimal delay.
> This is something we discussed in the ApacheCon call as it happens. If a
> significant number of transactions are pending, and they are in different
> DCs, it would be quite straightforward to nominate a coordinator within the
> DC serving the majority of operations to serve the remainder, and to forward
> the results to the original coordinators.
>
>
Thanks for explaining. This is really interesting. I now reread section 2.2
of the paper and realize it says exactly this.

So in Accord:

Step 1: One network round trip + SkewMax to establish a global ordering.

Step 2: a) One (local) network round trip for the read phase, one (WAN) round
           trip for writes.
        b) In addition, before either reading or writing, the node must first
           commit and apply all previous transactions that are in the "deps"
           set of this transaction.

In addition, if we implement interactive transactions, or support for
secondary indexes, or other "complex" transactions, then that work would
happen before Step 1.
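
Step 2b can be pictured as a dependency-ordered apply. The following Python
sketch is my own illustration and not code from the Accord library: the
function name, the `deps` mapping and the `apply` callback are all hypothetical,
and it assumes the dependency graph is acyclic (if it is not, the loop still
terminates but the order inside a cycle is arbitrary).

```python
from typing import Callable, Dict, Set


def apply_in_dependency_order(
    deps: Dict[str, Set[str]],
    txn_id: str,
    applied: Set[str],
    apply: Callable[[str], None],
) -> None:
    """Apply every not-yet-applied dependency of txn_id, then txn_id itself."""
    expanded: Set[str] = set()   # transactions whose deps were already pushed
    stack = [(txn_id, False)]    # (transaction, have its deps been handled?)
    while stack:
        current, deps_done = stack.pop()
        if current in applied:
            continue
        if deps_done:
            apply(current)
            applied.add(current)
            continue
        if current in expanded:
            continue             # guards against looping on a cycle
        expanded.add(current)
        stack.append((current, True))
        # Sorted only to make the order deterministic for this example.
        for dep in sorted(deps.get(current, ()), reverse=True):
            if dep not in applied:
                stack.append((dep, False))


order = []
apply_in_dependency_order(
    {"t3": {"t1", "t2"}, "t2": {"t1"}}, "t3", set(), order.append
)
# order is now ["t1", "t2", "t3"]
```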

Ok, now that I spelled this out... assuming I got it correct... Then this
actually resembles Galera more than Spanner. The wall clock time is not
actually the transaction id, it's just a step in the consensus dialogue
where nodes agree on a global ordering.



> I don’t anticipate this optimisation being a high priority until we have
> user reports of this bottleneck in the wild, however. Since clients for
> many workloads will naturally be geo-partitioned so that related state is
> being updated from the same region, it might simply not be needed – at
> least any time soon.
>
>
For sure. I think we're all just trying to understand the landscape of what we
are talking about here, not trying to say everything should be implemented
in v1.


henrik



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
You can take a look at the Accord library, as linked in the CEP: 
https://github.com/belliottsmith/accord

It will of course be modified extensively over time, but this is the basic 
shape of the API that is envisaged. You can take a look at the Maelstrom 
implementation for how this will be integrated with Cassandra (which of course 
will be much more involved).

There will be a function for describing atomic transactions involving some 
combination of reads and writes, and it will be possible to submit these 
operations and receive an answer back. The relevant point of integration for 
this is accord.local.Node#coordinate.

There will likely be separate APIs for providing the system with topology 
changes, which it will ensure are linearized correctly with respect to ongoing 
transactions.

But when it boils down to it, we are providing a single point of entry for 
one-shot transactions. So the API from the perspective of a developer building 
features on top is pretty simple.
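
As a rough illustration of that shape, the toy Python model below shows a
one-shot transaction that declares its reads up front, derives its writes from
them, and is submitted through a single entry point. This is not the Accord
API: `Transaction`, `Result` and `perform` are hypothetical stand-ins modelled
on the `Result perform(Transaction transaction)` signature quoted elsewhere in
this thread, and a plain in-memory dict stands in for the replicated store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional

Reads = Dict[str, Optional[int]]
Writes = Dict[str, int]


@dataclass(frozen=True)
class Transaction:
    read_keys: FrozenSet[str]
    # Maps the values read to the writes to apply; an empty dict writes nothing.
    update: Callable[[Reads], Writes]


@dataclass(frozen=True)
class Result:
    reads: Reads
    writes: Writes


def perform(store: Dict[str, int], txn: Transaction) -> Result:
    """Single entry point: read, decide and write as one step.

    In this single-threaded toy nothing can interleave, so atomicity is
    trivial; a real system has to establish it with a consensus protocol.
    """
    reads = {key: store.get(key) for key in txn.read_keys}
    writes = txn.update(reads)
    store.update(writes)
    return Result(reads, writes)


def move_five(reads: Reads) -> Writes:
    # Conditional update spanning two keys: only move funds if "a" can cover it.
    if reads["a"] is not None and reads["a"] >= 5:
        return {"a": reads["a"] - 5, "b": (reads["b"] or 0) + 5}
    return {}


accounts = {"a": 10, "b": 0}
outcome = perform(accounts, Transaction(frozenset({"a", "b"}), move_five))
```

The conditional write across two keys is the same kind of operation as the
multi-partition conditional batch described in the CEP.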


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:40
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> With respect to this, in my view this kind of detail is not warranted
> within a CEP. Software development is an exploratory process with respect
> to structure, and these decisions will be made as the CEP progresses. If
> these need to be specified upfront, then the purpose of a CEP – seeking buy
> in – is invalidated, because the work must be complete before you know the
> answers.

These need not be set in stone, they're just a rough sketch of what the
end product will look like to make it easier to build a mental model of the
project, especially for those not directly involved with it, as well as to
guide its development for those involved. At least for me it's much easier
to visualize a project top-down (from how it's going to be used to its
particular implementation details) versus the other way around.

On Fri, 1 Oct 2021 at 11:33, [email protected] <
[email protected]> wrote:

> > The current document details thoroughly the protocol but in my view
> fails to illustrate what specific APIs, methods, modules will become
> available to developers
>
> With respect to this, in my view this kind of detail is not warranted
> within a CEP. Software development is an exploratory process with respect
> to structure, and these decisions will be made as the CEP progresses. If
> these need to be specified upfront, then the purpose of a CEP – seeking buy
> in – is invalidated, because the work must be complete before you know the
> answers.
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 15:31
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> From the CEP:
>
> Batches (including unconditional batches) on transactional tables will
> receive ACID properties, and grammatically correct conditional batch
> operations that would be rejected for operating over multiple CQL
> partitions will now be supported
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:30
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Can you just answer what palpable feature will be available once this CEP
> lands because this is still not clear to me (and perhaps to others) from
> the current CEP structure. The current document details thoroughly the
> protocol but in my view fails to illustrate what specific APIs, methods,
> modules will become available to developers, how it fits into the larger
> picture and interacts with existing modules if at all and perhaps a few
> examples of how it can be used to build features on top.
>
> On Fri, 1 Oct 2021 at 11:10, [email protected] <
> [email protected]> wrote:
>
> > I’m not, though it might seem that way. I disagree with your views about
> > how CEP should be structured. Since the CEP process was itself codified
> via
> > the CEP process, if you want to recodify how CEP work, the correct way is
> > via the CEP process itself.
> >
> > The discussion is being drawn in multiple directions away from the CEP
> > itself, and I am trying to keep this particular thread focused on the
> > business at hand, not meta discussions around CEP structure that will no
> > doubt be unproductive given likely irreconcilable views about the topic,
> > nor discussions about other CEP that could have been.
> >
> > If you want to start a separate exploratory discussion thread about CEP
> > structure without filing a CEP feel free to do so.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 15:04
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > If y

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Paulo Motta
> With respect to this, in my view this kind of detail is not warranted
within a CEP. Software development is an exploratory process with respect
to structure, and these decisions will be made as the CEP progresses. If
these need to be specified upfront, then the purpose of a CEP – seeking buy
in – is invalidated, because the work must be complete before you know the
answers.

These need not be set in stone; they're just a rough sketch of what the
end product will look like to make it easier to build a mental model of the
project, especially for those not directly involved with it, as well as to
guide its development for those involved. At least for me it's much easier
to visualize a project top-down (from how it's going to be used to its
particular implementation details) versus the other way around.

On Fri, 1 Oct 2021 at 11:33, [email protected] <
[email protected]> wrote:

> > The current document details thoroughly the protocol but in my view
> fails to illustrate what specific APIs, methods, modules will become
> available to developers
>
> With respect to this, in my view this kind of detail is not warranted
> within a CEP. Software development is an exploratory process with respect
> to structure, and these decisions will be made as the CEP progresses. If
> these need to be specified upfront, then the purpose of a CEP – seeking buy
> in – is invalidated, because the work must be complete before you know the
> answers.
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 15:31
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> From the CEP:
>
> Batches (including unconditional batches) on transactional tables will
> receive ACID properties, and grammatically correct conditional batch
> operations that would be rejected for operating over multiple CQL
> partitions will now be supported
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:30
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Can you just answer what palpable feature will be available once this CEP
> lands because this is still not clear to me (and perhaps to others) from
> the current CEP structure. The current document details thoroughly the
> protocol but in my view fails to illustrate what specific APIs, methods,
> modules will become available to developers, how it fits into the larger
> picture and interacts with existing modules if at all and perhaps a few
> examples of how it can be used to build features on top.
>
> On Fri, 1 Oct 2021 at 11:10, [email protected] <
> [email protected]> wrote:
>
> > I’m not, though it might seem that way. I disagree with your views about
> > how CEP should be structured. Since the CEP process was itself codified
> via
> > the CEP process, if you want to recodify how CEP work, the correct way is
> > via the CEP process itself.
> >
> > The discussion is being drawn in multiple directions away from the CEP
> > itself, and I am trying to keep this particular thread focused on the
> > business at hand, not meta discussions around CEP structure that will no
> > doubt be unproductive given likely irreconcilable views about the topic,
> > nor discussions about other CEP that could have been.
> >
> > If you want to start a separate exploratory discussion thread about CEP
> > structure without filing a CEP feel free to do so.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 15:04
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > > If you want to impose your views on CEP structure on others, please
> file
> > a CEP with the additional restrictions and guidance you want to impose
> and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> > This sounds very kafkaesque. You know I won't file a meta-CEP to change
> the
> > structure of CEP so you're just using this as an excuse to just shut the
> > discussion on the lack of clarity on what actual palpable feature will be
> > available once the CEP lands. :-)
> >
> > I'm just providing my humble feedback on how a CEP could be more
> digestible
> > and easier to consume from an external point of view, and this seems like
> > an appropriate and contextualized place to voice this opinion which is
> > perhaps shared by others.
> >
> > On Fri, 1 Oct 2021 at 10:55, [email protected] <
> > [email protected]> wrote:
> >
> > > I disagree with you. However, this is 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
> The current document details thoroughly the protocol but in my view fails to
> illustrate what specific APIs, methods, modules will become available to
> developers

With respect to this, in my view this kind of detail is not warranted within a 
CEP. Software development is an exploratory process with respect to structure, 
and these decisions will be made as the CEP progresses. If these need to be 
specified upfront, then the purpose of a CEP – seeking buy in – is invalidated, 
because the work must be complete before you know the answers.


From: [email protected] 
Date: Friday, 1 October 2021 at 15:31
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
From the CEP:

Batches (including unconditional batches) on transactional tables will receive 
ACID properties, and grammatically correct conditional batch operations that 
would be rejected for operating over multiple CQL partitions will now be 
supported


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:30
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Can you just answer what palpable feature will be available once this CEP
lands because this is still not clear to me (and perhaps to others) from
the current CEP structure. The current document details thoroughly the
protocol but in my view fails to illustrate what specific APIs, methods,
modules will become available to developers, how it fits into the larger
picture and interacts with existing modules if at all and perhaps a few
examples of how it can be used to build features on top.

On Fri, 1 Oct 2021 at 11:10, [email protected] <
[email protected]> wrote:

> I’m not, though it might seem that way. I disagree with your views about
> how CEP should be structured. Since the CEP process was itself codified via
> the CEP process, if you want to recodify how CEP work, the correct way is
> via the CEP process itself.
>
> The discussion is being drawn in multiple directions away from the CEP
> itself, and I am trying to keep this particular thread focused on the
> business at hand, not meta discussions around CEP structure that will no
> doubt be unproductive given likely irreconcilable views about the topic,
> nor discussions about other CEP that could have been.
>
> If you want to start a separate exploratory discussion thread about CEP
> structure without filing a CEP feel free to do so.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:04
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
> This sounds very kafkaesque. You know I won't file a meta-CEP to change the
> structure of CEP so you're just using this as an excuse to just shut the
> discussion on the lack of clarity on what actual palpable feature will be
> available once the CEP lands. :-)
>
> I'm just providing my humble feedback on how a CEP could be more digestible
> and easier to consume from an external point of view, and this seems like
> an appropriate and contextualized place to voice this opinion which is
> perhaps shared by others.
>
> On Fri, 1 Oct 2021 at 10:55, [email protected] <
> [email protected]> wrote:
>
> > I disagree with you. However, this is the wrong forum to have a meta
> > discussion about how CEP should be structured.
> >
> > If you want to impose your views on CEP structure on others, please file
> a
> > CEP with the additional restrictions and guidance you want to impose and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 14:48
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > >  The proposal as it stands today is exceptionally thorough, more so
> than
> > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > The protocol is thoroughly described, but in my view CEP is a forum to
> > discuss the high level architecture and plan for adding a full end-to-end
> > enhancement to the database, breaking it into sub-CEPs if needed, as long
> > as the full plan is known in advance, otherwise the community will not
> have
> > the context to judge the full extent and impact of the proposed
> > enhancement.
> >
> > > Since it remains unclear to me what either yourself or Jo

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
From the CEP:

Batches (including unconditional batches) on transactional tables will receive 
ACID properties, and grammatically correct conditional batch operations that 
would be rejected for operating over multiple CQL partitions will now be 
supported


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:30
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Can you just answer what palpable feature will be available once this CEP
lands because this is still not clear to me (and perhaps to others) from
the current CEP structure. The current document details thoroughly the
protocol but in my view fails to illustrate what specific APIs, methods,
modules will become available to developers, how it fits into the larger
picture and interacts with existing modules if at all and perhaps a few
examples of how it can be used to build features on top.

On Fri, 1 Oct 2021 at 11:10, [email protected] <
[email protected]> wrote:

> I’m not, though it might seem that way. I disagree with your views about
> how CEP should be structured. Since the CEP process was itself codified via
> the CEP process, if you want to recodify how CEP work, the correct way is
> via the CEP process itself.
>
> The discussion is being drawn in multiple directions away from the CEP
> itself, and I am trying to keep this particular thread focused on the
> business at hand, not meta discussions around CEP structure that will no
> doubt be unproductive given likely irreconcilable views about the topic,
> nor discussions about other CEP that could have been.
>
> If you want to start a separate exploratory discussion thread about CEP
> structure without filing a CEP feel free to do so.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:04
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
> This sounds very kafkaesque. You know I won't file a meta-CEP to change the
> structure of CEP so you're just using this as an excuse to just shut the
> discussion on the lack of clarity on what actual palpable feature will be
> available once the CEP lands. :-)
>
> I'm just providing my humble feedback on how a CEP could be more digestible
> and easier to consume from an external point of view, and this seems like
> an appropriate and contextualized place to voice this opinion which is
> perhaps shared by others.
>
> On Fri, 1 Oct 2021 at 10:55, [email protected] <
> [email protected]> wrote:
>
> > I disagree with you. However, this is the wrong forum to have a meta
> > discussion about how CEP should be structured.
> >
> > If you want to impose your views on CEP structure on others, please file
> a
> > CEP with the additional restrictions and guidance you want to impose and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 14:48
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > >  The proposal as it stands today is exceptionally thorough, more so
> than
> > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > The protocol is thoroughly described, but in my view CEP is a forum to
> > discuss the high level architecture and plan for adding a full end-to-end
> > enhancement to the database, breaking it into sub-CEPs if needed, as long
> > as the full plan is known in advance, otherwise the community will not
> have
> > the context to judge the full extent and impact of the proposed
> > enhancement.
> >
> > > Since it remains unclear to me what either yourself or Jonathan want to
> > see as an alternative
> >
> > I would personally like to see something along these lines:
> >
> > CEP1: Add ACID-compliant atomic batches
> > - UX changes needed: none, CQL provides the grammar we need.
> > - Distributed transaction protocol needed: Accord (link to white paper if
> > you want specific details about the protocol)
> > - High-level architecture: what new components will be added, how
> existing
> > components will be modified, what new messages will be added, what new
> > configuration knobs will be introduced, what are the milestones of the
> > project, etc.
> >
> > CEP2: Make LWT fast

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
Hi Henrik,

> While I understand they are out of scope, do you happen to have already some 
> idea what it would require to support secondary indexes?

Yes, it is likely that the approach will be the same as that taken by
Calvin-like systems, where a “reconnaissance” round is taken within the local DC
to construct a transaction involving the secondary index. This would be
reversed when reading from a secondary index: the primary keys would be
determined via a reconnaissance round and the transaction updated to include
them.
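
The reconnaissance idea can be sketched roughly as follows. This is a toy, single-process Python illustration, not Accord or Cassandra code: the `index` and `table` dictionaries, `recon`, `execute_with_recon` and the `between` hook are all hypothetical. The re-check-and-retry step is the approach described for Calvin-style systems and is an assumption here, not something the CEP specifies.

```python
from typing import Callable, Dict, FrozenSet, Set

Row = Dict[str, str]
Table = Dict[str, Row]        # primary key -> row
Index = Dict[str, Set[str]]   # indexed value -> primary keys


def recon(index: Index, value: str) -> FrozenSet[str]:
    """Reconnaissance read: a cheap, non-transactional index lookup (snapshot copy)."""
    return frozenset(index.get(value, set()))


def execute_with_recon(
    table: Table,
    index: Index,
    value: str,
    mutate: Callable[[Row], None],
    between: Callable[[], None] = lambda: None,
    max_attempts: int = 3,
) -> FrozenSet[str]:
    for _ in range(max_attempts):
        keys = recon(index, value)       # 1. discover the primary keys upfront
        between()                        # stands in for concurrent activity
        if recon(index, value) != keys:  # 2. at execution time, re-check the key set
            continue                     #    the index moved underneath us: retry
        for k in sorted(keys):           # 3. apply to exactly the declared keys
            mutate(table[k])
        return keys
    raise RuntimeError("key set kept changing; giving up")


table = {
    "u1": {"city": "Oslo", "tier": "free"},
    "u2": {"city": "Oslo", "tier": "free"},
}
index = {"Oslo": {"u1"}}
calls = []


def late_index_update() -> None:
    # Simulates the index catching up once, between recon and execution.
    calls.append(1)
    if len(calls) == 1:
        index["Oslo"].add("u2")


touched = execute_with_recon(
    table, index, "Oslo", lambda row: row.update(tier="paid"), between=late_index_update
)
```

In this run the first attempt is discarded because the key set changed after the reconnaissance read, and the second attempt updates both rows.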

If we choose to implement one of the more sophisticated interactive transaction 
proposals then it would of course be possible to implement secondary indexes on 
top of these.

Note that all of this is entirely independent of SAI – since these indexes are 
built per-partition they will be easily transactional within a partition key, 
or probably never transactional if you perform a scatter gather across the 
whole cluster. I’m not sufficiently well versed in SAI to really consider this 
well as yet, and I will update the CEP to note that they are out of scope.

> Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB 
> discussions = 7 ms

I think skew max is likely to be much lower than this, even on commodity 
hardware. Bear in mind that unlike Cockroach and Spanner correctness does not 
depend on this value, only performance. So we can pick the real number, not 
some p100 outlier value.

Also bear in mind that this is an optimisation. In clusters where it makes no 
sense we can simply use the raw protocol and accept transactions will very 
infrequently take two round-trips (which is fine, because in this scenario 
round-trips are cheap).

> A known optimization for the hot rows problem is to "hint" or manually force 
> clients to direct all updates to the hot row to the same node

So, with a leaderless protocol like Accord the ordering decisions are never 
really bottlenecked - no matter how many are in-flight, a new transaction will 
experience no additional latency determining its execution order. The only 
bottleneck will be execution. For this it is absolutely possible to funnel 
everything to a single coordinator, but I don’t know that this would in 
practice achieve much – the important thing would be that the coordinators
are all within the same DC, so that the _replicas_ may all respond to them with
their data dependencies
with minimal delay. This is something we discussed in the ApacheCon call as it 
happens. If a significant number of transactions are pending, and they are in 
different DCs, it would be quite straightforward to nominate a coordinator 
within the DC serving the majority of operations to serve the remainder, and to 
forward the results to the original coordinators.
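
A minimal sketch of that nomination step, in Python. Everything here is hypothetical: the `PendingTxn` record, the DC and node names, using the DC with the most pending transactions as a stand-in for "the DC serving the majority of operations", and picking the lowest coordinator id as the nominee. It only illustrates the delegation decision, not how results would be forwarded.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PendingTxn:
    txn_id: str
    coordinator: str  # node that originally accepted the transaction
    dc: str           # data centre of that coordinator


def nominate(pending: List[PendingTxn]) -> Dict[str, str]:
    """Map each pending transaction to the coordinator that should drive it."""
    if not pending:
        return {}
    # DC with the most pending transactions (ties broken by name for determinism).
    counts = Counter(t.dc for t in pending)
    busiest_dc = min(counts, key=lambda dc: (-counts[dc], dc))
    nominee = min(t.coordinator for t in pending if t.dc == busiest_dc)
    # Transactions already in the busiest DC keep their coordinator; the rest are
    # delegated to the nominee, which would forward results to the originals.
    return {
        t.txn_id: (t.coordinator if t.dc == busiest_dc else nominee)
        for t in pending
    }


pending = [
    PendingTxn("t1", "n1", "dc1"),
    PendingTxn("t2", "n2", "dc1"),
    PendingTxn("t3", "n7", "dc2"),
]
assignment = nominate(pending)
```

With two transactions pending in `dc1` and one in `dc2`, the `dc2` transaction is handed to `n1`, while the `dc1` transactions stay where they are.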

I don’t anticipate this optimisation being a high priority until we have user 
reports of this bottleneck in the wild, however. Since clients for many 
workloads will naturally be geo-partitioned so that related state is being 
updated from the same region, it might simply not be needed – at least any time 
soon.

From: Henrik Ingo 
Date: Friday, 1 October 2021 at 14:38
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Benedict

Since you asked, I reviewed the thread a bit and found this...


*secondary indexes*

>> What I would like to understand better and without guessing is, what do
these transactions look like from a client/user point of view?


> This is a fair question, and perhaps something I should pinpoint more
directly for the reader. The CEP does stipulate non-interactive
transactions, i.e. those that are one-shot. The only other limitation is
that the partition keys must be known upfront, however I expect we will
follow-up soon after with some weaker semantics that build on top (probably
using optimistic concurrency control) to support transactions where only
some partition keys are known upfront, so that we may support global
secondary indexes with proper isolation and consistency.


The CEP doesn't actually mention lack of support for secondary index
queries. Probably good to add as a limitation. (I realize currently using
secondary indexes isn't mainstream in Cassandra anyway, but with SASI in
4.0 and SAI being a separate CEP in discussion, it's good to call out
Accord wouldn't automatically support them.)

While I understand they are out of scope, do you happen to have already
some idea what it would require to support secondary indexes? Is it
sufficient to just include the secondary index keys (or a range of such) in
the "deps" of the transaction? Of course, still needing to also include the
partitions or rows actually read as a result of scanning the secondary
index. Similarly then for mutations, deps would have to include changes to
index keys in the transaction?


*commit latency*

A topic on some off-list discussions has been to underst

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Paulo Motta
Can you just answer what palpable feature will be available once this CEP
lands because this is still not clear to me (and perhaps to others) from
the current CEP structure. The current document details thoroughly the
protocol but in my view fails to illustrate what specific APIs, methods,
modules will become available to developers, how it fits into the larger
picture and interacts with existing modules if at all and perhaps a few
examples of how it can be used to build features on top.

On Fri, 1 Oct 2021 at 11:10, [email protected] <
[email protected]> wrote:

> I’m not, though it might seem that way. I disagree with your views about
> how CEP should be structured. Since the CEP process was itself codified via
> the CEP process, if you want to recodify how CEP work, the correct way is
> via the CEP process itself.
>
> The discussion is being drawn in multiple directions away from the CEP
> itself, and I am trying to keep this particular thread focused on the
> business at hand, not meta discussions around CEP structure that will no
> doubt be unproductive given likely irreconcilable views about the topic,
> nor discussions about other CEP that could have been.
>
> If you want to start a separate exploratory discussion thread about CEP
> structure without filing a CEP feel free to do so.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 15:04
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > If you want to impose your views on CEP structure on others, please file
> a CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
> This sounds very kafkaesque. You know I won't file a meta-CEP to change the
> structure of CEP so you're just using this as an excuse to just shut the
> discussion on the lack of clarity on what actual palpable feature will be
> available once the CEP lands. :-)
>
> I'm just providing my humble feedback on how a CEP could be more digestible
> and easier to consume from an external point of view, and this seems like
> an appropriate and contextualized place to voice this opinion which is
> perhaps shared by others.
>
> On Fri, 1 Oct 2021 at 10:55, [email protected] <
> [email protected]> wrote:
>
> > I disagree with you. However, this is the wrong forum to have a meta
> > discussion about how CEP should be structured.
> >
> > If you want to impose your views on CEP structure on others, please file
> a
> > CEP with the additional restrictions and guidance you want to impose and
> > start a discussion thread. I can then respond in detail to why I perceive
> > this approach to be flawed, in a dedicated context.
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 14:48
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > >  The proposal as it stands today is exceptionally thorough, more so
> than
> > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > The protocol is thoroughly described, but in my view CEP is a forum to
> > discuss the high level architecture and plan for adding a full end-to-end
> > enhancement to the database, breaking it into sub-CEPs if needed, as long
> > as the full plan is known in advance, otherwise the community will not
> have
> > the context to judge the full extent and impact of the proposed
> > enhancement.
> >
> > > Since it remains unclear to me what either yourself or Jonathan want to
> > see as an alternative
> >
> > I would personally like to see something along these lines:
> >
> > CEP1: Add ACID-compliant atomic batches
> > - UX changes needed: none, CQL provides the grammar we need.
> > - Distributed transaction protocol needed: Accord (link to white paper if
> > you want specific details about the protocol)
> > - High-level architecture: what new components will be added, how
> existing
> > components will be modified, what new messages will be added, what new
> > configuration knobs will be introduced, what are the milestones of the
> > project, etc.
> >
> > CEP2: Make LWT faster and more reliable
> > - UX changes needed: none
> > - Distributed transaction protocol needed: Accord, already added by
> > previous CEP.
> > - High-level architecture: blablabla... and so on.
> >
> > On Fri, 1 Oct 2021 at 10:19, [email protected] <
> > [email protected]> wrote:
> >
> > > I think this is getting circ

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
I’m not, though it might seem that way. I disagree with your views about how 
CEP should be structured. Since the CEP process was itself codified via the CEP 
process, if you want to recodify how CEP work, the correct way is via the CEP 
process itself.

The discussion is being drawn in multiple directions away from the CEP itself, 
and I am trying to keep this particular thread focused on the business at hand, 
not meta discussions around CEP structure that will no doubt be unproductive 
given likely irreconcilable views about the topic, nor discussions about other 
CEP that could have been.

If you want to start a separate exploratory discussion thread about CEP 
structure without filing a CEP feel free to do so.


From: Paulo Motta 
Date: Friday, 1 October 2021 at 15:04
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> If you want to impose your views on CEP structure on others, please file
a CEP with the additional restrictions and guidance you want to impose and
start a discussion thread. I can then respond in detail to why I perceive
this approach to be flawed, in a dedicated context.

This sounds very kafkaesque. You know I won't file a meta-CEP to change the
structure of CEP so you're just using this as an excuse to shut down the
discussion on the lack of clarity on what actual palpable feature will be
available once the CEP lands. :-)

I'm just providing my humble feedback on how a CEP could be more digestible
and easier to consume from an external point of view, and this seems like
an appropriate and contextualized place to voice this opinion which is
perhaps shared by others.

On Fri, 1 Oct 2021 at 10:55, [email protected] <
[email protected]> wrote:

> I disagree with you. However, this is the wrong forum to have a meta
> discussion about how CEP should be structured.
>
> If you want to impose your views on CEP structure on others, please file a
> CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 14:48
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >  The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.
>
> The protocol is thoroughly described, but in my view CEP is a forum to
> discuss the high level architecture and plan for adding a full end-to-end
> enhancement to the database, breaking it into sub-CEPs if needed, as long
> as the full plan is known in advance, otherwise the community will not have
> the context to judge the full extent and impact of the proposed
> enhancement.
>
> > Since it remains unclear to me what either yourself or Jonathan want to
> see as an alternative
>
> I would personally like to see something along these lines:
>
> CEP1: Add ACID-compliant atomic batches
> - UX changes needed: none, CQL provides the grammar we need.
> - Distributed transaction protocol needed: Accord (link to white paper if
> you want specific details about the protocol)
> - High-level architecture: what new components will be added, how existing
> components will be modified, what new messages will be added, what new
> configuration knobs will be introduced, what are the milestones of the
> project, etc.
>
> CEP2: Make LWT faster and more reliable
> - UX changes needed: none
> - Distributed transaction protocol needed: Accord, already added by
> previous CEP.
> - High-level architecture: blablabla... and so on.
>
> On Fri, 1 Oct 2021 at 10:19, [email protected] <
> [email protected]> wrote:
>
> > I think this is getting circular and unproductive. Basic disagreements
> > about whether the CEP specifies a feature I am inclined to leave for a
> > vote. In my view the CEP specifies several features, both immediate ones
> > for the user (ACID batches and multi-key LWTS) and developer-focused ones
> > around ground-breaking semantics that will be enabled.
> >
> > The proposal as it stands today is exceptionally thorough, more so than
> > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > This is a Cassandra Enhancement *Proposal*, and at some point we have to
> > engage with what is proposed, not what you might like to be proposed.
> Since
> > it remains unclear to me what either yourself or Jonathan want to see as
> an
> > alternative, at this point it would seem more productive to produce your
> > own proposals for the community to consider. It is possible for multiple
> > transaction systems to co-ex

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Paulo Motta
> If you want to impose your views on CEP structure on others, please file
a CEP with the additional restrictions and guidance you want to impose and
start a discussion thread. I can then respond in detail to why I perceive
this approach to be flawed, in a dedicated context.

This sounds very kafkaesque. You know I won't file a meta-CEP to change the
structure of CEP so you're just using this as an excuse to shut down the
discussion on the lack of clarity on what actual palpable feature will be
available once the CEP lands. :-)

I'm just providing my humble feedback on how a CEP could be more digestible
and easier to consume from an external point of view, and this seems like
an appropriate and contextualized place to voice this opinion which is
perhaps shared by others.

On Fri, Oct 1, 2021 at 10:55, [email protected] <
[email protected]> wrote:

> I disagree with you. However, this is the wrong forum to have a meta
> discussion about how CEP should be structured.
>
> If you want to impose your views on CEP structure on others, please file a
> CEP with the additional restrictions and guidance you want to impose and
> start a discussion thread. I can then respond in detail to why I perceive
> this approach to be flawed, in a dedicated context.
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 14:48
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > The proposal as it stands today is exceptionally thorough, more so than
> > any other CEP to date, or any CEP is likely to be in the near future.
>
> The protocol is thoroughly described, but in my view CEP is a forum to
> discuss the high level architecture and plan for adding a full end-to-end
> enhancement to the database, breaking it into sub-CEPs if needed, as long
> as the full plan is known in advance, otherwise the community will not have
> the context to judge the full extent and impact of the proposed
> enhancement.
>
> > Since it remains unclear to me what either yourself or Jonathan want to
> > see as an alternative
>
> I would personally like to see something along these lines:
>
> CEP1: Add ACID-compliant atomic batches
> - UX changes needed: none, CQL provides the grammar we need.
> - Distributed transaction protocol needed: Accord (link to white paper if
> you want specific details about the protocol)
> - High-level architecture: what new components will be added, how existing
> components will be modified, what new messages will be added, what new
> configuration knobs will be introduced, what are the milestones of the
> project, etc.
>
> CEP2: Make LWT faster and more reliable
> - UX changes needed: none
> - Distributed transaction protocol needed: Accord, already added by
> previous CEP.
> - High-level architecture: blablabla... and so on.
>
> On Fri, Oct 1, 2021 at 10:19, [email protected] <
> [email protected]> wrote:
>
> > I think this is getting circular and unproductive. Basic disagreements
> > about whether the CEP specifies a feature I am inclined to leave for a
> > vote. In my view the CEP specifies several features, both immediate ones
> > for the user (ACID batches and multi-key LWTs) and developer-focused ones
> > around ground-breaking semantics that will be enabled.
> >
> > The proposal as it stands today is exceptionally thorough, more so than
> > any other CEP to date, or any CEP is likely to be in the near future.
> >
> > This is a Cassandra Enhancement *Proposal*, and at some point we have to
> > engage with what is proposed, not what you might like to be proposed. Since
> > it remains unclear to me what either yourself or Jonathan want to see as an
> > alternative, at this point it would seem more productive to produce your
> > own proposals for the community to consider. It is possible for multiple
> > transaction systems to co-exist, if you feel this is necessary.
> >
> >
> >
> > From: Paulo Motta 
> > Date: Friday, 1 October 2021 at 13:58
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > I share similar feelings as jbellis that this proposal seems to be focusing
> > on the protocol itself but lacking the actual feature that will use the
> > protocol which IMO is a key element to discuss on a CEP.
> >
> > It's similar to saying: hey I want to add this Tries Serialization Protocol
> > to Cassandra, but not providing specific details of how this protocol is
> > going to be used.
> >
> > I think the right route for a CEP is to describe the feature that will be
> > added to the database and the protocol is a mere requirement of the
> 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
I disagree with you. However, this is the wrong forum to have a meta discussion 
about how CEP should be structured.

If you want to impose your views on CEP structure on others, please file a CEP 
with the additional restrictions and guidance you want to impose and start a 
discussion thread. I can then respond in detail to why I perceive this approach 
to be flawed, in a dedicated context.


From: Paulo Motta 
Date: Friday, 1 October 2021 at 14:48
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.

The protocol is thoroughly described, but in my view CEP is a forum to
discuss the high level architecture and plan for adding a full end-to-end
enhancement to the database, breaking it into sub-CEPs if needed, as long
as the full plan is known in advance, otherwise the community will not have
the context to judge the full extent and impact of the proposed enhancement.

> Since it remains unclear to me what either yourself or Jonathan want to
> see as an alternative

I would personally like to see something along these lines:

CEP1: Add ACID-compliant atomic batches
- UX changes needed: none, CQL provides the grammar we need.
- Distributed transaction protocol needed: Accord (link to white paper if
you want specific details about the protocol)
- High-level architecture: what new components will be added, how existing
components will be modified, what new messages will be added, what new
configuration knobs will be introduced, what are the milestones of the
project, etc.

CEP2: Make LWT faster and more reliable
- UX changes needed: none
- Distributed transaction protocol needed: Accord, already added by
previous CEP.
- High-level architecture: blablabla... and so on.

On Fri, Oct 1, 2021 at 10:19, [email protected] <
[email protected]> wrote:

> I think this is getting circular and unproductive. Basic disagreements
> about whether the CEP specifies a feature I am inclined to leave for a
> vote. In my view the CEP specifies several features, both immediate ones
> for the user (ACID batches and multi-key LWTs) and developer-focused ones
> around ground-breaking semantics that will be enabled.
>
> The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.
>
> This is a Cassandra Enhancement *Proposal*, and at some point we have to
> engage with what is proposed, not what you might like to be proposed. Since
> it remains unclear to me what either yourself or Jonathan want to see as an
> alternative, at this point it would seem more productive to produce your
> own proposals for the community to consider. It is possible for multiple
> transaction systems to co-exist, if you feel this is necessary.
>
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 13:58
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I share similar feelings as jbellis that this proposal seems to be focusing
> on the protocol itself but lacking the actual feature that will use the
> protocol which IMO is a key element to discuss on a CEP.
>
> It's similar to saying: hey I want to add this Tries Serialization Protocol
> to Cassandra, but not providing specific details of how this protocol is
> going to be used.
>
> I think the right route for a CEP is to describe the feature that will be
> added to the database and the protocol is a mere requirement of the
> high-level feature, for example:
>
> CEP: Add Trie-backed memtable
> - Trie Serialization Protocol: implementation detail of the above CEP
>
> What is the difficulty of taking this approach, picking one of the myriad
> of features that will be enabled by Accord and using that as the initial
> CEP to introduce the protocol to the database?
>
> On Fri, Oct 1, 2021 at 08:37, [email protected] <
> [email protected]> wrote:
>
> > Actually, thinking about it again, the simple optimistic protocol would in
> > fact guarantee system forward progress (i.e. independent of transaction
> > formulation).
> >
> >
> > From: [email protected] 
> > Date: Friday, 1 October 2021 at 09:14
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > Hi Jonathan,
> >
> > It would be great if we could achieve a bandwidth higher than 1-2 short
> > emails per week. It remains unclear to me what your goal is, and it would
> > help if you could make a statement like “I want Cassandra to be able to do
> > X” so that we can respond directly to it. I am also available to have
> > another call, in 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Paulo Motta
> The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.

The protocol is thoroughly described, but in my view CEP is a forum to
discuss the high level architecture and plan for adding a full end-to-end
enhancement to the database, breaking it into sub-CEPs if needed, as long
as the full plan is known in advance, otherwise the community will not have
the context to judge the full extent and impact of the proposed enhancement.

> Since it remains unclear to me what either yourself or Jonathan want to
> see as an alternative

I would personally like to see something along these lines:

CEP1: Add ACID-compliant atomic batches
- UX changes needed: none, CQL provides the grammar we need.
- Distributed transaction protocol needed: Accord (link to white paper if
you want specific details about the protocol)
- High-level architecture: what new components will be added, how existing
components will be modified, what new messages will be added, what new
configuration knobs will be introduced, what are the milestones of the
project, etc.

CEP2: Make LWT faster and more reliable
- UX changes needed: none
- Distributed transaction protocol needed: Accord, already added by
previous CEP.
- High-level architecture: blablabla... and so on.

On Fri, Oct 1, 2021 at 10:19, [email protected] <
[email protected]> wrote:

> I think this is getting circular and unproductive. Basic disagreements
> about whether the CEP specifies a feature I am inclined to leave for a
> vote. In my view the CEP specifies several features, both immediate ones
> for the user (ACID batches and multi-key LWTs) and developer-focused ones
> around ground-breaking semantics that will be enabled.
>
> The proposal as it stands today is exceptionally thorough, more so than
> any other CEP to date, or any CEP is likely to be in the near future.
>
> This is a Cassandra Enhancement *Proposal*, and at some point we have to
> engage with what is proposed, not what you might like to be proposed. Since
> it remains unclear to me what either yourself or Jonathan want to see as an
> alternative, at this point it would seem more productive to produce your
> own proposals for the community to consider. It is possible for multiple
> transaction systems to co-exist, if you feel this is necessary.
>
>
>
> From: Paulo Motta 
> Date: Friday, 1 October 2021 at 13:58
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I share similar feelings as jbellis that this proposal seems to be focusing
> on the protocol itself but lacking the actual feature that will use the
> protocol which IMO is a key element to discuss on a CEP.
>
> It's similar to saying: hey I want to add this Tries Serialization Protocol
> to Cassandra, but not providing specific details of how this protocol is
> going to be used.
>
> I think the right route for a CEP is to describe the feature that will be
> added to the database and the protocol is a mere requirement of the
> high-level feature, for example:
>
> CEP: Add Trie-backed memtable
> - Trie Serialization Protocol: implementation detail of the above CEP
>
> What is the difficulty of taking this approach, picking one of the myriad
> of features that will be enabled by Accord and using that as the initial
> CEP to introduce the protocol to the database?
>
> On Fri, Oct 1, 2021 at 08:37, [email protected] <
> [email protected]> wrote:
>
> > Actually, thinking about it again, the simple optimistic protocol would in
> > fact guarantee system forward progress (i.e. independent of transaction
> > formulation).
> >
> >
> > From: [email protected] 
> > Date: Friday, 1 October 2021 at 09:14
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > Hi Jonathan,
> >
> > It would be great if we could achieve a bandwidth higher than 1-2 short
> > emails per week. It remains unclear to me what your goal is, and it would
> > help if you could make a statement like “I want Cassandra to be able to do
> > X” so that we can respond directly to it. I am also available to have
> > another call, in which we can have a back and forth, please feel free to
> > propose a London-compatible time within the next week that is suitable for
> > you.
> >
> > In my opinion we are at risk of veering off-topic, though. This CEP is not
> > to deliver interactive transactions, and to my knowledge nobody is
> > proposing a CEP for interactive transactions. So, for the CEP at hand the
> > salient question seems: does this CEP prevent us from implementing
> > interactive transactions wi

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 4:37 PM Henrik Ingo  wrote:

> A known optimization for the hot rows problem is to "hint" or manually
> force clients to direct all updates to the hot row to the same node,
> essentially making the system leader based. This allows the database to
> start processing new updates even while the first one is still committing.
> (See Galera for an example implementing this.)
> This makes me wonder whether there is a similar optimization for Accord
> where transactions from the same coordinator can be allowed to commit
> within the SkewMax window, because we can assume that the trx timestamps
> originating at the same coordinator cannot arrive out of order when using
> TPC?
>
>
TCP

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
the SkewMax window, because we can assume that the trx timestamps
originating at the same coordinator cannot arrive out of order when using
TPC?



henrik







On Mon, Sep 27, 2021 at 11:59 PM [email protected] 
wrote:

> Ok, it’s time for the weekly poking of the hornet’s nest.
>
> Any more thoughts, questions or criticisms, anyone?
>
> From: [email protected] 
> Date: Friday, 24 September 2021 at 22:41
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I’m not aware of anybody having taken any notes, but somebody please chime
> in if I’m wrong.
>
> From my recollection, re Accord:
>
>
>   *   Q: Will batches now support rollbacks?
>       *   Batches would apply atomically or not, but unlikely to have a
>           concept of rollback. Timeouts remain unknown, but hope to have some
>           mechanism to provide clients a definitive answer about such
>           transactions after the fact.
>   *   Q: Can stale replicas participate in transactions?
>       *   Accord applies conflicting transactions in-order at every
>           replica, so only nodes that are up-to-date may participate in the
>           execution of a transaction, but any replica may participate in
>           agreeing a transaction. To ensure replicas remain up-to-date I
>           anticipate introducing a real-time repair facility at the
>           transactional message level, with peers reconciling recently
>           processed messages and cross-delivering any that are missing.
>   *   Possible UX directions in very vague terms: CQL atomic and
>       conditional batches initially; going forwards interactive transactions?
>       Complex user defined functions? SQL?
>   *   Discussed possibility of LOCAL_QUORUM reads for globally replicated
>       transactional tables, as this is an important use case
>       *   Simple stale reads to transactional tables
>       *   Brainstormed a bit about serializable reads to a single DC
>           without (normally) crossing WAN
>       *   Discussed possibility of multiple ACKs providing separate LAN and
>           WAN persistence notifications to clients
>   *   Discussed size of fast path quorums in Accord, and how this might
>       affect global latency in high RF clusters (i.e. not optimal, and in some
>       cases may need every DC to participate) and how this can be modified by
>       biasing fast path electorate so that 2 of the 3 DCs may reach fast-path
>       decisions with each other (remaining DC having to reach both those DCs to
>       reach fast path). Also discussed Calvin-like modes of operation that would
>       offer optimal global latency for sufficiently small clusters at RF=3 or
>       RF=5.
>
> I’m sure there were other discussions I can’t remember, perhaps others can
> fill in the blanks.
>
>
> From: Jonathan Ellis 
> Date: Friday, 24 September 2021 at 20:28
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Does anyone have notes for those of us who couldn't make the call?
>
> On Wed, Sep 22, 2021 at 1:35 PM [email protected] 
> wrote:
>
> > Hi everyone,
> >
> > Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> > 4pm BST to discuss Accord and other things in the community. There are no
> > plans to make any kind of project decisions. Everyone is welcome to drop in
> > to discuss Accord or whatever else might be on your mind.
> >
> >
> > https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
> >
> >
> > From: [email protected] 
> > Date: Wednesday, 22 September 2021 at 16:22
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > No, I would expect to deliver strict serializable interactive transactions
> > using Accord. These would simply corroborate that the participating keys
> > had not modified their write timestamps during the final transaction. These
> > could even be undertaken with still only a single wide area round-trip,
> > using local copies of the data to assemble the transaction (though this
> > would marginally increase the chance of aborts)
> >
> > My goal for MVCC is parallelism, not additional isolation levels (though
> > snapshot isolation is useful and we’ll probably also want to offer that
> > eventually)
> >
> > From: Henrik Ingo 
> > Date: Wednesday, 22 September 2021 at 15:15
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > On Wed, Sep 22, 2021 at 7:56 AM bened...@apa

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
I am of course more than happy to continue discussing CEP-15 with respect to 
the proposed goals, and queries about the proposed protocol. I hope people feel 
free to continue raising queries. If anybody disagrees with the goals or any 
specific part of the proposal on substantive (rather than aesthetic/structural) 
grounds I also remain very open to further discussion.

However, I think at this point it is reasonable to request that we engage with 
the proposal as defined, and in particular the goals that have been proposed. 
Those who wish for a different proposal can produce one so that it may be 
engaged with on the same terms.

From: [email protected] 
Date: Friday, 1 October 2021 at 14:19
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I think this is getting circular and unproductive. Basic disagreements about 
whether the CEP specifies a feature I am inclined to leave for a vote. In my 
view the CEP specifies several features, both immediate ones for the user (ACID 
batches and multi-key LWTs) and developer-focused ones around ground-breaking 
semantics that will be enabled.

The proposal as it stands today is exceptionally thorough, more so than any 
other CEP to date, or any CEP is likely to be in the near future.

This is a Cassandra Enhancement *Proposal*, and at some point we have to engage 
with what is proposed, not what you might like to be proposed. Since it remains 
unclear to me what either yourself or Jonathan want to see as an alternative, 
at this point it would seem more productive to produce your own proposals for 
the community to consider. It is possible for multiple transaction systems to 
co-exist, if you feel this is necessary.



From: Paulo Motta 
Date: Friday, 1 October 2021 at 13:58
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I share similar feelings as jbellis that this proposal seems to be focusing
on the protocol itself but lacking the actual feature that will use the
protocol which IMO is a key element to discuss on a CEP.

It's similar to saying: hey I want to add this Tries Serialization Protocol
to Cassandra, but not providing specific details of how this protocol is
going to be used.

I think the right route for a CEP is to describe the feature that will be
added to the database and the protocol is a mere requirement of the
high-level feature, for example:

CEP: Add Trie-backed memtable
- Trie Serialization Protocol: implementation detail of the above CEP

What is the difficulty of taking this approach, picking one of the myriad
of features that will be enabled by Accord and using that as the initial
CEP to introduce the protocol to the database?

On Fri, Oct 1, 2021 at 08:37, [email protected] <
[email protected]> wrote:

> Actually, thinking about it again, the simple optimistic protocol would in
> fact guarantee system forward progress (i.e. independent of transaction
> formulation).
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 09:14
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Hi Jonathan,
>
> It would be great if we could achieve a bandwidth higher than 1-2 short
> emails per week. It remains unclear to me what your goal is, and it would
> help if you could make a statement like “I want Cassandra to be able to do
> X” so that we can respond directly to it. I am also available to have
> another call, in which we can have a back and forth, please feel free to
> propose a London-compatible time within the next week that is suitable for
> you.
>
> In my opinion we are at risk of veering off-topic, though. This CEP is not
> to deliver interactive transactions, and to my knowledge nobody is
> proposing a CEP for interactive transactions. So, for the CEP at hand the
> salient question seems: does this CEP prevent us from implementing
> interactive transactions with properties X, Y, Z in future? To which the
> answer is almost certainly no.
>
> However, to continue the discussion and respond directly to your queries,
> I believe we agree on the definition of an interactive transaction.
>
> Two protocols were loosely outlined. The first, using timestamps for
> optimistic concurrency control, would indeed involve the possibility of
> aborts. It would not however inherently adopt the issue of LWTs where no
> transaction is able to make progress. Whether or not progress is guaranteed
> (in a livelock-free sense) would depend on the structure of the
> transactions that were interfering.
>
> This approach has the advantage of being very simple to implement, so that
> we could realistically support interactive transactions quite quickly. It
> has the additional advantage that transactions would execute very quickly
> by avoiding the WAN during construction, and as a result may in p

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
I think this is getting circular and unproductive. Basic disagreements about 
whether the CEP specifies a feature I am inclined to leave for a vote. In my 
view the CEP specifies several features, both immediate ones for the user (ACID 
batches and multi-key LWTs) and developer-focused ones around ground-breaking 
semantics that will be enabled.

The proposal as it stands today is exceptionally thorough, more so than any 
other CEP to date, or any CEP is likely to be in the near future.

This is a Cassandra Enhancement *Proposal*, and at some point we have to engage 
with what is proposed, not what you might like to be proposed. Since it remains 
unclear to me what either yourself or Jonathan want to see as an alternative, 
at this point it would seem more productive to produce your own proposals for 
the community to consider. It is possible for multiple transaction systems to 
co-exist, if you feel this is necessary.



From: Paulo Motta 
Date: Friday, 1 October 2021 at 13:58
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I share similar feelings as jbellis that this proposal seems to be focusing
on the protocol itself but lacking the actual feature that will use the
protocol which IMO is a key element to discuss on a CEP.

It's similar to saying: hey I want to add this Tries Serialization Protocol
to Cassandra, but not providing specific details of how this protocol is
going to be used.

I think the right route for a CEP is to describe the feature that will be
added to the database and the protocol is a mere requirement of the
high-level feature, for example:

CEP: Add Trie-backed memtable
- Trie Serialization Protocol: implementation detail of the above CEP

What is the difficulty of taking this approach, picking one of the myriad
of features that will be enabled by Accord and using that as the initial
CEP to introduce the protocol to the database?

On Fri, Oct 1, 2021 at 08:37, [email protected] <
[email protected]> wrote:

> Actually, thinking about it again, the simple optimistic protocol would in
> fact guarantee system forward progress (i.e. independent of transaction
> formulation).
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 09:14
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Hi Jonathan,
>
> It would be great if we could achieve a bandwidth higher than 1-2 short
> emails per week. It remains unclear to me what your goal is, and it would
> help if you could make a statement like “I want Cassandra to be able to do
> X” so that we can respond directly to it. I am also available to have
> another call, in which we can have a back and forth, please feel free to
> propose a London-compatible time within the next week that is suitable for
> you.
>
> In my opinion we are at risk of veering off-topic, though. This CEP is not
> to deliver interactive transactions, and to my knowledge nobody is
> proposing a CEP for interactive transactions. So, for the CEP at hand the
> salient question seems: does this CEP prevent us from implementing
> interactive transactions with properties X, Y, Z in future? To which the
> answer is almost certainly no.
>
> However, to continue the discussion and respond directly to your queries,
> I believe we agree on the definition of an interactive transaction.
>
> Two protocols were loosely outlined. The first, using timestamps for
> optimistic concurrency control, would indeed involve the possibility of
> aborts. It would not however inherently adopt the issue of LWTs where no
> transaction is able to make progress. Whether or not progress is guaranteed
> (in a livelock-free sense) would depend on the structure of the
> transactions that were interfering.
>
> This approach has the advantage of being very simple to implement, so that
> we could realistically support interactive transactions quite quickly. It
> has the additional advantage that transactions would execute very quickly
> by avoiding the WAN during construction, and as a result may in practice
> experience fewer aborts than protocols that guarantee livelock-freedom.
>
> The second protocol proposed using read/write intents and would be able to
> support almost any behaviour you want. We could even utilise pessimistic
> concurrency control, or anything in-between. This is its own huge design
> space, and discussion of this approach and the trade-offs that could be
> made is (in my opinion) entirely out of scope for this CEP.
>
>
> From: Jonathan Ellis 
> Date: Friday, 1 October 2021 at 05:00
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> The obstacle for me is you've provided a protocol but not a fully fleshed
> out architecture, so it's hard to fill in some of the blanks.  But it looks

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Paulo Motta
I share similar feelings as jbellis that this proposal seems to be focusing
on the protocol itself but lacking the actual feature that will use the
protocol which IMO is a key element to discuss on a CEP.

It's similar to saying: hey I want to add this Tries Serialization Protocol
to Cassandra, but not providing specific details of how this protocol is
going to be used.

I think the right route for a CEP is to describe the feature that will be
added to the database and the protocol is a mere requirement of the
high-level feature, for example:

CEP: Add Trie-backed memtable
- Trie Serialization Protocol: implementation detail of the above CEP

What is the difficulty of taking this approach, picking one of the myriad
of features that will be enabled by Accord and using that as the initial
CEP to introduce the protocol to the database?

On Fri, Oct 1, 2021 at 08:37, [email protected] <
[email protected]> wrote:

> Actually, thinking about it again, the simple optimistic protocol would in
> fact guarantee system forward progress (i.e. independent of transaction
> formulation).
>
>
> From: [email protected] 
> Date: Friday, 1 October 2021 at 09:14
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Hi Jonathan,
>
> It would be great if we could achieve a bandwidth higher than 1-2 short
> emails per week. It remains unclear to me what your goal is, and it would
> help if you could make a statement like “I want Cassandra to be able to do
> X” so that we can respond directly to it. I am also available to have
> another call, in which we can have a back and forth, please feel free to
> propose a London-compatible time within the next week that is suitable for
> you.
>
> In my opinion we are at risk of veering off-topic, though. This CEP is not
> to deliver interactive transactions, and to my knowledge nobody is
> proposing a CEP for interactive transactions. So, for the CEP at hand the
> salient question seems: does this CEP prevent us from implementing
> interactive transactions with properties X, Y, Z in future? To which the
> answer is almost certainly no.
>
> However, to continue the discussion and respond directly to your queries,
> I believe we agree on the definition of an interactive transaction.
>
> Two protocols were loosely outlined. The first, using timestamps for
> optimistic concurrency control, would indeed involve the possibility of
> aborts. It would not however inherently adopt the issue of LWTs where no
> transaction is able to make progress. Whether or not progress is guaranteed
> (in a livelock-free sense) would depend on the structure of the
> transactions that were interfering.
>
> This approach has the advantage of being very simple to implement, so that
> we could realistically support interactive transactions quite quickly. It
> has the additional advantage that transactions would execute very quickly
> by avoiding the WAN during construction, and as a result may in practice
> experience fewer aborts than protocols that guarantee livelock-freedom.
>
> The second protocol proposed using read/write intents and would be able to
> support almost any behaviour you want. We could even utilise pessimistic
> concurrency control, or anything in-between. This is its own huge design
> space, and discussion of this approach and the trade-offs that could be
> made is (in my opinion) entirely out of scope for this CEP.
>
>
> From: Jonathan Ellis 
> Date: Friday, 1 October 2021 at 05:00
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> The obstacle for me is you've provided a protocol but not a fully fleshed
> out architecture, so it's hard to fill in some of the blanks.  But it looks
> to me like optimistic concurrency control for interactive transactions
> applied to Accord would leave you in a LWT-like situation under fairly
> light contention where nobody actually makes progress due to retries.
>
> To make sure we're talking about the same thing, as Henrik pointed out,
> interactive transactions mean multiple round trips from the client within a
> transaction.  For example, here
> <https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
> is a simple implementation of the TPC-C New Order transaction.  The high
> level logic (via
> <https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
> is,
>
>    1. Get records describing a warehouse, customer, & district
>    2. Update the district
>    3. Increment next available order number
>    4. Insert record into Order and New-Order tables
>    5. For 5-15 items, get Item record, get/update Stock record
>    6. Insert Order-Line Record

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
Actually, thinking about it again, the simple optimistic protocol would in fact 
guarantee system forward progress (i.e. independent of transaction formulation).


From: [email protected] 
Date: Friday, 1 October 2021 at 09:14
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

It would be great if we could achieve a bandwidth higher than 1-2 short emails 
per week. It remains unclear to me what your goal is, and it would help if you 
could make a statement like “I want Cassandra to be able to do X” so that we 
can respond directly to it. I am also available to have another call, in which 
we can have a back and forth, please feel free to propose a London-compatible 
time within the next week that is suitable for you.

In my opinion we are at risk of veering off-topic, though. This CEP is not to 
deliver interactive transactions, and to my knowledge nobody is proposing a CEP 
for interactive transactions. So, for the CEP at hand the salient question 
seems: does this CEP prevent us from implementing interactive transactions with 
properties X, Y, Z in future? To which the answer is almost certainly no.

However, to continue the discussion and respond directly to your queries, I 
believe we agree on the definition of an interactive transaction.

Two protocols were loosely outlined. The first, using timestamps for optimistic 
concurrency control, would indeed involve the possibility of aborts. It would 
not however inherently adopt the issue of LWTs where no transaction is able to 
make progress. Whether or not progress is guaranteed (in a livelock-free sense) 
would depend on the structure of the transactions that were interfering.

This approach has the advantage of being very simple to implement, so that we 
could realistically support interactive transactions quite quickly. It has the 
additional advantage that transactions would execute very quickly by avoiding 
the WAN during construction, and as a result may in practice experience fewer 
aborts than protocols that guarantee livelock-freedom.

The second protocol proposed using read/write intents and would be able to 
support almost any behaviour you want. We could even utilise pessimistic 
concurrency control, or anything in-between. This is its own huge design space, 
and discussion of this approach and the trade-offs that could be made is (in my 
opinion) entirely out of scope for this CEP.


From: Jonathan Ellis 
Date: Friday, 1 October 2021 at 05:00
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
The obstacle for me is you've provided a protocol but not a fully fleshed
out architecture, so it's hard to fill in some of the blanks.  But it looks
to me like optimistic concurrency control for interactive transactions
applied to Accord would leave you in a LWT-like situation under fairly
light contention where nobody actually makes progress due to retries.

To make sure we're talking about the same thing, as Henrik pointed out,
interactive transactions mean multiple round trips from the client within a
transaction.  For example, here
<https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
is a simple implementation of the TPC-C New Order transaction.  The high
level logic (via
<https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
is,

   1. Get records describing a warehouse, customer, & district
   2. Update the district
   3. Increment next available order number
   4. Insert record into Order and New-Order tables
   5. For 5-15 items, get Item record, get/update Stock record
   6. Insert Order-Line Record

As you can see, this requires a lot of client-side logic mixed in with the
actual SQL commands.


On Thu, Sep 30, 2021 at 2:30 AM [email protected] 
wrote:

> Essentially this, although I think in practice we will need to track each
> partition’s timestamp separately (or optionally for reduced conflicts, each
> row or datum’s), and make them all part of the conditional application of
> the transaction - at least for strict-serializability.
>
> The alternative is to insert read/write intents for the transaction during
> each step, and to confirm they are still valid on commit, but this approach
> would require a WAN round-trip for each step in the interactive
> transaction, whereas the timestamp-validating approach can use a LAN
> round-trip for each step besides the final one, and is also much simpler to
> implement.
>
>
> From: Blake Eggleston 
> Date: Thursday, 30 September 2021 at 05:47
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> You could establish a lower timestamp bound and buffer transaction state
> on the coordinator, then make the commit an operation that only applies if
> all partitions involved haven’t been changed by a more recent timestamp.
> You could als

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread [email protected]
Hi Jonathan,

It would be great if we could achieve a bandwidth higher than 1-2 short emails 
per week. It remains unclear to me what your goal is, and it would help if you 
could make a statement like “I want Cassandra to be able to do X” so that we 
can respond directly to it. I am also available to have another call, in which 
we can have a back and forth, please feel free to propose a London-compatible 
time within the next week that is suitable for you.

In my opinion we are at risk of veering off-topic, though. This CEP is not to 
deliver interactive transactions, and to my knowledge nobody is proposing a CEP 
for interactive transactions. So, for the CEP at hand the salient question 
seems: does this CEP prevent us from implementing interactive transactions with 
properties X, Y, Z in future? To which the answer is almost certainly no.

However, to continue the discussion and respond directly to your queries, I 
believe we agree on the definition of an interactive transaction.

Two protocols were loosely outlined. The first, using timestamps for optimistic 
concurrency control, would indeed involve the possibility of aborts. It would 
not however inherently adopt the issue of LWTs where no transaction is able to 
make progress. Whether or not progress is guaranteed (in a livelock-free sense) 
would depend on the structure of the transactions that were interfering.

This approach has the advantage of being very simple to implement, so that we 
could realistically support interactive transactions quite quickly. It has the 
additional advantage that transactions would execute very quickly by avoiding 
the WAN during construction, and as a result may in practice experience fewer 
aborts than protocols that guarantee livelock-freedom.

The second protocol proposed using read/write intents and would be able to 
support almost any behaviour you want. We could even utilise pessimistic 
concurrency control, or anything in-between. This is its own huge design space, 
and discussion of this approach and the trade-offs that could be made is (in my 
opinion) entirely out of scope for this CEP.


From: Jonathan Ellis 
Date: Friday, 1 October 2021 at 05:00
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
The obstacle for me is you've provided a protocol but not a fully fleshed
out architecture, so it's hard to fill in some of the blanks.  But it looks
to me like optimistic concurrency control for interactive transactions
applied to Accord would leave you in a LWT-like situation under fairly
light contention where nobody actually makes progress due to retries.

To make sure we're talking about the same thing, as Henrik pointed out,
interactive transactions mean multiple round trips from the client within a
transaction.  For example, here
<https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
is a simple implementation of the TPC-C New Order transaction.  The high
level logic (via
<https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
is,

   1. Get records describing a warehouse, customer, & district
   2. Update the district
   3. Increment next available order number
   4. Insert record into Order and New-Order tables
   5. For 5-15 items, get Item record, get/update Stock record
   6. Insert Order-Line Record

As you can see, this requires a lot of client-side logic mixed in with the
actual SQL commands.


On Thu, Sep 30, 2021 at 2:30 AM [email protected] 
wrote:

> Essentially this, although I think in practice we will need to track each
> partition’s timestamp separately (or optionally for reduced conflicts, each
> row or datum’s), and make them all part of the conditional application of
> the transaction - at least for strict-serializability.
>
> The alternative is to insert read/write intents for the transaction during
> each step, and to confirm they are still valid on commit, but this approach
> would require a WAN round-trip for each step in the interactive
> transaction, whereas the timestamp-validating approach can use a LAN
> round-trip for each step besides the final one, and is also much simpler to
> implement.
>
>
> From: Blake Eggleston 
> Date: Thursday, 30 September 2021 at 05:47
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> You could establish a lower timestamp bound and buffer transaction state
> on the coordinator, then make the commit an operation that only applies if
> all partitions involved haven’t been changed by a more recent timestamp.
> You could also implement mvcc either in the storage layer or for some
> period of time by buffering commits on each replica before applying.
>
> > On Sep 29, 2021, at 6:18 PM, Jonathan Ellis  wrote:
> >
> > How are interactive transactions possible with Accord?
> >
> >
> >
> > On T

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-30 Thread Jonathan Ellis
 The obstacle for me is you've provided a protocol but not a fully fleshed
out architecture, so it's hard to fill in some of the blanks.  But it looks
to me like optimistic concurrency control for interactive transactions
applied to Accord would leave you in a LWT-like situation under fairly
light contention where nobody actually makes progress due to retries.

To make sure we're talking about the same thing, as Henrik pointed out,
interactive transactions mean multiple round trips from the client within a
transaction.  For example, here
<https://github.com/apavlo/py-tpcc/blob/master/pytpcc/drivers/sqlitedriver.py#L213>
is a simple implementation of the TPC-C New Order transaction.  The high
level logic (via
<https://courses.cs.washington.edu/courses/csep545/01wi/lectures/class1/tsld039.htm>)
is,

   1. Get records describing a warehouse, customer, & district
   2. Update the district
   3. Increment next available order number
   4. Insert record into Order and New-Order tables
   5. For 5-15 items, get Item record, get/update Stock record
   6. Insert Order-Line Record

As you can see, this requires a lot of client-side logic mixed in with the
actual SQL commands.
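
To make the shape of that client-side logic concrete, here is a rough sketch of the six steps above. It is not taken from the linked py-tpcc driver: the in-memory `db` dictionary, the field names, and the restock and pricing arithmetic are all assumptions made for illustration. Each access to `db` stands in for one client round trip within the transaction.

```python
def new_order(db, w_id, d_id, c_id, items):
    """Simplified New Order flow; `items` is a list of (item_id, quantity).

    `db` is a hypothetical dict of tables. In a real interactive transaction
    every lookup or update below would be a separate statement sent by the client.
    """
    # 1. Get records describing a warehouse, customer, and district
    warehouse = db["warehouse"][w_id]
    customer = db["customer"][(w_id, d_id, c_id)]
    district = db["district"][(w_id, d_id)]

    # 2-3. Update the district, incrementing the next available order number
    o_id = district["next_o_id"]
    district["next_o_id"] = o_id + 1

    # 4. Insert record into Order and New-Order tables
    db["orders"][(w_id, d_id, o_id)] = {"c_id": c_id, "ol_cnt": len(items)}
    db["new_order"].add((w_id, d_id, o_id))

    total = 0.0
    for number, (i_id, qty) in enumerate(items, start=1):
        # 5. Get Item record, get/update Stock record
        item = db["item"][i_id]
        stock = db["stock"][(w_id, i_id)]
        # Client-side decision that depends on a value just read
        # (the threshold and restock amount are illustrative).
        if stock["quantity"] >= qty + 10:
            stock["quantity"] -= qty
        else:
            stock["quantity"] = stock["quantity"] - qty + 91
        amount = qty * item["price"]
        total += amount
        # 6. Insert Order-Line record
        db["order_line"][(w_id, d_id, o_id, number)] = {
            "i_id": i_id, "qty": qty, "amount": amount,
        }

    total *= (1 + warehouse["tax"] + district["tax"]) * (1 - customer["discount"])
    return o_id, total
```

The point of the sketch is only that later statements (the stock update, the order-line insert) depend on values the client read earlier in the same transaction, which is what makes the transaction interactive.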


On Thu, Sep 30, 2021 at 2:30 AM [email protected] 
wrote:

> Essentially this, although I think in practice we will need to track each
> partition’s timestamp separately (or optionally for reduced conflicts, each
> row or datum’s), and make them all part of the conditional application of
> the transaction - at least for strict-serializability.
>
> The alternative is to insert read/write intents for the transaction during
> each step, and to confirm they are still valid on commit, but this approach
> would require a WAN round-trip for each step in the interactive
> transaction, whereas the timestamp-validating approach can use a LAN
> round-trip for each step besides the final one, and is also much simpler to
> implement.
>
>
> From: Blake Eggleston 
> Date: Thursday, 30 September 2021 at 05:47
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> You could establish a lower timestamp bound and buffer transaction state
> on the coordinator, then make the commit an operation that only applies if
> all partitions involved haven’t been changed by a more recent timestamp.
> You could also implement mvcc either in the storage layer or for some
> period of time by buffering commits on each replica before applying.
>
> > On Sep 29, 2021, at 6:18 PM, Jonathan Ellis  wrote:
> >
> > How are interactive transactions possible with Accord?
> >
> >
> >
> > On Tue, Sep 21, 2021 at 11:56 PM [email protected] <[email protected]>
> > wrote:
> >
> >> Could you explain why you believe this trade-off is necessary? We can
> >> support full SQL just fine with Accord, and I hope that we eventually
> >> do so.
> >>
> >> This domain is incredibly complex, so it is easy to reach wrong
> >> conclusions. I would invite you again to propose a system for discussion
> >> that you think offers something Accord is unable to, and that you
> >> consider desirable, and we can work from there.
> >>
> >> To pre-empt some possible discussions, I am not aware of anything we
> >> cannot do with Accord that we could do with either Calvin or Spanner.
> >> Interactive transactions are possible on top of Accord, as are
> >> transactions with an unknown read/write set. In each case the only cost
> >> is that they would use optimistic concurrency control, which is no worse
> >> than Spanner derivatives anyway (which I have to assume is your benchmark
> >> in this regard). I do not expect to deliver either functionality
> >> initially, but Accord takes us most of the way there for both.
> >>
> >>
> >> From: Jonathan Ellis 
> >> Date: Wednesday, 22 September 2021 at 05:36
> >> To: dev 
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> Right, I'm looking for exactly a discussion on the high level goals.
> >> Instead of saying "here's the goals and we ruled out X because Y" we
> >> should start with a discussion around, "Approach A allows X and W,
> >> approach B allows Y and Z" and decide together what the goals should be
> >> and what we are willing to trade to get those goals, e.g., are we willing
> >> to give up global strict serializability to get the ability to support
> >> full SQL. Both of these are nice to have!
> >>
> >> On Tue, Sep 21, 2021 at 9:52 PM [email protected] <[email protected]>
> >>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-30 Thread [email protected]
Essentially this, although I think in practice we will need to track each 
partition’s timestamp separately (or optionally for reduced conflicts, each row 
or datum’s), and make them all part of the conditional application of the 
transaction - at least for strict-serializability.

The alternative is to insert read/write intents for the transaction during each 
step, and to confirm they are still valid on commit, but this approach would 
require a WAN round-trip for each step in the interactive transaction, whereas 
the timestamp-validating approach can use a LAN round-trip for each step 
besides the final one, and is also much simpler to implement.
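
As a rough illustration of the timestamp-validating approach described above, the sketch below records the write timestamp witnessed for each key as it is read, and makes the commit conditional on all of those timestamps being unchanged. The `Store` and `Transaction` classes, the integer clock, and the single in-memory copy of the data are assumptions for the example only; they are not part of Accord or of any Cassandra API.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a transactional table."""
    data: dict = field(default_factory=dict)      # key -> value
    write_ts: dict = field(default_factory=dict)  # key -> timestamp of last write
    clock: int = 0

    def read(self, key):
        # Return the value together with the write timestamp it carries.
        return self.data.get(key), self.write_ts.get(key, 0)

    def commit(self, witnessed, writes):
        # Conditional application: every key that was read must still carry
        # the write timestamp the transaction witnessed.
        for key, ts in witnessed.items():
            if self.write_ts.get(key, 0) != ts:
                return False  # abort; the client may retry
        self.clock += 1
        for key, value in writes.items():
            self.data[key] = value
            self.write_ts[key] = self.clock
        return True


class Transaction:
    """Buffers writes and remembers the timestamp witnessed for each read."""

    def __init__(self, store):
        self.store = store
        self.witnessed = {}
        self.writes = {}

    def read(self, key):
        if key in self.writes:  # read-your-own-writes from the buffer
            return self.writes[key]
        value, ts = self.store.read(key)
        self.witnessed.setdefault(key, ts)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self.witnessed, self.writes)
```

In this toy model two transactions that both read and then update the same key cannot both commit: whichever validates first succeeds and the other aborts. Only the final `commit` call would need the wide-area round-trip; the reads could be served locally.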


From: Blake Eggleston 
Date: Thursday, 30 September 2021 at 05:47
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
You could establish a lower timestamp bound and buffer transaction state on the 
coordinator, then make the commit an operation that only applies if all 
partitions involved haven’t been changed by a more recent timestamp. You could 
also implement mvcc either in the storage layer or for some period of time by 
buffering commits on each replica before applying.

> On Sep 29, 2021, at 6:18 PM, Jonathan Ellis  wrote:
>
> How are interactive transactions possible with Accord?
>
>
>
> On Tue, Sep 21, 2021 at 11:56 PM [email protected] 
> wrote:
>
>> Could you explain why you believe this trade-off is necessary? We can
>> support full SQL just fine with Accord, and I hope that we eventually do so.
>>
>> This domain is incredibly complex, so it is easy to reach wrong
>> conclusions. I would invite you again to propose a system for discussion
>> that you think offers something Accord is unable to, and that you consider
>> desirable, and we can work from there.
>>
>> To pre-empt some possible discussions, I am not aware of anything we
>> cannot do with Accord that we could do with either Calvin or Spanner.
>> Interactive transactions are possible on top of Accord, as are transactions
>> with an unknown read/write set. In each case the only cost is that they
>> would use optimistic concurrency control, which is no worse than Spanner
>> derivatives anyway (which I have to assume is your benchmark in this
>> regard). I do not expect to deliver either functionality initially, but
>> Accord takes us most of the way there for both.
>>
>>
>> From: Jonathan Ellis 
>> Date: Wednesday, 22 September 2021 at 05:36
>> To: dev 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Right, I'm looking for exactly a discussion on the high level goals.
>> Instead of saying "here's the goals and we ruled out X because Y" we should
>> start with a discussion around, "Approach A allows X and W, approach B
>> allows Y and Z" and decide together what the goals should be and what
>> we are willing to trade to get those goals, e.g., are we willing to give up
>> global strict serializability to get the ability to support full SQL.  Both
>> of these are nice to have!
>>
>> On Tue, Sep 21, 2021 at 9:52 PM [email protected] 
>> wrote:
>>
>>> Hi Jonathan,
>>>
>>> These other systems are incompatible with the goals of the CEP. I do
>>> discuss them (besides 2PC) in both the whitepaper and the CEP, and will
>>> summarise that discussion below. A true and accurate comparison of these
>>> other systems is essentially intractable, as there are complex subtleties
>>> to each flavour, and those who are interested would be better served by
>>> performing their own research.
>>>
>>> I think it is more productive to focus on what we want to achieve as a
>>> community. If you believe the goals of this CEP are wrong for the project,
>>> let’s focus on that. If you want to compare and contrast specific facets of
>>> alternative systems that you consider to be preferable in some dimension,
>>> let’s do that here or in a Q&A as proposed by Joey.
>>>
>>> The relevant goals are that we:
>>>
>>>
>>>  1.  Guarantee strict serializable isolation on commodity hardware
>>>  2.  Scale to any cluster size
>>>  3.  Achieve optimal latency
>>>
>>> The approach taken by Spanner derivatives is rejected by (1) because they
>>> guarantee only Serializable isolation (they additionally fail (3)). From
>>> watching talks by YugaByte, and inferring from Cockroach’s
>>> panic-cluster-death under clock skew, this is clearly considered by
>>> everyone to be undesirable but necessary to achieve scalability.
>>>
>>> The approach taken

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-29 Thread Blake Eggleston
You could establish a lower timestamp bound and buffer transaction state on the 
coordinator, then make the commit an operation that only applies if all 
partitions involved haven’t been changed by a more recent timestamp. You could 
also implement mvcc either in the storage layer or for some period of time by 
buffering commits on each replica before applying.
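
A minimal sketch of the first suggestion, under stated assumptions: the coordinator fixes a lower timestamp bound when the transaction starts, buffers writes, and applies them at commit only if no partition it touched has been written at a timestamp above that bound. The `Coordinator` class and its fields are hypothetical, and the example uses one shared in-memory copy of the data, so it does not model replication lag between replicas.

```python
class Coordinator:
    """Hypothetical coordinator buffering one interactive transaction."""

    def __init__(self, partitions, last_modified, now):
        self.partitions = partitions        # partition key -> value (shared state)
        self.last_modified = last_modified  # partition key -> timestamp of last write
        self.lower_bound = now              # established when the transaction begins
        self.touched = set()                # partitions read or written so far
        self.buffered = {}                  # writes held until commit

    def read(self, key):
        self.touched.add(key)
        return self.buffered.get(key, self.partitions.get(key))

    def write(self, key, value):
        self.touched.add(key)
        self.buffered[key] = value

    def commit(self, commit_ts):
        # Apply only if no involved partition changed after the lower bound.
        if any(self.last_modified.get(k, 0) > self.lower_bound for k in self.touched):
            return False
        for k, v in self.buffered.items():
            self.partitions[k] = v
            self.last_modified[k] = commit_ts
        return True
```

Compared with tracking a timestamp per key read, a single lower bound is coarser: any write to a touched partition after the transaction began causes an abort, whether or not it affected the data that was actually read.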

> On Sep 29, 2021, at 6:18 PM, Jonathan Ellis  wrote:
> 
> How are interactive transactions possible with Accord?
> 
> 
> 
> On Tue, Sep 21, 2021 at 11:56 PM [email protected] 
> wrote:
> 
>> Could you explain why you believe this trade-off is necessary? We can
>> support full SQL just fine with Accord, and I hope that we eventually do so.
>> 
>> This domain is incredibly complex, so it is easy to reach wrong
>> conclusions. I would invite you again to propose a system for discussion
>> that you think offers something Accord is unable to, and that you consider
>> desirable, and we can work from there.
>> 
>> To pre-empt some possible discussions, I am not aware of anything we
>> cannot do with Accord that we could do with either Calvin or Spanner.
>> Interactive transactions are possible on top of Accord, as are transactions
>> with an unknown read/write set. In each case the only cost is that they
>> would use optimistic concurrency control, which is no worse than Spanner
>> derivatives anyway (which I have to assume is your benchmark in this
>> regard). I do not expect to deliver either functionality initially, but
>> Accord takes us most of the way there for both.
>> 
>> 
>> From: Jonathan Ellis 
>> Date: Wednesday, 22 September 2021 at 05:36
>> To: dev 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Right, I'm looking for exactly a discussion on the high level goals.
>> Instead of saying "here's the goals and we ruled out X because Y" we should
>> start with a discussion around, "Approach A allows X and W, approach B
>> allows Y and Z" and decide together what the goals should be and what
>> we are willing to trade to get those goals, e.g., are we willing to give up
>> global strict serializability to get the ability to support full SQL.  Both
>> of these are nice to have!
>> 
>> On Tue, Sep 21, 2021 at 9:52 PM [email protected] 
>> wrote:
>> 
>>> Hi Jonathan,
>>> 
>>> These other systems are incompatible with the goals of the CEP. I do
>>> discuss them (besides 2PC) in both the whitepaper and the CEP, and will
>>> summarise that discussion below. A true and accurate comparison of these
>>> other systems is essentially intractable, as there are complex subtleties
>>> to each flavour, and those who are interested would be better served by
>>> performing their own research.
>>> 
>>> I think it is more productive to focus on what we want to achieve as a
>>> community. If you believe the goals of this CEP are wrong for the project,
>>> let’s focus on that. If you want to compare and contrast specific facets of
>>> alternative systems that you consider to be preferable in some dimension,
>>> let’s do that here or in a Q&A as proposed by Joey.
>>> 
>>> The relevant goals are that we:
>>> 
>>> 
>>>  1.  Guarantee strict serializable isolation on commodity hardware
>>>  2.  Scale to any cluster size
>>>  3.  Achieve optimal latency
>>> 
>>> The approach taken by Spanner derivatives is rejected by (1) because they
>>> guarantee only Serializable isolation (they additionally fail (3)). From
>>> watching talks by YugaByte, and inferring from Cockroach’s
>>> panic-cluster-death under clock skew, this is clearly considered by
>>> everyone to be undesirable but necessary to achieve scalability.
>>> 
>>> The approach taken by FaunaDB (Calvin) is rejected by (2) because its
>>> sequencing layer requires a global leader process for the cluster, which is
>>> incompatible with Cassandra’s scalability requirements. It additionally
>>> fails (3) for global clients.
>>> 
>>> Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a
>>> Spanner clone for its multi-key transaction functionality, not 2PC.
>>> 
>>> Systems such as RAMP with even weaker isolation are not considered for the
>>> simple reason that they do not even claim to meet (1).
>>> 
>>> If we want to additionally offer weaker isolation levels than
>>> Serializable, such as that provided by the recent RAM

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-29 Thread Jonathan Ellis
How are interactive transactions possible with Accord?



On Tue, Sep 21, 2021 at 11:56 PM [email protected] 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>
> This domain is incredibly complex, so it is easy to reach wrong
> conclusions. I would invite you again to propose a system for discussion
> that you think offers something Accord is unable to, and that you consider
> desirable, and we can work from there.
>
> To pre-empt some possible discussions, I am not aware of anything we
> cannot do with Accord that we could do with either Calvin or Spanner.
> Interactive transactions are possible on top of Accord, as are transactions
> with an unknown read/write set. In each case the only cost is that they
> would use optimistic concurrency control, which is no worse than Spanner
> derivatives anyway (which I have to assume is your benchmark in this
> regard). I do not expect to deliver either functionality initially, but
> Accord takes us most of the way there for both.
>
>
> From: Jonathan Ellis 
> Date: Wednesday, 22 September 2021 at 05:36
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Right, I'm looking for exactly a discussion on the high level goals.
> Instead of saying "here's the goals and we ruled out X because Y" we should
> start with a discussion around, "Approach A allows X and W, approach B
> allows Y and Z" and decide together what the goals should be and what
> we are willing to trade to get those goals, e.g., are we willing to give up
> global strict serializability to get the ability to support full SQL.  Both
> of these are nice to have!
>
> On Tue, Sep 21, 2021 at 9:52 PM [email protected] 
> wrote:
>
> > Hi Jonathan,
> >
> > These other systems are incompatible with the goals of the CEP. I do
> > discuss them (besides 2PC) in both the whitepaper and the CEP, and will
> > summarise that discussion below. A true and accurate comparison of these
> > other systems is essentially intractable, as there are complex subtleties
> > to each flavour, and those who are interested would be better served by
> > performing their own research.
> >
> > I think it is more productive to focus on what we want to achieve as a
> > community. If you believe the goals of this CEP are wrong for the project,
> > let’s focus on that. If you want to compare and contrast specific facets of
> > alternative systems that you consider to be preferable in some dimension,
> > let’s do that here or in a Q&A as proposed by Joey.
> >
> > The relevant goals are that we:
> >
> >
> >   1.  Guarantee strict serializable isolation on commodity hardware
> >   2.  Scale to any cluster size
> >   3.  Achieve optimal latency
> >
> > The approach taken by Spanner derivatives is rejected by (1) because they
> > guarantee only Serializable isolation (they additionally fail (3)). From
> > watching talks by YugaByte, and inferring from Cockroach’s
> > panic-cluster-death under clock skew, this is clearly considered by
> > everyone to be undesirable but necessary to achieve scalability.
> >
> > The approach taken by FaunaDB (Calvin) is rejected by (2) because its
> > sequencing layer requires a global leader process for the cluster, which is
> > incompatible with Cassandra’s scalability requirements. It additionally
> > fails (3) for global clients.
> >
> > Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a
> > Spanner clone for its multi-key transaction functionality, not 2PC.
> >
> > Systems such as RAMP with even weaker isolation are not considered for the
> > simple reason that they do not even claim to meet (1).
> >
> > If we want to additionally offer weaker isolation levels than
> > Serializable, such as that provided by the recent RAMP-TAO paper, Cassandra
> > is likely able to support multiple distinct transaction layers that operate
> > independently. I would encourage you to file a CEP to explore how we can
> > meet these distinct use cases, but I consider them to be niche. I expect
> > that a majority of our user base desire strict serializable isolation, and
> > certainly no less than serializable isolation, to augment the existing
> > weaker isolation offered by quorum reads and writes.
> >
> > I would tangentially note that we are not an AP database under normal
> > recommended operation. A minority in any network partition cannot reach
> > QUORUM, so under recommended usage we are a high-avail

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-28 Thread Benjamin Lerer
Did I bite someone?  😂

Thanks for your patience with all the questions and comments Benedict. I
believe that everybody is pretty excited by this CEP. At least I am :-)

Le lun. 27 sept. 2021 à 22:59, [email protected]  a
écrit :

> Ok, it’s time for the weekly poking of the hornet’s nest.
>
> Any more thoughts, questions or criticisms, anyone?
>
> From: [email protected] 
> Date: Friday, 24 September 2021 at 22:41
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I’m not aware of anybody having taken any notes, but somebody please chime
> in if I’m wrong.
>
> From my recollection, re Accord:
>
>
>   *   Q: Will batches now support rollbacks?
>       *   Batches would apply atomically or not, but unlikely to have a
>           concept of rollback. Timeouts remain unknown, but hope to have some
>           mechanism to provide clients a definitive answer about such
>           transactions after the fact.
>   *   Q: Can stale replicas participate in transactions?
>       *   Accord applies conflicting transactions in-order at every
>           replica, so only nodes that are up-to-date may participate in the
>           execution of a transaction, but any replica may participate in
>           agreeing a transaction. To ensure replicas remain up-to-date I
>           anticipate introducing a real-time repair facility at the
>           transactional message level, with peers reconciling recently
>           processed messages and cross-delivering any that are missing.
>   *   Possible UX directions in very vague terms: CQL atomic and
>       conditional batches initially; going forwards interactive transactions?
>       Complex user defined functions? SQL?
>   *   Discussed possibility of LOCAL_QUORUM reads for globally replicated
>       transactional tables, as this is an important use case
>       *   Simple stale reads to transactional tables
>       *   Brainstormed a bit about serializable reads to a single DC
>           without (normally) crossing WAN
>       *   Discussed possibility of multiple ACKs providing separate LAN and
>           WAN persistence notifications to clients
>   *   Discussed size of fast path quorums in Accord, and how this might
>       affect global latency in high RF clusters (i.e. not optimal, and in some
>       cases may need every DC to participate) and how this can be modified by
>       biasing fast path electorate so that 2 of the 3 DCs may reach fast-path
>       decisions with each other (remaining DC having to reach both those DCs to
>       reach fast path). Also discussed Calvin-like modes of operation that would
>       offer optimal global latency for sufficiently small clusters at RF=3 or
>       RF=5.
>
> I’m sure there were other discussions I can’t remember, perhaps others can
> fill in the blanks.
>
>
> From: Jonathan Ellis 
> Date: Friday, 24 September 2021 at 20:28
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Does anyone have notes for those of us who couldn't make the call?
>
> On Wed, Sep 22, 2021 at 1:35 PM [email protected] 
> wrote:
>
> > Hi everyone,
> >
> > Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> > 4pm BST to discuss Accord and other things in the community. There are no
> > plans to make any kind of project decisions. Everyone is welcome to drop in
> > to discuss Accord or whatever else might be on your mind.
> >
> > https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
> >
> >
> > From: [email protected] 
> > Date: Wednesday, 22 September 2021 at 16:22
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > No, I would expect to deliver strict serializable interactive transactions
> > using Accord. These would simply corroborate that the participating keys
> > had not modified their write timestamps during the final transaction. These
> > could even be undertaken with still only a single wide area round-trip,
> > using local copies of the data to assemble the transaction (though this
> > would marginally increase the chance of aborts)
> >
> > My goal for MVCC is parallelism, not additional isolation levels (though
> > snapshot isolation is useful and we’ll probably also want to offer that
> > eventually)
> >
> > From: Henrik Ingo 
> > Date: Wednesday, 22 September 2021 at 15:15
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > On Wed, Sep 22, 2021 at 7:56 AM [email protected]
> > wrote:
> >
> > > Could you explain why you believe this trade-off is necessary? We can
> > > support full SQL just fine with Accord, and I hope that we eventually do so.
> > >
> >

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-27 Thread [email protected]
Ok, it’s time for the weekly poking of the hornet’s nest.

Any more thoughts, questions or criticisms, anyone?

From: [email protected] 
Date: Friday, 24 September 2021 at 22:41
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I’m not aware of anybody having taken any notes, but somebody please chime in 
if I’m wrong.

From my recollection, re Accord:


  *   Q: Will batches now support rollbacks?
      *   Batches would apply atomically or not, but unlikely to have a concept
          of rollback. Timeouts remain unknown, but hope to have some mechanism
          to provide clients a definitive answer about such transactions after
          the fact.
  *   Q: Can stale replicas participate in transactions?
      *   Accord applies conflicting transactions in-order at every replica, so
          only nodes that are up-to-date may participate in the execution of a
          transaction, but any replica may participate in agreeing a
          transaction. To ensure replicas remain up-to-date I anticipate
          introducing a real-time repair facility at the transactional message
          level, with peers reconciling recently processed messages and
          cross-delivering any that are missing.
  *   Possible UX directions in very vague terms: CQL atomic and conditional
      batches initially; going forwards interactive transactions? Complex user
      defined functions? SQL?
  *   Discussed possibility of LOCAL_QUORUM reads for globally replicated
      transactional tables, as this is an important use case
      *   Simple stale reads to transactional tables
      *   Brainstormed a bit about serializable reads to a single DC without
          (normally) crossing WAN
      *   Discussed possibility of multiple ACKs providing separate LAN and WAN
          persistence notifications to clients
  *   Discussed size of fast path quorums in Accord, and how this might affect
      global latency in high RF clusters (i.e. not optimal, and in some cases
      may need every DC to participate) and how this can be modified by biasing
      fast path electorate so that 2 of the 3 DCs may reach fast-path decisions
      with each other (remaining DC having to reach both those DCs to reach
      fast path). Also discussed Calvin-like modes of operation that would
      offer optimal global latency for sufficiently small clusters at RF=3 or
      RF=5.

I’m sure there were other discussions I can’t remember, perhaps others can fill 
in the blanks.


From: Jonathan Ellis 
Date: Friday, 24 September 2021 at 20:28
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Does anyone have notes for those of us who couldn't make the call?

On Wed, Sep 22, 2021 at 1:35 PM [email protected] 
wrote:

> Hi everyone,
>
> Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> 4pm BST to discuss Accord and other things in the community. There are no
> plans to make any kind of project decisions. Everyone is welcome to drop in
> to discuss Accord or whatever else might be on your mind.
>
> https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
>
>
> From: [email protected] 
> Date: Wednesday, 22 September 2021 at 16:22
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> No, I would expect to deliver strict serializable interactive transactions
> using Accord. These would simply corroborate that the participating keys
> had not modified their write timestamps during the final transaction. These
> could even be undertaken with still only a single wide area round-trip,
> using local copies of the data to assemble the transaction (though this
> would marginally increase the chance of aborts)
>
> My goal for MVCC is parallelism, not additional isolation levels (though
> snapshot isolation is useful and we’ll probably also want to offer that
> eventually)
>
> From: Henrik Ingo 
> Date: Wednesday, 22 September 2021 at 15:15
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
> wrote:
>
> > Could you explain why you believe this trade-off is necessary? We can
> > support full SQL just fine with Accord, and I hope that we eventually do
> > so.
> >
>
> I assume this is really referring to interactive transactions = multiple
> round trips to the client within a transaction.
>
> You mentioned previously we could later build a more MVCC like transaction
> semantic on top of Accord. (Independent reads from a single snapshot,
> followed by a commit using Accord.) In this case I think the relevant
> discussion is whether Accord is still the optimal building block
> performance wise to do so, or whether users would then have lower
> consistency level but still pay the performance cost of a stricter
> consistency level.
>
> henrik
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
> 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-24 Thread [email protected]
I’m not aware of anybody having taken any notes, but somebody please chime in 
if I’m wrong.

From my recollection, re Accord:


  *   Q: Will batches now support rollbacks?
      *   Batches would apply atomically or not, but unlikely to have a concept
          of rollback. Timeouts remain unknown, but hope to have some mechanism
          to provide clients a definitive answer about such transactions after
          the fact.
  *   Q: Can stale replicas participate in transactions?
      *   Accord applies conflicting transactions in-order at every replica, so
          only nodes that are up-to-date may participate in the execution of a
          transaction, but any replica may participate in agreeing a
          transaction. To ensure replicas remain up-to-date I anticipate
          introducing a real-time repair facility at the transactional message
          level, with peers reconciling recently processed messages and
          cross-delivering any that are missing.
  *   Possible UX directions in very vague terms: CQL atomic and conditional
      batches initially; going forwards interactive transactions? Complex user
      defined functions? SQL?
  *   Discussed possibility of LOCAL_QUORUM reads for globally replicated
      transactional tables, as this is an important use case
      *   Simple stale reads to transactional tables
      *   Brainstormed a bit about serializable reads to a single DC without
          (normally) crossing WAN
      *   Discussed possibility of multiple ACKs providing separate LAN and WAN
          persistence notifications to clients
  *   Discussed size of fast path quorums in Accord, and how this might affect
      global latency in high RF clusters (i.e. not optimal, and in some cases
      may need every DC to participate) and how this can be modified by biasing
      fast path electorate so that 2 of the 3 DCs may reach fast-path decisions
      with each other (remaining DC having to reach both those DCs to reach
      fast path). Also discussed Calvin-like modes of operation that would
      offer optimal global latency for sufficiently small clusters at RF=3 or
      RF=5.

I’m sure there were other discussions I can’t remember, perhaps others can fill 
in the blanks.


From: Jonathan Ellis 
Date: Friday, 24 September 2021 at 20:28
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Does anyone have notes for those of us who couldn't make the call?

On Wed, Sep 22, 2021 at 1:35 PM [email protected] 
wrote:

> Hi everyone,
>
> Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> 4pm BST to discuss Accord and other things in the community. There are no
> plans to make any kind of project decisions. Everyone is welcome to drop in
> to discuss Accord or whatever else might be on your mind.
>
> https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
>
>
> From: [email protected] 
> Date: Wednesday, 22 September 2021 at 16:22
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> No, I would expect to deliver strict serializable interactive transactions
> using Accord. These would simply corroborate that the participating keys
> had not modified their write timestamps during the final transaction. These
> could even be undertaken with still only a single wide area round-trip,
> using local copies of the data to assemble the transaction (though this
> would marginally increase the chance of aborts)
>
> My goal for MVCC is parallelism, not additional isolation levels (though
> snapshot isolation is useful and we’ll probably also want to offer that
> eventually)
>
> From: Henrik Ingo 
> Date: Wednesday, 22 September 2021 at 15:15
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
> wrote:
>
> > Could you explain why you believe this trade-off is necessary? We can
> > support full SQL just fine with Accord, and I hope that we eventually do
> > so.
> >
>
> I assume this is really referring to interactive transactions = multiple
> round trips to the client within a transaction.
>
> You mentioned previously we could later build a more MVCC like transaction
> semantic on top of Accord. (Independent reads from a single snapshot,
> followed by a commit using Accord.) In this case I think the relevant
> discussion is whether Accord is still the optimal building block
> performance wise to do so, or whether users would then have lower
> consistency level but still pay the performance cost of a stricter
> consistency level.
>
> henrik
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-24 Thread Jonathan Ellis
Does anyone have notes for those of us who couldn't make the call?

On Wed, Sep 22, 2021 at 1:35 PM [email protected] 
wrote:

> Hi everyone,
>
> Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> 4pm BST to discuss Accord and other things in the community. There are no
> plans to make any kind of project decisions. Everyone is welcome to drop in
> to discuss Accord or whatever else might be on your mind.
>
> https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social
>
>
> From: [email protected] 
> Date: Wednesday, 22 September 2021 at 16:22
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> No, I would expect to deliver strict serializable interactive transactions
> using Accord. These would simply corroborate that the participating keys
> had not modified their write timestamps during the final transaction. These
> could even be undertaken with still only a single wide area round-trip,
> using local copies of the data to assemble the transaction (though this
> would marginally increase the chance of aborts)
>
> My goal for MVCC is parallelism, not additional isolation levels (though
> snapshot isolation is useful and we’ll probably also want to offer that
> eventually)
>
> From: Henrik Ingo 
> Date: Wednesday, 22 September 2021 at 15:15
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
> wrote:
>
> > Could you explain why you believe this trade-off is necessary? We can
> > support full SQL just fine with Accord, and I hope that we eventually do
> > so.
> >
>
> I assume this is really referring to interactive transactions = multiple
> round trips to the client within a transaction.
>
> You mentioned previously we could later build a more MVCC like transaction
> semantic on top of Accord. (Independent reads from a single snapshot,
> followed by a commit using Accord.) In this case I think the relevant
> discussion is whether Accord is still the optimal building block
> performance wise to do so, or whether users would then have lower
> consistency level but still pay the performance cost of a stricter
> consistency level.
>
> henrik
> --
>
> Henrik Ingo
>
> +358 40 569 7354 <358405697354>
>
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread [email protected]
Hi everyone,

Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST / 4pm BST 
to discuss Accord and other things in the community. There are no plans to make 
any kind of project decisions. Everyone is welcome to drop in to discuss Accord 
or whatever else might be on your mind.

https://gather.town/app/2UKSboSjqKXIXliE/ac2021-cass-social


From: [email protected] 
Date: Wednesday, 22 September 2021 at 16:22
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
No, I would expect to deliver strict serializable interactive transactions 
using Accord. These would simply corroborate that the participating keys had 
not modified their write timestamps during the final transaction. These could 
even be undertaken with still only a single wide area round-trip, using local 
copies of the data to assemble the transaction (though this would marginally 
increase the chance of aborts)

My goal for MVCC is parallelism, not additional isolation levels (though 
snapshot isolation is useful and we’ll probably also want to offer that 
eventually)

From: Henrik Ingo 
Date: Wednesday, 22 September 2021 at 15:15
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>

I assume this is really referring to interactive transactions = multiple
round trips to the client within a transaction.

You mentioned previously we could later build a more MVCC like transaction
semantic on top of Accord. (Independent reads from a single snapshot,
followed by a commit using Accord.) In this case I think the relevant
discussion is whether Accord is still the optimal building block
performance wise to do so, or whether users would then have lower
consistency level but still pay the performance cost of a stricter
consistency level.

henrik
--

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread [email protected]
No, I would expect to deliver strict serializable interactive transactions 
using Accord. These would simply corroborate that the participating keys had 
not modified their write timestamps during the final transaction. These could 
even be undertaken with still only a single wide area round-trip, using local 
copies of the data to assemble the transaction (though this would marginally 
increase the chance of aborts)

My goal for MVCC is parallelism, not additional isolation levels (though 
snapshot isolation is useful and we’ll probably also want to offer that 
eventually)
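
The commit-time check described above can be illustrated with a minimal sketch. This is not Accord code: the `Store` class, its integer clock and the `commit` signature are hypothetical names chosen for illustration, and a single in-process dictionary stands in for the replicated cluster. It only shows the idea of assembling a transaction from possibly stale reads and aborting if any witnessed write timestamp has changed by commit time.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical key-value store that records a write timestamp per key."""
    data: dict = field(default_factory=dict)  # key -> (value, write_ts)
    clock: int = 0

    def read(self, key):
        # Returns (value, write_ts); keys never written read as (None, 0).
        return self.data.get(key, (None, 0))

    def commit(self, witnessed, writes):
        """Apply writes only if every witnessed key still carries the write
        timestamp that was seen while the transaction was being assembled."""
        for key, seen_ts in witnessed.items():
            if self.read(key)[1] != seen_ts:
                return False  # abort: a key we read has since been rewritten
        self.clock += 1
        for key, value in writes.items():
            self.data[key] = (value, self.clock)
        return True


store = Store()
store.commit({}, {"a": 10, "b": 20})

# The transaction assembles its reads (possibly from a local, stale copy).
a_val, a_ts = store.read("a")
# A concurrent writer modifies "a" before the transaction commits.
store.commit({}, {"a": 11})

ok_stale = store.commit({"a": a_ts}, {"b": a_val + 1})  # aborted
a_val, a_ts = store.read("a")                            # retry with fresh read
ok_fresh = store.commit({"a": a_ts}, {"b": a_val + 1})  # applied
```

The first attempt aborts because the timestamp witnessed for "a" is no longer current; the retry succeeds. This is the sense in which assembling from local copies only marginally increases the chance of aborts rather than weakening isolation.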

From: Henrik Ingo 
Date: Wednesday, 22 September 2021 at 15:15
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>

I assume this is really referring to interactive transactions = multiple
round trips to the client within a transaction.

You mentioned previously we could later build a more MVCC like transaction
semantic on top of Accord. (Independent reads from a single snapshot,
followed by a commit using Accord.) In this case I think the relevant
discussion is whether Accord is still the optimal building block
performance wise to do so, or whether users would then have lower
consistency level but still pay the performance cost of a stricter
consistency level.

henrik
--

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread Henrik Ingo
On Wed, Sep 22, 2021 at 7:56 AM [email protected] 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>

I assume this is really referring to interactive transactions = multiple
round trips to the client within a transaction.

You mentioned previously we could later build a more MVCC like transaction
semantic on top of Accord. (Independent reads from a single snapshot,
followed by a commit using Accord.) In this case I think the relevant
discussion is whether Accord is still the optimal building block
performance wise to do so, or whether users would then have lower
consistency level but still pay the performance cost of a stricter
consistency level.

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread Henrik Ingo
I feel like I should volunteer to write about MongoDB transactions.

TL;DR: Snapshot Isolation and Causal Consistency using Raft-ish replication,
a Lamport clock and 2PC. This leads to the age-old discussion of whether
users really want serializability or not.


On Wed, Sep 22, 2021 at 1:44 AM Jonathan Ellis  wrote:

> The whitepaper here is a good description of the consensus algorithm itself
> as well as its robustness and stability characteristics, and its comparison
> with other state-of-the-art consensus algorithms is very useful.  In the
> context of Cassandra, where a consensus algorithm is only part of what will
> be implemented, I'd like to see a more complete evaluation of the
> transactional side of things as well, including performance characteristics
> as well as the types of transactions that can be supported and at least a
> general idea of what it would look like applied to Cassandra. This will
> allow the PMC to make a more informed decision about what tradeoffs are
> best for the entire long-term project of first supplementing and ultimately
> replacing LWT.
>
> (Allowing users to mix LWT and AP Cassandra operations against the same
> rows was probably a mistake, so in contrast with LWT we’re not looking for
> something fast enough for occasional use but rather something within a
> reasonable factor of AP operations, appropriate to being the only way to
> interact with tables declared as such.)
>
> Besides Accord, this should cover
>
> - Calvin and FaunaDB
> - A Spanner derivative (no opinion on whether that should be Cockroach or
> Yugabyte, I don’t think it’s necessary to cover both)
> - A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
> there is more public information about MongoDB)
> - RAMP
>
>
=MongoDB=

References:
Presentation: https://www.youtube.com/watch?v=quFheFrLLGQ
Slides:
http://henrikingo.github.io/presentations/HighLoad%202019%20-%20Distributed%20transactions%20top%20to%20bottom/index.html#/step-1
Lamport implementation:
http://delivery.acm.org/10.1145/332/3314049/p636-tyulenev.pdf
Replication: http://www.vldb.org/pvldb/vol12/p2071-schultz.pdf and
https://www.usenix.org/system/files/nsdi21-zhou.pdf
TPC-C benchmark: http://www.vldb.org/pvldb/vol12/p2254-kamsky.pdf
(Nothing published on cross shard trx...)

Approach and Guarantees: Shards are independent replica sets; multi-shard
transactions and queries are handled through a coordinator, aka query router.
Replica sets are Raft-like, so leader-based. When using 2PC, the 2PC
coordinator is also a replica set, so that the coordinator state is made
durable via majority commits. This means that a cross-shard transaction
actually needs 4 majority commits, but it would be possible to reduce latency
to client ack to 2 commits (https://jira.mongodb.org/browse/SERVER-47130).
Because of this, the trx-coordinator is also its own recovery manager, and it
is assumed that the replica set will always be able to recover from
failures, usually quickly.

Cluster time is a Lamport clock; in practice the implementation uses a Unix
timestamp plus a counter to generate monotonically increasing integers. Time
is passed along with each message, and each recipient updates its own cluster
time to the higher timestamp. All nodes, including clients, participate in
this. Causal Consistency is basically a client asking to read at or later
than its current timestamp. A replica will block if needed to satisfy this
request. The Lamport clock is incremented by leaders to ensure progress in
the absence of write transactions.
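
A rough sketch of the cluster-time idea, for readers who prefer code. This is a simplification under my own assumptions, not MongoDB's implementation: `ClusterClock`, `tick`, `observe` and `can_serve_read` are hypothetical names, and the timestamp is modelled as a `(unix_seconds, counter)` tuple rather than a packed integer.

```python
import time


class ClusterClock:
    """Lamport-style clock built from a Unix timestamp plus a counter."""

    def __init__(self, now=time.time):
        self._now = now          # injectable wall clock, for testing
        self.latest = (0, 0)     # (unix_seconds, counter)

    def tick(self):
        """Advance for a local event, e.g. a leader assigning a write time."""
        wall = int(self._now())
        secs, counter = self.latest
        if wall > secs:
            self.latest = (wall, 0)
        else:
            # Wall clock has not moved past what we have seen: bump counter.
            self.latest = (secs, counter + 1)
        return self.latest

    def observe(self, remote):
        """Adopt the higher of our time and the time carried on a message."""
        self.latest = max(self.latest, remote)
        return self.latest


def can_serve_read(replica_applied, after_cluster_time):
    """Causal read: a replica may answer only once it has applied everything
    up to the client's timestamp; otherwise it has to wait."""
    return replica_applied >= after_cluster_time


leader = ClusterClock(now=lambda: 100)
client = ClusterClock(now=lambda: 90)   # client wall clock lags the leader

t1 = leader.tick()
t2 = leader.tick()
client.observe(t2)                      # timestamp piggybacked on the reply
t3 = client.tick()
```

Even though the client's wall clock is behind, its next timestamp still orders after everything it has observed, which is what makes the "read at or later than my current timestamp" request meaningful.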

The storage engine provides MVCC semantics. Extending this to the
replication system is straightforward, since replicas apply transactions
serially in the same order. For cross shard transactions it's the job of
the transaction coordinator to commit the transaction with the same cluster
time on all shards. If I remember correctly in the 2PC phase it will simply
choose the timestamp returned by each shard as the global transaction
timestamp. Combined, MongoDB transactions are snapshot isolation + causal
consistency.


Performance: 2PC is used only if a transaction actually has multiple
participating shards. It is possible, though not fun or realistic, to specify
partition boundaries so that related records from two collections will
always reside on the same shard. The 2PC protocol actually requires 4
majority commits, although as of MongoDB 5.0 the client only waits for 3.
Majority commit is exactly what QUORUM is in Cassandra, so in a multi-DC
cluster, commit waits for replication latency. Notably, single-shard
transactions parallelize well, because conflicting transactions can execute
on the leader even when the majority commit isn't yet finished. (This
involves some speculative execution optimization.) I don't believe the same
is true for cross-shard transactions using 2PC.

The paper by Asya Kamsky uses a single replica set and reports 60-70k TPM
for a non-standard TPC-C where varying client threads was allowed and
schema was modified to ta

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread [email protected]
FWIW I retract this – looking again at the blog post I don’t see adequate 
reason to infer they are using a leaderless approach. On balance I expect Fauna 
is still using a stable leader. Do you have reason to believe they are now 
leaderless?

From: [email protected] 
Date: Wednesday, 22 September 2021 at 04:19
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Demonstrating how subtle, complex and difficult to pin-down this topic is, 
Fauna’s recent blog post implies they may have migrated to a leaderless 
sequencing protocol (an earlier blog post made clear they used a leader 
process). However, Calvin still assumes a global sequencing shard, so this only 
modifies latency for clients, i.e. goal (3). Whether they have also removed 
Calvin’s single-shard linearization of transactions is unclear; there is no 
public information to suggest that they have met goal (1). With this the 
protocol would in essence begin to look a lot like Accord, and perhaps they are 
moving towards a similar approach.


From: [email protected] 
Date: Wednesday, 22 September 2021 at 03:52
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

These other systems are incompatible with the goals of the CEP. I do discuss 
them (besides 2PC) in both the whitepaper and the CEP, and will summarise that 
discussion below. A true and accurate comparison of these other systems is 
essentially intractable, as there are complex subtleties to each flavour, and 
those who are interested would be better served by performing their own 
research.

I think it is more productive to focus on what we want to achieve as a 
community. If you believe the goals of this CEP are wrong for the project, 
let’s focus on that. If you want to compare and contrast specific facets of 
alternative systems that you consider to be preferable in some dimension, let’s 
do that here or in a Q&A as proposed by Joey.

The relevant goals are that we:


  1.  Guarantee strict serializable isolation on commodity hardware
  2.  Scale to any cluster size
  3.  Achieve optimal latency

The approach taken by Spanner derivatives is rejected by (1) because they 
guarantee only Serializable isolation (they additionally fail (3)). From 
watching talks by YugaByte, and inferring from Cockroach’s panic-cluster-death 
under clock skew, this is clearly considered by everyone to be undesirable but 
necessary to achieve scalability.

The approach taken by FaunaDB (Calvin) is rejected by (2) because its 
sequencing layer requires a global leader process for the cluster, which is 
incompatible with Cassandra’s scalability requirements. It additionally fails 
(3) for global clients.

Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a Spanner 
clone for its multi-key transaction functionality, not 2PC.

Systems such as RAMP with even weaker isolation are not considered for the 
simple reason that they do not even claim to meet (1).

If we want to additionally offer weaker isolation levels than Serializable, 
such as that provided by the recent RAMP-TAO paper, Cassandra is likely able to 
support multiple distinct transaction layers that operate independently. I 
would encourage you to file a CEP to explore how we can meet these distinct use 
cases, but I consider them to be niche. I expect that a majority of our user 
base desire strict serializable isolation, and certainly no less than 
serializable isolation, to augment the existing weaker isolation offered by 
quorum reads and writes.

I would tangentially note that we are not an AP database under normal 
recommended operation. A minority in any network partition cannot reach QUORUM, 
so under recommended usage we are a high-availability leaderless CP database.
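
To make the quorum arithmetic behind that last point concrete, here is a small sketch. The function names are mine and purely illustrative; the only fact relied on is that QUORUM requires a strict majority of replicas, i.e. floor(RF / 2) + 1.

```python
def quorum(rf: int) -> int:
    """Number of replicas QUORUM needs for replication factor rf."""
    return rf // 2 + 1


def can_reach_quorum(reachable: int, rf: int) -> bool:
    """True if a partition that can see `reachable` replicas can serve QUORUM."""
    return reachable >= quorum(rf)


# Split the replicas of a key into two sides of a network partition and
# record which sides can still serve QUORUM, for every possible split.
results = {}
for rf in (3, 5):
    for side_a in range(rf + 1):
        side_b = rf - side_a
        results[(rf, side_a)] = (
            can_reach_quorum(side_a, rf),
            can_reach_quorum(side_b, rf),
        )
```

For every split, at most one side reaches QUORUM, so the minority side is unavailable for quorum operations rather than diverging.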


From: Jonathan Ellis 
Date: Tuesday, 21 September 2021 at 23:45
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replac

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread [email protected]
Sure, that works for me.

From: Patrick McFadin 
Date: Wednesday, 22 September 2021 at 04:47
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I would be happy to host a Zoom as I've done in the past. I can post a
transcript and the recording after the call.

Instead of right after your talk Benedict, maybe we can set a time for next
week and let everyone know the time?

Patrick

On Mon, Sep 20, 2021 at 11:05 AM [email protected] 
wrote:

> Hi Joey,
>
> Thanks for the feedback and suggestions.
>
> > I was wondering what do you think about having some extended Q&A after
> your ApacheCon talk Wednesday
>
> I would love to do this. I’ll have to figure out how though – my
> understanding is that I have a hard 40m for my talk and any Q&A, and I
> expect the talk to occupy most of those 40m as I try to cover both the
> CEP-14 and CEP-15. I’m not sure what facilities are made available by
> Hopin, but if necessary we can perhaps post some external video chat link?
>
> The time of day is also a question, as I think the last talk ends at
> 9:20pm local time. But we can make that work if necessary.
>
> > It might help to have a diagram (perhaps I can collaborate with you
> on this?)
>
> I absolutely agree. This is something I had planned to produce but it’s
> been a question of time. In part I wanted to ensure we published long in
> advance of ApacheCon, but now also with CEP-10, CEP-14 and CEP-15 in flight
> it’s hard to get back to improving the draft. If you’d be interested in
> collaborating on this that would be super appreciated, as this would
> certainly help the reader.
>
> > I think that WAN is always paid during the Consensus Protocol, and then
> in most cases execution can remain LAN except in 3+ datacenters where I
> think you'd have to include at least one replica in a neighboring
> datacenter…
>
> As designed the only WAN cost is consensus as Accord ensures every replica
> receives a complete copy of every transaction, and is aware of any gaps. If
> there are gaps there may be WAN delays as those are filled in. This might
> occur because of network outages, but is most likely to occur when
> transactions are being actively executed by multiple DCs at once – in which
> case there’ll be one further unidirectional WAN latency during execution
> while the earlier transaction disseminates its result to the later
> transaction(s). There are other similar scenarios we can discuss, e.g. if a
> transaction takes the slow path and will execute after a transaction being
> executed in another DC, that remote transaction needs to receive this
> notification before executing.
>
> There might potentially be some interesting optimisations to make in
> future, where with many queued transactions a single DC may nominate itself
> to execute all outstanding queries and respond to the remote DCs that
> issued them so as to eliminate the WAN latency for disseminating the result
> of each transaction. But we’re getting way ahead of ourselves there 😊
>
> There’s also no LAN cost on write, at least for responding to the client.
> If there is a dependent transaction within the same DC then (as in the
> above case) there will be a LAN penalty for the second transaction to
> execute.
>
> > Relatedly I'm curious if there is any way that the client can
> acquire the timestamp used by the transaction before sending the data
> so we can make the operations idempotent and unrelated to the
> coordinator that was executing them as the storage nodes are
> vulnerable to disk and heap failure modes which makes them much more
> likely to enter grey failure (slow). Alternatively, perhaps it would
> make sense to introduce a set of optional dedicated C* nodes for
> reaching consensus that do not act as storage nodes so we don't have
> to worry about hanging coordinators (join_ring=false?)?
>
> So, in principle coordination can be performed by any node on the network
> including a client – though we’d need to issue the client a unique id, this
> can be done cheaply on joining. This might be something to explore in
> future, though there are downsides to having more coordinators too (more
> likely to fail, and stall further transactions that depend on transactions
> it is coordinating).
>
> However, with respect to idempotency, I expect Accord not to perpetuate
> the problems of LWTs where the result of an earlier query is unknown. At
> least success/fail will be maintained in a distributed fashion for some
> reasonable time horizon, and there will also be protection against zombie
> transactions (those proposed to a node that went into a failure spiral
> before reaching healthy nodes, that somehow regurgitates it hours or days
>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread [email protected]
Could you explain why you believe this trade-off is necessary? We can support 
full SQL just fine with Accord, and I hope that we eventually do so.

This domain is incredibly complex, so it is easy to reach wrong conclusions. I 
would invite you again to propose a system for discussion that you think offers 
something Accord is unable to, and that you consider desirable, and we can work 
from there.

To pre-empt some possible discussions, I am not aware of anything we cannot do 
with Accord that we could do with either Calvin or Spanner. Interactive 
transactions are possible on top of Accord, as are transactions with an unknown 
read/write set. In each case the only cost is that they would use optimistic 
concurrency control, which is no worse than the Spanner derivatives anyway (which I 
have to assume is your benchmark in this regard). I do not expect to deliver 
either functionality initially, but Accord takes us most of the way there for 
both.
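
A minimal sketch of the optimistic concurrency control mentioned above, assuming a hypothetical in-memory store that keeps a version per key. None of these names come from Accord or Cassandra. The interactive part records the version each read observed and buffers its writes; commit succeeds only if every version read is still current. In a distributed setting the validate-and-apply step would have to be one atomic operation, which is presumably the part a consensus layer would provide.

```python
class Store:
    """Hypothetical single-node key/value store with a per-key version."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

    def commit(self, read_versions, writes):
        """Validate the read set, then apply the buffered writes.

        Returns False (abort) if any key read by the transaction has
        been written since the transaction read it.
        """
        for key, seen in read_versions.items():
            if self.read(key)[1] != seen:
                return False
        for key, value in writes.items():
            _, version = self.read(key)
            self.data[key] = (value, version + 1)
        return True


class Txn:
    """Interactive transaction: reads are tracked, writes are buffered."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}
        self.writes = {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        # Remember the first version observed for validation at commit.
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self.read_versions, self.writes)
```

If two transactions read the same key and both try to update it, the first commit succeeds and the second is aborted and would need to be retried by the client.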


From: Jonathan Ellis 
Date: Wednesday, 22 September 2021 at 05:36
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Right, I'm looking for exactly a discussion on the high level goals.
Instead of saying "here's the goals and we ruled out X because Y" we should
start with a discussion around, "Approach A allows X and W, approach B
allows Y and Z" and decide together what the goals should be and what
we are willing to trade to get those goals, e.g., are we willing to give up
global strict serializability to get the ability to support full SQL.  Both
of these are nice to have!

On Tue, Sep 21, 2021 at 9:52 PM [email protected] 
wrote:

> Hi Jonathan,
>
> These other systems are incompatible with the goals of the CEP. I do
> discuss them (besides 2PC) in both the whitepaper and the CEP, and will
> summarise that discussion below. A true and accurate comparison of these
> other systems is essentially intractable, as there are complex subtleties
> to each flavour, and those who are interested would be better served by
> performing their own research.
>
> I think it is more productive to focus on what we want to achieve as a
> community. If you believe the goals of this CEP are wrong for the project,
> let’s focus on that. If you want to compare and contrast specific facets of
> alternative systems that you consider to be preferable in some dimension,
> let’s do that here or in a Q&A as proposed by Joey.
>
> The relevant goals are that we:
>
>
>   1.  Guarantee strict serializable isolation on commodity hardware
>   2.  Scale to any cluster size
>   3.  Achieve optimal latency
>
> The approach taken by Spanner derivatives is rejected by (1) because they
> guarantee only Serializable isolation (they additionally fail (3)). From
> watching talks by YugaByte, and inferring from Cockroach’s
> panic-cluster-death under clock skew, this is clearly considered by
> everyone to be undesirable but necessary to achieve scalability.
>
> The approach taken by FaunaDB (Calvin) is rejected by (2) because its
> sequencing layer requires a global leader process for the cluster, which is
> incompatible with Cassandra’s scalability requirements. It additionally
> fails (3) for global clients.
>
> Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a
> Spanner clone for its multi-key transaction functionality, not 2PC.
>
> Systems such as RAMP with even weaker isolation are not considered for the
> simple reason that they do not even claim to meet (1).
>
> If we want to additionally offer weaker isolation levels than
> Serializable, such as that provided by the recent RAMP-TAO paper, Cassandra
> is likely able to support multiple distinct transaction layers that operate
> independently. I would encourage you to file a CEP to explore how we can
> meet these distinct use cases, but I consider them to be niche. I expect
> that a majority of our user base desire strict serializable isolation, and
> certainly no less than serializable isolation, to augment the existing
> weaker isolation offered by quorum reads and writes.
>
> I would tangentially note that we are not an AP database under normal
> recommended operation. A minority in any network partition cannot reach
> QUORUM, so under recommended usage we are a high-availability leaderless CP
> database.
>
>
> From: Jonathan Ellis 
> Date: Tuesday, 21 September 2021 at 23:45
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Benedict, thanks for taking the lead in putting this together. Since
> Cassandra is the only relevant database today designed around a leaderless
> architecture, it's quite likely that we'll be better served with a custom
> transaction design instead of trying to retrofit one from CP systems.
>
> The whitepaper here is a

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread Jonathan Ellis
Right, I'm looking for exactly a discussion on the high level goals.
Instead of saying "here's the goals and we ruled out X because Y" we should
start with a discussion around, "Approach A allows X and W, approach B
allows Y and Z" and decide together what the goals should be and what
we are willing to trade to get those goals, e.g., are we willing to give up
global strict serializability to get the ability to support full SQL.  Both
of these are nice to have!

On Tue, Sep 21, 2021 at 9:52 PM [email protected] 
wrote:

> Hi Jonathan,
>
> These other systems are incompatible with the goals of the CEP. I do
> discuss them (besides 2PC) in both the whitepaper and the CEP, and will
> summarise that discussion below. A true and accurate comparison of these
> other systems is essentially intractable, as there are complex subtleties
> to each flavour, and those who are interested would be better served by
> performing their own research.
>
> I think it is more productive to focus on what we want to achieve as a
> community. If you believe the goals of this CEP are wrong for the project,
> let’s focus on that. If you want to compare and contrast specific facets of
> alternative systems that you consider to be preferable in some dimension,
> let’s do that here or in a Q&A as proposed by Joey.
>
> The relevant goals are that we:
>
>
>   1.  Guarantee strict serializable isolation on commodity hardware
>   2.  Scale to any cluster size
>   3.  Achieve optimal latency
>
> The approach taken by Spanner derivatives is rejected by (1) because they
> guarantee only Serializable isolation (they additionally fail (3)). From
> watching talks by YugaByte, and inferring from Cockroach’s
> panic-cluster-death under clock skew, this is clearly considered by
> everyone to be undesirable but necessary to achieve scalability.
>
> The approach taken by FaunaDB (Calvin) is rejected by (2) because its
> sequencing layer requires a global leader process for the cluster, which is
> incompatible with Cassandra’s scalability requirements. It additionally
> fails (3) for global clients.
>
> Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a
> Spanner clone for its multi-key transaction functionality, not 2PC.
>
> Systems such as RAMP with even weaker isolation are not considered for the
> simple reason that they do not even claim to meet (1).
>
> If we want to additionally offer weaker isolation levels than
> Serializable, such as that provided by the recent RAMP-TAO paper, Cassandra
> is likely able to support multiple distinct transaction layers that operate
> independently. I would encourage you to file a CEP to explore how we can
> meet these distinct use cases, but I consider them to be niche. I expect
> that a majority of our user base desire strict serializable isolation, and
> certainly no less than serializable isolation, to augment the existing
> weaker isolation offered by quorum reads and writes.
>
> I would tangentially note that we are not an AP database under normal
> recommended operation. A minority in any network partition cannot reach
> QUORUM, so under recommended usage we are a high-availability leaderless CP
> database.
>
>
> From: Jonathan Ellis 
> Date: Tuesday, 21 September 2021 at 23:45
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Benedict, thanks for taking the lead in putting this together. Since
> Cassandra is the only relevant database today designed around a leaderless
> architecture, it's quite likely that we'll be better served with a custom
> transaction design instead of trying to retrofit one from CP systems.
>
> The whitepaper here is a good description of the consensus algorithm itself
> as well as its robustness and stability characteristics, and its comparison
> with other state-of-the-art consensus algorithms is very useful.  In the
> context of Cassandra, where a consensus algorithm is only part of what will
> be implemented, I'd like to see a more complete evaluation of the
> transactional side of things as well, including performance characteristics
> as well as the types of transactions that can be supported and at least a
> general idea of what it would look like applied to Cassandra. This will
> allow the PMC to make a more informed decision about what tradeoffs are
> best for the entire long-term project of first supplementing and ultimately
> replacing LWT.
>
> (Allowing users to mix LWT and AP Cassandra operations against the same
> rows was probably a mistake, so in contrast with LWT we’re not looking for
> something fast enough for occasional use but rather something within a
> reasonable factor of AP operations, appropriate to being the only way to
>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread [email protected]
Oh, finally, to address your question about how Fauna achieves low-cost reads: 
they default to serializable isolation only. They no doubt ensure the 
transaction log is replicated in order, so that any read from the DC-local 
transaction log is serializable. Accord will similarly be able to offer cheap 
serializable reads, and additionally is able to offer strict serializable reads 
without performing any write during consensus (nod to Alex Miller for pointing 
out this advantage over Calvin)
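
To illustrate why an in-order log makes local reads serializable, here is a minimal sketch assuming a hypothetical log in which each entry is the set of writes of one transaction. It is not Fauna's or Accord's implementation. A replica that has applied any prefix of the log exposes a state that some serial execution would have produced, though possibly a stale one, which is why this gives serializable rather than strict serializable reads.

```python
def apply_prefix(log, n):
    """Return the state after applying the first n entries of an ordered log.

    Each entry is a dict of key -> value written atomically by one
    transaction (a hypothetical format chosen for this sketch).
    """
    state = {}
    for writes in log[:n]:
        state.update(writes)
    return state


# Global order: T1 sets x and y together, T2 then overwrites both.
log = [{"x": 1, "y": 1}, {"x": 2, "y": 2}]

lagging_replica = apply_prefix(log, 1)  # has only seen T1
current_replica = apply_prefix(log, 2)  # has seen T1 and T2
```

A reader of the lagging replica sees T1's writes in full and none of T2's, so it never observes a mix of the two transactions.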

From: [email protected] 
Date: Wednesday, 22 September 2021 at 04:19
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Demonstrating how subtle, complex and difficult to pin down this topic is, 
Fauna’s recent blog post implies they may have migrated to a leaderless 
sequencing protocol (an earlier blog post made clear they used a leader 
process). However, Calvin still assumes a global sequencing shard, so this only 
modifies latency for clients, i.e. goal (3). Whether they have also removed 
Calvin’s single-shard linearization of transactions is unclear; there is no 
public information to suggest that they have met goal (1). With this the 
protocol would in essence begin to look a lot like Accord, and perhaps they are 
moving towards a similar approach.


From: [email protected] 
Date: Wednesday, 22 September 2021 at 03:52
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

These other systems are incompatible with the goals of the CEP. I do discuss 
them (besides 2PC) in both the whitepaper and the CEP, and will summarise that 
discussion below. A true and accurate comparison of these other systems is 
essentially intractable, as there are complex subtleties to each flavour, and 
those who are interested would be better served by performing their own 
research.

I think it is more productive to focus on what we want to achieve as a 
community. If you believe the goals of this CEP are wrong for the project, 
let’s focus on that. If you want to compare and contrast specific facets of 
alternative systems that you consider to be preferable in some dimension, let’s 
do that here or in a Q&A as proposed by Joey.

The relevant goals are that we:


  1.  Guarantee strict serializable isolation on commodity hardware
  2.  Scale to any cluster size
  3.  Achieve optimal latency

The approach taken by Spanner derivatives is rejected by (1) because they 
guarantee only Serializable isolation (they additionally fail (3)). From 
watching talks by YugaByte, and inferring from Cockroach’s panic-cluster-death 
under clock skew, this is clearly considered by everyone to be undesirable but 
necessary to achieve scalability.

The approach taken by FaunaDB (Calvin) is rejected by (2) because its 
sequencing layer requires a global leader process for the cluster, which is 
incompatible with Cassandra’s scalability requirements. It additionally fails 
(3) for global clients.

Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a Spanner 
clone for its multi-key transaction functionality, not 2PC.

Systems such as RAMP with even weaker isolation are not considered for the 
simple reason that they do not even claim to meet (1).

If we want to additionally offer weaker isolation levels than Serializable, 
such as that provided by the recent RAMP-TAO paper, Cassandra is likely able to 
support multiple distinct transaction layers that operate independently. I 
would encourage you to file a CEP to explore how we can meet these distinct use 
cases, but I consider them to be niche. I expect that a majority of our user 
base desire strict serializable isolation, and certainly no less than 
serializable isolation, to augment the existing weaker isolation offered by 
quorum reads and writes.

I would tangentially note that we are not an AP database under normal 
recommended operation. A minority in any network partition cannot reach QUORUM, 
so under recommended usage we are a high-availability leaderless CP database.
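
The quorum argument can be checked with a little arithmetic. A minimal sketch, assuming the usual majority quorum of floor(RF / 2) + 1 replicas; the function names are illustrative only.

```python
def quorum(replication_factor):
    """Majority quorum size: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_reach_quorum(reachable_replicas, replication_factor):
    """True if this side of a partition can still satisfy QUORUM."""
    return reachable_replicas >= quorum(replication_factor)


# With RF=3 split 2/1 by a partition, only the majority side can proceed.
rf = 3
majority_side, minority_side = 2, 1
majority_ok = can_reach_quorum(majority_side, rf)
minority_ok = can_reach_quorum(minority_side, rf)
```

Because two disjoint sides would together need more replicas than exist, at most one side of any partition can reach QUORUM.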


From: Jonathan Ellis 
Date: Tuesday, 21 September 2021 at 23:45
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread Patrick McFadin
arbitrary client-provided id that can be utilised to deduplicate
> requests or query the status of a transaction is something we can explore
> later. This is something we should explore in a dedicated discussion as
> development of Accord progresses.
>
> > Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should
> line 2 read Qt instead of Et?
>
> So, technically as it reads today I think it’s correct. For Line 2 there
> is always some Qt \subseteq Et. I think the problem here is that actually
> there’s a bunch of valid things to do, including picking some arbitrary
> subset of each rho in Pt so long as it contains some Qt. It’s hard to
> convey the range of options precisely. Line 12 of course really wants to
> execute only when some Ft has responded, but if no such response is
> forthcoming it wants to execute on some Qt, but of course Ft \supseteq
> Qt. Perhaps I should try to state the set inequalities here. I will think
> about what I can do to improve the clarity, thanks.
>
> > It might make sense for participating members to wait for a minimum
> detected clock skew before becoming eligible for electorate?
>
> This is a great idea, thanks!
>
> > I don't really understand how temporarily down replicas will learn
> of mutations they missed .. are we just leveraging some
> external repair?
>
> Yes, precisely. Though any transaction they need to know about to answer a
> Read etc. can be queried from a peer. In practice I expect to deliver a
> real-time repair mechanism scoped (initially, at least) to Accord
> transactions to ensure this happens promptly.
>
> > Relatedly since non-transactional reads wouldn't flow through
> consensus (I hope) would it make sense for a restarting node to learn
> the latest accepted time once and then be deprioritized for all reads
> until it has accepted what it missed? Or is the idea that you would
> _always_ read transactionally (and since it's a read only transaction
> you can skip the WAN consensus and just go straight to fast path
> reads)?
>
> I expect that tables will be marked transactional, and that every
> operation that goes through them will be transactional. However I can
> imagine offering weaker read semantics, particularly if you’re looking to
> avoid paying the WAN price if you aren’t worried about consistency. I
> haven’t really considered how we might marry the two within a table, and
> I’m open to suggestions here. I expect that this dovetails with future
> improvements to transactional cluster metadata. I think also in part this
> kind of behaviour is limited today because repair is too unwieldy, and also
> because we don’t have an “on but catching up” state. If we improve repair
> for transactions the first part may be solved, and perhaps we can introduce
> a new node state as part of improving our approach to cluster management.
>
> I could imagine having some bounded divergence in general, e.g. I haven’t
> corroborated my transaction history in Xms with a majority, or I haven’t
> received Xms of the transaction history I’ve witnessed, so I’m going to
> remove myself from the read set for non-transactional operations. But I
> don’t envisage this landing in V1.
>
> > * I know the paper says that we elide details of how the shards (aka
> replica sets?) are chosen, but it seems that this system would have a
> hard dependency on a strongly consistent shard selection system (aka
> token metadata?) wouldn't it? In particular if the simple quorums
> (which I interpreted to be replica sets in current C*, not sure if
> that's correct) can change in non linearizable ways I don't think
> Property 3.3 can hold. I think you hint at a solution to this in
> section 5 but I'm not sure I grok it.
>
> Yes, it does. That’s something that’s in hand, and colleagues will be
> reaching out to the list about in the next couple of months. I anticipate
> this being a solved problem before Accord depends on it. There’s still a
> bunch of complexity within Accord for applying topology changes safely
> (which Section 5 nods to), but the membership decisions will be taken by
> Cassandra – safely.
>
>
> From: Joseph Lynch 
> Date: Monday, 20 September 2021 at 17:17
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Benedict,
>
> Thank you very much for advancing this proposal, I'm extremely excited
> to see flexible quorums used in this way and am looking forward to the
> integration of Accord into Cassandra! I read the whitepaper and have a
> few questions, but I was wondering what do you think about having some
> extended Q&A after your ApacheCon talk Wednesday (maybe at the end of
> the C* track)? It might

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread [email protected]
Demonstrating how subtle, complex and difficult to pin down this topic is, 
Fauna’s recent blog post implies they may have migrated to a leaderless 
sequencing protocol (an earlier blog post made clear they used a leader 
process). However, Calvin still assumes a global sequencing shard, so this only 
modifies latency for clients, i.e. goal (3). Whether they have also removed 
Calvin’s single-shard linearization of transactions is unclear; there is no 
public information to suggest that they have met goal (1). With this the 
protocol would in essence begin to look a lot like Accord, and perhaps they are 
moving towards a similar approach.


From: [email protected] 
Date: Wednesday, 22 September 2021 at 03:52
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Hi Jonathan,

These other systems are incompatible with the goals of the CEP. I do discuss 
them (besides 2PC) in both the whitepaper and the CEP, and will summarise that 
discussion below. A true and accurate comparison of these other systems is 
essentially intractable, as there are complex subtleties to each flavour, and 
those who are interested would be better served by performing their own 
research.

I think it is more productive to focus on what we want to achieve as a 
community. If you believe the goals of this CEP are wrong for the project, 
let’s focus on that. If you want to compare and contrast specific facets of 
alternative systems that you consider to be preferable in some dimension, let’s 
do that here or in a Q&A as proposed by Joey.

The relevant goals are that we:


  1.  Guarantee strict serializable isolation on commodity hardware
  2.  Scale to any cluster size
  3.  Achieve optimal latency

The approach taken by Spanner derivatives is rejected by (1) because they 
guarantee only Serializable isolation (they additionally fail (3)). From 
watching talks by YugaByte, and inferring from Cockroach’s panic-cluster-death 
under clock skew, this is clearly considered by everyone to be undesirable but 
necessary to achieve scalability.

The approach taken by FaunaDB (Calvin) is rejected by (2) because its 
sequencing layer requires a global leader process for the cluster, which is 
incompatible with Cassandra’s scalability requirements. It additionally fails 
(3) for global clients.

Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a Spanner 
clone for its multi-key transaction functionality, not 2PC.

Systems such as RAMP with even weaker isolation are not considered for the 
simple reason that they do not even claim to meet (1).

If we want to additionally offer weaker isolation levels than Serializable, 
such as that provided by the recent RAMP-TAO paper, Cassandra is likely able to 
support multiple distinct transaction layers that operate independently. I 
would encourage you to file a CEP to explore how we can meet these distinct use 
cases, but I consider them to be niche. I expect that a majority of our user 
base desire strict serializable isolation, and certainly no less than 
serializable isolation, to augment the existing weaker isolation offered by 
quorum reads and writes.

I would tangentially note that we are not an AP database under normal 
recommended operation. A minority in any network partition cannot reach QUORUM, 
so under recommended usage we are a high-availability leaderless CP database.


From: Jonathan Ellis 
Date: Tuesday, 21 September 2021 at 23:45
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spa

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread [email protected]
Hi Jonathan,

These other systems are incompatible with the goals of the CEP. I do discuss 
them (besides 2PC) in both the whitepaper and the CEP, and will summarise that 
discussion below. A true and accurate comparison of these other systems is 
essentially intractable, as there are complex subtleties to each flavour, and 
those who are interested would be better served by performing their own 
research.

I think it is more productive to focus on what we want to achieve as a 
community. If you believe the goals of this CEP are wrong for the project, 
let’s focus on that. If you want to compare and contrast specific facets of 
alternative systems that you consider to be preferable in some dimension, let’s 
do that here or in a Q&A as proposed by Joey.

The relevant goals are that we:


  1.  Guarantee strict serializable isolation on commodity hardware
  2.  Scale to any cluster size
  3.  Achieve optimal latency

The approach taken by Spanner derivatives is rejected by (1) because they 
guarantee only Serializable isolation (they additionally fail (3)). From 
watching talks by YugaByte, and inferring from Cockroach’s panic-cluster-death 
under clock skew, this is clearly considered by everyone to be undesirable but 
necessary to achieve scalability.

The approach taken by FaunaDB (Calvin) is rejected by (2) because its 
sequencing layer requires a global leader process for the cluster, which is 
incompatible with Cassandra’s scalability requirements. It additionally fails 
(3) for global clients.

Two phase commit fails (3). As an aside, AFAICT DynamoDB is today a Spanner 
clone for its multi-key transaction functionality, not 2PC.

Systems such as RAMP with even weaker isolation are not considered for the 
simple reason that they do not even claim to meet (1).

If we want to additionally offer weaker isolation levels than Serializable, 
such as that provided by the recent RAMP-TAO paper, Cassandra is likely able to 
support multiple distinct transaction layers that operate independently. I 
would encourage you to file a CEP to explore how we can meet these distinct use 
cases, but I consider them to be niche. I expect that a majority of our user 
base desire strict serializable isolation, and certainly no less than 
serializable isolation, to augment the existing weaker isolation offered by 
quorum reads and writes.

I would tangentially note that we are not an AP database under normal 
recommended operation. A minority in any network partition cannot reach QUORUM, 
so under recommended usage we are a high-availability leaderless CP database.


From: Jonathan Ellis 
Date: Tuesday, 21 September 2021 at 23:45
To: dev 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spanner derivative (no opinion on whether that should be Cockroach or
Yugabyte, I don’t think it’s necessary to cover both)
- A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
there is more public information about MongoDB)
- RAMP

Here’s an example of what I mean:

=Calvin=

Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
transactions, then replicas execute the transactions independently with no
further coordination.  No SPOF.  Transactions are batched by each sequencer
to keep this from becoming a bottleneck.

Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
with 7GB ram and 8 virtual cores).  Note that TPC-C New Order is composed
of four reads and four writes, so 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread Jonathan Ellis
Benedict, thanks for taking the lead in putting this together. Since
Cassandra is the only relevant database today designed around a leaderless
architecture, it's quite likely that we'll be better served with a custom
transaction design instead of trying to retrofit one from CP systems.

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spanner derivative (no opinion on whether that should be Cockroach or
Yugabyte, I don’t think it’s necessary to cover both)
- A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
there is more public information about MongoDB)
- RAMP

Here’s an example of what I mean:

=Calvin=

Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
transactions, then replicas execute the transactions independently with no
further coordination.  No SPOF.  Transactions are batched by each sequencer
to keep this from becoming a bottleneck.

Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
with 7GB ram and 8 virtual cores).  Note that TPC-C New Order is composed
of four reads and four writes, so this is effectively 2M reads and 2M
writes as we normally measure them in C*.
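
The conversion above can be checked with a few lines of arithmetic. The inputs are the figures quoted in this message; the per-machine number is only a derived average and does not come from the paper.

```python
TXN_PER_SEC = 500_000   # New Order throughput quoted above
READS_PER_TXN = 4       # composition of New Order as stated above
WRITES_PER_TXN = 4
MACHINES = 100

reads_per_sec = TXN_PER_SEC * READS_PER_TXN
writes_per_sec = TXN_PER_SEC * WRITES_PER_TXN
# Derived average, assuming load is spread evenly across machines.
ops_per_machine = (reads_per_sec + writes_per_sec) // MACHINES

print(reads_per_sec, writes_per_sec, ops_per_machine)
```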

Calvin supports mixed read/write transactions, but because the transaction
execution logic requires knowing all partition keys in advance to ensure
that all replicas can reproduce the same results with no coordination,
reads against non-PK predicates must be done ahead of time (transparently,
by the server) to determine the set of keys, and this must be retried if
the set of rows affected is updated before the actual transaction executes.
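
A rough sketch of that "read ahead, then retry if the key set moved" loop, using an in-memory dict as a stand-in for the storage layer. All names are hypothetical, and the `concurrent_writer` hook exists only to simulate an interleaved update; this is not how Calvin or FaunaDB implement it.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Row = Dict[str, str]
Store = Dict[str, Row]  # partition key -> row


def find_keys(store: Store, predicate: Callable[[Row], bool]) -> FrozenSet[str]:
    """Reconnaissance read: resolve a non-PK predicate to a set of keys."""
    return frozenset(k for k, row in store.items() if predicate(row))


def run_dependent_txn(
    store: Store,
    predicate: Callable[[Row], bool],
    apply: Callable[[Row], Row],
    concurrent_writer: Optional[Callable[[Store, int], None]] = None,
    max_attempts: int = 5,
) -> Tuple[FrozenSet[str], int]:
    """Declare the key set up front, and retry if it changed before execution."""
    for attempt in range(1, max_attempts + 1):
        declared = find_keys(store, predicate)
        if concurrent_writer is not None:
            concurrent_writer(store, attempt)  # simulated interleaving
        if find_keys(store, predicate) != declared:
            continue  # key set is stale: redo the reconnaissance read
        for key in declared:
            store[key] = apply(store[key])
        return declared, attempt
    raise RuntimeError("key set kept changing; giving up")
```

The retry is what makes non-PK predicates expensive under contention: every change to the matching set between the read and the execution forces another round.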

Batching and global consensus add latency -- 100ms in the Calvin paper and
apparently about 50ms in FaunaDB.  Glass half full: all transactions
(including multi-partition updates) are equally performant in Calvin since
the coordination is handled up front in the sequencing step.  Glass half
empty: even single-row reads and writes have to pay the full coordination
cost.  Fauna has optimized this away for reads but I am not aware of a
description of how they changed the design to allow this.

Functionality and limitations: since the entire transaction must be known
in advance to allow coordination-less execution at the replicas, Calvin
cannot support interactive transactions at all.  FaunaDB mitigates this by
allowing server-side logic to be included, but a Calvin approach will never
be able to offer SQL compatibility.

Guarantees: Calvin transactions are strictly serializable.  There is no
additional complexity or performance hit to generalizing to multiple
regions, apart from the speed of light.  And since Calvin is already paying
a batching latency penalty, this is less painful than for other systems.

Application to Cassandra: B-.  Distributed transactions are handled by the
sequencing and scheduling layers, which are leaderless, and Calvin’s
requirements for the storage layer are easily met by C*.  But Calvin also
requires a global consensus protocol and LWT is almost certainly not
sufficiently performant, so this would require ZK or etcd (reasonable for a
library approach but not for replacing LWT in C* itself), or an
implementation of Accord.  I don’t believe Calvin would require additional
table-level metadata in Cassandra.

On Sun, Sep 5, 2021 at 9:33 AM [email protected] 
wrote:

> Wiki:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> Whitepaper:
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> <
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2
> >
> Prototype: https://github.com/belliottsmith/accord
>
> Hi everyone, I’d like to propose this CEP for adoption by the community.
>
> Cassandra has benefitted from LWTs for

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread Henrik Ingo
On client generated timestamps...

On Mon, Sep 20, 2021 at 7:17 PM Joseph Lynch  wrote:

> * Relatedly I'm curious if there is any way that the client can
> acquire the timestamp used by the transaction before sending the data
> so we can make the operations idempotent and unrelated to the
> coordinator that was executing them as the storage nodes are
> vulnerable to disk and heap failure modes which makes them much more
> likely to enter grey failure (slow). Alternatively, perhaps it would
> make sense to introduce a set of optional dedicated C* nodes for
> reaching consensus that do not act as storage nodes so we don't have
> to worry about hanging coordinators (join_ring=false?)?
>


I've thought about this myself some time ago. The answer is yes, the client
could generate its own timestamps, provided that the client is also in sync
with the clock of the cluster. The coordinator that receives the
transaction from the client would simply need to enforce that the client
generated timestamp is within the margin that would be acceptable if the
coordinator itself had generated the timestamp. In addition, coordinator
must ensure that the transaction id is unique.

But... This still wouldn't give you idempotency in itself. This is because
if something failed with the transaction, you cannot resend the same
timestamp later, because it would now be outside the acceptable range of
timestamps. (Expired, if you will.) At best maybe the client could somehow
use the (timestamp, id) to query a node to verify whether such a
transaction was recently committed. I'm unsure whether that's convenient
for a user though.
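
To make the above concrete, here is a small sketch of the check a coordinator might apply to a client-supplied timestamp. The class, the `max_skew_ms` parameter and the locally tracked set of ids are all assumptions for illustration; nothing here is taken from the Accord prototype.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Coordinator:
    max_skew_ms: int  # hypothetical margin the coordinator tolerates
    seen: Set[Tuple[int, str]] = field(default_factory=set)

    def accept(self, client_ts_ms: int, txn_id: str, now_ms: int) -> str:
        # The client timestamp must be one the coordinator could have issued itself.
        if abs(now_ms - client_ts_ms) > self.max_skew_ms:
            return "rejected: timestamp outside acceptable window"
        # Simplification: uniqueness is only tracked on this one coordinator.
        if (client_ts_ms, txn_id) in self.seen:
            return "rejected: duplicate transaction id"
        self.seen.add((client_ts_ms, txn_id))
        return "accepted"


if __name__ == "__main__":
    c = Coordinator(max_skew_ms=100)
    print(c.accept(1_000, "t1", now_ms=1_050))  # accepted
    print(c.accept(1_000, "t2", now_ms=5_000))  # rejected: the timestamp has expired
```

The second call shows the idempotency problem: a retry that reuses the original timestamp after a failure arrives outside the window and is refused.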

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread [email protected]
vey the range of 
options precisely. Line 12 of course really wants to execute only when some Ft 
has responded, but if no such response is forthcoming it wants to execute on 
some Qt, but of course Ft \supseteq Qt. Perhaps I should try to state the set 
inequalities here. I will think about what I can do to improve the clarity, 
thanks.

> It might make sense for participating members to wait for a minimum detected 
> clock skew before becoming eligible for electorate?

This is a great idea, thanks!

> I don't really understand how temporarily down replicas will learn
> of mutations they missed .. are we just leveraging some
> external repair?

Yes, precisely. Though any transaction they need to know about to answer a 
Read etc. they can query a peer for. In practice I expect to deliver a 
real-time repair mechanism scoped (initially, at least) to Accord transactions 
to ensure this happens promptly.

> Relatedly since non-transactional reads wouldn't flow through
> consensus (I hope) would it make sense for a restarting node to learn
> the latest accepted time once and then be deprioritized for all reads
> until it has accepted what it missed? Or is the idea that you would
> _always_ read transactionally (and since it's a read only transaction
> you can skip the WAN consensus and just go straight to fast path
> reads)?

I expect that tables will be marked transactional, and that every operation 
that goes through them will be transactional. However I can imagine offering 
weaker read semantics, particularly if you’re looking to avoid paying the WAN 
price if you aren’t worried about consistency. I haven’t really considered how 
we might marry the two within a table, and I’m open to suggestions here. I 
expect that this dovetails with future improvements to transactional cluster 
metadata. I think also in part this kind of behaviour is limited today because 
repair is too unwieldy, and also because we don’t have an “on but catching up” 
state. If we improve repair for transactions the first part may be solved, and 
perhaps we can introduce a new node state as part of improving our approach to 
cluster management.

I could imagine having some bounded divergence in general, e.g. I haven’t 
corroborated my transaction history in Xms with a majority, or I haven’t 
received Xms of the transaction history I’ve witnessed, so I’m going to remove 
myself from the read set for non-transactional operations. But I don’t envisage 
this landing in V1.

> * I know the paper says that we elide details of how the shards (aka
> replica sets?) are chosen, but it seems that this system would have a
> hard dependency on a strongly consistent shard selection system (aka
> token metadata?) wouldn't it? In particular if the simple quorums
> (which I interpreted to be replica sets in current C*, not sure if
> that's correct) can change in non linearizable ways I don't think
> Property 3.3 can hold. I think you hint at a solution to this in
> section 5 but I'm not sure I grok it.

Yes, it does. That’s something that’s in hand, and colleagues will be reaching 
out to the list about in the next couple of months. I anticipate this being a 
solved problem before Accord depends on it. There’s still a bunch of complexity 
within Accord for applying topology changes safely (which Section 5 nods to), 
but the membership decisions will be taken by Cassandra – safely.


From: Joseph Lynch 
Date: Monday, 20 September 2021 at 17:17
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict,

Thank you very much for advancing this proposal, I'm extremely excited
to see flexible quorums used in this way and am looking forward to the
integration of Accord into Cassandra! I read the whitepaper and have a
few questions, but I was wondering what do you think about having some
extended Q&A after your ApacheCon talk Wednesday (maybe at the end of
the C* track)? It might be higher bandwidth than going back and forth
on email/slack (also given you're presenting on it that might be a
good time to discuss it)?

Briefly
* It might help to have a diagram (perhaps I can collaborate with you
on this?) showing the happy path delay waiting in the reorder buffer
and the messages that are sent in a 2 and 3 datacenter deployment
during the PreAccept, Accept, Commit, Execute, Apply phases. In
particular it was hard for me to follow where exactly I was paying WAN
latency and where we could achieve progress with LAN only (I think
that WAN is always paid during the Consensus Protocol, and then in
most cases execution can remain LAN except in 3+ datacenters where I
think you'd have to include at least one replica in a neighboring
datacenter). In particular, it seems that Accord always pays clock
skew + WAN latency during the reorder buffer (as part of consensus) +
2x LAN latency during execution (to read and then write).
* Relatedly I'm curious if there is any way that the client 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread [email protected]
s simply all those conflicting 
transactions the replica has seen that were initially proposed a lower 
execution timestamp.

> Every replica? Or only those participating in the transaction?

Those participating in the transaction.

> When speaking about the simple majority of nodes to whom the max(t) value 
> returned will be proposed to - It sounds like this need not be the same 
> majority from whom the original sets of T_n and dependencies were obtained?
> Is there a proof to show that the dependencies created from the union of the 
> first set of replicas resolves to an acceptable dependency graph for an 
> arbitrary majority of replicas? (Especially given that a majority of replicas 
> is not a majority of nodes, given we are in a cross-shard scenario here).

I think there’s some confusion about how dependencies work, as well as some 
issues in nomenclature that mean I can’t unfortunately parse all of your 
questions. I think it might be better to revisit after digesting these answers. 
One confusion that I infer might be arising is that these majorities are _per 
shard_ not _global_. All quorums are obtained per-shard, and a cross-shard 
operation must achieve simultaneous quorums in every shard.
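
A small sketch of the per-shard rule, assuming simple majorities (the fast path uses different quorum sizes, which this ignores). The shard layout and function names are made up for illustration.

```python
from typing import Dict, Set


def has_quorum(replicas: Set[str], responded: Set[str]) -> bool:
    """Simple majority of one shard's replicas."""
    return len(replicas & responded) >= len(replicas) // 2 + 1


def cross_shard_quorum(shards: Dict[str, Set[str]], responded: Set[str]) -> bool:
    """A cross-shard operation needs a quorum in every shard at once."""
    return all(has_quorum(replicas, responded) for replicas in shards.values())


if __name__ == "__main__":
    shards = {"s1": {"n1", "n2", "n3"}, "s2": {"n4", "n5", "n6"}}
    # Four of six nodes is a global majority, but s2 only has one of three.
    print(cross_shard_quorum(shards, {"n1", "n2", "n3", "n4"}))  # False
    # Two of three in each shard is enough.
    print(cross_shard_quorum(shards, {"n1", "n2", "n4", "n5"}))  # True
```

The first case is why a majority of nodes is not the relevant measure: the count only matters within each shard.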

The paper has a brief proof of correctness you can read that I think is 
adequately compelling. We have a detailed proof that needs to be cleaned up 
before being published as an appendix.

> What happens in cases where the replica set has changed due to (a) scaling RF 
> in a single DC (b) adding a whole new DC?

> Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
> Lamport clocks only impose partial, not total order. I’m guessing we’re 
> thinking of a different type of logical clock when we speak of Lamport clocks 
> here (but my expertise is sketchy on this topic).

The original paper is available for you to skim. With the addition of a 
per-process id component a total order is achieved.
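
As an illustration of that last sentence, a minimal Lamport clock whose timestamps are (counter, process id) pairs. Comparing the pair lexicographically breaks ties between equal counters, so any two timestamps from different processes are ordered. The class names are hypothetical and this is not the timestamp format Accord uses.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    counter: int      # logical time, compared first
    process_id: int   # tie-breaker, unique per process


class LamportClock:
    def __init__(self, process_id: int) -> None:
        self.process_id = process_id
        self.counter = 0

    def tick(self) -> Timestamp:
        """Timestamp a local event."""
        self.counter += 1
        return Timestamp(self.counter, self.process_id)

    def observe(self, other: Timestamp) -> Timestamp:
        """Timestamp the receipt of a message carrying `other`."""
        self.counter = max(self.counter, other.counter) + 1
        return Timestamp(self.counter, self.process_id)


if __name__ == "__main__":
    a, b = LamportClock(1), LamportClock(2)
    ta, tb = a.tick(), b.tick()   # equal counters, distinct process ids
    print(ta < tb)                # True: the process id breaks the tie
    print(tb < a.observe(tb))     # True: receipt is ordered after the send
```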

> I would be interested in further exploration of the unhappy path (where 'a 
> newer ballot has been issued by a recovery coordinator to take over the 
> transaction’). I understand that this may be partially covered in the 
> pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
> been issued’ language with the ‘any R in responses had X as Applied, 
> Committed, or Accepted’ language.

Again, here I would refer you to the whitepaper. However the pseudocode does 
mostly cover it, but it helps if you are conversant with consensus protocols, 
and especially leaderless ones.

The “any R in responses has X as Applied, Committed or Accepted” refers to the 
boolean (Applied|Committed|Accepted)[X]=true that is set on a replica during 
the execution of the protocol, as specified in their pseudocode.

The reference to newer ballots is simply classic paxos leader election, so that 
only one coordinator may complete the transaction.



From: Miles Garnsey 
Date: Monday, 20 September 2021 at 09:34
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
If Accord can fulfil its aims it sounds like a huge improvement to the state of 
the art in distributed transaction processing. Congrats to all involved in 
pulling the proposal together.

I was holding off on feedback since this is quite in depth and I don’t want to 
bike shed; I still haven’t spent as much time understanding this as I’d like.

Regardless, I’ll make the following notes in case they’re helpful. My feedback 
is more to satisfy my own curiosity and stimulate discussion than to suggest 
that there are any flaws here. I applaud the proposed testing approach and 
think it is the only way to be certain that the proposed consistency guarantees 
will be upheld.

General

I’m curious if/how this proposal addresses issues we have seen when scaling; I 
see reference to simple majorities of nodes - is there any plan to ensure 
safety under scaling operations or DC (de)commissioning?

What consistency levels will be supported under Accord? Will it simply be a 
single CL representing a majority of nodes across the whole cluster? (This at 
least would mitigate the issues I’ve seen when folks want to switch from 
EACH_SERIAL to SERIAL).

Accord

> Accord instead assembles an inconsistent set of dependencies.


Further explanation here would be good. Do we mean to say that the dependencies 
may differ according to which transactions the coordinator has witnessed at the 
time the incoming transaction is first seen? This would make sense if some 
nodes had not fully committed a foregoing transaction.

Is it correct to think of this step as assembling a dependency graph of 
foregoing transactions which must be completed ahead of progressing the 
incoming new transaction?

Fast Path

> A coordinator C proposes a timestamp t0 to at least a quorum of a fast path 
> electorate. If t0 is larger than all timestamps witnessed for a

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread Joseph Lynch
> pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
> been issued’ language with the ‘any R in responses had X as Applied, 
> Committed, or Accepted’ language.
>
> Well done again and thank you for pushing the envelope in this area Benedict.
>
> Miles
>
> > On 15 Sep 2021, at 11:33 pm, [email protected] wrote:
> >
> >> I would kind of expect this work, if it pans out, to _replace_ the current 
> >> paxos implementation
> >
> > That’s a good point. I think the clear direction of travel would be total 
> > replacement of Paxos, but I anticipate that this will be feature-flagged at 
> > least initially. So for some period of time we may maintain both options, 
> > with the advanced CQL functionality disabled if you opt for classic Paxos.
> >
> > I think this is a necessary corollary of a requirement to support live 
> > upgrades – something that is non-negotiable IMO, but that I have also 
> > neglected to discuss in the CEP. I will rectify this. An open question is 
> > if we want to support live downgrades back to Classic Paxos. I kind of 
> > expect that we will, though that will no doubt be informed by the 
> > difficulty of doing so.
> >
> > Either way, this means the deprecation cycle for Classic Paxos is probably 
> > a separate and future decision for the community. We could choose to 
> > maintain it indefinitely, but I would vote to retire it the following major 
> > version.
> >
> > A related open question is defaults – I would probably vote for new 
> > clusters to default to Accord, and existing clusters to need to run a 
> > migration command after fully upgrading the cluster.
> >
> > From: Sylvain Lebresne 
> > Date: Wednesday, 15 September 2021 at 14:13
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > Fwiw, it makes sense to me to talk about CQL syntax evolution separately.
> >
> > It's pretty clear to me that we _can_ extend CQL to make use of a general
> > purpose transaction mechanism, so I don't think deciding if we want a
> > general purpose transaction mechanism has to depend on deciding on the
> > syntax. Especially since the syntax question can get pretty far on its own
> > and could be a serious upfront distraction.
> >
> > And as you said, there are even queries that can be expressed with the
> > current syntax that we refuse now and would be able to accept with this, so
> > those could be "ground zero" of what this work would allow.
> >
> > But outside of pure syntax questions, one thing that I don't see discussed
> > in the CEP (or did I miss it) is what the relationship of this new
> > mechanism with the existing paxos implementation would be? I would kind of
> > expect this work, if it pans out, to _replace_ the current paxos
> > implementation (because 1) why not and 2) the idea of having 2
> > serialization mechanisms that serialize separately sounds like a nightmare
> > from the user POV) but it isn't stated clearly. If replacement is indeed
> > the intent, then I think there needs to be a plan for the upgrade path. If
> > that's not the intent, then what?
> > --
> > Sylvain
> >
> >
> > On Wed, Sep 15, 2021 at 12:09 PM [email protected] 
> > wrote:
> >
> >> Ok, so the act of typing out an example was actually a really good
> >> reminder of just how limited our functionality is today, even for single
> >> partition operations.
> >>
> >> I don’t want to distract from any discussion around the underlying
> >> protocol, but we could kick off a separate conversation about how to evolve
> >> CQL sooner than later if there is the appetite. There are no concrete
> >> proposals to discuss, it would be brainstorming.
> >>
> >> Do people also generally agree this work warrants a distinct CEP, or would
> >> people prefer to see this developed under the same umbrella?
> >>
> >>
> >>
> >> From: [email protected] 
> >> Date: Wednesday, 15 September 2021 at 09:19
> >> To: [email protected] 
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >>> perhaps we can prepare these as examples
> >>
> >> There are grammatically correct CQL queries today that cannot be executed,
> >> that this work will naturally remove the restrictions on. I’m certainly
> >> happy to specify one of these for the CEP if it will help the reader.
> >>
> >> I want to exclude “new CQL com

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread Miles Garnsey
e).
What happens in cases where the replica set has changed due to (a) scaling RF 
in a single DC (b) adding a whole new DC?
Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
Lamport clocks only impose partial, not total order. I’m guessing we’re 
thinking of a different type of logical clock when we speak of Lamport clocks 
here (but my expertise is sketchy on this topic).

Recovery

I would be interested in further exploration of the unhappy path (where 'a 
newer ballot has been issued by a recovery coordinator to take over the 
transaction’). I understand that this may be partially covered in the 
pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
been issued’ language with the ‘any R in responses had X as Applied, Committed, 
or Accepted’ language.

Well done again and thank you for pushing the envelope in this area Benedict.

Miles

> On 15 Sep 2021, at 11:33 pm, [email protected] wrote:
> 
>> I would kind of expect this work, if it pans out, to _replace_ the current 
>> paxos implementation
> 
> That’s a good point. I think the clear direction of travel would be total 
> replacement of Paxos, but I anticipate that this will be feature-flagged at 
> least initially. So for some period of time we may maintain both options, 
> with the advanced CQL functionality disabled if you opt for classic Paxos.
> 
> I think this is a necessary corollary of a requirement to support live 
> upgrades – something that is non-negotiable IMO, but that I have also 
> neglected to discuss in the CEP. I will rectify this. An open question is if 
> we want to support live downgrades back to Classic Paxos. I kind of expect 
> that we will, though that will no doubt be informed by the difficulty of 
> doing so.
> 
> Either way, this means the deprecation cycle for Classic Paxos is probably a 
> separate and future decision for the community. We could choose to maintain 
> it indefinitely, but I would vote to retire it the following major version.
> 
> A related open question is defaults – I would probably vote for new clusters 
> to default to Accord, and existing clusters to need to run a migration 
> command after fully upgrading the cluster.
> 
> From: Sylvain Lebresne 
> Date: Wednesday, 15 September 2021 at 14:13
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Fwiw, it makes sense to me to talk about CQL syntax evolution separately.
> 
> It's pretty clear to me that we _can_ extend CQL to make use of a general
> purpose transaction mechanism, so I don't think deciding if we want a
> general purpose transaction mechanism has to depend on deciding on the
> syntax. Especially since the syntax question can get pretty far on its own
> and could be a serious upfront distraction.
> 
> And as you said, there are even queries that can be expressed with the
> current syntax that we refuse now and would be able to accept with this, so
> those could be "ground zero" of what this work would allow.
> 
> But outside of pure syntax questions, one thing that I don't see discussed
> in the CEP (or did I miss it) is what the relationship of this new
> mechanism with the existing paxos implementation would be? I would kind of
> expect this work, if it pans out, to _replace_ the current paxos
> implementation (because 1) why not and 2) the idea of having 2
> serialization mechanisms that serialize separately sounds like a nightmare
> from the user POV) but it isn't stated clearly. If replacement is indeed
> the intent, then I think there needs to be a plan for the upgrade path. If
> that's not the intent, then what?
> --
> Sylvain
> 
> 
> On Wed, Sep 15, 2021 at 12:09 PM [email protected] 
> wrote:
> 
>> Ok, so the act of typing out an example was actually a really good
>> reminder of just how limited our functionality is today, even for single
>> partition operations.
>> 
>> I don’t want to distract from any discussion around the underlying
>> protocol, but we could kick off a separate conversation about how to evolve
>> CQL sooner than later if there is the appetite. There are no concrete
>> proposals to discuss, it would be brainstorming.
>> 
>> Do people also generally agree this work warrants a distinct CEP, or would
>> people prefer to see this developed under the same umbrella?
>> 
>> 
>> 
>> From: [email protected] 
>> Date: Wednesday, 15 September 2021 at 09:19
>> To: [email protected] 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>>> perhaps we can prepare these as examples
>> 
>> There are grammatically correct CQL queries today that cannot be executed,
>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread [email protected]
> I would kind of expect this work, if it pans out, to _replace_ the current 
> paxos implementation

That’s a good point. I think the clear direction of travel would be total 
replacement of Paxos, but I anticipate that this will be feature-flagged at 
least initially. So for some period of time we may maintain both options, with 
the advanced CQL functionality disabled if you opt for classic Paxos.

I think this is a necessary corollary of a requirement to support live upgrades 
– something that is non-negotiable IMO, but that I have also neglected to 
discuss in the CEP. I will rectify this. An open question is if we want to 
support live downgrades back to Classic Paxos. I kind of expect that we will, 
though that will no doubt be informed by the difficulty of doing so.

Either way, this means the deprecation cycle for Classic Paxos is probably a 
separate and future decision for the community. We could choose to maintain it 
indefinitely, but I would vote to retire it the following major version.

A related open question is defaults – I would probably vote for new clusters to 
default to Accord, and existing clusters to need to run a migration command 
after fully upgrading the cluster.

From: Sylvain Lebresne 
Date: Wednesday, 15 September 2021 at 14:13
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Fwiw, it makes sense to me to talk about CQL syntax evolution separately.

It's pretty clear to me that we _can_ extend CQL to make use of a general
purpose transaction mechanism, so I don't think deciding if we want a
general purpose transaction mechanism has to depend on deciding on the
syntax. Especially since the syntax question can get pretty far on its own
and could be a serious upfront distraction.

And as you said, there are even queries that can be expressed with the
current syntax that we refuse now and would be able to accept with this, so
those could be "ground zero" of what this work would allow.

But outside of pure syntax questions, one thing that I don't see discussed
in the CEP (or did I miss it) is what the relationship of this new
mechanism with the existing paxos implementation would be? I would kind of
expect this work, if it pans out, to _replace_ the current paxos
implementation (because 1) why not and 2) the idea of having 2
serialization mechanisms that serialize separately sounds like a nightmare
from the user POV) but it isn't stated clearly. If replacement is indeed
the intent, then I think there needs to be a plan for the upgrade path. If
that's not the intent, then what?
--
Sylvain


On Wed, Sep 15, 2021 at 12:09 PM [email protected] 
wrote:

> Ok, so the act of typing out an example was actually a really good
> reminder of just how limited our functionality is today, even for single
> partition operations.
>
> I don’t want to distract from any discussion around the underlying
> protocol, but we could kick off a separate conversation about how to evolve
> CQL sooner than later if there is the appetite. There are no concrete
> proposals to discuss, it would be brainstorming.
>
> Do people also generally agree this work warrants a distinct CEP, or would
> people prefer to see this developed under the same umbrella?
>
>
>
> From: [email protected] 
> Date: Wednesday, 15 September 2021 at 09:19
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > perhaps we can prepare these as examples
>
> There are grammatically correct CQL queries today that cannot be executed,
> that this work will naturally remove the restrictions on. I’m certainly
> happy to specify one of these for the CEP if it will help the reader.
>
> I want to exclude “new CQL commands” or any other enhancement to the
> grammar from the scope of the CEP, however. This work will enable a range
> of improvements to the UX, but I think this work is a separate, long-term
> project of evolution that deserves its own CEPs, and will likely involve
> input from a wider range of contributors and users. If nobody else starts
> such CEPs, I will do so in due course (much further down the line).
>
> Assuming there is not significant dissent on this point I will update the
> CEP to reflect this non-goal.
>
>
>
> From: C. Scott Andreas 
> Date: Wednesday, 15 September 2021 at 00:31
> To: [email protected] 
> Cc: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Adding a few notes from my perspective as well –
>
> Re: the UX question, thanks for asking this.
>
> I agree that offering a set of example queries and use cases may help make
> the specific use cases more understandable; perhaps we can prepare these as
> examples to be included in the CEP.
>
> I do think that all potential UX directions beg

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread Sylvain Lebresne
Fwiw, it makes sense to me to talk about CQL syntax evolution separately.

It's pretty clear to me that we _can_ extend CQL to make use of a general
purpose transaction mechanism, so I don't think deciding if we want a
general purpose transaction mechanism has to depend on deciding on the
syntax. Especially since the syntax question can get pretty far on its own
and could be a serious upfront distraction.

And as you said, there are even queries that can be expressed with the
current syntax that we refuse now and would be able to accept with this, so
those could be "ground zero" of what this work would allow.

But outside of pure syntax questions, one thing that I don't see discussed
in the CEP (or did I miss it?) is what the relationship between this new
mechanism and the existing Paxos implementation would be. I would kind of
expect this work, if it pans out, to _replace_ the current Paxos
implementation (because 1) why not and 2) the idea of having 2
serialization mechanisms that serialize separately sounds like a nightmare
from the user POV) but it isn't stated clearly. If replacement is indeed
the intent, then I think there needs to be a plan for the upgrade path. If
that's not the intent, then what?
--
Sylvain


On Wed, Sep 15, 2021 at 12:09 PM [email protected] 
wrote:

> Ok, so the act of typing out an example was actually a really good
> reminder of just how limited our functionality is today, even for single
> partition operations.
>
> I don’t want to distract from any discussion around the underlying
> protocol, but we could kick off a separate conversation about how to evolve
> CQL sooner than later if there is the appetite. There are no concrete
> proposals to discuss, it would be brainstorming.
>
> Do people also generally agree this work warrants a distinct CEP, or would
> people prefer to see this developed under the same umbrella?
>
>
>
> From: [email protected] 
> Date: Wednesday, 15 September 2021 at 09:19
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > perhaps we can prepare these as examples
>
> There are grammatically correct CQL queries today that cannot be executed,
> that this work will naturally remove the restrictions on. I’m certainly
> happy to specify one of these for the CEP if it will help the reader.
>
> I want to exclude “new CQL commands” or any other enhancement to the
> grammar from the scope of the CEP, however. This work will enable a range
> of improvements to the UX, but I think this work is a separate, long-term
> project of evolution that deserves its own CEPs, and will likely involve
> input from a wider range of contributors and users. If nobody else starts
> such CEPs, I will do so in due course (much further down the line).
>
> Assuming there is not significant dissent on this point I will update the
> CEP to reflect this non-goal.
>
>
>
> From: C. Scott Andreas 
> Date: Wednesday, 15 September 2021 at 00:31
> To: [email protected] 
> Cc: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Adding a few notes from my perspective as well –
>
> Re: the UX question, thanks for asking this.
>
> I agree that offering a set of example queries and use cases may help make
> the specific use cases more understandable; perhaps we can prepare these as
> examples to be included in the CEP.
>
> I do think that all potential UX directions begin with the specification
> of the protocol that will underlie them, as what can be expressed by it may
> be a superset of what's immediately exposed by CQL. But at minimum it's
> great to have a sense of the queries one might be able to issue to focus a
> reading of the whitepaper.
>
> Re: "Can we not start using it as an external dependency, and later
> re-evaluate if it's necessary to bring it into the project or even incubate
> it as another Apache project"
>
> I think it would be valuable to the project for the work to be incubated
> in a separate repository as part of the Apache Cassandra project itself,
> much like the in-JVM dtest API and Harry. This pattern worked well for
> those projects as they incubated as it allowed them to evolve outside the
> primary codebase, but subject to the same project governance, set of PMC
> members, committers, and so on. Like those libraries, it also makes sense
> as the Cassandra project is the first (and at this time only) known
> intended consumer of the library, though there may be more in the future.
>
> If the proposal is accepted, the time horizon envisioned for this work's
> completion is ~9 months to a standard of production readiness. The
> contributors see value in the work being donated to and governed by the

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread [email protected]
Ok, so the act of typing out an example was actually a really good reminder of 
just how limited our functionality is today, even for single partition 
operations.

I don’t want to distract from any discussion around the underlying protocol, 
but we could kick off a separate conversation about how to evolve CQL sooner 
rather than later if there is the appetite. There are no concrete proposals to 
discuss; it would be brainstorming.

Do people also generally agree this work warrants a distinct CEP, or would 
people prefer to see this developed under the same umbrella?



From: [email protected] 
Date: Wednesday, 15 September 2021 at 09:19
To: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> perhaps we can prepare these as examples

There are grammatically correct CQL queries today that cannot be executed, that 
this work will naturally remove the restrictions on. I’m certainly happy to 
specify one of these for the CEP if it will help the reader.

I want to exclude “new CQL commands” or any other enhancement to the grammar 
from the scope of the CEP, however. This work will enable a range of 
improvements to the UX, but I think this work is a separate, long-term project 
of evolution that deserves its own CEPs, and will likely involve input from a 
wider range of contributors and users. If nobody else starts such CEPs, I will 
do so in due course (much further down the line).

Assuming there is not significant dissent on this point I will update the CEP 
to reflect this non-goal.



From: C. Scott Andreas 
Date: Wednesday, 15 September 2021 at 00:31
To: [email protected] 
Cc: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Adding a few notes from my perspective as well –

Re: the UX question, thanks for asking this.

I agree that offering a set of example queries and use cases may help make the 
specific use cases more understandable; perhaps we can prepare these as 
examples to be included in the CEP.

I do think that all potential UX directions begin with the specification of the 
protocol that will underlie them, as what can be expressed by it may be a 
superset of what's immediately exposed by CQL. But at minimum it's great to 
have a sense of the queries one might be able to issue to focus a reading of 
the whitepaper.

Re: "Can we not start using it as an external dependency, and later re-evaluate 
if it's necessary to bring it into the project or even incubate it as another 
Apache project"

I think it would be valuable to the project for the work to be incubated in a 
separate repository as part of the Apache Cassandra project itself, much like 
the in-JVM dtest API and Harry. This pattern worked well for those projects as 
they incubated as it allowed them to evolve outside the primary codebase, but 
subject to the same project governance, set of PMC members, committers, and so 
on. Like those libraries, it also makes sense as the Cassandra project is the 
first (and at this time only) known intended consumer of the library, though 
there may be more in the future.

If the proposal is accepted, the time horizon envisioned for this work's 
completion is ~9 months to a standard of production readiness. The contributors 
see value in the work being donated to and governed by the contribution 
practices of the Foundation. Doing so ensures that it is being developed openly 
and with full opportunity for review and contribution of others, while also 
solidifying contribution of the IP to the project.

Spinning up a separate ASF incubation project is an interesting idea, but I 
feel that doing so would introduce a far greater overhead in process and 
governance, and that the most suitable governance and set of committers/PMC 
members are those of the Apache Cassandra project itself.

On Sep 14, 2021, at 3:53 PM, "[email protected]"  wrote:


Hi Paulo,

> First and foremost, I believe this proposal in its current form focuses on 
> the protocol details (HOW?) but lacks the bigger picture on how this is going 
> to be exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent distinct and complex piece of work, 
that requires specialist expertise. You have after all just suggested a month 
to read only the existing proposal 😊

UX is a whole other kind of discussion, that can be quite opinionated, and 
requires different expertise. It is in my opinion helpful to break out work 
that is not tightly coupled, as well as work that requires different expertise. 
As you point out, multi-key UX features are largely independent of any 
underlying implementation, likely can be done in parallel, and even with 
different contributors.

> Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion 
to me. This is just like any other subsystem, but we would be managing it as a 
separate library primarily for modularit

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-15 Thread [email protected]
> perhaps we can prepare these as examples

There are grammatically correct CQL queries today that cannot be executed, and 
this work will naturally remove the restrictions on them. I’m certainly happy to 
specify one of these for the CEP if it will help the reader.

I want to exclude “new CQL commands” or any other enhancement to the grammar 
from the scope of the CEP, however. This work will enable a range of 
improvements to the UX, but I think those improvements are a separate, long-term 
project of evolution that deserves its own CEPs, and will likely involve input 
from a wider range of contributors and users. If nobody else starts such CEPs, I 
will do so in due course (much further down the line).

Assuming there is not significant dissent on this point I will update the CEP 
to reflect this non-goal.



From: C. Scott Andreas 
Date: Wednesday, 15 September 2021 at 00:31
To: [email protected] 
Cc: [email protected] 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Adding a few notes from my perspective as well –

Re: the UX question, thanks for asking this.

I agree that offering a set of example queries and use cases may help make the 
specific use cases more understandable; perhaps we can prepare these as 
examples to be included in the CEP.

I do think that all potential UX directions begin with the specification of the 
protocol that will underlie them, as what can be expressed by it may be a 
superset of what's immediately exposed by CQL. But at minimum it's great to 
have a sense of the queries one might be able to issue to focus a reading of 
the whitepaper.

Re: "Can we not start using it as an external dependency, and later re-evaluate 
if it's necessary to bring it into the project or even incubate it as another 
Apache project"

I think it would be valuable to the project for the work to be incubated in a 
separate repository as part of the Apache Cassandra project itself, much like 
the in-JVM dtest API and Harry. This pattern worked well for those projects as 
they incubated as it allowed them to evolve outside the primary codebase, but 
subject to the same project governance, set of PMC members, committers, and so 
on. Like those libraries, it also makes sense as the Cassandra project is the 
first (and at this time only) known intended consumer of the library, though 
there may be more in the future.

If the proposal is accepted, the time horizon envisioned for this work's 
completion is ~9 months to a standard of production readiness. The contributors 
see value in the work being donated to and governed by the contribution 
practices of the Foundation. Doing so ensures that it is being developed openly 
and with full opportunity for review and contribution of others, while also 
solidifying contribution of the IP to the project.

Spinning up a separate ASF incubation project is an interesting idea, but I 
feel that doing so would introduce a far greater overhead in process and 
governance, and that the most suitable governance and set of committers/PMC 
members are those of the Apache Cassandra project itself.

On Sep 14, 2021, at 3:53 PM, "[email protected]"  wrote:


Hi Paulo,

> First and foremost, I believe this proposal in its current form focuses on 
> the protocol details (HOW?) but lacks the bigger picture on how this is going 
> to be exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent distinct and complex piece of work, 
that requires specialist expertise. You have after all just suggested a month 
to read only the existing proposal 😊

UX is a whole other kind of discussion, that can be quite opinionated, and 
requires different expertise. It is in my opinion helpful to break out work 
that is not tightly coupled, as well as work that requires different expertise. 
As you point out, multi-key UX features are largely independent of any 
underlying implementation, likely can be done in parallel, and even with 
different contributors.

> Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion 
to me. This is just like any other subsystem, but we would be managing it as a 
separate library primarily for modularity reasons. The reality is that this 
option should anyway be considered unavailable. This is a proposed contribution 
to the Cassandra project, which we can either accept or reject.

> Isn't this a good chance to make the serialization protocol pluggable
> with clearly defined integration points

It has recently been demonstrated to be possible to build a system that can 
safely switch between different consensus protocols. However, this was very 
sophisticated work that would require its own CEP, one that we would be unable 
to resource. Even if we could, this would be insufficient. This goal has never 
been achieved for a multi-shard transaction protocol to my knowledge, and 
multi-shard transaction protocols are much more di

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread C. Scott Andreas

Adding a few notes from my perspective as well –

Re: the UX question, thanks for asking this.

I agree that offering a set of example queries and use cases may help make the
specific use cases more understandable; perhaps we can prepare these as
examples to be included in the CEP.

I do think that all potential UX directions begin with the specification of the
protocol that will underlie them, as what can be expressed by it may be a
superset of what's immediately exposed by CQL. But at minimum it's great to
have a sense of the queries one might be able to issue to focus a reading of
the whitepaper.

Re: "Can we not start using it as an external dependency, and later re-evaluate
if it's necessary to bring it into the project or even incubate it as another
Apache project"

I think it would be valuable to the project for the work to be incubated in a
separate repository as part of the Apache Cassandra project itself, much like
the in-JVM dtest API and Harry. This pattern worked well for those projects as
they incubated as it allowed them to evolve outside the primary codebase, but
subject to the same project governance, set of PMC members, committers, and so
on. Like those libraries, it also makes sense as the Cassandra project is the
first (and at this time only) known intended consumer of the library, though
there may be more in the future.

If the proposal is accepted, the time horizon envisioned for this work's
completion is ~9 months to a standard of production readiness. The contributors
see value in the work being donated to and governed by the contribution
practices of the Foundation. Doing so ensures that it is being developed openly
and with full opportunity for review and contribution of others, while also
solidifying contribution of the IP to the project.

Spinning up a separate ASF incubation project is an interesting idea, but I
feel that doing so would introduce a far greater overhead in process and
governance, and that the most suitable governance and set of committers/PMC
members are those of the Apache Cassandra project itself.

On Sep 14, 2021, at 3:53 PM, "[email protected]"  wrote:

Hi Paulo,

> First and foremost, I believe this proposal in its current form focuses on
> the protocol details (HOW?) but lacks the bigger picture on how this is going
> to be exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent, distinct and complex piece of work
that requires specialist expertise. You have after all just suggested a month
to read only the existing proposal 😊

UX is a whole other kind of discussion, that can be quite opinionated, and
requires different expertise. It is in my opinion helpful to break out work
that is not tightly coupled, as well as work that requires different expertise.
As you point out, multi-key UX features are largely independent of any
underlying implementation, likely can be done in parallel, and even with
different contributors.

> Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion
to me. This is just like any other subsystem, but we would be managing it as a
separate library primarily for modularity reasons. The reality is that this
option should anyway be considered unavailable. This is a proposed contribution
to the Cassandra project, which we can either accept or reject.

> Isn't this a good chance to make the serialization protocol pluggable
> with clearly defined integration points

It has recently been demonstrated to be possible to build a system that can
safely switch between different consensus protocols. However, this was very
sophisticated work that would require its own CEP, one that we would be unable
to resource. Even if we could, this would be insufficient. This goal has never
been achieved for a multi-shard transaction protocol to my knowledge, and
multi-shard transaction protocols are much more divergent in implementation
detail than consensus protocols.

> so we could easily switch implementations with different guarantees… (ie.
> Apache Ratis)

As far as I know, there are no other strict serializable protocols available to
plug in today. Apache Ratis appears to be a straightforward Raft
implementation, and therefore it is a linearizable consensus protocol. It is
not a multi-shard transaction protocol at all, let alone strict serializable. It
could be used in place of Paxos, but not Accord.

From: Paulo Motta
Date: Tuesday, 14 September 2021 at 22:55
To: Cassandra DEV
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I can start with some preliminary comments while I get more familiarized
with the proposal:

- First and foremost, I believe this proposal in its current form focuses
on the protocol details (HOW?) but lacks the bigger picture on how this is
going to be exposed to the user (WHAT)? Is exposing linearizable
transactions to the user not a goal of this proposal? If not, I think the
proposal is missing the UX (ie. what CQL commands are going to be added
etc) on how these transactions are going

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread [email protected]
Hi Paulo,

> First and foremost, I believe this proposal in its current form focuses on 
> the protocol details (HOW?) but lacks the bigger picture on how this is going 
> to be exposed to the user (WHAT)?

In my opinion this CEP embodies a coherent, distinct and complex piece of work 
that requires specialist expertise. You have, after all, just suggested a month 
just to read the existing proposal 😊

UX is a whole other kind of discussion that can be quite opinionated and 
requires different expertise. It is, in my opinion, helpful to break out work 
that is not tightly coupled, as well as work that requires different expertise. 
As you point out, multi-key UX features are largely independent of any 
underlying implementation, and can likely be done in parallel, even by 
different contributors.

> Can we not start using it as an external dependency

I would love to understand your rationale, as this is a surprising suggestion 
to me. This is just like any other subsystem, but we would be managing it as a 
separate library primarily for modularity reasons. The reality is that this 
option should anyway be considered unavailable. This is a proposed contribution 
to the Cassandra project, which we can either accept or reject.

> Isn't this a good chance to make the serialization protocol pluggable
> with clearly defined integration points

It has recently been demonstrated to be possible to build a system that can 
safely switch between different consensus protocols. However, this was very 
sophisticated work that would require its own CEP, one that we would be unable 
to resource. Even if we could, this would be insufficient. This goal has never 
been achieved for a multi-shard transaction protocol to my knowledge, and 
multi-shard transaction protocols are much more divergent in implementation 
detail than consensus protocols.

> so we could easily switch implementations with different guarantees… (ie. 
> Apache Ratis)

As far as I know, there are no other strict serializable protocols available to 
plug in today. Apache Ratis appears to be a straightforward Raft 
implementation, and therefore it is a linearizable consensus protocol. It is 
not a multi-shard transaction protocol at all, let alone strict serializable. It 
could be used in place of Paxos, but not Accord.
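
To illustrate the distinction, here is a toy sketch (not Ratis, Raft or Accord; 
the names Shard, run, T1 and T2 are all hypothetical) of why a consensus log 
per shard does not by itself order transactions that span shards. Each shard 
applies its own log in a consistent total order, yet the combined result 
matches no serial execution of the two transactions:

```python
class Shard:
    """Toy shard whose log is totally ordered, as a consensus group would provide."""

    def __init__(self):
        self.log = []    # order in which this shard applied transactions
        self.state = {}  # key -> latest value

    def apply(self, txn_id, key, value):
        self.log.append(txn_id)
        self.state[key] = value


def run(order_on_x, order_on_y):
    """Apply T1 (x=1, y=1) and T2 (x=2, y=2) using an independent order per shard."""
    values = {"T1": 1, "T2": 2}
    shard_x, shard_y = Shard(), Shard()
    for txn_id in order_on_x:
        shard_x.apply(txn_id, "x", values[txn_id])
    for txn_id in order_on_y:
        shard_y.apply(txn_id, "y", values[txn_id])
    return {**shard_x.state, **shard_y.state}


# The only outcomes a serial execution of T1 and T2 could produce.
SERIAL_OUTCOMES = [
    run(["T1", "T2"], ["T1", "T2"]),
    run(["T2", "T1"], ["T2", "T1"]),
]

# Each shard is internally consistent, but the two shards disagree on the order.
mixed = run(["T1", "T2"], ["T2", "T1"])
print(mixed, "serializable:", mixed in SERIAL_OUTCOMES)
```

Something has to make both shards agree on one order for T1 and T2, and that 
is the part a multi-shard transaction protocol adds on top of consensus.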



From: Paulo Motta 
Date: Tuesday, 14 September 2021 at 22:55
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
I can start with some preliminary comments while I get more familiarized
with the proposal:

- First and foremost, I believe this proposal in its current form focuses
on the protocol details (HOW?) but lacks the bigger picture on how this is
going to be exposed to the user (WHAT)? Is exposing linearizable
transactions to the user not a goal of this proposal? If not, I think the
proposal is missing the UX (ie. what CQL commands are going to be added
etc) on how these transactions are going to be exposed.

- Why do we need to bring the library into the project umbrella? Can we not
start using it as an external dependency, and later re-evaluate if it's
necessary to bring it into the project or even incubate it as another
Apache project? I feel we may be importing unnecessary management overhead
into the project while only a small subset of contributors will be involved
with the core protocol.

- Isn't this a good chance to make the serialization protocol pluggable
with clearly defined integration points, so we could easily switch
implementations with different guarantees, trade-offs and performance
considerations while leaving the UX intact? This would also allow us to
easily benchmark the protocol against alternatives (ie. Apache Ratis) and
validate the performance claims. I think the best way to do that would be
to define what the feature will look like to the end user (UX), define the
integration points necessary to support this feature, and use accord as the
first implementation of these integration points.

On Tue, Sep 14, 2021 at 17:57, Paulo Motta 
wrote:

> Given the extensiveness and complexity of the proposal I'd suggest leaving
> it a little longer (perhaps 4 weeks from the publish date?) for people to
> get a bit more familiarized and have the chance to comment before casting a
> vote. I glanced through the proposal - and it looks outstanding, very
> promising work guys! - but would like a bit more time to take a deeper look
> and digest it before potentially commenting on it.
>
> On Tue, Sep 14, 2021 at 17:30, [email protected] <
> [email protected]> wrote:
>
>> Has anyone had a chance to read the drafts, and has any feedback or
>> questions? Does anybody still anticipate doing so in the near future? Or
>> shall we move to a vote?
>>
>> From: [email protected] 
>> Date: Tuesday, 7 September 2021 at 21:27
>> To: [email protected] 
>> Subj

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread Paulo Motta
I can start with some preliminary comments while I get more familiarized
with the proposal:

- First and foremost, I believe this proposal in its current form focuses
on the protocol details (HOW?) but lacks the bigger picture on how this is
going to be exposed to the user (WHAT)? Is exposing linearizable
transactions to the user not a goal of this proposal? If not, I think the
proposal is missing the UX (ie. what CQL commands are going to be added
etc) on how these transactions are going to be exposed.

- Why do we need to bring the library into the project umbrella? Can we not
start using it as an external dependency, and later re-evaluate if it's
necessary to bring it into the project or even incubate it as another
Apache project? I feel we may be importing unnecessary management overhead
into the project while only a small subset of contributors will be involved
with the core protocol.

- Isn't this a good chance to make the serialization protocol pluggable
with clearly defined integration points, so we could easily switch
implementations with different guarantees, trade-offs and performance
considerations while leaving the UX intact? This would also allow us to
easily benchmark the protocol against alternatives (ie. Apache Ratis) and
validate the performance claims. I think the best way to do that would be
to define what the feature will look like to the end user (UX), define the
integration points necessary to support this feature, and use accord as the
first implementation of these integration points.
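
As a rough sketch of what such an integration point might look like (every 
name here is hypothetical: Txn, TransactionProtocol, InMemoryProtocol and 
set_and_get_previous are not part of Cassandra, Accord or Ratis), the 
user-facing operation is written only against an abstract interface, and a 
trivial in-memory stand-in plays the role of the first implementation:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction: keys to read and key/value pairs to write."""
    reads: list = field(default_factory=list)
    writes: dict = field(default_factory=dict)


class TransactionProtocol(ABC):
    """Hypothetical integration point between the user-facing layer and
    whichever protocol orders and applies transactions."""

    @abstractmethod
    def execute(self, txn: Txn) -> dict:
        """Atomically apply txn and return the values read before its writes."""


class InMemoryProtocol(TransactionProtocol):
    """Trivial single-process stand-in; it is NOT Accord, Paxos or Ratis."""

    def __init__(self):
        self._data = {}

    def execute(self, txn: Txn) -> dict:
        # Read first, then write, so callers observe the previous values.
        seen = {key: self._data.get(key) for key in txn.reads}
        self._data.update(txn.writes)
        return seen


def set_and_get_previous(protocol: TransactionProtocol, key, value):
    """User-facing operation that depends only on the interface."""
    return protocol.execute(Txn(reads=[key], writes={key: value}))[key]


protocol = InMemoryProtocol()
print(set_and_get_previous(protocol, "k", 1))  # no previous value yet
print(set_and_get_previous(protocol, "k", 2))  # previous value was 1
```

The sketch shows only the shape of the separation. It says nothing about 
whether real protocols with different guarantees could sit behind a single 
interface.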

On Tue, Sep 14, 2021 at 17:57, Paulo Motta 
wrote:

> Given the extensiveness and complexity of the proposal I'd suggest leaving
> it a little longer (perhaps 4 weeks from the publish date?) for people to
> get a bit more familiarized and have the chance to comment before casting a
> vote. I glanced through the proposal - and it looks outstanding, very
> promising work guys! - but would like a bit more time to take a deeper look
> and digest it before potentially commenting on it.
>
> On Tue, Sep 14, 2021 at 17:30, [email protected] <
> [email protected]> wrote:
>
>> Has anyone had a chance to read the drafts, and has any feedback or
>> questions? Does anybody still anticipate doing so in the near future? Or
>> shall we move to a vote?
>>
>> From: [email protected] 
>> Date: Tuesday, 7 September 2021 at 21:27
>> To: [email protected] 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Hi Jake,
>>
>> > What structural changes are planned to support an external dependency
>> > project like this
>>
>> To add to Blake’s answer, in case there’s some confusion over this, the
>> proposal is to include this library within the Apache Cassandra project. So
>> I wouldn’t think of it as an external dependency. This PMC and community
>> will still have the usual oversight over direction and development, and
>> APIs will be developed solely with the intention of their integration with
>> Cassandra.
>>
>> > Will this effort eventually replace consistency levels in C*?
>>
>> I hope we’ll have some very related discussions around consistency levels
>> in the coming months more generally, but I don’t think that is tightly
>> coupled to this work. I agree with you both that we won’t want to
>> perpetuate the problems you’ve highlighted though.
>>
>> Henrik:
>> > I was referring to the property that Calvin transactions also need to
>> > be sent to the cluster in a single shot
>>
>> Ah, yes. In that case I agree, and I tried to point to this direction in
>> an earlier email, where I discussed the use of scripting languages (i.e.
>> transactionally modifying the database with some subset of arbitrary
>> computation). I think the JVM is particularly suited to offering quite
>> powerful distributed transactions in this vein, and it will be interesting
>> to see what we might develop in this direction in future.
>>
>>
>> From: Jake Luciani 
>> Date: Tuesday, 7 September 2021 at 19:27
>> To: [email protected] 
>> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
>> Great thanks for the information
>>
>> On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
>>  wrote:
>>
>> > Hi Jake,
>> >
>> > > 1.  Will this effort eventually replace consistency levels in C*?  I ask
>> > > because one of the shortcomings of our paxos today is that it can be
>> > > easily mixed with non-serialized consistencies, and therefore users
>> > > commonly break consistency by, for example, reading at CL.ONE while also
>> > > using LWTs.
>>

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread Paulo Motta
Given the extensiveness and complexity of the proposal I'd suggest leaving
it a little longer (perhaps 4 weeks from the publish date?) for people to
get a bit more familiarized and have the chance to comment before casting a
vote. I glanced through the proposal - and it looks outstanding, very
promising work guys! - but would like a bit more time to take a deeper look
and digest it before potentially commenting on it.

On Tue, Sep 14, 2021 at 17:30, [email protected] <
[email protected]> wrote:

> Has anyone had a chance to read the drafts, and has any feedback or
> questions? Does anybody still anticipate doing so in the near future? Or
> shall we move to a vote?
>
> From: [email protected] 
> Date: Tuesday, 7 September 2021 at 21:27
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Hi Jake,
>
> > What structural changes are planned to support an external dependency
> > project like this
>
> To add to Blake’s answer, in case there’s some confusion over this, the
> proposal is to include this library within the Apache Cassandra project. So
> I wouldn’t think of it as an external dependency. This PMC and community
> will still have the usual oversight over direction and development, and
> APIs will be developed solely with the intention of their integration with
> Cassandra.
>
> > Will this effort eventually replace consistency levels in C*?
>
> I hope we’ll have some very related discussions around consistency levels
> in the coming months more generally, but I don’t think that is tightly
> coupled to this work. I agree with you both that we won’t want to
> perpetuate the problems you’ve highlighted though.
>
> Henrik:
> > I was referring to the property that Calvin transactions also need to be
> > sent to the cluster in a single shot
>
> Ah, yes. In that case I agree, and I tried to point to this direction in
> an earlier email, where I discussed the use of scripting languages (i.e.
> transactionally modifying the database with some subset of arbitrary
> computation). I think the JVM is particularly suited to offering quite
> powerful distributed transactions in this vein, and it will be interesting
> to see what we might develop in this direction in future.
>
>
> From: Jake Luciani 
> Date: Tuesday, 7 September 2021 at 19:27
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Great thanks for the information
>
> On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
>  wrote:
>
> > Hi Jake,
> >
> > > 1.  Will this effort eventually replace consistency levels in C*?  I
> ask
> > > because one of the shortcomings of our paxos today is
> > > it can be easily mixed with non serialized consistencies and therefore
> > > users commonly break consistency by for example reading at CL.ONE while
> > > also
> > > using LWTs.
> >
> > This will likely require CLs to be specified at the schema level for
> > tables using multi partition transactions. I’d expect this to be
> available
> > for other tables, but not required.
> >
> > > 2. What structural changes are planned to support an external
> dependency
> > > project like this?  Are there some high level interfaces you expect the
> > > project to adhere to?
> >
> > There will be some interfaces that need to be implemented in C* to
> support
> > the library. You can find the current interfaces in the accord.api
> package,
> > but these were written to support some initial testing, and not intended
> > for integration into C* as is. Things are pretty fluid right now and will
> > be rewritten / refactored multiple times over the next few months.
> >
> > Thanks,
> >
> > Blake
> >
> >
> > > On Sun, Sep 5, 2021 at 10:33 AM [email protected] <
> [email protected]
> > >
> > > wrote:
> > >
> > >> Wiki:
> > >>
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> > >> Whitepaper:
> > >>
> >
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2
> > >>>
> > >> Prototype: https://github.com/belliottsmith/accord
> > >>
> > >> Hi everyone, I’d like to propose this CEP for adoption by the
> community.
> > >>
> > >> Cassandra has benefitted from LWTs for many years, b

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-14 Thread [email protected]
Has anyone had a chance to read the drafts, and has any feedback or questions? 
Does anybody still anticipate doing so in the near future? Or shall we move to 
a vote?


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread [email protected]
Hi Jake,

> What structural changes are planned to support an external dependency project 
> like this

To add to Blake’s answer, in case there’s some confusion over this, the 
proposal is to include this library within the Apache Cassandra project. So I 
wouldn’t think of it as an external dependency. This PMC and community will 
still have the usual oversight over direction and development, and APIs will be 
developed solely with the intention of their integration with Cassandra.

> Will this effort eventually replace consistency levels in C*?

I hope we’ll have some very related discussions around consistency levels in 
the coming months more generally, but I don’t think that is tightly coupled to 
this work. I agree with you both that we won’t want to perpetuate the problems 
you’ve highlighted though.

Henrik:
> I was referring to the property that Calvin transactions also need to be sent 
> to the cluster in a single shot

Ah, yes. In that case I agree, and I tried to point to this direction in an 
earlier email, where I discussed the use of scripting languages (i.e. 
transactionally modifying the database with some subset of arbitrary 
computation). I think the JVM is particularly suited to offering quite powerful 
distributed transactions in this vein, and it will be interesting to see what 
we might develop in this direction in future.



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Jake Luciani
Great thanks for the information


-- 
http://twitter.com/tjake


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Blake Eggleston
Hi Jake,

> 1. Will this effort eventually replace consistency levels in C*? I ask
> because one of the shortcomings of our Paxos today is that it can be easily
> mixed with non-serialized consistencies, and therefore users commonly break
> consistency by, for example, reading at CL.ONE while also using LWTs.

This will likely require CLs to be specified at the schema level for tables 
using multi partition transactions. I’d expect this to be available for other 
tables, but not required.

> 2. What structural changes are planned to support an external dependency
> project like this?  Are there some high level interfaces you expect the
> project to adhere to?

There will be some interfaces that need to be implemented in C* to support the 
library. You can find the current interfaces in the accord.api package, but 
these were written to support some initial testing, and not intended for 
integration into C* as is. Things are pretty fluid right now and will be 
rewritten / refactored multiple times over the next few months.

Thanks,

Blake







Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Jake Luciani
Hi Benedict!

I haven't gone too deeply into this proposal but it's very exciting to see
this kind of innovation!

Some basic questions, tangentially related to this effort, that I didn't see
covered in the CEP.

1. Will this effort eventually replace consistency levels in C*? I ask
because one of the shortcomings of our Paxos today is that it can be easily
mixed with non-serialized consistencies, and therefore users commonly break
consistency by, for example, reading at CL.ONE while also using LWTs.

2. What structural changes are planned to support an external dependency
project like this?  Are there some high level interfaces you expect the
project to adhere to?
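
To make the concern in question 1 concrete, here is a minimal sketch in
Python of how a CL.ONE read can miss a value that an LWT has already
committed to a quorum. It is a toy model and not Cassandra's replication or
Paxos code: the Replica class, the three-replica layout and every helper name
are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica holding key -> (timestamp, value). Hypothetical, not a Cassandra class."""
    data: dict = field(default_factory=dict)

    def write(self, key, ts, value):
        # Last-write-wins on timestamp, as a simplification.
        current = self.data.get(key)
        if current is None or current[0] < ts:
            self.data[key] = (ts, value)

    def read(self, key):
        return self.data.get(key, (0, None))


def quorum_write(replicas, reached, key, ts, value):
    """Apply a write to the replicas at the given indices (assumed to form a quorum)."""
    for i in reached:
        replicas[i].write(key, ts, value)


def read_one(replicas, index, key):
    """CL.ONE-style read: trust whatever a single replica returns."""
    return replicas[index].read(key)[1]


def read_quorum(replicas, contacted, key):
    """Quorum-style read: take the newest value among the contacted replicas."""
    return max((replicas[i].read(key) for i in contacted), key=lambda tv: tv[0])[1]


replicas = [Replica(), Replica(), Replica()]
quorum_write(replicas, [0, 1, 2], "owner", ts=1, value="alice")
# The conditional update commits on replicas 0 and 1 only; replica 2 lags.
quorum_write(replicas, [0, 1], "owner", ts=2, value="bob")

stale = read_one(replicas, 2, "owner")          # served by the lagging replica
fresh = read_quorum(replicas, [1, 2], "owner")  # overlaps the write quorum
```

The single-replica read is served by the lagging replica, while the quorum
read overlaps the write quorum and so observes the committed value. Pinning
the consistency level per table would rule out the first kind of read.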

Thanks
Jake




On Sun, Sep 5, 2021 at 10:33 AM [email protected] 
wrote:

> Wiki:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> Whitepaper:
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> <
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2
> >
> Prototype: https://github.com/belliottsmith/accord
>
> Hi everyone, I’d like to propose this CEP for adoption by the community.
>
> Cassandra has benefitted from LWTs for many years, but application
> developers that want to ensure consistency for complex operations must
> either accept the scalability bottleneck of serializing all related state
> through a single partition, or layer a complex state machine on top of the
> database. These are sophisticated and costly activities that our users
> should not be expected to undertake. Since distributed databases are
> beginning to offer distributed transactions with fewer caveats, it is past
> time for Cassandra to do so as well.
>
> This CEP proposes the use of several novel techniques that build upon
> research (that followed EPaxos) to deliver (non-interactive) general
> purpose distributed transactions. The approach is outlined in the wikipage
> and in more detail in the linked whitepaper. Importantly, by adopting this
> approach we will be the _only_ distributed database to offer global,
> scalable, strict serializable transactions in one wide area round-trip.
> This would represent a significant improvement in the state of the art,
> both in the academic literature and in commercial or open source offerings.
>
> This work has been partially realised in a prototype. This partial
> prototype has been verified against Jepsen.io’s Maelstrom library and
> dedicated in-tree strict serializability verification tools, but much work
> remains for the work to be production capable and integrated into Cassandra.
>
> I propose including the prototype in the project as a new source
> repository, to be developed as a standalone library for integration into
> Cassandra. I hope the community sees the important value proposition of
> this proposal, and will adopt the CEP after this discussion, so that the
> library and its integration into Cassandra can be developed in parallel and
> with the involvement of the wider community.
>


-- 
http://twitter.com/tjake


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 5:06 PM [email protected] 
wrote:

> > I was thinking that a path similar to Calvin/FaunaDB is certainly
> > looming in the horizon at least.
>
> I’m not sure which aspect of these systems you are referring to. Unless I
> have misunderstood, I consider them to be strictly inferior approaches
> (particularly for Cassandra) as they require a _global_ leader process and
> as a result have scalability limits. Users simply shift the sharding
> problem to the cluster level rather than the node level, but the
> fundamental problem remains. This may be acceptable for many users, but was
> contrary to the goals of this CEP.
>

Oh yes. For sure it's one of the strengths of the CEP that it is clearly
designed to fit well into the existing Cassandra architecture and
experience.

I was referring to the property that Calvin transactions also need to be
sent to the cluster in a single shot, but then they have extended the
functionality by allowing programming logic to be executed inside the
transaction. (Like a stored procedure, if you will.) So the transactions
can be multi-statement with complex logic, they just can't communicate
outside the cluster - such as back and forth with the client and server.
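
The single-shot model described above can be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: OneShotStore,
submit and transfer are hypothetical names, not the Calvin, FaunaDB or Accord
API, and a local lock stands in for whatever ordering protocol a real system
would use. The point is that the whole transaction, including its branching
logic, is shipped in one request together with its declared key set, and
cannot talk to the client while it runs.

```python
import threading


class OneShotStore:
    """Toy single-node store; hypothetical, not the Calvin, FaunaDB or Accord API."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # stands in for the real ordering protocol

    def submit(self, keys, logic):
        """Run `logic` over the declared keys and apply its writes atomically.

        `logic` receives a dict of the current values and returns a dict of
        writes. It has no channel back to the client while it runs.
        """
        with self._lock:
            snapshot = {k: self._data.get(k) for k in keys}
            writes = logic(snapshot)
            undeclared = set(writes) - set(keys)
            if undeclared:
                raise ValueError(f"writes to undeclared keys: {sorted(undeclared)}")
            self._data.update(writes)
            return writes

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def transfer(amount, src, dst):
    """Build the transaction body: branching logic that runs inside the store."""
    def logic(values):
        if (values[src] or 0) < amount:
            return {}  # condition failed, write nothing
        return {src: values[src] - amount, dst: (values[dst] or 0) + amount}
    return logic


store = OneShotStore()
store.submit(["a", "b"], lambda _: {"a": 100, "b": 0})
applied = store.submit(["a", "b"], transfer(30, "a", "b"))
rejected = store.submit(["a", "b"], transfer(500, "a", "b"))
```

Declaring the key set up front is what lets such a system order the
transaction before executing it. The cost is that the client cannot decide
which keys to touch based on intermediate results.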


> > good job pulling together ingredients from state of the art work in this
> > area
>
> In case this was lost in the noise: this work is not simply an assembly of
> prior work. It introduces entirely novel approaches that permit the work to
> exceed the capabilities of any prior research or production system. It is
> worth properly highlighting that if we deliver this, Cassandra will have
> the most sophisticated transaction system full stop.
>
>
Of course. Maybe it's just me, but I'm at least equally impressed by the
"level of education" the authors show in not reinventing the wheel for the
details where copying a feature, or at least being inspired by one, from
some existing publication or implementation was possible. Knowing what to
keep vs what you want to improve isn't easy. Also, it makes the whitepaper
an interesting read when in addition to learning about Accord I also
learned about several other systems that I hadn't previously read about.

henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread [email protected]
> I was thinking that a path similar to Calvin/FaunaDB is certainly looming in 
> the horizon at least.

I’m not sure which aspect of these systems you are referring to. Unless I have 
misunderstood, I consider them to be strictly inferior approaches (particularly 
for Cassandra) as they require a _global_ leader process and as a result have 
scalability limits. Users simply shift the sharding problem to the cluster 
level rather than the node level, but the fundamental problem remains. This may 
be acceptable for many users, but was contrary to the goals of this CEP.

> It seems to me at that point long running queries and interactive 
> transactions are mostly the same problem.

I would estimate long running queries to be easier to deliver by at least an 
order of magnitude. They’re not unrelated, but they’re still quite distinct in 
my opinion.

> good job pulling together ingredients from state of the art work in this area

In case this was lost in the noise: this work is not simply an assembly of 
prior work. It introduces entirely novel approaches that permit the work to 
exceed the capabilities of any prior research or production system. It is worth 
properly highlighting that if we deliver this, Cassandra will have the most 
sophisticated transaction system full stop.

There are to my knowledge no databases offering distributed transactions that 
are both strict serializable and have no scalability bottleneck. Every database 
today clearly aims for this combination, but accepts some trade-off: either 
only guaranteeing serializable isolation, requiring special time keeping 
hardware to guarantee strict serializability, or using a global leader process 
(or two-phase commit, but this is quite niche).




Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 12:26 PM [email protected] 
wrote:

> > whether I should *just* think of this as "better and more efficient LWT"
>
> So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon
> definition. My understanding of a core feature/limitation of LWTs is that
> they operate over a single partition, and as a result many operations are
> impossible even in multiple rounds without complex distributed state
> machines. The core improvement here, besides improved performance, is that
> we will be able to operate over any set of keys at-once.
>
>
My bad, I have never used LWT and forgot / didn't know they were single
partition. The CEP makes more sense now.



> How this facility is evolved into user-facing capabilities is an
> open-ended question. Initially of course we will at least support the same
> syntax but remove the restriction on operating over a single partition. I
> haven’t thought about this much, as the CEP is primarily for enabling
> works, but I think we will want to expand the syntax in two ways:
>
>  1) to support more complex conditions (simple AND conditions across all
> partitions seem likely too restrictive, though they might make sense for
> the single partition case);
>  2) to support inserting data from one row into another, potentially with
> transformations being applied (including via UDFs).
>
> These are both relatively manageable improvements that we might want to
> land in the same major release as the transactions themselves. The core
> facility can be expanded quite broadly, though. It would be possible for
> instance to support some interpreted language(s) as part of a query, so
> that arbitrary work can be applied in the transaction.
>

I was thinking that a path similar to Calvin/FaunaDB is certainly looming
in the horizon at least. I've been following those with interest, because
a) it's refreshingly outside of the box thinking, and b) they seem to be
able to push the limitations of this approach much beyond what one might
imagine when reading about it the first time. But like you also point out,
it remains to be seen whether users actually want those kinds of
transactions. We are creatures of habit for sure.



> Or, perhaps the community would rather build atop the feature to support
> interactive transactions at the client. I can’t predict resourcing for
> this, though, and it might be a community effort. I think it would be quite
> tractable once this work lands, however.
>
> > Suppose I wanted to do a long running read-only transaction
>
> So, there are two sides to this: with and without paging. A long running
> read-only transaction taking a few seconds is quite likely to be fine and
> we will probably support it with some MVCC within the transaction system
> itself. This may or may not be part of v1; it’s hard to predict with
> certainty as this is going to be a large undertaking.
>
> But for paged queries we’d be talking about SNAPSHOT isolation. This is
> likely to be something the community wants to support before long anyway
> and is probably not as hard as you might think. It is probably outside of
> the scope of this work, though the two would dovetail very nicely.
>

I've pointed out to some of my colleagues that since Cassandra's storage
engine is an LSM engine, with some additional work it could become an MVCC
style storage engine. Your thinking here seems to be in the same direction,
even if it's beyond version 1. (Just for context, also for benefit of other
readers on the list, it took MongoDB 5 years and 6 major releases to
develop distributed multi-shard transactions. So it's good to talk about
the general direction, but understanding that this is not something anyone
will finish before Christmas.)

It seems to me at that point long running queries and interactive
transactions are mostly the same problem.



Benedict, thanks for the answers. Since I'm not a Cassandra developer I
feel it would be inappropriate for me to express an opinion for or against,
so I'll just end with saying this is an interesting proposal and the
authors have done a good job pulling together ingredients from state of the
art work in this area. As such it will be interesting to follow the
discussion and work from whitepaper to implementation.


A secondary objective was also to just let everyone know I am lurking here.
If you ever want to reach out for an out-of-band discussion, you now have my
contact details.

henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread [email protected]
> Sorry if a few comments were a bit "editorial" in the first message

Not a problem at all – more than happy to talk about suggestions in that vein! 
Just probably best not to subject everyone else to the discussion.

> What I would like to understand better and without guessing is, what do these 
> transactions look like from a client/user point of view?

This is a fair question, and perhaps something I should pinpoint more directly 
for the reader. The CEP does stipulate non-interactive transactions, i.e. those 
that are one-shot. The only other limitation is that the partition keys must be 
known upfront. However, I expect we will follow up soon after with some weaker 
semantics that build on top (probably using optimistic concurrency control) to 
support transactions where only some partition keys are known upfront, so that 
we may support global secondary indexes with proper isolation and consistency.

> whether I should just* think of this as "better and more efficient LWT”

So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon 
definition. My understanding of a core feature/limitation of LWTs is that they 
operate over a single partition, and as a result many operations are impossible 
even in multiple rounds without complex distributed state machines. The core 
improvement here, besides improved performance, is that we will be able to 
operate over any set of keys at once.
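
To make the one-shot, multi-key shape concrete, here is a minimal sketch. It 
is not the CEP's API or syntax: `OneShotTxn`, `execute` and the in-memory 
`store` dict are all hypothetical names used only for illustration. The point 
is that every key is declared upfront, and both the condition and the writes 
are pure functions of the values read, so no client round-trip is needed in 
the middle of the transaction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (partition key, column); purely illustrative


@dataclass(frozen=True)
class OneShotTxn:
    keys: Tuple[Key, ...]                                # all keys known upfront
    condition: Callable[[Dict[Key, int]], bool]          # evaluated on the reads
    writes: Callable[[Dict[Key, int]], Dict[Key, int]]   # derived from the reads


def execute(store: Dict[Key, int], txn: OneShotTxn) -> bool:
    """Apply txn to an in-memory dict that stands in for the cluster.

    Assumption: the real system would order and isolate this step; here it
    is simply a single-threaded function call.
    """
    reads = {k: store.get(k, 0) for k in txn.keys}
    if not txn.condition(reads):
        return False  # condition failed, nothing is written
    updates = txn.writes(reads)
    if not set(updates) <= set(txn.keys):
        raise ValueError("write to a key that was not declared upfront")
    store.update(updates)
    return True


# Two different partitions, one conditional transfer between them.
alice, bob = ("alice", "balance"), ("bob", "balance")
store = {alice: 100, bob: 20}
transfer = OneShotTxn(
    keys=(alice, bob),
    condition=lambda r: r[alice] >= 30,
    writes=lambda r: {alice: r[alice] - 30, bob: r[bob] + 30},
)
applied = execute(store, transfer)
```

With an LWT the condition and the update would both have to live in a single 
partition; in this sketch `alice` and `bob` are separate partitions.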

How this facility is evolved into user-facing capabilities is an open-ended 
question. Initially of course we will at least support the same syntax but 
remove the restriction on operating over a single partition. I haven’t thought 
about this much, as the CEP is primarily for enabling works, but I think we 
will want to expand the syntax in two ways:

 1) to support more complex conditions (simple AND conditions across all 
partitions seem likely too restrictive, though they might make sense for the 
single partition case);
  2) to support inserting data from one row into another, potentially with 
transformations being applied (including via UDFs).

These are both relatively manageable improvements that we might want to land in 
the same major release as the transactions themselves. The core facility can be 
expanded quite broadly, though. It would be possible for instance to support 
some interpreted language(s) as part of a query, so that arbitrary work can be 
applied in the transaction.

Or, perhaps the community would rather build atop the feature to support 
interactive transactions at the client. I can’t predict resourcing for this, 
though, and it might be a community effort. I think it would be quite tractable 
once this work lands, however.

> Suppose I wanted to do a long running read-only transaction

So, there are two sides to this: with and without paging. A long running 
read-only transaction taking a few seconds is quite likely to be fine and we 
will probably support it with some MVCC within the transaction system itself. 
This may or may not be part of v1; it’s hard to predict with certainty as this 
is going to be a large undertaking.

But for paged queries we’d be talking about SNAPSHOT isolation. This is likely 
to be something the community wants to support before long anyway and is 
probably not as hard as you might think. It is probably outside of the scope of 
this work, though the two would dovetail very nicely.
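
As a rough illustration of what a snapshot read over MVCC means here, the 
sketch below keeps every version of a key and answers reads as of a fixed 
timestamp. `VersionedStore` and its methods are hypothetical and say nothing 
about how Cassandra or the prototype would actually store versions.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Keeps every version of a key, ordered by commit timestamp."""

    def __init__(self):
        self._timestamps = defaultdict(list)  # key -> sorted commit timestamps
        self._values = defaultdict(list)      # key -> values, in the same order

    def write(self, key, value, ts):
        # Insert the new version at its sorted position.
        i = bisect.bisect_right(self._timestamps[key], ts)
        self._timestamps[key].insert(i, ts)
        self._values[key].insert(i, value)

    def read(self, key, snapshot_ts):
        """Return the newest value with commit timestamp <= snapshot_ts."""
        i = bisect.bisect_right(self._timestamps.get(key, []), snapshot_ts)
        return self._values[key][i - 1] if i else None


mvcc = VersionedStore()
mvcc.write("k", "v1", 10)
mvcc.write("k", "v2", 20)
first_page = mvcc.read("k", 15)   # "v1": the write at ts=20 is invisible
later_page = mvcc.read("k", 15)   # same snapshot, same answer
```

A paged query would reuse the same `snapshot_ts` for every page, so later 
writes do not change what the reader sees.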



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 1:31 AM [email protected] 
wrote:

>
> Of course, but we may have to be selective in our back-and-forth. We can
> always take some discussion off-list to keep it manageable.
>
>
I'll try to converge. Sorry if a few comments were a bit "editorial" in the
first message. I find that sometimes it pays off to also ask the dumb
questions, as long as we don't get stuck on any of them.


> > The algorithm is hard to read since you omit the roles of the
> participants.
>
> Thanks. I will consider how I might make it clearer that the portions of
> the algorithm that execute on receipt of messages that may only be received
> by replicas, are indeed executed by those replicas.
>
>
In fact the same algorithm in the CEP was easier to read exactly because of
this, I now realize.


> > So I guess my question is how and when reads happen?
>
> I think this is reasonably well specified in the protocol and, since it’s
> unclear what you’ve found confusing, I don’t know it would be productive to
> try to explain it again here on list. You can look at the prototype, if
> Java is easier for you to parse, as it is of course fully specified there
> with no ambiguity. Or we can discuss off list, or perhaps on the community
> slack channel.
>
>
Maybe my question was a bit too open ended, as I didn't want to lead into
any specific direction.

I can of course tell where reads happen in the execution algorithm. What I
would like to understand better and without guessing is, what do these
transactions look like from a client/user point of view? You already
confirmed that interactive transactions aren't intended by this proposal.
At the other end of the spectrum, given that this is a Cassandra
Enhancement Proposal, and the CEP does in fact state this, it seems like
providing equivalent functionality to already existing LWT is a goal. So my
question is whether I should just* think of this as "better and more
efficient LWT" or is there something more? Would this CEP or follow-up work
introduce any new CQL syntax, for example?

To give just one more example of the kind of questions I'm triangulating
at: Suppose I wanted to do a long running read-only transaction, such as
querying a secondary index. Like SERIAL in current Cassandra, but taking
seconds to execute and returning thousands of rows. How would you see the
possibilities and limits of such operations in Accord?

*) I should emphasize that better-scaling LWTs aren't just "just". If I
imagine a future Cassandra cluster where all reads and writes are
transactional and therefore strict serializable, that would be quite a
change from today.

henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-06 Thread [email protected]
Hi Henrik,

Welcome, and thanks for the feedback.

> I hope it's ok to use this list for comments on the whitepaper?

Of course, but we may have to be selective in our back-and-forth. We can always 
take some discussion off-list to keep it manageable.

> if in addition to a deadline you also impose some upper bound for the maximum 
> allowed timestamp

I expect that, much like with LWTs, there will be no facility for user-provided 
timestamps with these transactions. But yes, I anticipate many knock-on 
improvements for tables that are managed with this transaction facility.

> The algorithm is hard to read since you omit the roles of the participants.

Thanks. I will consider how I might make it clearer that the portions of the 
algorithm that execute on receipt of messages that may only be received by 
replicas, are indeed executed by those replicas.

> Is this sentence correct?

Yes, but perhaps it could be made clearer. A previous draft had an additional 
upsilon variable that likely clarified this, but it is hard to use in this 
location (it would replace tau, which is already bound by the wider context), 
and for consistency I have tried to ensure gamma < tau < upsilon throughout 
the paper.

> Proofs of theorems 3.1 and 3.2 appear to be identical?

Nope. There’s a single but important digit difference.

>* Are interactive transactions possible?

No, I don’t think this protocol can be easily made to natively support 
interactive transactions, even discounting the problems you highlight - but I 
haven’t thought about it much as it was not a goal. Interactive transactions 
can certainly be built on top.

> Are the results of the Jepsen testing available too? (Or will be?)

There are no publishable results, nor any intention to publish them. There is a 
(fairly rough) implementation of the Jepsen.io Maelstrom txn-append workload 
that you may run at your leisure in the prototype repository. The in-tree 
strict serializability verifier is in all honesty probably more useful today 
and is I think functionally equivalent. You are welcome to browse and run both. 
As things progress towards completion, if Kyle is interested or funding can be 
found I’d love to discuss the possibility of an in-depth Jepsen analysis that 
could be published, but that’s a totally separate conversation and I think very 
premature.

> So I guess my question is how and when reads happen?

I think this is reasonably well specified in the protocol and, since it’s 
unclear what you’ve found confusing, I don’t know it would be productive to try 
to explain it again here on list. You can look at the prototype, if Java is 
easier for you to parse, as it is of course fully specified there with no 
ambiguity. Or we can discuss off list, or perhaps on the community slack 
channel.



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-06 Thread Henrik Ingo
Hi all

I should start by briefly introducing myself: I've worked at DataStax for a
little over a year, but in a manager role. I have no expectations near term to
actually contribute code or docs to Cassandra, rather I hope my work
indirectly will enable others to do so. As such I also don't expect to be
very vocal on this list, but today seemed like a perfect day to make that
one exception! I hope that's ok?

Before joining the Cassandra world I worked at MongoDB and several
companies in the MySQL ecosystem. If you read the Raft mailing list you
will have met me there. Since my focus was always on high availability and
performance, I've felt very much at home working in the Cassandra ecosystem.



To the authors of the white paper I want to say this is very inspiring
work. I agree it is time to bring general purpose transactions to
Cassandra, and you are introducing them in a way that builds upon
Cassandra's existing Dynamo protocol with natural timestamps. When I was
learning Cassandra 16 months ago I had similar thoughts to what you are now
presenting.

I hope it's ok to use this list for comments on the whitepaper?

1. Introduction

While I agree that cross shard transactions are only recently becoming
mainstream, for academic level accuracy of your paper you may want to
reference NDB, also known as MySQL NDB Cluster.
 * https://en.wikipedia.org/wiki/MySQL_Cluster
 * http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.884

The above thesis is from 1997, and MySQL acquired the technology for 1 dollar
in 2004. Since shortly after that year it has been in widespread use in
our mobile phone networks, with some early e-commerce and OLAP/ML type use
as secondary use cases. In short, NDB provides cross shard transactions
simply via 2PC. A curious detail of the design is that it actually does
both replication and cross-shard commit via 2PC. Two of the participants just
happen to be replicas of each other.
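
For readers less familiar with 2PC, a generic sketch follows. It is textbook
two-phase commit, not NDB's implementation, and `Participant` and
`two_phase_commit` are hypothetical names. It only illustrates the point that
replicas and shards can all be treated as plain participants.

```python
class Participant:
    """A shard primary or a replica; the coordinator does not care which."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Vote yes only if this participant is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


group = [
    Participant("shard1-primary"),
    Participant("shard1-replica"),  # a replica is just another participant
    Participant("shard2-primary"),
]
committed = two_phase_commit(group)
```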



2.2 Timestamp Reorder buffer

It's probably the case this is obvious, and it's omitted because it's not
required by ACCORD, but I wanted to add here that if in addition to a
deadline you also impose some upper bound for the maximum allowed
timestamp, you will make all our issues with tombstones from the future go
away. (And since you are now creating an ordered commit log, this will also
avoid having to keep tombstones for 10 days, simplify anti-entropy for
failed nodes, etc...)
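
A minimal sketch of the kind of check I mean, with made-up names and a
made-up bound (`MAX_FUTURE_SKEW_MICROS` and `accept_timestamp` are not from
the paper or from Cassandra): a replica simply refuses any timestamp that is
too far ahead of its own clock.

```python
import time

MAX_FUTURE_SKEW_MICROS = 1_000_000  # hypothetical bound: one second


def accept_timestamp(ts_micros, now_micros=None, max_skew=MAX_FUTURE_SKEW_MICROS):
    """Return True if ts_micros is no more than max_skew ahead of local time."""
    if now_micros is None:
        now_micros = time.time_ns() // 1_000  # local wall clock in microseconds
    return ts_micros <= now_micros + max_skew


# With a local clock of 1_000_000 us and a 1 s bound:
ok = accept_timestamp(1_500_000, now_micros=1_000_000)          # within the bound
from_future = accept_timestamp(5_000_000, now_micros=1_000_000)  # rejected
```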

3.2 Consensus

The algorithm is hard to read since you omit the roles of the participants.
It's as if all of it was executed on the Coordinator.

Is this sentence correct? Probably it is and I'm at the limits of my
understanding... *"Note that any transitive dependency of another γ ∈ deps_τ
where Committed_γ may be pruned from deps_τ, as it is durably a transitive
dependency of τ."*
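
My reading of that sentence, written out as a small sketch. This is only my
interpretation, not code from the prototype: `deps_of`, `committed` and
`prune` are hypothetical names, and transactions are plain strings.

```python
def transitive_deps(txn, deps_of):
    """All transactions reachable from txn through the dependency map."""
    seen, stack = set(), list(deps_of.get(txn, ()))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps_of.get(d, ()))
    return seen


def prune(deps_tau, deps_of, committed):
    """Drop from deps_tau anything already implied by a committed gamma."""
    pruned = set()
    for gamma in sorted(deps_tau):
        # Skip a gamma that was itself pruned, so that a dependency cycle
        # cannot remove every member of the cycle.
        if gamma in committed and gamma not in pruned:
            pruned |= transitive_deps(gamma, deps_of) - {gamma}
    return set(deps_tau) - pruned


# tau depends on a, b and c; c depends on b, and b depends on a.
deps_of = {"c": {"b"}, "b": {"a"}, "a": set()}
remaining = prune({"a", "b", "c"}, deps_of, committed={"c"})
```

Here `remaining` is just `{"c"}`: since c is committed, its own dependencies
on a and b are durable, so tau reaches them through c anyway.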



3.4 Safety

Proofs of theorems 3.1 and 3.2 appear to be identical?

End:

Ok so reads were discussed very briefly in 3.3, leaving the reader to guess
quite a lot...

* Are interactive transactions possible? It appears they could be, even if
Algorithm 2 only allows for one pass at reads.
* Do I understand correctly that t0 is essentially both the start and end
time of the transaction? ...and that serializability is provided by the
fact that a later transaction gamma will not even start to execute reads
before earlier transaction tau has committed?
* If interactive transactions are possible, it seems a client can
denial-of-service a row by never committing, keeping locks open forever?

So I guess my question is how and when reads happen?

More precisely... how is it possible that the Consensus protocol is
executed first, and it already knows its dependencies, even if the
Execution protocol - aka reads and writes - are only executed after?

Similarly, how do you expect to apply writes before reads were returned to
the client? Even if you were proposing some Calvin-like single-shot
transaction, it still raises the question of what mechanism can consume read
results and based on those impact the writes?


Reading the CEP:

Are the results of the Jepsen testing available too? (Or will be?)


henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-05 Thread Dinesh Joshi
+1

One of the major advantages of a separate library would be modularity.

Dinesh





Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-05 Thread [email protected]
Yep, that’s correct. In fact my goal is that we maintain this as a standalone 
library long term. While its primary goal will be integration with Cassandra, I 
think there is value in maintaining a distinct library for the core 
functionality - so long as the burden remains manageable.



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-05 Thread Nate McCall
Hi Benedict,
If I'm parsing this correctly, you want to include the stand-alone library
in the project as a separate repo to begin with, correct? (I'm +1 on that,
if so).

Otherwise I am very intrigued by the paper and proposal. This looks
excellent. Thanks Benedict, et al. for putting this together!

-Nate

On Mon, Sep 6, 2021 at 2:33 AM [email protected] 
wrote:

> Wiki:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> Whitepaper:
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> <
> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2
> >
> Prototype: https://github.com/belliottsmith/accord
>
> Hi everyone, I’d like to propose this CEP for adoption by the community.
>
> Cassandra has benefitted from LWTs for many years, but application
> developers that want to ensure consistency for complex operations must
> either accept the scalability bottleneck of serializing all related state
> through a single partition, or layer a complex state machine on top of the
> database. These are sophisticated and costly activities that our users
> should not be expected to undertake. Since distributed databases are
> beginning to offer distributed transactions with fewer caveats, it is past
> time for Cassandra to do so as well.
>
> This CEP proposes the use of several novel techniques that build upon
> research (that followed EPaxos) to deliver (non-interactive) general
> purpose distributed transactions. The approach is outlined in the wikipage
> and in more detail in the linked whitepaper. Importantly, by adopting this
> approach we will be the _only_ distributed database to offer global,
> scalable, strict serializable transactions in one wide area round-trip.
> This would represent a significant improvement in the state of the art,
> both in the academic literature and in commercial or open source offerings.
>
> This work has been partially realised in a prototype. This partial
> prototype has been verified against Jepsen.io’s Maelstrom library and
> dedicated in-tree strict serializability verification tools, but much work
> remains for the work to be production capable and integrated into Cassandra.
>
> I propose including the prototype in the project as a new source
> repository, to be developed as a standalone library for integration into
> Cassandra. I hope the community sees the important value proposition of
> this proposal, and will adopt the CEP after this discussion, so that the
> library and its integration into Cassandra can be developed in parallel and
> with the involvement of the wider community.
>