Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-20 Thread Jeremiah D Jordan
+1 from me for the proposal ignoring the "where it goes".  I think the 
refactors proposed in it make sense no matter what, and the simulation ability 
should provide some very much needed testability improvements.

In particular replacing File with Path is something we have been looking to do 
(and were planning to bring up as a CEP in the coming months), as it gives a 
much better ability to plug in alternate file system access code.  We had 
someone do a POC internally at one point showing you could do fun things like 
access files in Google Cloud buckets directly from sstableloader with such a 
change (https://github.com/googleapis/java-storage-nio).
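
As a rough illustration of why Path is easier to plug into than File (a sketch
only, not the internal POC mentioned above): code written against
java.nio.file.Path runs unchanged on any FileSystem provider. The sketch below
uses the JDK's built-in zip file system as a stand-in for an alternate
provider; the class and method names (PathDemo, writeThenRead, etc.) are
made up for illustration, and it assumes Java 9+ for Map.of.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical demo class; none of these names come from Cassandra.
class PathDemo {
    // Written purely against Path, so it works on whichever FileSystem owns the Path.
    static String writeThenRead(Path file, String content) {
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same helper, default (local disk) file system.
    static String roundTripOnDisk(String content) {
        try {
            Path file = Files.createTempFile("path-demo", ".txt");
            try {
                return writeThenRead(file, content);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same helper, but the Path lives inside a zip archive mounted as a FileSystem.
    static String roundTripInZip(String content) {
        try {
            Path dir = Files.createTempDirectory("path-demo");
            Path zip = dir.resolve("demo.zip");
            URI uri = URI.create("jar:" + zip.toUri());
            String result;
            try (FileSystem zipFs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
                result = writeThenRead(zipFs.getPath("/data.txt"), content);
            }
            Files.deleteIfExists(zip);
            Files.deleteIfExists(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOnDisk("hello"));
        System.out.println(roundTripInZip("hello"));
    }
}
```

With java.io.File the second call would not be possible, since File is tied
to the default file system.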

-Jeremiah


> On Jul 15, 2021, at 8:21 AM, Benjamin Lerer  wrote:
> 
> Does anybody have concerns other than the target date?
> If not, I believe that we can start a vote tomorrow.
> 
> On Wed, Jul 14, 2021 at 23:18, Nate McCall wrote:
> 
>>> 
>>> 
>>> 
>>>> Yes, we should perhaps remove target version from the template, and
>>>> introduce guidance on describing stability impact etc.
>>> 
>>> Strong +1 to remove this from the template. I got sucked into the mistake
>>> of conflating implementation details and implications on where it lands
>>> instead of staying high level in the "do we agree we need this".
>>> 
>>> And I'm a +1 on the "I agree we need this".
>>> 
>> 
>> +1 to focusing on the _if_ (I think we need it).
>> 
>> IMO we could keep the target version in the template and allow "To Be
>> Decided (TBD)" as it could be useful for larger efforts or specific
>> features. (I don't want to bikeshed on that though and won't complain if
>> that field goes away.)
>> 
>> Appreciate the debate and refocusing, though!
>> 



Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-17 Thread Mick Semb Wever
>
>
> Regarding waivers, I’m not sure we’ve really agreed as a community what
> the criteria are for determining if work goes into a patch release – so I’m
> not sure it would be right to call it a waiver.
>
>

Yes, and this ties into the compatibility documentation, and how we
approach and define semver categories, which I previously said I would work
on and propose at least a strawman for. ETA on that unfortunately is looking
to be during Autumn, but I have not forgotten about it.


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-15 Thread Benjamin Lerer
Does anybody have concerns other than the target date?
If not, I believe that we can start a vote tomorrow.

On Wed, Jul 14, 2021 at 23:18, Nate McCall wrote:

> >
> >
> >
> > > Yes, we should perhaps remove target version from the template, and
> > > introduce guidance on describing stability impact etc.
> >
> > Strong +1 to remove this from the template. I got sucked into the mistake
> > of conflating implementation details and implications on where it lands
> > instead of staying high level in the "do we agree we need this".
> >
> > And I'm a +1 on the "I agree we need this".
> >
>
> +1 to focusing on the _if_ (I think we need it).
>
> IMO we could keep the target version in the template and allow "To Be
> Decided (TBD)" as it could be useful for larger efforts or specific
> features. (I don't want to bikeshed on that though and won't complain if
> that field goes away.)
>
> Appreciate the debate and refocusing, though!
>


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread Nate McCall
>
>
>
> > Yes, we should perhaps remove target version from the template, and
> > introduce guidance on describing stability impact etc.
>
> Strong +1 to remove this from the template. I got sucked into the mistake
> of conflating implementation details and implications on where it lands
> instead of staying high level in the "do we agree we need this".
>
> And I'm a +1 on the "I agree we need this".
>

+1 to focusing on the _if_ (I think we need it).

IMO we could keep the target version in the template and allow "To Be
Decided (TBD)" as it could be useful for larger efforts or specific
features. (I don't want to bikeshed on that though and won't complain if
that field goes away.)

Appreciate the debate and refocusing, though!


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread Jeff Jirsa
Same


On Wed, Jul 14, 2021 at 9:16 AM Brandon Williams  wrote:

> I am +1 to both removal from the template and "we need this"
>
> On Wed, Jul 14, 2021 at 9:05 AM Joshua McKenzie 
> wrote:
> >
> > And I'm a +1 on the "I agree we need this".
>
>


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread Brandon Williams
I am +1 to both removal from the template and "we need this"

On Wed, Jul 14, 2021 at 9:05 AM Joshua McKenzie  wrote:
>
> >
> > Yes, we should perhaps remove target version from the template, and
> > introduce guidance on describing stability impact etc.
>
> Strong +1 to remove this from the template. I got sucked into the mistake
> of conflating implementation details and implications on where it lands
> instead of staying high level in the "do we agree we need this".
>
> And I'm a +1 on the "I agree we need this".
>
>
> On Wed, Jul 14, 2021 at 7:32 AM bened...@apache.org 
> wrote:
>
> > > I think CEPs would benefit from describing their compatibility and
> > stability impacts, rather than trying to tie themselves to a
> > version, regardless of what context a specific version provides.
> >
> > Yes, we should perhaps remove target version from the template, and
> > introduce guidance on describing stability impact etc.
> >
> > Regarding waivers, I’m not sure we’ve really agreed as a community what
> > the criteria are for determining if work goes into a patch release – so I’m
> > not sure it would be right to call it a waiver. But I agree that scheduling
> > the release to contain some work should be a mixture of project roadmap
> > planning (distinct from CEP), and Jira/dev list discussion near the point
> > of merge.
> >
> > The question is if there is still value in the CEP pages maintaining the
> > endeavour’s goal for when the work will be ready, but perhaps this can be
> > communicated in normal date format, and used to inform project roadmap
> > planning.
> >
> >
> > From: Mick Semb Wever 
> > Date: Wednesday, 14 July 2021 at 10:41
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > >
> > > PRs will land soon for people to look at, but honestly we’re getting into
> > > an unnecessary tangle over target release. I think it would be a mistake to
> > > push this to a later release, because it is valuable and it will bring pain
> > > by creating divergence - but the question a CEP is meant to answer is _if_
> > > the community wants a piece of work.
> > >
> > > Since it’s become an explicit point of contention, we can perhaps
> > > disaggregate a vote on _when_ to happen in parallel, once discussion on
> > > _if_ wraps up.
> >
> >
> >
> > Totally agree, can we remove the "Target Version" from the CEP, so the vote
> > is based on the _if_ ?
> >
> > Some further thoughts…
> >
> > I think CEPs would benefit from describing their compatibility and
> > stability impacts, rather than trying to tie themselves to a
> > version, regardless of what context a specific version provides.
> >
> > Rather than a subsequent vote on the CEP trying to get it into 4.0.x, what
> > about requests for waivers on each jira ticket as they are ready to land? I
> > suspect much of the work (once we see it) will be easier to agree to such
> > waivers than the only other position we have to stand by currently, which
> > is categories defined by SemVer. (A lot of people are really keen to see us
> > practice PATCH-only patch versions.) This also ties back to my request to
> > see a "rough timeline/plan of how the proposed changes are to be defined in
> > JIRAs and ordered."
> >
> > It's worth noting that the code divergence will happen between two branches
> > no matter what, e.g. 3.11, and next April is really not far away at all. Is
> > it really a problem if the LWT fix is also pushed back to 4.1 (though I
> > understand this is a bigger discussion) for the sake of driving home
> > we are a project now serious about stability?
> >
> > All in all, I am betting this discussion will be a lot more productive a)
> > when we see more of the work involved and its impact, and b) in a month or
> > two when we have better witnessed the stability of 4.0.0 and what has gone
> > into 4.0.1 and 4.0.2.
> >




Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread Joshua McKenzie
>
> Yes, we should perhaps remove target version from the template, and
> introduce guidance on describing stability impact etc.

Strong +1 to remove this from the template. I got sucked into the mistake
of conflating implementation details and implications on where it lands
instead of staying high level in the "do we agree we need this".

And I'm a +1 on the "I agree we need this".


On Wed, Jul 14, 2021 at 7:32 AM bened...@apache.org 
wrote:

> > I think CEPs would benefit from describing their compatibility and
> stability impacts, rather than trying to tie themselves to a
> version, regardless of what context a specific version provides.
>
> Yes, we should perhaps remove target version from the template, and
> introduce guidance on describing stability impact etc.
>
> Regarding waivers, I’m not sure we’ve really agreed as a community what
> the criteria are for determining if work goes into a patch release – so I’m
> not sure it would be right to call it a waiver. But I agree that scheduling
> the release to contain some work should be a mixture of project roadmap
> planning (distinct from CEP), and Jira/dev list discussion near the point
> of merge.
>
> The question is if there is still value in the CEP pages maintaining the
> endeavour’s goal for when the work will be ready, but perhaps this can be
> communicated in normal date format, and used to inform project roadmap
> planning.
>
>
> From: Mick Semb Wever 
> Date: Wednesday, 14 July 2021 at 10:41
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >
> > PRs will land soon for people to look at, but honestly we’re getting into
> > an unnecessary tangle over target release. I think it would be a mistake to
> > push this to a later release, because it is valuable and it will bring pain
> > by creating divergence - but the question a CEP is meant to answer is _if_
> > the community wants a piece of work.
> >
> > Since it’s become an explicit point of contention, we can perhaps
> > disaggregate a vote on _when_ to happen in parallel, once discussion on
> > _if_ wraps up.
>
>
>
> Totally agree, can we remove the "Target Version" from the CEP, so the vote
> is based on the _if_ ?
>
> Some further thoughts…
>
> I think CEPs would benefit from describing their compatibility and
> stability impacts, rather than trying to tie themselves to a
> version, regardless of what context a specific version provides.
>
> Rather than a subsequent vote on the CEP trying to get it into 4.0.x, what
> about requests for waivers on each jira ticket as they are ready to land? I
> suspect much of the work (once we see it) will be easier to agree to such
> waivers than the only other position we have to stand by currently, which
> is categories defined by SemVer. (A lot of people are really keen to see us
> practice PATCH-only patch versions.) This also ties back to my request to
> see a "rough timeline/plan of how the proposed changes are to be defined in
> JIRAs and ordered."
>
> It's worth noting that the code divergence will happen between two branches
> no matter what, e.g. 3.11, and next April is really not far away at all. Is
> it really a problem if the LWT fix is also pushed back to 4.1 (though I
> understand this is a bigger discussion) for the sake of driving home
> we are a project now serious about stability?
>
> All in all, I am betting this discussion will be a lot more productive a)
> when we see more of the work involved and its impact, and b) in a month or
> two when we have better witnessed the stability of 4.0.0 and what has gone
> into 4.0.1 and 4.0.2.
>


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread bened...@apache.org
> I think CEPs would benefit from describing their compatibility and
stability impacts, rather than trying to tie themselves to a
version, regardless of what context a specific version provides.

Yes, we should perhaps remove target version from the template, and introduce 
guidance on describing stability impact etc.

Regarding waivers, I’m not sure we’ve really agreed as a community what the 
criteria are for determining if work goes into a patch release – so I’m not 
sure it would be right to call it a waiver. But I agree that scheduling the 
release to contain some work should be a mixture of project roadmap planning 
(distinct from CEP), and Jira/dev list discussion near the point of merge.

The question is if there is still value in the CEP pages maintaining the 
endeavour’s goal for when the work will be ready, but perhaps this can be 
communicated in normal date format, and used to inform project roadmap planning.


From: Mick Semb Wever 
Date: Wednesday, 14 July 2021 at 10:41
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> PRs will land soon for people to look at, but honestly we’re getting into
> an unnecessary tangle over target release. I think it would be a mistake to
> push this to a later release, because it is valuable and it will bring pain
> by creating divergence - but the question a CEP is meant to answer is _if_
> the community wants a piece of work.
>
> Since it’s become an explicit point of contention, we can perhaps
> disaggregate a vote on _when_ to happen in parallel, once discussion on
> _if_ wraps up.



Totally agree, can we remove the "Target Version" from the CEP, so the vote
is based on the _if_ ?

Some further thoughts…

I think CEPs would benefit from describing their compatibility and
stability impacts, rather than trying to tie themselves to a
version, regardless of what context a specific version provides.

Rather than a subsequent vote on the CEP trying to get it into 4.0.x, what
about requests for waivers on each jira ticket as they are ready to land? I
suspect much of the work (once we see it) will be easier to agree to such
waivers than the only other position we have to stand by currently, which
is categories defined by SemVer. (A lot of people are really keen to see us
practice PATCH-only patch versions.) This also ties back to my request to
see a "rough timeline/plan of how the proposed changes are to be defined in
JIRAs and ordered."

It's worth noting that the code divergence will happen between two branches
no matter what, e.g. 3.11, and next April is really not far away at all. Is
it really a problem if the LWT fix is also pushed back to 4.1 (though I
understand this is a bigger discussion) for the sake of driving home
we are a project now serious about stability?

All in all, I am betting this discussion will be a lot more productive a)
when we see more of the work involved and its impact, and b) in a month or
two when we have better witnessed the stability of 4.0.0 and what has gone
into 4.0.1 and 4.0.2.


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-14 Thread Mick Semb Wever
>
> PRs will land soon for people to look at, but honestly we’re getting into
> an unnecessary tangle over target release. I think it would be a mistake to
> push this to a later release, because it is valuable and it will bring pain
> by creating divergence - but the question a CEP is meant to answer is _if_
> the community wants a piece of work.
>
> Since it’s become an explicit point of contention, we can perhaps
> disaggregate a vote on _when_ to happen in parallel, once discussion on
> _if_ wraps up.



Totally agree, can we remove the "Target Version" from the CEP, so the vote
is based on the _if_ ?

Some further thoughts…

I think CEPs would benefit from describing their compatibility and
stability impacts, rather than trying to tie themselves to a
version, regardless of what context a specific version provides.

Rather than a subsequent vote on the CEP trying to get it into 4.0.x, what
about requests for waivers on each jira ticket as they are ready to land? I
suspect much of the work (once we see it) will be easier to agree to such
waivers than the only other position we have to stand by currently, which
is categories defined by SemVer. (A lot of people are really keen to see us
practice PATCH-only patch versions.) This also ties back to my request to
see a "rough timeline/plan of how the proposed changes are to be defined in
JIRAs and ordered."

It's worth noting that the code divergence will happen between two branches
no matter what, e.g. 3.11, and next April is really not far away at all. Is
it really a problem if the LWT fix is also pushed back to 4.1 (though I
understand this is a bigger discussion) for the sake of driving home
we are a project now serious about stability?

All in all, I am betting this discussion will be a lot more productive a)
when we see more of the work involved and its impact, and b) in a month or
two when we have better witnessed the stability of 4.0.0 and what has gone
into 4.0.1 and 4.0.2.


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> If we're talking about introducing wrapper APIs that are compatible 
> w/existing concurrent classes

Unfortunately we’re not, as we often don’t use interfaces. Semaphore, 
CountDownLatch etc are concrete classes. We have quite a hodge-podge of 
concurrent API usages, and many of them are not readily mockable as they stand.

The majority of this work is cleaning the codebase, in all honesty. There is a 
lot of ugliness in there, and a lot of inconsistent behaviour.
- We use four different Futures APIs, I think (Future, ListenableFuture, 
CompletableFuture, netty’s Future), for instance. To minimise churn I implement 
three of the four in a single interface, and standardise on this for our 
Executors; this is a breaking change, and necessary to support mocking for all 
of these use cases without rewriting the application code. In this case, we use 
as a basis the futures we already introduced as part of the internode 
networking rewrite.
- To mock our executors I introduce factories, but the current hierarchy is a 
mess of inconsistency, so even discounting the above breaking change this 
necessitated introducing a new interface hierarchy to implement, and 
overhauling the internals for consistency.
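
To make the "concrete classes are not readily mockable" point tangible, here
is a minimal sketch of the general approach: put an interface in front of
CountDownLatch and obtain instances from a factory, so a test can swap in an
implementation it controls. This is only an illustration; Latch, LatchFactory,
REAL and SIMULATED are hypothetical names, not the interfaces in the
forthcoming PRs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this is not Cassandra's actual API.
class LatchDemo {
    // Application code depends on this interface rather than on the concrete JDK class.
    interface Latch {
        void countDown();
        boolean await(long timeout, TimeUnit unit) throws InterruptedException;
    }

    interface LatchFactory {
        Latch newLatch(int count);
    }

    // Production factory: delegates straight to java.util.concurrent.CountDownLatch.
    static final LatchFactory REAL = count -> {
        CountDownLatch delegate = new CountDownLatch(count);
        return new Latch() {
            public void countDown() { delegate.countDown(); }
            public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
                return delegate.await(timeout, unit);
            }
        };
    };

    // Test factory: single-threaded and never blocks; await only reports
    // whether the count has already reached zero.
    static final LatchFactory SIMULATED = count -> new Latch() {
        private int left = count;
        public void countDown() { if (left > 0) left--; }
        public boolean await(long timeout, TimeUnit unit) { return left == 0; }
    };

    // Example call site: identical code regardless of which factory is injected.
    static boolean runTasks(LatchFactory factory, int tasks) {
        Latch latch = factory.newLatch(tasks);
        for (int i = 0; i < tasks; i++)
            latch.countDown();
        try {
            return latch.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(REAL, 3));
        System.out.println(runTasks(SIMULATED, 3));
    }
}
```

A real simulator would do far more in the test implementation (e.g. hand
control to a scheduler instead of returning immediately); the point here is
only that the call site needs an interface to make that possible.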

PRs will land soon for people to look at, but honestly we’re getting into an 
unnecessary tangle over target release. I think it would be a mistake to push 
this to a later release, because it is valuable and it will bring pain by 
creating divergence - but the question a CEP is meant to answer is _if_ the 
community wants a piece of work.

Since it’s become an explicit point of contention, we can perhaps disaggregate 
a vote on _when_ to happen in parallel, once discussion on _if_ wraps up.


From: Joshua McKenzie 
Date: Tuesday, 13 July 2021 at 17:34
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
So stepping back from the feature vs. bug and rel cycle debate (a valuable
one, but not the original purpose of this thread):

From the CEP:

>
>- Refactor internal APIs around concurrency to support mock
>implementations that are able to control execution, including
>
>
>- SimpleCondition, Semaphore, CountDownLatch, BlockingQueue, etc
>
>
>- Executors, futures, starting threads, etc - including important
>improvements to consistency of approach in the codebase
>
>
>- The use of currentTimeMillis and nanoTime
>
>
>- The replacement of java.io.File with a wrapper on java.nio.file.Path
> providing an ergonomic API, and some improvements to consistency of
>file handling
>
>
>- Support for alternative streaming implementations
>
>
>- Improvements to the dtest API to support necessary functionality
>
If we're talking about introducing wrapper APIs that are compatible
w/existing concurrent classes with just a bimorphic call based on whether
we're testing or not, that's a very low risk change IMO. I'd expect any and
all invasive / new / possibly bugged changes to occur in "Introduction of a
simulator package", not in the basic interfaces we're shimming between
things.

Cleaning up inconsistency of our time units and calls of various concurrent
objects is bugfixing so should be fair game any time.

~Josh


On Tue, Jul 13, 2021 at 10:26 AM Benjamin Lerer  wrote:

> >
> > "Where do we do that?" is a more tricky question.
>
>
> Sorry, I was not really clear with that comment. What I was wondering is if
> we should create a minor version to address that issue (e.g. 4.1).
>
> I am also against making the change in the 4.0 branch.
>
> On Tue, Jul 13, 2021 at 16:09, bened...@apache.org wrote:
>
> > My point is that we all have different premises we are working from. I
> > don’t think you can convince me that I am mistaken about how I interpret
> > the word feature. The release lifecycle document we voted on is ambiguous,
> > and we all clearly take it to mean different things.
> >
> > From: Jeremiah D Jordan 
> > Date: Tuesday, 13 July 2021 at 15:06
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Just because it is a feature for users who are developers does not mean
> it
> > is not a new feature?  Adding this capability is adding new functionality
> > to what developers can do with Apache Cassandra.  How is that not a new
> > feature?
> >
> > Semver has been brought up a lot in conversations around what can go
> > where.  If we look at how semver defines such things:
> >
> > MAJOR version when you make incompatible API changes,
> > MINOR version when you add functionality in a backwards compatible
> manner,
> > and
> > PATCH version when you make backwards compatible bug fixes.
> >
> > 

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Joshua McKenzie
So stepping back from the feature vs. bug and rel cycle debate (a valuable
one, but not the original purpose of this thread):

From the CEP:

>
>- Refactor internal APIs around concurrency to support mock
>implementations that are able to control execution, including
>
>
>- SimpleCondition, Semaphore, CountDownLatch, BlockingQueue, etc
>
>
>- Executors, futures, starting threads, etc - including important
>improvements to consistency of approach in the codebase
>
>
>- The use of currentTimeMillis and nanoTime
>
>
>- The replacement of java.io.File with a wrapper on java.nio.file.Path
> providing an ergonomic API, and some improvements to consistency of
>file handling
>
>
>- Support for alternative streaming implementations
>
>
>- Improvements to the dtest API to support necessary functionality
>
If we're talking about introducing wrapper APIs that are compatible
w/existing concurrent classes with just a bimorphic call based on whether
we're testing or not, that's a very low risk change IMO. I'd expect any and
all invasive / new / possibly bugged changes to occur in "Introduction of a
simulator package", not in the basic interfaces we're shimming between
things.
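
For example, the currentTimeMillis/nanoTime item could look something like
the sketch below: one small interface with two implementations (hence a
bimorphic call site), the production one delegating to System and the test
one advanced by hand. The names (Clock, SYSTEM, ManualClock, elapsedMillis)
are hypothetical and only illustrate the shape of such a shim.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical names; a sketch of a time shim, not the CEP's actual interfaces.
class ClockDemo {
    interface Clock {
        long nanoTime();
        long currentTimeMillis();
    }

    // Production implementation: a thin pass-through to the JDK.
    static final Clock SYSTEM = new Clock() {
        public long nanoTime() { return System.nanoTime(); }
        public long currentTimeMillis() { return System.currentTimeMillis(); }
    };

    // Test implementation: time only moves when the test says so.
    static final class ManualClock implements Clock {
        private long nanos;
        public long nanoTime() { return nanos; }
        public long currentTimeMillis() { return TimeUnit.NANOSECONDS.toMillis(nanos); }
        public void advance(long amount, TimeUnit unit) { nanos += unit.toNanos(amount); }
    }

    // Example call site: measures elapsed time through the interface only.
    static long elapsedMillis(Clock clock, long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(clock.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        long start = clock.nanoTime();
        clock.advance(250, TimeUnit.MILLISECONDS);
        System.out.println(elapsedMillis(clock, start)); // 250
    }
}
```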

Cleaning up inconsistency of our time units and calls of various concurrent
objects is bugfixing so should be fair game any time.

~Josh


On Tue, Jul 13, 2021 at 10:26 AM Benjamin Lerer  wrote:

> >
> > "Where do we do that?" is a more tricky question.
>
>
> Sorry, I was not really clear with that comment. What I was wondering is if
> we should create a minor version to address that issue (e.g. 4.1).
>
> I am also against making the change in the 4.0 branch.
>
> On Tue, Jul 13, 2021 at 16:09, bened...@apache.org wrote:
>
> > My point is that we all have different premises we are working from. I
> > don’t think you can convince me that I am mistaken about how I interpret
> > the word feature. The release lifecycle document we voted on is ambiguous,
> > and we all clearly take it to mean different things.
> >
> > From: Jeremiah D Jordan 
> > Date: Tuesday, 13 July 2021 at 15:06
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Just because it is a feature for users who are developers does not mean
> it
> > is not a new feature?  Adding this capability is adding new functionality
> > to what developers can do with Apache Cassandra.  How is that not a new
> > feature?
> >
> > Semver has been brought up a lot in conversations around what can go
> > where.  If we look at how semver defines such things:
> >
> > MAJOR version when you make incompatible API changes,
> > MINOR version when you add functionality in a backwards compatible
> manner,
> > and
> > PATCH version when you make backwards compatible bug fixes.
> >
> > This change to me sounds like 2.  Adding new functionality in a backwards
> > compatible manner.  I guess our issue here is that we have never actually
> > done MINOR releases in the C* project, we only make MAJOR releases and
> > PATCH releases.  So we need to decide where things that in semver would
> go
> > in a MINOR version should go.  In my mind it was always that such things
> > should only go to a MAJOR, as it seems less safe to relax what goes in a
> > PATCH and allow them there.
> >
> > -Jeremiah
> >
> > > On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
> > >
> > >> I do think adding the ability to do “Cluster and Code Simulations” is
> a
> > new feature.
> > >
> > > I don’t. I understand a feature to be a user-visible change, such as
> new
> > functionality, and it was on this basis I endorsed the release lifecycle
> > document. I do not believe that all improvement should stop to patch
> > releases, as I do not believe this produces the highest quality outcome.
> > >
> > >
> > >
> > >
> > > From: Jeremiah D Jordan 
> > > Date: Tuesday, 13 July 2021 at 14:41
> > > To: Cassandra DEV 
> > > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > > I do not think fixing CASSANDRA-12126 is a new feature.  I do think
> > adding the ability to do “Cluster and Code Simulations” is a new feature.
> > >
> > > -Jeremiah
> > >
> > >> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
> > >>
> > >> Nothing we’re discussing constitutes a feature. We’re discussing
> > stability enhancements, and important bug fixes.
> > >>

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Benjamin Lerer
>
> "Where do we do that?" is a more tricky question.


Sorry, I was not really clear with that comment. What I was wondering is if
we should create a minor version to address that issue (e.g. 4.1).

I am also against making the change in the 4.0 branch.

On Tue, Jul 13, 2021 at 16:09, bened...@apache.org wrote:

> My point is that we all have different premises we are working from. I
> don’t think you can convince me that I am mistaken about how I interpret
> the word feature. The release lifecycle document we voted on is ambiguous,
> and we all clearly take it to mean different things.
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 15:06
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> Just because it is a feature for users who are developers does not mean it
> is not a new feature?  Adding this capability is adding new functionality
> to what developers can do with Apache Cassandra.  How is that not a new
> feature?
>
> Semver has been brought up a lot in conversations around what can go
> where.  If we look at how semver defines such things:
>
> MAJOR version when you make incompatible API changes,
> MINOR version when you add functionality in a backwards compatible manner,
> and
> PATCH version when you make backwards compatible bug fixes.
>
> This change to me sounds like 2.  Adding new functionality in a backwards
> compatible manner.  I guess our issue here is that we have never actually
> done MINOR releases in the C* project, we only make MAJOR releases and
> PATCH releases.  So we need to decide where things that in semver would go
> in a MINOR version should go.  In my mind it was always that such things
> should only go to a MAJOR, as it seems less safe to relax what goes in a
> PATCH and allow them there.
>
> -Jeremiah
>
> > On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
> >
> >> I do think adding the ability to do “Cluster and Code Simulations” is a
> new feature.
> >
> > I don’t. I understand a feature to be a user-visible change, such as new
> functionality, and it was on this basis I endorsed the release lifecycle
> document. I do not believe that all improvement should stop to patch
> releases, as I do not believe this produces the highest quality outcome.
> >
> >
> >
> >
> > From: Jeremiah D Jordan 
> > Date: Tuesday, 13 July 2021 at 14:41
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > I do not think fixing CASSANDRA-12126 is a new feature.  I do think
> adding the ability to do “Cluster and Code Simulations” is a new feature.
> >
> > -Jeremiah
> >
> >> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
> >>
> >> Nothing we’re discussing constitutes a feature. We’re discussing
> stability enhancements, and important bug fixes.
> >>
> >> I think this disagreement is to some extent founded on our different
> premises about what a patch release should contain, and this seems to be
> the fault of incompletely specified documentation.
> >>
> >> 1. The release lifecycle only forbids feature work from being developed
> in a patch release, and only expressly includes bug fixes. Note that, the
> document even has a comment by the author suggesting that features may be
> backported to a patch release from trunk (not something I agree with, but
> it demonstrates the ambiguity of the definition).
> >> 2. There seems to be some conflation of size-of-change with the
> admissibility wrt release lifecycle – I don’t think there’s any criteria
> here, and it’s open to the community’s case-by-case assessment. Whatever we
> do to fix the bug in question will necessarily be a very significant piece
> of work itself, for instance.
> >>
> >> My interpretation of the release lifecycle document is that it is
> acceptable to include this work in a patch release. My belief about its
> impact is that it would contribute positively to the stability of the
> project’s 4.0 releases over the lifecycle, and improve project velocity.
> >>
> >> With respect to whether we can ship a fix to 12126 without validation,
> I would be strongly opposed to this, and certainly would not produce a
> patch myself in this way. Not only would it be burdensome (given the
> divergences in the codebase), but I would not consider it acceptably safe
> (given the divergence).
> >>
> >>
> >> From: Jeremiah D Jordan 
> >> Date: Tuesday, 13 July 2021 at 14:15
> >> To: Cassandra DEV 
> >> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> >> I tend to agree wi

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
My point is that we all have different premises we are working from. I don’t 
think you can convince me that I am mistaken about how I interpret the word 
feature. The release lifecycle document we voted on is ambiguous, and we all 
clearly take it to mean different things.

From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 15:06
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
Just because it is a feature for users who are developers does not mean it is 
not a new feature?  Adding this capability is adding new functionality to what 
developers can do with Apache Cassandra.  How is that not a new feature?

Semver has been brought up a lot in conversations around what can go where.  If 
we look at how semver defines such things:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

This change to me sounds like the second case (MINOR): adding new functionality 
in a backwards compatible manner.  I guess our issue here is that we have never 
actually done MINOR releases in the C* project; we only make MAJOR releases and 
PATCH releases.  So we need to decide where changes that semver would put in a 
MINOR version should go.  In my mind, such things should only go to a MAJOR, as 
it seems less safe to relax what goes in a PATCH and allow them there.

-Jeremiah

> On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
>
>> I do think adding the ability to do “Cluster and Code Simulations” is a new 
>> feature.
>
> I don’t. I understand a feature to be a user-visible change, such as new 
> functionality, and it was on this basis I endorsed the release lifecycle 
> document. I do not believe that all improvement to patch releases should 
> stop, as I do not believe this produces the highest quality outcome.
>
>
>
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:41
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I do not think fixing CASSANDRA-12126 is not a new feature.  I do think 
> adding the ability to do “Cluster and Code Simulations” is a new feature.
>
> -Jeremiah
>
>> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>>
>> Nothing we’re discussing constitutes a feature. We’re discussing stability 
>> enhancements, and important bug fixes.
>>
>> I think this disagreement is to some extent founded on our different 
>> premises about what a patch release should contain, and this seems to be the 
>> fault of incompletely specified documentation.
>>
>> 1. The release lifecycle only forbids feature work from being developed in a 
>> patch release, and only expressly includes bug fixes. Note that the 
>> document even has a comment by the author suggesting that features may be 
>> backported to a patch release from trunk (not something I agree with, but it 
>> demonstrates the ambiguity of the definition).
>> 2. There seems to be some conflation of size-of-change with the 
>> admissibility wrt release lifecycle – I don’t think there are any criteria 
>> here, and it’s open to the community’s case-by-case assessment. Whatever we 
>> do to fix the bug in question will necessarily be a very significant piece 
>> of work itself, for instance.
>>
>> My interpretation of the release lifecycle document is that it is acceptable 
>> to include this work in a patch release. My belief about its impact is that 
>> it would contribute positively to the stability of the project’s 4.0 
>> releases over the lifecycle, and improve project velocity.
>>
>> With respect to whether we can ship a fix to 12126 without validation, I 
>> would be strongly opposed to this, and certainly would not produce a patch 
>> myself in this way. Not only would it be burdensome (given the divergences 
>> in the codebase), but I would not consider it acceptably safe (given the 
>> divergence).
>>
>>
>> From: Jeremiah D Jordan 
>> Date: Tuesday, 13 July 2021 at 14:15
>> To: Cassandra DEV 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> I tend to agree with Paulo that a major refactoring of some internal 
>> interfaces sounds like something to be explicitly avoided in a patch 
>> release.  I thought this was the type of change we all agreed we should stop 
>> letting in to patch releases, and that we would attempt to release more 
>> often (once a year) so changes that only go to trunk would get out faster?  
>> Are we really wanting to break that promise to ourselves before we even 
>> release 4.0?  To me “I think we need this feature released faster” is not a 

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Jeremiah D Jordan
Just because it is a feature for users who are developers does not mean it is 
not a new feature.  Adding this capability is adding new functionality to what 
developers can do with Apache Cassandra.  How is that not a new feature?

Semver has been brought up a lot in conversations around what can go where.  If 
we look at how semver defines such things:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

This change to me sounds like the second case (MINOR): adding new functionality 
in a backwards compatible manner.  I guess our issue here is that we have never 
actually done MINOR releases in the C* project; we only make MAJOR releases and 
PATCH releases.  So we need to decide where changes that semver would put in a 
MINOR version should go.  In my mind, such things should only go to a MAJOR, as 
it seems less safe to relax what goes in a PATCH and allow them there.

-Jeremiah

> On Jul 13, 2021, at 8:47 AM, bened...@apache.org wrote:
> 
>> I do think adding the ability to do “Cluster and Code Simulations” is a new 
>> feature.
> 
> I don’t. I understand a feature to be a user-visible change, such as new 
> functionality, and it was on this basis I endorsed the release lifecycle 
> document. I do not believe that all improvement to patch releases should 
> stop, as I do not believe this produces the highest quality outcome.
> 
> 
> 
> 
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:41
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I do not think fixing CASSANDRA-12126 is not a new feature.  I do think 
> adding the ability to do “Cluster and Code Simulations” is a new feature.
> 
> -Jeremiah
> 
>> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>> 
>> Nothing we’re discussing constitutes a feature. We’re discussing stability 
>> enhancements, and important bug fixes.
>> 
>> I think this disagreement is to some extent founded on our different 
>> premises about what a patch release should contain, and this seems to be the 
>> fault of incompletely specified documentation.
>> 
>> 1. The release lifecycle only forbids feature work from being developed in a 
>> patch release, and only expressly includes bug fixes. Note that the 
>> document even has a comment by the author suggesting that features may be 
>> backported to a patch release from trunk (not something I agree with, but it 
>> demonstrates the ambiguity of the definition).
>> 2. There seems to be some conflation of size-of-change with the 
>> admissibility wrt release lifecycle – I don’t think there are any criteria 
>> here, and it’s open to the community’s case-by-case assessment. Whatever we 
>> do to fix the bug in question will necessarily be a very significant piece 
>> of work itself, for instance.
>> 
>> My interpretation of the release lifecycle document is that it is acceptable 
>> to include this work in a patch release. My belief about its impact is that 
>> it would contribute positively to the stability of the project’s 4.0 
>> releases over the lifecycle, and improve project velocity.
>> 
>> With respect to whether we can ship a fix to 12126 without validation, I 
>> would be strongly opposed to this, and certainly would not produce a patch 
>> myself in this way. Not only would it be burdensome (given the divergences 
>> in the codebase), but I would not consider it acceptably safe (given the 
>> divergence).
>> 
>> 
>> From: Jeremiah D Jordan 
>> Date: Tuesday, 13 July 2021 at 14:15
>> To: Cassandra DEV 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> I tend to agree with Paulo that a major refactoring of some internal 
>> interfaces sounds like something to be explicitly avoided in a patch 
>> release.  I thought this was the type of change we all agreed we should stop 
>> letting in to patch releases, and that we would attempt to release more 
>> often (once a year) so changes that only go to trunk would get out faster?  
>> Are we really wanting to break that promise to ourselves before we even 
>> release 4.0?  To me “I think we need this feature released faster” is not a 
>> reason to put it in 4.0, it could be a reason to release 4.1 sooner.  This 
>> is where having a releasable trunk helps, as if we decided as a project that 
>> some change was worth a new major being released early the effort of doing 
>> that release is much smaller when trunk is releasable.
>> 
>> Any fix we make in 4.0 would be merged forward into trunk and could be f

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> I do think adding the ability to do “Cluster and Code Simulations” is a new 
> feature.

I don’t. I understand a feature to be a user-visible change, such as new 
functionality, and it was on this basis I endorsed the release lifecycle 
document. I do not believe that all improvement to patch releases should stop, 
as I do not believe this produces the highest quality outcome.




From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 14:41
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
I do not think fixing CASSANDRA-12126 is not a new feature.  I do think adding 
the ability to do “Cluster and Code Simulations” is a new feature.

-Jeremiah

> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>
> Nothing we’re discussing constitutes a feature. We’re discussing stability 
> enhancements, and important bug fixes.
>
> I think this disagreement is to some extent founded on our different premises 
> about what a patch release should contain, and this seems to be the fault of 
> incompletely specified documentation.
>
> 1. The release lifecycle only forbids feature work from being developed in a 
> patch release, and only expressly includes bug fixes. Note that the document 
> even has a comment by the author suggesting that features may be backported 
> to a patch release from trunk (not something I agree with, but it 
> demonstrates the ambiguity of the definition).
> 2. There seems to be some conflation of size-of-change with the admissibility 
> wrt release lifecycle – I don’t think there are any criteria here, and it’s 
> open to the community’s case-by-case assessment. Whatever we do to fix the 
> bug in question will necessarily be a very significant piece of work itself, 
> for instance.
>
> My interpretation of the release lifecycle document is that it is acceptable 
> to include this work in a patch release. My belief about its impact is that 
> it would contribute positively to the stability of the project’s 4.0 releases 
> over the lifecycle, and improve project velocity.
>
> With respect to whether we can ship a fix to 12126 without validation, I 
> would be strongly opposed to this, and certainly would not produce a patch 
> myself in this way. Not only would it be burdensome (given the divergences in 
> the codebase), but I would not consider it acceptably safe (given the 
> divergence).
>
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:15
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I tend to agree with Paulo that a major refactoring of some internal 
> interfaces sounds like something to be explicitly avoided in a patch release. 
>  I thought this was the type of change we all agreed we should stop letting 
> in to patch releases, and that we would attempt to release more often (once a 
> year) so changes that only go to trunk would get out faster?  Are we really 
> wanting to break that promise to ourselves before we even release 4.0?  To me 
> “I think we need this feature released faster” is not a reason to put it in 
> 4.0, it could be a reason to release 4.1 sooner.  This is where having a 
> releasable trunk helps, as if we decided as a project that some change was 
> worth a new major being released early the effort of doing that release is 
> much smaller when trunk is releasable.
>
> Any fix we make in 4.0 would be merged forward into trunk and could be fully 
> verified there?  Probably not the best, but would give more confidence in a 
> fix than otherwise without adding other major changes to 4.0?
>
> -Jeremiah
>
>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>>
>>>
>>> Furthermore, we introduced a significant performance regression in all
>>> lines of the software by increasing the number of LWT round-trips. Unless
>>> we intend to leave this regression for a further year without _any_ release
>>> offering a solution, we will need suitable verification mechanisms for
>>> whatever fixes we deliver.
>>>
>>> My view is that it is unacceptable to leave such a significant regression
>>> unaddressed in all lines of software we intend to release for the
>>> foreseeable future.
>>
>>
>> I would like to expand a bit on this as I believe it might be important for
>> people to have the full picture. The fix for  CASSANDRA-12126
>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
>> regression by increasing the number of LWT round-trips. Nevertheless, the
>> patch introduced a flag to allow users to revert to the previous behavior
>> (previous performance + consistency issue).
>>
>> Also the patch did not address all paxos consistency iss

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Jeremiah D Jordan
Too many nots.  I do not think fixing 12126 is a new feature.

> On Jul 13, 2021, at 8:40 AM, Jeremiah D Jordan  wrote:
> 
> I do not think fixing CASSANDRA-12126 is not a new feature.  I do think 
> adding the ability to do “Cluster and Code Simulations” is a new feature.
> 
> -Jeremiah
> 
>> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
>> 
>> Nothing we’re discussing constitutes a feature. We’re discussing stability 
>> enhancements, and important bug fixes.
>> 
>> I think this disagreement is to some extent founded on our different 
>> premises about what a patch release should contain, and this seems to be the 
>> fault of incompletely specified documentation.
>> 
>> 1. The release lifecycle only forbids feature work from being developed in a 
>> patch release, and only expressly includes bug fixes. Note that the 
>> document even has a comment by the author suggesting that features may be 
>> backported to a patch release from trunk (not something I agree with, but it 
>> demonstrates the ambiguity of the definition).
>> 2. There seems to be some conflation of size-of-change with the 
>> admissibility wrt release lifecycle – I don’t think there are any criteria 
>> here, and it’s open to the community’s case-by-case assessment. Whatever we 
>> do to fix the bug in question will necessarily be a very significant piece 
>> of work itself, for instance.
>> 
>> My interpretation of the release lifecycle document is that it is acceptable 
>> to include this work in a patch release. My belief about its impact is that 
>> it would contribute positively to the stability of the project’s 4.0 
>> releases over the lifecycle, and improve project velocity.
>> 
>> With respect to whether we can ship a fix to 12126 without validation, I 
>> would be strongly opposed to this, and certainly would not produce a patch 
>> myself in this way. Not only would it be burdensome (given the divergences 
>> in the codebase), but I would not consider it acceptably safe (given the 
>> divergence).
>> 
>> 
>> From: Jeremiah D Jordan 
>> Date: Tuesday, 13 July 2021 at 14:15
>> To: Cassandra DEV 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> I tend to agree with Paulo that a major refactoring of some internal 
>> interfaces sounds like something to be explicitly avoided in a patch 
>> release.  I thought this was the type of change we all agreed we should stop 
>> letting in to patch releases, and that we would attempt to release more 
>> often (once a year) so changes that only go to trunk would get out faster?  
>> Are we really wanting to break that promise to ourselves before we even 
>> release 4.0?  To me “I think we need this feature released faster” is not a 
>> reason to put it in 4.0, it could be a reason to release 4.1 sooner.  This 
>> is where having a releasable trunk helps, as if we decided as a project that 
>> some change was worth a new major being released early the effort of doing 
>> that release is much smaller when trunk is releasable.
>> 
>> Any fix we make in 4.0 would be merged forward into trunk and could be fully 
>> verified there?  Probably not the best, but would give more confidence in a 
>> fix than otherwise without adding other major changes to 4.0?
>> 
>> -Jeremiah
>> 
>>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>>> 
>>>> 
>>>> Furthermore, we introduced a significant performance regression in all
>>>> lines of the software by increasing the number of LWT round-trips. Unless
>>>> we intend to leave this regression for a further year without _any_ release
>>>> offering a solution, we will need suitable verification mechanisms for
>>>> whatever fixes we deliver.
>>>> 
>>>> My view is that it is unacceptable to leave such a significant regression
>>>> unaddressed in all lines of software we intend to release for the
>>>> foreseeable future.
>>> 
>>> 
>>> I would like to expand a bit on this as I believe it might be important for
>>> people to have the full picture. The fix for  CASSANDRA-12126
>>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
>>> regression by increasing the number of LWT round-trips. Nevertheless, the
>>> patch introduced a flag to allow users to revert to the previous behavior
>>> (previous performance + consistency issue).
>>> 
>>> Also the patch did not address all paxos consistency issues. There are
>>> still some issues during

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Jeremiah D Jordan
I do not think fixing CASSANDRA-12126 is not a new feature.  I do think adding 
the ability to do “Cluster and Code Simulations” is a new feature.

-Jeremiah

> On Jul 13, 2021, at 8:37 AM, bened...@apache.org wrote:
> 
> Nothing we’re discussing constitutes a feature. We’re discussing stability 
> enhancements, and important bug fixes.
> 
> I think this disagreement is to some extent founded on our different premises 
> about what a patch release should contain, and this seems to be the fault of 
> incompletely specified documentation.
> 
> 1. The release lifecycle only forbids feature work from being developed in a 
> patch release, and only expressly includes bug fixes. Note that the document 
> even has a comment by the author suggesting that features may be backported 
> to a patch release from trunk (not something I agree with, but it 
> demonstrates the ambiguity of the definition).
> 2. There seems to be some conflation of size-of-change with the admissibility 
> wrt release lifecycle – I don’t think there are any criteria here, and it’s 
> open to the community’s case-by-case assessment. Whatever we do to fix the 
> bug in question will necessarily be a very significant piece of work itself, 
> for instance.
> 
> My interpretation of the release lifecycle document is that it is acceptable 
> to include this work in a patch release. My belief about its impact is that 
> it would contribute positively to the stability of the project’s 4.0 releases 
> over the lifecycle, and improve project velocity.
> 
> With respect to whether we can ship a fix to 12126 without validation, I 
> would be strongly opposed to this, and certainly would not produce a patch 
> myself in this way. Not only would it be burdensome (given the divergences in 
> the codebase), but I would not consider it acceptably safe (given the 
> divergence).
> 
> 
> From: Jeremiah D Jordan 
> Date: Tuesday, 13 July 2021 at 14:15
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> I tend to agree with Paulo that a major refactoring of some internal 
> interfaces sounds like something to be explicitly avoided in a patch release. 
>  I thought this was the type of change we all agreed we should stop letting 
> in to patch releases, and that we would attempt to release more often (once a 
> year) so changes that only go to trunk would get out faster?  Are we really 
> wanting to break that promise to ourselves before we even release 4.0?  To me 
> “I think we need this feature released faster” is not a reason to put it in 
> 4.0, it could be a reason to release 4.1 sooner.  This is where having a 
> releasable trunk helps, as if we decided as a project that some change was 
> worth a new major being released early the effort of doing that release is 
> much smaller when trunk is releasable.
> 
> Any fix we make in 4.0 would be merged forward into trunk and could be fully 
> verified there?  Probably not the best, but would give more confidence in a 
> fix than otherwise without adding other major changes to 4.0?
> 
> -Jeremiah
> 
>> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>> 
>>> 
>>> Furthermore, we introduced a significant performance regression in all
>>> lines of the software by increasing the number of LWT round-trips. Unless
>>> we intend to leave this regression for a further year without _any_ release
>>> offering a solution, we will need suitable verification mechanisms for
>>> whatever fixes we deliver.
>>> 
>>> My view is that it is unacceptable to leave such a significant regression
>>> unaddressed in all lines of software we intend to release for the
>>> foreseeable future.
>> 
>> 
>> I would like to expand a bit on this as I believe it might be important for
>> people to have the full picture. The fix for  CASSANDRA-12126
>> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
>> regression by increasing the number of LWT round-trips. Nevertheless, the
>> patch introduced a flag to allow users to revert to the previous behavior
>> (previous performance + consistency issue).
>> 
>> Also the patch did not address all paxos consistency issues. There are
>> still some issues during topology changes (and maybe in some other scenarios).
>> 
>> My understanding of Benedict's proposal is to fix paxos once and for all
>> without any performance regression.
>> 
>> That goal makes total sense to me. "Where do we do that?" is a more tricky
>> question.
>> 
>> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org wrote:
>> 
>>> Hmm. It occurs to me I’m not entire

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Nothing we’re discussing constitutes a feature. We’re discussing stability 
enhancements, and important bug fixes.

I think this disagreement is to some extent founded on our different premises 
about what a patch release should contain, and this seems to be the fault of 
incompletely specified documentation.

1. The release lifecycle only forbids feature work from being developed in a 
patch release, and only expressly includes bug fixes. Note that the document 
even has a comment by the author suggesting that features may be backported to 
a patch release from trunk (not something I agree with, but it demonstrates the 
ambiguity of the definition).
2. There seems to be some conflation of size-of-change with admissibility 
with respect to the release lifecycle – I don’t think there are any criteria 
here, and it’s open to the community’s case-by-case assessment. Whatever we do 
to fix the bug in question will necessarily be a very significant piece of work 
itself, for instance.

My interpretation of the release lifecycle document is that it is acceptable to 
include this work in a patch release. My belief about its impact is that it 
would contribute positively to the stability of the project’s 4.0 releases over 
the lifecycle, and improve project velocity.

With respect to whether we can ship a fix to 12126 without validation, I would 
be strongly opposed to this, and certainly would not produce a patch myself in 
this way. Not only would it be burdensome (given the divergences in the 
codebase), but I would not consider it acceptably safe (given the divergence).


From: Jeremiah D Jordan 
Date: Tuesday, 13 July 2021 at 14:15
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
I tend to agree with Paulo that a major refactoring of some internal interfaces 
sounds like something to be explicitly avoided in a patch release.  I thought 
this was the type of change we all agreed we should stop letting in to patch 
releases, and that we would attempt to release more often (once a year) so 
changes that only go to trunk would get out faster?  Are we really wanting to 
break that promise to ourselves before we even release 4.0?  To me “I think we 
need this feature released faster” is not a reason to put it in 4.0, it could 
be a reason to release 4.1 sooner.  This is where having a releasable trunk 
helps, as if we decided as a project that some change was worth a new major 
being released early the effort of doing that release is much smaller when 
trunk is releasable.

Any fix we make in 4.0 would be merged forward into trunk and could be fully 
verified there?  Probably not the best, but would give more confidence in a fix 
than otherwise without adding other major changes to 4.0?

-Jeremiah

> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
>
>>
>> Furthermore, we introduced a significant performance regression in all
>> lines of the software by increasing the number of LWT round-trips. Unless
>> we intend to leave this regression for a further year without _any_ release
>> offering a solution, we will need suitable verification mechanisms for
>> whatever fixes we deliver.
>>
>> My view is that it is unacceptable to leave such a significant regression
>> unaddressed in all lines of software we intend to release for the
>> foreseeable future.
>
>
> I would like to expand a bit on this as I believe it might be important for
> people to have the full picture. The fix for  CASSANDRA-12126
> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
> regression by increasing the number of LWT round-trips. Nevertheless, the
> patch introduced a flag to allow users to revert to the previous behavior
> (previous performance + consistency issue).
>
> Also the patch did not address all paxos consistency issues. There are
> still some issues during topology changes (and maybe in some other scenarios).
>
> My understanding of Benedict's proposal is to fix paxos once and for all
> without any performance regression.
>
> That goal makes total sense to me. "Where do we do that?" is a more tricky
> question.
>
> On Tue, 13 Jul 2021 at 14:46, bened...@apache.org wrote:
>
>> Hmm. It occurs to me I’m not entirely sure how our new release process is
>> going to work.
>>
>> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
>> Or will 4.0 be our only active line of software for the next year?
>>
>> Either way, I bet my bottom dollar there will come some regret if we
>> introduce such divergence between the two most active branches we maintain,
>> so early in their lifecycles. If we invest significant resources in
>> improved testing using this framework (which I very much expect) then
>> branches that are not compatible will not benefit, likely reducin

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Paulo Motta
> "Where do we do that?" is a more tricky question.

I am fully aware of the importance of this testing infra to fix
CASSANDRA-12126 with higher confidence, and of Benedict's ability to
deliver a correct and safe patch.

The question is whether we want to repeat the old practice of including
potentially disruptive changes in minor versions, or whether we are committed
to changing our culture, no matter how confident we are that the change is
correct. In my view, if we set a precedent with this change, we are basically
saying we will stick to the old practices and are not committed to providing
long-term stability to our users.

In my view CEP-10 is not a strict blocker to CASSANDRA-12126, since we can
verify it by other means and add additional verification on 4.1, as
Jeremiah suggested. But even if it were, the community has worked around the
limitations of LWT for several years; will one more year until we fix these
limitations really make a difference?

On Tue, 13 Jul 2021 at 10:15, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

> I tend to agree with Paulo that a major refactoring of some internal
> interfaces sounds like something to be explicitly avoided in a patch
> release.  I thought this was the type of change we all agreed we should
> stop letting in to patch releases, and that we would attempt to release
> more often (once a year) so changes that only go to trunk would get out
> faster?  Are we really wanting to break that promise to ourselves before we
> even release 4.0?  To me “I think we need this feature released faster” is
> not a reason to put it in 4.0, it could be a reason to release 4.1 sooner.
> This is where having a releasable trunk helps, as if we decided as a
> project that some change was worth a new major being released early the
> effort of doing that release is much smaller when trunk is releasable.
>
> Any fix we make in 4.0 would be merged forward into trunk and could be
> fully verified there?  Probably not the best, but would give more
> confidence in a fix than otherwise without adding other major changes to
> 4.0?
>
> -Jeremiah
>
> > On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
> >
> >>
> >> Furthermore, we introduced a significant performance regression in all
> >> lines of the software by increasing the number of LWT round-trips. Unless
> >> we intend to leave this regression for a further year without _any_ release
> >> offering a solution, we will need suitable verification mechanisms for
> >> whatever fixes we deliver.
> >>
> >> My view is that it is unacceptable to leave such a significant regression
> >> unaddressed in all lines of software we intend to release for the
> >> foreseeable future.
> >
> >
> > I would like to expand a bit on this as I believe it might be important for
> > people to have the full picture. The fix for CASSANDRA-12126
> > <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
> > regression by increasing the number of LWT round-trips. Nevertheless, the
> > patch introduced a flag to allow users to revert to the previous behavior
> > (previous performance + consistency issue).
> >
> > Also the patch did not address all paxos consistency issues. There are
> > still some issues during topology changes (and maybe in some other scenarios).
> >
> > My understanding of Benedict's proposal is to fix paxos once and for all
> > without any performance regression.
> >
> > That goal makes total sense to me. "Where do we do that?" is a more tricky
> > question.
> >
> > On Tue, 13 Jul 2021 at 14:46, bened...@apache.org wrote:
> >
> >> Hmm. It occurs to me I’m not entirely sure how our new release process is
> >> going to work.
> >>
> >> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
> >> Or will 4.0 be our only active line of software for the next year?
> >>
> >> Either way, I bet my bottom dollar there will come some regret if we
> >> introduce such divergence between the two most active branches we maintain,
> >> so early in their lifecycles. If we invest significant resources in
> >> improved testing using this framework (which I very much expect) then
> >> branches that are not compatible will not benefit, likely reducing their
> >> quality; and the risk of backports will increase, due to divergence.
> >>
> >> Altogether, I think it would be a huge mistake. But if we will be shipping
> >> releases soon that can fix these aforementioned regressions, I won’t
> >> campaign for

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Jeremiah D Jordan
I tend to agree with Paulo that a major refactoring of some internal interfaces 
sounds like something to be explicitly avoided in a patch release.  I thought 
this was the type of change we all agreed we should stop letting into patch 
releases, and that we would attempt to release more often (once a year) so 
changes that only go to trunk would get out faster?  Are we really wanting to 
break that promise to ourselves before we even release 4.0?  To me “I think we 
need this feature released faster” is not a reason to put it in 4.0; it could 
be a reason to release 4.1 sooner.  This is where having a releasable trunk 
helps: if we decided as a project that some change was worth a new major 
being released early, the effort of doing that release is much smaller when 
trunk is releasable.

Any fix we make in 4.0 would be merged forward into trunk and could be fully 
verified there?  Probably not the best, but would give more confidence in a fix 
than otherwise without adding other major changes to 4.0?

-Jeremiah

> On Jul 13, 2021, at 7:59 AM, Benjamin Lerer  wrote:
> 
>> 
>> Furthermore, we introduced a significant performance regression in all
>> lines of the software by increasing the number of LWT round-trips. Unless
>> we intend to leave this regression for a further year without _any_ release
>> offering a solution, we will need suitable verification mechanisms for
>> whatever fixes we deliver.
>> 
>> My view is that it is unacceptable to leave such a significant regression
>> unaddressed in all lines of software we intend to release for the
>> foreseeable future.
> 
> 
> I would like to expand a bit on this as I believe it might be important for
> people to have the full picture. The fix for  CASSANDRA-12126
> <https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
> regression by increasing the number of LWT round-trips. Nevertheless, the
> patch introduced a flag to allow users to revert to the previous behavior
> (previous performance + consistency issue).
> 
> Also the patch did not address all paxos consistency issues. There are
> still some issues during topology changes (and maybe in some other scenarios).
> 
> My understanding of Benedict's proposal is to fix paxos once and for all
> without any performance regression.
> 
> That goal makes total sense to me. "Where do we do that?" is a more tricky
> question.
> 
> Le mar. 13 juil. 2021 à 14:46, bened...@apache.org  a
> écrit :
> 
>> Hmm. It occurs to me I’m not entirely sure how our new release process is
>> going to work.
>> 
>> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
>> Or will 4.0 be our only active line of software for the next year?
>> 
>> Either way, I bet my bottom dollar there will come some regret if we
>> introduce such divergence between the two most active branches we maintain,
>> so early in their lifecycles. If we invest significant resources in
>> improved testing using this framework (which I very much expect) then
>> branches that are not compatible will not benefit, likely reducing their
>> quality; and the risk of backports will increase, due to divergence.
>> 
>> Altogether, I think it would be a huge mistake. But if we will be shipping
>> releases soon that can fix these aforementioned regressions, I won’t
>> campaign for it.
>> 
>> 
>> 
>> From: bened...@apache.org 
>> Date: Tuesday, 13 July 2021 at 13:31
>> To: dev@cassandra.apache.org 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>> No change is without risk; we have introduced serious regressions with bug
>> fixes to patch releases. The overall risk to the release lifecycle is
>> reduced significantly in my opinion, as we reduce the likelihood of
>> introducing regressions, and can use the same test infrastructure across
>> all of the actively developed releases, increasing our confidence in 4.0.x
>> releases.
>> 
>> Furthermore, we introduced a significant performance regression in all
>> lines of the software by increasing the number of LWT round-trips. Unless
>> we intend to leave this regression for a further year without _any_ release
>> offering a solution, we will need suitable verification mechanisms for
>> whatever fixes we deliver.
>> 
>> My view is that it is unacceptable to leave such a significant regression
>> unaddressed in all lines of software we intend to release for the
>> foreseeable future.
>> 
>> 
>> From: Paulo Motta 
>> Date: Tuesday, 13 July 2021 at 13:21
>> To: Cassandra DEV 
>> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>>> No, in my opini

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Benjamin Lerer
>
> Furthermore, we introduced a significant performance regression in all
> lines of the software by increasing the number of LWT round-trips. Unless
> we intend to leave this regression for a further year without _any_ release
> offering a solution, we will need suitable verification mechanisms for
> whatever fixes we deliver.
>
> My view is that it is unacceptable to leave such a significant regression
> unaddressed in all lines of software we intend to release for the
> foreseeable future.


I would like to expand a bit on this as I believe it might be important for
people to have the full picture. The fix for CASSANDRA-12126
<https://issues.apache.org/jira/browse/CASSANDRA-12126> introduced a
regression by increasing the number of LWT round-trips. Nevertheless, the
patch introduced a flag to allow users to revert to the previous behavior
(previous performance + consistency issue).

Also, the patch did not address all Paxos consistency issues. There are
still some issues during topology changes (and maybe in some other scenarios).

My understanding of Benedict's proposal is to fix Paxos once and for all
without any performance regression.

That goal makes total sense to me. "Where do we do that?" is a trickier
question.

Le mar. 13 juil. 2021 à 14:46, bened...@apache.org  a
écrit :

> Hmm. It occurs to me I’m not entirely sure how our new release process is
> going to work.
>
> Will we be releasing 4.1 builds immediately, as part of shippable trunk?
> Or will 4.0 be our only active line of software for the next year?
>
> Either way, I bet my bottom dollar there will come some regret if we
> introduce such divergence between the two most active branches we maintain,
> so early in their lifecycles. If we invest significant resources in
> improved testing using this framework (which I very much expect) then
> branches that are not compatible will not benefit, likely reducing their
> quality; and the risk of backports will increase, due to divergence.
>
> Altogether, I think it would be a huge mistake. But if we will be shipping
> releases soon that can fix these aforementioned regressions, I won’t
> campaign for it.
>
>
>
> From: bened...@apache.org 
> Date: Tuesday, 13 July 2021 at 13:31
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> No change is without risk; we have introduced serious regressions with bug
> fixes to patch releases. The overall risk to the release lifecycle is
> reduced significantly in my opinion, as we reduce the likelihood of
> introducing regressions, and can use the same test infrastructure across
> all of the actively developed releases, increasing our confidence in 4.0.x
> releases.
>
> Furthermore, we introduced a significant performance regression in all
> lines of the software by increasing the number of LWT round-trips. Unless
> we intend to leave this regression for a further year without _any_ release
> offering a solution, we will need suitable verification mechanisms for
> whatever fixes we deliver.
>
> My view is that it is unacceptable to leave such a significant regression
> unaddressed in all lines of software we intend to release for the
> foreseeable future.
>
>
> From: Paulo Motta 
> Date: Tuesday, 13 July 2021 at 13:21
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > No, in my opinion the target should be 4.0.x. We are reaching for a
> shippable trunk and this has no public API impacts. This work is IMO
> central to achieving a shippable trunk, either way. The only reason I do
> not target 3.x is that it would be too burdensome.
>
> In my limited view of the proposal, a major refactor of internal
> concurrency APIs to support the testing facility potentially risks the
> stability of a minor release, something we've been wanting to avoid with
> our focus on stability. So I'd prefer this to go in trunk/4.1, otherwise
> we will set a precedent for including non-bugfix changes in minor
> versions, something I think we should avoid.
>
> In the past we've been lenient about including seemingly harmless internal
> changes that caused client impact and we should be careful to avoid this in
> the future. To prevent this I think we should take a strict approach and
> only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.
>
> I'd go one step further and propose that any CEPs, which are generally
> about new features, major API changes or internal refactorings, should only
> be allowed in subsequent major versions, unless an explicit exception is
> granted.
>
> Em ter., 13 de jul. de 2021 às 07:11, bened...@apache.org <
> bened...@apache.org> escreveu:
>
> > Perhaps it’s worth looking forward at the roadmap tha

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> This is a fair point.  But a CEP isn't required to solve this.

I think the work contained in this CEP is necessary to solve this problem 
safely, and I have some empirical evidence in favour of this assertion.


From: Brandon Williams 
Date: Tuesday, 13 July 2021 at 13:39
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
On Tue, Jul 13, 2021 at 7:31 AM bened...@apache.org  wrote:
> Furthermore, we introduced a significant performance regression in all lines 
> of the software by increasing the number of LWT round-trips. Unless we intend 
> to leave this regression for a further year without _any_ release offering a 
> solution, we will need suitable verification mechanisms for whatever fixes we 
> deliver.
>
> My view is that it is unacceptable to leave such a significant regression 
> unaddressed in all lines of software we intend to release for the foreseeable 
> future.

This is a fair point.  But a CEP isn't required to solve this.



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Brandon Williams
On Tue, Jul 13, 2021 at 7:31 AM bened...@apache.org  wrote:
> Furthermore, we introduced a significant performance regression in all lines 
> of the software by increasing the number of LWT round-trips. Unless we intend 
> to leave this regression for a further year without _any_ release offering a 
> solution, we will need suitable verification mechanisms for whatever fixes we 
> deliver.
>
> My view is that it is unacceptable to leave such a significant regression 
> unaddressed in all lines of software we intend to release for the foreseeable 
> future.

This is a fair point.  But a CEP isn't required to solve this.




Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
No change is without risk; we have introduced serious regressions with bug 
fixes to patch releases. The overall risk to the release lifecycle is reduced 
significantly in my opinion, as we reduce the likelihood of introducing 
regressions, and can use the same test infrastructure across all of the 
actively developed releases, increasing our confidence in 4.0.x releases.

Furthermore, we introduced a significant performance regression in all lines of 
the software by increasing the number of LWT round-trips. Unless we intend to 
leave this regression for a further year without _any_ release offering a 
solution, we will need suitable verification mechanisms for whatever fixes we 
deliver.

My view is that it is unacceptable to leave such a significant regression 
unaddressed in all lines of software we intend to release for the foreseeable 
future.


From: Paulo Motta 
Date: Tuesday, 13 July 2021 at 13:21
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> No, in my opinion the target should be 4.0.x. We are reaching for a
shippable trunk and this has no public API impacts. This work is IMO
central to achieving a shippable trunk, either way. The only reason I do
not target 3.x is that it would be too burdensome.

In my limited view of the proposal, a major refactor of internal
concurrency APIs to support the testing facility potentially risks the
stability of a minor release, something we've been wanting to avoid with
our focus on stability. So I'd prefer this to go in trunk/4.1, otherwise
we will set a precedent for including non-bugfix changes in minor
versions, something I think we should avoid.

In the past we've been lenient about including seemingly harmless internal
changes that caused client impact and we should be careful to avoid this in
the future. To prevent this I think we should take a strict approach and
only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.

I'd go one step further and propose that any CEPs, which are generally
about new features, major API changes or internal refactorings, should only
be allowed in subsequent major versions, unless an explicit exception is
granted.

Em ter., 13 de jul. de 2021 às 07:11, bened...@apache.org <
bened...@apache.org> escreveu:

> Perhaps it’s worth looking forward at the roadmap that we plan to develop,
> and consider whether such a facility would be welcome for proving their
> safety, and we can then worry about evolving the specifics of any API(s)
> together as we deploy the capability? Looking ahead, there are very few
> major features I wouldn’t want to see exercised with this approach, given
> the choice.
>
> The LWT Verifier by itself is an integration test that covers many of the
> affected subsystems, including sstables, memtables and repair. But we will
> have the ability to introduce dedicated verification for each of these
> features and systems, and we will necessarily produce more robust code
> (repair is a great example of a brittle system that would be impossible to
> produce with such an adversarial test system)
>
>
> *Query side improvements:*
>
>   * Storage Attached Index or SAI. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>   * Add support for OR predicates in the CQL where clause
>   * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs
> in GROUP BY clause
>   * Ability to read the TTL and WRITE TIME of an element in a collection
> (CASSANDRA-8877)
>   * Multi-Partition LWTs
>   * Materialized views hardening: Addressing the different Materialized
> Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>
> *Security improvements:*
>
>   * SSTables encryption (CASSANDRA-9633)
>   * Add support for Dynamic Data Masking (CEP pending)
>   * Allow the creation of roles that have the ability to assign arbitrary
> privileges, or scoped privileges without also granting those roles access
> to database objects.
>   * Filter rows from system and system_schema based on users permissions
> (CASSANDRA-15871)
>
> *Performance improvements:*
>
>   * Trie-based index format (CEP pending)
>   * Trie-based memtables (CEP pending)
>   * Paxos improvements: Paxos / LWT implementation that would enable the
> database to serve serial writes with two round-trips and serial reads with
> one round-trip in the uncontended case
>
> *Safety/Usability improvements:*
>
>   * Guardrails. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>   * Add ability to track state in repair (CASSANDRA-15399)
>   * Repair coordinator improvements (CASSANDRA-15399)
>   * Make incremental backup configurable per keyspace and table
> (CASSANDRA-15402)
>   * Add ability to blacklis

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Benjamin Lerer
> (CASSANDRA-12106)
> >   * Add default and required keyspace replication options
> (CASSANDRA-14557)
> >   * Transactional Cluster Metadata: Use of transactions to propagate
> > cluster metadata
> >   * Downgrade-ability: Ability to downgrade in the event that
> > a serious issue has been identified
> >
> > *Pluggability improvements:*
> >
> >   * Pluggable schema manager (CEP pending)
> >   * Pluggable filesystem (CEP pending)
> >   * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can
> be
> > found at
> >
> >
> https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
> >   * Memtable API (CEP pending). The goal being to allow improvements such
> > as CASSANDRA-13981 to be easily plugged into Cassandra
> >
> > *Memtable pluggable implementation:*
> >
> >   * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
> >
> >
> >
> >
> > From: bened...@apache.org 
> > Date: Tuesday, 13 July 2021 at 10:51
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Ach, editing code in the email editor isn’t smart when editors all have
> > different meanings for key combinations (accidentally hit send), but you
> > get the idea. The simulator would intercept these thread executions, the
> > memory accesses for the annotated field, and evaluate them so that in
> some
> > cases the assertions would fail.
> >
> > This is obviously a toy example that is not very interesting, but the
> main
> > real example we have is too complicated to produce a snippet to
> > demonstrate. In my view, the long term outcome of this work is likely the
> > enablement of many unit tests that are a little more complicated than
> this,
> > on less obvious code.
> >
> > But the headline goal of the CEP is not. By itself, the LWT Verifier
> > demonstrates the power and utility of the work. I don’t believe it is
> > terribly helpful to focus on secondary justifications like the example I
> > gave. For me, the _ability_ to prove the correctness of difficult but
> > critical systems is justification enough, whether or not we deliver a
> > simple API as part of the CEP.
> >
> >
> >
> > From: bened...@apache.org 
> > Date: Tuesday, 13 July 2021 at 10:43
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> > > Should target release be 4.1. (not 4.0.x) ?
> >
> >
> >
> > No, in my opinion the target should be 4.0.x. We are reaching for a
> > shippable trunk and this has no public API impacts. This work is IMO
> > central to achieving a shippable trunk, either way. The only reason I do
> > not target 3.x is that it would be too burdensome.
> >
> > > My concern is that changing code and tests at the same time risks
> > regressions…
> >
> >
> >
> > I’ve never heard this position before. Would you care to elaborate? It is
> > quite normal for us to update tests alongside changes to the code.
> >
> > > And seconding Benjamin's comments… some documentation on how to write a
> > test, and a simple test example, that this CEP then allows us to write
> > would help a lot (a la "working backwards").
> >
> > 1) This work is to _enable_ the development of tests, with the only test
> > originally planned to arrive alongside it the fairly sophisticated LWT
> > Verifier. This is something we have sorely needed as a project, as we
> have
> > had serious correctness violations for multiple years. This broad
> category
> > of integrated test for verifying correctness is the main goal of the work
> > and is not easily condensed into an example snippet.
> > 2) It is _possible_ that some simple and fluid APIs will be introduced in
> > a later phase of this work, but they haven’t been designed yet, so I
> cannot
> > share snippets.
> >
> > In principle, however, you would be able to do something like:
> >
> > @Nemesis volatile int x = 0;
> > int foo() {
> >     x = x + 1;   // read-modify-write on a volatile field: not atomic
> >     return x;
> > }
> >
> > @Test
> > void test() throws Exception {
> >     Future<Integer> f1 = executor.submit(() -> foo());
> >     Future<Integer> f2 = executor.submit(() -> foo());
> >     Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
> > }
> >
> >
> > From: Mick Semb Wever 
> > Date: Tuesday, 13 July 2021 at 10:28
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-10: Cluster and Code Si

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Paulo Motta
> No, in my opinion the target should be 4.0.x. We are reaching for a
> shippable trunk and this has no public API impacts. This work is IMO
> central to achieving a shippable trunk, either way. The only reason I do
> not target 3.x is that it would be too burdensome.

In my limited view of the proposal, a major refactor of internal
concurrency APIs to support the testing facility potentially risks the
stability of a minor release, something we've been wanting to avoid with
our focus on stability. So I'd prefer this to go in trunk/4.1, otherwise
we will set a precedent for including non-bugfix changes in minor
versions, something I think we should avoid.

In the past we've been lenient about including seemingly harmless internal
changes that caused client impact and we should be careful to avoid this in
the future. To prevent this I think we should take a strict approach and
only accept bug fixes in minor (i.e. 4.0.x) versions moving forward.

I'd go one step further and propose that any CEPs, which are generally
about new features, major API changes or internal refactorings, should only
be allowed in subsequent major versions, unless an explicit exception is
granted.

Em ter., 13 de jul. de 2021 às 07:11, bened...@apache.org <
bened...@apache.org> escreveu:

> Perhaps it’s worth looking forward at the roadmap that we plan to develop,
> and consider whether such a facility would be welcome for proving their
> safety, and we can then worry about evolving the specifics of any API(s)
> together as we deploy the capability? Looking ahead, there are very few
> major features I wouldn’t want to see exercised with this approach, given
> the choice.
>
> The LWT Verifier by itself is an integration test that covers many of the
> affected subsystems, including sstables, memtables and repair. But we will
> have the ability to introduce dedicated verification for each of these
> features and systems, and we will necessarily produce more robust code
> (repair is a great example of a brittle system that would be impossible to
> produce with such an adversarial test system)
>
>
> *Query side improvements:*
>
>   * Storage Attached Index or SAI. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
>   * Add support for OR predicates in the CQL where clause
>   * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs
> in GROUP BY clause
>   * Ability to read the TTL and WRITE TIME of an element in a collection
> (CASSANDRA-8877)
>   * Multi-Partition LWTs
>   * Materialized views hardening: Addressing the different Materialized
> Views issues (see CASSANDRA-15921 and [1] for some of the work involved)
>
> *Security improvements:*
>
>   * SSTables encryption (CASSANDRA-9633)
>   * Add support for Dynamic Data Masking (CEP pending)
>   * Allow the creation of roles that have the ability to assign arbitrary
> privileges, or scoped privileges without also granting those roles access
> to database objects.
>   * Filter rows from system and system_schema based on users permissions
> (CASSANDRA-15871)
>
> *Performance improvements:*
>
>   * Trie-based index format (CEP pending)
>   * Trie-based memtables (CEP pending)
>   * Paxos improvements: Paxos / LWT implementation that would enable the
> database to serve serial writes with two round-trips and serial reads with
> one round-trip in the uncontended case
>
> *Safety/Usability improvements:*
>
>   * Guardrails. The CEP can be found at
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
>   * Add ability to track state in repair (CASSANDRA-15399)
>   * Repair coordinator improvements (CASSANDRA-15399)
>   * Make incremental backup configurable per keyspace and table
> (CASSANDRA-15402)
>   * Add ability to blacklist a CQL partition so all requests are ignored
> (CASSANDRA-12106)
>   * Add default and required keyspace replication options (CASSANDRA-14557)
>   * Transactional Cluster Metadata: Use of transactions to propagate
> cluster metadata
>   * Downgrade-ability: Ability to downgrade in the event that
> a serious issue has been identified
>
> *Pluggability improvements:*
>
>   * Pluggable schema manager (CEP pending)
>   * Pluggable filesystem (CEP pending)
>   * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be
> found at
>
> https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
>   * Memtable API (CEP pending). The goal being to allow improvements such
> as CASSANDRA-13981 to be easily plugged into Cassandra
>
> *Memtable pluggable implementation:*
>
>   * Enable Cassandra for Persistent Memory (CASSANDRA-13981)
>
>
>
>
> From: bened...@ap

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Perhaps it’s worth looking forward at the roadmap that we plan to develop, and 
consider whether such a facility would be welcome for proving their safety, and 
we can then worry about evolving the specifics of any API(s) together as we 
deploy the capability? Looking ahead, there are very few major features I 
wouldn’t want to see exercised with this approach, given the choice.

The LWT Verifier by itself is an integration test that covers many of the 
affected subsystems, including sstables, memtables and repair. But we will have 
the ability to introduce dedicated verification for each of these features and 
systems, and we will necessarily produce more robust code (repair is a great 
example of a brittle system that would be impossible to produce with such an 
adversarial test system).


*Query side improvements:*

  * Storage Attached Index or SAI. The CEP can be found at
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
  * Add support for OR predicates in the CQL where clause
  * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs
in GROUP BY clause
  * Ability to read the TTL and WRITE TIME of an element in a collection
(CASSANDRA-8877)
  * Multi-Partition LWTs
  * Materialized views hardening: Addressing the different Materialized
Views issues (see CASSANDRA-15921 and [1] for some of the work involved)

*Security improvements:*

  * SSTables encryption (CASSANDRA-9633)
  * Add support for Dynamic Data Masking (CEP pending)
  * Allow the creation of roles that have the ability to assign arbitrary
privileges, or scoped privileges without also granting those roles access
to database objects.
  * Filter rows from system and system_schema based on users permissions
(CASSANDRA-15871)

*Performance improvements:*

  * Trie-based index format (CEP pending)
  * Trie-based memtables (CEP pending)
  * Paxos improvements: Paxos / LWT implementation that would enable the
database to serve serial writes with two round-trips and serial reads with
one round-trip in the uncontended case

*Safety/Usability improvements:*

  * Guardrails. The CEP can be found at
https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
  * Add ability to track state in repair (CASSANDRA-15399)
  * Repair coordinator improvements (CASSANDRA-15399)
  * Make incremental backup configurable per keyspace and table
(CASSANDRA-15402)
  * Add ability to blacklist a CQL partition so all requests are ignored
(CASSANDRA-12106)
  * Add default and required keyspace replication options (CASSANDRA-14557)
  * Transactional Cluster Metadata: Use of transactions to propagate
cluster metadata
  * Downgrade-ability: Ability to downgrade in the event that
a serious issue has been identified

*Pluggability improvements:*

  * Pluggable schema manager (CEP pending)
  * Pluggable filesystem (CEP pending)
  * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be
found at
https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
  * Memtable API (CEP pending). The goal being to allow improvements such
as CASSANDRA-13981 to be easily plugged into Cassandra

*Memtable pluggable implementation:*

  * Enable Cassandra for Persistent Memory (CASSANDRA-13981)




From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 10:51
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
Ach, editing code in the email editor isn’t smart when editors all have 
different meanings for key combinations (accidentally hit send), but you get 
the idea. The simulator would intercept these thread executions, the memory 
accesses for the annotated field, and evaluate them so that in some cases the 
assertions would fail.

This is obviously a toy example that is not very interesting, but the main real 
example we have is too complicated to produce a snippet to demonstrate. In my 
view, the long term outcome of this work is likely the enablement of many unit 
tests that are a little more complicated than this, on less obvious code.

But the headline goal of the CEP is not. By itself, the LWT Verifier 
demonstrates the power and utility of the work. I don’t believe it is terribly 
helpful to focus on secondary justifications like the example I gave. For me, 
the _ability_ to prove the correctness of difficult but critical systems is 
justification enough, whether or not we deliver a simple API as part of the CEP.



From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 10:43
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> Should target release be 4.1. (not 4.0.x) ?



No, in my opinion the target should be 4.0.x. We are reaching for a shippable 
trunk and this has no public API impacts. This work is IMO central to achieving 
a shippable trunk, either way. The only reason I do not target 3.x is that it 
would be too burdensome.

> My concern is that changin

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Ach, editing code in the email editor isn’t smart when editors all have 
different meanings for key combinations (accidentally hit send), but you get 
the idea. The simulator would intercept these thread executions, the memory 
accesses for the annotated field, and evaluate them so that in some cases the 
assertions would fail.
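
To make the "in some cases" concrete, here is a minimal sketch in plain Java of what exploring the interleavings of the toy example could look like. It is not the simulator and does not use the proposed (not yet designed) API: the class name, the schedule strings and the three-step model of foo() (read x, write x, read x for the return value) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a hand-rolled enumeration of interleavings,
// not the CEP-10 simulator. Two simulated threads, A and B, each run foo().
class InterleavingSketch {
    // foo() on a volatile x is modelled as three atomic steps:
    //   step 0: tmp = x;   step 1: x = tmp + 1;   step 2: result = x;
    static final int STEPS = 3;

    /** Runs one schedule such as "AABBAB" and returns {resultA, resultB}. */
    static int[] run(String schedule) {
        int x = 0;
        int[] pc = new int[2];      // next step of each simulated thread
        int[] tmp = new int[2];     // value each thread read in step 0
        int[] result = new int[2];  // value each thread returns from foo()
        for (char c : schedule.toCharArray()) {
            int t = c - 'A';
            switch (pc[t]++) {
                case 0: tmp[t] = x; break;
                case 1: x = tmp[t] + 1; break;
                case 2: result[t] = x; break;
            }
        }
        return result;
    }

    /** Collects every interleaving of a remaining A steps and b remaining B steps. */
    static void schedules(String prefix, int a, int b, List<String> out) {
        if (a == 0 && b == 0) { out.add(prefix); return; }
        if (a > 0) schedules(prefix + 'A', a - 1, b, out);
        if (b > 0) schedules(prefix + 'B', a, b - 1, out);
    }

    /** Returns the schedules for which the toy assertion does not hold. */
    static List<String> failingSchedules() {
        List<String> all = new ArrayList<>();
        schedules("", STEPS, STEPS, all);
        List<String> failing = new ArrayList<>();
        for (String s : all) {
            int[] r = run(s);
            if (!(r[0] == 1 || r[1] == 1)) failing.add(s);
        }
        return failing;
    }

    public static void main(String[] args) {
        System.out.println(failingSchedules());
    }
}
```

In this model the assertion fails only when one thread completes its write before the other reads, and then reads its own return value after the other's write, so both calls return 2. That is 4 of the 20 possible schedules, which a conventional test run on real threads would be unlikely to hit reliably.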

This is obviously a toy example that is not very interesting, but the main real 
example we have is too complicated to produce a snippet to demonstrate. In my 
view, the long term outcome of this work is likely the enablement of many unit 
tests that are a little more complicated than this, on less obvious code.

But that is not the headline goal of the CEP. By itself, the LWT Verifier 
demonstrates the power and utility of the work. I don’t believe it is terribly 
helpful to focus on secondary justifications like the example I gave. For me, 
the _ability_ to prove the correctness of difficult but critical systems is 
justification enough, whether or not we deliver a simple API as part of the CEP.



From: bened...@apache.org 
Date: Tuesday, 13 July 2021 at 10:43
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
> Should target release be 4.1. (not 4.0.x) ?


No, in my opinion the target should be 4.0.x. We are reaching for a shippable 
trunk and this has no public API impacts. This work is IMO central to achieving 
a shippable trunk, either way. The only reason I do not target 3.x is that it 
would be too burdensome.

> My concern is that changing code and tests at the same time risks regressions…


I’ve never heard this position before. Would you care to elaborate? It is quite 
normal for us to update tests alongside changes to the code.

> And seconding Benjamin's comments… some documentation on how to write a test, 
> and a simple test example, that this CEP then allows us to write would help a 
> lot (a la "working backwards").

1) This work is to _enable_ the development of tests, with the only test 
originally planned to arrive alongside it the fairly sophisticated LWT 
Verifier. This is something we have sorely needed as a project, as we have had 
serious correctness violations for multiple years. This broad category of 
integrated test for verifying correctness is the main goal of the work and is 
not easily condensed into an example snippet.
2) It is _possible_ that some simple and fluid APIs will be introduced in a 
later phase of this work, but they haven’t been designed yet, so I cannot share 
snippets.

In principle, however, you would be able to do something like:

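@Nemesis volatile int x = 0;   // illustrative annotation: the simulator intercepts accesses to x

int foo() {
    x = x + 1;                 // read, then write: not atomic, even on a volatile field
    return x;                  // a second read, which may observe another thread's write
}

@Test
void test() throws Exception {
    Future<Integer> f1 = executor.submit(() -> foo());
    Future<Integer> f2 = executor.submit(() -> foo());
    // Fails when both calls return 2, which reordering the accesses can provoke.
    Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
}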


From: Mick Semb Wever 
Date: Tuesday, 13 July 2021 at 10:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
>
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEP are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
>
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
>
> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>



Should target release be 4.1. (not 4.0.x) ?

I'd be interested in seeing a rough timeline/plan of how the proposed
changes are to be defined in JIRAs and ordered.

I'd like to hear a bit more about the test plan. Not so much about how
the CEP itself improves testability of the project, but for example
the testing required to be in place to introduce the changes of the
CEP (and if it already exists, where). My concern is that changing
code and tests at the same time risks regressions…

And seconding Benjamin's comments… some documentation on how to write
a test, and a simple test example, that this CEP then allows us to
write would help a lot (a la "working backwards").

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
> Should target release be 4.1. (not 4.0.x) ?

No, in my opinion the target should be 4.0.x. We are reaching for a shippable 
trunk and this has no public API impacts. This work is IMO central to achieving 
a shippable trunk, either way. The only reason I do not target 3.x is that it 
would be too burdensome.

> My concern is that changing code and tests at the same time risks regressions…

I’ve never heard this position before. Would you care to elaborate? It is quite 
normal for us to update tests alongside changes to the code.

> And seconding Benjamin's comments… some documentation on how to write a test, 
> and a simple test example, that this CEP then allows us to write would help a 
> lot (a la "working backwards").

1) This work is to _enable_ the development of tests; the only test 
originally planned to arrive alongside it is the fairly sophisticated LWT 
Verifier. This is something we have sorely needed as a project, as we have had 
serious correctness violations for multiple years. This broad category of 
integrated test for verifying correctness is the main goal of the work and is 
not easily condensed into an example snippet.
2) It is _possible_ that some simple and fluid APIs will be introduced in a 
later phase of this work, but they haven’t been designed yet, so I cannot share 
snippets.

In principle, however, you would be able to do something like:

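@Nemesis volatile int x = 0;   // illustrative annotation: the simulator intercepts accesses to x

int foo() {
    x = x + 1;                 // read, then write: not atomic, even on a volatile field
    return x;                  // a second read, which may observe another thread's write
}

@Test
void test() throws Exception {
    Future<Integer> f1 = executor.submit(() -> foo());
    Future<Integer> f2 = executor.submit(() -> foo());
    // Fails when both calls return 2, which reordering the accesses can provoke.
    Assert.assertTrue(f1.get() == 1 || f2.get() == 1);
}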


From: Mick Semb Wever 
Date: Tuesday, 13 July 2021 at 10:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
>
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
>
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEP are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
>
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
>
> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>



Should target release be 4.1. (not 4.0.x) ?

I'd be interested in seeing a rough timeline/plan of how the proposed
changes are to be defined in JIRAs and ordered.

I'd like to hear a bit more about the test plan. Not so much about how
the CEP itself improves testability of the project, but for example
the testing required to be in place to introduce the changes of the
CEP (and if it already exists, where). My concern is that changing
code and tests at the same time risks regressions…

And seconding Benjamin's comments… some documentation on how to write
a test, and a simple test example, that this CEP then allows us to
write would help a lot (a la "working backwards").



Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Mick Semb Wever
>
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
>
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEP are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
>
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
>
> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>



Should the target release be 4.1 (not 4.0.x)?

I'd be interested in seeing a rough timeline/plan of how the proposed
changes are to be defined in JIRAs and ordered.

I'd like to hear a bit more about the test plan. Not so much about how
the CEP itself improves testability of the project, but for example
the testing required to be in place to introduce the changes of the
CEP (and if it already exists, where). My concern is that changing
code and tests at the same time risks regressions…

And seconding Benjamin's comments… some documentation on how to write
a test, and a simple test example, that this CEP then allows us to
write would help a lot (a la "working backwards").




Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Hi Benjamin,

The concurrency constructs listed are all _blocking_ concurrency primitives, 
i.e. they put threads to sleep and wake them up. Since the goal of this work is 
pseudorandom execution of the application, trapping thread events is a central 
feature.

The ability to mock the file system is only to ensure the execution is 
_deterministic_. Otherwise a cluster running billions of simulations would be 
almost useless, as you would not readily be able to reproduce the sequence on a 
local machine. The execution order is extremely brittle, with even a different 
patch release of the JVM being able to produce a different sequence of 
execution (in some cases, at least – no doubt many patch releases do not have 
ordering impacts).

The best example of this work is the LWT linearizability verifier that will be 
included with it, which is quite a simple test to put together with the 
simulator: you simply issue some LWT reads and writes to a cluster, and the 
simulator intercepts* every message and thread (and in some specific relevant 
cases, memory access) event, and executes them in pseudorandom order. Each run 
exhibits unique behaviour, exploring different edge cases in the system. If we 
were to only intercept message events, we would fail to explore a wide variety 
of potentially erroneous states in the system – including even those only 
related to message delivery (in the real world, responses can be received 
before the thread sending them completes the act of doing so, for instance).
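
As a rough illustration of the interception described above, the following sketch queues events instead of running them and then executes them one at a time in an order fixed by a seed. It is an assumption-laden toy, not the simulator's design: SeededSchedulerSketch, intercept, runAll and the event labels are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededSchedulerSketch {
    /** Intercepted events waiting to run, e.g. a message delivery or a thread wake-up. */
    private final List<Runnable> pending = new ArrayList<>();
    private final Random random;

    SeededSchedulerSketch(long seed) {
        this.random = new Random(seed);
    }

    /** Called instead of delivering a message or running a task immediately. */
    void intercept(Runnable event) {
        pending.add(event);
    }

    /** Runs events sequentially; the seed alone decides which pending event goes next. */
    void runAll() {
        while (!pending.isEmpty()) {
            Runnable next = pending.remove(random.nextInt(pending.size()));
            next.run(); // an event may intercept further events while it runs
        }
    }

    /** Records the execution order of a few illustrative events for a given seed. */
    static List<String> trace(long seed) {
        SeededSchedulerSketch scheduler = new SeededSchedulerSketch(seed);
        List<String> log = new ArrayList<>();
        scheduler.intercept(() -> log.add("deliver request to node2"));
        scheduler.intercept(() -> log.add("deliver request to node3"));
        scheduler.intercept(() -> {
            log.add("sender thread on node1 resumes");
            scheduler.intercept(() -> log.add("deliver response to node1"));
        });
        scheduler.runAll();
        return log;
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 3; seed++) {
            System.out.println("seed " + seed + ": " + trace(seed));
        }
        // The same seed always yields the same sequence, which is what makes a failure replayable.
        System.out.println("repeatable: " + trace(1).equals(trace(1)));
    }
}
```

Thread events and message events share one queue here, which is the point of the paragraph above: scheduling only the messages would leave the thread interleavings unexplored.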


From: Benjamin Lerer 
Date: Tuesday, 13 July 2021 at 09:50
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-10: Cluster and Code Simulations
Hi Benedict, Sam,

Could you describe some of the scenarios that this new framework will allow
us to test ? They might help me to understand the changes required.
The need for the changes around concurrency and file access is not obvious
to me. By consequence, I am guessing that I probably do not fully
understand the goal of the proposal.

Thanks in advance

Benjamin


On Tue, 13 Jul 2021 at 10:37, Sam Tunnicliffe wrote:

> Spoiler alert: I am pretty familiar with the proposal and the off-list
> work that has been done toward it.
>
> From my perspective, I have no qualms about putting this CEP up for a
> vote. Having seen the potential (and to some degree, realised) benefit of
> this proposal, I am
> convinced of its value.
>
> Thanks,
> Sam
>
> > On 13 Jul 2021, at 09:20, bened...@apache.org wrote:
> >
> > Did anyone have any thoughts on this CEP, or shall I bring it forward
> for a vote also?
> >
> > From: bened...@apache.org 
> > Date: Thursday, 3 June 2021 at 20:19
> > To: dev@cassandra.apache.org 
> > Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Proposal for a mechanism to evaluate whole clusters, or individual
> classes, with a deterministically pseudorandom ordering of all thread and
> message events.
> >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
> >
> > Evaluating the correctness of distributed systems is hard, as I’m sure
> every developer on this list appreciates. As the project has matured, we
> have had to grapple more with the guarantees we provide users for features
> we develop, and the semantics we promise, particularly around edge-cases
> between two mechanisms or systems.
> >
> > This work aims to dramatically reduce the project overhead necessary for
> delivering a bug-free Cassandra.
> >
> > The premise is to intercept all relevant events that could be performed
> in a different order, i.e. primarily message delivery and thread events
> such as executor submission, signalling of threads, lock acquisition and
> release, and even volatile reads and writes (to a lesser extent). These
> events are then scheduled pseudo-randomly (with various restrictions to
> ensure a valid execution), or in some cases not evaluated at all (to
> simulate e.g. messages being lost). The result is a repeatable sequential
> evaluation of a multi-threaded, multi-actor system.
> >
> > This permits us to evaluate a much broader range of cluster behaviours
> without any additional development work, permitting us to implement a broad
> range of property-based and related randomized acceptance tests, without
> significant developer burden.
> >
> > The work will apply just as readily to multi-threaded single classes as
> it will to whole clusters, and will come with a linearizability test for
> LWTs as well as a unit test for an existing multi-threaded bug that is
> otherwise hard to exhibit.
> >
> > To achieve this, significant modifications will be required to the
> codebase, mostly cleaning up existing abstractions. Specifically, we will
> need to be able t

Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Benjamin Lerer
 Hi Benedict, Sam,

Could you describe some of the scenarios that this new framework will allow
us to test? They might help me to understand the changes required.
The need for the changes around concurrency and file access is not obvious
to me. Consequently, I am guessing that I probably do not fully
understand the goal of the proposal.

Thanks in advance

Benjamin


On Tue, 13 Jul 2021 at 10:37, Sam Tunnicliffe wrote:

> Spoiler alert: I am pretty familiar with the proposal and the off-list
> work that has been done toward it.
>
> From my perspective, I have no qualms about putting this CEP up for a
> vote. Having seen the potential (and to some degree, realised) benefit of
> this proposal, I am
> convinced of its value.
>
> Thanks,
> Sam
>
> > On 13 Jul 2021, at 09:20, bened...@apache.org wrote:
> >
> > Did anyone have any thoughts on this CEP, or shall I bring it forward
> for a vote also?
> >
> > From: bened...@apache.org 
> > Date: Thursday, 3 June 2021 at 20:19
> > To: dev@cassandra.apache.org 
> > Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
> > Proposal for a mechanism to evaluate whole clusters, or individual
> classes, with a deterministically pseudorandom ordering of all thread and
> message events.
> >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
> >
> > Evaluating the correctness of distributed systems is hard, as I’m sure
> every developer on this list appreciates. As the project has matured, we
> have had to grapple more with the guarantees we provide users for features
> we develop, and the semantics we promise, particularly around edge-cases
> between two mechanisms or systems.
> >
> > This work aims to dramatically reduce the project overhead necessary for
> delivering a bug-free Cassandra.
> >
> > The premise is to intercept all relevant events that could be performed
> in a different order, i.e. primarily message delivery and thread events
> such as executor submission, signalling of threads, lock acquisition and
> release, and even volatile reads and writes (to a lesser extent). These
> events are then scheduled pseudo-randomly (with various restrictions to
> ensure a valid execution), or in some cases not evaluated at all (to
> simulate e.g. messages being lost). The result is a repeatable sequential
> evaluation of a multi-threaded, multi-actor system.
> >
> > This permits us to evaluate a much broader range of cluster behaviours
> without any additional development work, permitting us to implement a broad
> range of property-based and related randomized acceptance tests, without
> significant developer burden.
> >
> > The work will apply just as readily to multi-threaded single classes as
> it will to whole clusters, and will come with a linearizability test for
> LWTs as well as a unit test for an existing multi-threaded bug that is
> otherwise hard to exhibit.
> >
> > To achieve this, significant modifications will be required to the
> codebase, mostly cleaning up existing abstractions. Specifically, we will
> need to be able to mock executors, any blocking concurrency primitives,
> time, filesystem access and internode streaming.
> >
> > The work is – in large part – already complete, with JIRA and PRs to
> follow in the coming weeks. Of course, the work is subject to the usual
> community input and review, so this does not preclude changes to the work
> (even significant ones, if they are warranted). I know a lot of incoming
> CEP are likely to be backed up by significant off-list development as a
> result of the focus on a shippable 4.0. Hopefully this is just a temporary
> growing pain, particularly as we move towards a shippable trunk.
> >
> > I hope this work will be of huge value to the project, particularly as
> we race to catch up on years of limited feature development.
> >
> > JIRA and PRs will follow, but I wanted to kick-off discussion in advance.
>
>
>
>


Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread Sam Tunnicliffe
Spoiler alert: I am pretty familiar with the proposal and the off-list work 
that has been done toward it. 

From my perspective, I have no qualms about putting this CEP up for a vote. 
Having seen the potential (and to some degree, realised) benefit of this 
proposal, I am convinced of its value.

Thanks,
Sam

> On 13 Jul 2021, at 09:20, bened...@apache.org wrote:
> 
> Did anyone have any thoughts on this CEP, or shall I bring it forward for a 
> vote also?
> 
> From: bened...@apache.org 
> Date: Thursday, 3 June 2021 at 20:19
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
> Proposal for a mechanism to evaluate whole clusters, or individual classes, 
> with a deterministically pseudorandom ordering of all thread and message 
> events.
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
> 
> Evaluating the correctness of distributed systems is hard, as I’m sure every 
> developer on this list appreciates. As the project has matured, we have had 
> to grapple more with the guarantees we provide users for features we develop, 
> and the semantics we promise, particularly around edge-cases between two 
> mechanisms or systems.
> 
> This work aims to dramatically reduce the project overhead necessary for 
> delivering a bug-free Cassandra.
> 
> The premise is to intercept all relevant events that could be performed in a 
> different order, i.e. primarily message delivery and thread events such as 
> executor submission, signalling of threads, lock acquisition and release, and 
> even volatile reads and writes (to a lesser extent). These events are then 
> scheduled pseudo-randomly (with various restrictions to ensure a valid 
> execution), or in some cases not evaluated at all (to simulate e.g. messages 
> being lost). The result is a repeatable sequential evaluation of a 
> multi-threaded, multi-actor system.
> 
> This permits us to evaluate a much broader range of cluster behaviours 
> without any additional development work, permitting us to implement a broad 
> range of property-based and related randomized acceptance tests, without 
> significant developer burden.
> 
> The work will apply just as readily to multi-threaded single classes as it 
> will to whole clusters, and will come with a linearizability test for LWTs as 
> well as a unit test for an existing multi-threaded bug that is otherwise hard 
> to exhibit.
> 
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
> 
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEP are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
> 
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
> 
> JIRA and PRs will follow, but I wanted to kick-off discussion in advance.





Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-13 Thread bened...@apache.org
Did anyone have any thoughts on this CEP, or shall I bring it forward for a 
vote also?

From: bened...@apache.org 
Date: Thursday, 3 June 2021 at 20:19
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
Proposal for a mechanism to evaluate whole clusters, or individual classes, 
with a deterministically pseudorandom ordering of all thread and message events.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations

Evaluating the correctness of distributed systems is hard, as I’m sure every 
developer on this list appreciates. As the project has matured, we have had to 
grapple more with the guarantees we provide users for features we develop, and 
the semantics we promise, particularly around edge-cases between two mechanisms 
or systems.

This work aims to dramatically reduce the project overhead necessary for 
delivering a bug-free Cassandra.

The premise is to intercept all relevant events that could be performed in a 
different order, i.e. primarily message delivery and thread events such as 
executor submission, signalling of threads, lock acquisition and release, and 
even volatile reads and writes (to a lesser extent). These events are then 
scheduled pseudo-randomly (with various restrictions to ensure a valid 
execution), or in some cases not evaluated at all (to simulate e.g. messages 
being lost). The result is a repeatable sequential evaluation of a 
multi-threaded, multi-actor system.

This permits us to evaluate a much broader range of cluster behaviours without 
any additional development work, permitting us to implement a broad range of 
property-based and related randomized acceptance tests, without significant 
developer burden.

The work will apply just as readily to multi-threaded single classes as it will 
to whole clusters, and will come with a linearizability test for LWTs as well 
as a unit test for an existing multi-threaded bug that is otherwise hard to 
exhibit.

To achieve this, significant modifications will be required to the codebase, 
mostly cleaning up existing abstractions. Specifically, we will need to be able 
to mock executors, any blocking concurrency primitives, time, filesystem access 
and internode streaming.
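
To illustrate what mocking time can look like, here is a small sketch built on the JDK's java.time.Clock. The CEP's own time abstraction is not described in this thread, so treat this purely as an assumption: SimulatedClockSketch, advance and isExpired are hypothetical names.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class SimulatedClockSketch extends Clock {
    private Instant now;
    private final ZoneId zone;

    SimulatedClockSketch(Instant start, ZoneId zone) {
        this.now = start;
        this.zone = zone;
    }

    /** Time only moves when the simulation says so, never because wall-clock time passed. */
    void advance(Duration amount) {
        now = now.plus(amount);
    }

    @Override
    public ZoneId getZone() {
        return zone;
    }

    @Override
    public Clock withZone(ZoneId newZone) {
        return new SimulatedClockSketch(now, newZone);
    }

    @Override
    public Instant instant() {
        return now;
    }

    /** Code under test takes a Clock instead of calling System.currentTimeMillis() directly. */
    static boolean isExpired(Clock clock, Instant deadline) {
        return !clock.instant().isBefore(deadline);
    }

    public static void main(String[] args) {
        SimulatedClockSketch clock = new SimulatedClockSketch(Instant.EPOCH, ZoneOffset.UTC);
        Instant deadline = Instant.EPOCH.plusSeconds(5);
        System.out.println("expired before advance: " + isExpired(clock, deadline));
        clock.advance(Duration.ofSeconds(5));
        System.out.println("expired after advance: " + isExpired(clock, deadline));
    }
}
```

With time injected this way, a timeout fires at the same simulated instant on every run, regardless of how fast the host machine is.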

The work is – in large part – already complete, with JIRA and PRs to follow in 
the coming weeks. Of course, the work is subject to the usual community input 
and review, so this does not preclude changes to the work (even significant 
ones, if they are warranted). I know a lot of incoming CEPs are likely to be 
backed up by significant off-list development as a result of the focus on a 
shippable 4.0. Hopefully this is just a temporary growing pain, particularly as 
we move towards a shippable trunk.

I hope this work will be of huge value to the project, particularly as we race 
to catch up on years of limited feature development.

JIRA and PRs will follow, but I wanted to kick off discussion in advance.