Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-11 Thread Berenguer Blasi
On our 4.0 release I remember a number of such failures but not 
recently. What is more common though is packaging errors, 
cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade 
tests, being less responsive post-commit as you already moved on,... 
Either the smoke pre-commit has approval steps for everything or we 
should give imo a devBranch alike job to the dev pre-commit. I find it 
terribly useful. My 2cts.


On 11/7/23 18:26, Josh McKenzie wrote:
2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: 
at reviewer's discretion
In general, maybe offering a dev the option of choosing either 
"pre-commit smoke" or "post-commit full" at their discretion for any 
work would be the right play.


A follow-on thought: even with something as significant as Accord, 
TCM, Trie data structures, etc - I'd be a bit surprised to see tests 
fail on JDK17 that didn't on 11, or with vs. without vnodes, in ways 
that weren't immediately clear the patch stumbled across something 
surprising and was immediately trivially attributable if not fixable. 
/In theory/ the things we're talking about excluding from the 
pre-commit smoke test suite are all things that are supposed to be 
identical across environments and thus opaque / interchangeable by 
default (JDK version outside checking build which we will, vnodes vs. 
non, etc).


Has that not proven to be the case in your experience?

On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
A strong +1 to getting to a single CI system. CircleCI definitely has 
some niceties and I understand why it's currently used, but right now 
we get 2 CI systems for twice the price. +1 on the proposed subsets.


Derek

On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie  
wrote:



I'm personally not thinking about CircleCI at all; I'm
envisioning a world where all of us have 1 CI /software/ system
(i.e. reproducible on any env) that we use for pre-commit
validation, and then post-commit happens on reference ASF hardware.

So:
1: Pre-commit subset of tests (suites + matrices + env) runs. On
green, merge.
2: Post-commit tests (all suites, matrices, env) runs. If
failure, link back to the JIRA where the commit took place

Circle would need to remain in lockstep with the requirements for
point 1 here.

On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:


+1 to Josh which is exactly my line of thought as well. But that
is only valid if we have a solid Jenkins that will eventually
run all test configs. So I think I lost track a bit here. Are
you proposing:

1- CircleCI: Run pre-commit a single (the most
common/meaningful, TBD) config of tests

2- Jenkins: Runs post-commit _all_ test configs and
emails/notifies you in case of problems?

Or sthg different like having 1 also in Jenkins?

On 7/7/23 17:55, Andrés de la Peña wrote:

I think 500 runs combining all configs could be reasonable,
since it's unlikely to have config-specific flaky tests. As in
five configs with 100 repetitions each.

On Fri, 7 Jul 2023 at 16:14, Josh McKenzie
 wrote:

Maybe. Kind of depends on how long we write our tests to
run doesn't it? :)

But point taken. Any non-trivial test would start to be
something of a beast under this approach.

On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:

On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie
 wrote:
> 3. Multiplexed tests (changed, added) run against all
JDK's and a broader range of configs (no-vnode, vnode
default, compression, etc)

I think this is going to be too heavy...we're taking 500
iterations
and multiplying that by like 4 or 5?







--
+---+
| Derek Chen-Becker                        |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC  |
+---+



Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-11 Thread Yuki Morishita
> folks - I think we’ve achieved lazy consensus here. Please continue with
feedback on the jira.

Hi Dinesh,

As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?

For future reference, in my opinion, adding new CQL syntax should have
a CEP, as it is not something we can easily change once defined.

On Wed, Jul 12, 2023 at 7:19 AM Derek Chen-Becker 
wrote:

> EC - eventual consensus?
>
> On Tue, Jul 11, 2023 at 4:03 PM Dinesh Joshi  wrote:
>
>> folks - I think we’ve achieved lazy consensus here. Please continue with
>> feedback on the jira.
>>
>> Thanks,
>>
>> Dinesh
>>
>>
>> On Jul 7, 2023, at 12:23 PM, Jyothsna Konisa 
>> wrote:
>>
>> 
>> Hi Yuki, Jeremiah & Christopher,
>>
>> Thank you very much for the feedback.
>>
>> Regarding removing superuser check for adding/removing identities, I have
>> relaxed that check and added permissions check instead. With this change
>> only users with appropriate permissions to add/drop identities can perform
>> that action.
>>
>> About extending `Create Role` cqlsh statement, we have a couple of
>> reasons for not doing that. We designed the mTLS authenticator in such a
>> way that a single role can be associated with multiple identities, EX:
>> there can be several identities which are read_only users. Also, having a
>> separate cqlsh statement for identities makes it more pluggable and
>> independent. If we still think that extending the create role statement
>> would be a convenient feature, we can add it as required in the followup
>> patches.
>>
>> Christopher, I will be acting upon your feedback regarding having
>> identity in the cassandra.yaml optionally configurable.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Thu, Jul 6, 2023 at 5:30 PM Dinesh Joshi  wrote:
>>
>>> > On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan 
>>> wrote:
>>> >
>>> > I don’t think users necessarily need to be able to update their own
>>> identities.  I just don’t want to have to use the super user role.  The
>>> super user role has all power over all things in the data base.  I don’t
>>> want to have to give that much power to the person who manages identities,
>>> I just want to give them the power to manage identities.
>>>
>>> Makes sense. I think Jyothsna already pushed an update to the PR to
>>> relax the restriction. Please feel free to take a look at it.
>>>
>>> Dinesh
>>>
>>>
>>>
>>>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-11 Thread Derek Chen-Becker
EC - eventual consensus?

On Tue, Jul 11, 2023 at 4:03 PM Dinesh Joshi  wrote:

> folks - I think we’ve achieved lazy consensus here. Please continue with
> feedback on the jira.
>
> Thanks,
>
> Dinesh
>
>
> On Jul 7, 2023, at 12:23 PM, Jyothsna Konisa 
> wrote:
>
> 
> Hi Yuki, Jeremiah & Christopher,
>
> Thank you very much for the feedback.
>
> Regarding removing superuser check for adding/removing identities, I have
> relaxed that check and added permissions check instead. With this change
> only users with appropriate permissions to add/drop identities can perform
> that action.
>
> About extending `Create Role` cqlsh statement, we have a couple of reasons
> for not doing that. We designed the mTLS authenticator in such a way that a
> single role can be associated with multiple identities, EX: there can be
> several identities which are read_only users. Also, having a separate cqlsh
> statement for identities makes it more pluggable and independent. If we
> still think that extending the create role statement would be a convenient
> feature, we can add it as required in the followup patches.
>
> Christopher, I will be acting upon your feedback regarding having identity
> in the cassandra.yaml optionally configurable.
>
> Thanks,
> Jyothsna Konisa.
>
> On Thu, Jul 6, 2023 at 5:30 PM Dinesh Joshi  wrote:
>
>> > On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan 
>> wrote:
>> >
>> > I don’t think users necessarily need to be able to update their own
>> identities.  I just don’t want to have to use the super user role.  The
>> super user role has all power over all things in the data base.  I don’t
>> want to have to give that much power to the person who manages identities,
>> I just want to give them the power to manage identities.
>>
>> Makes sense. I think Jyothsna already pushed an update to the PR to relax
>> the restriction. Please feel free to take a look at it.
>>
>> Dinesh
>>
>>
>>
>>

-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-11 Thread Dinesh Joshi
folks - I think we’ve achieved lazy consensus here. Please continue with
feedback on the jira.

Thanks,

Dinesh

On Jul 7, 2023, at 12:23 PM, Jyothsna Konisa  wrote:

Hi Yuki, Jeremiah & Christopher,

Thank you very much for the feedback.

Regarding removing superuser check for adding/removing identities, I have
relaxed that check and added permissions check instead. With this change
only users with appropriate permissions to add/drop identities can perform
that action.

About extending `Create Role` cqlsh statement, we have a couple of reasons
for not doing that. We designed the mTLS authenticator in such a way that a
single role can be associated with multiple identities, EX: there can be
several identities which are read_only users. Also, having a separate cqlsh
statement for identities makes it more pluggable and independent. If we
still think that extending the create role statement would be a convenient
feature, we can add it as required in the followup patches.

Christopher, I will be acting upon your feedback regarding having identity
in the cassandra.yaml optionally configurable.

Thanks,
Jyothsna Konisa.

On Thu, Jul 6, 2023 at 5:30 PM Dinesh Joshi  wrote:

> On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan  wrote:
> 
> I don’t think users necessarily need to be able to update their own identities.  I just don’t want to have to use the super user role.  The super user role has all power over all things in the data base.  I don’t want to have to give that much power to the person who manages identities, I just want to give them the power to manage identities.

Makes sense. I think Jyothsna already pushed an update to the PR to relax the restriction. Please feel free to take a look at it.

Dinesh
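
The relationship described in this thread (several identities mapping to one role, with add/drop gated by a dedicated permission rather than superuser status) can be sketched as below. This is only an illustrative in-memory model: `IdentityStore`, `identity_managers`, and the example identity strings are hypothetical names, not Cassandra's actual classes, storage, or CQL syntax.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when a caller lacks the permission to manage identities."""


@dataclass
class IdentityStore:
    # Hypothetical in-memory model; not Cassandra's actual storage or API.
    identity_to_role: dict = field(default_factory=dict)
    # Callers allowed to add/drop identities (stand-in for a real permission check).
    identity_managers: set = field(default_factory=set)

    def add_identity(self, caller, identity, role):
        self._check(caller)
        self.identity_to_role[identity] = role

    def drop_identity(self, caller, identity):
        self._check(caller)
        self.identity_to_role.pop(identity, None)

    def role_for(self, identity):
        return self.identity_to_role.get(identity)

    def _check(self, caller):
        # No superuser requirement: only the narrower permission is consulted.
        if caller not in self.identity_managers:
            raise PermissionDenied(f"{caller} may not manage identities")


store = IdentityStore(identity_managers={"identity_admin"})
# Two distinct identities share the same role.
store.add_identity("identity_admin", "spiffe://example/app-a", "read_only")
store.add_identity("identity_admin", "spiffe://example/app-b", "read_only")
```

Keeping the identity-to-role map separate from role creation is what lets one role serve many identities, which is the reason given above for not extending the create role statement.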






Re: [DISCUSS] Conducting a User Survey

2023-07-11 Thread German Eichberger via dev
Same. Great idea. How will the results be published?

Thanks,
German

From: C. Scott Andreas 
Sent: Tuesday, July 11, 2023 7:41 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org ; 
market...@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Conducting a User Survey

Thanks Patrick. I like the idea of a user survey.

Added a handful of comments in the doc. 

– Scott

On Jul 11, 2023, at 12:51 AM, Mick Semb Wever  wrote:


Looks good to me, thanks Patrick.

On Tue, 11 Jul 2023 at 03:11, Patrick McFadin  wrote:

For quite a few years, I have done Twitter polls to gather helpful information 
about how people use Apache Cassandra. Twitter is no longer the best place to 
conduct this kind of activity since it has become a ghost town.

We should ask more comprehensive questions to get the pulse of our user 
community. I want to do a simple Google Form survey that we can promote on 
every channel for a few weeks. I'll anonymize the results and post them on 
cassandra.apache.org.

Here are the proposed questions I have compiled. A pretty basic set of 
questions, but it would be fun to know the answer to several of these: 
https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing

Comments are open to all. Please let me know what you think.

Patrick




Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-11 Thread Josh McKenzie
> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at 
> reviewer's discretion
In general, maybe offering a dev the option of choosing either "pre-commit 
smoke" or "post-commit full" at their discretion for any work would be the 
right play.

A follow-on thought: even with something as significant as Accord, TCM, Trie 
data structures, etc - I'd be a bit surprised to see tests fail on JDK17 that 
didn't on 11, or with vs. without vnodes, in ways that weren't immediately 
clear the patch stumbled across something surprising and was immediately 
trivially attributable if not fixable. *In theory* the things we're talking 
about excluding from the pre-commit smoke test suite are all things that are 
supposed to be identical across environments and thus opaque / interchangeable 
by default (JDK version outside checking build which we will, vnodes vs. non, 
etc).

Has that not proven to be the case in your experience?
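
To make the trade-off in this thread concrete, here is a rough sketch of a pre-commit smoke subset versus a post-commit full matrix, plus the repeated-run budget debated further down. The JDK list, suite names, and the choice of smoke environment are placeholders assumed for illustration; they are not the project's agreed matrix.

```python
from itertools import product

# All values below are illustrative placeholders, not the project's agreed matrix.
JDKS = ["11", "17"]
VNODES = [True, False]
SUITES = ["unit", "jvm-dtest", "python-dtest"]


def full_matrix():
    """Post-commit: every suite on every JDK, with and without vnodes."""
    return list(product(SUITES, JDKS, VNODES))


def smoke_matrix():
    """Pre-commit: every suite, but on a single representative environment."""
    return [(suite, JDKS[0], True) for suite in SUITES]


def multiplexer_runs(total_budget, configs, per_config=False):
    """Total repeated runs of a changed or added test.

    per_config=True repeats the whole budget on each config (500 x 4 or 5).
    per_config=False splits one budget across configs (five configs x 100).
    """
    if per_config:
        return total_budget * configs
    return (total_budget // configs) * configs


print(len(smoke_matrix()), "pre-commit jobs vs", len(full_matrix()), "post-commit jobs")
print(multiplexer_runs(500, 5, per_config=True), "runs vs", multiplexer_runs(500, 5))
```

With these placeholder values the smoke subset is a quarter of the full matrix, and splitting the 500-run budget across five configs keeps the total at 500 instead of 2500.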

On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
> A strong +1 to getting to a single CI system. CircleCI definitely has some 
> niceties and I understand why it's currently used, but right now we get 2 CI 
> systems for twice the price. +1 on the proposed subsets.
> 
> Derek
> 
> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie  wrote:
>> I'm personally not thinking about CircleCI at all; I'm envisioning a world 
>> where all of us have 1 CI *software* system (i.e. reproducible on any env) 
>> that we use for pre-commit validation, and then post-commit happens on 
>> reference ASF hardware.
>> 
>> So:
>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, 
>> merge.
>> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link back 
>> to the JIRA where the commit took place
>> 
>> Circle would need to remain in lockstep with the requirements for point 1 
>> here.
>> 
>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>> +1 to Josh which is exactly my line of thought as well. But that is only 
>>> valid if we have a solid Jenkins that will eventually run all test configs. 
>>> So I think I lost track a bit here. Are you proposing:
>>> 
>>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD) 
>>> config of tests
>>> 
>>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in 
>>> case of problems?
>>> 
>>> Or sthg different like having 1 also in Jenkins?
>>> 
>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>> I think 500 runs combining all configs could be reasonable, since it's 
>>>> unlikely to have config-specific flaky tests. As in five configs with 100 
>>>> repetitions each.
>>>> 
>>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>>>>> Maybe. Kind of depends on how long we write our tests to run doesn't it? 
>>>>> :)
>>>>> 
>>>>> But point taken. Any non-trivial test would start to be something of a 
>>>>> beast under this approach.
>>>>> 
>>>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie  
>>>>>> wrote:
>>>>>> > 3. Multiplexed tests (changed, added) run against all JDK's and a 
>>>>>> > broader range of configs (no-vnode, vnode default, compression, etc)
>>>>>> 
>>>>>> I think this is going to be too heavy...we're taking 500 iterations
>>>>>> and multiplying that by like 4 or 5?
>>>>>> 
> 
>> 
> 
> 
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 


Re: [DISCUSS] Conducting a User Survey

2023-07-11 Thread C. Scott Andreas

Thanks Patrick. I like the idea of a user survey.

Added a handful of comments in the doc.

– Scott

On Jul 11, 2023, at 12:51 AM, Mick Semb Wever  wrote:

Looks good to me, thanks Patrick.

On Tue, 11 Jul 2023 at 03:11, Patrick McFadin  wrote:

For quite a few years, I have done Twitter polls to gather helpful information
about how people use Apache Cassandra. Twitter is no longer the best place to
conduct this kind of activity since it has become a ghost town.

We should ask more comprehensive questions to get the pulse of our user
community. I want to do a simple Google Form survey that we can promote on
every channel for a few weeks. I'll anonymize the results and post them on
cassandra.apache.org.

Here are the proposed questions I have compiled. A pretty basic set of
questions, but it would be fun to know the answer to several of these:
https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing

Comments are open to all. Please let me know what you think.

Patrick

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-11 Thread Derek Chen-Becker
A strong +1 to getting to a single CI system. CircleCI definitely has some
niceties and I understand why it's currently used, but right now we get 2
CI systems for twice the price. +1 on the proposed subsets.

Derek

On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie  wrote:

> I'm personally not thinking about CircleCI at all; I'm envisioning a world
> where all of us have 1 CI *software* system (i.e. reproducible on any
> env) that we use for pre-commit validation, and then post-commit happens on
> reference ASF hardware.
>
> So:
> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
> merge.
> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link
> back to the JIRA where the commit took place
>
> Circle would need to remain in lockstep with the requirements for point 1
> here.
>
> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>
> +1 to Josh which is exactly my line of thought as well. But that is only
> valid if we have a solid Jenkins that will eventually run all test configs.
> So I think I lost track a bit here. Are you proposing:
>
> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
> config of tests
>
> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in
> case of problems?
>
> Or sthg different like having 1 also in Jenkins?
> On 7/7/23 17:55, Andrés de la Peña wrote:
>
> I think 500 runs combining all configs could be reasonable, since it's
> unlikely to have config-specific flaky tests. As in five configs with 100
> repetitions each.
>
> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>
> Maybe. Kind of depends on how long we write our tests to run doesn't it? :)
>
> But point taken. Any non-trivial test would start to be something of a
> beast under this approach.
>
> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>
> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie 
> wrote:
> > 3. Multiplexed tests (changed, added) run against all JDK's and a
> broader range of configs (no-vnode, vnode default, compression, etc)
>
> I think this is going to be too heavy...we're taking 500 iterations
> and multiplying that by like 4 or 5?
>
>
>
>

-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-11 Thread Maxim Muzafarov
Thank you for your comments and for sharing the ticket targeting
strategy, I'm really happy to see this page where I have found all the
answers to the questions I had. So, I tend towards your view and will
just land this ticket on the 5.0 release only for now as it makes
sense for me as well.

I didn't add the feature flag for this feature because for 99% of the
source code changes it only works with Cassandra internals leaving the
public API unchanged. A few remarks on this are:
- the display format of the vtable property has changed to match the
yaml configuration style, this doesn't mean that we are displaying
property values in a completely different way in fact the formats
match with only 4 exceptions mentioned in the message above (this
should be fine for the major release I hope);
- a new column, which we've agreed to add (I'll fix the PR shortly);


I would also like to mention the follow-up todos required by this
issue to set the right expectations. Currently, we've brought a few
properties under the framework to make them updateable with the
SettingsTable, so that you can keep focusing on the framework itself
rather than on tagging the configuration properties themselves with
the @Mutable annotation. Although the solution is self-sufficient for
the already tagged properties, we still need to bring the rest of them
under the framework afterwards. I'll create an issue and do it right after
we are done with the initial patch.
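
As a rough illustration of the behaviour discussed here (only tagged properties accept updates, and a `mutable` column tells operators which ones those are), the gating could look like the sketch below. This is a toy Python model, not the patch itself: the `SETTINGS` registry, the function names, and the example property values are all assumptions made for the example.

```python
class ReadOnlySettingError(Exception):
    """Raised when an update targets a setting not marked as mutable."""


# Hypothetical registry: setting name -> (current value, mutable flag).
# The names and values here are examples only.
SETTINGS = {
    "compaction_throughput": ("64MiB/s", True),
    "cluster_name": ("Test Cluster", False),
}


def update_setting(name, value):
    """Apply an update only if the setting is tagged mutable."""
    _current, mutable = SETTINGS[name]
    if not mutable:
        raise ReadOnlySettingError(f"{name} cannot be changed at runtime")
    SETTINGS[name] = (value, mutable)


def settings_rows():
    """Rows as an operator might see them, including the 'mutable' column."""
    return [
        {"name": name, "value": value, "mutable": mutable}
        for name, (value, mutable) in sorted(SETTINGS.items())
    ]


update_setting("compaction_throughput", "128MiB/s")
```

Bringing a further property under the framework then amounts to flipping its flag in the registry, which mirrors the follow-up work described above.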


On Fri, 7 Jul 2023 at 20:37, Josh McKenzie  wrote:
>
> This really is great work Maxim; definitely appreciate all the hard work 
> that's gone into it and I think the users will too.
>
> In terms of where it should land, we discussed this type of question at 
> length on the ML awhile ago and ended up codifying it in the wiki: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases
>
> When working on a ticket, use the following guideline to determine which 
> branch to apply it to (Note: See How To Commit for details on the commit and 
> merge process)
>
> Bugfix: apply to oldest applicable LTS and merge up through latest GA to trunk
>
> In the event you need to make changes on the merge commit, merge with -s ours 
> and revise the commit via --amend
>
> Improvement: apply to trunk only (next release)
>
> Note: refactoring and removing dead code qualifies as an Improvement; our 
> priority is stability on GA lines
>
> New Feature: apply to trunk only (next release)
>
> Our priority is to keep the 2 LTS releases and latest GA stable while 
> releasing new "latest GA" on a cadence that provides new improvements and 
> functionality to users soon enough to be valuable and relevant.
>
>
> So in this case, target whatever unreleased next feature release (i.e. SEMVER 
> MAJOR || MINOR) we have on deck.
>
> On Thu, Jul 6, 2023, at 1:21 PM, Ekaterina Dimitrova wrote:
>
> Hi,
>
> First of all, thank you for all the work!
> I personally think that it should be ok to add a new column.
>
> I will be very happy to see this landing in 5.0.
> I am personally against porting this patch to 4.1. To be clear, I am sure you 
> did a great job and my response would be the same to every single person - 
> the configuration is quite wide-spread and the devil is in the details. I do 
> not see a good reason for exception here except convenience. There is no 
> feature flag for these changes too, right?
>
> Best regards,
> Ekaterina
>
> On Thursday, 6 July 2023, Miklosovic, Stefan wrote:
>
> Hi Maxim,
>
> I went through the PR and added my comments. I think David also reviewed it. 
> All points you mentioned make sense to me but I humbly think it is necessary 
> to have at least one additional pair of eyes on this as the patch is 
> relatively impactful.
>
> I would like to see additional column in system_views.settings of name 
> "mutable" and of type "boolean" to see what field I am actually allowed to 
> update as an operator.
>
> It seems to me you agree with the introduction of this column (1) but there 
> is no clear agreement where we actually want to put it. You want this whole 
> feature to be committed to 4.1 branch as well which is an interesting 
> proposal. I was thinking that this work will go to 5.0 only. I am not 
> completely sure it is necessary to backport this feature but your 
> argumentation here (2) is worth discussing further.
>
> If we introduce this change to 4.1, that field would not be there but in 5.0 
> it would. So that way we will not introduce any new column to 
> system_views.settings.
> We could also go with the introduction of this column to 4.1 if people are ok 
> with that.
>
> For the simplicity, I am slightly leaning towards introducing this feature to 
> 5.0 only.
>
> (1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
> (2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041
>
> 
> From: Maxim Muzafarov 
> Sent: Friday, June 23, 2023 

Re: Changing the output of tooling between majors

2023-07-11 Thread Francisco Guerrero
I am +1 (nb) on the proposal to change the human readable output in a major 
release as long as we have a machine readable output that can be consumed by 
scripts.
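
A small sketch of why this distinction matters for scripts: a parser tied to a human-readable label breaks when the label is reworded, while a parser reading a structured field does not. Both sample outputs below are invented for illustration and are not real nodetool output; the field name `read_count` is likewise an assumption.

```python
import json
import re

# Two made-up renderings of the same data; neither is real nodetool output.
HUMAN_V1 = "Keyspace: ks1\n  Read Count: 42\n"
HUMAN_V2 = "Keyspace: ks1\n  Reads: 42\n"  # label reworded in a later major
MACHINE = '{"ks1": {"read_count": 42}}'


def read_count_from_text(text):
    """Brittle: depends on the exact label used in the human-readable output."""
    match = re.search(r"Read Count:\s*(\d+)", text)
    return int(match.group(1)) if match else None


def read_count_from_json(payload, keyspace):
    """Stable as long as the structured field name is kept."""
    return json.loads(payload)[keyspace]["read_count"]


print(read_count_from_text(HUMAN_V1), read_count_from_text(HUMAN_V2))
print(read_count_from_json(MACHINE, "ks1"))
```

The text parser silently returns nothing once the label changes, which is the failure mode a machine-readable format is meant to avoid.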

On 2023/07/10 23:06:11 "Fleming, Jackson" wrote:
> We use Nodetool in scripts sparingly. In my opinion, trying to programmatically 
> parse the human readable output should be avoided as much as possible; it 
> usually leads to implementations that are brittle.
> 
> I certainly agree you don’t want to make these kinds of changes in 3.11 or 
> 4.x (and I don’t think that’s what Stefan was suggesting), but I don’t 
> necessarily agree that you can’t make these kinds of changes in major 
> versions. Chasing compatibility like this seems like a deep rabbit hole one 
> could possibly go down, I personally don’t see it as unreasonable for 
> commands that are designed to be read by humans to be updated over time to 
> improve readability, as that is the purpose of those commands. While people 
> script against that output I don’t think anyone is going to say it’s an 
> official API, the project also makes no public commitment to that either.
> 
> If the proposal is to treat Nodetool input and output like a contract/API, 
> it’d be great for a formal specification, or at least the documentation to be 
> updated to cover what users should expect as output from Nodetool, if the 
> project is going to such effort to maintain a specification, why not make it 
> official? That way the maintainers of scripts have a fighting chance of 
> finding incompatibilities before upgrading their infrastructure and the 
> project could make these kinds of changes and provide a mechanism for users 
> to validate.
> 
> Currently the argument could be made that there’s no guarantee about Nodetool 
> output since it’s not actually written down anywhere official outside the 
> codebase.
> 
> Isn’t this one of the reasons Cassandra maintains the NEWS and CHANGES files 
> in the repo, and follows semantic versioning, to communicate potentially 
> breaking changes as clearly as possible? Surely a message like (but with some 
> more detail) “Nodetool command x has had its human readable output 
> restructured, item y was removed/renamed to z” would suffice.
> 
> Not sure if you can deprecate the human readable output without generating a 
> lot of noise for the user, and if it’s being parsed by a bash script, the 
> user would never see it anyway, but sounds like that’s what the project needs.
> 
> To the note about having users migrate over to more machine friendly output 
> types (JSON etc), in my experience the operators who maintain these scripts 
> aren’t going to re-write them just because a better way of doing them is 
> newly available, usually they’re too busy with other work and will keep using 
> those old scripts until they stop working, so in my view it’s not really a 
> solution to this problem.
> 
> Regards,
> 
> Jackson
> 
> From: Eric Evans 
> Date: Tuesday, 11 July 2023 at 4:14 am
> To: dev@cassandra.apache.org 
> Subject: Re: Changing the output of tooling between majors
> 
> On Sun, Jul 9, 2023 at 9:10 PM Dinesh Joshi <djo...@apache.org> wrote:
> On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
> 
> If we are providing CQL / JSON / YAML for couple years, I do not believe that 
> the argument "lets not break it for folks in nodetool" is still relevant. CQL 
> output is there from times of 4.0 at least (at least!) and YAML / JSON is 
> also not something completely new. It is not like we are suddenly forcing 
> people to change their habits, there was enough time to update the stuff to 
> CQL / json / yaml etc ...
> 
> What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and 
> beyond may still use their existing scripts. Therefore keeping things stable 
> is important. Until nodetool can support JSON as output format for all 
> interaction and there is a significant adoption in the user community, I 
> would strongly advise against making breaking changes to the CLI output.
> 
> +1
> 
> --
> Eric Evans
> john.eric.ev...@gmail.com
> 


Re: Bloom filter calculation

2023-07-11 Thread Claude Warren, Jr via dev
I think we are talking past each other here.  What I was missing was the
size of the filter.  I was assuming that the size of the filter was the
number of bits specified in the BloomFilterCalculations (error on my
part),  what I was missing was the multiplication of the number of bits by
the number of keys.   Is there a fixed number of bits (it looks to be
Integer.MAX_VALUE - 20) or is it calculated somewhere?


On Tue, Jul 11, 2023 at 10:11 AM Benedict  wrote:

> I’m not sure I follow your reasoning. The bloom filter table is false
> positive per sstable given the number of bits *per key*. So for 10 keys you
> would have 200 bits, which yields the same false positive rate as 20 bits
> and 1 key.
>
> It does taper slightly at much larger N, but it’s pretty nominal for
> practical purposes.
>
> I don’t understand what you mean by merging multiple filters together. We
> do lookup multiple bloom filters per query, but only one per sstable, and
> the false positive rate you’re calculating for 10 such lookups would not be
> accurate. This would be 1-(1-0.671)^10 which is still only around a 4%,
> not 100%. You seem to be looking at the false positive rate of a bloom
> filter of 20 bits with 10 entries, which means only 2 bits per entry?
>
> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> 
> Can someone explain to me how the Bloom filter table in
> BloomFilterCalculations was derived and how it is supposed to work?  As I
> read the table it seems to indicate that with 14 hashes and 20 bits you get
> a fp of 6.71e-05.  But if you plug those numbers into the Bloom filter
> calculator [1],  that is calculated only for 1 item being in the filter.
> If you merge multiple filters together the false positive rate goes up.
> And as [1] shows by 5 merges you are over 50% fp rate and by 10 you are at
> close to 100% fp.  So I have to assume this analysis is wrong.  Can someone
> point me to the correct calculations?
>
> Claude
>
> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14
>
>


Re: Bloom filter calculation

2023-07-11 Thread Benedict
I’m not sure I follow your reasoning. The bloom filter table is false positive 
per sstable given the number of bits *per key*. So for 10 keys you would have 
200 bits, which yields the same false positive rate as 20 bits and 1 key.

It does taper slightly at much larger N, but it’s pretty nominal for practical 
purposes.

I don’t understand what you mean by merging multiple filters together. We do 
lookup multiple bloom filters per query, but only one per sstable, and the 
false positive rate you’re calculating for 10 such lookups would not be 
accurate. This would be 1-(1-0.671)^10 which is still only around a 4%, not 
100%. You seem to be looking at the false positive rate of a bloom filter of 20 
bits with 10 entries, which means only 2 bits per entry?
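
The difference between "bits per key" and "bits in the whole filter" can be checked numerically. The sketch below assumes the textbook approximation for a Bloom filter's false positive rate, (1 - e^(-k / (m/n)))^k with k hashes and m/n bits per key; whether BloomFilterCalculations derives its table exactly this way is an assumption, although the formula does give roughly 6.71e-05 for 14 hashes and 20 bits per key. The function names are made up for the example.

```python
import math


def bloom_fp(bits_per_key, hashes):
    """Textbook approximation: (1 - e^(-k / (m/n)))^k."""
    return (1.0 - math.exp(-hashes / bits_per_key)) ** hashes


def any_false_positive(fp, lookups):
    """Chance that at least one of several independent filter lookups is a false positive."""
    return 1.0 - (1.0 - fp) ** lookups


# 20 bits for every key in the sstable, i.e. the filter grows with the key count.
per_sstable = bloom_fp(bits_per_key=20, hashes=14)
# 20 bits in total shared by 10 keys, i.e. only 2 bits per key.
undersized = bloom_fp(bits_per_key=20 / 10, hashes=14)

print(f"20 bits/key, 14 hashes: {per_sstable:.3g}")
print(f"2 bits/key, 14 hashes:  {undersized:.3g}")
print(f"10 sstable lookups:     {any_false_positive(per_sstable, 10):.3g}")
```

Under this approximation the rate depends only on the ratio of bits to keys, so 200 bits for 10 keys behaves like 20 bits for 1 key, while a fixed 20-bit filter holding 10 keys saturates.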

> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev 
>  wrote:
> 
> 
> Can someone explain to me how the Bloom filter table in 
> BloomFilterCalculations was derived and how it is supposed to work?  As I 
> read the table it seems to indicate that with 14 hashes and 20 bits you get a 
> fp of 6.71e-05.  But if you plug those numbers into the Bloom filter 
> calculator [1],  that is calculated only for 1 item being in the filter.  If 
> you merge multiple filters together the false positive rate goes up.  And as 
> [1] shows by 5 merges you are over 50% fp rate and by 10 you are at close to 
> 100% fp.  So I have to assume this analysis is wrong.  Can someone point me 
> to the correct calculations?
> 
> Claude
> 
> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14


Re: [DISCUSS] Conducting a User Survey

2023-07-11 Thread Mick Semb Wever
Looks good to me, thanks Patrick.

On Tue, 11 Jul 2023 at 03:11, Patrick McFadin wrote:
>
> For quite a few years, I have done Twitter polls to gather helpful 
> information about how people use Apache Cassandra. Twitter is no longer the 
> best place to conduct this kind of activity since it has become a ghost town.
>
> We should ask more comprehensive questions to get the pulse of our user 
> community. I want to do a simple Google Form survey that we can promote on 
> every channel for a few weeks. I'll anonymize the results and post them on 
> cassandra.apache.org.
>
> Here are the proposed questions I have compiled. A pretty basic set of 
> questions, but it would be fun to know the answer to several of these: 
> https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing
>
> Comments are open to all. Please let me know what you think.
>
> Patrick


Re: [DISCUSS] When to run CheckStyle and other verifications

2023-07-11 Thread Jacek Lewandowski
Thanks,

I will follow that path then,



On Mon, 10 Jul 2023 at 19:03, Jon Meredith wrote:

> +1 from me too. I would support removing all of the optional checks from
> jar/test as I also hit issues with rat from time to time while iterating,
> as long as the CI system runs them and makes it very clear for any
> committer there are failures.
>
> On Mon, Jul 10, 2023 at 9:40 AM Josh McKenzie 
> wrote:
>
>>
>>    - Remove the checkstyle dependency from "jar" and "test"
>>    - Create a single "check" target that includes all the checks we
>>      expect to pass in the CI (currently Checkstyle, RAT, and
>>      Eclipse-Warnings), making this task the default.
>>
>> +1 here.
>>
>> (of note: haven't forgotten the request from this thread to share local
>> env; just gotten sidetracked by things and also realized how little I've
>> actually modified locally since I just run most of the linting against
>> delta'ed files only to keep my changed work in compliance. Still a very
>> noisy mess when SpotBugs is run against the entire codebase proper)
>>
>> On Mon, Jul 10, 2023, at 7:13 AM, Brandon Williams wrote:
>>
>> On Mon, Jul 10, 2023 at 6:07 AM Jacek Lewandowski wrote:
>> > Remove the checkstyle dependency from "jar" and "test"
>> > Create a single "check" target that includes all the checks we expect
>> > to pass in the CI (currently Checkstyle, RAT, and Eclipse-Warnings), making
>> > this task the default.
>>
>> I support this.  Having checkstyle run when building is clearly
>> constant friction for many, even though you can disable it.
>>
>>
>>


Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-11 Thread Berenguer Blasi
Add a 'devBranch' Jenkins job to that, imo: the possibility to run the 
full suite + multiplex new tests before commit when you're about to 
release a Kraken into the codebase: Accord, TCM, TTL, SAI, Vector, 
JDK... So:


1: Pre-commit subset of tests (suites + matrices + env) runs. On green, 
merge.
2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at 
reviewer's discretion
3: Post-commit tests (all suites, matrices, env) runs. If failure, link 
back to the JIRA where the commit took place


My 2cts

On 10/7/23 17:36, Josh McKenzie wrote:
I'm personally not thinking about CircleCI at all; I'm envisioning a 
world where all of us have 1 CI /software/ system (i.e. reproducible 
on any env) that we use for pre-commit validation, and then 
post-commit happens on reference ASF hardware.


So:
1: Pre-commit subset of tests (suites + matrices + env) runs. On 
green, merge.
2: Post-commit tests (all suites, matrices, env) runs. If failure, 
link back to the JIRA where the commit took place


Circle would need to remain in lockstep with the requirements for 
point 1 here.


On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:


+1 to Josh which is exactly my line of thought as well. But that is 
only valid if we have a solid Jenkins that will eventually run all 
test configs. So I think I lost track a bit here. Are you proposing:


1- CircleCI: Runs a single config of tests (the most common/meaningful, 
TBD) pre-commit


2- Jenkins: Runs post-commit _all_ test configs and emails/notifies 
you in case of problems?


Or something different, like having 1 also in Jenkins?

On 7/7/23 17:55, Andrés de la Peña wrote:
I think 500 runs combining all configs could be reasonable, since 
it's unlikely to have config-specific flaky tests. As in five 
configs with 100 repetitions each.


On Fri, 7 Jul 2023 at 16:14, Josh McKenzie wrote:


Maybe. Kind of depends on how long we write our tests to run
doesn't it? :)

But point taken. Any non-trivial test would start to be
something of a beast under this approach.

On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:

On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie
<jmcken...@apache.org> wrote:
> 3. Multiplexed tests (changed, added) run against all JDKs
> and a broader range of configs (no-vnode, vnode default,
> compression, etc)

I think this is going to be too heavy...we're taking 500 iterations
and multiplying that by like 4 or 5?





Bloom filter calculation

2023-07-11 Thread Claude Warren, Jr via dev
Can someone explain to me how the Bloom filter table in
BloomFilterCalculations was derived and how it is supposed to work?  As I
read the table it seems to indicate that with 14 hashes and 20 bits you get
a fp of 6.71e-05.  But if you plug those numbers into the Bloom filter
calculator [1],  that is calculated only for 1 item being in the filter.
If you merge multiple filters together the false positive rate goes up.
And as [1] shows by 5 merges you are over 50% fp rate and by 10 you are at
close to 100% fp.  So I have to assume this analysis is wrong.  Can someone
point me to the correct calculations?

Claude

[1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14