Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Miklosovic, Stefan via dev
That would work reliably only if there were no way to misconfigure guardrails 
in the cluster. What if you set a guardrail on one node but do not set it 
(or set it differently) on another? If guardrails are configured differently and 
you check whether constraints violate them, the same query might fail or 
succeed depending on which node is hit.

I guess guardrails would need to become transactional to be sure this is 
avoided and that they are indeed the same everywhere (see the CEP-24 thread sent 
recently here on the ML).


From: Bernardo Botella 
Date: Tuesday, 4 June 2024 at 00:31
To: dev@cassandra.apache.org 
Cc: Miklosovic, Stefan 
Subject: Re: [DISCUSS] CEP-42: Constraints Framework
You don't often get email from conta...@bernardobotella.com. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>

EXTERNAL EMAIL - USE CAUTION when clicking links or attachments


Basically, I am trying to protect the limits set by the operator against 
customers' misconfigured schemas.

I see the guardrails as a safety limit added by the operator, setting the 
bounds within which the customers owning the actual schema (and its constraints) 
can operate. With that vision, if a customer tries to “ignore” the limits set 
by the operator by adding more relaxed constraints, they get a nice message 
saying “that is not allowed for the cluster, please contact your admin”.




On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev 
 wrote:

You wrote in the CEP:

As we mentioned in the motivation section, we currently have some guardrails 
for columns size in place which can be extended for other data types.
Those guardrails will take preference over the defined constraints in the 
schema, and a SCHEMA ALTER adding constraints that break the limits defined by 
the guardrails framework will fail.
If the guardrails themselves are modified, operator should get a warning 
mentioning that there are schemas with offending constraints.

I think that this should be the other way around. Guardrails should kick in when 
there are no constraints, and they would be overridden by the table schema. That 
way, there is always a “default” in terms of guardrails (which one can turn off 
on demand / change), but you can override it by table alteration.

Basically, what is in the schema should win, regardless of how the guardrails 
are configured. They do not matter when a constraint is explicitly specified in 
a schema. The guardrail defaults should apply only when they exist and no 
constraint is specified at the schema level.

What is your motivation to do it the way you suggested?
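The two precedence models under discussion can be sketched in a few lines (illustrative Python, not Cassandra code; the function names and the reduction to a single numeric limit are my simplification):

```python
# Two possible ways to combine an operator guardrail with a schema-level
# constraint, modelled as simple numeric limits (illustration only).

def schema_wins(guardrail, constraint):
    """Stefan's proposal: an explicit schema constraint overrides the
    guardrail, which acts only as a default when no constraint is set."""
    return constraint if constraint is not None else guardrail

def guardrail_caps(guardrail, constraint):
    """The CEP draft: the guardrail is a hard cap; a constraint more
    relaxed than the guardrail is rejected at schema-alteration time."""
    if constraint is not None and constraint > guardrail:
        raise ValueError("not allowed for the cluster, please contact your admin")
    return constraint if constraint is not None else guardrail
```

With a guardrail of 1024 bytes, `schema_wins(1024, 4096)` yields 4096, while `guardrail_caps(1024, 4096)` raises, which is exactly the operator-protection Bernardo describes.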

From: Bernardo Botella <conta...@bernardobotella.com>
Date: Friday, 31 May 2024 at 23:24
To: dev@cassandra.apache.org
Subject: [DISCUSS] CEP-42: Constraints Framework

Hello everyone,

I am proposing this CEP:
CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework


And I’m looking for feedback from the community.

Thanks a lot!
Bernardo





Re: [Discuss] CEP-24 Password validation and generation

2024-06-01 Thread Miklosovic, Stefan via dev
I feel like this thread deserves an update.

This CEP was put into a dormant state because there was one quite substantial 
flaw: if a node is misconfigured in such a way that it accepts weaker passwords 
than the other nodes in a cluster, the setup is not safe. The security of such 
a solution is only as strong as the weakest node configuration in the cluster.

The correct answer to this problem was / is transactional guardrails. I was 
waiting for TCM to appear in trunk for a year and a half to implement this, and 
we are finally there (1), which I am very excited about.

What transactional guardrails do is commit a transformation into the TCM log 
for each CQL mutation to the respective guardrails virtual table (which is 
mutable). That in turn means the configuration is propagated to the whole 
cluster and survives restarts etc. It also means we can configure any guardrail 
for the whole cluster with one CQL statement, in a persistent manner, which I 
would say is quite powerful and saves time / cost from a techops / devops point 
of view, especially at very large scale.

You can do something like this:

 UPDATE system_guardrails.flags SET value = false WHERE name = 'simplestrategy';

and this will be committed into TCM, everything replayed on restart, the same 
for the whole cluster ... you get the idea. Hence, similarly, you can commit 
configuration for a password validator and it will be the same across the whole 
cluster as well.
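The commit-and-replay behaviour described here can be sketched generically (plain Python, illustrative only; this is not the actual TCM API):

```python
# A toy replicated-log model of "configuration committed to TCM":
# every change is an ordered log entry; each node folds the log into
# its effective configuration on startup, so all nodes converge.

class ClusterConfigLog:
    def __init__(self):
        self.entries = []                 # durable, cluster-replicated log

    def commit(self, name, value):
        self.entries.append((name, value))

    def replay(self):
        """What a node does on restart: apply entries in commit order."""
        config = {}
        for name, value in self.entries:
            config[name] = value
        return config

log = ClusterConfigLog()
log.commit("simplestrategy", False)       # e.g. the UPDATE statement above
log.commit("simplestrategy", True)        # a later change wins on replay
print(log.replay())                       # {'simplestrategy': True}
```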

This solution received quite positive feedback, and it was suggested that we 
should actually commit into TCM all configuration which is meant to be the same 
on each node.

I have paused the introduction of a more general "config in TCM" solution, as 
there seem to be other parties in this space trying to come up with that (that 
is the vibe I am getting), hence I am currently in kind of a limbo, half-way 
there.

Let's see what happens next; I just want to highlight that the next course of 
action will most probably be the introduction of transactional configuration, 
until this one can finally be integrated with that too. Currently, there is one 
configuration property that is not yet transactional - default_keyspace_rf - 
because it is used by one of the guardrails too. This leads to the more general 
"config in TCM" case which we have not dealt with yet.

Branch with transactional guardrails is in (2).

(1) https://issues.apache.org/jira/browse/CASSANDRA-19593
(2) https://github.com/instaclustr/cassandra/tree/CEP-24-with-generator-tcm

________
From: Miklosovic, Stefan 
Sent: Monday, December 19, 2022 14:24
To: dev@cassandra.apache.org
Subject: Re: [Discuss] CEP-24 Password validation and generation

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.




One not-so-obvious consequence of the configuration of the password validator - 
since it is based on guardrails - is that in a cluster of 50 nodes, if we 
change the configuration (at runtime) on one node, it needs to be done on all 
remaining 49. We need to be sure the configuration is the same on all nodes, 
because if one node is not configured the way we want, all it takes to pass the 
(less secure) validation is to create passwords while logged on to that node. I 
think something similar came up for the memtables CEP, and there was some 
additional discussion about the way it is configured - it is in yaml and not in 
schema, so it is only node-specific, right? (Not saying it is wrong; I just 
noticed there was additional discussion questioning that approach, which was 
further clarified.) However, when it comes to security, I think it should be as 
robust as possible.

I am not completely sure what to do here. It would be great to have some 
"distributed configuration"; otherwise, all I can do is mimic this behavior 
with a table, similarly to how system_auth.roles is done for passwords, for 
example. However, I feel it should be more robust, and I think that in the 
future there might be more cases where we need the configuration distributed 
like this.

That said, I am fine with proceeding with my original plan if the community 
thinks the current approach is enough.


From: Claude Warren, Jr via dev 
Sent: Wednesday, October 19, 2022 10:58
To: dev@cassandra.apache.org
Subject: Re: [Discuss] CEP-24 Password validation and generation




Just to clarify, I have no objections to the current plan.

On Thu, Oct 13, 2022 at 2:56 PM Claude Warren, Jr 
<claude.war...@aiven.io> wrote:
I am not familiar with the Diagnostics framework but it sounds like it would 
satisfy the need.  Thanks for pointing it out.  I will dive into it to g

Re: [Discuss] CQLSH should left-align numbers, right-align text (CASSANDRA-19150)

2024-01-09 Thread Miklosovic, Stefan via dev
My personal bet is that from the very beginning Cassandra was more 
"number-centric", and right alignment just made more sense back then, with 
strings as an afterthought. Another explanation is that nobody actually put in 
the work to distinguish strings from numbers and it stayed like that. I can 
definitely see the value in left alignment for strings. The whole system of 
written "Latin" languages runs left to right. Arabic runs right to left, and 
there it makes more sense, but that is IMHO an absolute minority in practice, 
so left alignment just makes more sense to me overall in every situation (for 
strings).


From: Derek Chen-Becker 
Sent: Tuesday, January 9, 2024 17:15
To: dev@cassandra.apache.org
Subject: Re: [Discuss] CQLSH should left-align numbers, right-align text 
(CASSANDRA-19150)




In the ticket itself there's an example of left aligned being better for prefix 
strings (e.g. fully qualified class names), and I suspect this is similarly 
useful for other things like file paths, etc. I would also agree with Stefan 
that it would be nice to know why the current convention was chosen in the 
first place.

Cheers,

Derek

On Tue, Jan 9, 2024 at 8:18 AM Brad <bscho...@gmail.com> wrote:
Derek,

I'm proposing a switch or blanket change to a convention of right aligned text 
and left aligned numbers in CQLSH.

I took a look at two other examples, Excel and Postgres shell and that's how 
they work when displaying tabular data.  The Jira was originally to make right 
or left alignment an option, but making it configurable seems less useful than 
choosing a better standard.

On Tue, Jan 9, 2024 at 9:58 AM Derek Chen-Becker 
<de...@chen-becker.org> wrote:
Just to clarify, per the ticket you're proposing a configuration option to 
control this on a per-column basis, correct? Your email makes it sound like a 
blanket change.

Cheers,

Derek

On Tue, Jan 9, 2024 at 7:34 AM Brad <bscho...@gmail.com> wrote:
CQLSH currently left-aligns all output, affecting both numbers and text. While 
this works well for numbers, a better approach, adopted by many tools, is to 
left-align numbers and right-align text.

For example, both Excel and the Postgres shell use the latter:


psql

# select * from employee;

 empid |  name   |    dept
-------+---------+------------
     1 | Clark   | Sales
   200 | Dave    | Accounting
    33 | Johnson | Sales

while CQLSH simply left aligns all the columns


cqlsh> select * from employee;

 empid | dept       | name
-------+------------+---------
    33 |      Sales | Johnson
     1 |      Sales |   Clark
   200 | Accounting |    Dave



Left aligned text looks much worse on text values which share common prefixes


cqlsh> select * from system_views.system_properties limit 7 ;

 name                                       | value
--------------------------------------------+--------------------------------------------
                                  JAVA_HOME |              /Users/brad/.jenv/versions/17
                   cassandra.jmx.local.port |                                       7199
                           cassandra.logdir | /usr/local/cassandra-5.0-beta1/bin/../logs
                       cassandra.storagedir | /usr/local/cassandra-5.0-beta1/bin/../data
  com.sun.management.jmxremote.authenticate |                                      false
 com.sun.management.jmxremote.password.file |          /etc/cassandra/jmxremote.password
    io.netty.transport.estimateSizeOnSubmit |                                      false



The Jira CASSANDRA-19150 
discusses this in further detail with some additional examples.
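The alignment difference above is easy to reproduce with plain string padding (a Python sketch, not cqlsh's actual formatting code):

```python
# Rebuild the employee example with the psql-style policy:
# numbers right-aligned, text left-aligned.
rows = [("1", "Clark", "Sales"),
        ("200", "Dave", "Accounting"),
        ("33", "Johnson", "Sales")]
widths = [max(len(row[i]) for row in rows) for i in range(3)]

for empid, name, dept in rows:
    # empid is numeric -> rjust; name/dept are text -> ljust
    print(f"{empid.rjust(widths[0])} | {name.ljust(widths[1])} | {dept}")
#   1 | Clark   | Sales
# 200 | Dave    | Accounting
#  33 | Johnson | Sales
```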


I wanted to raise the issue here to propose changing CQLSH to right-align text 
while continuing to left-align numbers.


Regards,


Brad Schoening




--
Derek Chen-Becker
GPG Key available at https://keybase.io/dchenbecker and
https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC




Re: [Discuss] CQLSH should left-align numbers, right-align text (CASSANDRA-19150)

2024-01-09 Thread Miklosovic, Stefan via dev
I would like to know whose idea it was to align it like it is currently done in 
the first place. Maybe we are missing something important, like why it was done 
like that? If there is no reason, we might just start to align it the way other 
DB offerings do. My initial proposal to support both is more about not breaking 
it, but if "breaking it" does not make anybody complain, we can just go without 
the dual solution.


From: Brandon Williams 
Sent: Tuesday, January 9, 2024 16:30
To: dev@cassandra.apache.org
Subject: Re: [Discuss] CQLSH should left-align numbers, right-align text 
(CASSANDRA-19150)




A configuration option for a cosmetic feature seems like overkill to me, I 
don't think which side we align text on is enough to justify (heh) the 
overhead.  I agree with how Excel and Postgres do it and think we should follow 
suit.

Kind Regards,
Brandon


On Tue, Jan 9, 2024 at 9:19 AM Brad <bscho...@gmail.com> wrote:
Derek,

I'm proposing a switch or blanket change to a convention of right aligned text 
and left aligned numbers in CQLSH.

I took a look at two other examples, Excel and Postgres shell and that's how 
they work when displaying tabular data.  The Jira was originally to make right 
or left alignment an option, but making it configurable seems less useful than 
choosing a better standard.

On Tue, Jan 9, 2024 at 9:58 AM Derek Chen-Becker 
<de...@chen-becker.org> wrote:
Just to clarify, per the ticket you're proposing a configuration option to 
control this on a per-column basis, correct? Your email makes it sound like a 
blanket change.

Cheers,

Derek

On Tue, Jan 9, 2024 at 7:34 AM Brad <bscho...@gmail.com> wrote:
CQLSH currently left-aligns all output, affecting both numbers and text. While 
this works well for numbers, a better approach, adopted by many tools, is to 
left-align numbers and right-align text.

For example, both Excel and the Postgres shell use the latter:


psql

# select * from employee;

 empid |  name   |    dept
-------+---------+------------
     1 | Clark   | Sales
   200 | Dave    | Accounting
    33 | Johnson | Sales

while CQLSH simply left aligns all the columns


cqlsh> select * from employee;

 empid | dept       | name
-------+------------+---------
    33 |      Sales | Johnson
     1 |      Sales |   Clark
   200 | Accounting |    Dave



Left aligned text looks much worse on text values which share common prefixes


cqlsh> select * from system_views.system_properties limit 7 ;

 name                                       | value
--------------------------------------------+--------------------------------------------
                                  JAVA_HOME |              /Users/brad/.jenv/versions/17
                   cassandra.jmx.local.port |                                       7199
                           cassandra.logdir | /usr/local/cassandra-5.0-beta1/bin/../logs
                       cassandra.storagedir | /usr/local/cassandra-5.0-beta1/bin/../data
  com.sun.management.jmxremote.authenticate |                                      false
 com.sun.management.jmxremote.password.file |          /etc/cassandra/jmxremote.password
    io.netty.transport.estimateSizeOnSubmit |                                      false



The Jira CASSANDRA-19150 
discusses this in further detail with some additional examples.


I wanted to raise the issue here to propose changing CQLSH to right-align text 
while continuing to left-align numbers.


Regards,


Brad Schoening







Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Miklosovic, Stefan via dev
Great news! Congratulations.


From: Josh McKenzie 
Sent: Monday, January 8, 2024 19:19
To: dev
Subject: Welcome Maxim Muzafarov as Cassandra Committer




The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has 
accepted
the invitation to become a committer.

Thanks for all the hard work and collaboration on the project thus far, and 
we're all looking forward to working more with you in the future. 
Congratulations and welcome!

The Apache Cassandra PMC members




Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread Miklosovic, Stefan via dev
For completeness, there is this thread (1) where we already decided that sigar 
is OK to be removed completely.

I think OSHI is a much better lib to have; I am +1 on this proposal.

Currently the deal seems to be that this will go just to trunk.

(1) https://lists.apache.org/thread/6gzyh1zhxnkz50lld7hlgq172xc0pg3t


From: Claude Warren, Jr via dev 
Sent: Thursday, December 14, 2023 0:17
To: dev
Cc: Claude Warren, Jr
Subject: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)




Greetings,

I have submitted a pull request[1] that replaces the unsupported Sigar library 
with the maintained OSHI library.

OSHI is an MIT licensed library that provides information about the underlying 
OS much like Sigar did.

The change adds a dependency on oshi-core at the following coordinates:

<dependency>
  <groupId>com.github.oshi</groupId>
  <artifactId>oshi-core</artifactId>
  <version>6.4.6</version>
</dependency>

In addition to switching to a supported library, this change will reduce the 
size of the package as the native Sigar libraries are removed.

Are there objections to making this switch and adding a new dependency?

[1] 
https://github.com/apache/cassandra/pull/2842/files
[2] 
https://issues.apache.org/jira/browse/CASSANDRA-16565


Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Miklosovic, Stefan via dev
Wow, great news! Congratulations on your committership, Mike.


From: Benjamin Lerer 
Sent: Friday, December 8, 2023 15:41
To: dev@cassandra.apache.org
Subject: Welcome Mike Adamson as Cassandra committer




The PMC members are pleased to announce that Mike Adamson has accepted
the invitation to become committer.

Thanks a lot, Mike, for everything you have done for the project.

Congratulations and welcome

The Apache Cassandra PMC members


Re: [DISCUSS] CASSANDRA-19104: Standardize tablestats formatting and data units

2023-12-05 Thread Miklosovic, Stefan via dev
Hi Claude,

while technically possible, I do not see a lot of people using this. I am for a 
straightforward -H option instead of introducing -Hn, which seems to bring 
almost no value and introduces a discrepancy into the nodetool flags. I think 
there are -H outputs for other commands too, are there not? Should we not then 
also look at whether -Hn is applicable to them as well? Anyway ... this seems 
to be a lot of work with almost no benefit.

Regards


From: Claude Warren, Jr via dev 
Sent: Tuesday, December 5, 2023 8:46
To: dev@cassandra.apache.org
Cc: Claude Warren, Jr
Subject: Re: [DISCUSS] CASSANDRA-19104: Standardize tablestats formatting and 
data units




Why not change the option so that -H will operate as it does now while -Hn 
(where n is a digit) will limit the number of decimal places to n.

On Mon, Dec 4, 2023 at 5:11 PM Brad <bscho...@gmail.com> wrote:
Thanks, Jacek.  Using three significant digits for disk space is a good 
suggestion.

On Mon, Dec 4, 2023 at 9:58 AM Jacek Lewandowski 
<lewandowski.ja...@gmail.com> wrote:
This looks great,

I'd consider limiting the number of significant digits to 3 in the human 
readable format. In the above example it would translate to:

Space used (live): 1.46 TiB
Space used (total): 1.46 TiB

Bytes repaired: 0.00 KiB
Bytes unrepaired: 4.31 TiB
Bytes pending repair: 0.000 KiB

I just think that with the human-readable format we expect only a rough view of 
the stats, and the 4th significant digit has very little meaning in that case.


thanks,
Jacek
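A minimal sketch of the 3-significant-digit rounding Jacek suggests (plain Python, illustrative only; not nodetool's actual human-readable formatting code):

```python
import math

def round_sig(value, figs=3):
    """Round to `figs` significant digits, e.g. 1.4567 -> 1.46."""
    if value == 0:
        return 0.0
    # How many decimal places keep exactly `figs` significant digits.
    digits = figs - int(math.floor(math.log10(abs(value)))) - 1
    return round(value, digits)

print(round_sig(1.4567))   # 1.46 -> "1.46 TiB" in the example above
print(round_sig(4.3123))   # 4.31
print(round_sig(4312.3))   # 4310 (still 3 significant digits)
```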



Re: Removal of deprecations added in Cassandra 3.x

2023-11-30 Thread Miklosovic, Stefan via dev
Hi,

I want to refresh this thread. I know people are busy with 5.0 etc. but I would 
really like to have this resolved.

This might be removed in trunk (1). The JMX methods / beans to remove are in (2).

Mick had a point in (1) that even if it is possible to remove it all, do we 
really want to? We should not break things unnecessarily, so people do not have 
a hard time keeping up with what we ship; there might be legacy integrations 
which depend on this, even on stuff as old as 3.x. Some custom tooling might 
call these methods etc. Even though it is deprecated, that code is pretty much 
"maintenance-less". It does not need any special care, so we might not treat 
its removal as something critical.

Personally, I think the removal of code deprecated in 3.x is quite safe to do 
in 5.x, but I have to ask the broader audience to have a consensus.

We might be extra careful and drop it in 6.0 instead of 5.x, in which case I 
would have to wait for the 6.0 branch to be created. Supporting deprecated code 
for 2 majors sounds pretty safe to me.

This is all written for cases when the code is public-facing - JMX methods etc. 
I think that what is "private" might go away in 5.x easily.

Anyway, it is a good question what is considered to be a public API; I think 
Josh was talking about this in some other thread already. I would like to start 
mapping the codebase and annotating interfaces / extension points etc. with 
something like "@PublicApi" or even "@Stable" / "@Unstable" and similar, so we 
can reason more explicitly about what is public, but that is not the topic I 
want to go into here.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18975
(2) 
https://github.com/apache/cassandra/pull/2853/files#diff-4e5b9f6d0d76ab9ace1bd805efe5788bb5d23c84c25ccf75b9896f20b46a1879

Thanks and regards

________
From: Miklosovic, Stefan via dev 
Sent: Monday, October 30, 2023 23:07
To: dev@cassandra.apache.org
Cc: Miklosovic, Stefan
Subject: Re: Removal of deprecations added in Cassandra 3.x





Sure we can do that just for trunk. No problem with that. Hence, I am parking 
this effort for a while.


From: Mick Semb Wever 
Sent: Monday, October 30, 2023 22:56
To: dev@cassandra.apache.org
Subject: Re: Removal of deprecations added in Cassandra 3.x





> similarly as for Cassandra 1.x and 2.x deprecations removal done in 
> CASSANDRA-18959, you are welcome to comment on the removal of all stuff 
> deprecated in 3.x (1).
>
> If nobody objects after couple days I would like to proceed to the actual 
> removal. Please tell me if you want something to keep around.
>


I have concerns, but I won't block.

I would like to propose we focus on getting to a 5.0-beta1 release.
To do that we should be stopping all work on cassandra-5.0 that isn't
about stabilisation.

Can this land in trunk instead ?
How much work is in front of us to get to 5.0-beta1 ?  (Please add
fixVersion 5.0-beta for stabilisation work.)


Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

2023-11-06 Thread Miklosovic, Stefan via dev
The link is fixed. Thanks!


From: Miklosovic, Stefan 
Sent: Monday, November 6, 2023 11:42
To: dev@cassandra.apache.org
Subject: Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

I can't view it either.


From: guo Maxwell 
Sent: Monday, November 6, 2023 11:40
To: dev@cassandra.apache.org
Subject: Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)




Do I need permission to view this link? When I open it, an error appears, 
saying “It may have been deleted or you don't have permission to view it.”

Benjamin Lerer <b.le...@gmail.com> wrote on Monday, 6 November 2023 at 18:34:
I created a Dashboard to track the progress and remaining tasks for 5.0: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=593
Everybody logged in should have access. Ping me if it is not the case.

On Sat, 4 Nov 2023 at 19:54, Mick Semb Wever <m...@apache.org> wrote:

Please mark such bugs with fixVersion 5.0-beta

If there are no more tickets that need API changes (i.e. those that should be 
marked fixVersion 5.0-alpha) this then indicates we do not need a 5.0-alpha3 
release and can focus towards 5.0-beta1 (regardless of having blockers open to 
it).

Appreciate the attention 18993 is getting – we do have a shortlist of beta 
blockers that we gotta prioritise !


On Sat, 4 Nov 2023 at 18:33, Benedict <bened...@apache.org> wrote:
Yep, data loss bugs are not any old bug. I’m concretely -1 (binding) on 
releasing a beta with one that’s either under investigation or confirmed.

As Scott says, hopefully it won’t come to that - the joy of deterministic 
testing is this should be straightforward to triage.

On 4 Nov 2023, at 17:30, C. Scott Andreas <sc...@paradoxica.net> wrote:

I’d happily be the first to vote -1(nb) on a release containing a known and 
reproducible bug that can result in data loss or an incorrect response to a 
query. And I certainly wouldn’t run it.

Since we have a programmatic repro within just a few seconds, this should not 
take long to root-cause.

On Friday, Alex worked to get this reproducing on a Cassandra branch rather 
than via unstaged changes. We should have a published / shareable example with 
details near the beginning of the week.

– Scott

On Nov 4, 2023, at 10:17 AM, Josh McKenzie <jmcken...@apache.org> wrote:


I think before we cut a beta we need to have diagnosed and fixed 18993 
(assuming it is a bug).
Before a beta? I could see that for rc or GA definitely, but having a known 
(especially non-regressive) data loss bug in a beta seems like it's compatible 
with the guarantees we're providing for it: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

This release is recommended for test/QA clusters where short(order of minutes) 
downtime during upgrades is not an issue


On Sat, Nov 4, 2023, at 12:56 PM, Ekaterina Dimitrova wrote:
Totally agree with the others. Such an issue on its own should be a priority in 
any release. Looking forward to the reproduction test mentioned on the ticket.

Thanks to Alex for his work on harry!

On Sat, 4 Nov 2023 at 12:47, Benedict <bened...@apache.org> wrote:
Alex can confirm but I think it actually turns out to be a new bug in 5.0, but 
either way we should not cut a release with such a serious potential known 
issue.

> On 4 Nov 2023, at 16:18, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>
> Sounds like 18993 is not a regression in 5.0? But present in 4.1 as well?  
> So I would say we should fix it with the highest priority and get a new 4.1.x 
> released. Blocking 5.0 beta voting is a secondary issue to me if we have a 
> “data not being returned” issue in an existing release?
>
>> On Nov 4, 2023, at 11:09 AM, Benedict <bened...@apache.org> wr

Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

2023-11-06 Thread Miklosovic, Stefan via dev
I can't view it either.


From: guo Maxwell 
Sent: Monday, November 6, 2023 11:40
To: dev@cassandra.apache.org
Subject: Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



Do I need permission to view this link? When I open it, an error appears, 
saying “It may have been deleted or you don't have permission to view it.”

Benjamin Lerer mailto:b.le...@gmail.com>> 于2023年11月6日周一 
18:34写道:
I created a Dashboard to track the progress and remaining tasks for 5.0: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=593
Everybody logged in should have access. Ping me if it is not the case.

Le sam. 4 nov. 2023 à 19:54, Mick Semb Wever 
mailto:m...@apache.org>> a écrit :

Please mark such bugs with fixVersion 5.0-beta

If there are no more tickets that need API changes (i.e. those that should be 
marked fixVersion 5.0-alpha) this then indicates we do not need a 5.0-alpha3 
release and can focus towards 5.0-beta1 (regardless of having blockers open to 
it).

Appreciate the attention 18993 is getting – we do have a shortlist of beta 
blockers that we gotta prioritise !


On Sat, 4 Nov 2023 at 18:33, Benedict <bened...@apache.org> wrote:
Yep, data loss bugs are not any old bug. I’m concretely -1 (binding) releasing 
a beta with one that’s either under investigation or confirmed.

As Scott says, hopefully it won’t come to that - the joy of deterministic 
testing is this should be straightforward to triage.

On 4 Nov 2023, at 17:30, C. Scott Andreas <sc...@paradoxica.net> wrote:

I’d happily be the first to vote -1(nb) on a release containing a known and 
reproducible bug that can result in data loss or an incorrect response to a 
query. And I certainly wouldn’t run it.

Since we have a programmatic repro within just a few seconds, this should not 
take long to root-cause.

On Friday, Alex worked to get this reproducing on a Cassandra branch rather 
than via unstaged changes. We should have a published / shareable example with 
details near the beginning of the week.

– Scott

On Nov 4, 2023, at 10:17 AM, Josh McKenzie <jmcken...@apache.org> wrote:


I think before we cut a beta we need to have diagnosed and fixed 18993 
(assuming it is a bug).
Before a beta? I could see that for rc or GA definitely, but having a known 
(especially non-regressive) data loss bug in a beta seems like it's compatible 
with the guarantees we're providing for it: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

This release is recommended for test/QA clusters where short(order of minutes) 
downtime during upgrades is not an issue


On Sat, Nov 4, 2023, at 12:56 PM, Ekaterina Dimitrova wrote:
Totally agree with the others. Such an issue on its own should be a priority in 
any release. Looking forward to the reproduction test mentioned on the ticket.

Thanks to Alex for his work on harry!

On Sat, 4 Nov 2023 at 12:47, Benedict <bened...@apache.org> wrote:
Alex can confirm but I think it actually turns out to be a new bug in 5.0, but 
either way we should not cut a release with such a serious potential known 
issue.

> On 4 Nov 2023, at 16:18, J. D. Jordan 
> mailto:jeremiah.jor...@gmail.com>> wrote:
>
> Sounds like 18993 is not a regression in 5.0? But present in 4.1 as well?  
> So I would say we should fix it with the highest priority and get a new 4.1.x 
> released. Blocking 5.0 beta voting is a secondary issue to me if we have a 
> “data not being returned” issue in an existing release?
>
>> On Nov 4, 2023, at 11:09 AM, Benedict 
>> mailto:bened...@apache.org>> wrote:
>>
>> I think before we cut a beta we need to have diagnosed and fixed 18993 
>> (assuming it is a bug).
>>
 On 4 Nov 2023, at 16:04, Mick Semb Wever <m...@apache.org> wrote:
>>>
>>> 

 With the publication of this release I would like to switch the
 default 'latest' docs on the website from 4.1 to 5.0. 

Re: Releasing of Cassandra 3.x / 4.x

2023-11-03 Thread Miklosovic, Stefan via dev
I would release 4.1.4 and 4.0.12 before December and once 5.0.0 is out we would 
do the last releases of 3.0.30 and 3.11.17. That would be the very last 
releases on these branches ever.


From: Maxim Muzafarov 
Sent: Friday, November 3, 2023 23:05
To: dev@cassandra.apache.org
Subject: Re: Releasing of Cassandra 3.x / 4.x





+1

you've mentioned some important fixes earlier [1], and we are waiting
for them as well :-)

[1] https://issues.apache.org/jira/browse/CASSANDRA-18773

On Fri, 3 Nov 2023 at 22:55, Miklosovic, Stefan via dev
 wrote:
>
> Hi list,
>
> is anybody against cutting some 3.x and 4.x releases? I think that is nice to 
> do before summit. The last 4.x were released late July, 3.0 in the middle of 
> May. There is quite a lot of changes in these branches.
>
> I can release it all.
>
> What is your opinion?
>
> Regards


Releasing of Cassandra 3.x / 4.x

2023-11-03 Thread Miklosovic, Stefan via dev
Hi list,

is anybody against cutting some 3.x and 4.x releases? I think that is nice to 
do before summit. The last 4.x were released late July, 3.0 in the middle of 
May. There is quite a lot of changes in these branches.

I can release it all.

What is your opinion?

Regards

Re: Immediately Deprecated Code

2023-10-31 Thread Miklosovic, Stefan via dev
Do I understand it correctly that this is basically the case of "deprecated on 
introduction", since we know that it will not be necessary in the very next version?

I think that not everybody upgrades from version to version as they appear. 
If somebody upgrades from 4.0 to 5.1 (which we seem to support) (1), and you 
introduced the deprecation in 4.0 with the intention to remove it in 5.0, and 
somebody jumps to 5.1 straight away, is that not a problem? Because they 
have not made the move via 5.0, where your upgrade logic was triggered.

(1) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java#L97-L108


From: Claude Warren, Jr via dev 
Sent: Tuesday, October 31, 2023 10:57
To: dev
Cc: Claude Warren, Jr
Subject: Immediately Deprecated Code




I was thinking about code that is used to migrate from one version to another.  
For example the code that rewrote the order of the hash values used for Bloom 
filters.  That code was necessary for the version it was coded in.  But the 
next version does not need that code because the next version is not going to 
read the data from 2 versions prior to itself.  So the code could be removed 
for version+1.

So, would it have made sense to annotate those methods (and variables) as 
deprecated since the version they were written in so the methods/variables can 
be removed in the next version?

If so, what I propose is that all transitional methods and variables be marked 
as deprecated with the version in which they were introduced.



Re: Removal of deprecations added in Cassandra 3.x

2023-10-30 Thread Miklosovic, Stefan via dev
Sure we can do that just for trunk. No problem with that. Hence, I am parking 
this effort for a while.


From: Mick Semb Wever 
Sent: Monday, October 30, 2023 22:56
To: dev@cassandra.apache.org
Subject: Re: Removal of deprecations added in Cassandra 3.x





> similarly as for Cassandra 1.x and 2.x deprecations removal done in 
> CASSANDRA-18959, you are welcome to comment on the removal of all stuff 
> deprecated in 3.x (1).
>
> If nobody objects after a couple of days I would like to proceed to the actual 
> removal. Please tell me if you want something to keep around.
>


I have concerns, but I won't block.

I would like to propose we focus on getting to a 5.0-beta1 release.
To do that we should be stopping all work on cassandra-5.0 that isn't
about stabilisation.

Can this land in trunk instead ?
How much work is in front of us to get to 5.0-beta1 ?  (Please add
fixVersion 5.0-beta for stabilisation work.)


Removal of deprecations added in Cassandra 3.x

2023-10-30 Thread Miklosovic, Stefan via dev
Hi,

similarly as for Cassandra 1.x and 2.x deprecations removal done in 
CASSANDRA-18959, you are welcome to comment on the removal of all stuff 
deprecated in 3.x (1).

If nobody objects after a couple of days I would like to proceed to the actual 
removal. Please tell me if you want something to keep around.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18975

Thanks

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-30 Thread Miklosovic, Stefan via dev
OK fair enough, I am taking that part back.


From: Alex Petrov 
Sent: Monday, October 30, 2023 11:45
To: dev
Cc: Miklosovic, Stefan
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)




> I also do not understand what has some session about TCM and Accord in "New 
> Orleans" to do with it.

Stefan, please make sure to read the context, and re-read my message, since it 
seems like you have completely misinterpreted it. My message states clearly 
that I was responding to someone saying that there were talks that mentioned 
some seemingly misleading timelines. I have checked the talks by the people who 
were involved in the feature development and found no trace of that. So not 
only have I not said that someone should have been in New Orleans to be aware 
of the status, I have also said that there was nothing said in those talks 
about the timelines that could have misled anyone who did happen to be in New 
Orleans to hear that.


On Mon, Oct 30, 2023, at 11:28 AM, Miklosovic, Stefan via dev wrote:
I could not agree more with what Benjamin just wrote.

It is truly more about the visibility of the progress. If one looks at this 
(1), well, that seems like a pretty much finished epic, doesn't it? If we make ML 
and Jira the only official sources of the truth, then there is no mention 
whatsoever that there is any delay (or really, just point me to something which 
tells the opposite and I will gladly eat my humble pie).

I also do not understand what has some session about TCM and Accord in "New 
Orleans" to do with it. Does it mean that only people who are going to New 
Orleans are about to learn what the actual progress is? I just read the ML and 
Jira tickets.

It is questionable how to track this; maybe a committer's digest would serve the 
role just perfectly, as Maxim suggested. Josh's summaries are also super nice to 
use for this. I always find it helpful to see the overall picture and I can not 
thank Josh enough for writing it down. He always spices it up with his own 
writing style which I find engaging but that is just an observation.

(1) 
https://issues.apache.org/jira/browse/CASSANDRA-18330

Regards


From: Benjamin Lerer <ble...@apache.org>
Sent: Monday, October 30, 2023 10:45
To: dev@cassandra.apache.org
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)




The CEP process + dedicated development channels on ASF slack + public JIRA's + 
feature branches in the ASF repo we've seen with specifically TCM and Accord 
are the most transparent this kind of development has ever been on this 
project, and I'd argue right at the sweet spot or past where the degree of 
reaching out to external parties to get feedback starts to not just hit 
diminishing returns but starts to actively hurt a small group of peoples' 
ability to make rapid progress on something.

I feel that there are several misunderstandings going on. When I am talking 
about visibility, I am not talking specifically of Accord or TCM. I just do not 
believe that we are doing a good job generally in terms of visibility, me 
included.
When I talk about visibility, I am also not talking about how a feature is 
supposed to work or its internals. I think that a really good job was made on 
that front.

By visibility, what I am referring to is progress visibility. A lot of features 
have been pushed in the project at an advanced state or near complete state. 
There are sometimes good reasons for it, as the feature might have already 
existed somewhere in a fork for quite some time. The issue with that 
approach is that it makes things hard to anticipate and usually takes everybody 
not involved by surprise. This makes it hard for us as a community to organize 
ourselves.
The project is changing and there are more and more dependencies between 
features/improvements. We had the case this time with Java 17 support being 
needed for Accord and ANN, with ANN being built on top of SAI and Accord on top 
of TCM.
I expect more and more dependencies coming in the future. Getting a reviews 

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-30 Thread Miklosovic, Stefan via dev
 in a while. Even the feedback on the CEP itself, which was published in 
April 2023, was minimal. There were multiple sessions about the TCM and Accord 
in New Orleans in 2023, and the interested parties (including many folks form 
this discussion) couldn't help but learn about their status and progress. 
Still, there was very little engagement (which, I claim, is absolutely fine). 
So, since one can't say that we (collectively) are not publishing CEPs and code 
early enough, the only argument is that the people choose to prioritise things 
based on what is important for their businesses today, and this is, again, 
completely fine.

If you are interested in a CEP, make sure you engage with its authors from the 
first time they publish something. There are many patches and CEPs I wish I 
have reviewed, but did not have time for. For those, I am reading the available 
discussions, talking to their authors, and writing Harry tests. I would not, 
however, ask someone to postpone a feature based on my past or future 
availability.

On Fri, Oct 27, 2023, at 10:14 AM, Jacek Lewandowski wrote:
I've been thinking about this and I believe that if we ever decide to delay a 
release to include some CEPs, we should make the plan and status of those CEPs 
public. This should include publishing a branch, creating tickets for the 
remaining work required for feature completion in Jira, and notifying the 
mailing list.

By doing this, we can make an informed decision about whether delivering a CEP 
in a release x.y planned for some time z is feasible. This approach would also 
be beneficial for improving collaboration, as we will all be aware of what is 
left to be done and can adjust our focus accordingly to participate in the 
remaining work.

Thanks,
- - -- --- -  -
Jacek Lewandowski


On Fri, 27 Oct 2023 at 10:26, Benjamin Lerer <ble...@apache.org> wrote:
I would be interested in testing Maxim's approach. We need more visibility on 
big features and their progress to improve our coordination. Hopefully it will 
also open the door to more collaboration on those big projects.

On Thu, 26 Oct 2023 at 21:35, German Eichberger via dev <dev@cassandra.apache.org> wrote:
+1 to Maxim's idea

Like Stefan my assumption was that we would get some version of TCM + ACCORD in 
5.0 but it wouldn't be ready for production use. My own testing and 
conversations at Community over Code in Halifax confirmed this.

From this perspective as disappointing as TCM+ACCORD slipping is moving it to 
5.1 makes sense and I am supportive of this - but I am worried if 5.1 is 
basically 5.0 + TCM/ACCORD and this slips again we draw ourselves into a corner 
where we can't release 5.2 before 5.1 or something. I would like some more 
elaboration on that.

I am also very worried about ANN vector search being in jeopardy for 5.0 which 
is an important feature for me to win some internal company bet 

My 2 cents,
German




From: Miklosovic, Stefan via dev <dev@cassandra.apache.org>
Sent: Thursday, October 26, 2023 4:23 AM
To: dev@cassandra.apache.org
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: [EXTERNAL] Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut 
an immediate 5.1-alpha1)

What Maxim proposes in the last paragraph would be definitely helpful. Not for 
the project only but for a broader audience, companies etc., too.

Until this thread was started, my assumption was that "there will be 5.0 on 
summit with TCM and Accord and it somehow just happens". More transparent 
communication where we are at with high-profile CEPs like these and knowing if 
deadlines are going to be met would be welcome.

I don't want to be that guy and don't take me wrong here, but really, these 
CEPs are being developed, basically, by devs from two companies, which have 
developers who do not have any real need to explain themselves like what they 
do, regularly, to outsiders. (or maybe you do, you just don't have time?) I get 
that. But on the other hand, you can not realistically expect that other folks 
will have any visibility into what is going on there and that there is a delay 
on the horizon and so on.


From: Maxim Muzafarov <mmu...@apache.org>
Sent: Thursday, October 26, 2023 12:21
To: dev@cassandra.apache.org
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)





Personally, I think frequent releases (2-3 per year) are better than
infrequent big releases. I can understand all the concerns from a
marketing perspective, as smaller major releases may not shin

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-26 Thread Miklosovic, Stefan via dev
What Maxim proposes in the last paragraph would be definitely helpful. Not for 
the project only but for a broader audience, companies etc., too.

Until this thread was started, my assumption was that "there will be 5.0 on 
summit with TCM and Accord and it somehow just happens". More transparent 
communication where we are at with high-profile CEPs like these and knowing if 
deadlines are going to be met would be welcome.

I don't want to be that guy and don't take me wrong here, but really, these 
CEPs are being developed, basically, by devs from two companies, which have 
developers who do not have any real need to explain themselves like what they 
do, regularly, to outsiders. (or maybe you do, you just don't have time?) I get 
that. But on the other hand, you can not realistically expect that other folks 
will have any visibility into what is going on there and that there is a delay 
on the horizon and so on.


From: Maxim Muzafarov 
Sent: Thursday, October 26, 2023 12:21
To: dev@cassandra.apache.org
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)





Personally, I think frequent releases (2-3 per year) are better than
infrequent big releases. I can understand all the concerns from a
marketing perspective, as smaller major releases may not shine as
brightly as a single "game changer" release. However, smaller
releases, especially if they don't have backwards compatibility
issues, are better for the engineering and SRE teams because if a
long-awaited feature is delayed for any reason, there should be no
worry about getting it in right into the next release.

An analogy here might be that if you miss your train (small release)
due to circumstances, you can wait right here for the next one, but if
you miss a flight (big release), you will go back home :-) This is why
I think that the 5.0, 5.1, 5.2, etc. are better and I support Mick's
plan with the caveat that we should release 5.1 when we think we are
ready to do so. Here is an example of the Postgres releases [1].

[1] https://bucardo.org/postgres_all_versions.html


Another little thing that I'd like to mention is a release management
story. In the Apache Ignite project, we've got used to creating a
release thread and posting the release status updates and/or problems,
and/or delays there, and maybe some of the benchmarks at the end. Of
course, this was done by the release manager who volunteered to do
this work. I'm not saying we're doing anything wrong here, no, but the
publicity and openness, coupled with regular updates, could help
create a real sense of the remaining work in progress. These are my
personal feelings, and definitely not actions to be taken. The example
is here: [2].

[2] https://lists.apache.org/thread/m11m0nxq701f2cj8xxdcsc4nnn2sm8ql

On Thu, 26 Oct 2023 at 11:15, Benjamin Lerer  wrote:
>>
>> Regarding the release of 5.1, I understood the proposal to be that we cut an 
>> actual alpha, thereby sealing the 5.1 release from new features. Only 
>> features merged before we cut the alpha would be permitted, and the alpha 
>> should be cut as soon as practicable. What exactly would we be waiting for?
>
>
> The problem I believe is about expectations. It seems that your expectation 
> is that a release with only TCM and Accord will reach GA quickly. Based on 
> the time it took us to release 4.1, I am simply expecting more delays (a GA 
> around end of May, June). In which case it seems to me that we could be 
> interested in shipping more stuff in the meantime (thinking of 
> CASSANDRA-15254 or CEP-29 for example).
> I do not have a strong opinion, I just want to make sure that we all share 
> the same understanding and fully understand what we agree upon.
>
> On Thu, 26 Oct 2023 at 10:59, Benjamin Lerer wrote:
>>>
>>> I am surprised this needs to be said, but - especially for long-running 
>>> CEPs - you must involve yourself early, and certainly within some 
>>> reasonable time of being notified the work is ready for broader input and 
>>> review. In this case, more than six months ago.
>>
>>
>> It is unfortunately more complicated than that because six months ago 
>> Ekaterina and I were working on supporting Java 17 and dropping Java 8, which 
>> was needed by different ongoing works. We both missed the announcement that 
>> TCM was ready for review and would not have been available at that 
>> time anyway. Maxim asked me for a review of CASSANDRA-15254 more than 
>> 6 months ago and I have not been able to help him so far. We all have 
>> limited bandwidth and can miss some announcements.
>>
>> The project has grown and a lot of things are going on in parallel. There 
>> are also more interdependencies between the different projects. In my 
>> opinion what we are lacking is a 

[DISCUSS] mapping of all deprecations after 18912 and removal of deprecations added in Cassandra 1.x and 2.x

2023-10-25 Thread Miklosovic, Stefan via dev
Hi list,

this is the follow-up thread after we discussed the addition of Deprecated 
annotations with "since" in the code. It was merged to 5.0 and trunk under 
18912.

I have added all the mappings under (1). There are tables for each major 
version of Cassandra with links to all places where we deprecated that with 
information whether it is JMX/public facing or nodetool, Config etc.

As we are targeting this for 5.x/trunk, I think we are safe to elaborate on the 
removal of all deprecations which we added in majors 1, 2 and 3 only.

I would like to make this an iterative process. For starters, we might remove 
(if we decide to do so) stuff which was deprecated in 1.x and 2.x, to which I 
put various commentary in the ticket.

If you see something which we should not remove, please tell me; for such 
elements, I will put "forRemoval = false" on the respective deprecation annotations.

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-18938

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Miklosovic, Stefan via dev
To double check the reasoning behind this proposal:

1) is 5.1 going to contain only TCM / Accord when it comes to new features? In 
other words, 5.1, except these two, will only ever contain bugfixes from older 
branches (merging them up) or fixes for TCM / Accord itself (which will be 
eventually merged to trunk)? If that is all true, will we create 6.0 in trunk 
or trunk will be 5.2?

I think it is a nice-to-have. 5.1.0 will be just vanilla TCM / Accord on top of 
5.0.

2) Do we drop the support of 3.0 and 3.11 after 5.0.0 is out or after 5.1.0 is?

3) If I understand our current deprecation policy correctly, everything which 
is deprecated in 3.11 included and older is eligible to be removed in 5.x. If 
we manage to remove some deprecations in 5.0.0 and there are some leftovers in 
5.1, am I still OK to remove the rest in 5.1.x or do I need to wait until 6.0 
is made? (I think the answer is "yes", I can remove 3.x stuff in whatever 5.x).

Regards



From: Mick Semb Wever 
Sent: Monday, October 23, 2023 13:51
To: dev
Subject: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)





The TCM work (CEP-21) is in its review stage but being well past our cut-off 
date¹ for merging, and now jeopardising 5.0 GA efforts, I would like to propose 
the following.

We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut an 
immediate 5.1-alpha1 release.

I see this as a win-win scenario for us, considering our current situation.  
(Though it is unfortunate that Accord is included in this scenario because we 
agreed it to be based upon TCM.)

This will mean…
 - We get to focus on getting 5.0 to beta and GA, which already has a ton of 
features users want.
 - We get an alpha release with TCM and Accord into users hands quickly for 
broader testing and feedback.
 - We isolate GA efforts on TCM and Accord – giving oss and downstream 
engineers time and patience reviewing and testing.  TCM will be the biggest 
patch ever to land in C*.
 - Give users a choice for a more incremental upgrade approach, given just how 
many new features we're putting on them in one year.
 - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all 4.x 
versions, just as if it had landed in 5.0.


The risks/costs this introduces are
 - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch, and at 
some point decide to undo this work, while we can throw away the cassandra-5.1 
branch we would need to do a bit of work reverting the changes in trunk.  This 
is a _very_ edge case, as confidence levels on the design and implementation of 
both are already tested and high.
 - We will have to maintain an additional branch.  I propose that we treat the 
5.1 branch in the same maintenance window as 5.0 (like we have with 3.0 and 
3.11).  This also adds the merge path overhead.
 - Reviewing of TCM and Accord will continue to happen post-merge.  This is not 
our normal practice, but this work will have already received its two +1s from 
committers, and such ongoing review effort is akin to GA stabilisation work on 
release branches.


I see no other ok solution in front of us that gets us at least both the 5.0 
beta and TCM+Accord alpha releases this year.  Keeping in mind users demand to 
start experimenting with these features, and our Cassandra Summit in December.


1) 
https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3




Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
How I look at it is that if we clearly and explicitly specify "since" and 
"forRemoval" for all of our @Deprecated annotations, then does it really matter 
who the consumer of that is?

Imagine a scenario where there is some tool which puts cassandra-all on the 
class path and then cherry-picks this and that from it. If something is 
deprecated, it is explicitly visible since when, and there is a clear policy on 
what will happen with it in the next release (some documentation on the website or 
similar), then are we really to blame that their code will break in the next 
release if they don't clean it up? I don't think so. External tooling should 
not take it for granted that what is there will be there forever; what is 
deprecated will have its expiration date, unless explicitly said 
otherwise.


From: Josh McKenzie 
Sent: Friday, October 13, 2023 15:36
To: dev
Cc: Miklosovic, Stefan
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




If some piece of code is not used anymore then simplifying the code is the best 
thing to do
In the case of unused / unreferenced, sure. In the case of "other things use 
this but we shouldn't add any more dependencies on this because we need to 
remove it", a @Deprecated annotation w/version, reason, etc could be pretty 
useful.

Also - my instinct is that we have a lot of stuff in our ecosystem that depends 
on public methods in the codebase (I assume sidecar, bulk writer / reader, CDC 
clients though I tried to provide a formal API there, etc. etc) and I for one 
would be receptive to discussions on dev@ for the things people in the 
ecosystem have taken dependencies on so we can discuss whether or not to a) 
formally support those, and/or b) wrap an actual API around them so we can 
decouple those signatures from implementation.

Our lack of rigor around what's a public API and what's not combined with our 
historic default posture of "none of it's an API, if you depend on it it's on 
you and we'll break it, also we don't provide many public extension points nor 
do we provide more than the core functionality of the DB in our ecosystem so 
have fun" may not be the optimal posture for us in terms of ecosystem adoption 
+ long-term maintenance burden. I realize we've done this in the name of us 
being able to be as productive as possible working on the core DB itself, but 
I'm not entirely convinced it's actually the most productive path tbh.

Go slow to go fast, invest to reap returns, etc.

On Fri, Oct 13, 2023, at 9:16 AM, Miklosovic, Stefan via dev wrote:
I forgot the round #3.

That would consist of an ant task which would scan the source. Since we 
enforced that each @Deprecated annotation has to have its "since" at compile 
time, we can write a parser in that task which would tell you what you have to 
do in order to be sure that your next release will not contain any stuff which 
should not be there. E.g. when we release 6.0, all 4.0 stuff can go away etc ...

________
From: Miklosovic, Stefan via dev <dev@cassandra.apache.org>
Sent: Friday, October 13, 2023 15:00
To: dev@cassandra.apache.org
Cc: Miklosovic, Stefan
Subject: Re: [DISCUSS] putting versions into Deprecated annotations





OK. So here we are ... round 1 will be to map how bad it is, round 2 will be 
the removal of what should not be there. I am not sure if round 2 will be done 
before 5.0 is out (that would be ideal, to release 5.0 without a lot of baggage 
like that) so it will be better if we split this effort into two parts.


From: Benjamin Lerer <b.le...@gmail.com>
Sent: Friday, October 13, 2023 14:45
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




Ok, thanks Stefan I understand the context better now. Looking at the PR.
Some make sense also for serialization reasons but some make no sense to me.


On Fri, Oct 13, 2023 at 14:26, Benjamin Lerer <b.le...@gmail.com> wrote:
I’ve been told in the past not to remove public methods in a patch release 
though.

Then I am curious to get the rationale behind that. If some piece of code is 
not used anymore then simplifying the code is the best thing to do. It makes 
maintenance easier and avoids mistakes.

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
I forgot the round #3.

That would consist of an ant task which would scan the source. Since we 
enforced that each Deprecation annotation has to have its "since" on compile 
time, we can write a parser in that task which would tell you what you have to 
do in order to be sure that your next release will not contain any stuff which 
should not be there. E.g. when we release 6.0, all 4.0 stuff can go away etc ...
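The scan described above could be sketched roughly as follows. This is illustrative only: the class name, method name, and the "deprecated in major N-2 or earlier may go away in major N" policy are assumptions for the example, not the actual Cassandra build task.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeprecationScanner
{
    // Matches e.g. @Deprecated(since = "4.0") and captures the major version.
    private static final Pattern SINCE =
        Pattern.compile("@Deprecated\\s*\\(\\s*since\\s*=\\s*\"(\\d+)\\.");

    /**
     * Returns true if the given source text contains a deprecation old enough
     * to be removed before the given next major release, under the assumed
     * policy "deprecated in major N-2 or earlier may be removed in major N".
     */
    public static boolean removableBefore(String source, int nextMajor)
    {
        Matcher m = SINCE.matcher(source);
        while (m.find())
        {
            if (Integer.parseInt(m.group(1)) <= nextMajor - 2)
                return true;
        }
        return false;
    }
}
```

A real task would of course walk the source tree and report file/line locations rather than a boolean, but the core decision is this comparison of the "since" major against the upcoming release.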

____
From: Miklosovic, Stefan via dev 
Sent: Friday, October 13, 2023 15:00
To: dev@cassandra.apache.org
Cc: Miklosovic, Stefan
Subject: Re: [DISCUSS] putting versions into Deprecated annotations





OK. So here we are ... round 1 will be to map how bad it is, round 2 will be 
the removal of what should not be there. I am not sure if round 2 will be done 
before 5.0 is out (that would be ideal, to release 5.0 without a lot of baggage 
like that) so it will be better if we split this effort into two parts.


From: Benjamin Lerer 
Sent: Friday, October 13, 2023 14:45
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




Ok, thanks Stefan I understand the context better now. Looking at the PR.
Some make sense also for serialization reasons but some make no sense to me.


On Fri, Oct 13, 2023 at 14:26, Benjamin Lerer <b.le...@gmail.com> wrote:
I’ve been told in the past not to remove public methods in a patch release 
though.

Then I am curious to get the rationale behind that. If some piece of code is 
not used anymore then simplifying the code is the best thing to do. It makes 
maintenance easier and avoids mistakes.
On Fri, Oct 13, 2023 at 14:11, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:
Maybe for better understanding what we talk about, there is the PR which 
implements the changes suggested here (1)

It is clear that @Deprecated is not used exclusively on JMX / Configuration but 
we use it internally as well. This is a very delicate topic and we need to go, 
basically, one by one.

I get that there might be some kind of a "nervousness" around this as we strive 
for not breaking it unnecessarily so there might be a lot of exceptions etc and 
I completely understand that but what I lack is clear visibility into what we 
plan to do with it (if anything).

There is deprecated stuff as old as Cassandra 1.2 / 2.0 (!!!) and it is really 
questionable if we should not just get rid of that once and for all. I am OK with 
keeping it there if we decide that, but we should provide some additional 
information like when it was deprecated and why it is necessary to keep it 
around otherwise the code-base will bloat and bloat ...

(1) 
https://github.com/apache/cassandra/pull/2801/files


From: Mick Semb Wever <m...@apache.org>
Sent: Friday, October 13, 2023 13:51
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations






On Fri, 13 Oct 2023 at 13:07, Benjamin Lerer <ble...@apache.org> wrote:
I was asking because outside of configuration parameters and JMX calls, the 
approach as far as I remember was to just change things without using an 
annotation.


Yes, it is my understanding that such deprecation is only needed on 
methods/objects that belong to some API/SPI component of ours that requires 
compatibility.  This is much more than configuration and JMX, and can be quite 
subtle in areas.   A failed attempt I started at this is here: 
https://cwiki.apache.org/confluence/display/CASSANDRA/%28wip%29+Compatibility+Planning

But there will also be internal methods/objects marked as deprecated that 
relate back to these compatibility concerns, annotated because their connection 
and removal might not be so obvious when the time comes.


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
OK. So here we are ... round 1 will be to map how bad it is, round 2 will be 
the removal of what should not be there. I am not sure if round 2 will be done 
before 5.0 is out (that would be ideal, to release 5.0 without a lot of baggage 
like that) so it will be better if we split this effort into two parts.


From: Benjamin Lerer 
Sent: Friday, October 13, 2023 14:45
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




Ok, thanks Stefan I understand the context better now. Looking at the PR.
Some make sense also for serialization reasons but some make no sense to me.


On Fri, Oct 13, 2023 at 14:26, Benjamin Lerer <b.le...@gmail.com> wrote:
I’ve been told in the past not to remove public methods in a patch release 
though.

Then I am curious to get the rationale behind that. If some piece of code is 
not used anymore then simplifying the code is the best thing to do. It makes 
maintenance easier and avoids mistakes.
On Fri, Oct 13, 2023 at 14:11, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:
Maybe for better understanding what we talk about, there is the PR which 
implements the changes suggested here (1)

It is clear that @Deprecated is not used exclusively on JMX / Configuration but 
we use it internally as well. This is a very delicate topic and we need to go, 
basically, one by one.

I get that there might be some kind of a "nervousness" around this as we strive 
for not breaking it unnecessarily so there might be a lot of exceptions etc and 
I completely understand that but what I lack is clear visibility into what we 
plan to do with it (if anything).

There is deprecated stuff as old as Cassandra 1.2 / 2.0 (!!!) and it is really 
questionable if we should not just get rid of that once and for all. I am OK with 
keeping it there if we decide that, but we should provide some additional 
information like when it was deprecated and why it is necessary to keep it 
around otherwise the code-base will bloat and bloat ...

(1) 
https://github.com/apache/cassandra/pull/2801/files


From: Mick Semb Wever <m...@apache.org>
Sent: Friday, October 13, 2023 13:51
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations






On Fri, 13 Oct 2023 at 13:07, Benjamin Lerer <ble...@apache.org> wrote:
I was asking because outside of configuration parameters and JMX calls, the 
approach as far as I remember was to just change things without using an 
annotation.


Yes, it is my understanding that such deprecation is only needed on 
methods/objects that belong to some API/SPI component of ours that requires 
compatibility.  This is much more than configuration and JMX, and can be quite 
subtle in areas.   A failed attempt I started at this is here: 
https://cwiki.apache.org/confluence/display/CASSANDRA/%28wip%29+Compatibility+Planning

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
Maybe for better understanding what we talk about, there is the PR which 
implements the changes suggested here (1)

It is clear that @Deprecated is not used exclusively on JMX / Configuration but 
we use it internally as well. This is a very delicate topic and we need to go, 
basically, one by one.

I get that there might be some kind of a "nervousness" around this as we strive 
for not breaking it unnecessarily so there might be a lot of exceptions etc and 
I completely understand that but what I lack is clear visibility into what we 
plan to do with it (if anything).

There is deprecated stuff as old as Cassandra 1.2 / 2.0 (!!!) and it is really 
questionable if we should not just get rid of that once and for all. I am OK with 
keeping it there if we decide that, but we should provide some additional 
information like when it was deprecated and why it is necessary to keep it 
around otherwise the code-base will bloat and bloat ...

(1) https://github.com/apache/cassandra/pull/2801/files


From: Mick Semb Wever 
Sent: Friday, October 13, 2023 13:51
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations






On Fri, 13 Oct 2023 at 13:07, Benjamin Lerer <ble...@apache.org> wrote:
I was asking because outside of configuration parameters and JMX calls, the 
approach as far as I remember was to just change things without using an 
annotation.


Yes, it is my understanding that such deprecation is only needed on 
methods/objects that belong to some API/SPI component of ours that requires 
compatibility.  This is much more than configuration and JMX, and can be quite 
subtle in areas.   A failed attempt I started at this is here: 
https://cwiki.apache.org/confluence/display/CASSANDRA/%28wip%29+Compatibility+Planning

But there will also be internal methods/objects marked as deprecated that 
relate back to these compatibility concerns, annotated because their connection 
and removal might not be so obvious when the time comes.


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
OK. That is definitely something to mention when we will approach the second 
phase where we decide what to do with it, but I humbly think we are not there yet.

Could you point me some document / ML thread this was explicitly decided in if 
you know of anything like that? It would be great if there was some solid 
guidance on this.


From: Benjamin Lerer 
Sent: Friday, October 13, 2023 13:07
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




I was asking because outside of configuration parameters and JMX calls, the 
approach as far as I remember was to just change things without using an 
annotation.

On Fri, Oct 13, 2023 at 12:45, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:
Hi Benjamin,

in other words, anything we have @Deprecated annotation on top of (or anything 
you want to annotate with it). Does it help with the explanation?

For the initial phase, I plan to just put "since" everywhere (into every 
already existing @Deprecated annotation) and we leave out "forRemoval" in 
Deprecated annotation for now as that is quite tricky to get right.

I am confused about what is considered to be removed and what we keep there forever 
even if it is deprecated (referring to what Mick said in this thread that 
forRemoval can not be by default true). After we map what technical debt we 
have, we can summarize this and I bring it to the ML again for further 
discussion what to actually remove and when.
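In code, the two buckets being weighed could look like this. Illustrative only: the class, field names, and versions are invented for the example, not actual Cassandra configuration fields.

```java
public class ExampleConfig
{
    /** @deprecated No longer read; flagged for removal once consensus is reached. */
    @Deprecated(since = "4.1", forRemoval = true)
    public volatile Integer oldOption = null;

    /** @deprecated Kept around indefinitely, e.g. for serialization compatibility. */
    @Deprecated(since = "3.0")
    public volatile Integer legacyOption = null;
}
```

Both elements have existed on java.lang.Deprecated since Java 9; "since" records history, while "forRemoval" is the contract that actually promises the symbol will go away.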

Regards


From: Benjamin Lerer <b.le...@gmail.com>
Sent: Friday, October 13, 2023 12:19
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




I am a bit confused by the starting point of this discussion: "When we 
deprecate APIs / methods"
What are we exactly calling APIs/methods? It is really unclear to me what we 
are talking about here.

On Thu, Oct 12, 2023 at 02:38, Francisco Guerrero <fran...@apache.org> wrote:


On 2023/10/11 16:59:35 Maxim Muzafarov wrote:
> Francisco,
>
> I agree with your vision of the deprecation comments and actually, I
> think we should recommend doing it that way for the cases where it is
> applicable on our code-style page, but when things get to the
> implementation phase there are some obstacles that are not easy to
> overcome.

Yeah, I agree that this should be recommended rather than enforced via
some checkstyle rule. However, reviewers should be aware of this
recommendation in the code-style page.

>
> So, adding the MissingDeprecated will emphasize to a developer the
> need to describe the deprecation reasons in comments, but
> unfortunately, there is no general pattern that we can enforce for
> every such description message and/or automatically validate its
> meaningfulness. There may be no alternative for a deprecated field, or
> it may simply be marked for deletion, so the pattern is slightly
> different in this case.


+1 for adding the MissingDeprecated rule

> Another problem is how to add meaningful comments to the deprecated
> annotations that we already have in the code, since we can't enforce
> checkstyle rules only on newly added code. This is a very exhausting
> process with no 100% guarantee of accuracy - some of the commits don't
> have a good commit message and require a deep archaeology.

Not aiming for 100% accuracy, but more on code style agreement.

> All of the above led me to the following which is pretty easy to
> achieve and improves the code quality:
>
> /** @deprecated See CASSANDRA-6504 */
> @Deprecated(since = "2.1")
> public Integer concurrent_replicates = null;
>
> On Wed, 11 Oct 2023 at 09:51, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
> >
> > Here (1) it supports check of both Javadoc and annotation at the same time 
> > so what you want is possible. What is not possible is to checkstyle the 
> > _content_ of deprecated Javadoc nor any format of it. I think that ensuring 
> > the presence of both annotation and Javadoc comment is just enough.
> >
> > (1) 
> > https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Miklosovic, Stefan via dev
Hi Benjamin,

in other words, anything we have @Deprecated annotation on top of (or anything 
you want to annotate with it). Does it help with the explanation?

For the initial phase, I plan to just put "since" everywhere (into every 
already existing @Deprecated annotation) and we leave out "forRemoval" in 
Deprecated annotation for now as that is quite tricky to get right.

I am confused about what is considered to be removed and what we keep there forever 
even if it is deprecated (referring to what Mick said in this thread that 
forRemoval can not be by default true). After we map what technical debt we 
have, we can summarize this and I bring it to the ML again for further 
discussion what to actually remove and when.

Regards


From: Benjamin Lerer 
Sent: Friday, October 13, 2023 12:19
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations




I am a bit confused by the starting point of this discussion: "When we 
deprecate APIs / methods"
What are we exactly calling APIs/methods? It is really unclear to me what we 
are talking about here.

On Thu, Oct 12, 2023 at 02:38, Francisco Guerrero <fran...@apache.org> wrote:


On 2023/10/11 16:59:35 Maxim Muzafarov wrote:
> Francisco,
>
> I agree with your vision of the deprecation comments and actually, I
> think we should recommend doing it that way for the cases where it is
> applicable on our code-style page, but when things get to the
> implementation phase there are some obstacles that are not easy to
> overcome.

Yeah, I agree that this should be recommended rather than enforced via
some checkstyle rule. However, reviewers should be aware of this
recommendation in the code-style page.

>
> So, adding the MissingDeprecated will emphasize to a developer the
> need to describe the deprecation reasons in comments, but
> unfortunately, there is no general pattern that we can enforce for
> every such description message and/or automatically validate its
> meaningfulness. There may be no alternative for a deprecated field, or
> it may simply be marked for deletion, so the pattern is slightly
> different in this case.


+1 for adding the MissingDeprecated rule

> Another problem is how to add meaningful comments to the deprecated
> annotations that we already have in the code, since we can't enforce
> checkstyle rules only on newly added code. This is a very exhausting
> process with no 100% guarantee of accuracy - some of the commits don't
> have a good commit message and require a deep archaeology.

Not aiming for 100% accuracy, but more on code style agreement.

> All of the above led me to the following which is pretty easy to
> achieve and improves the code quality:
>
> /** @deprecated See CASSANDRA-6504 */
> @Deprecated(since = "2.1")
> public Integer concurrent_replicates = null;
>
> On Wed, 11 Oct 2023 at 09:51, Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
> >
> > Here (1) it supports check of both Javadoc and annotation at the same time 
> > so what you want is possible. What is not possible is to checkstyle the 
> > _content_ of deprecated Javadoc nor any format of it. I think that ensuring 
> > the presence of both annotation and Javadoc comment is just enough.
> >
> > (1) 
> > https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
> >
> > 
> > From: Francisco Guerrero <fran...@apache.org>
> > Sent: Tuesday, October 10, 2023 23:34
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSS] putting versions into Deprecated annotations
> >
> >
> >
> >
> >
> > To me this seems insufficient. As a developer, I'd like to see what the 
> > alternative is when reading the javadoc without having to go to Jira.

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-11 Thread Miklosovic, Stefan
I hope I am not nitpicking here but there is also the "@see" Javadoc tag which could 
contain that JIRA ticket.

/**
 * @see <a href="https://issues.apache.org/jira/browse/CASSANDRA-123">CASSANDRA-123</a>
 */

Doing ctrl+q (at least that is how I have it in IDEA) will show Javadoc for 
such javadoc'ed element and you can just click to that directly and it will 
open a tab for you in a browser. I am not sure there is a faster way to get to 
that.


From: Maxim Muzafarov 
Sent: Wednesday, October 11, 2023 18:59
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations





Francisco,

I agree with your vision of the deprecation comments and actually, I
think we should recommend doing it that way for the cases where it is
applicable on our code-style page, but when things get to the
implementation phase there are some obstacles that are not easy to
overcome.

So, adding the MissingDeprecated will emphasize to a developer the
need to describe the deprecation reasons in comments, but
unfortunately, there is no general pattern that we can enforce for
every such description message and/or automatically validate its
meaningfulness. There may be no alternative for a deprecated field, or
it may simply be marked for deletion, so the pattern is slightly
different in this case.

Another problem is how to add meaningful comments to the deprecated
annotations that we already have in the code, since we can't enforce
checkstyle rules only on newly added code. This is a very exhausting
process with no 100% guarantee of accuracy - some of the commits don't
have a good commit message and require a deep archaeology.

All of the above led me to the following which is pretty easy to
achieve and improves the code quality:

/** @deprecated See CASSANDRA-6504 */
@Deprecated(since = "2.1")
public Integer concurrent_replicates = null;

On Wed, 11 Oct 2023 at 09:51, Miklosovic, Stefan
 wrote:
>
> Here (1) it supports check of both Javadoc and annotation at the same time so 
> what you want is possible. What is not possible is to checkstyle the 
> _content_ of deprecated Javadoc nor any format of it. I think that ensuring 
> the presence of both annotation and Javadoc comment is just enough.
>
> (1) 
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
>
> 
> From: Francisco Guerrero 
> Sent: Tuesday, October 10, 2023 23:34
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>
>
>
>
>
> To me this seems insufficient. As a developer, I'd like to see what the 
> alternative is when reading the javadoc without having to go to Jira.
>
> What I would prefer is to know what the alternative is and how to use it. For 
> example:
>
> /** @deprecated Use {@link #alternative} instead. See CASSANDRA-6504 */
> @Deprecated(since = "2.1")
> public Integer concurrent_replicates = null;
>
> I am not sure if checkstyle can enforce the above, so the mechanisms to 
> enforce it would still need to be laid out, unless we can easily support 
> something like the above with checkstyle rules.
>
> On 2023/10/10 20:34:27 Maxim Muzafarov wrote:
> > Hello everyone,
> >
> >
> > I've discussed with Stefan some steps we can take to improve the final
> > solution, so the final version might look like this:
> >
> > /** @deprecated See CASSANDRA-6504 */
> > @Deprecated(since = "2.1")
> > public Integer concurrent_replicates = null;
> >
> > The issue number will be taken from the git blame comment. I doubt I
> > can generate and/or create a meaningful comment for every deprecation
> > annotation, but providing a link to the issue that was retrieved from
> > the git blame is not too big a problem. This also improves the
> > visibility.
> >
> > In addition, we can add two checkstyle rules [1] [2] to ensure that
> > any future deprecations will have a "since" element and a JavaDoc
> > comment.
> > WDYT?
> >
> > [1] 
> > https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
> > [2] 
> > https://checkstyle.org/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/MatchXpathCheck.html
> >
> > On Tue, 10 Oct 2023 at 14:50, Josh McKenzie  wrote:
> > >
> > >

Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-11 Thread Miklosovic, Stefan
Here (1) it supports check of both Javadoc and annotation at the same time so 
what you want is possible. What is not possible is to checkstyle the _content_ 
of deprecated Javadoc nor any format of it. I think that ensuring the presence 
of both annotation and Javadoc comment is just enough.

(1) 
https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
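For illustration, enabling that check could look roughly like the following checkstyle fragment. MissingDeprecated is the real module name behind the linked MissingDeprecatedCheck; the surrounding layout is a sketch, not the project's actual checkstyle.xml:

```xml
<module name="Checker">
  <module name="TreeWalker">
    <!-- Fails when the @Deprecated annotation and the @deprecated Javadoc
         tag are not present together on a deprecated element. -->
    <module name="MissingDeprecated"/>
  </module>
</module>
```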


From: Francisco Guerrero 
Sent: Tuesday, October 10, 2023 23:34
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations





To me this seems insufficient. As a developer, I'd like to see what the 
alternative is when reading the javadoc without having to go to Jira.

What I would prefer is to know what the alternative is and how to use it. For 
example:

/** @deprecated Use {@link #alternative} instead. See CASSANDRA-6504 */
@Deprecated(since = "2.1")
public Integer concurrent_replicates = null;

I am not sure if checkstyle can enforce the above, so the mechanisms to enforce 
it would still need to be laid out, unless we can easily support something like 
the above with checkstyle rules.

On 2023/10/10 20:34:27 Maxim Muzafarov wrote:
> Hello everyone,
>
>
> I've discussed with Stefan some steps we can take to improve the final
> solution, so the final version might look like this:
>
> /** @deprecated See CASSANDRA-6504 */
> @Deprecated(since = "2.1")
> public Integer concurrent_replicates = null;
>
> The issue number will be taken from the git blame comment. I doubt I
> can generate and/or create a meaningful comment for every deprecation
> annotation, but providing a link to the issue that was retrieved from
> the git blame is not too big a problem. This also improves the
> visibility.
>
> In addition, we can add two checkstyle rules [1] [2] to ensure that
> any future deprecations will have a "since" element and a JavaDoc
> comment.
> WDYT?
>
> [1] 
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html
> [2] 
> https://checkstyle.org/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/MatchXpathCheck.html
>
> On Tue, 10 Oct 2023 at 14:50, Josh McKenzie  wrote:
> >
> > Sounds like we're relitigating the basics of how @Deprecated, forRemoval, 
> > since, and javadoc @link all intersect to make deprecation less painful ;)
> >
> > So:
> >
> > Built-in java.lang.Deprecated: required
> > Can use since and forRemoval if you have that info handy and think it'd be 
> > useful (would make it a lot easier to grep for things to pull before a 
> > major)
> > If it's being replaced by something, you should {@link #} the javadoc for 
> > it so people know where to bounce over to
> >
> > I've been leaning pretty heavily on the functionality of point 3 for 
> > documenting cross-module implicit dependencies as I come across them lately 
> > so that one resonates with me.
> >
> > On Tue, Oct 10, 2023, at 4:38 AM, Miklosovic, Stefan wrote:
> >
> > OK.
> >
> > Let's go with in-built java.lang.Deprecated annotation. If somebody wants 
> > to document that in more detail, there are Javadocs as mentioned. Let's 
> > just stick with the standard stuff.
> >
> > I will try to implement this for 5.0 (versions since it was deprecated) 
> > with my take on what should be removed (forRemoval = true) but that should 
> > be definitely cross-checked on review as Mick mentioned.
> >
> > 
> > From: Mick Semb Wever 
> > Sent: Monday, October 9, 2023 10:55
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSS] putting versions into Deprecated annotations
> >
> >
> >
> >
> > Tangential question to this is if everything we deprecated is eligible for 
> > removal? In other words, are there any cases when forRemoval would be 
> > false? Could you elaborate on that and give such examples or do you all 
> > think that everything which is deprecated will be eventually removed?
> >
> >
> > Removal cannot be default.  This came up in the subtickets of 
> > CASSANDRA-18306.
> >
> > I suggest that adding " forRemoval = true" and the later actual removal of 
> > the code both require broader consensus.  I'm open to that being on the 
> > ticket or needing a thread on the ML.  Small stuff, common sense says on 
> > the ticket is enough, but a few folk have already stated that deprecated 
> > code that has minimal maintenance overhead should not be removed.
> >
> >
>


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-10 Thread Miklosovic, Stefan
OK.

Let's go with in-built java.lang.Deprecated annotation. If somebody wants to 
document that in more detail, there are Javadocs as mentioned. Let's just stick 
with the standard stuff.

I will try to implement this for 5.0 (versions since it was deprecated) with my 
take on what should be removed (forRemoval = true) but that should be 
definitely cross-checked on review as Mick mentioned.


From: Mick Semb Wever 
Sent: Monday, October 9, 2023 10:55
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



Tangential question to this is if everything we deprecated is eligible for 
removal? In other words, are there any cases when forRemoval would be false? 
Could you elaborate on that and give such examples or do you all think that 
everything which is deprecated will be eventually removed?


Removal cannot be default.  This came up in the subtickets of CASSANDRA-18306.

I suggest that adding "forRemoval = true" and the later actual removal of the 
code both require broader consensus.  I'm open to that being on the ticket or 
needing a thread on the ML.  Small stuff, common sense says on the ticket is 
enough, but a few folk have already stated that deprecated code that has 
minimal maintenance overhead should not be removed.


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-06 Thread Miklosovic, Stefan
If we want to use some description (of type String), we would need to introduce 
a brand new annotation instead of java.lang.Deprecated as that one does not 
have it.

I am in favor of a custom annotation instead of using Javadocs for this kind of 
technical documentation. An annotation seems more succinct, even if it is a 
custom one, rather than relying on comments.

On the other hand, I am not sure how we would ensure that developers use this 
custom annotation instead of the built-in one. The java.lang package does not 
need to be imported, so we cannot have a checkstyle rule for it.
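
If the list ever went down the custom-annotation path, a minimal sketch could 
look like the following. This type does not exist in the codebase; it is purely 
illustrative of the trade-off being discussed:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for java.lang.Deprecated that adds a free-form reason.
@Documented
@Retention(RetentionPolicy.CLASS)
@Target({ ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR })
@interface DeprecatedWithReason
{
    String since();
    boolean forRemoval() default false;
    String reason() default "";
}
```

Whether developers would actually reach for this instead of the built-in 
annotation is exactly the enforcement question raised above.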


From: Francisco Guerrero 
Sent: Saturday, October 7, 2023 0:54
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] putting versions into Deprecated annotations

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.




> Might be nice to support a 3rd param that's a String for the reason it's 
> deprecated.

Javadocs offers this natively

/**
 * @deprecated Use instance method {@link #newMethod(Param1, Param2...)} 
instead.
 */
@Deprecated

So we could leverage javadocs for this purpose
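
Putting Josh's and Francisco's points together, a deprecated method could carry 
both the machine-readable annotation and the human-readable Javadoc. A 
hypothetical example (the class and methods are invented for illustration):

```java
public class TokenAllocator  // hypothetical example class
{
    /**
     * @deprecated Use {@link #allocate(int)} instead; see the relevant JIRA ticket.
     */
    @Deprecated(since = "5.0", forRemoval = true)
    public int allocateTokens(int count)
    {
        return allocate(count);
    }

    public int allocate(int count)
    {
        return count;  // stand-in body for illustration
    }
}
```

The since/forRemoval fields stay greppable while the Javadoc points readers at 
the replacement.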

On 2023/10/06 11:49:52 Josh McKenzie wrote:
> Might be nice to support a 3rd param that's a String for the reason it's 
> deprecated. i.e. "Replaced by X",  "Unmaintained", "Obsolete", "See 
> CASSANDRA-N", link to a dev ML thread on pony mail, etc. That way if 
> someone comes across it in the codebase they have some context to follow up 
> on if it's the shape of a thing they need w/out having to go full-bore w/git 
> blame and JQL.
>
> On Fri, Oct 6, 2023, at 4:43 AM, Miklosovic, Stefan wrote:
> > Hi list,
> >
> > I have a ticket to discuss (1).
> >
> > When we deprecate APIs / methods etc, what I want to suggest is that we 
> > might start to explicitly add the version when that happened. For example, 
> > if you deprecated something which goes to 5.0, would you be so nice to do 
> > this?
> >
> > @Deprecated(since = "5.0")
> >
> > Similarly, that annotation offers one more field - forRemoval, so using it 
> > like this:
> >
> > @Deprecated(since = "5.0", forRemoval = true)
> >
> > means that this is eligible to be deleted in Cassandra 6.0.
> >
> > With this information, it is much easier to just "grep" for deprecations 
> > eligible to be deleted in the next version. Currently, we basically have to 
> > go one by one and figure out whether each is old enough to remove. I believe 
> > this would bring more transparency into what is planned to be removed and 
> > when, and it would be clearly visible what should be removed in the next 
> > version but has not been.
> >
> > Tangential question to this is if everything we deprecated is eligible for 
> > removal? In other words, are there any cases when forRemoval would be 
> > false? Could you elaborate on that and give such examples or do you all 
> > think that everything which is deprecated will be eventually removed?
> >
> > (1) https://issues.apache.org/jira/browse/CASSANDRA-18912
> >
> > Thanks and regards
>


[DISCUSS] putting versions into Deprecated annotations

2023-10-06 Thread Miklosovic, Stefan
Hi list,

I have a ticket to discuss (1). 

When we deprecate APIs / methods etc, what I want to suggest is that we might 
start to explicitly add the version when that happened. For example, if you 
deprecated something which goes to 5.0, would you be so nice to do this?

@Deprecated(since = "5.0") 

Similarly, that annotation offers one more field - forRemoval, so using it like 
this: 

@Deprecated(since = "5.0", forRemoval = true) 

means that this is eligible to be deleted in Cassandra 6.0. 

With this information, it is much easier to just "grep" for deprecations 
eligible to be deleted in the next version. Currently, we basically have to go 
one by one and figure out whether each is old enough to remove. I believe this 
would bring more transparency into what is planned to be removed and when, and 
it would be clearly visible what should be removed in the next version but has 
not been. 

A tangential question: is everything we deprecate eligible for removal? In 
other words, are there any cases where forRemoval would be false? Could you 
elaborate on that and give such examples, or do you all think that everything 
which is deprecated will eventually be removed?

(1) https://issues.apache.org/jira/browse/CASSANDRA-18912

Thanks and regards

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Miklosovic, Stefan
Would it be possible to make Jimfs integration production-ready then? I see we 
are using it in the tests already.

It might be one of the reference implementations of this CEP. If there is a 
type of workload / type of node with plenty of RAM but no disk, some kind of 
compute node, it would just hold it all in memory and we might "flush" it to 
cloud-based storage once it is no longer necessary (whatever that means).

We could then completely bypass the memtables, as fetching data from an SSTable 
held in memory would be roughly the same?

On the other hand, that might be achieved by creating a ramdisk, so I am not 
sure what exactly we would gain here. However, if it eventually stored these 
SSTables in cloud storage, we might "compact" "TWCS tables" automatically after 
a given period by moving them there.


From: Jake Luciani 
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external 
storage locations





We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> FileSystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer and depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree, onto another disk, so that compaction can occur in 
> situations where there is not enough disk space for compaction and processing 
> of the moved data cannot be suspended.
> The ability to store infrequently used data on slower, cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


--
http://twitter.com/tjake
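
To make the JDK-level FileSystemProvider suggestion concrete: code written 
against java.nio.file works unchanged on any registered provider. Below is a 
self-contained sketch using the zip filesystem provider bundled with the JDK. 
This is illustrative only and says nothing about how CEP-36 would actually wire 
a backend in; the paths and file names are invented:

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class AltFsDemo
{
    public static void main(String[] args) throws Exception
    {
        Path zip = Paths.get(System.getProperty("java.io.tmpdir"), "altfs-demo.zip");
        Files.deleteIfExists(zip);

        // "jar:file:///..." URIs route through the JDK's zip FileSystemProvider.
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Map.of("create", "true")))
        {
            // Ordinary java.nio.file calls work against the alternate backend.
            Path data = zipFs.getPath("/ks1/tbl1/nb-1-big-Data.db");
            Files.createDirectories(data.getParent());
            Files.writeString(data, "sstable bytes");
            System.out.println(Files.readString(data));
        }
    }
}
```

A cloud-backed provider (S3/GCS/Azure, as Jake mentions) would slot in the same 
way, which is why listing files and the rest of the filesystem surface matter 
more than a ChannelProxy hook alone.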


CASSANDRA-18773 compaction speedup

2023-09-26 Thread Miklosovic, Stefan
Hi list,

there is CASSANDRA-18773 we want to merge to 4.0 up to trunk (hence it will be 
in 5.0 (alpha2)) and I want to be sure we are all OK with that (especially for 
that 5.0 alpha release).

The patch is significantly speeding up the compaction throughput for cases when 
you have a lot of SSTables in a key-value table without secondary index.

My colleague Cameron Zemek has identified and fixed the issue together with 
help of Branimir Lambov. 

It is a little hard to believe, but for cases where a table contains thousands 
of SSTables and has no secondary indexes (tested on around 2,500 SSTables), we 
saw a 50x (fifty times) speedup in compaction throughput for major compactions. 
It also, reportedly, affects operations when switching from STCS to LCS.

As mentioned, we plan to merge this to 4.0, 4.1, 5.0 and trunk.

Any objections to that?

Regards

Re: [Discuss] disabling io.netty.transport.noNative in tests

2023-09-07 Thread Miklosovic, Stefan
Thank you for your insights. I created (1) to track the work / progress.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18830


From: Jon Meredith 
Sent: Thursday, September 7, 2023 15:42
To: dev@cassandra.apache.org
Subject: Re: [Discuss] disabling io.netty.transport.noNative in tests




I think the Native dependencies were disabled for in-jvm Netty because they 
prevented the in-jvm dtest InstanceClassLoaders from being garbage collected 
and were a source of out-of-metaspace exceptions.  I'll echo Alex's comment 
that you will also need to investigate in-jvm upgrade tests. I'm not sure if 
it's possible to load two different versions of native libraries concurrently.

Perhaps the netty code has changed and we can re-enable, or perhaps you can 
determine what was not being released by the native code -- that would be much 
better and as Alex says more reflective of the common environment.

To check if it is now safe, you can use the ResourceLeakTest - you may have to 
comment out a few @Ignores - the previous bar was for the looperGossiperNetwork 
test to complete 100 loops.

Jon.

On Wed, Sep 6, 2023 at 9:32 AM Alex Petrov  wrote:
I think most of the time people actually use netty _with_ native. This might 
have been introduced when we tried to make shaded in-JVM dtest jars. If all 
tests are passing, and we actually do have confirmation that native Netty is 
being used, I would say +1 to remove `noNative`.

Just to make sure though, did you have a chance to see if the upgrade tests 
also work fine?

On Thu, Aug 31, 2023, at 1:20 PM, Miklosovic, Stefan wrote:
Hi list,

Currently, we are skipping the usage of native libraries in Netty as part of 
testing here (1).

In 5.0 branch, we upgraded Netty to 4.1.96 and we brought all native 
dependencies to the class path so they are there in runtime (x86, arm, mac).

I conducted a few CI tests for 5.0+ and not having "io.netty.transport.noNative" 
set to "true" introduces no errors. I think we were just too motivated here to 
skip stuff left and right. Having this property enabled seems to have no 
functional effect. Also, one negative side-effect of having it enabled is that 
exceptions are logged when running in-jvm dtests, e.g. in IDEA, which pollutes 
the logs unnecessarily and is just visual clutter to deal with every time. To 
silence this, I set (2) so it skips the logic in (3) completely, hence no 
unnecessary logging will occur.

My question is whether we should remove (4) in 5.0, which means that tests 
would use native libraries too. That also means we would be running tests 
closer to a production environment. I just do not see any reason to skip this 
when all tests pass with it, with the additional benefit of not seeing an 
exception logged every time when testing locally.

Thanks

(1) 
https://github.com/apache/cassandra-in-jvm-dtest-api/blob/trunk/src/main/java/org/apache/cassandra/distributed/api/ICluster.java#L95-L102
(2) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java#L196
(3) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/NativeTransportService.java#L163
(4) 
https://github.com/apache/cassandra-in-jvm-dtest-api/blob/trunk/src/main/java/org/apache/cassandra/distributed/api/ICluster.java#L101

Re: [DISCUSS] Update default disk_access_mode to mmap_index_only on 5.0

2023-09-06 Thread Miklosovic, Stefan
I wonder why the disk_access_mode property is not in cassandra.yaml (looking at 
trunk right now). Do you all think we can add it there with a brief explanation 
of what each option does?
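
For readers following along, the setting under discussion looks like this. A 
hypothetical cassandra.yaml fragment only — the property is honored but not 
shipped in the file today, and the option descriptions are paraphrased from the 
thread, not authoritative:

```yaml
# How SSTable files are accessed (sketch of possible documentation):
#   auto            - mmap data and index files (current default behavior)
#   mmap            - mmap data and index files
#   mmap_index_only - mmap only index files; buffered I/O for data files
#   standard        - buffered I/O for everything, no mmap
disk_access_mode: mmap_index_only
```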


From: Caleb Rackliffe 
Sent: Wednesday, September 6, 2023 21:08
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Update default disk_access_mode to mmap_index_only on 5.0




+100 to this

We'd have to come up w/ a pretty compelling counterexample to NOT switch the 
default to mmap_index_only at this point.

On Wed, Sep 6, 2023 at 11:40 AM Brandon Williams  wrote:
Given https://issues.apache.org/jira/browse/CASSANDRA-17237 I think it makes 
sense.  At the least I think we should restore disk_access_mode so that users 
are more aware of the options available.

Kind Regards,
Brandon

On Wed, Sep 6, 2023 at 10:50 AM Paulo Motta  wrote:
>
> Hi,
>
> I've been bitten by OOMs with disk_access_mode:auto/mmap that were fixed by 
> changing to disk_access_mode:mmap_index_only. In a particular benchmark I got 
> 5x more read throughput on 3.11.x with disk_access_mode: mmap_index_only vs 
> disk_access_mode: auto/mmap.
>
> Changing disk_access_mode to mmap_index_only seems to be a common 
> recommendation on forums[1][2][3][4] and slack (find by searching 
> disk_access_mode in the #cassandra channel on 
> https://the-asf.slack.com/).
>
> It's not clear to me when using the default disk_access_mode:auto/mmap is 
> beneficial, perhaps only when the read set fits in memory? Mick seems to 
> think on CASSANDRA-15531 [5], that mmap_index_only has a higher heap cost and 
> should be only used when warranted. However it's not uncommon to see people 
> being bitten with OOMs or lower read performance due to the default 
> disk_access_mode, so it makes me think it's not the best fool-proof default.
>
> Should we consider changing default "auto" behavior of "disk_access_mode" to 
> be "mmap_index_only" instead of "mmap" in 5.0 since it's likely safer and 
> perhaps more performant?
>
> Thanks,
>
> Paulo
>
> [1] 
> https://stackoverflow.com/questions/72272035/troubleshooting-and-fixing-cassandra-oom-issue
> [2] 
> https://phabricator.wikimedia.org/T137419
> [3] 
> https://stackoverflow.com/a/55975471
> [4] 
> 

[Discuss] disabling io.netty.transport.noNative in tests

2023-08-31 Thread Miklosovic, Stefan
Hi list,

Currently, we are skipping the usage of native libraries in Netty as part of 
testing here (1).

In 5.0 branch, we upgraded Netty to 4.1.96 and we brought all native 
dependencies to the class path so they are there in runtime (x86, arm, mac).

I conducted a few CI tests for 5.0+ and not having "io.netty.transport.noNative" 
set to "true" introduces no errors. I think we were just too motivated here to 
skip stuff left and right. Having this property enabled seems to have no 
functional effect. Also, one negative side-effect of having it enabled is that 
exceptions are logged when running in-jvm dtests, e.g. in IDEA, which pollutes 
the logs unnecessarily and is just visual clutter to deal with every time. To 
silence this, I set (2) so it skips the logic in (3) completely, hence no 
unnecessary logging will occur.

My question is whether we should remove (4) in 5.0, which means that tests 
would use native libraries too. That also means we would be running tests 
closer to a production environment. I just do not see any reason to skip this 
when all tests pass with it, with the additional benefit of not seeing an 
exception logged every time when testing locally.

Thanks

(1) 
https://github.com/apache/cassandra-in-jvm-dtest-api/blob/trunk/src/main/java/org/apache/cassandra/distributed/api/ICluster.java#L95-L102
(2) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/impl/AbstractCluster.java#L196
(3) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/NativeTransportService.java#L163
(4) 
https://github.com/apache/cassandra-in-jvm-dtest-api/blob/trunk/src/main/java/org/apache/cassandra/distributed/api/ICluster.java#L101

Re: [Discuss] Enabling JMX in in-jvm dtests (by default)

2023-08-25 Thread Miklosovic, Stefan
Great. So we are going to remove the legacy solution / mocking of JMX; nodetool 
will interact only via a proper JMX connection, and a developer is required to 
enable the JMX feature to be able to execute nodetool commands. If the JMX 
feature is not present, an exception is thrown saying it has to be enabled to 
execute nodetool commands.


From: Doug Rohrer 
Sent: Friday, August 25, 2023 16:02
To: dev@cassandra.apache.org
Subject: Re: [Discuss] Enabling JMX in in-jvm dtests (by default)




I’d agree that anywhere we’re calling `nodetoolResult` or `nodetool` in a test, 
it would be better to enable JMX and use it rather than the older mocks we set 
up to enable calling the mbeans directly. I don’t think enabling JMX by default 
is the right way to go mostly due to the added resources/time required to run 
the tests (it’s only a few seconds of additional startup/shutdown time, but 
when running lots of tests every second counts).  Also, all other features are 
only enabled when requested, so making JMX on by default would require us to 
change the general pattern and have a `without` method to turn off a feature?

Better, I think, just to require it to be explicitly turned on and then have 
the methods that call into nodetool on Instance just throw a clear exception if 
jmx is disabled.

Doug

On Aug 25, 2023, at 6:35 AM, Brandon Williams  wrote:

I would prefer to have one standard way to do it, and given the
options I would prefer it be proper JMX instead of mocking.

Kind Regards,
Brandon

On Fri, Aug 25, 2023 at 4:20 AM Miklosovic, Stefan
 wrote:

Hi list,

I want to gather a feedback for this comment (1).

Long story short, until the JMX feature was introduced, we kind of hacked / 
mocked the calls to MBeans from IInstance, like this (2). If you notice, there 
are a lot of methods throwing UnsupportedOperationException because we had no 
proper JMX connection in place. That in turn means that tests which call 
nodetool commands using these MBeans / operations are not possible.

The fix I made in CASSANDRA-18572 will use the JMX feature and will hook 
nodetool up to a proper JMX connection where we are not mocking anything. It 
will use the same stuff as in production.

However, this happens only if one uses the JMX feature, so all existing tests 
calling nodetool without it will still work as they did. The patch I made takes 
care of both scenarios.

My question is whether we should make the JMX feature enabled by default. That 
way we might further simplify the code base and get rid of the hacks.

Another possibility is to not turn it on by default but we would add JMX 
feature to each test which is using nodetool. That would also mean that any 
future test which will use nodetool will fail if it does not have JMX feature 
enabled.

What would you like to see - dual solution (proper JMX connection if such 
feature is used as well as the legacy way) or only one solution with a proper 
JMX? (enabled by default or not).

Regards

(1) 
https://issues.apache.org/jira/browse/CASSANDRA-18572?focusedCommentId=17758920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17758920
(2) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/mock/nodetool/InternalNodeProbe.java



[Discuss] Enabling JMX in in-jvm dtests (by default)

2023-08-25 Thread Miklosovic, Stefan
Hi list,

I want to gather a feedback for this comment (1).

Long story short, until the JMX feature was introduced, we kind of hacked / 
mocked the calls to MBeans from IInstance, like this (2). If you notice, there 
are a lot of methods throwing UnsupportedOperationException because we had no 
proper JMX connection in place. That in turn means that tests which call 
nodetool commands using these MBeans / operations are not possible.

The fix I made in CASSANDRA-18572 will use the JMX feature and will hook 
nodetool up to a proper JMX connection where we are not mocking anything. It 
will use the same stuff as in production.

However, this happens only if one uses the JMX feature, so all existing tests 
calling nodetool without it will still work as they did. The patch I made takes 
care of both scenarios.

My question is whether we should make the JMX feature enabled by default. That 
way we might further simplify the code base and get rid of the hacks.

Another possibility is to not turn it on by default but we would add JMX 
feature to each test which is using nodetool. That would also mean that any 
future test which will use nodetool will fail if it does not have JMX feature 
enabled.

What would you like to see - dual solution (proper JMX connection if such 
feature is used as well as the legacy way) or only one solution with a proper 
JMX? (enabled by default or not).

Regards

(1) 
https://issues.apache.org/jira/browse/CASSANDRA-18572?focusedCommentId=17758920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17758920
(2) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/mock/nodetool/InternalNodeProbe.java
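
As a pseudocode-level sketch of the opt-in version of a nodetool-using test — 
the Feature flag and exact method names here are assumptions based on the 
CASSANDRA-18572 description, not a settled API:

```java
// Sketch only: depends on the in-jvm dtest framework, not runnable standalone.
try (Cluster cluster = Cluster.build(1)
                              .withConfig(c -> c.with(Feature.GOSSIP, Feature.NETWORK, Feature.JMX))
                              .start())
{
    // With the JMX feature on, nodetool talks over a real JMX connection.
    NodeToolResult result = cluster.get(1).nodetoolResult("status");
    result.asserts().success();
}
```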

Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-21 Thread Miklosovic, Stefan
reviews when I am back.
> >>
> >> On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov  wrote:
> >>
> >> Yes, I agree. The javadoc task should be part of our CI if we decide
> >> to keep it, to keep it buildable at all times.
> >>
> >>
> >> BTW, I have managed to fix all the javadoc errors.
> >> I have tested the task for both jdk11 and jdk17.
> >>
> >> Changes are here:
> >> https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build
> >>
> >> On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova  wrote:
> >> >
> >> > Thank you Maxim,
> >> >
> >> > “
> >> >
> >> > From my point of
> >> > view, the problem is that the javadoc task is not given the attention
> >> > it deserves. The failonerror is currently 'false' and the task itself
> >> > is not a part of any build and/or release processes, correct me if I'm
> >> > wrong.
> >> >
> >> > So,
> >> > 1. Fix warnings/errors;
> >> > 2. Make the javadoc task part of the build (e.g. put it under
> >> > 'artifacts'), or make it part of the release process that is regularly
> >> > checked on the CI;
> >> > 3. Publish/deploy the javadoc htmls for release in the special
> >> > directory of the cassandra website to give them a chance of being
> >> > indexed;“
> >> >
> >> > This is aligned with what I saw and the two options mentioned at the 
> >> > beginning - if we decide to keep it we should fix things and add the 
> >> > task to CI, if we don’t because no one wants the html pages - then 
> >> > better to remove it this ant task.
> >> > On your comment about 100 errors - it seems they are more. There is a 
> >> > cap of 100 but when you fix them, more errors appear.
> >> > Further discussion can be found at CASSANDRA-17687
> >> >
> >> > On Thu, 3 Aug 2023 at 14:21, Maxim Muzafarov  wrote:
> >> >>
> >> >> Personally, I find javadocs quite useful, especially when htmls are
> >> >> indexed by search engines, which in turn increases the chances of
> >> >> finding the right answer faster (I have seen a lot of useful javadocs
> >> >> in the source code).
> >> >>
> >> >> I have done a quick build of the javadocs:
> >> >>
> >> >>   [javadoc] Building index for all the packages and classes...
> >> >>   [javadoc] Building index for all classes...
> >> >>   [javadoc] Building index for all classes...
> >> >>   [javadoc] 100 errors
> >> >>   [javadoc] 100 warnings
> >> >>
> >> >> 100 errors is no big deal and can be easily fixed. From my point of
> >> >> view, the problem is that the javadoc task is not given the attention
> >> >> it deserves. The failonerror is currently 'false' and the task itself
> >> >> is not a part of any build and/or release processes, correct me if I'm
> >> >> wrong.
> >> >>
> >> >> So,
> >> >> 1. Fix warnings/errors;
> >> >> 2. Make the javadoc task part of the build (e.g. put it under
> >> >> 'artifacts'), or make it part of the release process that is regularly
> >> >> checked on the CI;
> >> >> 3. Publish/deploy the javadoc htmls for release in the special
> >> >> directory of the cassandra website to give them a chance of being
> >> >> indexed;
> >> >>
> >> >> On Thu, 3 Aug 2023 at 17:11, Jeremiah Jordan  wrote:
> >> >> >
> >> >> > I don’t think anyone wants to remove the javadocs.  This thread is 
> >> >> > about removing the broken ant task which generates html files from 
> >> >> > them.
> >>

[RELEASE] Apache Cassandra 3.11.16 released

2023-08-20 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 3.11.16.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 https://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 https://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.11 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you were to encounter 
any problem.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/311x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-3.11.16/CHANGES.txt
[2]: NEWS.txt 
https://github.com/apache/cassandra/blob/cassandra-3.11.16/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RESULT][VOTE] Release Apache Cassandra 3.11.16 - second attempt

2023-08-18 Thread Miklosovic, Stefan
The vote passed with 3 binding and 2 non-binding +1s.

Re: [DISCUSSION] CASSANDRA-18772 - removal of commons-codec on trunk

2023-08-17 Thread Miklosovic, Stefan
I would remove it all in 5.0, but that's just me. I do not think deprecation is 
a must; it is just an unnecessary exercise, and we are just red-taping here.

Major releases are good for dropping "baggage" like this. Do we really want to 
wait until 6.0 is out to cut off the dead weight?


From: Ekaterina Dimitrova 
Sent: Thursday, August 17, 2023 19:10
To: dev
Subject: [DISCUSSION] CASSANDRA-18772 - removal of commons-codec on trunk




Hi everyone,

I propose we remove commons-codec on trunk.
The only usage I found was from CASSANDRA-12790 (Support InfluxDb metrics 
reporter configuration), which relied on commons-codec and 
metrics-reporter-config, and which will be removed as part of CASSANDRA-18743.
The only question is whether we can remove those two dependencies on trunk, 
considering it is 5.1, or whether we need to wait until 6.0.

Best regards,
Ekaterina


[VOTE] Release Apache Cassandra 3.11.16 - SECOND ATTEMPT

2023-08-15 Thread Miklosovic, Stefan
This is the second attempt to pass the vote after [1] is fixed.

Proposing the test build of Cassandra 3.11.16 for release.

sha1: 681b6ca103d91d940a9fecb8cd812f58dd2490d0
Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1306/org/apache/cassandra/cassandra-all/3.11.16/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.

[1]: https://issues.apache.org/jira/browse/CASSANDRA-18751
[2]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
[3]: NEWS.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt


[RESULT][VOTE] Release Apache Cassandra 3.11.16

2023-08-14 Thread Miklosovic, Stefan
The vote has not passed successfully because of (1) not being done.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18751

Re: [VOTE] Release Apache Cassandra 3.11.16

2023-08-14 Thread Miklosovic, Stefan
-1 as we can not pull Java 11 as a dependency for the RPM package when 3.11 is 
being installed.

We need to fix this, stage and vote once again.

Ticket for tracking this work is here 
https://issues.apache.org/jira/browse/CASSANDRA-18751


From: Mick Semb Wever 
Sent: Monday, August 14, 2023 15:50
To: dev@cassandra.apache.org
Subject: Re: [VOTE] Release Apache Cassandra 3.11.16




The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.


Checked
- signing correct
- checksums are correct
- source artefact builds
- binary artefact runs
- debian package runs
- debian repo installs and runs

While the following fails
- redhat (almalinux) repo installs and  runs

as it appears that the rpm package now pulls in java 11 when java 8 is already 
installed.
slack thread: 
https://the-asf.slack.com/archives/CK23JSY2K/p1692011649614379



Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-08-11 Thread Miklosovic, Stefan
One nice benefit of a CQLSH PIP package which was omitted in this discussion is 
that it is "Python-version-agnostic". What I mean by that is that the way we 
currently package CQLSH in RPM is that the container it is produced in uses 
Python 3.6, so the produced RPM will run, believe it or not, only on distros 
with Python 3.6. See (1) for more details.

To solve this problem without a PIP package, we would need to start to build 
RPMs per supported Python version. I briefly looked into what Python versions 
are present in the most popular RPM distros and the most prevalent are 3.6.x, 
3.9.x and 3.11.x.

I personally think that solving this problem by producing 3 RPMs instead of one 
is quite impractical, but it seems we currently do not have any other option.

If we had an official PIP package, I can imagine that we would not ship CQLSH 
in RPM at all (maybe not in DEB either?) so we would decouple this. A PIP 
package is installable almost anywhere with Python 3; that is how I solved the 
problem in 18642: I just installed a PIP package because the RPM installation 
was broken.

On the other hand, a user should be able to just download what we ship, extract 
it, run the db and connect to it, all out of the box. Hence I think we should 
still ship the CQLSH sources within the Cassandra tarball, but make them 
installable locally from the tarball like:

pip install /where/my/cassandra/tarball/is/extracted/cqlshpackage

This would search for setup.py / pyproject.toml, then build the wheel and 
install it locally if one wishes to do so.

I do not think that depending on PIP in 2023 is a lot to ask for. PIP was made 
an official package manager in Python years ago.

Another problem I see is: how do we say which CQLSH is compatible with which 
Cassandra release? If we shipped CQLSH as a PIP package as part of the tarball, 
we would guarantee that they play together. If it is living somewhere online, 
how can people be sure that what they install is compatible with the Cassandra 
they run? I am sorry if this was already explained somewhere.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18642

Regards


From: Dinesh Joshi 
Sent: Wednesday, August 9, 2023 21:31
To: dev@cassandra.apache.org
Subject: Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH





Brad,

Thanks for starting this discussion. My understanding is that we're
simply adding pip support for cqlsh and Apache Cassandra project will
officially publish a cqlsh pip package. This is a good goal but other
than having an official pip package, what is it that we're gaining?
Please don't interpret this as push back on your proposal but I am
unclear on what we're trying to solve by making this official
distribution. There are several distribution channels and it is
untenable to officially support all of them.

If we do adopt this, there will be non-zero overhead of the release
process. This is fine but we need volunteers to run this process. My
understanding is that they need to be ideally PMC or at least Committers
on the project to go through all the steps to successfully release a new
artifact for our users.

I would have liked this CEP to go a bit further than just packaging
cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
doesn't need to live in the cassandra repo. Extracting cqlsh into its
separate repo would allow us to truly decouple cqlsh from the server.
This is already true for the most part as we rely on the Python driver
which is compatible with several cassandra releases. As it stands today
it is not possible for us to update cqlsh without making a Cassandra
release.

If you truly want to go a bit further, we should consider rewriting
cqlsh in Java so we can easily share code with the server. We could then
potentially use Java Native Image[1] to produce a truly platform
independent binary like golang. Python has its strengths but it does get
hairy as it expects certain runtime components on the target. With
Native Image we make things very simple from a user's perspective, very
similar to how golang produces statically linked binaries. This might be
a far-out thought but it is worth exploring. I believe GraalVM's
license might allow us to produce binaries that we can incorporate in
our release, but IANAL so maybe we can ask ASF legal for their opinion.

Giving cqlsh its own identity as a sub-project might help us build a
roadmap and evolve it along these lines.

I would like other folks to chime in with their opinions.

Dinesh

On 8/9/23 09:18, Brad wrote:
>
> As per the CEP process guidelines, I'm starting a formal DISCUSS thread
> to resume the conversation started here[1].
>
> The developers who maintain the Python CQLSH client on the official
> Python PYPI repository would like to 

[VOTE] Release Apache Cassandra 3.11.16

2023-08-10 Thread Miklosovic, Stefan
Proposing the test build of Cassandra 3.11.16 for release.

sha1: f86929eae086aa108cf58ee0164c3d12a59ad4af
Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1305/org/apache/cassandra/cassandra-all/3.11.16/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt


[ANNOUNCE] Apache Cassandra 3.11.16 test artifact available

2023-08-09 Thread Miklosovic, Stefan
The test build of Cassandra 3.11.16 is available.

sha1: f86929eae086aa108cf58ee0164c3d12a59ad4af
Git: https://github.com/apache/cassandra/tree/3.11.16-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1305/org/apache/cassandra/cassandra-all/3.11.16/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/3.11.16/

A vote of this test build will be initiated within the next couple of days.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/3.11.16-tentative/NEWS.txt

Timing of the last releases of Cassandra 3.0.x / 3.11.x

2023-08-09 Thread Miklosovic, Stefan
Hi,

with 5.0 getting closer, when do we plan to release the last releases of 3.0.x 
and 3.11.x?

There is a user on Slack asking for a release of 3.11.16 because of 16555.

If we release right now, we might potentially do one more release before 5.0 is 
GA. 

Do you think it makes sense to release now and then do the last one, or should 
we just wait until 5.0 is out so we do not release twice? It seems like there 
will not be a lot of fixes in 3.0 / 3.11 anymore, so we would basically release 
nothing new there ...

Thanks

Re: Removal of commitlog_sync_batch_window_in_ms in 5.0

2023-08-07 Thread Miklosovic, Stefan
Since there is no response / nobody seems to see this as an issue, I am going 
to remove it (will be removed in 5.0).


From: Miklosovic, Stefan
Sent: Wednesday, August 2, 2023 21:57
To: dev@cassandra.apache.org
Subject: Removal of commitlog_sync_batch_window_in_ms in 5.0

Hello list,

I want to double check this one (1) on ML.

It is a relatively innocent low-hanger; however, the caveat is that it might 
potentially break the upgrade to 5.0. The deprecation happened in (2) (in 4.0).

I think it is just eligible for deletion now. This property was commented out 
and it is effectively not used. There is even the comment about this (3).

The other option is to leave it deprecated. While this might work, it would set 
quite a precedent, wouldn't it? Are there any other configuration parameters we 
will live with forever even though they are not used? It seems strange to me 
that we would just keep this one deprecated for good. Do we apply this rule to 
all other properties from now on then? I am afraid the config would get bloated 
a little bit after some time ... I think that waiting one major and then 
removing it is a good compromise.

(1) https://issues.apache.org/jira/browse/CASSANDRA-17161
(2) https://issues.apache.org/jira/browse/CASSANDRA-13530
(3) https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L545-L546


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-02 Thread Miklosovic, Stefan
That is a good idea. I would like to have Javadocs valid when going through 
them in the IDE. To enforce that, we would have to fix them first. If we find a 
way to validate Javadocs without actually rendering them, that would be cool.

There is a lot of legacy, and rewriting some custom-crafted formatting of 
comments might be quite a tedious task if they are required to be valid. I am 
in general for valid documentation and even enforcing it, but what to do with 
what is already there ...


From: Jacek Lewandowski 
Sent: Wednesday, August 2, 2023 23:38
To: dev@cassandra.apache.org
Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?




With or without outputting JavaDoc to HTML, there are some errors which we 
should maybe fix. We want to keep the documentation, but there can be syntax 
errors which may prevent the IDE from generating a proper preview. So, the 
question is: should we validate the JavaDoc comments as a precommit task? Can 
it be done without actually generating HTML output?

Thanks,
Jacek

śr., 2 sie 2023, 22:24 użytkownik Derek Chen-Becker 
mailto:de...@chen-becker.org>> napisał:
Oh, whoops, I guess I'm the only one that thinks Javadoc is just the tool 
and/or its output (not the markup itself) :P If anything, the codebase could 
use a little more package/class/method markup in some places, so I'm definitely 
only in favor of getting rid of the ant task. I should amend my statement to be 
"...I suspect most people are not opening their browsers and looking at 
Javadoc..." :)

Cheers,

Derek



On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie 
mailto:jmcken...@apache.org>> wrote:
most people are not looking at Javadoc when working on the codebase.
I definitely use it extensively inside the IDE. But never as a compiled set of 
external docs.

Which is to say, I'm +1 on removing the target and I'd ask everyone to keep 
javadoccing your classes and methods where things are non-obvious or there's a 
logical coupling with something else in the system. :)

On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
+1. If a need comes up for Javadoc we can fix it at that point, but I suspect 
most people are not looking at Javadoc when working on the codebase.

Cheers,

Derek

On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams 
mailto:dri...@gmail.com>> wrote:
I don't think even if it works anyone is going to use the output, so
I'm good with removal.

Kind Regards,
Brandon

On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova
mailto:e.dimitr...@gmail.com>> wrote:
>
> Hi everyone,
> We were looking into a user report around our ant javadoc task recently.
> That made us realize it is not run in CI; it finishes successfully even if 
> there are hundreds of errors, some potentially breaking doc pages.
>
> There was a ticket discussion where a few community members mentioned that 
> this task was probably unnecessary. Can we remove it, or shall we fix it?
>
> Best regards,
> Ekaterina


--
Derek Chen-Becker
GPG Key available at https://keybase.io/dchenbecker and
https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC




Removal of commitlog_sync_batch_window_in_ms in 5.0

2023-08-02 Thread Miklosovic, Stefan
Hello list,

I want to double check this one (1) on ML.

It is a relatively innocent low-hanger; however, the caveat is that it might 
potentially break the upgrade to 5.0. The deprecation happened in (2) (in 4.0).

I think it is just eligible for deletion now. This property was commented out 
and it is effectively not used. There is even the comment about this (3).

The other option is to leave it deprecated. While this might work, it would set 
quite a precedent, wouldn't it? Are there any other configuration parameters we 
will live with forever even though they are not used? It seems strange to me 
that we would just keep this one deprecated for good. Do we apply this rule to 
all other properties from now on then? I am afraid the config would get bloated 
a little bit after some time ... I think that waiting one major and then 
removing it is a good compromise.

(1) https://issues.apache.org/jira/browse/CASSANDRA-17161
(2) https://issues.apache.org/jira/browse/CASSANDRA-13530
(3) https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L545-L546

Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-08-01 Thread Miklosovic, Stefan
I think we might wait for Accord and transactional metadata as the last big 
contributions in 5.0 (if I have not forgotten something) and then polish it all 
just before the release. There will still be some room to do housekeeping like 
this after these patches land. It is not like Accord will be in trunk on Monday 
and we release on Tuesday ...


From: Maxim Muzafarov 
Sent: Monday, July 31, 2023 23:05
To: dev@cassandra.apache.org
Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis





Hello everyone,


It's been a long time since the last discussion about the import order
code style, so I want to give these changes a chance as all the major
JIRA issues have already landed on the release branch so we won't
affect anyone. I'd be happy to find any reviewers who are interested
in helping with the next steps :-) I've updated the changes to reflect
the latest checkstyle work, so here they are:

https://issues.apache.org/jira/browse/CASSANDRA-17925
https://github.com/apache/cassandra/pull/2108


The changes look scary at first glance, but they're actually quite
simple and in line with what we've discussed above. In short, we can
divide all the affected files into two parts: the update of the code
style configuration files (checkstyle + IDE configs), and the update
of all the sources to match the code style.

In short:

- the "import order" hotkey will work regardless of which IDE you are using;
- the checkstyle configuration has been updated, as have the IDEA, Eclipse,
and NetBeans configurations;
- the AvoidStarImport checkstyle rule is applied as well;

The import order we've agreed upon:

java.*
[blank line]
javax.*
[blank line]
com.*
[blank line]
net.*
[blank line]
org.*
[blank line]
org.apache.cassandra.*
[blank line]
all other imports
[blank line]
static all other imports
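Applied to a hypothetical class, the grouping above would look like the 
following sketch (the specific class and import names are illustrative only, 
chosen to populate each group; they are not taken from the patch itself):

```java
// Illustrative import block following the agreed order.
// Each group is separated by exactly one blank line.
import java.util.List;                                  // java.*

import javax.net.ssl.SSLContext;                        // javax.*

import com.codahale.metrics.Counter;                    // com.*

import net.jpountz.lz4.LZ4Compressor;                   // net.*

import org.slf4j.Logger;                                // org.*

import org.apache.cassandra.config.DatabaseDescriptor;  // org.apache.cassandra.*

import io.netty.buffer.ByteBuf;                         // all other imports

import static java.util.Collections.emptyList;          // static imports last
```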

On Mon, 27 Feb 2023 at 13:26, Maxim Muzafarov  wrote:
>
> > I suppose it can be easy for the existing feature branches if they have a 
> > single commit. Don't we need to adjust each commit for multi-commit feature 
> > branches?
>
> It depends on how feature branches are maintained and developed, I
> guess. My thoughts here are that the IDE's hotkeys should just work to
> resolve any code-style issues that arise during rebase/maintenance.
> I'm not talking about enforcing all our code-style rules but giving
> developers good flexibility. The classes import order rule might be a
> good example here.
>
> On Wed, 22 Feb 2023 at 21:27, Jacek Lewandowski
>  wrote:
> >
> > I suppose it can be easy for the existing feature branches if they have a 
> > single commit. Don't we need to adjust each commit for multi-commit feature 
> > branches?
> >
> > śr., 22 lut 2023, 19:48 użytkownik Maxim Muzafarov  
> > napisał:
> >>
> >> Hello everyone,
> >>
> >> I have created an issue CASSANDRA-18277 that may help us move forward
> >> with code style changes. It only affects the way we store the IntelliJ
> >> code style configuration and has no effect on any current (or any)
> >> releases, so it should be safe to merge. So, once the issue is
> >> resolved, every developer that checkouts a release branch will use the
> >> same code style stored in that branch. This in turn makes rebasing a
> >> big change like the import order [1] a really straightforward matter
> >> (by pressing Crtl + Opt + O in their local branch to organize
> >> imports).
> >>
> >> See:
> >>
> >> Move the IntelliJ Idea code style and inspections configuration to the
> >> project's root .idea directory
> >> https://issues.apache.org/jira/browse/CASSANDRA-18277
> >>
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/CASSANDRA-17925
> >>
> >> On Wed, 25 Jan 2023 at 13:05, Miklosovic, Stefan
> >>  wrote:
> >> >
> >> > Thank you Maxim for doing this.
> >> >
> >> > It is nice to see this effort materialized in a PR.
> >> >
> >> > I would wait until bigger chunks of work are committed to trunk (like 
> >> > CEP-15) to not collide too much. I would say we can postpone doing this 
> >> > until the actual 5.0 release, last weeks before it so we would not clash 
> >> > with any work people would like to include in 5.0. This can go in 
> >> > anytime, basically.
> >> >
> >> > Are people on the same page?
> >> >
> >> > Regards
> >> >
> >> > 

Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-27 Thread Miklosovic, Stefan
I plan to run a few builds with Corretto etc. and do a few manual tests. I will 
try to push that as soon as possible, but it might slip into next week.


From: Mick Semb Wever 
Sent: Thursday, July 27, 2023 0:27
To: dev
Subject: August 5.0 Freeze (with waivers…) and a 5.0-alpha1





The previous thread¹ on when to freeze 5.0 landed on freezing the first week of 
August, with a waiver in place for TCM and Accord to land later (but before 
October).

With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work that 
hasn't landed is Vector search (CEP-30).

Are there any objections to a waiver on Vector search?  All the groundwork: SAI 
and the vector type; has been merged, with all remaining work expected to land 
in August.

I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 and 
a long list of flakies.  It takes time and patience to triage and identify the 
bugs that hit us before GA.  The freeze is about being "mostly feature 
complete", so we have room for things before our first beta (precedent is to 
ask). If we hope for a GA by December, account for the 6 weeks turnaround 
time for cutting and voting on one alpha, one beta, and one rc release, and the 
quiet period that August is, we really only have September and October left.

I already feel this is asking a bit of a miracle from us given how 4.1 went 
(and I'm hoping I will be proven wrong).

In addition, are there any objections to cutting an 5.0-alpha1 release as soon 
as we freeze?

This is on the understanding vector, tcm and accord will become available in 
later alphas.  Originally the discussion¹ was waiting for Accord for alpha1, 
but a number of folk off-list have requested earlier alphas to help with 
testing.


¹) 
https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
We can make it opt-in, wait one major to see what bugs pop up, and maybe make 
it opt-out eventually. We do not need to hurry with this. I understand 
everybody's expectations and excitement, but it really boils down to a one-line 
change in yaml. People who are after performance will definitely be aware of 
this knob to turn on to squeeze out even more perf ...

I will look at the dtests Jeremiah mentioned, but I would just move on and make 
it opt-in if we are not 100% persuaded about it _yet_.


From: Mick Semb Wever 
Sent: Wednesday, July 26, 2023 20:48
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default





What comes to mind is how we brought down people clusters and made sstables 
unreadable with the introduction of the chunk_length configuration in 1.0.  It 
wasn't about how tested the compression libraries were, but about the new 
configuration itself.  Introducing silent defaults has more surface area for 
bugs than introducing explicit defaults that only apply to new clusters and are 
so opt-in for existing clusters.



On Wed, 26 Jul 2023 at 20:13, J. D. Jordan 
mailto:jeremiah.jor...@gmail.com>> wrote:
Enabling ssl for the upgrade dtests would cover this use case. If those don't 
currently exist, I see no reason it won't work, so I would be fine with someone 
figuring it out post merge if there is a concern. What JCE provider you use 
should have no upgrade concerns.

-Jeremiah

> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan 
> mailto:stefan.mikloso...@netapp.com>> wrote:
>
> Am I understanding it correctly that the tests you are talking about are only 
> required in case we make ACCP the default provider?
>
> I can live with not making it default and still deliver it if tests are not 
> required. I do not think these kinds of tests were required a couple of mails 
> ago when opt-in was on the table.
>
> While I tend to agree with people here who seem to consider testing this 
> scenario an unnecessary exercise, I am afraid that I will not be able to 
> deliver that, as testing something like this is quite a complicated matter. 
> There are a lot of aspects which could be tested that I can not even 
> enumerate right now ... so I try to meet you somewhere in the middle.
>
> 
> From: Mick Semb Wever mailto:m...@apache.org>>
> Sent: Wednesday, July 26, 2023 17:34
> To: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>
>
>
>
>
>
> Can you say more about the shape of your concern?
>
>
> Integration testing where some nodes are running JCE and others accp, and 
> various configurations that are and are not accp compatible/native.
>
> I'm not referring to (re-) unit testing accp or jce themselves, or matrix 
> testing over them, but our commitment to always-on upgrades against all 
> possible configurations that integrate.  We've history with config changes 
> breaking upgrades, for as simple as they are.


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Am I understanding it correctly that the tests you are talking about are only 
required in case we make ACCP the default provider?

I can live with not making it default and still deliver it if tests are not 
required. I do not think these kinds of tests were required a couple of mails 
ago when opt-in was on the table.

While I tend to agree with people here who seem to consider testing this 
scenario an unnecessary exercise, I am afraid that I will not be able to 
deliver that, as testing something like this is quite a complicated matter. 
There are a lot of aspects which could be tested that I can not even enumerate 
right now ... so I try to meet you somewhere in the middle.


From: Mick Semb Wever 
Sent: Wednesday, July 26, 2023 17:34
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default






Can you say more about the shape of your concern?


Integration testing where some nodes are running JCE and others accp, and 
various configurations that are and are not accp compatible/native.

I'm not referring to (re-) unit testing accp or jce themselves, or matrix 
testing over them, but our commitment to always-on upgrades against all 
possible configurations that integrate.  We've history with config changes 
breaking upgrades, for as simple as they are.


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Yes, you are right. I know the providers have their preference order and we are 
installing Corretto as the first one.

So if a service is not there, lookup will just fall through to the next 
provider that has it. I completely forgot this aspect of it ... Folks from 
Corretto forgot to mention this behavior as well, interesting. It is not as if 
we are going to use this _as the only provider_.

In that case I think we can set it as default.

We just need to be cautious not to use e.g. Cipher.getInstance("algorithm", 
"provider") - the provider being "AmazonCorrettoCryptoProvider" or anything 
like that. In other words, as long as we are not specifying a concrete provider 
to get an instance from, we should be safe. I looked over the codebase and we 
are not using it anywhere.
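A minimal sketch of the lookup behavior being discussed, runnable on a plain 
JDK (the Corretto installation line is commented out since it needs the ACCP 
jar on the class path; the class name and output are otherwise only what the 
standard JCA machinery provides):

```java
import java.security.Provider;
import java.security.Security;
import javax.crypto.Cipher;

public class ProviderFallbackDemo
{
    public static void main(String[] args) throws Exception
    {
        // With the ACCP jar present, Corretto would be registered first, e.g.:
        // AmazonCorrettoCryptoProvider.install();

        // A lookup WITHOUT naming a provider walks the ordered provider list
        // and returns the first provider implementing the service, so a
        // service missing from the preferred provider silently falls back
        // to the next one in the list.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        System.out.println("AES/GCM served by: " + cipher.getProvider().getName());

        // Pinning a provider by name is what must be avoided, because it
        // throws if that provider does not implement the algorithm:
        // Cipher.getInstance("AES/GCM/NoPadding", "AmazonCorrettoCryptoProvider");

        // The ordered list the lookup walks:
        for (Provider p : Security.getProviders())
            System.out.println("registered provider: " + p.getName());
    }
}
```

This is why specifying only the algorithm string keeps upgrades safe even when 
the preferred provider covers a subset of services.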


From: J. D. Jordan 
Sent: Wednesday, July 26, 2023 14:32
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default




I thought the crypto providers were supposed to "ask the next one down the 
line" if something is not supported? Have you tried some unsupported thing and 
seen it break? My understanding of the providers being an ordered list was that 
this isn't supposed to happen.

-Jeremiah

On Jul 26, 2023, at 3:23 AM, Mick Semb Wever  wrote:






That means that if somebody is on 4.0 and they upgrade to 5.0, if they use some 
ciphers / protocols / algorithms which are not in Corretto, it might break 
their upgrade.



If there's any risk of breaking upgrades we have to go with (2).  We support a 
variation of JCE configurations, and I don't see we have the test coverage in 
place to de-risk it other than going with (2).

Once the yaml configuration is in place we can then change the default in the 
next major version 6.0.




Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Miklosovic, Stefan
Hi,

we need to be on the same page here and this is crucial to get right.

We evaluated that Corretto is a subset of what is in the SunJCE provider 
(bundled in the JRE). It is not true that Corretto is just "a drop-in 
replacement". That means that if somebody on 4.0 upgrades to 5.0 and they use 
some ciphers / protocols / algorithms which are not in Corretto, it might break 
their upgrade.

I asked the Corretto team here (1) and they confirmed it is truly a subset of 
what is in JCE and the diff is relatively large. There is also an enumeration 
of all services in Corretto and the default provider so we can see the 
difference.

On the other hand, they say that services which are considered "weak" are not 
there, so by moving to Corretto we are actually making Cassandra safer, but as 
I mentioned, the cost is that we will drop support for all the other stuff and 
we might break things.

So, with all this information we have two choices:

1) to make Corretto default and make it opt-out
2) to not make Corretto default and make it opt-in

Jordan's opinion is added as the last comment in (2)

What is the preference of the community? We need to be sure we are aligned here.

(1) https://github.com/corretto/amazon-corretto-crypto-provider/issues/315
(2) https://issues.apache.org/jira/browse/CASSANDRA-18624

________
From: Miklosovic, Stefan 
Sent: Friday, July 21, 2023 18:17
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default





We gave it the second look and I came up with this (1)

In a nutshell, we download both arch libs to libs/corretto and then 
cassandra.in.sh dynamically resolves the architecture and OS. Based on that, it 
adds the respective jar to the class path. If that goes wrong and it is not 
added to the CP, we just skip the installation / healthchecks as if nothing 
happened (by default).

We are also adding the dependency to Maven's pom.xml based on the architecture 
the build is invoked on so there is a possibility to create 
architecture-specific artifact. This is achieved by Maven profiles which are 
activated based on what architecture it is run.

Hence, we covered both aspects, Maven build / dependencies as well as runtime 
library resolution.

There is also flag added, "fail_on_missing_provider", which is by default 
false, if set to true, in case it was not on CP or if we by mistake installed 
different architecture, it will fail the startup.

We could definitely use some review here, especially from people who run on ARM 
so we are sure that it works there as well as intended.

(1) https://github.com/apache/cassandra/pull/2505/files


From: Mick Semb Wever 
Sent: Friday, July 21, 2023 7:18
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



As I am on x86 and I wanted to simulate what would happen to users on ARM, I 
just did it other way around - I introduced the dependency with classifier 
linux-aarch_64.

…
Surprisingly, the installation step succeeded on x86 even the dependency was 
for aarch. However, the startup check went to else branch (2) and I saw that 
the provider was not Corretto provider but the default - SunJCE. So that tells 
me that it basically falls back to the default which is what we want.


I raised concerns about this because we have no other dependencies that use the 
classifier in the pom file to bind us to a particular arch.  The loading of the 
native code isn't my concern.

I'm uneasy (without further investigation) with publishing cassandra pom files 
that classify us to " x86_64".  For example, how the jar files differ between 
classifiers for this project.

I'm also curious if there's a way to bundle the native files for all arch, like 
we do for other libraries, with runtime just loading what's correct.




[RELEASE] Apache Cassandra 4.1.3 released

2023-07-24 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.1.3.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.1 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/41x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-4.1.3/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-4.1.3/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RESULT][VOTE] Release Apache Cassandra 4.1.3

2023-07-24 Thread Miklosovic, Stefan
The vote passes with three binding and one non-binding +1s.

https://lists.apache.org/thread/8ot3wjc88k0rhx1m9m58k0bp4msbjw6w

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-21 Thread Miklosovic, Stefan
We gave it the second look and I came up with this (1)

In a nutshell, we download both arch libs to libs/corretto and then 
cassandra.in.sh will dynamically resolve the architecture and OS. Based on 
that, it will add the respective jar to the class path. If that goes wrong and the 
jar is not added to the class path, we just skip the installation / health checks 
as if nothing happened (by default).
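As a rough illustration, the resolution in cassandra.in.sh could look something like the following. The directory layout, jar name, and variable names here are assumptions for the sketch, not the actual script:

```shell
# Sketch: resolve OS/arch and add the matching ACCP jar to the class path.
# Assumed layout: lib/corretto/AmazonCorrettoCryptoProvider-<version>-<classifier>.jar
arch="$(uname -m)"
os="$(uname -s | tr '[:upper:]' '[:lower:]')"
case "$arch" in
  x86_64|amd64)  classifier="${os}-x86_64" ;;
  aarch64|arm64) classifier="${os}-aarch_64" ;;
  *)             classifier="" ;;   # unknown arch: skip ACCP, JRE default provider is used
esac
if [ -n "$classifier" ]; then
  accp_jar="${CASSANDRA_HOME:-/usr/share/cassandra}/lib/corretto/AmazonCorrettoCryptoProvider-2.2.0-${classifier}.jar"
  # Only add it if the jar for this platform actually exists
  [ -f "$accp_jar" ] && CLASSPATH="$CLASSPATH:$accp_jar"
fi
echo "$classifier"
```

If the classifier stays empty (or the jar is missing), nothing is added to the class path and startup proceeds with the default JCE provider, matching the "skip as if nothing happened" behaviour described above.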

We also add the dependency to Maven's pom.xml based on the architecture 
the build is invoked on, so there is a possibility to create an 
architecture-specific artifact. This is achieved by Maven profiles which are 
activated based on the architecture the build runs on.
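A sketch of what such profile activation could look like; the profile ids and the property name are made up for illustration, and the real pom.xml in the PR may differ:

```xml
<!-- Hypothetical sketch of per-architecture profile activation -->
<profiles>
  <profile>
    <id>linux-x86_64</id>
    <activation><os><arch>amd64</arch></os></activation>
    <properties><accp.classifier>linux-x86_64</accp.classifier></properties>
  </profile>
  <profile>
    <id>linux-aarch_64</id>
    <activation><os><arch>aarch64</arch></os></activation>
    <properties><accp.classifier>linux-aarch_64</accp.classifier></properties>
  </profile>
</profiles>

<dependency>
  <groupId>software.amazon.cryptools</groupId>
  <artifactId>AmazonCorrettoCryptoProvider</artifactId>
  <version>2.2.0</version>
  <classifier>${accp.classifier}</classifier>
</dependency>
```

The active profile sets the classifier property, so the resolved ACCP jar matches the architecture the build runs on.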

Hence, we covered both aspects, Maven build / dependencies as well as runtime 
library resolution.

There is also a flag added, "fail_on_missing_provider", which is false by 
default; if set to true and the provider was not on the class path, or we installed 
the wrong architecture by mistake, it will fail the startup.

We could definitely use some review here, especially from people who run on ARM, 
so we can be sure it works there as intended.

(1) https://github.com/apache/cassandra/pull/2505/files


From: Mick Semb Wever 
Sent: Friday, July 21, 2023 7:18
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default


As I am on x86 and I wanted to simulate what would happen to users on ARM, I 
just did it the other way around - I introduced the dependency with the classifier 
linux-aarch_64.

…
Surprisingly, the installation step succeeded on x86 even though the dependency was 
for aarch. However, the startup check went to the else branch (2) and I saw that 
the provider was not the Corretto provider but the default - SunJCE. So that tells 
me that it basically falls back to the default, which is what we want.


I raised concerns about this because we have no other dependencies that use the 
classifier in the pom file to bind us to a particular arch.  The loading of the 
native code isn't my concern.

I'm uneasy (without further investigation) with publishing cassandra pom files 
that classify us to " x86_64".  For example, how the jar files differ between 
classifiers for this project.

I'm also curious if there's a way to bundle the native files for all arch, like 
we do for other libraries, with runtime just loading what's correct.




Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-20 Thread Miklosovic, Stefan
Thank you all for your opinions. Very appreciated.

I finally got some time to play with the patch of Ayushi Singh.

As I am on x86 and I wanted to simulate what would happen to users on ARM, I 
just did it the other way around - I introduced the dependency with the classifier 
linux-aarch_64.

The whole setup of that crypto provider consists of two steps. The first is the 
"installation". The second is the "verification / check" that it is 
installed correctly, performed as a "health check".

Surprisingly, the installation step succeeded on x86 even though the dependency was 
for aarch. However, the startup check went to the else branch (2) and I saw that 
the provider was not the Corretto provider but the default - SunJCE. So that tells 
me that it basically falls back to the default, which is what we want.

I think this might work: if it is available, it will be used; if not, we emit a 
big fat warning.

We could introduce a flag into crypto_provider's "parameters" (as it is 
configured by ParameterizedClass) which would fail the startup if the provider is 
not installed. By default it would be turned off, so for people on ARM it would 
just emit a warning.
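The flow described here (try to install the provider, health-check it, then warn or fail based on a flag) can be sketched roughly as below. The class and method names are invented for illustration and are not the actual Cassandra code; only the ACCP class name and INSTANCE field come from the ACCP library itself:

```java
import java.security.Provider;
import java.security.Security;

// Sketch of the "install, then health-check" startup flow discussed above.
public class CryptoProviderCheck {
    static final String ACCP = "AmazonCorrettoCryptoProvider";

    // Try to install ACCP as the most-preferred provider; return true only if
    // the health check confirms it really took effect.
    static boolean tryInstallAccp() {
        try {
            // Loaded reflectively so this compiles and runs without ACCP on the class path.
            Class<?> cls = Class.forName(
                "com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider");
            Provider p = (Provider) cls.getField("INSTANCE").get(null);
            Security.insertProviderAt(p, 1);
        } catch (Throwable t) {
            // Not on the class path, or the wrong-architecture jar failed to load.
            return false;
        }
        // Health check: is ACCP now the highest-priority provider?
        return ACCP.equals(Security.getProviders()[0].getName());
    }

    static void startupCheck(boolean failOnMissingProvider) {
        if (!tryInstallAccp()) {
            if (failOnMissingProvider)
                throw new IllegalStateException(ACCP + " requested but not available");
            System.out.println("WARN: " + ACCP + " not available, falling back to "
                               + Security.getProviders()[0].getName());
        }
    }

    public static void main(String[] args) {
        startupCheck(false); // default: warn only, keep the JRE default (e.g. SunJCE)
    }
}
```

With the flag off, a node whose platform has no matching ACCP jar just logs a warning and keeps the default JCE provider, which matches the fallback behaviour observed in the test above.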

1) 
https://github.com/apache/cassandra/blob/9b0b0f03f97f0bb1d7c0295cdfa3b3da80d1c3b8/src/java/org/apache/cassandra/security/DefaultCryptoProvider.java
2) 
https://github.com/apache/cassandra/blob/9b0b0f03f97f0bb1d7c0295cdfa3b3da80d1c3b8/src/java/org/apache/cassandra/security/DefaultCryptoProvider.java#L64-L70


From: Abe Ratnofsky 
Sent: Thursday, July 20, 2023 23:59
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Using ACCP or tc-native by default



This feels analogous to other past discussions around prioritizing a config 
that enables new users to clone + build + run as easily as possible, vs. having 
better prod recommendations out of the box.

Both are important. I personally think we should default configuration to make 
it just work for new users, and have a config to allow power users to fail if 
ACCP is not present, and warn once on startup if we tolerate missing ACCP but 
detect it is absent.

> On Jul 20, 2023, at 14:51, Brandon Williams  wrote:
>
> I think we could special-case and default to 'auto' but allow other
> more explicit options.
>
> Kind Regards,
> Brandon
>
>> On Thu, Jul 20, 2023 at 4:18 PM German Eichberger via dev
>>  wrote:
>>
>> In general I agree with Joey -- but I would prefer if this behavior is 
>> configurable, e.g. there is an option to get a startup failure if the 
>> configured fastest provider can't run for any reason to avoid a "silent" 
>> performance degradation as Jordan was experiencing.
>>
>> Thanks,
>> German
>>
>> 
>> From: Joseph Lynch 
>> Sent: Thursday, July 20, 2023 7:38 AM
>> To: dev@cassandra.apache.org 
>> Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default
>>
>> Having native dependencies shouldn't make the project x86 only, it
>> should just accelerate the performance on x86 when available. Can't we
>> just try to load the fastest available provider (so arm will use
>> native java but x86 will use proper hardware acceleration) and failing
>> that fall-back to the default? If I recall correctly from the
>> messaging service patches (and zstd/lz4) it's reasonably
>> straightforward to try to load native code and then fail-back if you
>> fail.
>>
>> -Joey
>>
>>> On Thu, Jul 20, 2023 at 10:27 AM J. D. Jordan  
>>> wrote:
>>>
>>> Maybe we could start providing Dockerfile’s and/or make arch specific 
>>> rpm/deb packages that have everything setup correctly per architecture?
>>> We could also download them all and have the startup scripts put stuff in 
>>> the right places depending on the arch of the machine running them?
>>> I feel like there are probably multiple ways we could solve this without 
>>> requiring users to jump through a bunch of hoops?
>>> But I do agree we can’t make the project x86 only.
>>>
>>> -Jeremiah
>>>
>>>> On Jul 20, 2023, at 2:01 AM, Miklosovic, Stefan 
>>>>  wrote:
>>>>
>>>> Hi,
>>>>
>>>> as I was reviewing the patch for this feature (1), we realized that it is 
>>>> not quite easy to bundle this directly into Cassandra.
>>>>
>>>> The problem is that this was supposed to be introduced as a new dependency:
>>>>
>>>> 
>>>>   software.amazon.cryptool

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-20 Thread Miklosovic, Stefan
Hi,

as I was reviewing the patch for this feature (1), we realized that it is not 
quite easy to bundle this directly into Cassandra.

The problem is that this was supposed to be introduced as a new dependency:


<dependency>
    <groupId>software.amazon.cryptools</groupId>
    <artifactId>AmazonCorrettoCryptoProvider</artifactId>
    <version>2.2.0</version>
    <classifier>linux-x86_64</classifier>
</dependency>


Notice the "classifier". That means that if we introduced this dependency into the 
project, what about ARM users? (There is a corresponding aarch classifier as 
well.) ACCP is platform-specific but we have to ship Cassandra 
platform-agnostic. It just needs to run OOTB everywhere. If we shipped it 
with x86 and a user runs Cassandra on ARM, I guess that would break things, 
right?

We also can not just add both dependencies (both x86 and aarch), because how 
would we differentiate between them at runtime? That is all just too tricky / 
error prone.

So, the approach we want to take is this:

1) nothing will be bundled in Cassandra by default
2) a user is supposed to download the library and put it on the class path
3) a user is supposed to put the implementation of the ICryptoProvider interface 
Cassandra exposes on the class path
4) a user is supposed to configure cassandra.yaml and its section 
"crypto_provider" to reference the implementation they want

That way, we avoid the situation when somebody runs x86 lib on ARM or vice 
versa.

By default, NoOpProvider will be used, that means that the default crypto 
provider from JRE will be used.
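The cassandra.yaml shape implied here could look something like the following; the exact keys follow the ParameterizedClass plumbing in CASSANDRA-18624 and may differ, and the class name below is a hypothetical user-supplied implementation:

```yaml
# Sketch of the crypto_provider section (key names may differ in the final patch)
crypto_provider:
  - class_name: com.example.MyCryptoProvider   # a custom ICryptoProvider impl on the class path
    parameters:
      fail_on_missing_provider: "false"        # warn instead of failing startup
```

Leaving the section out (or pointing it at NoOpProvider) keeps the default JRE crypto provider.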

It can seem like we have not made much progress here, but hey ... we opened 
the project to custom implementations of crypto providers the community can 
create, e.g. as 3rd party extensions etc.

I want to be sure that everybody is aware of this change (that we plan to do 
it in such a way that it will not be "bundled") and that everybody is on 
board with this. Otherwise, I am all ears about how to do it differently.

(1) https://issues.apache.org/jira/browse/CASSANDRA-18624


From: German Eichberger via dev 
Sent: Friday, June 23, 2023 22:43
To: dev
Subject: Re: [DISCUSS] Using ACCP or tc-native by default


+1 to ACCP - we love performance.

From: David Capwell 
Sent: Thursday, June 22, 2023 4:21 PM
To: dev 
Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default

+1 to ACCP

On Jun 22, 2023, at 3:05 PM, C. Scott Andreas  wrote:

+1 for ACCP and can attest to its results. ACCP also optimizes for a range of 
hash functions and other cryptographic primitives beyond TLS acceleration for 
Netty.

On Jun 22, 2023, at 2:07 PM, Jeff Jirsa  wrote:


Either would be better than today.

On Thu, Jun 22, 2023 at 1:57 PM Jordan West <jw...@apache.org> wrote:
Hi,

I’m wondering if there is appetite to change the default SSL provider for 
Cassandra going forward to either ACCP [1] or tc-native in Netty? Our 
deployment as well as others I’m aware of make this change in their fork and it 
can lead to significant performance improvement. When recently qualifying 4.1 
without using ACCP (by accident) we noticed p99 latencies were 2x higher than 
3.0 w/ ACCP. Wiring up ACCP can be a bit of a pain and also requires some 
amount of customization. I think it could be great for the wider community to 
adopt it.

The biggest hurdle I foresee is licensing but ACCP is Apache 2.0 licensed. 
Anything else I am missing before opening a JIRA and submitting a patch?

Jordan


[1]
https://github.com/corretto/amazon-corretto-crypto-provider




[VOTE] Release Apache Cassandra 4.1.3

2023-07-19 Thread Miklosovic, Stefan
Proposing the test build of Cassandra 4.1.3 for release.

sha1: 2a4cd36475de3eb47207cd88d2d472b876c6816d
Git: https://github.com/apache/cassandra/tree/4.1.3-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1304/org/apache/cassandra/cassandra-all/4.1.3/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.1.3/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/4.1.3-tentative/CHANGES.txt
[2]: NEWS.txt: https://github.com/apache/cassandra/blob/4.1.3-tentative/NEWS.txt


[ANNOUNCE] Apache Cassandra 4.1.3 test artifact available

2023-07-18 Thread Miklosovic, Stefan
The test build of Cassandra 4.1.3 is available.

sha1: 2a4cd36475de3eb47207cd88d2d472b876c6816d
Git: https://github.com/apache/cassandra/tree/4.1.3-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1304/org/apache/cassandra/cassandra-all/4.1.3/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.1.3/

A vote of this test build will be initiated within the next couple of days.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/4.1.3-tentative/CHANGES.txt
[2]: NEWS.txt: https://github.com/apache/cassandra/blob/4.1.3-tentative/NEWS.txt


[RELEASE] Apache Cassandra 4.0.11 released

2023-07-18 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 4.0.11.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you were to encounter 
any problem.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/40x/ .

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-4.0.11/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-4.0.11/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RESULT][VOTE] Release Apache Cassandra 4.0.11

2023-07-18 Thread Miklosovic, Stefan
The vote passes with three binding +1s.

https://lists.apache.org/thread/hd4lncvnqz8f5c0f2wfv9o2flk02loq2

Re: Changing the output of tooling between majors

2023-07-13 Thread Miklosovic, Stefan
I am starting to slowly but surely share the same opinion. Maybe we should just 
stop advancing this discussion altogether and rather focus on a way 
to implement alternative output for nodetool etc. We probably need to do 
this command by command.


From: Eric Evans 
Sent: Thursday, July 13, 2023 17:09
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors




On Thu, Jul 13, 2023 at 9:44 AM Miklosovic, Stefan 
<stefan.mikloso...@netapp.com> wrote:
For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there 
is a significant adoption in the user community, I would strongly advise 
against making breaking changes to the CLI output."

That is where I get the need to have a JSON output in order to fix a typo. 
That is, if we look at fixing a typo as a breaking change, which I would say it 
is: if somebody is grepping for it and it is not there, it will break.

Do you understand that the same way or am I interpreting that wrong?

I interpreted this to mean: If we want to get to a place where we can freely 
edit human-readable output formats, we should provide stable alternatives.



From: C. Scott Andreas <sc...@paradoxica.net>
Sent: Thursday, July 13, 2023 16:35
To: dev@cassandra.apache.org
Cc: dev
Subject: Re: Changing the output of tooling between majors


> "From what I see you guys want to condition any change by offering json/yaml 
> as well."

I don't think I've seen a proposal to block changes to nodetool output on 
machine-parseable formats in this thread.

Additions of new delimited fields to nodetool output are mostly 
straightforward. Changes to fields that exist today are likely to cause 
problems - as Josh mentions. These seem best to take on a case-by-case basis 
rather than trying to hammer out an abstract policy. What changes would you 
like to make?

I do think we will have difficulty evolving output formats of text-based 
Cassandra tooling until we offer machine-parseable output formats.

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie <jmcken...@apache.org> wrote:


I just find it ridiculous we can not change "someProperty: 10" to "Some 
Property: 10" and there is so much red tape about that.
Well, we're talking about programmatic parsing here. This feels like 
complaining about a compiler that won't let you build if you're missing a ;

We can change it, but that doesn't mean the aggregate cost/benefit across our 
entire ecosystem is worth it. The value of correcting a typo is pretty small, 
and the cost for everyone downstream is not. This is why we should spellcheck 
things in API's before we release them. :)

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:
Eric,

I appreciate your feedback on this, especially the background about where you 
are coming from in the second paragraph.

I think we are on the same page after all. I definitely understand that people 
are depending on this output and we need to be careful. That is why I propose 
to change it only with each major. What I feel is that everybody's usage / 
expectations are a little bit different, and the outputs of the commands are very 
diverse, so it is hard to balance this so everybody is happy.

I am trying to come up with a solution which would not change the most 
important commands unnecessarily while also leaving some free room to tweak the 
existing commands where we see it appropriate. I just find it ridiculous that we can 
not change "someProperty: 10" to "Some Property: 10" and there is so much red 
tape about that.

If I had to summarize this whole discussion, the best conclusion I can think 
of is to not change what is used the most (this would probably need to be 
defined more explicitly), and if we have to change something else, we had better 
document that extensively and provide json/yaml so people are able to 
divorce themselves from parsing the human-readable format (which probably all agree 
should not happen in the first place).

What I am afraid of is that in order to satisfy these conditions, if, for 
example, we just want to fix a typo or the format of a key of some value, then 
we would need to deliver a JSON/YAML format as well if there is not any yet, and 
that would mean that a change of such triviality would require way more work 
in terms 

Re: Changing the output of tooling between majors

2023-07-13 Thread Miklosovic, Stefan
"Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to naming/meaning/"

That is 100% correct. So by that logic, changing the output you grep for to 
something else will break your scripts if you expect it to be there.

For example, take the sstablemetadata command - I know it is not nodetool but it 
does not matter; this is just an example. The same "problem" can probably be found in 
nodetool; sstablemetadata just came to my mind first as that is what I 
hit recently.

sstablemetadata writes this:

Repaired at: 0
Originating host id: d2d12c56-7d9c-49a7-aaef-05bd2633b09e
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1689261027905, 
position=59450)=CommitLogPosition(segmentId=1689261027905, position=60508)}
totalColumnsSet: 0
totalRows: 1
Estimated tombstone drop times:


Do you see "totalColumnsSet" and "totalRows", when all other keys in that output 
(in the whole command) follow a different format? In this case, it should be 
"Total columns set" and "Total rows".

So when we change it to that, anybody who is grepping for "totalRows" will get no 
output. That is a breaking change to me. Their script stopped working.
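The breakage can be shown in two lines of shell. This is a toy demonstration, not the actual Cassandra tooling; the "fixed" key name is the hypothetical rename being discussed:

```shell
# Why renaming a key in human-readable output breaks anyone grepping for it.
out_old='totalRows: 1'        # current sstablemetadata key
out_new='Total rows: 1'       # hypothetical "fixed" key

echo "$out_old" | grep '^totalRows:'          # matches: the script keeps working
echo "$out_new" | grep '^totalRows:' || \
  echo 'no match - a script grepping for totalRows silently breaks'
```

The second grep finds nothing and exits non-zero, which is exactly the silent failure mode downstream scripts would hit.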

You are correct and I agree with you completely that STRICT ADDITIONS (what I 
was suggesting) are fine, because we are not breaking anything for anybody.

So here, if I want to change this, then by what Dinesh says (we change the naming 
and we break it), I need to offer a JSON / YAML alternative to what 
sstablemetadata currently prints. (It might as well be nodetool; this is just an example.)


From: C. Scott Andreas 
Sent: Thursday, July 13, 2023 17:01
To: dev@cassandra.apache.org
Cc: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors


Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to 
naming/meaning/position of existing fields vs. adding new ones). I don't read 
his message as saying that any change to nodetool output is conditional on 
offering a JSON/YAML representation, though.

What are some changes that you'd like to make?

– Scott

On Jul 13, 2023, at 7:44 AM, "Miklosovic, Stefan" 
 wrote:


For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there 
is a significant adoption in the user community, I would strongly advise 
against making breaking changes to the CLI output."

That is where I get the need to have a JSON output in order to fix a typo. 
That is, if we look at fixing a typo as a breaking change, which I would say it 
is: if somebody is grepping for it and it is not there, it will break.

Do you understand that the same way or am I interpreting that wrong?


From: C. Scott Andreas 
Sent: Thursday, July 13, 2023 16:35
To: dev@cassandra.apache.org
Cc: dev
Subject: Re: Changing the output of tooling between majors


"From what I see you guys want to condition any change by offering json/yaml as 
well."

I don't think I've seen a proposal to block changes to nodetool output on 
machine-parseable formats in this thread.

Additions of new delimited fields to nodetool output are mostly 
straightforward. Changes to fields that exist today are likely to cause 
problems - as Josh mentions. These seem best to take on a case-by-case basis 
rather than trying to hammer out an abstract policy. What changes would you 
like to make?

I do think we will have difficulty evolving output formats of text-based 
Cassandra tooling until we offer machine-parseable output formats.

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie  wrote:


I just find it ridiculous we can not change "someProperty: 10" to "Some 
Property: 10" and there is so much red tape about that.
Well, we're talking about programmatic parsing here. This feels like 
complaining about a compiler that won't let you build if you're missing a ;

We can change it, but that doesn't mean the aggregate cost/benefit across our 
entire ecosystem is worth it. The value of correcting a typo is pretty small, 
and the cost for everyone downstream is not. This is why we should spellcheck 
things in API's before we release them. :)

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:
Eric,

I appreciate your feedback on this, especially the background about where you 
are coming from in the second para

Re: Changing the output of tooling between majors

2023-07-13 Thread Miklosovic, Stefan
For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there 
is a significant adoption in the user community, I would strongly advise 
against making breaking changes to the CLI output."

That is where I get the need to have a JSON output in order to fix a typo. 
That is, if we look at fixing a typo as a breaking change, which I would say it 
is: if somebody is grepping for it and it is not there, it will break.

Do you understand that the same way or am I interpreting that wrong?


From: C. Scott Andreas 
Sent: Thursday, July 13, 2023 16:35
To: dev@cassandra.apache.org
Cc: dev
Subject: Re: Changing the output of tooling between majors


> "From what I see you guys want to condition any change by offering json/yaml 
> as well."

I don't think I've seen a proposal to block changes to nodetool output on 
machine-parseable formats in this thread.

Additions of new delimited fields to nodetool output are mostly 
straightforward. Changes to fields that exist today are likely to cause 
problems - as Josh mentions. These seem best to take on a case-by-case basis 
rather than trying to hammer out an abstract policy. What changes would you 
like to make?

I do think we will have difficulty evolving output formats of text-based 
Cassandra tooling until we offer machine-parseable output formats.

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie  wrote:


I just find it ridiculous we can not change "someProperty: 10" to "Some 
Property: 10" and there is so much red tape about that.
Well, we're talking about programmatic parsing here. This feels like 
complaining about a compiler that won't let you build if you're missing a ;

We can change it, but that doesn't mean the aggregate cost/benefit across our 
entire ecosystem is worth it. The value of correcting a typo is pretty small, 
and the cost for everyone downstream is not. This is why we should spellcheck 
things in API's before we release them. :)

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:
Eric,

I appreciate your feedback on this, especially the background about where you 
are coming from in the second paragraph.

I think we are on the same page after all. I definitely understand that people 
are depending on this output and we need to be careful. That is why I propose 
to change it only with each major. What I feel is that everybody's usage / 
expectations are a little bit different, and the outputs of the commands are very 
diverse, so it is hard to balance this so everybody is happy.

I am trying to come up with a solution which would not change the most 
important commands unnecessarily while also leaving some free room to tweak the 
existing commands where we see it appropriate. I just find it ridiculous that we can 
not change "someProperty: 10" to "Some Property: 10" and there is so much red 
tape about that.

If I had to summarize this whole discussion, the best conclusion I can think 
of is to not change what is used the most (this would probably need to be 
defined more explicitly), and if we have to change something else, we had better 
document that extensively and provide json/yaml so people are able to 
divorce themselves from parsing the human-readable format (which probably all agree 
should not happen in the first place).

What I am afraid of is that in order to satisfy these conditions, if, for 
example, we just want to fix a typo or the format of a key of some value, then 
we would need to deliver a JSON/YAML format as well if there is not any yet, and 
that would mean that a change of such triviality would require way more work 
in terms of the implementation of the JSON/YAML format output. Some commands are 
quite sophisticated, and I do not want to be blocked from changing a field in 
human-readable output because providing a corresponding JSON/YAML format would be 
a gigantic portion of the work itself.

From what I see, you guys want to condition any change on offering json/yaml as 
well, and I don't know if that is just too much.



From: Eric Evans <eev...@wikimedia.org>
Sent: Wednesday, July 12, 2023 19:48
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors




On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan 
<stefan.mikloso...@netapp.com

[VOTE] Release Apache Cassandra 4.0.11

2023-07-13 Thread Miklosovic, Stefan
Proposing the test build of Cassandra 4.0.11 for release.

sha1: f8584b943e7cd62ed4cb66ead2c9b4a8f1c7f8b5
Git: https://github.com/apache/cassandra/tree/4.0.11-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1303/org/apache/cassandra/cassandra-all/4.0.11/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.0.11/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/NEWS.txt

[ANNOUNCE] Apache Cassandra 4.0.11 test artifact available

2023-07-12 Thread Miklosovic, Stefan
The test build of Cassandra 4.0.11 is available.

sha1: f8584b943e7cd62ed4cb66ead2c9b4a8f1c7f8b5
Git: https://github.com/apache/cassandra/tree/4.0.11-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1303/org/apache/cassandra/cassandra-all/4.0.11/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.0.11/

A vote of this test build will be initiated within the next couple of days.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/4.0.11-tentative/NEWS.txt

Re: Changing the output of tooling between majors

2023-07-12 Thread Miklosovic, Stefan
Eric,

I appreciate your feedback on this, especially the background in your second paragraph about where you are coming from.

I think we are on the same page after all. I definitely understand that people depend on this output and we need to be careful. That is why I propose to change it only on each major. What I feel is that everybody's usage and expectations are a little bit different, and the outputs of the commands are so diverse that it is hard to balance this so everybody is happy.

I am trying to come up with a solution which would not change the most important commands unnecessarily while also leaving some room to tweak the existing commands where we see it appropriate. I just find it ridiculous that we cannot change "someProperty: 10" to "Some Property: 10" without so much red tape.

If I had to summarize this whole discussion, the best conclusion I can think of is to not change what is used the most (this would probably need to be defined more explicitly), and if we have to change something else, we had better document it extensively and provide JSON/YAML so people can divorce themselves from parsing the human-readable format (which, probably all agree, should not happen in the first place).

What I am afraid of is that in order to satisfy these conditions, if we just want to fix a typo or the format of a key of some value, then we would need to deliver a JSON/YAML format as well if there is not one yet. That would mean that a change of such triviality would require far more work implementing the JSON/YAML output. Some commands are quite sophisticated, and I do not want to be blocked from changing a field in the human-readable output because providing the corresponding JSON/YAML format would be a gigantic portion of the work itself.

From what I see, you want to condition any change on offering JSON/YAML as well, and I don't know if that is not just too much.



From: Eric Evans 
Sent: Wednesday, July 12, 2023 19:48
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors

On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
I agree with Jackson that having a different output format (JSON/YAML) in order 
to be able to change the default output resolves nothing in practice.

As Jackson said, "operators who maintain these scripts aren’t going to re-write 
them just because a better way of doing them is newly available, usually 
they’re too busy with other work and will keep using those old scripts until 
they stop working".

This is true. If this approach is adopted, what will happen in practice is that we change the output and provide a different format, and then a user detects the change because his scripts break. Since he has an existing solution in place which parses the text of the human-readable output, he will try to fix that; he will not suddenly convert all his scripting to parsing JSON just because we added it. Starting with JSON parsing might happen if he has no scripting in place yet, but that would not cover already existing deployments.

I think this is quite an extreme conclusion to draw.  If tooling had stable, 
structured output formats, and if we documented an expectation that 
human-readable console output was unstable, then presumably it would be safe to 
assume that any new scripters would avail themselves of the stable formats, or 
expect breakage later.  I think it's also fair to assume that at least some 
people would spend the time to convert their scripts, particularly if forced to 
revisit them (for example, after a breaking change to console output).  As 
someone who manages several large-scale mission-critical Cassandra clusters 
under constrained resources, this is how I would approach it.
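To make Eric's point concrete (purely a sketch with hypothetical output — the field names and the structured-format output shown here are assumptions for illustration, not anything nodetool actually emits): a script scraping human-readable text breaks on any cosmetic change, while one consuming a structured format survives display changes.

```python
import json

# Hypothetical human-readable console output: any cosmetic change
# (e.g. renaming "someValue" to "Some Value") breaks naive scraping.
console_output = "someValue: 1\nAnother Value: 2\n"

def scrape(text, key):
    # Fragile: depends on the exact key spelling and "key: value" layout.
    for line in text.splitlines():
        k, _, v = line.partition(": ")
        if k == key:
            return int(v)
    return None

# Hypothetical structured output under a stable contract: renaming the
# console label does not affect consumers of the JSON keys.
json_output = '{"some_value": 1, "another_value": 2}'

print(scrape(console_output, "someValue"))    # works only until the label changes
print(json.loads(json_output)["some_value"])  # stable under display changes
```

The same lookup fails (`None`) as soon as the console label changes to "Some Value", which is exactly the breakage being discussed.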

TL;DR Don't let perfect be the enemy of 
good<https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>

[ ... ]

For that reason, what we could agree on is that we would never change the output for "tier 1" commands, and if we ever changed something, it would be STRICT ADDITIONS only. In other words, everything they printed, they would continue to print forever; only new lines could be introduced. We need to do this because Cassandra is evolving over time and we need to keep the output aligned as new functionality appears. But the output would be backward compatible. Plus, we are talking about majors only.

The only reason we would ever change the output of "tier 1" commands, if it is not an addition, is to fix a typo in the existing

Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-12 Thread Miklosovic, Stefan
CEP is a great idea. The devil is in the details, and while this looks cool, it definitely will not hurt to have the nuances ironed out.


From: Patrick McFadin 
Sent: Tuesday, July 11, 2023 2:24
To: dev@cassandra.apache.org; German Eichberger
Subject: Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the 
release process

I would say it helps a lot of people. 45k downloads in just last month: 
https://pypistats.org/packages/cqlsh

I feel like a CEP would be in order, along the lines of CEP-8: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

Unless anyone objects, I can help you get the CEP together and we can get a 
vote, then a JIRA in place for any changes in trunk.

Patrick

On Mon, Jul 10, 2023 at 4:58 PM German Eichberger via dev <dev@cassandra.apache.org> wrote:
Same - really appreciate those efforts and also welcome the upstreaming and 
release automation...

German

From: Jeff Widman <j...@jeffwidman.com>
Sent: Sunday, July 9, 2023 1:44 PM
To: Max C. <mc_cassand...@core43.com>
Cc: dev@cassandra.apache.org <dev@cassandra.apache.org>; Brad Schoening <bscho...@gmail.com>
Subject: [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

Thanks Max, always encouraging to hear that the time I spend on open source is 
helping others.

Your use case is very similar to what drove my original desire to get involved 
with the project. Being able to `pip install cqlsh` from a dev machine was so 
much lighter weight than the alternatives.

Anyone else care to weigh in on this?

What are the next steps to move to a decision?

Cheers,
Jeff

On Sat, Jul 8, 2023, 7:23 PM Max C. <mc_cassand...@core43.com> wrote:

As a user, I really appreciate your efforts Jeff & Brad.  I would *love* for 
the C* project to officially support this.

In our environment we have a lot of client machines that all share common NFS 
mounted directories.  It's much easier for us to create a Python virtual 
environment on a file server with the cqlsh PyPI package installed than it is 
to install the Cassandra RPMs on every single machine.  Before I discovered 
your PyPI package, our developers would need to log in to a Cassandra node in 
order to run cqlsh.  The cqlsh PyPI package, however, is in our standard 
"python dev tools" virtual environment -- along with Ansible, black, isort and 
various other Python packages; which means it's accessible to everyone, 
everywhere.

I agree that this should not replace packaging cqlsh in the Cassandra RPM, so much as provide an additional option for installing cqlsh without the baggage of installing the full Cassandra package.

Thanks again for your work Jeff & Brad.

- Max

On 7/6/2023 5:55 PM, Jeff Widman wrote:
Brad Schoening and I currently maintain https://pypi.org/project/cqlsh/, which repackages the cqlsh that ships with every Cassandra release.

This way:

  *   anyone who wants a lightweight client to talk to a remote cassandra can 
simply `pip install cqlsh` without having to download the full cassandra 
source, unzip it, etc.
  *   it's very easy for folks to use it as scaffolding in their python 
scripts/tooling since they can simply include it in the list of their required 
dependencies.

We currently handle the packaging by waiting for a release, then manually 
copy/pasting the code out of the cassandra source tree into 
https://github.com/jeffwidman/cqlsh which 
has some additional build/python package configuration files, then using 
standard python tooling to publish to PyPI.

Given that our project is simply a build/packaging project, I wanted to start a conversation about upstreaming it into core Cassandra. I realize that Cassandra has no interest in maintaining lots of build targets... but given that cqlsh is written in Python, and that publishing to PyPI enables DBAs to share more complicated tooling built on top of it, this seems like a natural fit for core Cassandra rather than a standalone project.

Goal:
When a Cassandra release happens, the build/release process automatically 
publishes cqlsh to 
https://pypi.org/project/cqlsh/.

Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There was 
some initial chatter about that in 

Re: Changing the output of tooling between majors

2023-07-12 Thread Miklosovic, Stefan
updated over time to improve readability, as that is 
the purpose of those commands. While people script against that output I don’t 
think anyone is going to say it’s an official API, the project also makes no 
public commitment to that either.

If the proposal is to treat Nodetool input and output like a contract/API, it’d 
be great for a formal specification, or at least the documentation to be 
updated to cover what users should expect as output from Nodetool, if the 
project is going to such effort to maintain a specification, why not make it 
official? That way the maintainers of scripts have a fighting chance of finding 
incompatibilities before upgrading their infrastructure and the project could 
make these kinds of changes and provide a mechanism for users to validate.

Currently the argument could be made that there’s no guarantee about Nodetool 
output since it’s not actually written down anywhere official outside the 
codebase.

Isn’t this one of the reasons Cassandra maintains the NEWS and CHANGES files in 
the repo, and follows semantic versioning, to communicate potentially breaking 
changes as clearly as possible? Surely a message like (but with some more 
detail) “Nodetool command x has had its human readable output restructured, 
item y was removed/renamed to z” would suffice.

Not sure if you can deprecate the human readable output without generating a 
lot of noise for the user, and if it’s being parsed by a bash script, the user 
would never see it anyway, but sounds like that’s what the project needs.

To the note about having users migrate over to more machine friendly output 
types (JSON etc), in my experience the operators who maintain these scripts 
aren’t going to re-write them just because a better way of doing them is newly 
available, usually they’re too busy with other work and will keep using those 
old scripts until they stop working, so in my view it’s not really a solution 
to this problem.

Regards,

Jackson

From: Eric Evans 
Date: Tuesday, 11 July 2023 at 4:14 am
To: dev@cassandra.apache.org 
Subject: Re: Changing the output of tooling between majors
On Sun, Jul 9, 2023 at 9:10 PM Dinesh Joshi <djo...@apache.org> wrote:
On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:

If we have been providing CQL / JSON / YAML for a couple of years, I do not believe that the argument "let's not break it for folks in nodetool" is still relevant. CQL output has been there since the times of 4.0 at least (at least!), and YAML / JSON is also not something completely new. It is not like we are suddenly forcing people to change their habits; there was enough time to update the stuff to CQL / JSON / YAML etc ...

What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and 
beyond may still use their existing scripts. Therefore keeping things stable is 
important. Until nodetool can support JSON as output format for all interaction 
and there is a significant adoption in the user community, I would strongly 
advise against making breaking changes to the CLI output.

+1

--
Eric Evans
john.eric.ev...@gmail.com


Re: Removal of CloudstackSnitch

2023-07-10 Thread Miklosovic, Stefan
OK, thanks all, we will go with 2): we will deprecate it in 5.0 and remove it in the next major.


From: Jeff Jirsa 
Sent: Monday, July 10, 2023 18:13
To: dev@cassandra.apache.org
Subject: Re: Removal of CloudstackSnitch

+1


On Mon, Jul 10, 2023 at 8:42 AM Josh McKenzie <jmcken...@apache.org> wrote:
2) keep it there in 5.0 but mark it @Deprecated
I'd say Deprecate: log warnings that it's neither supported nor maintained, that people use it at their own risk, and that it's going to be removed.

That is, assuming the maintenance burden of it isn't high. I assume not since, 
as Brandon said, they're quite pluggable and well modularized.

On Mon, Jul 10, 2023, at 9:57 AM, Brandon Williams wrote:
I agree with Ekaterina, but also want to point out that snitches are
pluggable, so whatever we do should be pretty safe.  If someone
discovers after the removal that they need it, they can just plug it
back in.

Kind Regards,
Brandon

On Mon, Jul 10, 2023 at 8:54 AM Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>
> Hi Stefan,
>
> I think we should follow our deprecation rules and deprecate it in 5.0, 
> potentially remove in 6.0. (Deprecate in one major, remove in the next major)
> Maybe the deprecation can come with a note on your findings for the users, 
> just in case someone somewhere uses it and did not follow the user mailing 
> list?
>
> Thank you
> Ekaterina
>
> On Mon, 10 Jul 2023 at 9:47, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
>>
>> Hi list,
>>
>> I want to ask about the future of CloudstackSnitch.
>>
>> This snitch was added 9 years ago (1). I contacted the original author of 
>> that snitch, Pierre-Yves Ritschard, who is currently CEO of a company he 
>> coded that snitch for.
>>
>> In a nutshell, Pierre answered that he does not think this snitch is relevant anymore and the company is using a different way to fetch metadata from a node, rendering CloudstackSnitch, as is, irrelevant for them.
>>
>> I also wrote an email to the user ML list (2) about two weeks ago and nobody answered that they are using it either.
>>
>> The current implementation is using this approach (3), but I think it is already obsolete because the snitch is adding a path to the parsed metadata service IP which is probably not there at all in the default implementation of the Cloudstack data server.
>>
>> What also bothers me is that we, as a community, seem unable to test the functionality of this snitch, as I do not know anybody with a Cloudstack deployment who would be able to test this reliably.
>>
>> For completeness, in (1), Brandon expressed his opinion that unless users come forward for this snitch, retiring it is the best option.
>>
>> For all cloud-based snitches, we did the refactoring of the code in 16555 and we are working on an improvement in 18438, which introduces a generic way metadata services are called; plugging in custom logic or reusing a default implementation of a cloud connector is very easy, further making this snitch less relevant.
>>
>> This being said, should we:
>>
>> 1) remove it in 5.0
>> 2) keep it there in 5.0 but mark it @Deprecated
>> 3) keep it there
>>
>> Regards
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-7147
>> (2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
>> (3) https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns

Re: Removal of CloudstackSnitch

2023-07-10 Thread Miklosovic, Stefan
Hey,

should we still keep it around if we are not even sure it still works? As I see it, we are not able to verify that it works on the 5.0 release. What value is there in a snitch we do not know is still functioning?

Regards


From: Ekaterina Dimitrova 
Sent: Monday, July 10, 2023 15:54
To: dev@cassandra.apache.org
Subject: Re: Removal of CloudstackSnitch

Hi Stefan,

I think we should follow our deprecation rules and deprecate it in 5.0, 
potentially remove in 6.0. (Deprecate in one major, remove in the next major)
Maybe the deprecation can come with a note on your findings for the users, just 
in case someone somewhere uses it and did not follow the user mailing list?

Thank you
Ekaterina

On Mon, 10 Jul 2023 at 9:47, Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
Hi list,

I want to ask about the future of CloudstackSnitch.

This snitch was added 9 years ago (1). I contacted the original author of that 
snitch, Pierre-Yves Ritschard, who is currently CEO of a company he coded that 
snitch for.

In a nutshell, Pierre answered that he does not think this snitch is relevant anymore and the company is using a different way to fetch metadata from a node, rendering CloudstackSnitch, as is, irrelevant for them.

I also wrote an email to the user ML list (2) about two weeks ago and nobody answered that they are using it either.

The current implementation is using this approach (3), but I think it is already obsolete because the snitch is adding a path to the parsed metadata service IP which is probably not there at all in the default implementation of the Cloudstack data server.

What also bothers me is that we, as a community, seem unable to test the functionality of this snitch, as I do not know anybody with a Cloudstack deployment who would be able to test this reliably.

For completeness, in (1), Brandon expressed his opinion that unless users come forward for this snitch, retiring it is the best option.

For all cloud-based snitches, we did the refactoring of the code in 16555 and we are working on an improvement in 18438, which introduces a generic way metadata services are called; plugging in custom logic or reusing a default implementation of a cloud connector is very easy, further making this snitch less relevant.

This being said, should we:

1) remove it in 5.0
2) keep it there in 5.0 but mark it @Deprecated
3) keep it there

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-7147
(2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
(3) https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns


Removal of CloudstackSnitch

2023-07-10 Thread Miklosovic, Stefan
Hi list,

I want to ask about the future of CloudstackSnitch.

This snitch was added 9 years ago (1). I contacted the original author of that 
snitch, Pierre-Yves Ritschard, who is currently CEO of a company he coded that 
snitch for.

In a nutshell, Pierre answered that he does not think this snitch is relevant anymore and the company is using a different way to fetch metadata from a node, rendering CloudstackSnitch, as is, irrelevant for them.

I also wrote an email to the user ML list (2) about two weeks ago and nobody answered that they are using it either.

The current implementation is using this approach (3), but I think it is already obsolete because the snitch is adding a path to the parsed metadata service IP which is probably not there at all in the default implementation of the Cloudstack data server.

What also bothers me is that we, as a community, seem unable to test the functionality of this snitch, as I do not know anybody with a Cloudstack deployment who would be able to test this reliably.

For completeness, in (1), Brandon expressed his opinion that unless users come forward for this snitch, retiring it is the best option.

For all cloud-based snitches, we did the refactoring of the code in 16555 and we are working on an improvement in 18438, which introduces a generic way metadata services are called; plugging in custom logic or reusing a default implementation of a cloud connector is very easy, further making this snitch less relevant.

This being said, should we:

1) remove it in 5.0
2) keep it there in 5.0 but mark it @Deprecated
3) keep it there 

Regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-7147
(2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
(3) 
https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns

Re: Changing the output of tooling between majors

2023-07-08 Thread Miklosovic, Stefan
If somebody understood my message as promoting the removal of all these commands for which we have other means of getting the output, that is not the case at all. I do not want to remove any of them. I am just elaborating on "parsing the output of nodetool and problems related to that if it is changed" in this particular case.

____
From: Miklosovic, Stefan 
Sent: Saturday, July 8, 2023 17:43
To: dev
Subject: Re: Changing the output of tooling between majors

Thank you, Josh, for your insight.

I think they should not parse that output in the first place. Gradually introducing JSON / YAML output formats for nodetool is cool, but I think it started to happen too late; people were already parsing the raw nodetool output, and here we are.

I played with nodetool a little bit to see where we are with this; there are 135 commands in total. We can leave out all "set*" commands, but we cannot ignore "get*" because that is potential output to parse. People just don't parse the output of "set*" commands. That leaves 116 commands. We can also ignore all "disable*" and "enable*" commands, and we are at 98. Then there is the group of "invalidate*" commands, we can skip them too, and we are at 90; minus the help command, 89.

The commands which are left can be categorized into two main groups: commands which execute some action and commands which display some statistics or state about the internals of a Cassandra node.

The first group, "action commands", are again not going to have their output parsed. These are here (1) (I could have made some mistakes here and there).

So, the commands we can potentially parse the output of are here (2); there are roughly 51 of them.

Some of these commands have their equivalent in system_views vtables; these are, if I haven't forgotten something:

clientstats (system_views.clients)
compactionhistory (system.compaction_history)
compactionstats (system_views.sstable_tasks)
gossipinfo (system_views.gossip_info)
listsnapshots (system_view.snapshots)
tpstats (system_view.thread_pools)

Some of them have already different format of the output supported (JSON or 
YAML), they are:

datapaths
tablestats
tpstats (has also cql table)
compactionhistory (has also cql table)

I would argue that some commands with the prefixes "status" and "get" can go away too because their values are visible in system_views.settings. Some of these settings will even be updateable after Maxim's work.

statusbackup incremental_backups
statushandoff hinted_handoff_enabled
getmaxhintwindow max_hint_window
getconcurrentcompactors concurrent_compactors
getconcurrentviewbuilders concurrent_materialized_view_builders
getdefaultrf default_keyspace_rf
gettimeout (this just reflects cassandra.yaml more or less)

Then there is the family of all the "get throttle / threshold" commands and the like; I will not go through them all here, but they are also somehow retrievable from CQL via system_views.settings:

getbatchlogreplaythrottle
getcolumnindexsize
getcompactionthreshold
getcompactionthroughput
getinterdcstreamthroughput
getsnapshotthrottle
getstreamthroughput

There are also commands which just return an integer, or whose output there is no need to change, like:

gettraceprobability
getsstables

So the commands which do not have an output equivalent in some CQL table, and for which no JSON / YAML format is available, are:

describecluster
describering
failuredetector
gcstats
getauditlog
getauthcacheconfig
getconcurrency
getendpoints
getfullquerylog
getlogginglevels
getseeds
info
listpendinghints
netstats
profileload (replacement of toppartition (which should be removed in 5.0, 
actually))
proxyhistograms
rangekeysample
repair
repair_admin
ring
status
statusautocompaction
statusbinary
statusgossip
tablehistograms
toppartitions
viewbuildstatus

From these, if one asks which ones actually make sense to try to tweak the 
output of, they might be

describecluster
describering
info
listpendinghints
netstats
proxyhistograms
repair_admin (if somebody wants to list stuff in json)
ring
status
tablehistograms
viewbuildstatus

The point I want to make is that I do not think the problem of changing the output is too hot. There are at most about 15 commands for which the output matters, because they have no CQL equivalent or JSON / YAML output.

If we have been providing CQL / JSON / YAML for a couple of years, I do not believe that the argument "let's not break it for folks in nodetool" is still relevant. CQL output has been there since the times of 4.0 at least (at least!), and YAML / JSON is also not something completely new. It is not like we are suddenly forcing people to change their habits; there was enough time to update the stuff to CQL / JSON / YAML etc ...

But really, the question I still don't have an answer for is who is actually 
parsin

Re: Changing the output of tooling between majors

2023-07-08 Thread Miklosovic, Stefan
Once there is, we are free to change the default output however we want.
One thing I always try to keep in mind on discussions like this. A thought 
experiment (with very hand-wavy numbers; try not to get hung up on them):

* Let's say there are 5,000 discrete "users" of C* out there (different groups 
of people using the DB)
* And assume 5% have written some kind of scripting / automation to parse our 
tooling output (250)
* And let's assume it'd take 18 developer hours (a few days at 6 hours/day) to 
retool to the new output, validate and test correctness, and then roll it out 
to qa, test, validate, and then to prod, test, validate

You're looking at 250 * 18 hours, 4,500 hours, 112.5 40 hour work weeks (2+ 
years for some poor sod without vacations) worth of work from what seems to be 
a simple change.

Now, that estimate could be off by an order of magnitude either way, but the 
motion of the exercise is valuable, I think. There's a real magnified 
downstream cost to our community when we make changes to APIs and we need to 
weigh that against the cost to the project in terms of maintaining those 
interfaces.

The above mental exercise really strongly applies to the periodic discussions 
where we talk about deprecating JMX support.

Not saying we should or shouldn't change things here for the record, just want 
to call this out for anyone that might not have been thinking about things this 
way.

On Fri, Jul 7, 2023, at 3:23 PM, Brandon Williams wrote:
On Fri, Jul 7, 2023 at 2:20 PM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:
>
> Great thanks. That might work.
>
> So we do not change the default output unless there is json / yaml equivalent.
>
> Once there is, we are free to change the default output however we want.

Yes, exactly.  Then we have the best of both worlds: programmatic
access that isn't flimsy, and a pretty display however we want it.




Re: Changing the output of tooling between majors

2023-07-07 Thread Miklosovic, Stefan
Great thanks. That might work.

So we do not change the default output unless there is json / yaml equivalent.

Once there is, we are free to change the default output however we want.


From: Brandon Williams 
Sent: Friday, July 7, 2023 21:17
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.




On Fri, Jul 7, 2023 at 2:11 PM Miklosovic, Stefan
 wrote:
>
> Yes, that is true, but the original, unfixed, output, is still there. Are we 
> OK with that?

When we have a serialized output available, we do whatever we like to
the display output.


Re: Changing the output of tooling between majors

2023-07-07 Thread Miklosovic, Stefan
Yes, that is true, but the original, unfixed, output, is still there. Are we OK 
with that?

Now the command "nodetool command" writes this:

someValue: 1
Another Value: 2
The Third Value: 3

You say that, lets add a flag to this too, -j (as in json), so a user will get:

{
"some_value": 1,
"another_value": 2,
"the_third_value": 3
}

Correct?

But the original discrepancy, "someValue" instead of "Some Value", is still 
there.

Is this OK for everybody?

My aim is to fix the original output too; having a "-j" flag is just nice to 
have, another way to interpret the results. But you mean that we are never 
going to touch the "someValue" output again?
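For illustration, a minimal sketch of the "divorce the data from the display" idea discussed in this thread. This is hypothetical code, not actual nodetool internals: once a command's statistics live in one structure, the pretty output and a stable "-j" JSON output are just two renderers over it, so the display can be fixed freely while scripts consume the JSON:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not real nodetool code): one data structure,
// two renderers. The pretty display may change between releases; the
// serialized JSON form is the stable contract for scripts.
public class StatsOutput {
    public static void main(String[] args) {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("some_value", 1);
        stats.put("another_value", 2);
        stats.put("the_third_value", 3);

        boolean json = args.length > 0 && args[0].equals("-j");
        System.out.print(json ? toJson(stats) : toDisplay(stats));
    }

    // Pretty display: free to change however we want.
    static String toDisplay(Map<String, Object> stats) {
        StringBuilder sb = new StringBuilder();
        stats.forEach((k, v) ->
            sb.append(titleCase(k)).append(": ").append(v).append('\n'));
        return sb.toString();
    }

    // Serialized form: hand-rolled here to stay dependency-free.
    static String toJson(Map<String, Object> stats) {
        StringBuilder sb = new StringBuilder("{\n");
        int i = 0;
        for (Map.Entry<String, Object> e : stats.entrySet()) {
            sb.append("  \"").append(e.getKey()).append("\": ").append(e.getValue());
            sb.append(++i < stats.size() ? ",\n" : "\n");
        }
        return sb.append("}\n").toString();
    }

    // "the_third_value" -> "The Third Value"
    static String titleCase(String key) {
        StringBuilder sb = new StringBuilder();
        for (String part : key.split("_"))
            sb.append(Character.toUpperCase(part.charAt(0)))
              .append(part.substring(1)).append(' ');
        return sb.toString().trim();
    }
}
```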


From: Brandon Williams 
Sent: Friday, July 7, 2023 21:05
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors





On Fri, Jul 7, 2023 at 2:02 PM Miklosovic, Stefan
 wrote:
>
> There is just no clear path how to improve that over time and exposing the 
> same output via different format is not really solving it ... the 
> discrepancies are still there.

I'm not sure what you mean, can you explain?  In my mind, if we have a
serialized output format, we have divorced the display from the data
and so we should be free to modify how we display it all we like after
that point.


Re: Changing the output of tooling between majors

2023-07-07 Thread Miklosovic, Stefan
Thank you Brandon for further clarification of your position on this.

While I get that the need for compatibility is real, I find having to preserve 
this across majors to be just too much. Are we all aware that if we cannot 
change it, this is a snowball getting bigger over time? After a long enough 
period, it will be so "conserved" that it becomes detrimental to usability, as 
it will also be hard to parse even visually.

There is just no clear path how to improve that over time and exposing the same 
output via different format is not really solving it ... the discrepancies are 
still there.

I welcome other people to this thread to tell us how they are parsing it, how 
frequently, and how important it is for them. As I said before, I have never 
met anybody who parses this output in a way that actually matters to them. Do 
we have any proof this is happening at scale?


From: Brandon Williams 
Sent: Friday, July 7, 2023 20:39
To: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors





On Fri, Jul 7, 2023 at 10:21 AM Miklosovic, Stefan
 wrote:
>
> Anyway, the main question here is if we are OK to change the output in majors.

I think we always want to strive for compatibility whenever possible.
My personal litmus test is "can this information be obtained
elsewhere?" and if the answer is no, then the format shouldn't change
as it is very likely to at least cause friction for anyone screen
scraping to get it programmatically.  However, as you mentioned,
adding a serialized format provides another, superior method of
programmatic access, freeing us of the issues with cosmetic changes.


Changing the output of tooling between majors

2023-07-07 Thread Miklosovic, Stefan
Hi list,

I want to clarify the policy we have for changing the output of the tooling 
(nodetool or tools/bin etc.).

I am not sure it is written down explicitly anywhere, but as I gather from 
years of gossip, we should not change the output (e.g. renaming fields) in 
minors, while for majors (4.0 -> 5.0) it is OK, correct?

For example, when some tool prints this:

thisIsAStatistic: 10

and we see that all other lines in that output print it like this:

This Is Another Statistic: abc

scratching the itch is almost irresistible so we want to change the output to:

This Is a Statistic: 10

This is the natural way fixes are done. We are improving the output, making it 
consistent, etc.

Someone may argue that we are changing a "public API" and that people are 
actually parsing the output as-is, so we had better not change it because we 
might break "the scripts" for somebody.

While I get this for minors, and it is understandable that minors should stay 
the same, is this relevant for majors? Because if we care about majors too in 
this situation, how are we supposed to evolve the output over time? Is it 
supposed to be frozen forever? I do not buy this argument. For minors, fine. 
But for majors, I do not think so.

I feel like "don't break the output because it is an API" is more or less an 
urban legend we keep repeating to ourselves. I have yet to meet somebody who is 
stressing over the fact that their output changed *between majors*.

If that is the case, we should treat this problem completely differently: we 
should not rely on the output of tooling at all, and we should either provide a 
corresponding JMX method to retrieve the data or offer other output formats, 
like JSON or YAML.

Anyway, the main question here is if we are OK to change the output in majors.

Regards

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-06 Thread Miklosovic, Stefan
Hi Maxim,

I went through the PR and added my comments. I think David also reviewed it. 
All the points you mentioned make sense to me, but I humbly think it is 
necessary to have at least one additional pair of eyes on this, as the patch is 
relatively impactful.

I would like to see an additional column in system_views.settings, named 
"mutable" and of type "boolean", to show which fields I am actually allowed to 
update as an operator.

It seems to me you agree with the introduction of this column (1), but there is 
no clear agreement on where we actually want to put it. You want this whole 
feature to be committed to the 4.1 branch as well, which is an interesting 
proposal. I was thinking this work would go to 5.0 only. I am not completely 
sure it is necessary to backport this feature, but your argument here (2) is 
worth discussing further.

If we introduce this change in 4.1, that field would not be there, but in 5.0 
it would; that way we would not introduce any new column to 
system_views.settings in 4.1. We could also go with introducing this column in 
4.1 if people are OK with that.

For simplicity, I am slightly leaning towards introducing this feature in 5.0 
only.

(1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
(2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041


From: Maxim Muzafarov 
Sent: Friday, June 23, 2023 13:50
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change running 
configuration





Hello everyone,


As there has been little feedback on which option to go with, and having 
discussed the pros and cons of each, I tend to agree with David's vision of 
this problem :-) After a lot of discussion on Slack, we arrived at the 
@ValidatedBy annotation, which points to a validation method for a property; 
this addresses all our concerns and issues with validation.

I'd like to raise the visibility of these changes and try to find one
more committer to look at them:
https://issues.apache.org/jira/browse/CASSANDRA-15254
https://github.com/apache/cassandra/pull/2334/files

I'd really appreciate any kind of review in advance.


Despite the size of the change (+2,043 −302) and the fact that most of these 
additions are the tests themselves, I would like to highlight the crucial 
design points required to make the SettingsTable virtual table updatable. Some 
of these have already been discussed in this thread, and I would like to 
provide a brief outline of them to facilitate the PR review.

So, what are the problems that have been solved to make the
SettingsTable updatable?

1. Input validation.

Currently, the JMX, Yaml and DatabaseDescriptor#apply paths each perform 
validation of user input for the same property in their own way, which 
fortunately results in a consistent configuration state, but not always. 
CASSANDRA-17734 is a good example of this.

The @ValidatedBy annotation, which points to a validation method, has been 
added to address this particular problem. No matter which API is triggered, the 
method will be called to validate input; it works even when cassandra.yaml is 
loaded by the yaml engine in a pre-parse state, such as when we check input 
properties for deprecation and nullability.

There are two types of validation worth mentioning:
- stateless - properties do not depend on any other configuration;
- stateful - properties that require a fully-constructed Config
instance to be validated and those values depend on other properties;

For the sake of simplicity, the scope of this task will be limited to
dealing with stateless properties only, but stateful validations are
also supported in the initial PR using property change listeners.
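A toy sketch of the stateless case described above. Only the @ValidatedBy name comes from this thread; the Config field, the validator, and the reflection plumbing are made-up illustrations, not the actual Cassandra implementation:

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Sketch only: a @ValidatedBy annotation naming a validation method, so the
// same check runs no matter which API (yaml, JMX, virtual table) sets the
// property. All names here are illustrative.
public class ValidatedByDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ValidatedBy { String method(); }

    static class Config {
        @ValidatedBy(method = "validatePositive")
        public int column_index_size_in_kb = 64;

        public static void validatePositive(Object value) {
            if ((int) value <= 0)
                throw new IllegalArgumentException("must be positive: " + value);
        }
    }

    // Single choke point for updates: run the annotated validator (which
    // throws on bad input) before the field is actually mutated.
    static void set(Config config, String name, Object value) throws Exception {
        Field field = Config.class.getField(name);
        ValidatedBy vb = field.getAnnotation(ValidatedBy.class);
        if (vb != null) {
            Method validator = Config.class.getMethod(vb.method(), Object.class);
            validator.invoke(null, value);   // throws on invalid input
        }
        field.set(config, value);
    }

    public static void main(String[] args) throws Exception {
        Config config = new Config();
        set(config, "column_index_size_in_kb", 128);    // accepted
        try {
            set(config, "column_index_size_in_kb", -1); // rejected
        } catch (Exception e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
        System.out.println("value = " + config.column_index_size_in_kb);
    }
}
```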

2. Property mutability.

There is currently no way to distinguish which properties are mutable and which 
are not. This meta-information must be available at runtime, and as we 
discussed earlier, the @Mutable annotation has been added to handle this.

3. Listening for property changes.

Some of the internal components, e.g. CommitLog, may perform operations and/or 
calculations just before or just after a property change. As long as JMX was 
the only API used to update configuration properties, this was not a problem. 
The observer pattern has been used so that the same behaviour is maintained 
when updates arrive through other APIs.
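The observer pattern mentioned in point 3 could be sketched roughly like this. Names ("commitlog_segment_size", onChange/update) are illustrative, not the actual Cassandra listener API:

```java
import java.util.*;
import java.util.function.BiConsumer;

// Sketch of point 3: components register a listener that fires on a
// property change, so they are notified regardless of which API (JMX,
// yaml reload, virtual table UPDATE) performed the update.
public class PropertyListeners {
    private static final Map<String, List<BiConsumer<Object, Object>>> listeners = new HashMap<>();
    private static final Map<String, Object> config = new HashMap<>();

    static void onChange(String property, BiConsumer<Object, Object> listener) {
        listeners.computeIfAbsent(property, k -> new ArrayList<>()).add(listener);
    }

    static void update(String property, Object newValue) {
        Object old = config.put(property, newValue);
        for (BiConsumer<Object, Object> l : listeners.getOrDefault(property, List.of()))
            l.accept(old, newValue);   // (oldValue, newValue) callback
    }

    public static void main(String[] args) {
        onChange("commitlog_segment_size",
                 (old, now) -> System.out.println("resizing segments: " + old + " -> " + now));
        update("commitlog_segment_size", 32);
        update("commitlog_segment_size", 64);
    }
}
```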

4. SettingsTable input/output format.

JMX, SettingsTable and Yaml accept values in different formats, which may not 
be compatible in some cases, especially when representing composite objects. 
The former uses toString() as its output, and the latter uses a human-readable 
yaml format.

So, in order to see the same properties in the same format through different 
APIs, the Yaml representation is 

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-29 Thread Miklosovic, Stefan
Great stuff, Josh! This is what I was talking about when I mentioned that I am 
super curious about the workflows of other people. Any chance you could share 
your setup somewhere so I can try it? Too soon to tell if we indeed want to go 
in that direction, but trying it out would be great.


From: Josh McKenzie 
Sent: Thursday, June 29, 2023 20:44
To: dev
Subject: Re: [DISCUSS] When to run CheckStyle and other verificiations




In accord I added an opt-out for each hook, and will require such here as well
On for main branches, off for feature branches seems like it might blanket 
satisfy this concern? It doesn't fix the "--atomic across 5 branches means 
style checks and build on hook across those branches" issue, which isn't ideal. 
I don't think style-check failures after pushing upstream are frequent enough 
to make the cost/benefit there work out overall, are they?

Related to this - I have sonarlint, spotbugs, and checkstyle all running inside 
idea; since pulling those in and tuning the configs a bit I haven't run into a 
single issue w/our checkstyle build target (go figure). Having the required 
style checks reflected realtime inside your work environment goes a long way 
towards making it a more intuitive part of your workflow rather than being an 
annoying last minute block of your ability to progress that requires circling 
back into the code.

From a technical perspective, it looks like adding a reference 
"externalDependencies.xml" to our ide/idea directory, copied over during 
"generate-idea-files", would be sufficient to get idea to pop up prompts to 
install those extensions if you don't have them when opening the project 
(theory; haven't tested).

We'd need to make sure the configuration for each of those was calibrated to 
our project out of the box of course, but making style considerations a 
first-class citizen in that way seems a more intuitive and human-centered 
approach to all this rather than debating nuance of our command-line targets, 
hooks, and how we present things to people. To Berenguer's point - better to 
have these be completely invisible to people with their workflows and Just Work 
(except for when your IDE scolds you for bad behavior w/build errors 
immediately).

I still think Flags Are Bad. :)

On Thu, Jun 29, 2023, at 1:38 PM, Ekaterina Dimitrova wrote:
Should we just keep one consolidated no-check flag for all kinds of checks and 
get rid of the no-checkstyle one?

Trading one for one with Josh :-)

Best regards,
Ekaterina

On Thu, 29 Jun 2023 at 10:52, Josh McKenzie 
mailto:jmcken...@apache.org>> wrote:

I really prefer separate tasks to flags. Flags are not listed in the help 
message ("ant -p") and are not auto-completed in the terminal. That makes them 
almost undiscoverable for newcomers.
Please, no more flags. We are more than flaggy enough right now.

Having to dig through build.xml to determine how to change things or do things 
is painful; the more we can avoid this (for oldtimers and newcomers alike!) the 
better.

On Thu, Jun 29, 2023, at 8:34 AM, Mick Semb Wever wrote:


On Thu, 29 Jun 2023 at 13:30, Jacek Lewandowski 
mailto:lewandowski.ja...@gmail.com>> wrote:
There is another target called "build", which retrieves dependencies, and then 
calls "build-project".


Is it intended to be called by a user?

If not, please follow the ant style prefixing the target name with an 
underscore (so that it does not appear in the `ant -projecthelp` list).

If possible, I agree with Brandon, `build` is the better name to expose to the 
user.




Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-27 Thread Miklosovic, Stefan
I do all git-related operations in the console / bash (the IDEA terminal, 
alt+f12 in IDEA). I know IDEA can do git stuff as well, but I never tried it 
and I just do not care; I just do not "believe in it" (yeah, call me 
old-fashioned if you want), so how it looks in IDEA, with some checkboxes I 
would have to turn off, is irrelevant to me.

I do not like the idea of git hooks. Maybe it is a matter of strong habit, but 
I execute all these checks before I push anyway, so for me the git hooks are 
not important, and I would have to unlearn building manually if a git hook were 
going to do that for me instead.

If I am going to push 5 branches like this:

git push upstream cassandra-3.0 cassandra-3.11 cassandra-4.0 cassandra-4.1 
trunk --atomic

Does this mean that git hooks would build all 5 branches again? What if 
somebody pushes while I am building? Building 5 branches from scratch would 
probably take something like 10 minutes ...


From: Jacek Lewandowski 
Sent: Tuesday, June 27, 2023 9:08
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] When to run CheckStyle and other verificiations




So far, nobody has referred to running checks in a pre-push (or pre-commit) 
hook. The use of Git hooks is going to be introduced along with Accord, so we 
could use them to force running the checks once before sending changes to the 
repo.
It would still be an opt-out approach, because one would have to add the 
"--no-verify" flag or uncheck a box in the commit dialog to skip running the 
checks.

thanks,
Jacek


Tue, 27 Jun 2023 at 01:55, Ekaterina Dimitrova 
mailto:e.dimitr...@gmail.com>> wrote:
Thank you, Jacek, for starting the thread; those things are essential for 
developer productivity.

I support the idea of opting out vs opting into checks. In my experience, it 
also makes things easier and faster during review time.

If people have to opt-in - it is one more thing for new people to discover, and 
it will probably happen only during review time if they do not have access to 
Jenkins/paid CircleCI, etc.

I also support consolidating all types of checks/analyses and running them 
together.

Maxim’s suggestion about rat replacement sounds like a good improvement that 
can be explored (not part of what Jacek does here, though). Maxim, do you mind 
creating a ticket, please?

Best regards,
Ekaterina

On Mon, 26 Jun 2023 at 17:04, Miklosovic, Stefan 
mailto:stefan.mikloso...@netapp.com>> wrote:
Yes, in this case, opting out is better than opting in. I feel like the build 
process is quite versatile and one just picks what is necessary. I never build 
docs; there is a flag for that. I had turned checkstyle off because I was fed 
up with it, until Berenguer cached it; now I get "ant jar" with checkstyle in 
under 10 seconds, so I leave it on, which is great.

Even though I feel it is already flexible enough, grouping all the checkstyle 
and rat checks etc. under one target seems like a good idea. From my 
perspective, it is "all or nothing": turn it all off until I am about to push, 
when I want it all on. I rarely want to "just checkstyle" in the middle of 
development.

I do not think that having a lot of flags is bad. I like that I have bash 
aliases almost for everything and I bet folks have their tricks to get the 
mundane stuff done.

It would be pretty interesting to know the workflows of other people. I think 
there would be a lot of insight into how other people work on Cassandra 
development on a daily basis.


From: David Capwell mailto:dcapw...@apple.com>>
Sent: Monday, June 26, 2023 19:57
To: dev
Subject: Re: [DISCUSS] When to run CheckStyle and other verificiations




not running it automatically with the targets which devs usually run locally.

The checks tend to have an opt-out, such as -Dno-checkstyle=true… so it's 
really easy to set up your local environment to opt out of what you do not 
care about… I feel we should force people to opt out rather than opt in…



On Jun 26, 2023, at 7:47 AM, Jacek Lewandowski 
mailto:lewandowski.ja...@gmail.com>> wrote:

That would work as well, Brandon. Basically, what is proposed in 
CASSANDRA-18618 (the "check" target) actually needs to build the project to 
perform some verifications, so I suppose running "ant check" should be 
sufficient.

- - -- --- -  -
Jacek Lewandowski


Mon, 26 Jun 2023 at 16:01, Brandon Williams 
mailto:dri...@gmail.com>> wrote:
The "artifacts" task is not 

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-26 Thread Miklosovic, Stefan
Yes, in this case, opting out is better than opting in. I feel like the build 
process is quite versatile and one just picks what is necessary. I never build 
docs; there is a flag for that. I had turned checkstyle off because I was fed 
up with it, until Berenguer cached it; now I get "ant jar" with checkstyle in 
under 10 seconds, so I leave it on, which is great.

Even though I feel it is already flexible enough, grouping all the checkstyle 
and rat checks etc. under one target seems like a good idea. From my 
perspective, it is "all or nothing": turn it all off until I am about to push, 
when I want it all on. I rarely want to "just checkstyle" in the middle of 
development.

I do not think that having a lot of flags is bad. I like that I have bash 
aliases almost for everything and I bet folks have their tricks to get the 
mundane stuff done.

It would be pretty interesting to know the workflows of other people. I think 
there would be a lot of insight into how other people work on Cassandra 
development on a daily basis.


From: David Capwell 
Sent: Monday, June 26, 2023 19:57
To: dev
Subject: Re: [DISCUSS] When to run CheckStyle and other verificiations




not running it automatically with the targets which devs usually run locally.

The checks tend to have an opt-out, such as -Dno-checkstyle=true… so it's 
really easy to set up your local environment to opt out of what you do not 
care about… I feel we should force people to opt out rather than opt in…



On Jun 26, 2023, at 7:47 AM, Jacek Lewandowski  
wrote:

That would work as well, Brandon. Basically, what is proposed in 
CASSANDRA-18618 (the "check" target) actually needs to build the project to 
perform some verifications, so I suppose running "ant check" should be 
sufficient.

- - -- --- -  -
Jacek Lewandowski


Mon, 26 Jun 2023 at 16:01, Brandon Williams 
mailto:dri...@gmail.com>> wrote:
The "artifacts" task is not quite the same since it builds other things like 
docs, which significantly contributes to longer build time.  I don't see why we 
couldn't add a new task that preserves the old behavior though, "fulljar" or 
something like that.

Kind Regards,
Brandon


On Mon, Jun 26, 2023 at 6:12 AM Jacek Lewandowski 
mailto:lewandowski.ja...@gmail.com>> wrote:
Yes, I've mentioned that there is a property we can set to skip checkstyle.

Currently such a goal is "artifacts" which basically validates everything.


- - -- --- -  -
Jacek Lewandowski


Mon, 26 Jun 2023 at 13:09, Mike Adamson 
mailto:madam...@datastax.com>> wrote:
While I like the idea of this because of the added time these checks take, I 
was under the impression that checkstyle (at least) can be disabled with a flag.

If we did do this, would it make sense to have a "release" or "commit" target 
(or some other name) that runs a full build with all checks, to be used prior 
to pushing changes?

On Mon, 26 Jun 2023 at 08:35, Berenguer Blasi 
mailto:berenguerbl...@gmail.com>> wrote:

I would prefer something that is totally transparent to me and does not add one 
more step I have to remember, only to push / run CI, find out I missed it, and 
rinse and repeat... With the recent fix to checkstyle I am happy with how 
things stand atm. My 2cts

On 26/6/23 8:43, Jacek Lewandowski wrote:
Hi,

The context is that we currently have 3 checks in the build:
- Checkstyle,
- Eclipse-Warnings,
- RAT

CheckStyle and RAT are executed with almost every target we run: build, jar, 
test, test-some, testclasslist, etc.; on the other hand, Eclipse-Warnings is 
executed automatically only with the artifacts target.

Checkstyle currently uses some caching, so subsequent reruns without cleaning 
the project validate only the modified files.

Both CI - Jenkins and Circle forces running all checks.

I want to discuss whether you are ok with extracting all checks into their own 
distinct targets and not running them automatically with the targets which devs 
usually run locally. In particular:


  *   "build", "jar", and all "test" targets would not trigger CheckStyle, RAT 
or Eclipse-Warnings
  *   A new target "check" would trigger all CheckStyle, RAT, and 
Eclipse-Warnings
  *   The new "check" target would be run along with the "artifacts" target on 
Jenkins-CI, and it as a separate build step in CircleCI

The rationale for that change is:

  *   Running all the checks together would be more consistent, but running all 
of them automatically with build and test targets could waste time when we 
develop something locally, frequently rebuilding and running tests.
  *   On the other hand, it would be more consistent if the build did what we 
want - as a dev, when prototyping, I don't want to be forced to run analysis 
(and potentially fix issues) whenever I 

Re: Adding wiremock to test dependencies

2023-06-20 Thread Miklosovic, Stefan
I forgot to mention that in the future this will not be used only for testing 
AWS IMDSv2. We have other snitches which also call similar "services" to get 
metadata about the instance a node runs in, and this communication is currently 
not tested at all. I can pretty much imagine the testing effort being expanded 
to all the other snitches as well.

____
From: Miklosovic, Stefan
Sent: Tuesday, June 20, 2023 13:35
To: dev@cassandra.apache.org
Subject: Adding wiremock to test dependencies

Hi,

we want to introduce wiremock library (1) into the project as a test dependency 
to test CASSANDRA-16555.

In that patch (wip here (2)), we want to test how such a snitch behaves based 
on what the Amazon EC2 Instance Metadata Service version 2 (IMDSv2) returns to 
it. IMDSv2 must be called first to obtain a token, with which the snitch then 
fetches the AZ of the node it runs on.

The last comment of mine in (3) elaborates on the approaches we were 
considering; mocking the http communication / requests with wiremock seems like 
the most comfortable and straightforward solution.

Wiremock is Apache License 2.0 (4) and is well maintained.

Are people OK with us introducing this to the build?

(1) https://wiremock.org/
(2) 
https://github.com/apache/cassandra/pull/2403/files#diff-dc04778c6659040f1c00f37e97a9b1530a532d3d1e3620427bd6628d1b2ec048
(3) https://issues.apache.org/jira/browse/CASSANDRA-16555
(4) https://github.com/wiremock/wiremock/blob/master/LICENSE.txt

Regards
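To illustrate the testing idea without pulling in the wiremock dependency, here is a self-contained sketch that fakes the two IMDSv2 endpoints with the JDK's built-in HTTP server (the role wiremock stubs would play in the actual patch) and drives the token-then-metadata flow a snitch performs. The endpoint paths and header names follow the public IMDSv2 API; everything else (port, token value, AZ) is made up:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: a local fake of the EC2 IMDSv2 endpoints plus a client doing the
// same two-step flow a snitch would: PUT a token request, then GET the AZ
// with that token. JDK-only stand-in for what wiremock would stub.
public class FakeMetadataServiceTest {
    static String run() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/latest/api/token", ex -> {   // IMDSv2 step 1
            byte[] token = "test-token".getBytes();
            ex.sendResponseHeaders(200, token.length);
            ex.getResponseBody().write(token);
            ex.close();
        });
        server.createContext("/latest/meta-data/placement/availability-zone", ex -> {
            boolean authed = "test-token".equals(
                ex.getRequestHeaders().getFirst("X-aws-ec2-metadata-token"));
            byte[] az = (authed ? "us-east-1a" : "unauthorized").getBytes();
            ex.sendResponseHeaders(authed ? 200 : 401, az.length);
            ex.getResponseBody().write(az);
            ex.close();
        });
        server.start();

        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: obtain a session token.
        String token = client.send(
            HttpRequest.newBuilder(URI.create(base + "/latest/api/token"))
                .header("X-aws-ec2-metadata-token-ttl-seconds", "21600")
                .PUT(HttpRequest.BodyPublishers.noBody()).build(),
            HttpResponse.BodyHandlers.ofString()).body();

        // Step 2: fetch the availability zone using the token.
        String az = client.send(
            HttpRequest.newBuilder(
                URI.create(base + "/latest/meta-data/placement/availability-zone"))
                .header("X-aws-ec2-metadata-token", token).build(),
            HttpResponse.BodyHandlers.ofString()).body();

        server.stop(0);
        return az;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("az = " + run());
    }
}
```

Wiremock would replace the hand-rolled server above with declarative stubs, which is exactly why it is the more comfortable option for the real tests.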


Adding wiremock to test dependencies

2023-06-20 Thread Miklosovic, Stefan
Hi,

we want to introduce wiremock library (1) into the project as a test dependency 
to test CASSANDRA-16555.

In that patch (wip here (2)), we want to test how such a snitch behaves based 
on what the Amazon EC2 Instance Metadata Service version 2 (IMDSv2) returns to 
it. IMDSv2 must be called first to obtain a token, with which the snitch then 
fetches the AZ of the node it runs on.

The last comment of mine in (3) elaborates on the approaches we were 
considering; mocking the http communication / requests with wiremock seems like 
the most comfortable and straightforward solution.

Wiremock is Apache License 2.0 (4) and is well maintained.

Are people OK with us introducing this to the build?

(1) https://wiremock.org/
(2) 
https://github.com/apache/cassandra/pull/2403/files#diff-dc04778c6659040f1c00f37e97a9b1530a532d3d1e3620427bd6628d1b2ec048
(3) https://issues.apache.org/jira/browse/CASSANDRA-16555
(4) https://github.com/wiremock/wiremock/blob/master/LICENSE.txt

Regards

Re: Is simplenative in cassandra-stress still relevant?

2023-05-31 Thread Miklosovic, Stefan
Well this is a completely different kind of discussion, Josh, let's explore it, 
shall we?

I think that Cassandra should have some basic tool available to stress-test 
itself. Why not? I do not want to depend on third-party tools even if they 
might be objectively better. I do not think the current cassandra-stress is 
completely "useless"; it is doing its job, more or less. If a user wants 
something more advanced, she is welcome to use it, but I do not like that we 
are trying to outsource the basic tooling outside of the project.

As I see it, we just spice it up with some tests to be sure it will not break 
without us knowing, and that's it. The fact that it is not actively contributed 
to does not necessarily make it eligible for deletion as a whole.

Anyway, I am not calling the shots here; if the community decides it has to go, 
so it will, but I would be sad to see it.

Regards



From: Josh McKenzie 
Sent: Wednesday, May 31, 2023 15:15
To: dev
Subject: Re: Is simplenative in cassandra-stress still relevant?




The main issue I see with maintaining the SimpleClient in cassandra-stress is 
the burden it puts on a user to understand the options available when 
connecting with -mode:
How frequently do we expect users or devs to use the built-in cassandra-stress 
tool? Between tlp-stress and NoSQLBench, it's not clear to me that keeping 
cassandra-stress (which has been largely unmaintained for years as I understand 
it?) is the best option.

On Wed, May 31, 2023, at 9:00 AM, Brad wrote:
We all agree that we're not suggesting removing SimpleClient from Cassandra, 
just from its use in cassandra-stress.

For debugging the native transport protocol, in addition to the standalone Java 
Driver, there are the python drivers and ODBC drivers which can be exercised 
with cqlsh and Intellij respectively.  Are they not sufficient?

The main issue I see with maintaining the SimpleClient in cassandra-stress is 
the burden it puts on a user to understand the options available when 
connecting with -mode:

> cassandra-stress help -mode


Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] 
[password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] 
[protocolVersion=?]

 OR

Usage: -mode simplenative [prepared] cql3 [port=?]





A user trying to determine how to specify usr/pwd credentials is presented with 
the option to use simplenative and prepared statements (which appear broken). 
It can lead down a rabbit hole of sparse documentation trying to figure out 
what the simplenative option is and whether it is better than cql3.




On Wed, May 31, 2023 at 1:58 AM Miklosovic, Stefan 
mailto:stefan.mikloso...@netapp.com>> wrote:
Interesting point about the debuggability.

Yes, I agree that SimpleClient (as class) should not be removed because we are 
using it in tests. I have already mentioned in my original e-mail that for this 
reason that class is not going anywhere and we still need to use it.

The cost of keeping it there is not big, sure, but we clearly see that e.g. the 
"prepared" option is buggy and does not work. That somewhat indicates to me 
that it has atrophied and nobody seems to have noticed, which further supports 
my case that it is not actually used much if this went undetected for so long.

Anyway, I think we might just look at that bug with "prepared", fix it, and 
keep it all there. I do not see any tests which exercise the cassandra-stress 
command, similar to what we have for nodetool in JUnit. We could cover 
cassandra-stress similarly, just to be sure that invoking the most important 
commands does not fail over time.



From: Brandon Williams mailto:dri...@gmail.com>>
Sent: Wednesday, May 31, 2023 2:33
To: dev@cassandra.apache.org
Subject: Re: Is simplenative in cassandra-stress still relevant?

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.




On Tue, May 30, 2023 at 7:15 PM Brad <bscho...@gmail.com> wrote:
> If you're performing stress testing, why would you not want to use the 
> official driver?  I've spoken to several people who all have said they've 
> never used simplenative mode.

I agree that it shouldn't be used normally, but I'm not sure we should
remove it, because we can't remove it fully: SimpleClient is still
used in many tests, and I think that should continue.

If you suspect any kind of native proto or driver issue it may be
useful to have another implementation easily accessible to aid in
debugging the problem, and the maintenance cost of keeping it in
stress is roughly zero in my opinion.  We can make it clear that it's
not recommended for use and is intended only as a debugging tool,
though.

Re: Is simplenative in cassandra-stress still relevant?

2023-05-30 Thread Miklosovic, Stefan
Interesting point about the debuggability.

Yes, I agree that SimpleClient (as class) should not be removed because we are 
using it in tests. I have already mentioned in my original e-mail that for this 
reason that class is not going anywhere and we still need to use it.

The cost of keeping it there is not big, sure, but we clearly see that e.g. the 
usage of "prepared" is buggy and does not work. That somehow indicates to me 
that it has atrophied and nobody seems to have noticed, which further supports 
my case that it is not actually used much, given the bug went undetected for so long.

Anyway, I think that we might just look at that bug with "prepared", fix it, 
and keep it all there. I do not see any tests which would exercise the 
cassandra-stress command, similar to what we have for nodetool in JUnit. We 
could cover cassandra-stress similarly, just to be sure that its invocation of 
the most important commands does not fail over time.



From: Brandon Williams 
Sent: Wednesday, May 31, 2023 2:33
To: dev@cassandra.apache.org
Subject: Re: Is simplenative in cassandra-stress still relevant?





On Tue, May 30, 2023 at 7:15 PM Brad  wrote:
> If you're performing stress testing, why would you not want to use the 
> official driver?  I've spoken to several people who all have said they've 
> never used simplenative mode.

I agree that it shouldn't be used normally, but I'm not sure we should
remove it, because we can't remove it fully: SimpleClient is still
used in many tests, and I think that should continue.

If you suspect any kind of native proto or driver issue it may be
useful to have another implementation easily accessible to aid in
debugging the problem, and the maintenance cost of keeping it in
stress is roughly zero in my opinion.  We can make it clear that it's
not recommended for use and is intended only as a debugging tool,
though.

Kind Regards,
Brandon


Is simplenative in cassandra-stress still relevant?

2023-05-27 Thread Miklosovic, Stefan
I am doing some fixes for cassandra-stress and I stumbled upon this

https://issues.apache.org/jira/browse/CASSANDRA-18529

There is

Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] 
[password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] 
[protocolVersion=?]
 OR 
Usage: -mode simplenative [prepared] cql3 [port=?]

"-mode simplenative prepared cql3" throws: (it works without "prepared").

java.lang.ClassCastException: [B cannot be cast to 
org.apache.cassandra.transport.messages.ResultMessage$Prepared
java.io.IOException: Operation x10 on key(s) [373038504b3436363830]: Error 
executing: (ClassCastException): [B cannot be cast to 
org.apache.cassandra.transport.messages.ResultMessage$Prepared

at org.apache.cassandra.stress.Operation.error(Operation.java:127)
at 
org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:105)
at 
org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:91)
at 
org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
at 
org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:242)
at 
org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:467)
java.io.IOException: Operation x10 on key(s) [4e334f364c4c4b373530]: Error 
executing: (ClassCastException): [B cannot be cast to 
org.apache.cassandra.transport.messages.ResultMessage$Prepared


I want to ask whether this "simplenative" mode is still relevant and people are 
still using it. It seems to me that nobody is actually using it / I've never 
heard of anybody doing that, but I may be wrong and people are using it all day 
and night ... 

simplenative uses SimpleClient, which is used throughout the code base, e.g. in 
CQLTester, so we are not going to get rid of that class for sure.

If simplenative in stress is not relevant, the whole -mode option becomes 
questionable: if we get rid of simplenative, we end up with "-mode native cql3", 
and since there is nothing but "native" (there is no Thrift anymore), "native" 
is a constant which can go away. If we end up having "-mode cql3" as the only 
mode possible, the whole -mode option can go away and we can rename it to "-cql3".
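To illustrate the proposed simplification (the "-cql3" form is only a proposal 
from this thread, not an existing flag):

```
# today: the mode must be spelled out, though only one combination is meaningful
cassandra-stress write n=1000 -mode native cql3

# proposed: with simplenative and the "native" constant gone, -mode collapses away
cassandra-stress write n=1000 -cql3
```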

Thoughts?

Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-05-18 Thread Miklosovic, Stefan
Hi Maxim,

thanks for bringing this up. I am glad you did the heavy-lifting in / around 
CassandraRelevantProperties and we can build on top of this.

I am fine with @Replaces for Cassandra system properties. After we put 
everything into CassandraRelevantProperties, one can easily see that there are 
great inconsistencies in properties' naming. As we still need to support the 
old names too, using @Replaces, the similar mechanism we used in 
DatabaseDescriptor, seems like the ideal solution.

By the way, when somebody queries system_views.system_properties, it looks very 
strange in CQL shell; the formatting is just broken. EXPAND ON; does not help 
either. It is quite hard to parse visually if a user wants to see them all. The 
reason is the "java.class.path" property, whose value is so long that it 
basically breaks the output.

Another solution would be to fix the output, but I am not sure what it would 
look like.
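For reference, the query in question is along these lines (the column names are 
an assumption based on the virtual table described above):

```sql
-- list all system properties; the java.class.path row is the one
-- whose very long value breaks cqlsh's column layout
SELECT name, value FROM system_views.system_properties;
```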

As we are going to rename them to have the same prefixes, could we not remodel 
that table as well? For example:

https://gist.github.com/smiklosovic/de662b7faa25e1fdd56805cdb5ba80a7

Feel free to come up with a different approach.

By doing this, it would be way easier to get just Cassandra properties, just 
properties for tests, or just Java properties, and selecting just the first two 
groups would not break CQLSH. It is nice that they would have the same prefix, 
but I am trying to find a way to utilize that prefix in CQLSH as well.


From: Maxim Muzafarov 
Sent: Thursday, May 18, 2023 12:54
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Moving system property names to the 
CassandraRelevantProperties





Hello everyone,


Thanks for following this thread and the review, all the system
properties have been moved to CassandraRelevantProperties.
So you can find out what it looks like from the following link:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/CassandraRelevantProperties.java#L38


I would like to show you a few more steps in this thread so that the
solution is generally complete. As you may have noticed, we have three
types of system properties: cassandra properties used in production
environments, cassandra properties used for testing only, and
non-cassandra properties. I would like to reuse the @Replaces
annotation to rename cassandra-related properties according to the
following pattern: the 'cassandra.' prefix is for production
properties, and the 'cassandra.test' prefix is for testing properties.
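As a rough sketch of the old-name fallback such a rename needs (all property 
names here are hypothetical; the real mechanism is the @Replaces annotation 
mentioned above, not this helper):

```java
// Minimal model of resolving a renamed system property: prefer the new
// name, fall back to the old one so existing deployments keep working.
public class PropertyRename {
    static String resolve(String newName, String oldName) {
        String value = System.getProperty(newName);
        return value != null ? value : System.getProperty(oldName);
    }

    public static void main(String[] args) {
        // Only the old (hypothetical) name is set, as on an un-migrated node.
        System.setProperty("cassandra.renamed_old_flag", "true");
        String v = resolve("cassandra.renamed.new_flag", "cassandra.renamed_old_flag");
        System.out.println(v); // prints "true", resolved via the old-name fallback
    }
}
```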

This makes the results of the SystemPropertiesTable virtual table look
more natural to users. I think we should include this change in the
5.0 release.
WDYT?


The other code clarity minor improvements to do:

1.
Use WithProperties to ensure that system properties are handled
https://issues.apache.org/jira/browse/CASSANDRA-18453

2.
As a draft agreement, the CassandraRelevantProperties and
CassandraRelevantEnv (and probably DatabaseDescriptor) could share the
same interface to access the system properties, configuration
properties, and/or environment variables. The idea is still in draft
form, so I'm mentioning it here to keep it in context. Will come back
to it when more details are available.
This is what it might look like:
https://github.com/apache/cassandra/pull/2300/files#diff-6b7db8438314143a1b6b1c8c58901a4e3954af8cdd294ca8853a1001c1f4R70

On Fri, 31 Mar 2023 at 07:08, Jacek Lewandowski
 wrote:
>
> I'll do
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> Thu, 30 Mar 2023 at 22:09, Miklosovic, Stefan  
> wrote:
>>
>> Hi list,
>>
>> we are looking for one more committer to take a look at this patch (1, 2).
>>
>> It looks like there is a lot to go through because of the number of files 
>> modified (around 200), but the changes are really just about moving everything 
>> to CassandraRelevantProperties. I do not think it should take more than 1 
>> hour of dedicated effort and we are done!
>>
>> Thanks in advance to whoever reviews this.
>>
>> I want to especially thank Maxim for his perseverance in this matter and I 
>> hope we will eventually deliver this work to trunk.
>>
>> (1) https://github.com/apache/cassandra/pull/2046
>> (2) https://issues.apache.org/jira/browse/CASSANDRA-17797
>>
>> Regards
>>
>>
>> 
>> From: Miklosovic, Stefan 
>> Sent: Wednesday, March 22, 2023 14:34
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Moving system property names to the 
>> CassandraRelevantProperties
>>

Re: [DISCUSS] The future of CREATE INDEX

2023-05-18 Thread Miklosovic, Stefan
I don't want to hijack this thread, I just want to say that point 4) seems 
to be recurring. I second Caleb in saying that transactional metadata would 
probably fix this. Because of the problem of not being sure that all config is 
the same cluster-wide, I basically dropped the effort on CEP-24, because 
different local configurations might compromise the security.


From: Henrik Ingo 
Sent: Wednesday, May 17, 2023 22:32
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] The future of CREATE INDEX




I have read the thread but chose to reply to the top message...

I'm coming to this with the background of having worked with MySQL, where both 
the storage engine and index implementation had many options, and often of 
course some index types were only available in some engines.

I would humbly suggest:

1. What's up with naming anything "legacy"? Calling the current index type "2i" 
seems perfectly fine with me. From what I've heard it can work great for many 
users?

2. It should be possible to always specify the index type explicitly. In other 
words, it should be possible to CREATE CUSTOM INDEX ... USING "2i" (if it isn't 
already)

2b) It should be possible to just say "SAI" or "SASIIndex", not the full Java 
path.

3. It's a fair point that the "CUSTOM" word may make this sound a bit too 
special... The simplest change IMO is to just make the CUSTOM work optional.

4. Benedict's point that a YAML option is per node is a good one... For 
example, you wouldn't want some nodes to create a 2i index and other nodes a 
SAI index for the same index. That said, how many other YAML options can you 
think of that would create total chaos if different nodes actually had 
different values for them? For example, what if a guardrail allowed some action 
on some nodes but not others?  Maybe what we need is a Jira ticket to enforce 
that certain sections of the config must not differ?

5. That said, the default index type could also be a property of the keyspace

6. MySQL allows the DBA to determine the default engine. This seems to work 
well. If the user doesn't care, they don't care, if they do, they use the 
explicit syntax.

henrik


On Wed, May 10, 2023 at 12:45 AM Caleb Rackliffe 
mailto:calebrackli...@gmail.com>> wrote:
Earlier today, Mick started a thread on the future of our index creation DDL on 
Slack:

https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019

At the moment, there are two ways to create a secondary index.

1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)

This creates an optionally named legacy 2i on the provided table and column.

ex. CREATE INDEX my_index ON ks.tbl(my_text_col)

2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
<class name> [WITH OPTIONS = <options map>]

This creates a secondary index on the provided table and column using the 
specified 2i implementation class and (optional) parameters.

ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
'StorageAttachedIndex'

(Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
shorthand for the fully-qualified class name, which is also valid.)

So what is there to discuss?

The concern Mick raised is...

"...just folk continuing to use CREATE INDEX  because they think CREATE CUSTOM 
INDEX is advanced (or just don't know of it), and we leave users doing 2i (when 
they think they are, and/or we definitely want them to be, using SAI)"

To paraphrase, we want people to use SAI once it's available where possible, 
and the default behavior of CREATE INDEX could be at odds w/ that.

The proposal we seem to have landed on is something like the following:

For 5.0:

1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
2.) Leave CREATE CUSTOM INDEX...USING... available by default.

(Note: How this would interact w/ the existing secondary_indexes_enabled YAML 
options isn't clear yet.)

Post-5.0:

1.) Deprecate and eventually remove SASI when SAI hits full feature parity w/ 
it.
2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a hybrid 
between the two. For example, CREATE INDEX...USING...WITH. This would both be 
flexible enough to accommodate index implementation selection and prescriptive 
enough to force the user to make a decision (and wouldn't change the legacy 
behavior of the existing CREATE INDEX). In this world, creating a legacy 2i 
might look something like CREATE INDEX...USING `legacy`.
3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
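Put concretely, the hybrid DDL sketched in point 2 might look something like 
this (purely illustrative syntax from the proposal, not implemented anywhere):

```sql
-- one statement, explicit implementation choice
CREATE INDEX my_index ON ks.tbl(my_text_col) USING 'StorageAttachedIndex';

-- legacy 2i stays reachable, but only by asking for it by name
CREATE INDEX my_index ON ks.tbl(my_text_col) USING 'legacy';
```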

Eventually we would have a single enabled DDL statement for index creation that 
would be minimal but also explicit/able to 

Re: Cassandra 4.0-beta1 available on FreeBSD

2023-05-16 Thread Miklosovic, Stefan
Great stuff, Lapo!

I was looking into FreeBSD ports few days ago to see what Cassandra version it 
supports as I have BSDs as a hobby ...

Can't wait until I see 4.1!

BTW I noticed there is quite a lot of patches to make Cassandra run on FreeBSD 
(1). Would you maybe be interested in submitting patches for the changes you 
made (when applicable) so you do not need to patch it yourself in your port?

(1) https://cgit.freebsd.org/ports/tree/databases/cassandra4/files

Regards


From: Lapo Luchini 
Sent: Tuesday, May 16, 2023 13:05
To: dev@cassandra.apache.org
Subject: Re: Cassandra 4.0-beta1 available on FreeBSD





Now updated to 4.0.8! 
(yes, I know, 4.0.9 was released during the process…)

Next step will be to upgrade to 4.1.

cheers,
   Lapo

On 2020-07-27 19:28, Ekaterina Dimitrova wrote:
> Thank you Angelo for supporting the project with this! Truly appreciate it!
>
> Best,
> Ekaterina
>
> On Mon, 27 Jul 2020 at 13:13, Angelo Polo  wrote:
>
>> Cassandra 4.0-beta1 is now available on FreeBSD.
>>
>> You can find information about the port here:
>> https://www.freshports.org/databases/cassandra4/
>>
>> The beta can be installed from an up-to-date ports tree under
>> databases/cassandra4.
>>
>> Best,
>> Angelo



[RELEASE] Apache Cassandra 3.0.29 released

2023-05-15 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra 
version 3.0.29.

Apache Cassandra is a fully distributed database. It is the right choice when 
you need scalability and high availability without compromising performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay 
attention to the release notes[2] and let us know[3] if you encounter any 
problems.

[WARNING] Debian and RedHat package repositories have moved! Debian 
/etc/apt/sources.list.d/cassandra.sources.list and RedHat 
/etc/yum.repos.d/cassandra.repo files must be updated to the new repository 
URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it 
is now https://redhat.cassandra.apache.org/30x/ .
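For example, on Debian the updated entry in 
/etc/apt/sources.list.d/cassandra.sources.list would look something like this 
(the "30x" series component mirrors the pattern in the RedHat URL above; verify 
against the download page):

```
deb https://debian.cassandra.apache.org 30x main
```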

Enjoy!

[1]: CHANGES.txt 
https://github.com/apache/cassandra/blob/cassandra-3.0.29/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-3.0.29/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA

[RESULT][VOTE] Release Apache Cassandra 3.0.29

2023-05-12 Thread Miklosovic, Stefan
The vote passed with 3 binding +1's and 0 binding -1's.

https://lists.apache.org/thread/39s345w5fv2r4z7p0jjslc2vf6rqdjk1
