Re: [EXTERNAL] Re: [DISCUSS] Adding experimental vtables and rules around them

2024-05-31 Thread German Eichberger via dev
Hi,

To sum up where everyone is coming from: We would like to have features in a 
stable version of Cassandra which are experimental and are subject to 
non-backward-compatible changes. This indicates to me that the feature is not 
finished and should likely not be included in a stable release. What benefit 
are we looking for by including it in a stable release as opposed to rolling 
it into the next release?

Thanks,
German

From: Maxim Muzafarov 
Sent: Wednesday, May 29, 2024 1:09 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Adding experimental vtables and rules around 
them

Hello everyone,

I like the idea of highlighting some of the experimental virtual
tables whose model might be changed in future releases.

As another option, we could add an @Experimental annotation (or another
name) and a configuration parameter
experimental_virtual_tables_enabled (default is false). This, in turn,
means if a virtual table is experimental, it won't be registered in a
virtual keyspace unless the corresponding configuration parameter is
enabled. This also means that a user must explicitly enable an
experimental API, and it prevents us from spamming the log with warnings.
All of this does not preclude us from specifying the experimental
state of some virtual tables in the documentation.
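
To make the opt-in idea above concrete, here is a minimal sketch in Python (not Cassandra code). The class names, the `experimental` field standing in for the proposed annotation, and the table names are all hypothetical; only the flag name and its default of false come from the proposal.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualTable:
    name: str
    experimental: bool = False  # hypothetical stand-in for the proposed annotation


@dataclass
class VirtualKeyspace:
    name: str
    # Proposed configuration parameter; disabled by default.
    experimental_virtual_tables_enabled: bool = False
    tables: dict = field(default_factory=dict)

    def register(self, table: VirtualTable) -> bool:
        """Register the table unless it is experimental and the flag is off."""
        if table.experimental and not self.experimental_virtual_tables_enabled:
            return False
        self.tables[table.name] = table
        return True


# Default configuration: the experimental table is silently skipped.
default_ks = VirtualKeyspace("system_accord")
default_ks.register(VirtualTable("epochs", experimental=True))
default_ks.register(VirtualTable("stable_table"))

# Operator opted in: the experimental table is registered.
opt_in_ks = VirtualKeyspace("system_accord", experimental_virtual_tables_enabled=True)
opt_in_ks.register(VirtualTable("epochs", experimental=True))

print(sorted(default_ks.tables))
print(sorted(opt_in_ks.tables))
```

With this shape no per-query warning is needed, because a table that was never registered cannot be queried by accident.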

On Wed, 29 May 2024 at 21:18, Abe Ratnofsky  wrote:
>
> I agree that ClientWarning is the best way to indicate the risk of using an 
> experimental feature directly to the user. Presenting information in the 
> client application's logs directly means that the person who wrote the query 
> is most likely to see the warning, rather than an operator who sees cluster 
> logs.
>
> I don't think it's necessary to attach a ClientWarning to every single client 
> response; a ClientWarning analog to NoSpamLogger would be useful for this 
> ("warn a client at most once per day").
>
> This would also be useful for warning on usage of deprecated features.
>
> > On May 29, 2024, at 3:01 PM, David Capwell  wrote:
> >
> > We agreed a long time ago that all new features are disabled by default, 
> > but I wanted to try to flesh out what we “should” do with something that 
> > might still be experimental and subject to breaking changes; I would prefer 
> > we keep this thread specific to vtables as the UX is different for 
> > different types of things…
> >
> > So, let's say we are adding a set of vtables but we are not 100% sure what 
> > the schema should be and we learn after the release that changes should be 
> > made, but that would end up breaking the table… we currently define 
> > everything as “don’t break this” so if we publish a table that isn’t 100% 
> > baked we are kinda stuck with it for a very very long time… I would like to 
> > define a way to expose vtables that are subject to change (breaking schema 
> > changes) across different releases and rules around them (only in minor?  
> > Maybe even in patch?).
> >
> > Let's try to use a concrete example so everyone is on the same page.
> >
> > Accord is disabled by default (it is a new feature), so the vtables to 
> > expose internals would be expected to be undefined and not present on the 
> > instance.
> >
> > When accord is enabled (accord.enabled = true) we add a set of vtables:
> >
> > Epochs - shows what epochs are known to accord
> > Cache - shows how the internal caches are performing
> > Etc.
> >
> > Using epochs as an example it currently only shows a single column: the 
> > long epoch
> >
> > CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint PRIMARY KEY);
> >
> > Let's say we find that this table isn’t enough and we really need to scope 
> > it to each of the “stores” (threads for processing accord tasks)
> >
> > CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint, store_id int, 
> > PRIMARY KEY (epoch, store_id));
> >
> > In this example the table changed the schema in a way that could break 
> > users, so this normally is not allowed.
> >
> > Since we don’t really have a way to define something experimental other 
> > than NEWS.txt, we kinda get stuck with this table and are forced to make 
> > new versions and maintain them for a long time (in this example we would 
> > have epochs and epochs_v2)… it would be nice if we could define a way to 
> > express that tables are free to be changed (modified or even deleted) and 
> > the life cycle for them….
> >
> > I propose that we allow such a case and make changes to the UX (as best as 
> > we can) to warn about this:
> >
> > 1) update NEWS.txt to denote that the feature is experimental
> > 2) when you access an experimental table you get a ClientWarning stating 
> > that this is free to change
> > 3) the table's comment starts with “[EXPERIMENTAL]”
> >
> > What do others think?
> >
> >
>
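
The "warn a client at most once per day" idea from this thread can be sketched as follows. This is an illustration in Python, not the NoSpamLogger implementation and not an existing Cassandra API: the class name, the per-client key, and the injectable clock are assumptions made for the example.

```python
import time


class NoSpamClientWarner:
    """Allow a given warning for a given client at most once per interval."""

    def __init__(self, interval_seconds=24 * 60 * 60, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock
        self._last_sent = {}  # (client_id, warning_key) -> time of last warning

    def should_warn(self, client_id, warning_key):
        now = self._clock()
        key = (client_id, warning_key)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._interval:
            return False  # already warned within the interval
        self._last_sent[key] = now
        return True


# A fake clock keeps the example deterministic.
fake_now = [0.0]
warner = NoSpamClientWarner(clock=lambda: fake_now[0])
key = "experimental:system_accord.epochs"

first = warner.should_warn("client-a", key)   # first access: warn
fake_now[0] += 3600                           # one hour later
second = warner.should_warn("client-a", key)  # suppressed
fake_now[0] += 24 * 3600                      # more than a day after the first warning
third = warner.should_warn("client-a", key)   # warn again
```

Keying on both the client and the warning means a second client still gets its own first warning, which matches the goal of reaching the person who wrote the query.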


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-15 Thread German Eichberger via dev
Thanks for the proposal. I second Jordan that we need more abstraction in (1), 
e.g. most cloud providers allow for disk snapshots and starting nodes from a 
snapshot, which would be a good mechanism if you find yourself there.

German

From: Jordan West 
Sent: Sunday, April 14, 2024 12:27 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar 
for Live Migrating Instances

Thanks for proposing this CEP! We have something like this internally so I have 
some familiarity with the approach and the challenges. After reading the CEP a 
couple things come to mind:

1. I would like to see more abstraction of how the files get moved / put in 
place with the proposed solution being the default implementation. That would 
allow others to plug in alternative means of data movement like pulling down 
backups from S3 or rsync, etc.

2. I do agree with Jon’s last email that the lifecycle / orchestration portion 
is the more challenging aspect. It would be nice to address that as well so we 
don’t end up with something like repair where the building blocks are there but 
the hard parts are left to the operator. I do, however, see that portion being 
done in a follow-on CEP to limit the scope of CEP-40 and have a higher chance 
for success by incrementally adding these features.

Jordan

On Thu, Apr 11, 2024 at 12:31 Jon Haddad <j...@jonhaddad.com> wrote:
First off, let me apologize for my initial reply, it came off harsher than I 
had intended.

I know I didn't say it initially, but I like the idea of making it easier to 
replace a node.  I think it's probably not obvious to folks that you can use 
rsync (with stunnel, or alternatively rclone), and for a lot of teams it's 
intimidating to do so.  Whether it actually is easy or not to do with rsync is 
irrelevant.  Having tooling that does it right is better than duct taping 
things together.

So with that said, if you're looking to get feedback on how to make the CEP 
more generally useful, I have a couple thoughts.

> Managing the Cassandra processes like bringing them up or down while 
> migrating the instances.

Maybe I missed this, but I thought we already had support for managing the C* 
lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me that 
adding the ability to make this entire workflow self managed would be the 
biggest win, because having a live migrate *feature* instead of what's 
essentially a runbook would be far more useful.

> To verify whether the desired file set matches with source, only file path 
> and size is considered at the moment. Strict binary level verification is 
> deferred for later.

Scott already mentioned this is a problem and I agree, we cannot simply rely on 
file path and size.

TL;DR: I like the intention of the CEP.  I think it would be better if it 
managed the entire lifecycle of the migration, but you might not have an 
appetite to implement all that.

Jon


On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala 
<n.v.harikrishna.apa...@gmail.com> wrote:
Thanks Jon & Scott for taking time to go through this CEP and providing inputs.

I completely agree with what Scott mentioned earlier (I would have added more 
details to the CEP). Adding a few more points to the same.

Having a solution with Sidecar can make the migration easy without depending on 
rsync. At least in the cases I have seen, rsync is not enabled by default and 
most of them want to run OS/images with as minimal requirements as possible. 
Installing rsync requires admin privileges and syncing data is a manual 
operation. If an API is provided with Sidecar, then tooling can be built around 
it reducing the scope for manual errors.

Performance-wise, at least in the cases I have seen, the File Streaming API 
in Sidecar performs a lot better. To give an idea of the performance, I would 
like to quote "up to 7 Gbps/instance writes (depending on hardware)" from 
CEP-28, as this CEP proposes to leverage the same.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of an 
>SSTable in its stats metadata component, which wouldn't alter the filename and 
>may not alter the length of the stats metadata component. A change to the 
>level of an SSTable on the source via single sstable uplevel may not be caught 
>by a digest based only on filename and length.

In this case the file size may not change, but the last-modified timestamp 
would change, right? This is addressed in section MIGRATING ONE INSTANCE, point 
2.b.ii, which says "If a file is present at the destination but did not match 
(by size or timestamp) with the source file, then local file is deleted and 
added to list of files to download.". After the download by the final data copy 
task, the file should match the source.
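
The size-or-timestamp rule quoted above can be sketched as below. This is an illustration in Python, not the CEP-40 implementation: `FileInfo`, `plan_sync`, and the file names are hypothetical, and the sketch assumes the destination keeps the source's last-modified time when a file is copied. It also deliberately ignores files that exist only at the destination, which the quoted text does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: int  # last-modified time, e.g. epoch seconds


def plan_sync(source, destination):
    """Compare {path: FileInfo} listings and return (to_delete, to_download)."""
    to_delete, to_download = [], []
    for path, src in sorted(source.items()):
        dst = destination.get(path)
        if dst is None:
            to_download.append(path)   # missing at the destination
        elif dst != src:               # size or timestamp differs
            to_delete.append(path)     # drop the stale local copy
            to_download.append(path)   # and fetch it again
    return to_delete, to_download


source = {
    "nb-1-big-Data.db": FileInfo(size=1000, mtime=100),
    # Same size as the destination copy, but rewritten later (e.g. a level change).
    "nb-1-big-Statistics.db": FileInfo(size=50, mtime=200),
    "nb-2-big-Data.db": FileInfo(size=10, mtime=300),
}
destination = {
    "nb-1-big-Data.db": FileInfo(size=1000, mtime=100),
    "nb-1-big-Statistics.db": FileInfo(size=50, mtime=150),
}

to_delete, to_download = plan_sync(source, destination)
print(to_delete, to_download)
```

Here the stats file is caught only through its timestamp. A content digest would be the stricter check that Jon and Scott ask for elsewhere in the thread.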

On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas <sc...@paradoxica.net> wrote:
Oh, one note on this item:

>  The operator can 

Re: [DISCUSS] NULL handling and the unfrozen collection issue

2024-03-20 Thread German Eichberger via dev
Hi,

+1 I like doing it the SQL way. This makes sense to me.

Now, in Cassandra setting a column to null means deleting it, and if all 
columns in a row are null the row is deleted. This might be another edge case...

German

From: Benjamin Lerer 
Sent: Wednesday, March 20, 2024 9:15 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] [DISCUSS] NULL handling and the unfrozen collection issue

Hi everybody,

CEP-29 (CQL NOT Operator) is hitting the grey area of how we want as a 
community to handle NULL including for things like unfrozen (multi-cell) 
collections and I would like to make a proposal for moving forward with NULL 
related issues.

We have currently 2 tickets open about NULL handling (I might have missed 
others):

  1.  CASSANDRA-10715: Allowing Filtering on NULL
  2.  CASSANDRA-17762: LWT IF col = NULL is inconsistent with SQL NULL

We also had previously some discussion on which we touched the subject:

  *   [DISCUSS] LWT UPDATE semantics with + and - when null
  *   CEP-15 multi key transaction syntax

In all those tickets and discussions the consensus was to have a behavior 
similar to SQL.

For null comparisons, SQL uses the three-value logic 
(https://modern-sql.com/concept/three-valued-logic) introducing the need for IS 
NULL and IS NOT NULL operators. Those conflict with the col = NULL predicate 
supported in LWT conditions 
(CASSANDRA-17762).

So far, as Cassandra was only using inclusive operators, comparisons were 
behaving in an expected way. According to three-valued logic, NULL CONTAINS 
'foo' should return UNKNOWN, and the filtering behavior should exclude 
everything which is not true. Therefore the row should not be returned, as 
expected. With exclusive operators things are more tricky. NULL NOT CONTAINS 
'foo' will also return UNKNOWN, causing the row to not be returned, which might 
not match people's expectations.

This behavior can be even more confusing once you take into account empty and 
null collections. NOT CONTAINS on an empty collection will return true while it 
will return UNKNOWN on a NULL collection. Unfortunately, for unfrozen 
(multicell) collections we are unable to differentiate between an empty and a 
null collection and therefore always treat empty collections as NULL.

For predicates such as map[myKey] != 'foo', when myKey is not present the 
result can also be surprising, as it will end up comparing NULL to 'foo', 
returning once more UNKNOWN and ignoring the row.

In order to respect the SQL three-valued logic and allow the user to fetch all 
the rows which do not contain a specific value, we would need to support 
IS NULL, IS NOT NULL and OR to allow queries like:

WHERE c IS NULL OR c NOT CONTAINS 'foo' / WHERE m IS NULL OR m[myKey] != 'foo'
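
The effect of three-valued logic on filtering can be illustrated with a small Python model (not CQL and not Cassandra code). `None` stands for both a NULL column and the UNKNOWN truth value, and the helper names are made up for this sketch.

```python
def not_contains(collection, value):
    """SQL-style NOT CONTAINS: UNKNOWN (None) when the collection is NULL."""
    if collection is None:
        return None
    return value not in collection


def is_null(value):
    """IS NULL always yields TRUE or FALSE, never UNKNOWN."""
    return value is None


def or3(a, b):
    """Three-valued OR: TRUE wins, then UNKNOWN, then FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where(rows, predicate):
    """Keep only rows whose predicate is TRUE; FALSE and UNKNOWN are dropped."""
    return [row["id"] for row in rows if predicate(row) is True]


rows = [
    {"id": 1, "c": {"foo", "bar"}},
    {"id": 2, "c": {"bar"}},
    {"id": 3, "c": None},  # NULL collection
]

# WHERE c NOT CONTAINS 'foo'
without_null_check = where(rows, lambda r: not_contains(r["c"], "foo"))
# WHERE c IS NULL OR c NOT CONTAINS 'foo'
with_null_check = where(
    rows, lambda r: or3(is_null(r["c"]), not_contains(r["c"], "foo"))
)
print(without_null_check, with_null_check)
```

Row 3 is dropped by the first predicate because UNKNOWN is not TRUE, and it only comes back once the IS NULL branch is added.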

Supporting the three-valued logic makes sense to me even if some behavior might 
end up being confusing. In which case we can easily fix CASSANDRA-10715 and 
deprecate support for col = NULL/col != NULL in LWT.

What is people's opinion? Should we go for the three-valued logic everywhere? 
Should we try something else?







Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread German Eichberger via dev
Jaydeep,

I concur with Stefan that extensibility of this should be a design goal:

  *   It should be easy to add additional metrics (e.g. write queue depth) and 
decision logic
  *   There should be a way to interact with other systems to signal a resource 
need, which then could kick off things like scaling

Super interested in this, and we have been thinking about similar things 
internally.

Thanks,
German

From: Jaydeep Chovatia 
Sent: Tuesday, January 16, 2024 1:16 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

Hi Stefan,

Please find my response below:
1) Currently, I am keeping the signals as an interface, so one can override it 
with a different implementation. Point noted, though, that even the interface 
APIs could be made dynamic so one can define both the APIs and their 
implementation if they wish to override.
2) I've not looked into that yet, but I will look into it and see if it can be 
easily integrated into the Guardrails framework.
3) On the server side, when the framework detects that a node is overloaded, 
it will throw OverloadedException back to the client. If a busy node continues 
to serve additional requests, it slows down its peer nodes because they depend 
on it to meet QUORUM, etc. This way we at least prevent server nodes from 
melting down and give control to the client via OverloadedException. It is then 
up to the client policy: if the client retries immediately on a different 
server node, that node might eventually be impacted too, but if the client 
backs off exponentially or throws the exception back to the application, that 
server node will not be impacted.
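
The client-side option of backing off exponentially instead of retrying immediately could look roughly like this. It is a generic Python sketch, not the behavior of any Cassandra driver: `OverloadedError` is a stand-in for the server's OverloadedException as a client would see it, and the function names and default delays are assumptions.

```python
import random


class OverloadedError(Exception):
    """Stand-in for the server's OverloadedException as seen by a client."""


def backoff_delays(base=0.1, cap=5.0, attempts=5, rng=random.random):
    """Yield full-jitter exponential backoff delays in seconds."""
    for attempt in range(attempts):
        yield rng() * min(cap, base * 2 ** attempt)


def execute_with_backoff(request, sleep, **backoff_args):
    """Call request(); on overload wait and retry, re-raising once retries run out."""
    delays = backoff_delays(**backoff_args)
    while True:
        try:
            return request()
        except OverloadedError:
            delay = next(delays, None)
            if delay is None:
                raise  # hand the error back to the application
            sleep(delay)


# Demo: the request is rejected twice, then succeeds.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OverloadedError()
    return "ok"


slept = []  # record delays instead of really sleeping
result = execute_with_backoff(flaky_request, sleep=slept.append, rng=lambda: 1.0)
```

In real use `sleep` would be `time.sleep` and `rng` the default random source, so that many clients rejected at the same moment do not all retry at once.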


Jaydeep

On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič 
<stefan.mikloso...@gmail.com> wrote:
Hi Jaydeep,

That seems quite interesting. Couple points though:

1) It would be nice if there is a way to "subscribe" to decisions your 
detection framework comes up with. Integration with e.g. diagnostics subsystem 
would be beneficial. This should be pluggable - just coding up an interface to 
dump / react on the decisions how I want. This might also act as a notifier to 
other systems, e-mail, slack channels ...

2) Have you tried to incorporate this with the Guardrails framework? I think 
that if something is detected to be throttled or rejected (e.g writing to a 
table), there might be a guardrail which would be triggered dynamically in 
runtime. Guardrails are useful as such but here we might reuse them so we do 
not need to code it twice.

3) I am curious how complex this detection framework would be, it can be 
complicated pretty fast I guess. What would be desirable is to act on it in 
such a way that you will not put that node under even more pressure. In other 
words, your detection system should work in such a way that there will not be 
any "doom loop" whereby, by merely throttling various parts of Cassandra, you 
make it even worse for other nodes in the cluster. For example, if a particular node 
starts to be overwhelmed and you detect this and requests start to be rejected, 
is it not possible that Java driver would start to see this node as "erroneous" 
with delayed response time etc and it would start to prefer other nodes in the 
cluster when deciding what node to contact for query coordination? So you would 
put more load on other nodes, making them more susceptible to be throttled as 
well ...

Regards

Stefan Miklosovic

On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia 
<chovatia.jayd...@gmail.com> wrote:
Hi,

Happy New Year!

I would like to discuss the following idea:

Open-source Cassandra (CASSANDRA-15013) has an elementary built-in memory 
rate limiter based on the incoming payload from user 
requests. This rate limiter activates if any incoming user request’s payload 
exceeds certain thresholds. However, the existing rate limiter only solves 
limited-scope issues. Cassandra's server-side meltdown due to overload is a 
known problem. Often we see that a couple of busy nodes take down the entire 
Cassandra ring due to the ripple effect. The following document proposes a 
generic purpose comprehensive rate limiter that works considering system 
signals, such as CPU, and internal signals, such as thread pools. The rate 
limiter will have knobs to filter out internal traffic, system traffic, 
replication traffic, and furthermore based on the types of queries.

More design details are in this doc: [OSS] Cassandra Generic Purpose Rate 
Limiter - Google Docs

Please let me know your thoughts.

Jaydeep


Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread German Eichberger via dev
Congrats!!

From: David Capwell 
Sent: Monday, January 8, 2024 11:03 AM
To: dev 
Subject: [EXTERNAL] Re: Welcome Maxim Muzafarov as Cassandra Committer

Congrats!

On Jan 8, 2024, at 10:53 AM, Jacek Lewandowski  
wrote:

Congratulations Maxim, well deserved, it's a pleasure to work with you!

- - -- --- -  -
Jacek Lewandowski


On Mon, 8 Jan 2024 at 19:35, Lorina Poland <polan...@apache.org> wrote:
Congratulations Maxim!

On 2024/01/08 18:19:04 Josh McKenzie wrote:
> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has 
> accepted
> the invitation to become a committer.
>
> Thanks for all the hard work and collaboration on the project thus far, and 
> we're all looking forward to working more with you in the future. 
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>
>



Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2023-12-22 Thread German Eichberger via dev
+1

From: Patrick McFadin 
Sent: Friday, December 22, 2023 9:12 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: Harry in-tree (Forked from "Long tests, Burn tests, 
Simulator tests, Fuzz tests - can we clarify the diffs?")

It was great having some more extended discussions about Harry in person last 
week. Anything we can do to make it easier for anyone to test Cassandra 
thoroughly is an easy +1 from me!

Thanks for all your efforts so far, Alex.

Patrick

On Fri, Dec 22, 2023 at 8:03 AM Jacek Lewandowski 
<lewandowski.ja...@gmail.com> wrote:
Obviously +1

Thank you Alex

On Fri, 22 Dec 2023, 16:45, Sumanth Pasupuleti 
<sumanth.pasupuleti...@gmail.com> wrote:
+1, thank you for your efforts in bringing Harry in-tree. Anything that 
improves the testing ecosystem for Cassandra, particularly around complex 
scenarios / edge cases  goes a long way in improving reliability, and with 
having a powerful tool like Harry in-tree, it is a lot more accessible to the 
developers than it has been. Also, thank you for keeping in mind the onboarding 
experience of developers.

- Sumanth

On Fri, Dec 22, 2023 at 1:11 AM Alex Petrov <al...@coffeenco.de> wrote:
Some follow-up tickets to establish the project direction:

https://issues.apache.org/jira/browse/CASSANDRA-19229

Two other things that we will work on in-tree are:
https://issues.apache.org/jira/browse/CASSANDRA-18275 (model and in-JVM test 
for partition-restricted 2i queries)
https://issues.apache.org/jira/browse/CASSANDRA-18667 (multi-threaded SAI read 
and write fuzz test)

If you would like to get your recently added feature tested with a Harry 
model, please let me know!

On Fri, Dec 22, 2023, at 12:41 AM, Joseph Lynch wrote:
+1

Sounds like a great change that will help us unify around a common testing 
paradigm, and even pave the path to in-tree load testing plus integrated 
correctness checking which would be extremely valuable!

-Joey

On Thu, Dec 21, 2023 at 1:35 PM Caleb Rackliffe 
<calebrackli...@gmail.com> wrote:
+1

Agree w/ all the justifications mentioned above.

As a reviewer on CASSANDRA-19210, my goals were to a.) look at the directory, 
naming, and package structure of the 
ported code, b.) make sure IDE integration was working, and c.) make sure any 
modifications to existing code (rather than direct code movements from 
cassandra-harry) were straightforward.

On Thu, Dec 21, 2023 at 3:23 PM Alex Petrov <al...@coffeenco.de> wrote:

Hey folks,

I am mostly done with a patch that brings Harry in-tree [1]. I will trigger one 
more CI run overnight, and my intention was to merge it some time soon, but I 
wanted to give a fair warning here, since this is a relatively large patch.

The good news for everyone is that it:
  a) touches no production code whatsoever. Only test (in-jvm dtest namely) 
code that was using Harry already.
  b) the only tests that are changed are ones that used a duplicate version of 
placement simulator we had both for testing TCM, and in Harry
  c) in addition, I have converted 3 existing TCM tests to a new API to have 
some base for examples/usage.

Since we were effectively relying on this code for a while now, and the 
intention now is to converge to:
  a) fewer different generators, and have a shareable version of generators for 
everyone to use across the codebase
  b) a testing tool that can be useful for both trivial cases, and complex 
scenarios
myself and many other Cassandra contributors have expressed an opinion that 
bringing Harry in-tree will be highly beneficial.

I strongly believe that bringing Harry in-tree will help to lower the barrier 
for fuzz test and simplify co-development of Cassandra and Harry. Previously, 
it has been rather difficult to debug edge cases because I had to either 
re-compile an in-jvm dtest jar and bring it to Harry, or re-compile a Harry jar 
and bring it to Cassandra, which is both tedious and time consuming. Moreover, 
I believe we have missed at the very least one RT regression [2] because Harry was 
not in-tree, as its tests would've caught the issue even with the model that 
existed.

For other recently found issues, I think having Harry in-tree would have 
substantially lowered a turnaround time, and allowed me to share repros with 
developers of corresponding features much quicker.

I do expect a slight learning curve for Harry, but my intention is to build a 
web of simple tests (worked on some of them yesterday after conversation with 
David already), which can follow the in-jvm-dtest pattern of find-similar-test 
/ copy / modify. There is already copious documentation, so I do not believe a 
lack of docs for Harry was ever an issue.

You all are aware of my dedication to testing and quality of Apache Cassandra, 
and I hope you also see the benefits of having a model checker in-tree.

Thank you and happy upcoming 

Re: Future direction for the row cache and OHC implementation

2023-12-20 Thread German Eichberger via dev
Hi,

We once did some extensive performance testing on the row cache (motivated by 
some hardware accelerator we were hoping to introduce) but could only find 
improvements in highly contrived scenarios. It has been a while since then, so 
fresh eyes are good, but I think we will still arrive at the conclusion to 
deprecate the row cache.

Thanks,
German

From: Jon Haddad 
Sent: Monday, December 18, 2023 10:31 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: Future direction for the row cache and OHC 
implementation

Sure, I’d love to work with you on this.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Mon, Dec 18, 2023 at 8:30 AM Ariel Weisberg <ar...@weisberg.ws> wrote:
Hi,

Thanks for the generous offer. Before you do that can you give me a chance to 
add back support for Caffeine for the row cache so you can test the option of 
switching back to an on-heap row cache?

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
I think we should probably figure out how much value it actually provides by 
getting some benchmarks around a few use cases along with some profiling.  
tlp-stress has a --rowcache flag that I added a while back to be able to do 
this exact test.  I was looking for a use case to profile and write up so this 
is actually kind of perfect for me.  I can take a look in January when I'm back 
from the holidays.

Jon

On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever <m...@apache.org> wrote:



I would avoid taking away a feature even if it works in a narrow set of 
use cases. I would instead suggest -

1. Leave it disabled by default.
2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
off. Cassandra should ideally detect this and do it automatically.
3. Move to Caffeine instead of OHC.

I would suggest having this as the middle ground.



Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, 
hard value when to disable.
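
The soft/hard guardrail idea for the row cache hit rate can be sketched as a small Python function. This is not the Cassandra Guardrails API; the function name, the threshold values, and the minimum sample size are placeholders chosen for the example.

```python
def row_cache_guardrail(hits, requests, warn_below=0.30, disable_below=0.05,
                        min_requests=10_000):
    """Return 'ok', 'warn', or 'disable' for the observed row cache hit rate.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if requests < min_requests:
        return "ok"  # too few samples to judge the cache
    hit_rate = hits / requests
    if hit_rate < disable_below:
        return "disable"  # hard value: turn the row cache off
    if hit_rate < warn_below:
        return "warn"     # soft value: tell the operator
    return "ok"


healthy = row_cache_guardrail(hits=9_000, requests=10_000)
low = row_cache_guardrail(hits=2_000, requests=10_000)
useless = row_cache_guardrail(hits=100, requests=10_000)
warming_up = row_cache_guardrail(hits=0, requests=100)
print(healthy, low, useless, warming_up)
```

The minimum sample size keeps a cold cache from being disabled right after startup, before the hit rate means anything.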



Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-30 Thread German Eichberger via dev
I don't think outside people will know the distinction between alpha and beta - 
for them anything which isn't GA doesn't get deployed (and even then they 
might wait another year or two).

People following this mailing list would likely know that 5.0-beta-1 is pretty 
close to 5.0-alpha-3 -- so I am supporting releasing to hit the date. At this 
point it's semantics...

Thanks,
German

From: Maxim Muzafarov 
Sent: Thursday, November 30, 2023 3:12 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-beta1

I'm gonna take a moment to outline the question. Here we have a point
in time where a time-driven release process clashes with the
alpha/beta release naming convention: we want to have a beta ready
_before_ the Summit.

Here's the Cassandra release lifecycle document [1] that I found
(still under discussion I think) and according to the 'beta'
definition we should have a green CI and no regressions for a beta
release.  This means that there may be known bugs in the new features
we are trying to ship. Unless I'm missing something, 5.0 currently
meets the 'beta' criteria and the definition itself sounds clear to
me.

So, the question is - should we find a better place for the [1] page
and move it somewhere under the 'officially accepted'? :-)

[1] https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

On Thu, 30 Nov 2023 at 07:39, Jacek Lewandowski
 wrote:
>>
>> If we end up not releasing a final 5.0 artifact by a Cassandra Summit it 
>> will signal to the community that we’re prioritizing stability and it could 
>> be a good opportunity to get people to test the beta or RC before we stamp 
>> it as production ready.
>
>
> I agree with Paulo's comment
>
> On Thu, 30 Nov 2023 at 04:44, Paulo Motta wrote:
>>
>> > if any contributor has an opinion which is not technically refuted it will 
>> > usually be backed by a PMC via a binding -1
>>
>> clarifying a bit my personal view: if any contributor has an opinion against 
>> a proposal (in this case this release proposal) that is not refuted it will 
>> usually be backed by a PMC via binding -1
>>
>> Opinions supporting the proposal are also valuable, provided there are no 
>> valid claims against a proposal.
>>
>> On Wed, 29 Nov 2023 at 22:27 Paulo Motta  wrote:
>>>
>>> To me, the goal of a beta is to find unknown bugs. If no new bugs are found 
>>> during a beta release, then it can be automatically promoted to RC via 
>>> re-tagging. Likewise, if no new bugs are found during a RC after X time, 
>>> then it can be promoted to final.
>>>
>>> If we end up not releasing a final 5.0 artifact by a Cassandra Summit it 
>>> will signal to the community that we’re prioritizing stability and it could 
>>> be a good opportunity to get people to test the beta or RC before we stamp 
>>> it as production ready.
>>>
>>> WDYT?
>>>
>>> >  Aaron (and anybody who takes the time to follow this list, really), your 
>>> > opinion matters, that's why we discuss it here.
>>>
>>> +1, PMC are just officers who endorse community decisions, so if any 
>>> contributor has an opinion which is not technically refuted it will usually 
>>> be backed by a PMC via a binding -1 (as seen on this thread)
>>>
>>> On Wed, 29 Nov 2023 at 20:04 Nate McCall  wrote:



 On Thu, Nov 30, 2023 at 3:28 AM Aleksey Yeshchenko  
 wrote:
>
> -1 on cutting a beta1 in this state. An alpha2 would be acceptable now, 
> but I’m not sure there is significant value to be had from it. Merge the 
> fixes for outstanding issues listed above, then cut beta1.

 

 Agree with Aleksey. -1 on a beta we know has issues with a top-line new 
 feature.




Re: [EXTERNAL] Re: [DISCUSSION] CEP-38: CQL Management API

2023-11-20 Thread German Eichberger via dev
Hi,

From a cloud provider perspective we expose the storage port to customers for 
hybrid scenarios (e.g. fusing on-prem Cassandra with in-cloud Cassandra), so we 
would prefer an extra port or a socket.

Thanks,
German


From: Dinesh Joshi 
Sent: Friday, November 17, 2023 4:06 PM
To: dev 
Subject: [EXTERNAL] Re: [DISCUSSION] CEP-38: CQL Management API

Hi Maxim,

Thanks for putting this CEP together! This is a great start. I have gone over 
the CEP and there is one thing that stuck out to me.

Among the 'basic requirements', I see you have this -

> A dedicated admin port with the native protocol behind it,
> allowing only admin commands, to address the concerns when
> the native protocol is disabled in certain circumstances
> e.g. the disablebinary command is executed;

I understand what you're trying to achieve here. However, there are a few reasons we 
should probably offer some choice to our users w.r.t. using a dedicated port 
for management functions.

Today Cassandra exposes several ports - 9042, 9142, 7000 and 7001. The sidecar 
runs on port 9043. That's a lot of ports. I would prefer to allow users to 
access management functionality over one of the existing ports.

I realize that this would mean a subtle change in behavior for disablebinary 
when we offer it over port 9042 and not when the operator decides to use a 
dedicated port.

More importantly, I think having this functionality exposed over the storage 
ports may be even better. The storage ports are typically firewalled off from 
the end users. Operators and tooling, however, usually have access to these 
ports. This especially makes sense from a security standpoint where we'd like 
to limit users from accessing management functionality.

What do others think about this approach?

thanks,

Dinesh

> On Nov 13, 2023, at 10:08 AM, Maxim Muzafarov  wrote:
>
> Hello everyone,
>
> While we are still waiting for the review to make the settings virtual
> table updatable (CASSANDRA-15254), which will improve the
> configuration management experience for users, I'd like to take
> another step forward and improve the C* management approach we have as
> a whole. This approach aims to make all Cassandra management commands
> accessible via CQL, but not only that.
>
> The problem of making commands accessible via CQL presents a complex
> challenge, especially if we aim to minimize code duplication across
> the implementation of management operations for different APIs and
> reduce the overall maintenance burden. The proposal's scope goes
> beyond simply introducing a new CQL syntax. It encompasses several key
> objectives for C* management operations, beyond their availability
> through CQL:
> - Ensure consistency across all public APIs we support, including JMX
> MBeans and the newly introduced CQL. Users should see consistent
> command specifications and arguments, irrespective of whether they're
> using an API or a CLI;
> - Reduce source code maintenance costs. With this new approach, when a
> new command is implemented, it should automatically become available
> across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
> the need for additional coding;
> - Maintain backward compatibility, ensuring that existing setups and
> workflows continue to work the same way as they do today;
>
> I would suggest discussing the overall design concept first, and then
> diving into the CQL command syntax and other details once we've found
> common ground on the community's vision. However, regardless of these
> details, I would appreciate any feedback on the design.
>
> I look forward to your comments!
>
> Please, see the design document: CEP-38: CQL Management API
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API



Re: [DISCUSSION] CEP-38: CQL Management API

2023-11-15 Thread German Eichberger via dev
Hi Maxim,

We have adopted/forked the agent part of the 
https://github.com/k8ssandra/management-api-for-apache-cassandra project which 
aims to do similar things. I especially like how they have a local database 
socket where a sidecar can easily access cassandra and execute cql commands 
without the need of a service account like your example suggests.

The syntax they adopted (see for instance 
https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/7cb367eac46a12947bb87486456d3f905f37628b/management-api-server/src/main/java/com/datastax/mgmtapi/resources/NodeOpsResources.java#L115)
looks like `CALL NodeOps.decommission(?, ?)`, with `force` and `false` passed as 
the bound arguments, which is similar to your execute - just throwing this out 
as another example.

I definitely like settling on the CQL interface since that avoids having to 
load different JMX bindings for different Cassandra versions, making things 
cleaner and more easily accessible. There is some security concern in mixing the 
data and control planes, so I would like to see some way to restrict access like 
the mgmt api does, where the admin commands are only available on the socket. 
Maybe have a special admin port or socket?

I prefer making the agent part of the management api become part of Cassandra, 
either through your CEP or other means, but I can also see this as an adjacent 
sub-project - let's discuss.

German


From: Maxim Muzafarov 
Sent: Monday, November 13, 2023 10:08 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] [DISCUSSION] CEP-38: CQL Management API

Hello everyone,

While we are still waiting for the review to make the settings virtual
table updatable (CASSANDRA-15254), which will improve the
configuration management experience for users, I'd like to take
another step forward and improve the C* management approach we have as
a whole. This approach aims to make all Cassandra management commands
accessible via CQL, but not only that.

The problem of making commands accessible via CQL presents a complex
challenge, especially if we aim to minimize code duplication across
the implementation of management operations for different APIs and
reduce the overall maintenance burden. The proposal's scope goes
beyond simply introducing a new CQL syntax. It encompasses several key
objectives for C* management operations, beyond their availability
through CQL:
- Ensure consistency across all public APIs we support, including JMX
MBeans and the newly introduced CQL. Users should see consistent
command specifications and arguments, irrespective of whether they're
using an API or a CLI;
- Reduce source code maintenance costs. With this new approach, when a
new command is implemented, it should automatically become available
across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
the need for additional coding;
- Maintain backward compatibility, ensuring that existing setups and
workflows continue to work the same way as they do today;

I would suggest discussing the overall design concept first, and then
diving into the CQL command syntax and other details once we've found
common ground on the community's vision. However, regardless of these
details, I would appreciate any feedback on the design.

I look forward to your comments!

Please, see the design document: CEP-38: CQL Management API
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API


Re: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-10-31 Thread German Eichberger via dev
+1

Heck, yeah, we already tested the branch (built it ourselves) and it works great 
so far.

From: Mick Semb Wever 
Sent: Tuesday, October 31, 2023 1:43 PM
Cc: dev 
Subject: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-alpha2

> The vote will be open for 72 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding
> +1s and no -1's.


+1

Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 11+17)
- binary artefact runs (JDK 11+17)
- debian package runs (JDK 11+17)
- debian repo runs (JDK 11+17)
- redhat* package runs (JDK 11+17)
- redhat* repo runs (JDK 11+17)
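
For anyone repeating the "checksums are correct" step, here is a minimal sketch of the comparison. It assumes the published digest is a SHA-256 hex string; the class and method names are hypothetical and not part of any Cassandra or ASF tooling, and the bytes of "abc" stand in for a downloaded artifact.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Cassandra or ASF tooling.
final class ChecksumCheck {
    // Hex-encode the digest of the given bytes using the named algorithm.
    static String hexDigest(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when the computed digest equals the published one
    // (case-insensitive, surrounding whitespace ignored).
    static boolean matches(String algorithm, byte[] data, String published)
            throws NoSuchAlgorithmException {
        return hexDigest(algorithm, data).equalsIgnoreCase(published.trim());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the bytes of a downloaded release artifact (assumption).
        byte[] artifact = "abc".getBytes(StandardCharsets.US_ASCII);
        // Well-known SHA-256 digest of the ASCII string "abc".
        String published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        System.out.println(matches("SHA-256", artifact, published));
    }
}
```

In practice the artifact bytes would be read from the downloaded file and the expected value from the digest file published next to it; signature verification is a separate step not covered here.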


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-27 Thread German Eichberger via dev
uding many folks from 
this discussion) couldn't help but learn about their status and progress. 
Still, there was very little engagement (which, I claim, is absolutely fine). 
So, since one can't say that we (collectively) are not publishing CEPs and code 
early enough, the only argument is that people choose to prioritise things 
based on what is important for their businesses today, and this is, again, 
completely fine.

If you are interested in a CEP, make sure you engage with its authors from the 
first time they publish something. There are many patches and CEPs I wish I 
had reviewed, but did not have time for. For those, I am reading the available 
discussions, talking to their authors, and writing Harry tests. I would not, 
however, ask someone to postpone a feature based on my past or future 
availability.

On Fri, Oct 27, 2023, at 10:14 AM, Jacek Lewandowski wrote:
I've been thinking about this and I believe that if we ever decide to delay a 
release to include some CEPs, we should make the plan and status of those CEPs 
public. This should include publishing a branch, creating tickets for the 
remaining work required for feature completion in Jira, and notifying the 
mailing list.

By doing this, we can make an informed decision about whether delivering a CEP 
in a release x.y planned for some time z is feasible. This approach would also 
be beneficial for improving collaboration, as we will all be aware of what is 
left to be done and can adjust our focus accordingly to participate in the 
remaining work.

Thanks,
- - -- --- -  -
Jacek Lewandowski


On Fri, 27 Oct 2023 at 10:26, Benjamin Lerer <ble...@apache.org> wrote:
I would be interested in testing Maxim's approach. We need more visibility on 
big features and their progress to improve our coordination. Hopefully it will 
also open the door to more collaboration on those big projects.

On Thu, 26 Oct 2023 at 21:35, German Eichberger via dev 
<dev@cassandra.apache.org> wrote:
+1 to Maxim's idea

Like Stefan my assumption was that we would get some version of TCM + ACCORD in 
5.0 but it wouldn't be ready for production use. My own testing and 
conversations at Community over Code in Halifax confirmed this.

From this perspective, as disappointing as TCM+ACCORD slipping is, moving it to 
5.1 makes sense and I am supportive of this - but I am worried that if 5.1 is 
basically 5.0 + TCM/ACCORD and this slips again, we paint ourselves into a corner 
where we can't release 5.2 before 5.1 or something. I would like some more 
elaboration on that.

I am also very worried about ANN vector search being in jeopardy for 5.0, which 
is an important feature for me to win some internal company bet.

My 2 cents,
German



From: Miklosovic, Stefan via dev <dev@cassandra.apache.org>
Sent: Thursday, October 26, 2023 4:23 AM
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: [EXTERNAL] Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut 
an immediate 5.1-alpha1)

What Maxim proposes in the last paragraph would definitely be helpful. Not only 
for the project but for a broader audience, companies etc., too.

Until this thread was started, my assumption was that "there will be 5.0 on 
summit with TCM and Accord and it somehow just happens". More transparent 
communication about where we are with high-profile CEPs like these, and knowing 
whether deadlines are going to be met, would be welcome.

I don't want to be that guy and don't get me wrong here, but really, these 
CEPs are being developed, basically, by devs from two companies, which have 
developers who do not have any real need to regularly explain what they do 
to outsiders. (or maybe you do, you just don't have time?) I get 
that. But on the other hand, you cannot realistically expect that other folks 
will have any visibility into what is going on there and that there is a delay 
on the horizon and so on.


From: Maxim Muzafarov <mmu...@apache.org>
Sent: Thursday, October 26, 2023 12:21
To: dev@cassandra.apache.org
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)

Personally, I think frequent releases (2-3 per year) are better than
infrequent big releases. I can understand all the concerns from a
marketing perspective, as smaller major releases may not shine as
brightly as a single "game changer" release. However, smaller
releases, especially if they don't have backwards compatibility
issues, are better for the engineering and SRE teams because if a

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-26 Thread German Eichberger via dev
+1 to Maxim's idea

Like Stefan my assumption was that we would get some version of TCM + ACCORD in 
5.0 but it wouldn't be ready for production use. My own testing and 
conversations at Community over Code in Halifax confirmed this.

From this perspective, as disappointing as TCM+ACCORD slipping is, moving it to 
5.1 makes sense and I am supportive of this - but I am worried that if 5.1 is 
basically 5.0 + TCM/ACCORD and this slips again, we paint ourselves into a corner 
where we can't release 5.2 before 5.1 or something. I would like some more 
elaboration on that.

I am also very worried about ANN vector search being in jeopardy for 5.0, which 
is an important feature for me to win some internal company bet.

My 2 cents,
German


From: Miklosovic, Stefan via dev 
Sent: Thursday, October 26, 2023 4:23 AM
To: dev@cassandra.apache.org 
Cc: Miklosovic, Stefan 
Subject: [EXTERNAL] Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut 
an immediate 5.1-alpha1)

What Maxim proposes in the last paragraph would definitely be helpful. Not only 
for the project but for a broader audience, companies etc., too.

Until this thread was started, my assumption was that "there will be 5.0 on 
summit with TCM and Accord and it somehow just happens". More transparent 
communication about where we are with high-profile CEPs like these, and knowing 
whether deadlines are going to be met, would be welcome.

I don't want to be that guy and don't get me wrong here, but really, these 
CEPs are being developed, basically, by devs from two companies, which have 
developers who do not have any real need to regularly explain what they do 
to outsiders. (or maybe you do, you just don't have time?) I get 
that. But on the other hand, you cannot realistically expect that other folks 
will have any visibility into what is going on there and that there is a delay 
on the horizon and so on.


From: Maxim Muzafarov 
Sent: Thursday, October 26, 2023 12:21
To: dev@cassandra.apache.org
Subject: Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 
5.1-alpha1)

Personally, I think frequent releases (2-3 per year) are better than
infrequent big releases. I can understand all the concerns from a
marketing perspective, as smaller major releases may not shine as
brightly as a single "game changer" release. However, smaller
releases, especially if they don't have backwards compatibility
issues, are better for the engineering and SRE teams because if a
long-awaited feature is delayed for any reason, there should be no
worry about getting it in right into the next release.

An analogy here might be that if you miss your train (small release)
due to circumstances, you can wait right here for the next one, but if
you miss a flight (big release), you will go back home :-) This is why
I think that the 5.0, 5.1, 5.2, etc. are better and I support Mick's
plan with the caveat that we should release 5.1 when we think we are
ready to do so. Here is an example of the Postgres releases [1].

[1] https://bucardo.org/postgres_all_versions.html


Another little thing that I'd like to mention is a release management
story. In the Apache Ignite project, we've got used to creating a
release thread and posting the release status updates and/or problems,
and/or delays there, and maybe some of the benchmarks at the end. Of
course, this was done by the release manager who volunteered to do
this work. I'm not saying we're doing anything wrong here, no, but the
publicity and openness, coupled with regular updates, could help
create a real sense of the remaining work in progress. These are my
personal feelings, and definitely not actions to be taken. The example
is here: [2].

[2] https://lists.apache.org/thread/m11m0nxq701f2cj8xxdcsc4nnn2sm8ql

On Thu, 26 Oct 2023 at 11:15, Benjamin Lerer  wrote:
>>
>> Regarding the release of 5.1, I understood the proposal to be that we cut an 
>> actual alpha, 

Hacktoberfest?

2023-10-06 Thread German Eichberger via dev
All,

It's that time of year when Hacktoberfest happens again: 
https://hacktoberfest.com/participation/#maintainers

We would need to mark our repo and also review the contributions in a timely 
manner so the participants can earn their T-Shirt (?). As Ekaterina pointed out, 
October is also the month where we are crunched for time due to the release, but 
I wanted to bring this up to the community anyway to discuss.

Thanks,
German



Re: [VOTE] Accept java-driver

2023-10-03 Thread German Eichberger via dev
+1

Really excited about this as well.

From: Mick Semb Wever 
Sent: Tuesday, October 3, 2023 2:16 AM
Cc: dev 
Subject: [EXTERNAL] Re: [VOTE] Accept java-driver



The vote will be open for 72 hours (or longer). Votes by PMC members are 
considered binding. A vote passes if there are at least three binding +1s and 
no -1's.


+1


… we will request ASF Infra to move the datastax/java-driver as-is to 
apache/java-driver


I see now this will likely be instead apache/cassandra-java-driver


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread German Eichberger via dev
Super excited about this as well. Happy to help test with Azure and any other 
way needed.

Thanks,
German

From: guo Maxwell 
Sent: Wednesday, September 27, 2023 7:38 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias 
external storage locations

Thanks. So I think a Jira ticket can be created now. And I'd be happy to provide 
some help with this as well if needed.

On Thu, Sep 28, 2023 at 00:21, Henrik Ingo <henrik.i...@datastax.com> wrote:
It seems I was volunteered to rebase the Astra implementation of this 
functionality (FileSystemProvider) onto Cassandra trunk. (And publish it, of 
course) I'll try to get going today or tomorrow, so that this discussion can 
then benefit from having that code available for inspection. And potentially 
using it as a solution to this use case.

On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> Filesystems). There will probably also be backend specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires driving this change:
>
> 1. The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree, on another disk, so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing of the moved data cannot be suspended.
> 2. The ability to store infrequently used data on slower, cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>
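
The "backend file router" idea from Ariel's message above could be sketched roughly as follows. This is only an illustration under assumptions: the class, its methods, and the example directories are hypothetical and do not exist in Cassandra. The one thing it relies on is that `java.nio.file.Path` is the common currency, so a root obtained from any `FileSystem` (including one supplied by a custom `FileSystemProvider`) can be routed the same way as a local directory.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Cassandra.
final class BackendFileRouter {
    // Root used for every keyspace without an explicit route.
    private final Path defaultRoot;
    // Per-keyspace roots; each Path may belong to a different FileSystem.
    private final Map<String, Path> rootsByKeyspace = new LinkedHashMap<>();

    BackendFileRouter(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    // Register an alternate backend root for one keyspace.
    BackendFileRouter route(String keyspace, Path root) {
        rootsByKeyspace.put(keyspace, root);
        return this;
    }

    // Resolve the directory where files for keyspace/table would be created.
    Path tableDirectory(String keyspace, String table) {
        Path root = rootsByKeyspace.getOrDefault(keyspace, defaultRoot);
        return root.resolve(keyspace).resolve(table);
    }

    public static void main(String[] args) {
        // Both directories are made-up examples on the default filesystem.
        BackendFileRouter router = new BackendFileRouter(Path.of("/var/lib/cassandra/data"))
                .route("archive_ks", Path.of("/mnt/cold-storage"));
        System.out.println(router.tableDirectory("archive_ks", "events"));
        System.out.println(router.tableDirectory("hot_ks", "events"));
    }
}
```

A real router would presumably be driven by configuration and would also need to cover listing and opening existing files, which this sketch leaves out.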


--
http://twitter.com/tjake



Re: [EXTERNAL] Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-22 Thread German Eichberger via dev
+1 to taking it to legal

Like anyone else, I enjoy speculating about legal stuff, and I think for jars you 
probably need plausible deniability, aka no paper trail that we knowingly... but 
that horse is out of the barn. So I'm really interested in what legal says.
If you can stomach non Java here is an alternate DiskANN implementation: 
microsoft/DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and 
Filtered Approximate Nearest Neighbor Search 
(github.com)

Thanks,
German


From: Josh McKenzie 
Sent: Friday, September 22, 2023 7:43 AM
To: dev 
Subject: [EXTERNAL] Re: [DISCUSS] Add JVector as a dependency for CEP-30

> I highly doubt liability works like that in all jurisdictions

That's a fantastic point. When speculating there, I overlooked the fact that 
there are literally dozens of legal jurisdictions in which this project is used 
and the foundation operates.

As a PMC let's take this to legal.

On Fri, Sep 22, 2023, at 9:16 AM, Jeff Jirsa wrote:
To do that, the cassandra PMC can open a legal JIRA and ask for a (durable, 
concrete) opinion.


On Fri, Sep 22, 2023 at 5:59 AM Benedict <bened...@apache.org> wrote:


  1.  my understanding is that with the former the liability rests on the 
provider of the lib to ensure it's in compliance with their claims to copyright

I highly doubt liability works like that in all jurisdictions, even if it might 
in some. I can even think of some historic cases related to Linux where patent 
trolls went after users of Linux, though I’m not sure where that got to and I 
don’t remember all the details.

But anyway, none of us are lawyers and we shouldn’t be depending on this kind 
of analysis. At minimum we should invite legal to proffer an opinion on whether 
dependencies are a valid loophole to the policy.



On 22 Sep 2023, at 13:48, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:


This Gen AI generated code use thread should probably be its own mailing list 
DISCUSS thread?  It applies to all source code we take in, and accept copyright 
assignment of, not to jars we depend on and not only to vector related code 
contributions.

On Sep 22, 2023, at 7:29 AM, Josh McKenzie <jmcken...@apache.org> wrote:

So if we're going to chat about GenAI on this thread here, 2 things:

  1.  A dependency we pull in != a code contribution (I am not a lawyer but my 
understanding is that with the former the liability rests on the provider of 
the lib to ensure it's in compliance with their claims to copyright and it's 
not sticky). Easier to transition to a different dep if there's something API 
compatible or similar.
  2.  With code contributions we take in, we take on some exposure in terms of 
copyright and infringement. git revert can be painful.

For this thread, here's an excerpt from the ASF policy:

a recommended practice when using generative AI tooling is to use tools with 
features that identify any included content that is similar to parts of the 
tool’s training data, as well as the license of that content.

Given the above, code generated in whole or in part using AI can be contributed 
if the contributor ensures that:

  1.  The terms and conditions of the generative AI tool do not place any 
restrictions on use of the output that would be inconsistent with the Open 
Source Definition (e.g., ChatGPT’s terms are inconsistent).
  2.  At least one of the following conditions is met:
       *   The output is not copyrightable subject matter (and would not be even 
if produced by a human)
       *   No third party materials are included in the output
       *   Any third party materials that are included in the output are being 
used with permission (e.g., under a compatible open source license) of the 
third party copyright holders and in compliance with the applicable license 
terms
  3.  A contributor obtains reasonable certainty that conditions 2.2 or 2.3 are 
met if the AI tool itself provides sufficient information about materials that 
may have been copied, or from code scanning results
       *   E.g. AWS CodeWhisperer recently added a feature that provides notice 
and attribution

When providing contributions authored using generative AI tooling, a 
recommended practice is for contributors to indicate the tooling used to create 
the contribution. This should be included as a token in the source control 
commit message, for example including the phrase “Generated-by

I think the real challenge right now is ensuring that the output from an LLM 
doesn't include a string of tokens that's identical to something in its input 
training dataset if it's trained on non-permissively licensed inputs. That plus 
the risk of, at least in the US, the courts landing on the side of saying that 
not only is the output of generative AI not copyrightable, but that there's 
legal liability on either the users of the tools or the creators of the models 
for some kind of copyright infringement. 

Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-20 Thread German Eichberger via dev
+1

I am biased because DiskANN is from Microsoft Research, but it's a good 
library/algorithm.

From: Mike Adamson 
Sent: Wednesday, September 20, 2023 8:58 AM
To: dev 
Subject: [EXTERNAL] [DISCUSS] Add JVector as a dependency for CEP-30

The original patch for CEP-30 brought several modified Lucene classes in-tree 
to implement the concurrent HNSW graph used by the vector index.

These classes are now being replaced with the io.github.jbellis.jvector 
library, which contains an improved diskANN implementation for the on-disk 
graph format.

The repo for this library is here: https://github.com/jbellis/jvector.

The library does not replace any code used by SAI or other parts of the 
codebase and is used solely by the vector index.

I would welcome any feedback on this change.

Re: [DISCUSS] Add Jepsen's Elle as a test dependency for Accord / Paxos

2023-09-15 Thread German Eichberger via dev
+1

From: David Capwell 
Sent: Wednesday, September 13, 2023 3:44 PM
To: dev 
Subject: [EXTERNAL] [DISCUSS] Add Jepsen's Elle as a test dependency for Accord 
/ Paxos

For validation of Paxos and Accord, two different consistency verifiers were 
created: accord.verify.StrictSerializabilityVerifier (Accord), and 
org.apache.cassandra.simulator.paxos.LinearizabilityValidator (Paxos).  To 
increase confidence in both protocols it would be good to use an external 
consistency checker, such as Jepsen's Elle.

This work was first started in 
https://the-asf.slack.com/archives/C0459N9R5C6/p1692192925909199
by Jarek, and would be good to get as part of our automation.


https://github.com/jepsen-io/elle/blob/main/LICENSE
 - Eclipse Public License 2.0




Re: [VOTE] Release Apache Cassandra 5.0-alpha1

2023-08-25 Thread German Eichberger via dev
I concur. Those are major features...

From: C. Scott Andreas 
Sent: Friday, August 25, 2023 9:06 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 5.0-alpha1

A snapshot artifact seems more appropriate for early testing to me, rather than 
a voted / released build issued by the project given how much has yet to land.

- Scott

On Aug 25, 2023, at 8:46 AM, Ekaterina Dimitrova  wrote:


+1

On Fri, 25 Aug 2023 at 11:14, Mick Semb Wever wrote:

Proposing the test build of Cassandra 5.0-alpha1 for release.

DISCLAIMER, this alpha release does not contain the expected 5.0
features: Vector Search (CEP-30), Transactional Cluster Metadata
(CEP-21) and Accord Transactions (CEP-15).  These features will land
in a later alpha release.

Please also note that this is an alpha release and what that means, further 
info at https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

sha1: 62cb03cc7311384db6619a102d1da6a024653fa6
Git: https://github.com/apache/cassandra/tree/5.0-alpha1-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1314/org/apache/cassandra/cassandra-all/5.0-alpha1/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha1/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt: 
https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/CHANGES.txt
[2]: NEWS.txt: 
https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/NEWS.txt


Re: [EXTERNAL] Re: [Discuss] CEP-35: Add PIP support for CQLSH

2023-08-11 Thread German Eichberger via dev
I second Brandon. There is a group of people who expect to ssh into a node and 
then be able to run the "right" cqlsh instead of dealing with different cqlsh 
versions on their workstation/laptop...

German

From: Brandon Williams 
Sent: Friday, August 11, 2023 7:29 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [Discuss] CEP-35: Add PIP support for CQLSH

On Fri, Aug 11, 2023 at 2:13 AM Miklosovic, Stefan
 wrote:
>
> If we had an official PIP package, I can imagine that we would not ship CQLSH 
> in RPM at all (maybe not in DEB either?) so we would decouple this. A PIP 
> package is installable almost anywhere (if it is Python 3, that is the way 
> how I solved the problem in 18642, I just installed a PIP package because RPM 
> installation was broken).

I don't think we want to ship the database without any way to interact
with it.  Pip may also not be accessible to all installations.

> Another problem I see is: how do we say which CQLSH is compatible with 
> which Cassandra release? If we shipped CQLSH as a PIP package as part of the 
> tarball, we would guarantee that they play together. If it is living 
> somewhere online, how can people be sure that what they install is compatible 
> with the Cassandra they run? I am sorry if this was already explained somewhere.

It is only guaranteed to work with the version it shipped with. It may
work with other versions, or it may have subtle issues that you could
spend a lot of time trying to figure out, as German and I recently
discovered.  To that end, I've created CASSANDRA-18745 so at least you
will know to expect problems when the versions don't match.
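
A minimal sketch of the kind of check described above, in Python (the language cqlsh is written in). It is not the CASSANDRA-18745 implementation: the function names are hypothetical, and it assumes the client carries the version of the Cassandra release it shipped with, which is not how cqlsh was versioned at the time of this thread.

```python
import warnings


def major_minor(version: str) -> tuple[int, int]:
    """Return the (major, minor) pair from a version such as '4.1.3' or '5.0-alpha1'."""
    parts = version.split("-")[0].split(".")
    return int(parts[0]), int(parts[1])


def check_versions(client_version: str, server_version: str) -> bool:
    """Warn, but do not fail, when client and server release lines differ.

    Hypothetical helper: both arguments are assumed to be Cassandra
    release versions, e.g. as reported by the server on connect.
    """
    if major_minor(client_version) == major_minor(server_version):
        return True
    warnings.warn(
        f"cqlsh {client_version} was built for a different release than "
        f"server {server_version}; expect possible incompatibilities",
        stacklevel=2,
    )
    return False
```

Warning instead of refusing to connect keeps mismatched setups usable while telling the person at the prompt where to look first when something behaves oddly.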


Re: [Discuss] Repair inside C*

2023-07-25 Thread German Eichberger via dev
In [2] we suggested that the next step should be a CEP.

I am happy to lend a hand to this effort as well.

Thanks Jaydeep and David - really appreciated.

German


From: David Capwell 
Sent: Tuesday, July 25, 2023 8:32 AM
To: dev 
Cc: German Eichberger 
Subject: [EXTERNAL] Re: [Discuss] Repair inside C*

As someone who has done a lot of work trying to make repair stable, I approve 
of this message ^_^

More than glad to help mentor this work

On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia  
wrote:

To clarify the repair solution timing, the one we have listed in the article is 
not the recently developed one. We were hitting some high-priority production 
challenges back in early 2018, and to address that, we developed and rolled out 
the solution in production in just a few months. Timing-wise, the solution 
was developed and productized by Q3 2018 and, of course, continued to evolve 
thereafter. Usually, we explore the existing solutions we can leverage, but 
when we started our journey in early 2018, most of the solutions were based on 
sidecar solutions. There is nothing against the sidecar solution; it was just a 
pure business decision, and in that, we wanted to avoid the sidecar to avoid a 
dependency on the control plane. Every solution developed has its deep context, 
merits, and pros and cons; they are all great solutions!

An appeal to the community members is to think one more time about having 
repairs in the Open Source Cassandra itself. As mentioned in my previous email, 
any solution getting adopted is fine; the important aspect is to have a repair 
solution in the OSS Cassandra itself!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia wrote:
Hi German,

The goal is always to backport our learnings back to the community. For 
example, I have already successfully backported the following two 
enhancements/bug fixes back to the Open Source Cassandra, which are described 
in the article. I am currently working on open-sourcing a few more of the 
enhancements mentioned in the article.

  1.  https://issues.apache.org/jira/browse/CASSANDRA-18555
  2.  https://issues.apache.org/jira/browse/CASSANDRA-13740

There is definitely heavy interest in having the repair solution inside the 
Open Source Cassandra itself, very much like Compaction. As I write this email, 
we are internally working on a one-pager proposal doc to all the community 
members on having a repair inside the OSS Apache Cassandra along with our 
private fork - I will share it soon.

Generally, we are ok with any solution getting adopted (either Joey's solution 
or our repair solution or any other solution). The primary motivation is to 
have the repair embedded inside the open-source Cassandra itself, so we can 
retire all various privately developed solutions eventually :)

I am also happy to help (drive conversation, discussion, etc.) in any way to 
have a repair solution adopted inside Cassandra itself, please let me know. 
Happy to help!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev wrote:
All,

We had a brief discussion in [2] about the Uber article [1] where they talk 
about having integrated repair into Cassandra and how great that is. I 
expressed my disappointment that they didn't work with the community on that 
(Uber, if you are listening, it's time to make amends) and it turns out Joey 
already had the idea and wrote the code [3] - so I wanted to start a discussion 
to gauge interest and maybe how to revive that effort.

Thanks,
German

[1] https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
[2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
[3] https://issues.apache.org/jira/browse/CASSANDRA-14346



[Discuss] Repair inside C*

2023-07-24 Thread German Eichberger via dev
All,

We had a brief discussion in [2] about the Uber article [1] where they talk 
about having integrated repair into Cassandra and how great that is. I 
expressed my disappointment that they didn't work with the community on that 
(Uber, if you are listening, it's time to make amends) and it turns out Joey 
already had the idea and wrote the code [3] - so I wanted to start a discussion 
to gauge interest and maybe how to revive that effort.

Thanks,
German

[1] https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
[2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
[3] https://issues.apache.org/jira/browse/CASSANDRA-14346


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-20 Thread German Eichberger via dev
In general I agree with Joey -- but I would prefer if this behavior is 
configurable, e.g. there is an option to get a startup failure if the 
configured fastest provider can't run for any reason to avoid a "silent" 
performance degradation as Jordan was experiencing.

Thanks,
German


From: Joseph Lynch 
Sent: Thursday, July 20, 2023 7:38 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default

Having native dependencies shouldn't make the project x86 only, it
should just accelerate the performance on x86 when available. Can't we
just try to load the fastest available provider (so arm will use
native java but x86 will use proper hardware acceleration) and, failing
that, fall back to the default? If I recall correctly from the
messaging service patches (and zstd/lz4) it's reasonably
straightforward to try to load native code and then fall back if that
fails.

-Joey
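
The load-then-fall-back idea above, combined with the opt-in startup failure requested in the reply at the top of this thread, can be sketched as follows. This is an illustration in Python of the control flow only, not Cassandra's Java code: `select_provider`, `ProviderUnavailable` and the `strict` flag are hypothetical names, and importable modules stand in for crypto providers.

```python
import importlib
from typing import Optional, Sequence


class ProviderUnavailable(RuntimeError):
    """Raised in strict mode when no preferred provider could be loaded."""


def select_provider(preferred: Sequence[str], strict: bool = False) -> Optional[str]:
    """Return the first loadable name from `preferred`, fastest first.

    Returns None (meaning "use the built-in default") when nothing loads,
    unless `strict` is set, in which case startup fails loudly instead of
    silently running on the slower default.
    """
    for name in preferred:
        try:
            importlib.import_module(name)
            return name
        except ImportError:
            continue  # not installed, or not built for this platform
    if strict:
        raise ProviderUnavailable(f"none of {list(preferred)} could be loaded")
    return None
```

With `strict=False` an ARM host without the native library simply gets the default; with `strict=True` an operator who expects acceleration finds out at startup rather than from latency graphs.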

On Thu, Jul 20, 2023 at 10:27 AM J. D. Jordan  wrote:
>
> Maybe we could start providing Dockerfiles and/or make arch specific rpm/deb 
> packages that have everything setup correctly per architecture?
> We could also download them all and have the startup scripts put stuff in the 
> right places depending on the arch of the machine running them?
> I feel like there are probably multiple ways we could solve this without 
> requiring users to jump through a bunch of hoops?
> But I do agree we can’t make the project x86 only.
>
> -Jeremiah
>
> > On Jul 20, 2023, at 2:01 AM, Miklosovic, Stefan 
> >  wrote:
> >
> > Hi,
> >
> > as I was reviewing the patch for this feature (1), we realized that it is 
> > not quite easy to bundle this directly into Cassandra.
> >
> > The problem is that this was supposed to be introduced as a new dependency:
> >
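> > <dependency>
> >   <groupId>software.amazon.cryptools</groupId>
> >   <artifactId>AmazonCorrettoCryptoProvider</artifactId>
> >   <version>2.2.0</version>
> >   <classifier>linux-x86_64</classifier>
> > </dependency>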
> >
> > Notice "classifier". That means that if we introduced this dependency into 
> > the project, what about ARM users? (there is corresponding aarch classifier 
> > as well). ACCP is platform-specific but we have to ship Cassandra 
> > platform-agnostic. It just needs to run OOTB everywhere. If we shipped that 
> > with x86 and a user runs Cassandra on ARM, I guess that would break things, 
> > right?
> >
> > We also can not just add both dependencies (both x86 and aarch) because how 
> > would we differentiate between them in runtime? That all is just too tricky 
> > / error prone.
> >
> > So, the approach we want to take is this:
> >
> > 1) nothing will be bundled in Cassandra by default
> > 2) a user is supposed to download the library and put it to the class path
> > 3) a user is supposed to put the implementation of ICryptoProvider 
> > interface Cassandra exposes to the class path
> > 4) a user is supposed to configure cassandra.yaml and its section 
> > "crypto_provider" to reference the implementation he wants
> >
> > That way, we avoid the situation when somebody runs x86 lib on ARM or vice 
> > versa.
> >
> > By default, NoOpProvider will be used, that means that the default crypto 
> > provider from JRE will be used.
> >
> > It can seem like we have not made much progress here but hey ... we 
> > opened the project to the custom implementations of crypto providers a 
> > community can create. E.g. as 3rd party extensions etc ...
> >
> > I want to be sure that everybody is aware of this change (that we plan to 
> > do that in such a way that it will not be "bundled") and that everybody is 
> > on board with this. Otherwise I am all ears about how to do that 
> > differently.
> >
> > (1) 
> > https://issues.apache.org/jira/browse/CASSANDRA-18624
> >
> > 
> > From: German Eichberger via dev 
> > Sent: Friday, June 23, 2023 22:43
> > To: dev
> > Subject: Re: [DISCUSS] Using ACCP or tc-native by default
> >

Re: Changing the output of tooling between majors

2023-07-14 Thread German Eichberger via dev
+1 to always version the output format


From: Dinesh Joshi 
Sent: Thursday, July 13, 2023 3:36 PM
To: dev 
Subject: [EXTERNAL] Re: Changing the output of tooling between majors

This adds maintenance overhead but is a potential alternative. I would only 
flip the flag. I would prefer to make the default "legacy" output and innovate 
behind a "--output-format=v2" flag. That way tools do not break or have to 
change to pass in the new flag.

Ideally we should always version our output format - structured or not.

Dinesh
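
A minimal sketch of the "legacy by default, innovate behind a flag" approach described above. The `--output-format=v2` flag comes from the message; the tool name, the `format_version` field and the sample data are assumptions made up for illustration, not an existing nodetool interface.

```python
import argparse
import json


def render(stats: dict, output_format: str) -> str:
    if output_format == "legacy":
        # Frozen: existing scripts grep these exact keys.
        return "\n".join(f"{key}: {value}" for key, value in stats.items())
    # "v2" is free to evolve; it carries an explicit version marker.
    return json.dumps({"format_version": 2, "stats": stats}, sort_keys=True)


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(prog="sometool")  # hypothetical tool name
    parser.add_argument("--output-format", choices=["legacy", "v2"], default="legacy")
    args = parser.parse_args(argv)
    stats = {"totalRows": 1, "totalColumnsSet": 0}  # made-up sample data
    return render(stats, args.output_format)
```

Callers that pass no flag keep getting the old text; a caller that opts in with `--output-format=v2` can check `format_version` before parsing.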

On Jul 13, 2023, at 9:08 AM, German Eichberger via dev 
 wrote:

Let's take this discussion in a different direction: If we add a --legacy 
argument where we are supporting an old version for those who 
need/want it but have the (breaking) changes on the default this feels like a 
compromise - and then we can deprecate the legacy format without impacting 
innovation. We can also flip this with requiring a flag for the changed format 
if we feel this is better.

This lets us innovate without breaking anyone. Thoughts?

Thanks,
German


Re: [Discuss] CQLSH confusion

2023-07-13 Thread German Eichberger via dev
Forgot the references:
[1] https://the-asf.slack.com/archives/CJZLTM05A/p1686771286554899
[2] https://issues.apache.org/jira/browse/CASSANDRA-18666

From: German Eichberger via dev 
Sent: Thursday, July 13, 2023 10:14 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] [Discuss] CQLSH confusion

All,

I am working with clusters with different Cassandra versions and have been 
using some cqlsh which "just worked". Recently I wanted to use virtual tables 
and ran into [1]. After that I filed [2].

Brandon states: "Do not use a cqlsh that is from a different version than 
what is distributed with the server, since I have no idea what other 
incompatibilities like this there are; compatibility of that kind has never 
been a goal."

I would like to open the discussion if this is what we want: cqlsh needs to be 
in lockstep with the C* version.

Assuming this is how things should be, I would propose to change the cqlsh 
versioning to be in line with the C* versioning. Right now I am using cqlsh 
6.0.1 and I have no idea which C* version that translates to. Aligning 
versions would make this much easier.

Thanks,
German


[Discuss] CQLSH confusion

2023-07-13 Thread German Eichberger via dev
All,

I am working with clusters with different Cassandra versions and have been 
using some cqlsh which "just worked". Recently I wanted to use virtual tables 
and ran into [1]. After that I filed [2].

Brandon states: "Do not use a cqlsh that is from a different version than 
what is distributed with the server, since I have no idea what other 
incompatibilities like this there are; compatibility of that kind has never 
been a goal."

I would like to open the discussion if this is what we want: cqlsh needs to be 
in lockstep with the C* version.

Assuming this is how things should be, I would propose to change the cqlsh 
versioning to be in line with the C* versioning. Right now I am using cqlsh 
6.0.1 and I have no idea which C* version that translates to. Aligning 
versions would make this much easier.

Thanks,
German


Re: Changing the output of tooling between majors

2023-07-13 Thread German Eichberger via dev
Let's take this discussion in a different direction: If we add a --legacy 
argument where we are supporting an old version for those who 
need/want it but have the (breaking) changes on the default this feels like a 
compromise - and then we can deprecate the legacy format without impacting 
innovation. We can also flip this with requiring a flag for the changed format 
if we feel this is better.

This lets us innovate without breaking anyone. Thoughts?

Thanks,
German


From: Miklosovic, Stefan 
Sent: Thursday, July 13, 2023 8:20 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: Changing the output of tooling between majors

"Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to naming/meaning/"

That is 100% correct. So by that logic, changing the output which you grep on 
to something else will break your scripts if you expect it there.

For example, take sstablemetadata command - I know it is not nodetool but it 
does not matter. This is just an example. Same "problem" can be found in 
nodetool probably, sstablemetadata just came to my mind first as that is what I 
hit recently.

sstablemetadata writes this:

Repaired at: 0
Originating host id: d2d12c56-7d9c-49a7-aaef-05bd2633b09e
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1689261027905, 
position=59450)=CommitLogPosition(segmentId=1689261027905, position=60508)}
totalColumnsSet: 0
totalRows: 1
Estimated tombstone drop times:


Do you see "totalColumnsSet" and "totalRows" when all other keys in that output 
(in the whole command) are following a different format? In this case, it should be 
"Total columns set" and "Total rows".

So when we change it to that, anybody who is grepping "totalRows" will have no 
output. That is a breaking change to me. Their script stopped working.

You are correct and I agree with you completely that STRICT ADDITIONS (what I 
was suggesting) are fine because we are not breaking anything to anybody.

So here, if I want to change this, by what Dinesh says, (we change the naming 
and we break it), I need to offer JSON / YAML alternative to what 
sstablemetadata prints currently. (might be as well nodetool, just an example).
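
To make the distinction concrete, here is a small sketch of a consumer that greps "key: value" lines, run against the current output, a renamed variant and a strict addition. The helper `grep_value`, the added field and the JSON key names are assumptions for illustration only; sstablemetadata has no such JSON output in this thread.

```python
import json
from typing import Optional


def grep_value(output: str, key: str) -> Optional[str]:
    """Mimic a script that greps 'key: value' lines out of tool output."""
    for line in output.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name == key:
            return value
    return None


old_output = "Repaired at: 0\ntotalColumnsSet: 0\ntotalRows: 1"
renamed_output = "Repaired at: 0\nTotal columns set: 0\nTotal rows: 1"
added_output = old_output + "\nSome new field: 42"  # strict addition

# A structured alternative gives consumers stable keys that do not depend
# on how the human-readable labels are spelled (key names are made up).
structured = json.dumps({"total_rows": 1, "total_columns_set": 0})
```

`grep_value(old_output, "totalRows")` and `grep_value(added_output, "totalRows")` both find the value, while the same call on `renamed_output` returns `None`, which is the silent breakage described above.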


From: C. Scott Andreas 
Sent: Thursday, July 13, 2023 17:01
To: dev@cassandra.apache.org
Cc: dev@cassandra.apache.org
Subject: Re: Changing the output of tooling between majors




Dinesh's message cautions against making "breaking" changes that are likely to 
break parsing of output by current users (e.g., changes to 
naming/meaning/position of existing fields vs. adding new ones). I don't read 
his message as saying that any change to nodetool output is conditional on 
offering a JSON/YAML representation, though.

What are some changes that you'd like to make?

– Scott

On Jul 13, 2023, at 7:44 AM, "Miklosovic, Stefan" 
 wrote:


For example Dinesh said this:

"Until nodetool can support JSON as output format for all interaction and there 
is a significant adoption in the user community, I would strongly advise 
against making breaking changes to the CLI output."

That is where I get the need to have a JSON output in order to fix a typo from. 
That is, if we look at fixing a typo as a breaking change. Which I would say it 
is: if somebody is grepping for it and it is not there, it will break.

Do you understand that the same way or am I interpreting that wrong?


From: C. Scott Andreas 
Sent: Thursday, July 13, 2023 16:35
To: dev@cassandra.apache.org
Cc: dev
Subject: Re: Changing the output of tooling between majors




"From what I see you guys want to condition any change by offering json/yaml as 
well."

I don't think I've seen a proposal to block changes to nodetool output on 
machine-parseable formats in this thread.

Additions of new delimited fields to nodetool output are mostly 
straightforward. Changes to fields that exist today are likely to cause 
problems - as Josh mentions. These seem best to take on a case-by-case basis 
rather than trying to hammer out an abstract policy. What changes would you 
like to make?

I do think we will have difficulty evolving output formats of text-based 
Cassandra tooling until we offer machine-parseable output formats.

– Scott

On Jul 13, 2023, at 6:39 AM, Josh McKenzie  wrote:


I just find it ridiculous we can not change "someProperty: 10" to "Some 
Property: 10" and there is so much red tape about that.
Well, we're talking about programmatic parsing here. This feels like 
complaining about a 

Re: [DISCUSS] Conducting a User Survey

2023-07-11 Thread German Eichberger via dev
Same. Great idea. How will the results be published?

Thanks,
German

From: C. Scott Andreas 
Sent: Tuesday, July 11, 2023 7:41 AM
To: dev@cassandra.apache.org 
Cc: dev@cassandra.apache.org ; 
market...@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Conducting a User Survey

Thanks Patrick. I like the idea of a user survey.

Added a handful of comments in the doc. 

– Scott

On Jul 11, 2023, at 12:51 AM, Mick Semb Wever  wrote:


Looks good to me, thanks Patrick.

On Tue, 11 Jul 2023 at 03:11, Patrick McFadin  wrote:

For quite a few years, I have done Twitter polls to gather helpful information 
about how people use Apache Cassandra. Twitter is no longer the best place to 
conduct this kind of activity since it has become a ghost town.

We should ask more comprehensive questions to get the pulse of our user 
community. I want to do a simple Google Form survey that we can promote on 
every channel for a few weeks. I'll anonymize the results and post them on 
cassandra.apache.org.

Here are the proposed questions I have compiled. A pretty basic set of 
questions, but it would be fun to know the answer to several of these: 
https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing

Comments are open to all. Please let me know what you think.

Patrick




Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-10 Thread German Eichberger via dev
Same - really appreciate those efforts and also welcome the upstreaming and 
release automation...

German

From: Jeff Widman 
Sent: Sunday, July 9, 2023 1:44 PM
To: Max C. 
Cc: dev@cassandra.apache.org ; Brad Schoening 

Subject: [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as 
part of the release process

Thanks Max, always encouraging to hear that the time I spend on open source is 
helping others.

Your use case is very similar to what drove my original desire to get involved 
with the project. Being able to `pip install cqlsh` from a dev machine was so 
much lighter weight than the alternatives.

Anyone else care to weigh in on this?

What are the next steps to move to a decision?

Cheers,
Jeff

On Sat, Jul 8, 2023, 7:23 PM Max C. wrote:

As a user, I really appreciate your efforts Jeff & Brad.  I would *love* for 
the C* project to officially support this.

In our environment we have a lot of client machines that all share common NFS 
mounted directories.  It's much easier for us to create a Python virtual 
environment on a file server with the cqlsh PyPI package installed than it is 
to install the Cassandra RPMs on every single machine.  Before I discovered 
your PyPI package, our developers would need to log in to a Cassandra node in 
order to run cqlsh.  The cqlsh PyPI package, however, is in our standard 
"python dev tools" virtual environment -- along with Ansible, black, isort and 
various other Python packages; which means it's accessible to everyone, 
everywhere.

I agree that this should not replace packaging cqlsh in the Cassandra RPM, but 
rather provide an additional option for installing cqlsh without the baggage of 
installing the full Cassandra package.

Thanks again for your work Jeff & Brad.

- Max

On 7/6/2023 5:55 PM, Jeff Widman wrote:
Brad Schoening and I currently maintain https://pypi.org/project/cqlsh/ 
which repackages CQLSH that ships with every Cassandra release.

This way:

  *   anyone who wants a lightweight client to talk to a remote cassandra can 
simply `pip install cqlsh` without having to download the full cassandra 
source, unzip it, etc.
  *   it's very easy for folks to use it as scaffolding in their python 
scripts/tooling since they can simply include it in the list of their required 
dependencies.

We currently handle the packaging by waiting for a release, then manually 
copy/pasting the code out of the cassandra source tree into 
https://github.com/jeffwidman/cqlsh which has some additional build/python 
package configuration files, then using standard python tooling to publish to 
PyPI.

Given that our project is simply a build/packaging project, I wanted to start a 
conversation about upstreaming this into core Cassandra. I realize that 
Cassandra has no interest in maintaining lots of build targets... but given 
that cqlsh is written in Python and publishing to PyPI enables DBA's to share 
more complicated tooling built on top of it this seems like a natural fit for 
core cassandra rather than a standalone project.

Goal:
When a Cassandra release happens, the build/release process automatically 
publishes cqlsh to https://pypi.org/project/cqlsh/.

Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There was 
some initial chatter about that in 
https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a lot of 
complexity, and I'm honestly not sure it's a great idea. Even if folks later 
want to go that route, the first hurdle is publishing to PyPI, so for now let's 
keep the scope of the discussion limited to treating PyPI purely as a release 
target, and not as an ingredient to a release.

From an implementation perspective, this should be very straightforward. We 
don't have any differences from the CQLSH source that's in cassandra, instead 
we point folks to make changes to cqlsh in the Cassandra source. In fact we've 
made multiple contributions back to `cqlsh` ourselves and have drastically 
cleaned up the code: 
https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening=pullrequests.
So the only real change is adding the package config files and the build / 
release pipeline.

We realize the Cassandra team isn't python/PyPI experts, so we'd be more than 
happy to help wire this up and maintain it. I am also a maintainer of kazoo and 
kafka-python which are both popular python clients for other distributed 
databases. So I'm very familiar with open source, python, and distributed 
databases.

My one hesitation around this discussion is that I'm a little concerned that we 
might lose the nimbleness we've currently got from having a separate project. 
Ie, if something is screwed up on PyPI / the build process, we can quickly get 
it fixed and get a new 

Re: [DISCUSS] Using ACCP or tc-native by default

2023-06-23 Thread German Eichberger via dev
+1 to ACCP - we love performance.

From: David Capwell 
Sent: Thursday, June 22, 2023 4:21 PM
To: dev 
Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default

+1 to ACCP

On Jun 22, 2023, at 3:05 PM, C. Scott Andreas  wrote:

+1 for ACCP and can attest to its results. ACCP also optimizes for a range of 
hash functions and other cryptographic primitives beyond TLS acceleration for 
Netty.

On Jun 22, 2023, at 2:07 PM, Jeff Jirsa  wrote:


Either would be better than today.

On Thu, Jun 22, 2023 at 1:57 PM Jordan West wrote:
Hi,

I’m wondering if there is appetite to change the default SSL provider for 
Cassandra going forward to either ACCP [1] or tc-native in Netty? Our 
deployment as well as others I’m aware of make this change in their fork and it 
can lead to significant performance improvement. When recently qualifying 4.1 
without using ACCP (by accident) we noticed p99 latencies were 2x higher than 
3.0 w/ ACCP. Wiring up ACCP can be a bit of a pain and also requires some 
amount of customization. I think it could be great for the wider community to 
adopt it.

The biggest hurdle I foresee is licensing but ACCP is Apache 2.0 licensed. 
Anything else I am missing before opening a JIRA and submitting a patch?

Jordan


[1]
https://github.com/corretto/amazon-corretto-crypto-provider




Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread German Eichberger via dev
+ 1

Great to see this moving forward!

From: Abe Ratnofsky 
Sent: Tuesday, June 13, 2023 10:09 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [VOTE] CEP-8 Datastax Drivers Donation

+1 (nb)

On Jun 13, 2023, at 09:23, Andrés de la Peña  wrote:


+1

On Tue, 13 Jun 2023 at 16:40, Yifan Cai wrote:
+1

From: David Capwell
Sent: Tuesday, June 13, 2023 8:37:10 AM
To: dev
Subject: Re: [VOTE] CEP-8 Datastax Drivers Donation

+1

On Jun 13, 2023, at 7:59 AM, Josh McKenzie wrote:

+1

On Tue, Jun 13, 2023, at 10:55 AM, Jeremiah Jordan wrote:
+1 nb

On Jun 13, 2023 at 9:14:35 AM, Jeremy Hanna wrote:

Calling for a vote on CEP-8 [1].

To clarify the intent, as Benjamin said in the discussion thread [2], the goal 
of this vote is simply to ensure that the community is in favor of the 
donation. Nothing more.
The plan is to introduce the drivers, one by one. Each driver donation will 
need to be accepted first by the PMC members, as it is the case for any 
donation. Therefore the PMC should have full control on the pace at which new 
drivers are accepted.

If this vote passes, we can start this process for the Java driver under the 
direction of the PMC.

Jeremy

1. 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp



Re: Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread German Eichberger via dev
+ 1

I am seeing ANN Vector Search pop up in every database...

From: Patrick McFadin 
Sent: Thursday, May 25, 2023 11:29 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [VOTE] CEP-30 ANN Vector Search

+1
Love the buzz this is creating with new users. Thanks for the work on this 
Jonathan.

On Thu, May 25, 2023 at 8:45 AM Jonathan Ellis wrote:
Let's make this official.

CEP: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries: 
https://github.com/datastax/cassandra/tree/cep-vsearch

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-14 Thread German Eichberger via dev
All,

What does it mean to be OpenSource? For me the community is 
developers/maintainers who work on Cassandra, operators who run Cassandra, and 
developers who write applications which use Cassandra. We all need to work 
together to make Cassandra successful - and we need to listen to each other to 
make the project successful.

It's apparent that a sizable number of people haven't migrated from 3.11 to 4.x 
- this might be because the EOL announcement has been confusing and what EOL 
means is fuzzy. Does the project still fix CVEs, will there be infrastructure 
if someone wants to fix something, etc.  So at a minimum I would expect 
documentation and agreement around those things.

If you look at Ubuntu and Java they distinguish between LTS releases and normal 
releases - but they have also been doing this for a long time. The quicker release 
cycle (a new release every year) is sort of new-ish and hasn't been digested by 
all operators and users. So giving 3.11 extra support for a limited time to 
aid the transition, like OpenJDK is doing for Java 8, might be prudent - Mick 
raises a valid point that going out and saying "this is the new EOL, but this 
time we mean it" might encourage people to hope for another extension. I have 
no good answer other than communicate harder and more clearly - the status quo 
lacks clarity which is worse.

The other point Mick raises, which releases to support, leads to another 
discussion: as of today operators need to upgrade every two years (and also 
jump versions), i.e. I would need to go 3.11->4.1 right when it came out to get 
the full two years of "support". I might feel uncomfortable going to a release 
which has just come out, so realistically I need to upgrade somewhere between one 
and two years in - give or take. This raises the question of whether we should 
designate some versions as LTS releases, meaning they get longer support. Five years is 
common but that is also up for discussion. As an added benefit, if there are 
commercial entities wanting to offer paid support they could focus on the LTS 
releases and bundle resources for the upstream support.

This is a good discussion and I feel especially the implied CVE support needs 
to be more formalized.

Thanks for indulging me,
German


From: Jacek Lewandowski 
Sent: Thursday, April 13, 2023 11:23 PM
To: dev@cassandra.apache.org 
Subject: Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

To me, as this is an open source project, we, the community, do not have to do 
anything, we can, but we are not obliged to, and we usually do that because we 
want to :-)

To me, EOL means that we move focus to newer releases. Not that we are 
forbidden to do anything in the older ones. One formal point though is the 
machinery - as long as we have the machinery to test and release, that's all we 
need. However, in face of coming changes in testing, I suppose some extra 
effort will have to be done to support older versions. Finding people who want 
to help out with that could be a kind of validation whether that effort is 
justified.

btw. We have recently agreed to keep support for M sstables format (3.0 - 3.11).

thanks,
- - -- --- -  -
Jacek Lewandowski


On Thu, 13 Apr 2023 at 21:59, Mick Semb Wever <m...@apache.org> wrote:
Yes, this would be great. Right now users are confused what EOL means and what 
they can expect.


I think the project would need to land on an agreed position.  I tried to find 
any reference to my earlier statement around CVEs on the latest unmaintained 
branch but could not find it (I'm sure it was mentioned somewhere :(

How many past branches?  All CVEs?  What if CVEs are in dependencies?
And is this a slippery slope, will such a formalised and documented commitment 
lead to more users on EOL versions? (see below)
How do other committers feel about this?


I am also asking specifically for 3.11 since this release has been around so 
long that it might warrant longer support than what we would offer for 4.0.


This logic can also be the other way around :-)

We should be sending a clear signal that OSS users are expected to perform a 
major upgrade every ~two years.  Vendors can, and are welcome to solve this, 
but the project itself does not support any user's production system, it only 
maintains code branches and performs releases off them, with our focus on 
quality solely on those maintained branches.



Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-13 Thread German Eichberger via dev
Josh,





We already have an understanding and precedent in place that CVEs on
the previous unmaintained branch are addressed and released.
Correct me if I'm wrong German, but the question I got from your email was 
effectively "If we  consider formalizing our commitment to fixing CVE's on 
older branches that are out of formal bugfix support as a community, what are 
the benefits and costs to doing that"?

Yes, this would be great. Right now users are confused what EOL means and what 
they can expect.

I am also asking specifically for 3.11 since this release has been around so 
long that it might warrant longer support than what we would offer for 4.0.

On Thu, Apr 13, 2023, at 2:47 PM, Mick Semb Wever wrote:
>
> There have been several discussions on slack [1], [2] to support 3.11 beyond 
> the date stated on the web [3] which is May-July 23 and given it's April 
> that's an unlikely date.
>


Strictly speaking it is maintained until the 5.0 GA release. We should
update the downloads page accordingly.


>
> So we will support anyway but I would like to start a broader discussion if 
> we, the community, are interested in at a minimum CVE only support, maybe bug 
> fixes as well,  after 5.0 is released for 3.11 and if so for how long - 
> something like a Cassandra LTS policy.
>



The community's resources are limited, and the statement is intended
to avoid tying up resources and to avoid letting users down. This is
open source and "to upgrade" is often our easy and pragmatic answer.

It is not a statement that fixes to older branches will be rejected. A
(two) committers can still push to older branches, and a release can
still happen if you find someone to do it (and three PMCs to +1 it).
This is why the 2.2 branch is still present on ci-cassandra.a.o. If
vendors want to provide support for versions longer and can make the
commitment to upstream those efforts (whether that's bug-fixes and
releases, or only bug-fixes) the machinery is in place to accept it.

We already have an understanding and precedent in place that CVEs on
the previous unmaintained branch are addressed and released.




Re: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT

2023-04-13 Thread German Eichberger via dev
+1

(Recently learned anyone can vote, so I'm using my newly discovered powers)

From: Josh McKenzie 
Sent: Thursday, April 13, 2023 6:59 AM
To: dev 
Subject: [EXTERNAL] Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT

+1

On Thu, Apr 13, 2023, at 3:17 AM, Benjamin Lerer wrote:
+1

On Thu, 13 Apr 2023 at 08:56, Tommy Stendahl via dev <dev@cassandra.apache.org> wrote:
+1 (nb)

-Original Message-
From: Brandon Williams <dri...@gmail.com>
Reply-To: dev@cassandra.apache.org
To: dev@cassandra.apache.org
Subject: Re: [VOTE] Release Apache Cassandra 4.0.9 - SECOND ATTEMPT
Date: Tue, 11 Apr 2023 05:30:59 -0500


+1



On Tue, Apr 11, 2023 at 2:54 AM Miklosovic, Stefan <stefan.mikloso...@netapp.com> wrote:


Let's just vote on that straight away. Nothing significant has changed except the 
zstd-jni update to 1.5.5. If all goes well it would be nice to have the vote 
resolved by noon UTC this Friday.


Proposing the test build of Cassandra 4.0.9 for release.


sha1: e9f8f2efa2ba75f223f31ca6801aff3fe2964745

Git:

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.9-tentative


Maven Artifacts:

https://repository.apache.org/content/repositories/orgapachecassandra-1286/org/apache/cassandra/cassandra-all/4.0.9/



The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here:

https://dist.apache.org/repos/dist/dev/cassandra/4.0.9/



The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.


[1]: CHANGES.txt:

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.9-tentative


[2]: NEWS.txt:

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.9-tentative


(CVE only) support for 3,11 beyond published EOL

2023-04-13 Thread German Eichberger via dev
All,

There have been several discussions on slack [1], [2] to support 3.11 beyond 
the date stated on the web [3] which is May-July 23 and given it's April that's 
an unlikely date.

Given that there are still a sizable number of users on 3.11 in [2] we talked 
about a CVE only support for some time. When we discussed that internally at 
Azure we entertained supporting until Java 8 is EOL but with that now being 
12/31/2030 [4] we quickly gave up on that and are now thinking a shorter time. 
So we will support anyway but I would like to start a broader discussion if we, 
the community, are interested in at a minimum CVE only support, maybe bug fixes 
as well,  after 5.0 is released for 3.11 and if so for how long - something 
like a Cassandra LTS policy.

Thanks,
German




[1] https://the-asf.slack.com/archives/CJZLTM05A/p1678451091766109
[2] https://the-asf.slack.com/archives/CK23JSY2K/p1680125394038599
[3] https://cassandra.apache.org/_/download.html
[4] https://endoflife.date/java


Re: [EXTERNAL] Re: Welcome our next PMC Chair Josh McKenzie

2023-03-31 Thread German Eichberger via dev
Thanks Mick - working with you has been a blast and I hope we can continue!

Welcome and congrats Josh!!

From: Dinesh Joshi 
Sent: Friday, March 24, 2023 8:31 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: Welcome our next PMC Chair Josh McKenzie

Thank you Mick for all the work you did!

Welcome Josh and congratulations!

On 3/23/23 01:22, Mick Semb Wever wrote:
> It is time to pass the baton on, and on behalf of the Apache Cassandra
> Project Management Committee (PMC) I would like to welcome and
> congratulate our next PMC Chair Josh McKenzie (jmckenzie).
>
> Most of you already know Josh, especially through his regular and
> valuable project oversight and status emails, always presenting a
> balance and understanding to the various views and concerns incoming.
>
> Repeating Paulo's words from last year: The chair is an administrative
> position that interfaces with the Apache Software Foundation Board, by
> submitting regular reports about project status and health. Read more
> about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>
> The PMC as a whole is the entity that oversees and leads the project and
> any PMC member can be approached as a representative of the committee. A
> list of Apache Cassandra PMC members can be found
> on: https://cassandra.apache.org/_/community.html
> 

Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-03 Thread German Eichberger via dev
Hi,

We shouldn't release just for release's sake. Are there enough new features, and 
are they working well enough (quality!)?

The big feature from our perspective for 5.0 is ACCORD (CEP-15) and I would 
advocate to delay until this has sufficient quality to be in production.

Just because something is released doesn't mean anyone is gonna use it. To add 
some operator perspective: Every time there is a new release we need to decide
1) are we supporting it
2) which other release can we deprecate

and potentially migrate people - which is also a tough sell if there are no 
significant features and/or there are breaking changes. So from my perspective less 
frequent releases are better - after all, we haven't gotten around to supporting 
4.1 yet.

The 5.0 release is also coupled with deprecating 3.11, which is what a 
significant number of people are using. Given 4.1 took longer, I am not sure 
how many people are assuming that 5.0 will be delayed and haven't made plans 
(OpenJDK support for 8 is longer than for Java 17). So being a bit more 
deliberate with releasing 5.0 and having a longer beta phase are things we 
should consider.

My 2cts,
German

From: Benedict 
Sent: Wednesday, March 1, 2023 5:59 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] Next release date

It doesn’t look like we agreed to a policy of annual branch dates, only annual 
releases and that we would schedule this for 4.1 based on 4.0’s branch date. 
Given this was the reasoning proposed I can see why folk would expect this 
would happen for the next release. I don’t think there was a strong enough 
commitment here to be bound by it if we think different maths would work 
better.

I recall the goal for an annual cadence was to ensure we don’t have lengthy 
periods between releases like 3.x to 4.0, and to try to reduce the pressure 
certain contributors might feel to hit a specific release with a given feature.

I think it’s better to revisit these underlying reasons and check how they 
apply than to pick a mechanism and stick to it too closely.

The last release was quite recent, so we aren’t at risk of slow releases here. 
Similarly, there are some features that the *project* would probably benefit 
from landing prior to release, if this doesn’t push release back too far.




On 1 Mar 2023, at 13:38, Mick Semb Wever  wrote:


My thoughts don't touch on CEPs inflight.



For the sake of broadening the discussion, additional questions I think 
worthwhile to raise are…

1. What third parties, or other initiatives, are invested and/or working 
against the May deadline? and what are their views on changing it?
  1a. If we push branching back to September, how confident are we that we'll 
get to GA before the December Summit?
2. What CEPs look like not landing by May that we consider a must-have this 
year?
  2a. Is it just tail-end commits in those CEPs that won't make it? Can these 
land (with or without a waiver) during the alpha phase?
  2b. If the final components to specified CEPs are not approved/appropriate to 
land during alpha, would it be better if the project commits to a one-off 
half-year release later in the year?


Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-02-13 Thread German Eichberger via dev
First, one of my learnings was that a ticket assigned to an issue in one branch 
of butler doesn't carry to another. So always search.

New failures from build lead week 7:

I created a Jira filter for finding the tickets I created: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20component%20in%20(%22Test%2Fdtest%2Fjava%22%2C%20%22Test%2Fdtest%2Fpython%22%2C%20%22Test%2Ffuzz%22%2C%20%22Test%2Funit%22)%20AND%20created%20%3E%3D%20-7d%20AND%20reporter%20in%20(xgerman42)
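
For anyone who wants the same filter for their own build lead week, the URL above is just a percent-encoded JQL query. A minimal sketch of how it could be rebuilt for another reporter follows; the helper name `build_filter_url` and its parameters are made up for illustration, while the JQL clauses are taken from the URL above.

```python
from urllib.parse import quote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def build_filter_url(reporter: str, days: int = 7) -> str:
    """Build a Jira search URL for test-failure bugs filed by one reporter.

    Hypothetical helper; the query mirrors the filter linked above.
    """
    components = ["Test/dtest/java", "Test/dtest/python", "Test/fuzz", "Test/unit"]
    quoted_components = ", ".join(f'"{c}"' for c in components)
    jql = (
        "project = CASSANDRA AND issuetype = Bug "
        f"AND component in ({quoted_components}) "
        f"AND created >= -{days}d "
        f"AND reporter in ({reporter})"
    )
    # Keep parentheses literal, as in the URL above; spaces, quotes,
    # slashes, commas, '=' and '>' are percent-encoded.
    return JIRA_SEARCH + quote(jql, safe="()")
```

Calling `build_filter_url("xgerman42")` should reproduce the link above; swapping in another Jira username gives that person's tickets from the last seven days.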

*** CASSANDRA-18257 - Test Failures: 
org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome
- linked in 4.0, 4.1, trunk

*** CASSANDRA-18253 - Test Failures: dtest 
repair_tests.repair_test.TestRepair.test_simple_sequential_repair
- linked in 4.0, trunk

*** CASSANDRA-18246 - Test Failures: 
org.apache.cassandra.cql3.validation.operations.TTLTest.testCapNoWarnExpirationOverflowPolicy
- linked in 3.11

*** CASSANDRA-18245 - Test Failures: 
org.apache.cassandra.db.compaction.CompactionsTest.testDontPurgeAccidentally
- linked in 3.11


From: Dan Jatnieks 
Sent: Friday, February 10, 2023 2:42 PM
To: dev@cassandra.apache.org ; Claude Warren, Jr 

Subject: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

New Failures from Build Lead Week 6:

*** CASSANDRA-18021 - Flaky 
org.apache.cassandra.distributed.test.ReprepareTestOldBehaviour#testReprepareMixedVersionWithoutReset
- This existing ticket has been linked in butler to new failures on 3.11

*** CASSANDRA-17608 - Fix testMetricsWithRebuildAndStreamingToTwoNodes
- Re-opened as intermittent failure occurred in build 1445 on trunk

Several new failures had only a single occurrence; no new tickets were opened 
during this time.



On Fri, Feb 10, 2023 at 12:44 AM Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
New Failures from Build Lead Week 5

*** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'" 
reported in multiple tests
- reported in 4.1, 3.11, and 3.0
- identified as a possible class loader issue associated with CASSANDRA-18150

*** CASSANDRA-18191 - Native Transport SSL tests failing
- TestNativeTransportSSL.test_connect_to_ssl and 
TestNativeTransportSSL.test_connect_to_ssl (novnode)
- TestNativeTransportSSL.test_connect_to_ssl_optional and 
TestNativeTransportSSL.test_connect_to_ssl_optional (novnode)


On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
New failures from Build Lead Week 4:

*** CASSANDRA-18188 - Test failure in 
upgrade_tests.cql_tests.cls.test_limit_ranges
- trunk
- AttributeError: module 'py' has no attribute 'io'

*** CASSANDRA-18189 - Test failure in 
cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
- 4.0
- assert 10 == 94764
- other failures currently open in this test class, but at least superficially, 
different errors (see CASSANDRA-17322, CASSANDRA-18162)

Timeouts continue to manifest in many places.

On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever <m...@apache.org> wrote:
*** The Butler (Build Lead)

The introduction of Butler and the Build Lead was a wonderful
improvement to our CI efforts.  It has brought a lot of hygiene in
listing out flakies as they happened.  Note that this has in turn
increased the burden in getting our major releases out, but that's to
be seen as a one-off cost.


New Failures from Build Lead Week 3.


*** CASSANDRA-18156 – 
repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
 - AssertionError: Node logs don't have an error message for the failed repair
 - hard regression
 - 3.0, 3.11,

*** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match what 
was written with serialize(out, 12) for verb PAXOS2_COMMIT_AND_PREPARE_RSP
 - serializer class org.apache.cassandra.net.Message$Serializer; expected 1077, 
actual 1079
 - 4.1, trunk

*** CASSANDRA-18158 – 
org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
 - Cannot achieve consistency level ALL
 - 3.11, trunk

*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
  - AssertionError: null in MemtablePool$SubPool.released(MemtablePool.java:193)
 - 3.11, 4.0, 4.1, trunk

*** CASSANDRA-18160 – 
cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
 - Found orphaned index file in after CDC state not in former
 - 4.1, trunk

*** CASSANDRA-18161 – 
org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
 - AssertionFailedError in 

Re: [EXTERNAL] Welcome Patrick McFadin as Cassandra Committer

2023-02-09 Thread German Eichberger via dev
Congratulations! Surprised Patrick wasn't a committer already...

From: Benjamin Lerer 
Sent: Thursday, February 2, 2023 9:58 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Welcome Patrick McFadin as Cassandra Committer

The PMC members are pleased to announce that Patrick McFadin has accepted
the invitation to become committer today.

Thanks a lot, Patrick, for everything you have done for this project and its 
community through the years.

Congratulations and welcome!

The Apache Cassandra PMC members


Re: [ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-30 Thread German Eichberger via dev
Great news indeed. I am wondering what it would take to include projects 
everyone is using, like medusa, reaper, cassandra-ldap, etc., as subprojects.

Thanks,
German

From: Francisco Guerrero 
Sent: Friday, January 27, 2023 9:46 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] Evolving governance in the Cassandra 
Ecosystem

Great news! I'm very happy to see these changes coming soon.

Thanks to everyone involved in this work.

On 2023/01/26 21:21:01 Josh McKenzie wrote:
> The Cassandra PMC is pleased to announce that we're evolving our governance 
> procedures to better foster subprojects under the Cassandra Ecosystem's 
> umbrella. Astute observers among you may have noticed that the Cassandra 
> Sidecar is already a subproject of Apache Cassandra as of CEP-1 
> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224)
>  and Cassandra-14395 
> (https://issues.apache.org/jira/browse/CASSANDRASC-24),
>  however up until now we haven't had any structure to accommodate raising 
> committers on specific subprojects or clarity on the addition or governance 
> of future subprojects.
>
> Further, with the CEP for the driver donation in motion 
> (https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
>  the need for a structured and sustainable way to expand the Cassandra 
> Ecosystem is pressing.
>
> We'll document these changes in the confluence wiki as well as the sidecar as 
> our first formal subproject after any discussion on this email thread. The 
> new governance process is as follows:
> -
>
> Subproject Governance
> 1. The Apache Cassandra PMC is responsible for governing the broad Cassandra 
> Ecosystem.
> 2. The PMC will vote on inclusion of new interested subprojects using the 
> existing procedural change vote process documented in the confluence wiki 
> (Super majority voting: 66% of votes must be in favor to pass. Requires 50% 
> participation of roll call).
> 3. New committers for these subprojects will be nominated and raised, both at 
> inclusion as a subproject and over time. Nominations can be brought to 
> priv...@cassandra.apache.org. Typically we're looking for a mix of commitment 
> and contribution to the community and project, be it through code, 
> documentation, presentations, or other significant engagement with the 
> project.
> 4. While the commit-bit is ecosystem wide, code modification rights and 
> voting rights (technical contribution, binding -1, CEP's) are granted per 
> subproject
>  4a. Individuals are trusted to exercise prudence and only commit or 
> claim binding votes on approved subprojects. Repeated violations of this 
> social contract will result in losing committer status.
>  4b. Members of the PMC have commit and voting rights on all subprojects.
> 5. For each subproject, the PMC will determine a trio of PMC members that 
> will be responsible for all PMC specific functions (release votes, driving 
> CVE response, marketing, branding, policing marks, etc) on the subproject.
> -
>
> Curious to see what thoughts we have as a community!
>
> Thanks!
>
> ~Josh
>
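
As a reading aid for point 2 of the governance list above (66% of votes in favor, 50% participation of roll call), here is a small sketch of how the arithmetic might work. The function name and the treatment of abstentions are assumptions for illustration; the confluence wiki is the authoritative description of the procedure.

```python
def procedural_vote_passes(in_favor: int, against: int, roll_call: int) -> bool:
    """Sketch of the super-majority rule quoted above.

    Assumptions (not confirmed by the announcement): only +1/-1 votes
    count as "votes", and participation is votes cast over roll call size.
    """
    cast = in_favor + against
    if roll_call <= 0 or cast == 0:
        return False
    # At least 50% of the roll call must have voted.
    has_quorum = 2 * cast >= roll_call
    # At least 66% of the votes cast must be in favor (integer arithmetic
    # avoids floating-point rounding at the boundary).
    has_supermajority = 100 * in_favor >= 66 * cast
    return has_quorum and has_supermajority
```

With a roll call of 20, seven votes for and three against would pass under these assumptions (10 of 20 participated, 70% in favor), while six for and four against would not.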


Re: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-03 Thread German Eichberger via dev
All,

This is a great idea and I am looking forward to it.

 Having dedicated consistent hardware is a good way to find regressions in the 
code but orthogonal to that is "certifying" new hardware to run with Cassandra, 
e.g. is there a performance regression when running on AMD? ARM64? What about 
more RAM? faster SSD?

What has limited us in perf testing in the past was the lack of a "representative" 
benchmark with clear recommendations, so I am hoping that this work will produce 
a reference test suite with at least some hardware recommendation for the 
machine running the tests to make things more comparable. Additionally, some 
perf tests keep increasing the load until latency hits a certain threshold and 
others do some operations and measure how long it took. What types of tests 
were you aiming for?
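
To make the two styles concrete, here is a rough sketch of each. Everything in it is hypothetical (function names, parameters, and the idea that a caller supplies a latency probe); it is not tied to any existing Cassandra tool, just an illustration of the distinction.

```python
import time
from typing import Callable


def ramp_until_threshold(measure_latency_ms: Callable[[int], float],
                         start: int, step: int,
                         threshold_ms: float, max_load: int) -> int:
    """Style 1: raise the load until latency exceeds a threshold.

    Returns the highest load (e.g. ops/s) that stayed within
    threshold_ms, or 0 if even the starting load was too slow.
    """
    best = 0
    load = start
    while load <= max_load:
        if measure_latency_ms(load) > threshold_ms:
            break
        best = load
        load += step
    return best


def time_fixed_work(operation: Callable[[], None], repetitions: int) -> float:
    """Style 2: run a fixed amount of work and report elapsed seconds."""
    started = time.perf_counter()
    for _ in range(repetitions):
        operation()
    return time.perf_counter() - started
```

The first style answers "how much load can this setup sustain", the second "how long does this workload take"; the results are not directly comparable, which is why it matters which one the reference suite uses.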

The proposal also doesn't talk much about the test matrix. Will all supported 
Cassandra versions be tested with the same tests or will there be version 
specific tests?

I understand that we need to account for variances in configuration and hardware, 
but I am wondering if we can have more than just the sha. For example, the 
complete cassandra.yaml for a test should be checked in as well - also we 
should encourage people not to change too much from the reference test. 
Different hardware, different cassandra.yaml, and different tests will just 
create numbers which are hard to make sense of.

Really excited about this - thanks for the great work,
German


From: Josh McKenzie 
Sent: Friday, December 30, 2022 7:41 AM
To: dev 
Subject: [EXTERNAL] [DISCUSS] Taking another(other(other)) stab at performance 
testing

There was a really interesting presentation from the Lucene folks at ApacheCon 
about how they're doing perf regression testing. That combined with some recent 
contributors wanting to get involved on some performance work and not having 
much direction or clarity on how to get involved led some of us to come 
together and riff on what we might be able to take away from that presentation 
and context.

Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks": 
https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p

Their nightly indexing benchmark site: 
https://home.apache.org/~mikemccand/lucenebench/indexing.html

I checked in with a handful of performance-minded contributors in early 
December and we came up with a first draft, then some others of us met on an 
ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like 
that linked - I believe Joey Lynch has that).

Here's where we landed after the discussions earlier this month (1st page, 
estimated reading time 5 minutes): 
https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#

Curious to hear what other perspectives there are out there on the topic.

Early Happy New Years everyone!

~Josh