Re: [DISCUSS] ccm as a subproject

2024-05-20 Thread Josh McKenzie
Sounds like we have a general consensus from the project on being willing to 
accept the donation, should the current rights owners be interested in said 
donation.

> We've been working on this along with the python-driver (just haven't raised 
> it yet).
Which they indicate they are. :)

I'll follow up on this topic offline w/Mick. Thanks everyone for the good 
conversation and feedback on it.

~Josh

On Mon, May 20, 2024, at 2:36 PM, Jordan West wrote:
> I would also love to see CCM as an official side project. It is important to 
> the project and I personally use it regularly. 
> 
> Jordan
> 
> On Thu, May 16, 2024 at 7:55 AM Josh McKenzie  wrote:
>>> We do still have the issues of DSE-supporting code in it, as we do with the 
>>> drivers.  I doubt any of us strongly object to it: there's no trickery 
>>> happening here on the user; but we should be aware of it and have a rough 
>>> direction sketched out for when someone else comes along wanting to add 
>>> support for their proprietary product.
>> IMO as long as it's documented well at the outset and we have plans to 
>> slowly refactor to move it to clean boundaries (epic in JIRA anyone <3) so 
>> it can be extracted into a separately maintained module by folks that need 
>> it, I think we'd be in great shape. That'd also pave a path for others 
>> wanting to add support for their proprietary products as well. Win-win.
>> 
>> There's always this chicken or egg problem w/things like ccm. Do people not 
>> contribute to it because it's out of the umbrella, or is it out of the 
>> umbrella because people don't need to contribute to it?
>> 
>> I hadn't thought about other subprojects relying on it. That's a very good 
>> point.
>> 
>> On Thu, May 16, 2024, at 4:48 AM, Jacek Lewandowski wrote:
>>> +1 (my personal opinion)
>>> 
>>> How to deal with the DSE-supporting code is a separate discussion IMO
>>> 
>>> - - -- --- -  -
>>> Jacek Lewandowski
>>> 
>>> 
>>> On Thu, 16 May 2024 at 10:21, Berenguer Blasi  wrote:
>>>> +1 ccm is super useful
>>>> 
>>>> On 16/5/24 10:09, Mick Semb Wever wrote:
>>>>> 
>>>>> 
>>>>> On Wed, 15 May 2024 at 16:24, Josh McKenzie  wrote:
>>>>>> Right now ccm isn't formally a subproject of Cassandra or under 
>>>>>> governance of the ASF. Given it's an integral component of our CI as 
>>>>>> well as for local testing for many devs, and we now have more experience 
>>>>>> w/our muscle on IP clearance and ingesting / absorbing subprojects where 
>>>>>> we can't track down every single contributor to get an ICLA, seems like 
>>>>>> it might be worth revisiting the topic of donation of ccm to Apache.
>>>>>> 
>>>>>> For what it's worth, Sylvain originally and then DataStax after transfer 
>>>>>> have both been incredible and receptive stewards of the projects and 
>>>>>> repos, so this isn't about any response to any behavior on their part. 
>>>>>> Structurally, however, it'd be better for the health of the project(s) 
>>>>>> long-term to have ccm promoted in. As far as I know there was strong 
>>>>>> receptivity to that donation in the past but the IP clearance was the 
>>>>>> primary hurdle.
>>>>>> 
>>>>>> Anyone have any thoughts for or against?
>>>>>> 
>>>>>> https://github.com/riptano/ccm
>>>>> 
>>>>> 
>>>>> 
>>>>> We've been working on this along with the python-driver (just haven't 
>>>>> raised it yet).  It is recognised, like the python-driver, as a key 
>>>>> dependency that would best be in the project.
>>>>> 
>>>>> Obtaining the CLAs should be much easier, the contributors to ccm are 
>>>>> less diverse, being more the people we know already.
>>>>> 
>>>>> We do still have the issues of DSE-supporting code in it, as we do with 
>>>>> the drivers.  I doubt any of us strongly object to it: there's no 
>>>>> trickery happening here on the user; but we should be aware of it and 
>>>>> have a rough direction sketched out for when someone else comes along 
>>>>> wanting to add support for their proprietary product.  We also don't want 
>>>>> to be pushing downstream users to be having to create their own forks 
>>>>> either.
>>>>> 
>>>>> Great to see general consensus (so far) in receiving it :) 
>>>>>  
>> 


Re: [DISCUSS] Gossip Protocol Change

2024-05-16 Thread Josh McKenzie
I'm +1 to continuing work on CASSANDRA-18917 for all the reasons Jordan listed.

Sounds like the request was to hit the pause button until TCM merged rather 
than skipping the work entirely so that's promising.

On Thu, May 16, 2024, at 1:43 PM, Jon Haddad wrote:
> I have also recently worked with teams who lost critical data as a result 
> of gossip issues combined with collision in our token allocation.  I haven’t 
> filed a jira yet as it slipped my mind but I’ve seen it in my own testing as 
> well. I’ll get a JIRA in describing it in detail. 
>  
> It’s severe enough that it should probably block 5.0. 
> 
> Jon
> 
> On Thu, May 16, 2024 at 10:37 AM Jordan West  wrote:
>> I’m a big +1 on 18917 or more testing of gossip. While I appreciate that it 
>> makes TCM more complicated, gossip and schema propagation bugs have been the 
>> source of our two worst data loss events in the last 3 years. Data loss 
>> should immediately cause us to evaluate what we can do better. 
>> 
>> We will likely live with gossip for at least 1, maybe 2, more years. 
>> Otherwise outside of bug fixes (and to some degree even still) I think the 
>> only other solution is to not touch gossip *at all* until we are all 
>> TCM-only which I don’t think is practical or realistic. Recent changes to 
>> gossip in 4.1 introduced several subtle bugs that had serious impact (from 
>> data loss to loss of ability to safely replace nodes in the cluster). 
>> 
>> I am happy to contribute some time to this if lack of folks is the issue. 
>> 
>> Jordan 
>> 
>> On Mon, May 13, 2024 at 17:05 David Capwell  wrote:
>>> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917 which 
>>> lets you do deterministic gossip simulation testing across large clusters 
>>> within seconds… I stopped this work as it conflicted with TCM (they were 
>>> trying to merge that week) and it hit issues where some nodes never 
>>> converged… I didn’t have time to debug so I had to drop the patch…
>>> 
>>> This type of change would be a good reason to resurrect that patch as 
>>> testing gossip is super dangerous right now… its behavior is only in a few 
>>> people's heads and even then it's just bits and pieces scattered across 
>>> multiple people (and likely missing pieces)… 
>>> 
>>> My brain is far too fried right now to say your idea is safe or not, but 
>>> honestly feel that we would need to improve our tests (we have 0) before 
>>> making such a change… 
>>> 
>>> I do welcome the patch though...
>>> 
>>> 
 On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev 
  wrote:
 
 In looking into CASSANDRA-19580 I noticed something that raises a 
 question. With Gossip SYN it doesn't check for missing digests. If it's 
 empty for a shadow round it will add everything from endpointStateMap to the 
 reply. But why not include missing entries in normal replies? The 
 branching for reply handling of SYN requests could then be merged into a 
 single code path (though the shadow round handles empty state differently with 
 CASSANDRA-16213). The potential downside is a performance impact, as this 
 requires doing a set difference.
 
 For example, something along the lines of:
 
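 ```
 // Endpoints known locally for which the SYN sender sent no digest
 Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
 missing.removeAll(gDigestList.stream().map(GossipDigest::getEndpoint).collect(Collectors.toSet()));
 // A (0, 0) generation/version digest is older than any local state, so the
 // reply should then include full state for each of these endpoints
 for (InetAddressAndPort endpoint : missing)
 {
     gDigestList.add(new GossipDigest(endpoint, 0, 0));
 }
 ```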
 
 It seems odd to me that after shadow round for a new node we have 
 endpointStateMap with only itself as an entry. Then the only way it gets 
 the gossip state is by another node choosing to send the new node a gossip 
 SYN. That choice is random. Yes, this happens every second, so 
 eventually it's going to receive one (outside the issue of CASSANDRA-19580 
 where it doesn't if it's in a dead state like hibernate), but doesn't this 
 open up bootstrapping to failures on very large clusters, as it can take 
 longer before it's sent a SYN (as the odds of being chosen for a SYN get 
 lower)? For years I've been seeing bootstrap failures with 'Unable to contact 
 any seeds', but they are infrequent and I've never been able to figure out how 
 to reproduce them in order to open a ticket. I wonder if some of them have 
 been due to not receiving a SYN message before it does the seenAnySeed 
 check.
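
The large-cluster concern above can be put in rough numbers. The Java sketch below uses a deliberately simplified model that is an assumption, not Gossiper's actual target selection: `aware` existing nodes know about the new node, and each round every one of them sends one SYN to a peer chosen uniformly from `candidates` peers, the new node among them. All names are hypothetical. It computes the chance that the new node receives no SYN at all, once in closed form and once with a seeded (so repeatable) simulation.

```java
import java.util.Random;

// Hypothetical, simplified model (not Gossiper's real target selection):
// each round, every one of `aware` nodes sends one SYN to a peer chosen
// uniformly from `candidates` peers, one of which is the new node.
class SynOddsSketch {
    // Closed form: chance the new node receives no SYN in any of `rounds` rounds.
    static double pNoSynClosedForm(int candidates, int aware, int rounds) {
        return Math.pow(1.0 - 1.0 / candidates, (double) aware * rounds);
    }

    // Seeded, and therefore repeatable, Monte Carlo estimate of the same probability.
    static double pNoSynSimulated(int candidates, int aware, int rounds, int trials, long seed) {
        Random rng = new Random(seed);
        int misses = 0;
        for (int t = 0; t < trials; t++) {
            boolean received = false;
            for (int r = 0; r < rounds && !received; r++) {
                for (int a = 0; a < aware && !received; a++) {
                    // Index 0 stands in for the new node.
                    received = rng.nextInt(candidates) == 0;
                }
            }
            if (!received) misses++;
        }
        return (double) misses / trials;
    }

    public static void main(String[] args) {
        int aware = 3;   // hypothetical: e.g. the seeds contacted during the shadow round
        int rounds = 30; // roughly 30 seconds of once-per-second gossip
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("n=%d closed=%.4f simulated=%.4f%n", n,
                pNoSynClosedForm(n, aware, rounds),
                pNoSynSimulated(n, aware, rounds, 20_000, 42L));
        }
    }
}
```

With these made-up parameters the closed form gives roughly 0.40 at n=100 and about 0.91 at n=1000, so the odds of hearing nothing for 30 seconds climb quickly with cluster size while awareness of the new node stays limited. Real behaviour depends on how Gossiper actually picks targets and how fast knowledge of the new node spreads, which this model ignores.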


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Josh McKenzie
> More of a "how could we technically reach mars?" discussion than a "how we 
> get congress to authorize a budget to reach mars?"
Wow - that is genuinely a great simile. Really good point.

To Jeff's point - want to kick off a [DISCUSS] thread referencing this thread, 
Jon, so we can take the conversation there? Definitely think it's worth 
continuing from a technical perspective.

On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
> You can remove the shadowed values at compaction time, but you can’t ever 
> fully propagate the range update to point updates, so you’d be propagating 
> all of the range-update structures throughout everything forever. It’s JUST 
> like a range tombstone - you don’t know what it’s shadowing (and can’t, in 
> many cases, because the width of the range is uncountable for some types). 
> 
> Setting aside whether or not this construct is worth adding (I suspect a lot 
> of binding votes would say it’s not), the thread focuses on BETWEEN operator, 
> and there’s no reason we should pollute the conversation of “add a missing 
> SQL operator that basically maps to existing functionality” with creation of 
> a brand new form of update that definitely doesn’t map to any existing 
> concepts. 
> 
> 
> 
> 
> 
>> On May 14, 2024, at 10:05 AM, Jon Haddad  wrote:
>> 
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea.  The scenario you've described here is 
>> tricky but I'm not expecting it to be any worse than say, SAI, which (the 
>> last I checked) has O(N) complexity on returning result sets with regard to 
>> rows returned.  We've also merged in Vector search which has O(N) overhead 
>> with the number of SSTables.  We're still fundamentally looking at, in most 
>> cases, a limited number of SSTables and some merging of values.
>> 
>> Write updates are essentially a timestamped mask, potentially overlapping, 
>> and I suspect potentially resolvable during compaction by propagating the 
>> values.  They could be eliminated or narrowed based on how they've 
>> propagated by using the timestamp metadata on the SSTable.
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks Patrick and David for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
>>  wrote:
>>> Ranged update sounds like a disaster for compaction and read performance.
>>> 
>>> Imagine compacting or reading some SSTables in which a large number of 
>>> overlapping but non-identical ranges were updated with different values. It 
>>> gives me a headache by just thinking about it.
>>> 
>>> Ranged delete is much simpler, because the "value" is the same tombstone 
>>> marker, and it also is guaranteed to expire and disappear eventually, so 
>>> the performance impact of dealing with them at read and compaction time 
>>> doesn't suffer in the long term.
>>> 
>>> 
>>> On 14/05/2024 16:59, Benjamin Lerer wrote:
 It would be like range tombstones ... but much worse ;-). A tombstone is a 
 simple marker (deleted). An update can be far more complex.  
 
 On Tue, 14 May 2024 at 15:52, Jon Haddad  wrote:
> Is there a technical limitation that would prevent a range write that 
> functions the same way as a range tombstone, other than probably needing 
> a version bump of the storage format?
> 
> 
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:
>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. 
>> They do work on DELETE because under the hood in C* they get translated 
>> into range tombstones.
>> 
>> On Tue, 14 May 2024 at 02:44, David Capwell  wrote:
>>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this 
>>> work.
>>> 
 On May 13, 2024, at 7:40 AM, Patrick McFadin  
 wrote:
 
 This is a great feature addition to CQL! I get asked about it from 
 time to time but then people figure out a workaround. It will be great 
 to just have it available. 
 
 And right on Simon! I think the only project I had as a high school 
 senior was figuring out how many parties I could go to and still 
 maintain a passing grade. Thanks for your work here. 
 
 Patrick 
 
 On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  
 wrote:
> Hi everybody,
> 
> Just raising awareness that Simon is working on adding support for 
> the BETWEEN operator in WHERE clauses (SELECT and DELETE) in 
> CASSANDRA-19604. We plan to add support for it in conditions in a 
> separate patch.
> 
> The 
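
For readers new to the operator: a minimal Java sketch of the filtering BETWEEN performs on a clustering column, assuming SQL-style semantics where `c BETWEEN lo AND hi` is inclusive at both ends, i.e. the same as `c >= lo AND c <= hi`. The `NavigableMap` stands in for one partition's clustering order; `BetweenSketch` and its method are hypothetical, and this is not how CASSANDRA-19604 implements it.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: one partition's rows, keyed and ordered by an int clustering column.
class BetweenSketch {
    // Assumed semantics (SQL-style, inclusive at both ends):
    //   c BETWEEN lo AND hi   behaves like   c >= lo AND c <= hi
    static NavigableMap<Integer, String> between(NavigableMap<Integer, String> rows, int lo, int hi) {
        if (lo > hi) {
            // subMap would throw here; this sketch simply selects nothing.
            // The thread does not say how CASSANDRA-19604 treats this case.
            return new TreeMap<>();
        }
        return rows.subMap(lo, true, hi, true);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> rows = new TreeMap<>();
        for (int c = 1; c <= 5; c++) rows.put(c, "row" + c);
        System.out.println(between(rows, 2, 4).keySet());
    }
}
```

Here `between(rows, 2, 4)` selects clustering values 2, 3 and 4. Because BETWEEN reduces to the existing `>=`/`<=` bounds, it fits naturally into SELECT and DELETE, where range restrictions already work.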

Re: [DISCUSS] ccm as a subproject

2024-05-16 Thread Josh McKenzie
> We do still have the issues of DSE-supporting code in it, as we do with the 
> drivers.  I doubt any of us strongly object to it: there's no trickery 
> happening here on the user; but we should be aware of it and have a rough 
> direction sketched out for when someone else comes along wanting to add 
> support for their proprietary product.
IMO as long as it's documented well at the outset and we have plans to slowly 
refactor to move it to clean boundaries (epic in JIRA anyone <3) so it can be 
extracted into a separately maintained module by folks that need it, I think 
we'd be in great shape. That'd also pave a path for others wanting to add 
support for their proprietary products as well. Win-win.

There's always this chicken or egg problem w/things like ccm. Do people not 
contribute to it because it's out of the umbrella, or is it out of the umbrella 
because people don't need to contribute to it?

I hadn't thought about other subprojects relying on it. That's a very good 
point.

On Thu, May 16, 2024, at 4:48 AM, Jacek Lewandowski wrote:
> +1 (my personal opinion)
> 
> How to deal with the DSE-supporting code is a separate discussion IMO
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Thu, 16 May 2024 at 10:21, Berenguer Blasi  wrote:
>> +1 ccm is super useful
>> 
>> On 16/5/24 10:09, Mick Semb Wever wrote:
>>> 
>>> 
>>> On Wed, 15 May 2024 at 16:24, Josh McKenzie  wrote:
>>>> Right now ccm isn't formally a subproject of Cassandra or under governance 
>>>> of the ASF. Given it's an integral component of our CI as well as for 
>>>> local testing for many devs, and we now have more experience w/our muscle 
>>>> on IP clearance and ingesting / absorbing subprojects where we can't track 
>>>> down every single contributor to get an ICLA, seems like it might be worth 
>>>> revisiting the topic of donation of ccm to Apache.
>>>> 
>>>> For what it's worth, Sylvain originally and then DataStax after transfer 
>>>> have both been incredible and receptive stewards of the projects and 
>>>> repos, so this isn't about any response to any behavior on their part. 
>>>> Structurally, however, it'd be better for the health of the project(s) 
>>>> long-term to have ccm promoted in. As far as I know there was strong 
>>>> receptivity to that donation in the past but the IP clearance was the 
>>>> primary hurdle.
>>>> 
>>>> Anyone have any thoughts for or against?
>>>> 
>>>> https://github.com/riptano/ccm
>>> 
>>> 
>>> 
>>> We've been working on this along with the python-driver (just haven't 
>>> raised it yet).  It is recognised, like the python-driver, as a key 
>>> dependency that would best be in the project.
>>> 
>>> Obtaining the CLAs should be much easier, the contributors to ccm are less 
>>> diverse, being more the people we know already.
>>> 
>>> We do still have the issues of DSE-supporting code in it, as we do with the 
>>> drivers.  I doubt any of us strongly object to it: there's no trickery 
>>> happening here on the user; but we should be aware of it and have a rough 
>>> direction sketched out for when someone else comes along wanting to add 
>>> support for their proprietary product.  We also don't want to be pushing 
>>> downstream users to be having to create their own forks either.
>>> 
>>> Great to see general consensus (so far) in receiving it :) 
>>>  


[DISCUSS] ccm as a subproject

2024-05-15 Thread Josh McKenzie
Right now ccm isn't formally a subproject of Cassandra or under governance of 
the ASF. Given it's an integral component of our CI as well as for local 
testing for many devs, and we now have more experience w/our muscle on IP 
clearance and ingesting / absorbing subprojects where we can't track down every 
single contributor to get an ICLA, seems like it might be worth revisiting the 
topic of donation of ccm to Apache.

For what it's worth, Sylvain originally and then DataStax after transfer have 
both been incredible and receptive stewards of the projects and repos, so this 
isn't about any response to any behavior on their part. Structurally, however, 
it'd be better for the health of the project(s) long-term to have ccm promoted 
in. As far as I know there was strong receptivity to that donation in the past 
but the IP clearance was the primary hurdle.

Anyone have any thoughts for or against?

https://github.com/riptano/ccm


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Josh McKenzie
> Is there a technical limitation that would prevent a range write that 
> functions the same way as a range tombstone, other than probably needing a 
> version bump of the storage format?
The technical limitation would be cost/benefit due to how this intersects w/our 
architecture I think.

Range tombstones have taught us that something that should be relatively simple 
(merge in deletion mask at read time) introduces a significant amount of 
complexity on all the paths Benjamin enumerated with a pretty long tail of bugs 
and data incorrectness issues and edge cases. The work to get there, at a high 
level glance, would be:
 1. Updates to CQL grammar, spec
 2. Updates to write path
 3. Updates to Accord, and thinking about how this intersects w/Accord's WAL / 
logic (I think? Consider me not well educated on details here)
 4. Updates to compaction w/consideration for edge cases on all the different 
compaction strategies
 5. Updates to iteration and merge logic
 6. Updates to paging logic
 7. Indexing
 8. repair, both full and incremental implications, support, etc
 9. the list probably goes on? There's always >= 1 thing we're not thinking of 
with a change like this. Usually more.
For all of the above we also would need unit, integration, and fuzz testing 
extensively to ensure the introduction of this new spanning concept on a write 
doesn't introduce edge cases where incorrect data is returned on merge.

All of which is to say: it's an interesting problem, but IMO given our 
architecture and what we know about the past of trying to introduce an 
architectural concept like this, the costs to getting something like this to 
production ready are pretty high.

To me the cost/benefit don't really balance out. Just my .02 though.
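
To make the read-time merge concrete, here is a toy Java sketch (hypothetical, not Cassandra's storage engine) that resolves one int-keyed column from point writes, range tombstones and the proposed range updates, all with inclusive bounds. It illustrates only the contrast Bowen and Benjamin draw: for deletions the newest covering timestamp is all that matters, while range updates must carry values, so every overlapping range with a different value has to be consulted. Compaction, paging, indexing and repair, the other items on the list above, are out of scope.

```java
import java.util.List;
import java.util.Optional;

// Toy read-time merge for a single int-keyed column. Not Cassandra's storage engine:
// RangeUpdate is the hypothetical construct under discussion; all bounds are inclusive.
class RangeMaskSketch {
    record PointWrite(int key, String value, long ts) {}
    record RangeTombstone(int start, int end, long ts) {}
    record RangeUpdate(int start, int end, String value, long ts) {}

    static Optional<String> read(int key, List<PointWrite> points,
                                 List<RangeTombstone> tombstones, List<RangeUpdate> updates) {
        // Deletions only need the newest covering timestamp; every tombstone means the same thing.
        long deletedAt = Long.MIN_VALUE;
        for (RangeTombstone rt : tombstones) {
            if (rt.start() <= key && key <= rt.end()) deletedAt = Math.max(deletedAt, rt.ts());
        }
        // Writes must carry their value with the timestamp, and overlapping range
        // updates can each hold a different value for the same key.
        String value = null;
        long writtenAt = Long.MIN_VALUE;
        for (PointWrite p : points) {
            if (p.key() == key && p.ts() > writtenAt) { value = p.value(); writtenAt = p.ts(); }
        }
        for (RangeUpdate u : updates) {
            // Strict '>' means a point write wins a timestamp tie; a real design would have to pick a rule.
            if (u.start() <= key && key <= u.end() && u.ts() > writtenAt) { value = u.value(); writtenAt = u.ts(); }
        }
        // As with Cassandra tombstones, a deletion wins a timestamp tie against a write.
        if (value == null || deletedAt >= writtenAt) return Optional.empty();
        return Optional.of(value);
    }

    public static void main(String[] args) {
        List<PointWrite> points = List.of(new PointWrite(5, "p", 10), new PointWrite(12, "y", 5));
        List<RangeTombstone> tombstones = List.of(new RangeTombstone(0, 9, 20));
        List<RangeUpdate> updates = List.of(new RangeUpdate(3, 7, "r", 30),
                                            new RangeUpdate(4, 6, "s", 25),
                                            new RangeUpdate(10, 15, "x", 5));
        for (int key : new int[] {2, 5, 8, 12, 14}) {
            System.out.println(key + " -> " + read(key, points, tombstones, updates));
        }
    }
}
```

Key 5 shows the layering: the point write at ts 10 is shadowed by the tombstone at ts 20, which is in turn overridden by the range update at ts 30. The tie-breaking rules are choices made for the sketch; a real design would have to specify them, and would also have to cope with ranges whose width cannot be enumerated, as Jeff points out.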

On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.
> 
> It is not simply a gut feeling, Jon. This change impacts read, write, 
> indexing, storage, compaction, repair... The risk and cost associated with it 
> are pretty significant and I am not convinced at this point of its benefit.
> 
> On Tue, 14 May 2024 at 19:05, Jon Haddad  wrote:
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea.  The scenario you've described here is 
>> tricky but I'm not expecting it to be any worse than say, SAI, which (the 
>> last I checked) has O(N) complexity on returning result sets with regard to 
>> rows returned.  We've also merged in Vector search which has O(N) overhead 
>> with the number of SSTables.  We're still fundamentally looking at, in most 
>> cases, a limited number of SSTables and some merging of values.
>> 
>> Write updates are essentially a timestamped mask, potentially overlapping, 
>> and I suspect potentially resolvable during compaction by propagating the 
>> values.  They could be eliminated or narrowed based on how they've 
>> propagated by using the timestamp metadata on the SSTable.
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks Patrick and David for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
>>  wrote:
>>> Ranged update sounds like a disaster for compaction and read performance.
>>> 
>>> Imagine compacting or reading some SSTables in which a large number of 
>>> overlapping but non-identical ranges were updated with different values. It 
>>> gives me a headache by just thinking about it.
>>> 
>>> Ranged delete is much simpler, because the "value" is the same tombstone 
>>> marker, and it also is guaranteed to expire and disappear eventually, so 
>>> the performance impact of dealing with them at read and compaction time 
>>> doesn't suffer in the long term.
>>> 
>>> 
>>> On 14/05/2024 16:59, Benjamin Lerer wrote:
 It would be like range tombstones ... but much worse ;-). A tombstone is a 
 simple marker (deleted). An update can be far more complex.  
 
 On Tue, 14 May 2024 at 15:52, Jon Haddad  wrote:
> Is there a technical limitation that would prevent a range write that 
> functions the same way as a range tombstone, other than probably needing 
> a version bump of the storage format?
> 
> 
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:
>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. 
>> They do work on DELETE because under the hood in C* they get translated 
>> into range tombstones.
>> 
>> On Tue, 14 May 2024 at 02:44, David Capwell  wrote:
>>> I would also include in UPDATE… but yeah, <3 BETWEEN 

Re: Is there appetite to maintain the gocql driver (in the drivers subproject) ?

2024-05-14 Thread Josh McKenzie
> Does anyone call out the need for a new CEP for bringing the gocql into the 
> Driver's subproject ? 
> My suggestion is that this falls under CEP-8, even if it is not DataStax 
> donating this particular codebase, the process is largely the same and it is 
> the Drivers subproject receiving it.
+1 to future driver donations falling under CEP-8

On Tue, May 14, 2024, at 1:03 PM, Mick Semb Wever wrote:
> 
> Ok, so we've got confidence now on how to approach this, confirmation from 
> the project's maintainers supporting it, and interest from a handful of 
> people interested in maintaining and contributing to the project.
> 
> The proposed plan forward is…
> 
> We will go through a round of collecting CLAs along with agreements to donate 
> work to ASF from all gocql authors, over email and LinkedIn searches and 
> messages.  We will also open a github issue on the gocql project describing 
> the steps involved and mentioning all the authors.  A response on the GH 
> issue from everyone agreeing to the donation is the best single place to 
> collect the responses from everyone, but we'll accept and work with 
> whatever/however we get them.   These authors will also need to sign this 
> ICLA and submit it to the ASF.
> 
> After a four week grace period we will move ahead with the IP donation to 
> ASF, and make a list of all work (files) that we don't have CLAs for.  Such 
> work may remain with headers honouring their past MIT license and authors.  
> When the work is accepted and brought into the Cassandra Driver subproject we 
> will be looking to add committers to the subproject.  These may or may not be 
> people who have expressed interest in further contributing to the codebase, 
> but rather people we trust regardless when/if they come back to contribute.
> 
> Does anyone call out the need for a new CEP for bringing the gocql into the 
> Driver's subproject ? 
> My suggestion is that this falls under CEP-8, even if it is not DataStax 
> donating this particular codebase, the process is largely the same and it is 
> the Drivers subproject receiving it.
> 
>  
> 
> On Mon, 15 Apr 2024 at 12:31, Mick Semb Wever  wrote:
>> 
>>
>> 
 We can open an issue with LEGAL to see what they say at least?
>>> 
>>> 
>>> I will raise a LEGAL ticket.
>>> 
>>> My question here is whether we have to go through the process of 
>>> best-efforts to get approval to donate (transfer copyright), or whether we 
>>> can honour the copyright on the prior work and move forward ( by 
>>> referencing it in our NOTICE.txt, as per 
>>> https://infra.apache.org/licensing-howto.html )
>> 
>> 
>> https://issues.apache.org/jira/browse/LEGAL-674  


Re: [DISCUSS] guardrail for global schema modifications - CASSANDRA-19556 in the context of CASSANDRA-17495

2024-05-08 Thread Josh McKenzie
> Given that CASSANDRA-17495 was not released in any GA (just in 5.0-alpha1), I 
> think that the option 1) is still viable - we would drop CASSANDRA-17495 and 
> we would have CASSANDRA-19556 instead of that which would act as a global 
> on/off on all schema modifications however given that we go into beta2 I am 
> not sure if it is not just too late.
I think this is the best solution for our end-users long term, setting aside how 
close we are to a 5.0 GA. That said, guardrails aren't exactly super invasive, 
destabilizing new features, so if you could get this patch in before we GA'd, 
I'd personally support making an exception for it.

We discussed a bit on the ticket; I fall in the ambivalent space of "I like 
modularity, except for when it's unnecessary complexity for a use-case or 
flexibility that users don't need", and not being sure whether operators would 
need different guardrails for functionality. Especially given security and 
roles.

On Mon, May 6, 2024, at 7:51 AM, Štefan Miklošovič wrote:
> Hi list,
> 
> there is a question in CASSANDRA-19556 we would like to have more feedback on 
> in order to move forward.
> 
> CASSANDRA-19556 wants to introduce two guardrails. One for forbidding / 
> allowing DCL statements - (Authentication|Authorization)Statement - and 
> another one for DDL statements (all schema modifications).
> 
> However, there is already a guardrail implemented by CASSANDRA-17495 which 
> prevents only modifications of schema on a table level so CASSANDRA-19556 
> might be viewed as a superset of CASSANDRA-17495.
> 
> I am not sure why we stopped with tables only in CASSANDRA-17495, this might 
> be extended to keyspaces too, any schema modification really. I think we have 
> three options here:
> 
> 1) drop CASSANDRA-17495 and implement CASSANDRA-19556 which would cover _all_ 
> schema modifications, not just table-related ones
> 2) keep CASSANDRA-17495 and implement CASSANDRA-19556 as is currently 
> proposed - that means that we would be able to forbid all schema 
> modifications by CASSANDRA-19556 but once schema modifications are allowed, 
> we might further forbid table modifications as implemented by CASSANDRA-17495.
> 3) keep CASSANDRA-17495 but change the implementation of CASSANDRA-19556 in 
> such a way that it would be more granular. What I mean by the granularity is 
> that we would have separate guardrail for keyspace, for example. 
> 
> 2) is the least impactful approach but what I do not like is that, basically, 
> one guardrail (CASSANDRA-19556) would shadow CASSANDRA-17495. For example, if 
> the first one is disabled but the second one is enabled, we can modify 
> keyspaces but we can not modify tables. When the first one is enabled, we can 
> not modify tables even if 17495 is disabled, which I find counterintuitive.
> 
> The pros of 3) would be that it would be more granular, indeed, but, is that 
> even necessary? There are a lot of ddl statements, creation of triggers, 
> views, indices, functions, aggregates ... How are we going to categorize it? 
> Do we want to have a guardrail per logical schema component? Is that not 
> overkill? Can you come up with a scenario where an operator wants to disable 
> keyspace modifications but enable table modifications? Or 
> disabling just index, materialized view or function creations / modifications 
> while e.g. keyspace modifications would be possible? Is it not easier to have 
> one guardrail for all schema modifications?
> 
> Given that CASSANDRA-17495 was not released in any GA (just in 5.0-alpha1), I 
> think that the option 1) is still viable - we would drop CASSANDRA-17495 and 
> we would have CASSANDRA-19556 instead of that which would act as a global 
> on/off on all schema modifications however given that we go into beta2 I am 
> not sure if it is not just too late.
> 
> Thank you for your opinions in advance.
> 
> Regards
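
The shadowing in option 2) can be tabulated with a small hypothetical Java model: a guardrail over all schema changes (standing in for CASSANDRA-19556) is checked before a table-only guardrail (standing in for CASSANDRA-17495). "Active" here means the guardrail forbids the statement; the class, enum and flag names are illustrative, not Cassandra's configuration keys.

```java
// Hypothetical model of option 2: a guardrail over all schema changes (CASSANDRA-19556)
// checked before the table-only guardrail (CASSANDRA-17495). "Active" means the guardrail
// forbids the statement; names are illustrative, not real configuration keys.
class GuardrailShadowingSketch {
    enum Target { KEYSPACE, TABLE }

    static boolean allowed(Target target, boolean allSchemaGuardrailActive, boolean tableGuardrailActive) {
        if (allSchemaGuardrailActive) return false;                       // blocks every DDL statement
        if (target == Target.TABLE && tableGuardrailActive) return false;  // blocks table DDL only
        return true;
    }

    public static void main(String[] args) {
        for (boolean all : new boolean[] {false, true}) {
            for (boolean table : new boolean[] {false, true}) {
                System.out.printf("all-schema=%b table=%b -> keyspace DDL %s, table DDL %s%n", all, table,
                    allowed(Target.KEYSPACE, all, table) ? "allowed" : "blocked",
                    allowed(Target.TABLE, all, table) ? "allowed" : "blocked");
            }
        }
    }
}
```

The `all-schema=true table=false` row is the counterintuitive case described above: table DDL is blocked even though the table-level guardrail is not active. Option 1) would fold both flags into one.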


Re: CI's working, and repeatable !!

2024-04-28 Thread Josh McKenzie
A huge amount of work and time went into this and it's going to have a big 
impact on the project. I want to offer a heartfelt thanks to all involved for 
the focus and energy that went into this!

As the author of the system David lovingly dubbed "JoshCI" (/sigh), I 
definitely want to see us all move to converge as much as possible on the CI 
code we're running. While I remain convinced something like CASSANDRA-18731 is 
vital for hygiene in the long run (unit testing our CI, declaratively defining 
atoms of build logic independently from flow), I also think there'd be 
significant value in more of us moving towards using the JenkinsFile where at 
all possible.

Seriously - thanks again for all this work everyone. CI on Cassandra is a Big 
Data Problem, and not an easy one.

On Sun, Apr 28, 2024, at 10:22 AM, Mick Semb Wever wrote:
> 
> Good news.
> 
> CI on 5.0 and trunk is working again, after an unexpected six-week hiatus (and 
> a string of general problems since last year). 
> This includes pre-commit for 5.0 and trunk working again.
> 
> 
> More info…
> 
> From 5.0 we now have in-tree a Jenkinsfile that only relies on the in-tree 
> scripts – it does not depend upon cassandra-builds and all the individual dsl 
> created stage jobs. This aligns how pre-commit and post-commit works.  More 
> importantly, it makes our CI repeatable regardless of the fork/branch of the 
> code, or the jenkins installation.
> 
> For 5.0+ pre-commit, use the Cassandra-devbranch-5 job and make sure your patch is 
> after sha 3c85def
> The jenkinsfile now comes with pre-defined profiles, it's recommended to use 
> "skinny" until you need the final "pre-commit".  You can also use the custom 
> profile with a regexp when you need just specific test types.
> See https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/build
> 
> For pre-commit on older branches, you now use Cassandra-devbranch-before-5
> 
> For both pre- and post-commit builds, each build now creates two new sharable 
> artefacts: ci_summary.html and results_details.tar.xz
> These are based on what apple contributors were sharing from builds from 
> their internal CI system.  The format and contents of these files is expected 
> to evolve.
> 
> Each build now archives its results and logs all under one location in 
> nightlies.
> 
> e.g. https://nightlies.apache.org/cassandra/Cassandra-5.0/227/ 
> 
> 
> 
> The post-commit pipeline profile remains *very* heavy, at 130k+ tests. These 
> pipelines were previously ramped up to include everything, given 
> everything that's happening in both branches. So they take time and 
> saturate everything they touch. We need to re-evaluate what we need to be 
> testing to alleviate this. There'll also be a new pattern of timeouts and 
> infra/script-related flakies, as happens whenever there's such a significant 
> change; all the patience and help possible is appreciated!
> 
> 
> 
> Now that the Jenkinsfile can be used on any Jenkins server for any 
> fork/branch, the next work-in-progress is CASSANDRA-18145, to be able to run 
> the full pipeline with a single command line (given a k8s context 
> (~/.kube/config)).
>   
> We already have most of this working: it's possible to make a clone of 
> ci-cassandra.apache.org on k8s using this wip helm chart: 
> https://github.com/thelastpickle/Cassius 
> And we are already using this on an auto-scaling GKE k8s cluster. You might 
> have seen me posting the ci_summary.html and results_details.tar.xz files to 
> tickets for pre-commit CI instead of using the ci-cassandra.a.o or CircleCI 
> pre-commit links. Already, we have the full pipeline time down to two hours, 
> at less than a third of the cost of CircleCI, and there's low-hanging fruit 
> to further improve this. For serious pre-commit testing we still need 
> repeatable test runs (ref CASSANDRA-18942). On all this I'd like to give a 
> special shout out to Aleksandr Volochnev, who was instrumental in the final 
> (and helm-based) work of 18145, which was needed to be able to test its 
> prerequisite ticket CASSANDRA-18594. ci-cassandra.a.o would not be running 
> again today without his recent time spent on it.
> 
> On a separate note, this new jenkinsfile is designed in preparation for 
> CASSANDRA-18731 ('Add declarative root CI structure'), to make it easier to 
> define profiles, tests, and their infrastructural requirements.
> 
> 
> To the community…
>   We are now in a place where we are looking for and requesting further 
> donations of servers to the ci-cassandra.apache.org Jenkins cluster. We can 
> now also use cloud/instance credits to host auto-scaling k8s-based 
> ci-cassandra.a.o clones that would be available for community pre-commit 
> testing.
>   There's plenty of low-hanging fruit improvements available if folk want to 
> get involved. Performance and throughput of splits is an important area, as 
> it has a big impact on reducing the cost of a whole pipeline run (there's 
> nothing like knowing 

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-26 Thread Josh McKenzie
> might get roasted for scope creep
This community *would never*.

What you've outlined seems like a very reasonable stretch goal or v2 to keep in 
mind so we architect something in v1 that's also supportive of a v2 
keyspace-only migration.

On Thu, Apr 25, 2024, at 1:57 PM, Venkata Hari Krishna Nukala wrote:
> I have updated the CEP to use binary level file digest verification.
> 
> In the next iteration, I am going to address the below point. 
> > I would like to see more abstraction of how the files get moved / put in 
> > place with the proposed solution being the default implementation. That 
> > would allow others to plug in alternative means of data movement like 
> > pulling down backups from S3 or rsync, etc. 
> 
> Thanks!
> Hari
> 
> On Wed, Apr 24, 2024 at 1:24 AM Patrick McFadin  wrote:
>> I finally got a chance to digest this CEP and am happy to see it raised. 
>> This feature has been left to the end user for far too long.
>> 
>> It might get roasted for scope creep, but here goes. Related and something 
>> that I've heard for years is the ability to migrate a single keyspace away 
>> from a set of hardware... online. Similar problem but a lot more 
>> coordination.
>>  - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
>>  - Establish replication between keyspaces and sync schema
>>  - Move data from Cluster A to B
>>  - Decommission keyspace in Cluster A
>> 
>> In many cases, multiple tenants on one cluster put it under too much 
>> pressure. The best solution in that case is to migrate the largest keyspace 
>> to a dedicated cluster.
>> 
>> Live migration, but a bit more complicated. No chance of doing this manually 
>> without some serious brain surgery on C* and downtime.
>> 
>> Patrick
>> 
>> 
>> On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala 
>>  wrote:
>>> Thank you all for the inputs and apologies for the late reply. I see good 
>>> points raised in this discussion. _Please allow me to reply to each point 
>>> individually._
>>> 
>>> To start with, let me focus on the point raised by Scott & Jon about 
>>> verifying file content at the destination against the source in this reply. 
>>> Agree that just verifying the file name + size is not foolproof. I left 
>>> binary-level verification out of the initial scope for two reasons: 1) 
>>> calculating a digest for each file may increase CPU utilisation, and 2) the 
>>> disk would also be under pressure, as the complete disk content would have 
>>> to be read to calculate the digests. As called out in the discussion, I 
>>> think we can't compromise on a binary-level check despite these two costs. 
>>> Let me update the CEP to include binary-level verification. During 
>>> implementation, it can probably be made optional so that it can be skipped 
>>> if someone doesn't want it.
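A minimal sketch of the two-tier check discussed here: the cheap name + size comparison first, then an optional binary-level digest. It assumes both files are readable locally and uses SHA-256 from Python's hashlib; in a real sidecar-to-sidecar transfer the source digest would be computed on the source host and sent alongside the file. The function names and the `binary_check` flag are hypothetical, not part of the CEP.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large SSTable components never need to fit in memory


def file_digest(path, algorithm="sha256"):
    """Stream the file through the hash and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(source, destination, binary_check=True, algorithm="sha256"):
    """Return True if destination appears to be a faithful copy of source."""
    src, dst = Path(source), Path(destination)
    # Tier 1: name + size. Cheap, but not foolproof.
    if src.name != dst.name or src.stat().st_size != dst.stat().st_size:
        return False
    # Tier 2 (optional): full binary comparison, paid for with extra CPU and a full read of both files.
    if not binary_check:
        return True
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

A same-size file with one corrupted byte passes tier 1 but fails tier 2, which is exactly the gap the name + size check leaves open.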
>>> 
>>> Thanks!
>>> Hari
>>> 
>>> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev 
>>>  wrote:
 We use backup/restore for our implementation of this concept. It has the 
 added benefit that the backup / restore path gets exercised much more 
 regularly than it would in normal operations, finding edge case bugs at a 
 time when you still have other ways of recovering rather than in a full 
 disaster scenario.
 
 Cheers
 Ben
 
 *From: *Jordan West 
 *Date: *Sunday, 21 April 2024 at 05:38
 *To: *dev@cassandra.apache.org 
 *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for 
 Live Migrating Instances
 
 I do really like the framing of replacing a node as restoring a node and 
 then kicking off a replace. That is effectively what we do internally. 
 
 I also agree we should be able to do data movement well both internal to 
 Cassandra and externally for a variety of reasons. 
 
 We’ve seen great performance with “ZCS+TLS” even though it’s not full zero 
 copy: nodes that previously took *days* to replace now take a few hours. 
 But we have seen it put pressure on nodes and drive up latencies, which is 
 the main reason we still rely on an external data movement system by 
 default, falling back to ZCS+TLS as needed. 
 
 Jordan 
 
 On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:
> Jeff, this is probably the best explanation and justification of the idea 
> that I've heard so far.
> 
> I like it because
> 
> 1) we really should have something official for backups
> 2) backups / object store would be great for analytics
> 3) it solves a much bigger problem than the single goal of moving 
> instances.
> 
> I'm a huge +1 in favor of this perspective, with live migration being one 
> use case for backup / restore.
> 

Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Josh McKenzie
Congrats everyone and thanks for all the hard work to get things to this point!

On Wed, Apr 17, 2024, at 1:18 PM, Ekaterina Dimitrova wrote:
> Congrats and thank you for all your work on the drivers!
> 
> On Wed, 17 Apr 2024 at 13:17, Francisco Guerrero  wrote:
>> Congratulations everyone!
>> 
>> On 2024/04/17 17:14:34 Abe Ratnofsky wrote:
>> > Congrats everyone!
>> > 
>> > > On Apr 17, 2024, at 1:10 PM, Benjamin Lerer  wrote:
>> > > 
>> > > The Apache Cassandra PMC is pleased to announce that Alexandre Dutra, 
>> > > Andrew Tolbert, Bret McGuire and Olivier Michallat have accepted the 
>> > > invitation to become committers on the java driver sub-project. 
>> > > 
>> > > Thanks for your contributions to the Java driver during all those years!
>> > > Congratulations and welcome!
>> > > 
>> > > The Apache Cassandra PMC members
>> > 
>> >


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-04-09 Thread Josh McKenzie
+1 to separate JIRA projects per subproject. Having workflows distinct to each 
project is reason enough for me, never mind the global namespace pollution that 
occurs if you pack a bunch of disparate projects together into one instance.

On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
> hi folks - sorry to have dropped the ball on responding to this thread.
> 
> My 2 cents are as follows - 
> 
> 1. Having a separate JIRA project for each sub-project will add management 
> overhead. This option, however, allows us to model unique workflows for the 
> sub-project.
> 
> 2. Managing the sub-project as part of the Cassandra JIRA project would imply 
> less management overhead but the sub-project would need to conform to the 
> same workflows.
> 
> I would pick option 2 unless there is a strong reason and desire to manage a 
> separate Jira project. We can always split out the Java Driver project if 
> things don't work out. OTOH merging a Jira project is harder.
> 
> Thanks,
> 
> Dinesh
> 
> On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
>> CEP-8 proposes using separate Jira projects per Cassandra sub-project:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>> 
>> > We suggest distinct Jira projects, one per driver, all to be created.
>> 
>> I don't see any discussion changing that from the [DISCUSS] or vote threads:
>> https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
>> https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>> https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>> 
>> But looks like upon acceptance that was changed:
>> https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>> 
>> > New issues will be tracked under the CASSANDRA project on Apache’s JIRA 
>> > under the component ‘Client/java-driver’.
>> 
>> I'm in favor of using the same Jira as Cassandra proper. Committership is 
>> project-wide, so having a standardized process (same ticket flow, review 
>> rules, labels, etc.) is beneficial. But multiple votes happened based on the 
>> content of the CEP, so we should stick to what was voted on and move to a 
>> separate Jira.
>> 
>> --
>> Abe


Re: [DISCUSS] Fixing coverage reports for jvm-dtest-upgrade

2024-03-17 Thread Josh McKenzie
+1 from me

> If classloaders are appropriately named in the current versions of Cassandra, 
> we should be able to test upgrade paths to that version without updating the 
> older branches or building new jvm-dtest JARs for them.
Pretty sure Caleb was wrestling w/something recently that might have benefited 
from being able to differentiate which ClassLoader was sad; in general this 
seems like it'd be a big help to debugging startup / env issues, assuming it 
doesn't break anything. :)

On Fri, Mar 15, 2024, at 4:41 PM, Abe Ratnofsky wrote:
> Hey folks,
> 
> I'm working on gathering coverage data across all of our test suites. The 
> jvm-dtest-upgrade suite is currently unfriendly to Jacoco: it uses classes 
> from multiple versions of Cassandra but with the same class names, so 
> Jacoco's analysis fails due to "Can't add different class with same name"[1]. 
> We need a way to exclude alternate-version classes from Jacoco's analysis, so 
> we can get coverage for the current version of Cassandra.
> 
> Jacoco supports exclusion of classes based on class name or classloader name, 
> but the class names are frequently identical across Cassandra 
> versions. The jvm-dtest framework doesn't name classloaders, so we can't use 
> that either.
> 
> I'd like to propose that we name the jvm-dtest InstanceClassLoader instances 
> so that some can be excluded from Jacoco's analysis. Instances that create 
> new InstanceClassLoaders should be able to provide an immutable name in the 
> constructor. InstanceClassLoader names should indicate whether they're for 
> the current version of Cassandra (where coverage should be collected) or an 
> alternate version. If classloaders are appropriately named in the current 
> versions of Cassandra, we should be able to test upgrade paths to that 
> version without updating the older branches or building new jvm-dtest JARs 
> for them.
> 
> Any objections or alternate approaches?
> 
> --
> Abe
> 
> [1]: More specifically: Jacoco uses class IDs to map the analysis data that's 
> produced during test execution to .class files. I'm configuring the Jacoco 
> agent's classdumpdir to ensure that the classes loaded during execution are 
> the same classes that are analyzed during report generation, as is 
> recommended. When we build the alternate version JARs for jvm-dtest-upgrade, 
> we end up with multiple classes with the same name but different IDs.


Re: [DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread Josh McKenzie
Looks like we bumped from a 3.6 requirement to 3.7 in CASSANDRA-18960 as well; 
similar thing. Vector support in python, though that patch took it from "return a 
simple blob" to "return something the python driver knows about, but apparently 
not variable types so we'll need to upgrade again."

> The version of the Python driver that is used by cqlsh (3.25.0) doesn't 
> entirely support the new vector data type introduced by CASSANDRA-18504. 
> While we can write data perfectly well, vectors that are read back are 
> presented as blobs:
> 

As far as I can tell, support for vector types in cqlsh is the sole reason 
we've bumped to 3.7 and 3.8 to support that python driver. Is that correct, 
Andres / Brandon?

On Mon, Mar 11, 2024, at 1:22 PM, Caleb Rackliffe wrote:
> The vector issue itself was a simple error message change: 
> https://github.com/datastax/python-driver/commit/e90c0f5d71f4cac94ed80ed72c8789c0818e11d0
> 
> Was there something else in 3.29.0 that actually necessitated the move to a 
> floor of Python 3.8? Do we generally change runtime requirements in minor 
> releases for the driver?
> 
> On Mon, Mar 11, 2024 at 12:12 PM Brandon Williams  wrote:
>> Given that 3.6 has been EOL for 2+ years[1], I don't think it makes
>> sense to add support for it back.
>> 
>> Kind Regards,
>> Brandon
>> 
>> [1] https://devguide.python.org/versions/
>> 
>> On Mon, Mar 11, 2024 at 12:08 PM David Capwell  wrote:
>> >
>> > Originally we had planned to support RHEL 7 but in testing 5.0 we found 
>> > out that cqlsh no longer works on RHEL 7[1].  This was changed in 
>> > CASSANDRA-19245 which upgraded python-driver from 3.28.0 to 3.29.0. For 
>> > some reason this minor version upgrade also dropped support for Python 3.6, 
>> > which is the supported Python version on RHEL 7.
>> >
>> > We wanted to bring this to the attention of the community to figure out 
>> > next steps; do we wish to say that RHEL 7 is no longer supported (making 
>> > upgrades tied to OS upgrades, which can be very hard for users), or do we 
>> > want to add python 3.6 support back to python-driver?
>> >
>> >
>> > 1: the error seen by users is
>> > $ cqlsh
>> > Warning: unsupported version of Python, required 3.8-3.11 but found 3.6 
>> > Warning: unsupported version of Python, required 3.8-3.11 but found 2.7
>> > No appropriate Python interpreter found.
>> > $
>> >
>> >
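To make the failure mode concrete, here is a hypothetical sketch (not the actual cqlsh launcher logic) of an interpreter gate using the 3.8-3.11 range from the warning above. On a RHEL 7-like host that only offers Python 3.6 and 2.7, every candidate is rejected and no interpreter is selected.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int]

# Bounds taken from the cqlsh warning quoted above; the gate itself is hypothetical.
MIN_VERSION: Version = (3, 8)
MAX_VERSION: Version = (3, 11)


def fmt(v: Version) -> str:
    return f"{v[0]}.{v[1]}"


def pick_interpreter(candidates: Iterable[Version]) -> Optional[Version]:
    """Return the first candidate inside [MIN_VERSION, MAX_VERSION], warning on the rest."""
    for v in candidates:
        # Tuples compare element-wise, so (3, 11) <= (3, 11) but (3, 12) > (3, 11).
        if MIN_VERSION <= v <= MAX_VERSION:
            return v
        print(f"Warning: unsupported version of Python, required "
              f"{fmt(MIN_VERSION)}-{fmt(MAX_VERSION)} but found {fmt(v)}")
    return None


# RHEL 7-like host: prints two warnings, then None.
print(pick_interpreter([(3, 6), (2, 7)]))
```

Raising the floor from 3.6 to 3.8 in a driver minor release is enough to turn the first candidate into a warning, which is the RHEL 7 regression described here.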


Re: [Discuss] Repair inside C*

2024-02-23 Thread Josh McKenzie
> we're all willing to bikeshed for our personal preference on where it lives 
> and how it's implemented, and at the end of the day, code talks. I don't 
> think anyone's said they'll die on the hill of implementation details

:D

I don't think we're going to be able to reach a consensus on an email thread 
with higher level abstractions and indicative statements. For instance: "a lot 
of complexity around repair in the main process" vs. "a lot of complexity in 
signaling between a sidecar and a main process and supporting multiple versions 
of C*". Both resonate with me at face value and neither contains enough detail 
to weigh against one another.

A more granular, lower level CEP that includes a tradeoff of the two designs 
with a recommendation on a path forward might help unstick us from the ML 
back-and-forth.

We could also take an indicative vote on "in-process vs. in-sidecar" to see if 
we can get a read on temperature.

On Thu, Feb 22, 2024, at 2:06 PM, Paulo Motta wrote:
> Apologies, I just read the previous message and missed the previous 
> discussion on sidecar vs main process on this thread. :-)
> 
> It does not look like a final agreement was reached about this and there are 
> lots of good arguments for both sides, but perhaps it would be nice to agree 
> on this before a CEP is proposed since this will significantly influence the 
> initial design?
> 
> I tend to agree with Dinesh and Scott's pragmatic stance of providing initial 
> support to repair scheduling on the sidecar, since this has fewer 
> dependencies, and progressively move what makes sense to the main process as 
> TCM/Accord primitives become available and mature.
> 
> On Thu, Feb 22, 2024 at 1:44 PM Paulo Motta  wrote:
>> +1 to Josh's points. The project has considered native repair scheduling 
>> for a long time but it was never made a reality due to the complex 
>> considerations involved and availability of custom implementations/tools 
>> like cassandra-reaper, which is a popular way of scheduling repairs in 
>> Cassandra.
>> 
>> Unfortunately I did not have cycles to review this proposal, but it looks 
>> promising from a quick glance.
>> 
>> One important consideration that I think we need to discuss is: where should 
>> repair scheduling live: in the main process or the sidecar?
>> 
>> I think there is a lot of complexity around repair in the main process and 
>> we need to be extra careful about adding additional complexity on top of 
>> that.
>> 
>> Perhaps this could be a good opportunity to consider the sidecar to host 
>> repair scheduling, since this looks to be a control plane responsibility? 
>> One downside is that this would not make repair scheduling available to 
>> users who do not use the sidecar.
>> 
>> What do you think? It would be great to have input from sidecar maintainers 
>> if this is something that would make sense for that subproject.
>> 
>> On Thu, Feb 22, 2024 at 12:33 PM Josh McKenzie  wrote:
>>> Very late response from me here (basically necro'ing this thread).
>>> 
>>> I think it'd be useful to get this condensed into a CEP that we can then 
>>> discuss in that format. It's clearly something we all agree we need and 
>>> having an implementation that works, even if it's not in your preferred 
>>> execution domain, is vastly better than nothing IMO.
>>> 
>>> I don't have cycles (nor background ;) ) to do that, but it sounds like you 
>>> do Jaydeep given the implementation you have on a private fork + design.
>>> 
>>> A non-exhaustive list of things that might be useful incorporating into or 
>>> referencing from a CEP:
>>> Slack thread: https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>> Joey's old C* ticket: https://issues.apache.org/jira/browse/CASSANDRA-14346
>>> Even older automatic repair scheduling: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>>> Your design gdoc: 
>>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>>> PR with automated repair: 
>>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>> 
>>> My intuition is that we're all basically in agreement that this is 
>>> something the DB needs, we're all willing to bikeshed for our personal 
>>> preference on where it lives and how it's implemented, and at the end of 
>>> the day, code talks. I don't think anyone's said they'll die on the hill of 
>>> implementation details, so that feels like CEP time to m

Re: [Discuss] Repair inside C*

2024-02-22 Thread Josh McKenzie
Very late response from me here (basically necro'ing this thread).

I think it'd be useful to get this condensed into a CEP that we can then 
discuss in that format. It's clearly something we all agree we need and having 
an implementation that works, even if it's not in your preferred execution 
domain, is vastly better than nothing IMO.

I don't have cycles (nor background ;) ) to do that, but it sounds like you do, 
Jaydeep, given the implementation you have on a private fork + design.

A non-exhaustive list of things that might be useful incorporating into or 
referencing from a CEP:
Slack thread: https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
Joey's old C* ticket: https://issues.apache.org/jira/browse/CASSANDRA-14346
Even older automatic repair scheduling: 
https://issues.apache.org/jira/browse/CASSANDRA-10070
Your design gdoc: 
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
PR with automated repair: 
https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c

My intuition is that we're all basically in agreement that this is something 
the DB needs, we're all willing to bikeshed for our personal preference on 
where it lives and how it's implemented, and at the end of the day, code talks. 
I don't think anyone's said they'll die on the hill of implementation details, 
so that feels like CEP time to me.

If you were willing and able to get a CEP together for automated repair based 
on the above material, given you've done the work and have the proof points 
it's working at scale, I think this would be a *huge contribution* to the 
community.

On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one solution's design doc 
> and source code on a private Apache Cassandra patch. Could you go through it 
> and let me know what you think?
> 
> Jaydeep
> 
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad  wrote:
>> > That said I would happily support an effort to bring repair scheduling to 
>> > the sidecar immediately. This has nothing blocking it, and would 
>> > potentially enable the sidecar to provide an official repair scheduling 
>> > solution that is compatible with current or even previous versions of the 
>> > database.
>> 
>> This is something I hadn't thought much about, and is a pretty good argument 
>> for using the sidecar initially. There are a lot of deployments out there and 
>> having an official repair option would be a big win. 
>> 
>> 
>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>> > I agree that it would be ideal for Cassandra to have a repair scheduler 
>> > in-DB.
>> >
>> > That said I would happily support an effort to bring repair scheduling to 
>> > the sidecar immediately. This has nothing blocking it, and would 
>> > potentially enable the sidecar to provide an official repair scheduling 
>> > solution that is compatible with current or even previous versions of the 
>> > database.
>> >
>> > Once TCM has landed, we’ll have much stronger primitives for repair 
>> > orchestration in the database itself. But I don’t think that should block 
>> > progress on a repair scheduling solution in the sidecar, and there is 
>> > nothing that would prevent someone from continuing to use a sidecar-based 
>> > solution in perpetuity if they preferred.
>> >
>> > - Scott
>> >
>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad  
>> > > wrote:
>> > >
>> > > I'm 100% in favor of repair being part of the core DB, not the sidecar. 
>> > >  The current (and past) state of things where running the DB correctly 
>> > > *requires* running a separate process (either community maintained or 
>> > > official C* sidecar) is incredibly painful for folks.  The idea that 
>> > > your data integrity needs to be opt-in has never made sense to me from 
>> > > the perspective of either the product or the end user.
>> > >
>> > > I've worked with way too many teams that have either configured this 
>> > > incorrectly or not at all. 
>> > >
>> > > Ideally Cassandra would ship with repair built in and on by default.  
>> > > Power users can disable if they want to continue to maintain their own 
>> > > repair tooling for some reason.
>> > >
>> > > Jon
>> > >
>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> > >> All,
>> > >> We had a brief discussion in [2] about the Uber article [1] where they 
>> > >> talk about having integrated repair into Cassandra and how great that 
>> > >> is. I expressed my disappointment that they didn't work with the 
>> > >> community on that (Uber, if you are listening, time to make amends) 
>> > >> and it turns out Joey already had the idea and wrote the code [3] - so 
>> > >> I wanted to start a discussion to gauge interest and maybe how to 
>> 

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Josh McKenzie
> Do folks think we should file an official CEP and take it there?
+1 here.

Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a 
draft seems like a solid next step.

On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
> I see a lot of great ideas being discussed or proposed in the past to cover 
> the most common rate limiter candidate use cases. Do folks think we should 
> file an official CEP and take it there?
> 
> Jaydeep
> 
> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe  
> wrote:
>> I just remembered the other day that I had done a quick writeup on the state 
>> of compaction stress-related throttling in the project:
>> 
>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>> 
>> I'm sure most of it is old news to the people on this thread, but I figured 
>> I'd post it just in case :)
>> 
>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie  wrote:
>>>> 2.) We should make sure the links between the "known" root causes of 
>>>> cascading failures and the mechanisms we introduce to avoid them remain 
>>>> very strong.
>>> Seems to me that our historical strategy was to address individual known 
>>> cases one-by-one rather than looking for a more holistic load-balancing and 
>>> load-shedding solution. While the engineer in me likes the elegance of a 
>>> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me 
>>> wonders how far we think we are today from a stable set-point.
>>> 
>>> i.e. are we facing a handful of cases where nodes can still get pushed over 
>>> and then cascade that we can surgically address, or are we facing a broader 
>>> lack of back-pressure that rears its head in different domains (client -> 
>>> coordinator, coordinator -> replica, internode with other operations, etc) 
>>> at surprising times and should be considered more holistically?
>>> 
>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>> I almost forgot CASSANDRA-15817, which introduced 
>>>> reject_repair_compaction_threshold, which provides a mechanism to stop 
>>>> repairs while compaction is underwater.
>>>> 
>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe  
>>>>> wrote:
>>>>> 
>>>>> Hey all,
>>>>> 
>>>>> I'm a bit late to the discussion. I see that we've already discussed 
>>>>> CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> 
>>>>> and CASSANDRA-16663 
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in 
>>>>> passing. Having written the latter, I'd be the first to admit it's a 
>>>>> crude tool, although it's been useful here and there, and provides a 
>>>>> couple primitives that may be useful for future work. As Scott mentions, 
>>>>> while it is configurable at runtime, it is not adaptive, although we did 
>>>>> make configuration easier in CASSANDRA-17423 
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is 
>>>>> global to the node, although we've lightly discussed some ideas around 
>>>>> making it more granular. (For example, keyspace-based limiting, or 
>>>>> limiting "domains" tagged by the client in requests, could be 
>>>>> interesting.) It also does not deal with inter-node traffic, of course.
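A generic token bucket illustrates what "configurable at runtime, but not adaptive" means here: an operator can change the rate while the node is running, but the limiter never adjusts itself based on observed load. This is a hypothetical sketch, not the CASSANDRA-16663 implementation; the class and method names are made up, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter (hypothetical; not Cassandra's implementation)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._clock = clock
        self.rate = float(rate_per_sec)  # tokens added per second
        self.capacity = float(burst)     # maximum tokens that can accumulate
        self._tokens = float(burst)
        self._last = clock()

    def _refill(self):
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def set_rate(self, rate_per_sec):
        # Runtime reconfiguration: the limit changes only when someone calls this.
        # Nothing here looks at latency or queue depth, i.e. it is not adaptive.
        self._refill()  # settle tokens earned under the old rate first
        self.rate = float(rate_per_sec)

    def try_acquire(self, permits=1):
        """Take permits if available; otherwise return False so the caller can reject or defer."""
        self._refill()
        if self._tokens >= permits:
            self._tokens -= permits
            return True
        return False
```

An adaptive variant would call `set_rate` itself from a feedback signal (e.g. compaction backlog or latency), which is roughly the gap being discussed in this thread.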
>>>>> 
>>>>> Something we've not yet mentioned (that does address internode traffic) 
>>>>> is CASSANDRA-17324 
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed 
>>>>> shortly after working on the native request limiter (and have just not 
>>>>> had much time to return to). The basic idea is this:
>>>>> 
>>>>>> When a node is struggling under the weight of a compaction backlog and 
>>>>>> becomes a cause of increased read latency for clients, we have two 
>>>>>> safety valves:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 1.) Disabling the native protocol server, which stops the node from 
>>>>>> coordinating reads and writes.
>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch 
>>>>>> to avoid the node for reads from other coordinators.
>>>>>> 
>>>>>

Welcome Brad Schoening as Cassandra Committer

2024-02-21 Thread Josh McKenzie
The Apache Cassandra PMC is pleased to announce that Brad Schoening has accepted
the invitation to become a committer.

Your work on the integrated python driver, launch script environment, and tests
has been a big help to many. Congratulations and welcome!

The Apache Cassandra PMC members

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-15 Thread Josh McKenzie
> Would it make sense to only block commits on the test strategy you've listed, 
> and shift the entire massive test suite to post-commit? 

> Lots and lots of other emails

;)

There's an interesting broad question of: What config do we consider 
"recommended" going forward, the "conservative" (i.e. old) or the "performant" 
(i.e. new)? And what JDK do we consider "recommended" going forward, the oldest 
we support or the newest?

Since those recommendations apply for new clusters, people need to qualify 
their setups, and we have a high bar of quality on testing pre-merge, my gut 
tells me "performant + newest JDK". This would impact what we'd test pre-commit 
IMO.

Having been doing a lot of CI stuff lately, some observations:
 • Our True North needs to be releasing a database that's free of defects that 
violate our core properties we commit to our users. No data loss, no data 
resurrection, transient or otherwise, due to defects in our code (meteors, 
tsunamis, etc notwithstanding).
 • The relationship of time spent on CI and stability of final full 
*post-commit* runs is asymptotic. It's not even 90/10; we're probably somewhere 
like 98% value gained from 10% of work, and the other 2% "stability" (i.e. 
green test suites, not "our database works") is a long-tail slog. Especially in 
the current ASF CI heterogeneous env w/its current orchestration.
 • Thus: Pre-commit and post-commit should be different. The following points 
all apply to pre-commit:
    • The goal of pre-commit tests should be some number of 9's of no test 
failures post-commit (i.e. for every 20 green pre-commit runs we introduce 1 
flake post-commit). Not full perfection; it's not worth the compute and 
complexity.
    • We should *build* all branches on all supported JDK's (8 + 11 for older, 
11 + 17 for newer, etc).
    • We should *run* all test suites with the *recommended configuration* 
against the *highest versioned JDK a branch supports*. And we should formally 
recommend our users run on that JDK.
    • We should *at least* run all jvm-based configurations on the highest 
supported JDK version with the "not recommended but still supported" 
configuration.
    • I'm open to being persuaded that we should at least run jvm-unit tests on 
the older JDK w/the conservative config pre-commit, but not much beyond that.

That would leave us with the following, distilled:

*Pre-commit:*
 • Build on all supported JDKs
 • All test suites on highest supported JDK using recommended config
 • Repeat testing on new or changed tests on highest supported JDK 
w/recommended config
 • JVM-based test suites on highest supported JDK using other config
*Post-commit:*
 • Run everything. All suites, all supported JDKs, both config files.
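The distilled proposal can be modeled as a small job matrix to show how much smaller pre-commit becomes. This is a hypothetical sketch only: the suite names, the split into "all" versus "JVM-based" suites, and the JDK numbers are assumptions, and the "repeat new or changed tests" step is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposed CI matrix; suite names and JDK versions are examples only.
class CiMatrix {
    enum Config { RECOMMENDED, OTHER }

    // One CI job: a build when suite == null, otherwise a test suite run.
    record Job(String suite, int jdk, Config config) { }

    static List<Job> preCommit(List<Integer> jdks, List<String> allSuites, List<String> jvmSuites) {
        int highest = Collections.max(jdks);
        List<Job> jobs = new ArrayList<>();
        for (int jdk : jdks) jobs.add(new Job(null, jdk, null));                      // build on every supported JDK
        for (String s : allSuites) jobs.add(new Job(s, highest, Config.RECOMMENDED)); // all suites, recommended config
        for (String s : jvmSuites) jobs.add(new Job(s, highest, Config.OTHER));       // JVM-based suites, other config
        return jobs;
    }

    static List<Job> postCommit(List<Integer> jdks, List<String> allSuites) {
        List<Job> jobs = new ArrayList<>();
        for (int jdk : jdks) {
            jobs.add(new Job(null, jdk, null));
            for (String s : allSuites)
                for (Config c : Config.values()) jobs.add(new Job(s, jdk, c)); // full cross product
        }
        return jobs;
    }
}
```

With JDKs 11 and 17 and four suites (two of them JVM-based), this yields 2 builds plus 6 test jobs pre-commit, versus 2 builds plus 16 test jobs post-commit.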
With Butler + the *jenkins-jira* integration script (need to dust that off but 
it should remain good to go), we should have a pretty 
clear view as to when any consistent regressions are introduced and why. We'd 
remain exposed to JDK-specific flake introductions and flakes in unchanged 
tests, but there's no getting around the 2nd one and I expect the former to be 
rare enough to not warrant the compute to prevent it.

On Thu, Feb 15, 2024, at 10:02 AM, Jon Haddad wrote:
> Would it make sense to only block commits on the test strategy you've listed, 
> and shift the entire massive test suite to post-commit?  If there really is 
> only a small % of times the entire suite is useful this seems like it could 
> unblock the dev cycle but still have the benefit of the full test suite.  
> 
> 
> 
> On Thu, Feb 15, 2024 at 3:18 AM Berenguer Blasi  
> wrote:
>> __
>> On reducing CircleCI usage during dev while iterating, not with the 
>> intention to replace the pre-commit CI (yet), we could make do with testing 
>> only dtests, jvm-dtests, units and cqlsh for a _single_ configuration imo. 
>> That would greatly reduce usage. I hacked it quickly here for illustration 
>> purposes: 
>> https://app.circleci.com/pipelines/github/bereng/cassandra/1164/workflows/3a47c9ef-6456-4190-b5a5-aea2aff641f1
>>  The good thing is that we have the tooling to dial in whatever we decide 
>> atm.
>> 
>> Changing pre-commit is a different discussion, to which I agree btw. But the 
>> above could save time and $ big time during dev and be done and merged in a 
>> matter of days imo.
>> 
>> I can open a DISCUSS thread if we feel it's worth it.
>> 
>> On 15/2/24 10:24, Mick Semb Wever wrote:
>>>  
>>>> Mick and Ekaterina (and everyone really) - any thoughts on what test 
>>>> coverage, if any, we should commit to for this new configuration? 
>>>> Acknowledging that we already have *a lot* of CI that we run.
>>> 
>>> 
>>> 
>>> Branimir in this patch has already done some basic cleanup of test 
>>> variations, so this is not a duplication of the pipeline.  It's a 
>>> significant improvement.
>>> 
>>> I'm ok with cassandra_latest being committed 

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-14 Thread Josh McKenzie
> When we have failing tests people do not spend the time to figure out if 
> their logic caused a regression and merge, making things more unstable… so 
> when we merge failing tests that leads to people merging even more failing 
> tests...
What's the counter position to this Jacek / Berenguer?

Mick and Ekaterina (and everyone really) - any thoughts on what test coverage, 
if any, we should commit to for this new configuration? Acknowledging that we 
already have *a lot* of CI that we run.


On Wed, Feb 14, 2024, at 5:11 AM, Berenguer Blasi wrote:
> +1 to not doing, imo, the ostrich lol
> 
> On 14/2/24 10:58, Jacek Lewandowski wrote:
>> We should not block merging configuration changes given it is a valid 
>> configuration - which I understand as it is correct, passes all config 
>> validations, it matches documented rules, etc. And this provided latest 
>> config matches those requirements I assume.
>> 
>> The failures should block release or we should not advertise we have those 
>> features at all, and the configuration should be named "experimental" rather 
>> than "latest".
>> 
>> The config changes are not responsible for broken features and we should not 
>> bury our heads in the sand pretending that everything is ok.
>> 
>> Thanks,
>> 
>> On Wed, Feb 14, 2024, 10:47 Štefan Miklošovič wrote:
>>> Wording looks good to me. I would also put that into NEWS.txt but I am not 
>>> sure which section. Neither New features, Upgrading, nor Deprecation seems 
>>> to be a good fit. 
>>> 
>>> On Tue, Feb 13, 2024 at 5:42 PM Branimir Lambov  wrote:
>>>> Hi All,
>>>> 
>>>> CASSANDRA-18753 introduces a second set of defaults (in a separate 
>>>> "cassandra_latest.yaml") that enable new features of Cassandra. The 
>>>> objective is two-fold: to be able to test the database in this 
>>>> configuration, and to point potential users that are evaluating the 
>>>> technology to an optimized set of defaults that give a clearer picture of 
>>>> the expected performance of the database for a new user. The objective is 
>>>> to get this configuration into 5.0 to have the extra bit of confidence 
>>>> that we are not releasing (and recommending) options that have not gone 
>>>> through thorough CI.
>>>> 
>>>> The implementation has already gone through review, but I'd like to get 
>>>> people's opinion on two things:
>>>> - There are currently a number of test failures when the new options are 
>>>> selected, some of which appear to be genuine problems. Is the community 
>>>> okay with committing the patch before all of these are addressed? This 
>>>> should prevent the introduction of new failures and make sure we don't 
>>>> release before clearing the existing ones.
>>>> - I'd like to get an opinion on what's suitable wording and documentation 
>>>> for the new defaults set. Currently, the patch proposes adding the 
>>>> following text to the yaml (see 
>>>> https://github.com/apache/cassandra/pull/2896/files):
>>>> # NOTE:
>>>> #   This file is provided in two versions:
>>>> #   - cassandra.yaml: Contains configuration defaults for a "compatible"
>>>> #     configuration that operates using settings that are backwards-compatible
>>>> #     and interoperable with machines running older versions of Cassandra.
>>>> #     This version is provided to facilitate pain-free upgrades for existing
>>>> #     users of Cassandra running in production who want to gradually and
>>>> #     carefully introduce new features.
>>>> #   - cassandra_latest.yaml: Contains configuration defaults that enable
>>>> #     the latest features of Cassandra, including improved functionality as
>>>> #     well as higher performance. This version is provided for new users of
>>>> #     Cassandra who want to get the most out of their cluster, and for users
>>>> #     evaluating the technology.
>>>> #     To use this version, simply copy this file over cassandra.yaml, or specify
>>>> #     it using the -Dcassandra.config system property, e.g. by running
>>>> #       cassandra -Dcassandra.config=file:/$CASSANDRA_HOME/conf/cassandra_latest.yaml
>>>> # /NOTE
>>>> Does this sound sensible? Should we add a pointer to this defaults set 
>>>> elsewhere in the documentation?
>>>> 
>>>> Regards,
>>>> Branimir
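As a sketch of how the -Dcassandra.config value in the NOTE picks a file, here is a hypothetical helper (not Cassandra's configuration loader). It assumes cassandra.yaml is the fallback when the property is unset and that a file: URL names the file directly.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper, NOT Cassandra's loader: it only mirrors the convention in the NOTE,
// where -Dcassandra.config names the yaml file and cassandra.yaml is assumed to be the default.
class ConfigChoice {
    static final String DEFAULT_FILE = "cassandra.yaml";

    // Returns the file name that would be used for a given -Dcassandra.config value (null if unset).
    static String resolveFileName(String property) {
        if (property == null || property.isBlank()) return DEFAULT_FILE;
        URI uri = URI.create(property);
        // "file:/..." is a URL with a scheme; a bare name has no scheme and is used as-is.
        String path = uri.getScheme() == null ? property : uri.getPath();
        return Path.of(path).getFileName().toString();
    }
}
```

Calling `ConfigChoice.resolveFileName(System.getProperty("cassandra.config"))` would report `cassandra_latest.yaml` when started as in the NOTE, and `cassandra.yaml` otherwise.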


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-30 Thread Josh McKenzie
> 2.) We should make sure the links between the "known" root causes of 
> cascading failures and the mechanisms we introduce to avoid them remain very 
> strong.
Seems to me that our historical strategy was to address individual known cases 
one-by-one rather than looking for a more holistic load-balancing and 
load-shedding solution. While the engineer in me likes the elegance of a broad, 
more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how 
far we think we are today from a stable set-point. 

i.e. are we facing a handful of cases where nodes can still get pushed over and 
then cascade that we can surgically address, or are we facing a broader lack of 
back-pressure that rears its head in different domains (client -> coordinator, 
coordinator -> replica, internode with other operations, etc) at surprising 
times and should be considered more holistically?

On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
> I almost forgot CASSANDRA-15817, which introduced 
> reject_repair_compaction_threshold, which provides a mechanism to stop 
> repairs while compaction is underwater.
> 
>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe  
>> wrote:
>> 
>> Hey all,
>> 
>> I'm a bit late to the discussion. I see that we've already discussed 
>> CASSANDRA-15013 and 
>> CASSANDRA-16663 at 
>> least in passing. Having written the latter, I'd be the first to admit it's 
>> a crude tool, although it's been useful here and there, and provides a 
>> couple of primitives that may be useful for future work. As Scott mentions, 
>> while it is configurable at runtime, it is not adaptive, although we did 
>> make configuration easier in CASSANDRA-17423. It also is global 
>> to the node, although we've lightly discussed some ideas around making it 
>> more granular. (For example, keyspace-based limiting, or limiting "domains" 
>> tagged by the client in requests, could be interesting.) It also does not 
>> deal with inter-node traffic, of course.
>> 
>> Something we've not yet mentioned (that does address internode traffic) is 
>> CASSANDRA-17324, 
>> which I proposed shortly after working on the native request limiter (and 
>> have just not had much time to return to). The basic idea is this:
>> 
>>> When a node is struggling under the weight of a compaction backlog and 
>>> becomes a cause of increased read latency for clients, we have two safety 
>>> valves:
>>> 
>>> 
>>> 1.) Disabling the native protocol server, which stops the node from 
>>> coordinating reads and writes.
>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to 
>>> avoid the node for reads from other coordinators.
>>> 
>>> These are useful, but we don’t appear to have any mechanism that would 
>>> allow us to temporarily reject internode hint, batch, and mutation messages 
>>> that could further delay resolution of the compaction backlog.
>>> 
>> 
>> Whether it's done as part of a larger framework or on its own, it still 
>> feels like a good idea.
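A minimal sketch of the CASSANDRA-17324 idea, assuming a hypothetical Verb enum and a single pending-compactions threshold (both are illustrative; the ticket does not define this API). It only decides whether to reject an inbound internode message; how the sender reacts to the rejection is out of scope here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the CASSANDRA-17324 idea; names and thresholds are illustrative only.
class CompactionBackpressureGate {
    enum Verb { HINT, BATCH, MUTATION, READ }

    // Message types that add write work and could further delay a compaction backlog.
    private static final Set<Verb> SHEDDABLE = EnumSet.of(Verb.HINT, Verb.BATCH, Verb.MUTATION);

    private final int pendingCompactionThreshold;

    CompactionBackpressureGate(int pendingCompactionThreshold) {
        this.pendingCompactionThreshold = pendingCompactionThreshold;
    }

    // True if an inbound internode message of this type should be temporarily rejected.
    boolean shouldReject(Verb verb, int pendingCompactions) {
        return SHEDDABLE.contains(verb) && pendingCompactions > pendingCompactionThreshold;
    }
}
```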
>> 
>> Thinking in terms of opportunity costs here (i.e. where we spend our finite 
>> engineering time to holistically improve the experience of operating this 
>> database) is healthy, but we probably haven't reached the point of 
>> diminishing returns on nodes being able to protect themselves from clients 
>> and from other nodes. I would just keep in mind two things:
>> 
>> 1.) The effectiveness of rate-limiting in the system (which includes the 
>> database and all clients) as a whole necessarily decreases as we move from 
>> the application to the lowest-level database internals. Limiting correctly 
>> at the client will save more resources than limiting at the native protocol 
>> server, and limiting correctly at the native protocol server will save more 
>> resources than limiting after we've dispatched requests to some thread pool 
>> for processing.
>> 2.) We should make sure the links between the "known" root causes of 
>> cascading failures and the mechanisms we introduce to avoid them remain very 
>> strong.
>> 
>> In any case, I'd be happy to help out in any way I can as this moves forward 
>> (especially as it relates to our past/current attempts to address this 
>> problem space).
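A generic rate limiter of the kind discussed in this thread is often built as a token bucket. The sketch below is a textbook token bucket, not Cassandra's native request limiter from CASSANDRA-16663; the injectable nanosecond clock is an assumption added to make it testable. Per Caleb's first point, the same limiter saves more work when placed at the client or the native protocol server than after requests reach a thread pool.

```java
import java.util.function.LongSupplier;

// Minimal token-bucket limiter; a generic illustration, not Cassandra's native request limiter.
class TokenBucket {
    private final long capacity;
    private final double tokensPerSecond;
    private final LongSupplier clock; // nanosecond time source, injectable for testing
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.clock = clock;
        this.tokens = capacity; // start full
        this.lastRefill = clock.getAsLong();
    }

    // Takes one token if available; callers that get false should shed or delay the request.
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        double refill = (now - lastRefill) * tokensPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```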


Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Josh McKenzie
The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has 
accepted the invitation to become a committer.

Thanks for all the hard work and collaboration on the project thus far, and 
we're all looking forward to working more with you in the future. 
Congratulations and welcome!

The Apache Cassandra PMC members



Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Josh McKenzie
> Fundamentally, I think it's better for the project if administration is fully 
> done over CQL and we have a consistent, single way of doing things. 
Strongly agree here. With 2 caveats:
 1. Supporting backwards compat, especially for automated ops (i.e. nodetool, 
JMX, etc), is crucial. Painful, but crucial.
 2. We need something that's available for use before the node comes fully 
online; the point Jeff always brings up when we discuss moving away from JMX. 
So long as we have some kind of "out-of-band" access to nodes or accommodation 
for that, we should be good.
For context on point 2, see slack: 
https://the-asf.slack.com/archives/CK23JSY2K/p1688745128122749?thread_ts=1688662169.018449=CK23JSY2K

> I point out that JMX works before and after the native protocol is running 
> (startup, shutdown, joining, leaving), and also it's semi-common for us to 
> disable the native protocol in certain circumstances, so at the very least, 
> we'd then need to implement a totally different cql protocol interface just 
> for administration, which nobody has committed to building yet.

I think this is a solvable problem, and I think the benefits of having a 
single, elegant way of interacting with a cluster and configuring it justifies 
the investment for us as a project. Assuming someone has the cycles to, you 
know, actually do the work. :D

On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote:
> I like the idea of the ability to execute certain commands via CQL, but I 
> think it only makes sense for the nodetool commands that cause an action to 
> take place, such as compact or repair.  We already have virtual tables, I 
> don't think we need another layer to run informational queries.  I see little 
> value in having the following (I'm using exec here for simplicity):
> 
> cqlsh> exec tpstats
> 
> which returns a string in addition to:
> 
> cqlsh> select * from system_views.thread_pools
> 
> which returns structured data.  
> 
> I'd also rather see updatable configuration virtual tables instead of
> 
> cqlsh> exec setcompactionthroughput 128
> 
> Fundamentally, I think it's better for the project if administration is fully 
> done over CQL and we have a consistent, single way of doing things.  I'm not 
> dead set on it, I just think less is more in a lot of situations, this being 
> one of them.  
> 
> Jon
> 
> 
> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov  wrote:
>> Happy New Year to everyone! I'd like to thank everyone for their
>> questions, because answering them forces us to move towards the right
>> solution, and I also like the ML discussions for the time they give to
>> investigate the code :-)
>> 
>> I'm deliberately trying to limit the scope of the initial solution
>> (e.g. exclude the agent part) to keep the discussion short and clear,
>> but it's also important to have a glimpse of what we can do next once
>> we've finished with the topic.
>> 
>> My view of the Command<> is that it is an abstraction in the broader
>> sense of an operation that can be performed on the local node,
>> involving one of a few internal components. This means that updating a
>> property in the settings virtual table via an update statement, or
>> executing e.g. the setconcurrentcompactors command are just aliases of
>> the same internal command via different APIs. Another example is the
>> netstats command, which simply aggregates the MessageService metrics
>> and returns them in a human-readable format (just another way of
>> looking at key-value metric pairs). More broadly, the command input is
>> Map and String as the result (or List).
>> 
>> As Abe mentioned, Command and CommandRegistry should be largely based
>> on the nodetool command set at the beginning. We have a few options
>> for how we can initially construct command metadata during the
>> registry implementation (when moving command metadata from the
>> nodetool to the core part), so I'm planning to check our design against
>> the command representations of the k8ssandra project to ensure that any
>> further registry adoptions have zero problems (by writing a test
>> OpenAPI registry exporter and comparing the representation results).
>> 
>> So, the MVP is the following:
>> - Command
>> - CommandRegistry
>> - CQLCommandExporter
>> - JMXCommandExporter
>> - the nodetool uses the JMXCommandExporter
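A minimal sketch of the Command / CommandRegistry MVP, assuming Map<String, String> input and String output (the generic parameters in the email are not spelled out). The interface and registry names mirror the email, but their methods and the SetConcurrentCompactors command are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shapes; Map<String, String> in and String out are assumptions.
interface Command {
    String name();
    String execute(Map<String, String> args);
}

class CommandRegistry {
    private final Map<String, Command> commands = new TreeMap<>();

    void register(Command c) {
        if (commands.putIfAbsent(c.name(), c) != null)
            throw new IllegalArgumentException("duplicate command: " + c.name());
    }

    // Every exporter (CQL, JMX, ...) resolves commands here, so each API is
    // just an alias for the same internal operation.
    String invoke(String name, Map<String, String> args) {
        Command c = commands.get(name);
        if (c == null) throw new IllegalArgumentException("unknown command: " + name);
        return c.execute(args);
    }
}

// Hypothetical command backed by a simple in-memory setting.
class SetConcurrentCompactors implements Command {
    int concurrentCompactors = 2;

    public String name() { return "setconcurrentcompactors"; }

    public String execute(Map<String, String> args) {
        concurrentCompactors = Integer.parseInt(args.get("value"));
        return "concurrent_compactors=" + concurrentCompactors;
    }
}
```

A CQL exporter (for example, one handling an update on the settings virtual table) and a JMX exporter would both end up calling `invoke("setconcurrentcompactors", ...)`, which is the "aliases of the same internal command" idea.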
>> 
>> 
>> = Answers =
>> 
>> > What do you have in mind specifically there? Do you plan on rewriting a 
>> > brand new implementation which would be partially inspired by our agent? 
>> > Or would the project integrate our agent code in-tree or as a dependency?
>> 
>> Personally, I like the state of the k8ssandra project as it is now. My
>> understanding is that the server part of a database always lags behind
>> the client and sidecar parts in terms of the jdk version and the
>> features it provides. In contrast, sidecars should always be on top of
>> the market, so if we want to make an agent part in-tree, this should
>> be carefully considered for the flexibility which 

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-22 Thread Josh McKenzie
> I don't even think we have to think about *new* SAI features to see where it 
> will benefit from further *local* optimization...
You make good points IMO. After Caleb's reasoning it makes sense to me to start 
working on query optimization w/even just our initial SAI feature-set given 
querying across multiple indices.

On Fri, Dec 22, 2023, at 8:42 AM, J. D. Jordan wrote:
> 
> The CEP-29 “rejected alternatives” section mentions one such use case.  Being 
> able to put NOT arbitrarily in a query.  Adding an OR operator is another 
> thing we are likely to want to do in the near future that would benefit from 
> this work; both benefit from the syntax tree and reordering parts of the 
> proposal.
> 
> But I think we already have enough complexity available to us to justify a 
> query optimizer in the form of multi-index queries today. Especially when you 
> have the new ANN OF operator in use combined with index queries.  Depending 
> on what order you query the indexes in, it can dramatically change the 
> performance of the query.  We are seeing and working through such issues in 
> Astra today.
> 
> -Jeremiah
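To illustrate why the order of index lookups matters, here is a toy cost model under stated assumptions: predicates are independent, the first index is read in full, and each later index is applied as a filter over the surviving candidates. It is not the CEP-39 cost model; it only shows how per-index cardinality estimates can drive the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy cost model, not the CEP-39 design: assumes independent predicates.
class IndexOrdering {
    record IndexPredicate(String name, long estimatedMatches) { }

    // Cost = rows read from the first index + candidates checked at each later step.
    static double cost(List<IndexPredicate> order, long totalRows) {
        double candidates = order.get(0).estimatedMatches();
        double cost = candidates;
        for (int i = 1; i < order.size(); i++) {
            cost += candidates; // every surviving candidate is checked against index i
            candidates *= (double) order.get(i).estimatedMatches() / totalRows;
        }
        return cost;
    }

    // Greedy plan: most selective (lowest cardinality) index first.
    static List<IndexPredicate> plan(List<IndexPredicate> predicates) {
        List<IndexPredicate> sorted = new ArrayList<>(predicates);
        sorted.sort(Comparator.comparingLong(IndexPredicate::estimatedMatches));
        return sorted;
    }
}
```

With 1,000 rows, one index matching 500 rows and another matching 10, reading the selective index first costs 20 row checks in this model versus 1,000 the other way around.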
> 
> 
>> On Dec 21, 2023, at 12:00 PM, Josh McKenzie  wrote:
>> 
>>> we are already late. We have several features running in production that we 
>>> chose to not open source yet because implementing phase 1 of the CEP would 
>>> have heavily simplified their designs. The cost of developing them was much 
>>> higher than what it would have been if the CEP had already been 
>>> implemented. We are also currently working on some SAI features that need 
>>> cost based optimization.
>> Are there DISCUSS threads or CEP's for any of that work? For us to have a 
>> useful discussion about whether we're at a point in the project where a 
>> query optimizer is appropriate for the project this information would be 
>> vital.
>> 
>> On Thu, Dec 21, 2023, at 12:33 PM, Benjamin Lerer wrote:
>>> Hey German,
>>> 
>>> To clarify things, we intend to push cardinalities across nodes, not costs. 
>>> It will be up to the Cost Model to estimate cost based on those 
>>> cardinalities. We will implement some functionalities to collect costs on 
>>> query execution to be able to provide them as the output of EXPLAIN ANALYZE.
>>> 
>>> We will provide more details on how we will collect and distribute 
>>> cardinalities. We will probably not go into details on how we will estimate 
>>> costs before the patch for it is ready. The main reason being that there 
>>> are a lot of different parts that you need to account for and that it will 
>>> require significant testing and experimentation.
>>> 
>>> Regarding multi-tenancy, even if you use query cost, do not forget that you 
>>> will have to account also for background tasks such as compaction, repair, 
>>> backup, ... which is not included in this CEP.  
>>> 
>>> On Thu, Dec 21, 2023 at 00:18, German Eichberger via dev wrote:
>>>> All,
>>>> 
>>>> very much agree with Scott's reasoning. 
>>>> 
>>>> It seems expedient given the advent of ACCORD transactions to be more like 
>>>> the other distributed SQL databases and just support SQL. But just because 
>>>> it's expedient it isn't right and we should work out the relational 
>>>> features in more detail before we embark on tying us to some query 
>>>> planning design.
>>>> 
>>>> The main problem in this space is pushing cost / across nodes based on 
>>>> data density. I understand that TCM will level out data density but the 
>>>> cost based optimizer proposal does a lot of hand waving when it comes to 
>>>> collecting/estimating costs for each node. I'd like to see more details on 
>>>> this since otherwise it will be fairly limiting.
>>>> 
>>>> I am less tied to ALLOW FILTERING - many of my customers find allowing 
>>>> filtering beneficial for their workloads so I think removing it makes 
>>>> sense to me (and yes we try to discourage them )
>>>> 
>>>> I am also intrigued by this proposal when I think about multi tenancy and 
>>>> resource governance: We have heard from several operators who run multiple 
>>>> internal teams on the same Cassandra cluster just to optimize costs. Having 
>>>> a way to attribute those costs more fairly by adding up the costs the 
>>>> optimizer calculates might be hugely beneficial.  There could also be a 
>>>> way to have a "cost budget"

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Josh McKenzie
y is 
>>>> perhaps simpler to implement and maintain, and has corollary benefits - 
>>>> such as providing a mechanism for users to specify their own execution 
>>>> plan.
>>>>  
>>>> Note, my proposal cuts across all of these elements of the CEP. There is 
>>>> no obvious need for a cross-cluster re-optimisation event or cross cluster 
>>>> statistic management.
>>>  
>>> I think that I am missing one part of your proposal. How do you plan to 
>>> build the initial execution plan for a prepared statement?
>>> 
>>> On Wed, Dec 20, 2023 at 14:05, Benedict wrote:
>>>> 
>>>> If we are to address that within the CEP itself then we should discuss it 
>>>> here, as I would like to fully understand the approach as well as how it 
>>>> relates to consistency of execution and the idea of triggering 
>>>> re-optimisation. These ideas are all interrelated.
>>>> 
>>>> I’m not sold on the proposed set of characteristics, and think my approach of coupling 
>>>> an execution plan to a given prepared statement, for clients to supply, is 
>>>> perhaps simpler to implement and maintain, and has corollary benefits - 
>>>> such as providing a mechanism for users to specify their own execution 
>>>> plan.
>>>> 
>>>> Note, my proposal cuts across all of these elements of the CEP. There is 
>>>> no obvious need for a cross-cluster re-optimisation event or cross cluster 
>>>> statistic management.
>>>> 
>>>> We still also need to discuss more concretely how the base statistics 
>>>> themselves will be derived, as there is little detail here today in the 
>>>> proposal.
>>>> 
>>>> 
>>>>> On 20 Dec 2023, at 12:58, Benjamin Lerer  wrote:
>>>>> 
>>>>> After the second phase of the CEP, we will have two optimizer 
>>>>> implementations. One will be similar to what we have today and the other 
>>>>> one will be the CBO. As those implementations will be behind the new 
>>>>> Optimizer API interfaces they will both have support for EXPLAIN and they 
>>>>> will both benefit from the simplification/normalization rules. Such as 
>>>>> the ones that David mentioned.
>>>>> 
>>>>> Regarding functions, we are already able to determine which ones are 
>>>>> deterministic 
>>>>> (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55).
>>>>>  We simply do not take advantage of it.
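Once functions are known to be deterministic, an optimizer pass can fold calls whose arguments are all constants. The sketch below uses a hypothetical mini-AST (not Cassandra's cql3 classes) and a plain boolean flag in place of whatever Function.java actually exposes; a non-deterministic call such as now() is left untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical mini-AST for illustration; not Cassandra's cql3 classes.
class ConstantFolder {
    interface Expr { }

    record Literal(Object value) implements Expr { }

    record Call(String name, boolean deterministic,
                Function<List<Object>, Object> impl, List<Expr> args) implements Expr { }

    // Folds a call into a literal when the function is deterministic and all of its
    // (recursively folded) arguments are literals; otherwise keeps the call.
    static Expr fold(Expr e) {
        if (!(e instanceof Call call)) return e;
        List<Expr> folded = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        boolean allLiteral = true;
        for (Expr arg : call.args()) {
            Expr f = fold(arg);
            folded.add(f);
            if (f instanceof Literal lit) values.add(lit.value());
            else allLiteral = false;
        }
        if (call.deterministic() && allLiteral) return new Literal(call.impl().apply(values));
        return new Call(call.name(), call.deterministic(), call.impl(), folded);
    }
}
```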
>>>>> 
>>>>> I removed the ALLOW FILTERING part and will open a discussion about it at 
>>>>> the beginning of next year.
>>>>> 
>>>>> Regarding the statistics management part, I would like to try to address 
>>>>> it within the CEP itself, if feasible. If it turns out to be too 
>>>>> complicated, I will separate it into its own CEP.
>>>>> 
>>>>> On Tue, Dec 19, 2023 at 22:23, David Capwell wrote:
>>>>>>> even if the only outcome of all this work were to tighten up 
>>>>>>> inconsistencies in our grammar and provide more robust EXPLAIN and 
>>>>>>> EXPLAIN ANALYZE functionality to our end users, I think that would be 
>>>>>>> highly valuable
>>>>>> 
>>>>>> In my mental model a no-op optimizer just becomes what we have today 
>>>>>> (since all new features really should be disabled by default, I would 
>>>>>> hope we support this), so we benefit from having a logical AST + ability 
>>>>>> to mutate it before we execute it and we can use this to make things 
>>>>>> nicer for users (as you are calling out)
>>>>>> 
>>>>>> Here is one example that stands out to me in Accord
>>>>>> 
>>>>>> LET a = (select * from tbl where pk=0);
>>>>>> Insert into tbl2 (pk, …) values (a.pk, …); -- this is not allowed as we 
>>>>>> don’t know the primary key… but this could trivially be rewritten to 
>>>>>> replace a.pk with 0…
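The rewrite David describes can be sketched as constant propagation over LET bindings: when a binding's SELECT restricts a column by equality, references to that column are replaced with the constant before execution. All names here are hypothetical, not Accord's or Cassandra's classes.

```java
import java.util.Map;

// Hypothetical sketch; not Accord's or Cassandra's actual classes.
class LetConstantPropagation {
    // Column -> constant equality restrictions from the LET's WHERE clause, e.g. {pk=0}.
    record LetBinding(String name, Map<String, Object> equalityRestrictions) { }

    // A value in an INSERT's VALUES list: either a literal or a reference such as a.pk.
    record Value(Object literal, String binding, String column) {
        static Value ofLiteral(Object v) { return new Value(v, null, null); }
        static Value ofRef(String binding, String column) { return new Value(null, binding, column); }
        boolean isRef() { return binding != null; }
    }

    // Replaces references whose value is fixed by the binding's WHERE clause.
    static Value rewrite(Value v, Map<String, LetBinding> bindings) {
        if (!v.isRef()) return v;
        LetBinding b = bindings.get(v.binding());
        if (b != null && b.equalityRestrictions().containsKey(v.column()))
            return Value.ofLiteral(b.equalityRestrictions().get(v.column()));
        return v; // not known until execution; leave as a reference
    }
}
```

For `LET a = (select * from tbl where pk=0)`, the binding's restrictions are `{pk=0}`, so `a.pk` in the INSERT becomes the literal `0` and the primary key is known up front.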
>>>>>> 
>>>>>> With this work we could also rethink what functions are deterministic 
>>>>>> and which ones are not (not trying to bike shed)… simple example is 
>>>>>> “now” (select now() from tbl; -- each row will have a different 
>>>>

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-18 Thread Josh McKenzie
> One thing where this “could” come into play is that we currently run with 
> different configs at the CI level and we might be able to make this happen at 
> the class or method level instead..
It'd be great to be able to declaratively indicate which configurations a test 
needs to exercise and just have one CI run that includes them as appropriate. 

On Mon, Dec 18, 2023, at 7:22 PM, David Capwell wrote:
>> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
>> interesting annotation-based approach to property testing. Curious if you've 
>> looked into or used that at all David (Capwell)? (link for the lazy: 
>> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).
> 
> I have not, no.  Looking at your link it moves from lambdas to annotations, 
> and tries to define an API for stateful testing… I am neutral to that as it's 
> mostly style…. One thing to call out is that the project documents it tries to 
> “shrink”… we ended up disabling this in QuickTheories as shrinking doesn’t 
> work well for many of our tests (too high resource demand and unable to 
> actually shrink once you move past trivial generators).  Looking at their 
> docs and their code, it's hard for me to see how we actually create C* 
> generators… it's so much class-gen magic that I really don’t see how to create 
> AbstractType or TableMetadata… the only example they gave was not random data 
> but hand-crafted data… 
> 
>> moving to JUnit 5
> 
> I am a fan of this.  If we add dependencies and don’t keep up to date with them 
> it becomes painful over time (missing features, lack of support, etc).  
> 
>> First of all - when you want to have a parameterized test case you do not 
>> have to make the whole test class parameterized - it is per test case. Also, 
>> each method can have different parameters.
> 
> I strongly prefer this, but never had it as a blocker from me doing param 
> tests…. One thing where this “could” come into play is that we currently run 
> with different configs at the CI level and we might be able to make this 
> happen at the class or method level instead..
> 
> @ServerConfigs(all) // can exclude unsupported configs
> public class InsertTest
> 
> It bothers me deeply that we run tests that don’t touch the configs we use in 
> CI, causing us to waste resources… Can we solve this in junit4 param logic… 
> no clue… 
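The class-level idea above can be expressed with a plain runtime annotation. This is a hypothetical sketch: ServerConfig, @ServerConfigs, and ConfigSelector do not exist in the Cassandra test tree, and treating unannotated classes as DEFAULT-only is an assumption. A JUnit 5 extension or a JUnit 4 runner could call `configsFor(...)` to decide which configurations to run.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical: none of these types exist in the Cassandra test tree.
enum ServerConfig { DEFAULT, LATEST }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ServerConfigs {
    ServerConfig[] value() default { ServerConfig.DEFAULT, ServerConfig.LATEST }; // "all"
}

class ConfigSelector {
    // Configurations a CI run should exercise for this class; unannotated classes only
    // run against DEFAULT so they don't burn resources on configs they never touch.
    static List<ServerConfig> configsFor(Class<?> testClass) {
        ServerConfigs ann = testClass.getAnnotation(ServerConfigs.class);
        return ann == null ? List.of(ServerConfig.DEFAULT) : List.of(ann.value());
    }
}

@ServerConfigs // all configs
class InsertTest { }

@ServerConfigs(ServerConfig.LATEST) // only meaningful with the new defaults
class LatestOnlyTest { }

class LegacyTest { }
```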
> 
>> On Dec 15, 2023, at 6:52 PM, Josh McKenzie  wrote:
>> 
>>> First of all - when you want to have a parameterized test case you do not 
>>> have to make the whole test class parameterized - it is per test case. 
>>> Also, each method can have different parameters.
>> This is a pretty compelling improvement to me having just had to use the 
>> somewhat painful and blunt instrument of our current framework's 
>> parameterization; pretty clunky and broad.
>> 
>> It also looks like they moved to a "test engine abstracted away from test 
>> identification" approach to their architecture in 5 w/the "vintage" model 
>> providing native unchanged backwards-compatibility w/junit 4. Assuming they 
>> didn't bork up their architecture that *should* lower risk of the framework 
>> change leading to disruption or failure (famous last words...).
>> 
>> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly 
>> interesting annotation-based approach to property testing. Curious if you've 
>> looked into or used that at all David (Capwell)? (link for the lazy: 
>> https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).
>> 
>> On Tue, Dec 12, 2023, at 11:39 AM, Jacek Lewandowski wrote:
>>> First of all - when you want to have a parameterized test case you do not 
>>> have to make the whole test class parameterized - it is per test case. 
>>> Also, each method can have different parameters.
>>> 
>>> For the extensions - we can have extensions which provide Cassandra 
>>> configuration, extensions which provide a running cluster and others. We 
>>> could for example apply some extensions to all test classes externally 
>>> without touching those classes, something like logging the begin and end of 
>>> each test case. 
>>> 
>>> 
>>> 
>>> On Tue, Dec 12, 2023 at 12:07 Benedict wrote:
>>>> 
>>>> Could you give (or link to) some examples of how this would actually 
>>>> benefit our test suites?
>>>> 
>>>> 
>>>>> On 12 Dec 2023, at 10:51, Jacek Lewandowski  
>>>>> wrote:
>>>>> 
>>>>> I have two major pros for JUnit 5:
>>

Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Josh McKenzie
Gotcha; wasn't sure given the earlier phrasing. Makes sense.

Dinesh's compromise position makes sense to me.

On Fri, Dec 15, 2023, at 11:21 PM, Ariel Weisberg wrote:
> Hi,
> 
> I did get one response from Robert indicating that he didn’t want to do the 
> work to contribute it.
> 
> I offered to do the work and asked for permission to contribute it and no 
> response. Followed up later with a ping and also no response.
> 
> Ariel
> 
> On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>>> I have reached out to the original maintainer about it and it seems like if 
>>> we want to keep using it we will need to start releasing it under a new 
>>> package from a different repo.
>> 
>>> the current maintainer is not interested in donating it to the ASF
>> Is that the case, Ariel, or could you just not reach Robert?
>> 
>> On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>>>> from a maintenance and
>>>> integration testing perspective I think it would be better to keep the
>>>> ohc in-tree, so we will be aware of any issues immediately after the
>>>> full CI run.
>>> 
>>> From the original email, bringing OHC in-tree is not an option because the 
>>> current maintainer is not interested in donating it to the ASF. Thus option 
>>> 1, where some set of people fork it to their own GitHub org and 
>>> maintain a version outside of the ASF C* project.
>>> 
>>> -Jeremiah
>>> 
>>> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>>>> Ariel,
>>>> thank you for bringing this topic to the ML.
>>>> 
>>>> I may be missing something, so correct me if I'm wrong somewhere in
>>>> the management of the Cassandra ecosystem.  As I see it, the problem
>>>> right now is that if we fork the ohc and put it under its own root,
>>>> the use of that row cache is still not well tested (the same as it is
>>>> now). I am particularly emphasising the dependency management side, as
>>>> any version change/upgrade in Cassandra, and the resulting new set of
>>>> libraries on the classpath, should be tested against this
>>>> integration.
>>>> 
>>>> So, unless it is being widely used by someone else outside of the
>>>> community (which it doesn't seem to be), from a maintenance and
>>>> integration testing perspective I think it would be better to keep the
>>>> ohc in-tree, so we will be aware of any issues immediately after the
>>>> full CI run.
>>>> 
>>>> I'm also +1 for not deprecating it, even if it is used in narrow
>>>> cases, while the cost of maintaining its source code remains quite low
>>>> and it brings some benefits.
>>>> 
>>>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> To add some additional context.
>>>>> 
>>>>> The row cache is disabled by default and it is already pluggable, but 
>>>>> there isn’t a Caffeine implementation present. I think one used to exist 
>>>>> and could be resurrected.
>>>>> 
>>>>> I personally also think that people should be able to scratch their own 
>>>>> itch row cache wise so removing it entirely just because it isn’t 
>>>>> commonly used isn’t the right move unless the feature is very far out of 
>>>>> scope for Cassandra.
>>>>> 
>>>>> Auto enabling/disabling the cache is a can of worms that could result in 
>>>>> performance and reliability inconsistency as the DB enables/disables the 
>>>>> cache based on heuristics when you don’t want it to. It being off by 
>>>>> default seems good enough to me.
>>>>> 
>>>>> RE forking, we could create a GitHub org for OHC and then add people to 
>>>>> it. There are some examples of dependencies that haven’t been contributed 
>>>>> to the project that live outside like CCM and JAMM.
>>>>> 
>>>>> Ariel
>>>>> 
>>>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>>>> 
>>>>> I would avoid taking away a feature even if it works in a narrow set of 
>>>>> use-cases. I would instead suggest -
>>>>> 
>>>>> 1. Leave it disabled by default.
>>>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>>>> it off. Cassandra should ideally detect this and do it automatically.
>>>>> 3. Move to Caffeine instead of OHC.
>>>>> 
>>>>> I would suggest having this as the middle ground.
>>>>> 
>>>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in 
>>>>> a later release
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> I'm for deprecating and removing it.
>>>>> It constantly trips users up and just causes pain.
>>>>> 
>>>>> Yes it works in some very narrow situations, but those situations often 
>>>>> change over time and again just bites the user.  Without the row-cache I 
>>>>> believe users would quickly find other, more suitable and lasting, 
>>>>> solutions.
>>>>> 
>>>>> 
>> 
> 


Re: Moving Semver4j from test to main dependencies

2023-12-15 Thread Josh McKenzie
+1

On Fri, Dec 15, 2023, at 1:29 PM, Derek Chen-Becker wrote:
> +1
> 
> Semver4j seems reasonable to me. I looked through the code and it's 
> relatively easy to understand. I'm not sure how easy it would be to replace 
> CassandraVersion, but that's not an immediate concern I guess.
> 
> Cheers,
> 
> Derek
> 
> On Fri, Dec 15, 2023 at 2:56 AM Jacek Lewandowski 
>  wrote:
>> Hi,
>> 
>> I'd like to add Semver4j to the production dependencies. It is currently on 
>> the test classpath. The library is pretty lightweight, licensed with MIT and 
>> has no transitive dependencies.
>> 
>> We need to represent the kernel version somehow in CASSANDRA-19196 and 
>> Semver4j looks like the right tool for it. Maybe at some point we can replace 
>> our custom implementation of CassandraVersion as well. 
>> 
>> Thanks,
>> - - -- --- -  -
>> Jacek Lewandowski
> 
> 
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 
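For context on what representing the kernel version involves, here is a minimal, dependency-free sketch. It is not Semver4j's API and not the CASSANDRA-19196 code: the KernelVersion class is hypothetical, and it assumes Linux release strings such as "5.15.0-91-generic", where everything after major.minor.patch is distribution-specific and ignored for ordering.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: compares Linux kernel release strings such as
 * "5.15.0-91-generic". A sketch only; this is not Semver4j's API.
 */
final class KernelVersion implements Comparable<KernelVersion> {
    // Leading "major.minor[.patch]"; any suffix such as "-91-generic" is ignored.
    private static final Pattern LEADING = Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    final int major, minor, patch;

    KernelVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static KernelVersion parse(String release) {
        Matcher m = LEADING.matcher(release.trim());
        if (!m.find())
            throw new IllegalArgumentException("Unrecognised kernel release: " + release);
        // A missing patch component ("5.4") is treated as 0.
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new KernelVersion(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), patch);
    }

    @Override
    public int compareTo(KernelVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }

    boolean isAtLeast(KernelVersion other) {
        return compareTo(other) >= 0;
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }
}
```

On Linux the JVM's os.version system property usually reports this release string, so KernelVersion.parse(System.getProperty("os.version")) would be one way to obtain it. A library like Semver4j would mainly replace the hand-written regex and comparison.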

Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Josh McKenzie
> I have reached out to the original maintainer about it and it seems like if 
> we want to keep using it we will need to start releasing it under a new 
> package from a different repo.

> the current maintainer is not interested in donating it to the ASF
Is that the case, Ariel, or could you just not reach Robert?

On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>> from a maintenance and
>> integration testing perspective I think it would be better to keep the
>> ohc in-tree, so we will be aware of any issues immediately after the
>> full CI run.
> 
> From the original email, bringing OHC in-tree is not an option because the 
> current maintainer is not interested in donating it to the ASF. That leaves 
> option 1: some set of people forking it to their own GitHub org and 
> maintaining a version outside of the ASF C* project.
> 
> -Jeremiah
> 
> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>> Ariel,
>> thank you for bringing this topic to the ML.
>> 
>> I may be missing something, so correct me if I'm wrong somewhere in
>> the management of the Cassandra ecosystem.  As I see it, the problem
>> right now is that if we fork the ohc and put it under its own root,
>> the use of that row cache is still not well tested (the same as it is
>> now). I am particularly emphasising the dependency management side, as
>> any version change/upgrade in Cassandra and, as a result of that
>> change, a new set of libraries on the classpath should be tested
>> against this integration.
>> 
>> So, unless it is being widely used by someone else outside of the
>> community (which it doesn't seem to be), from a maintenance and
>> integration testing perspective I think it would be better to keep the
>> ohc in-tree, so we will be aware of any issues immediately after the
>> full CI run.
>> 
>> I'm also +1 for not deprecating it, even if it is used in narrow
>> cases, while the cost of maintaining its source code remains quite low
>> and it brings some benefits.
>> 
>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> To add some additional context.
>>> 
>>> The row cache is disabled by default and it is already pluggable, but there 
>>> isn’t a Caffeine implementation present. I think one used to exist and 
>>> could be resurrected.
>>> 
>>> I personally also think that people should be able to scratch their own 
>>> itch row cache wise so removing it entirely just because it isn’t commonly 
>>> used isn’t the right move unless the feature is very far out of scope for 
>>> Cassandra.
>>> 
>>> Auto enabling/disabling the cache is a can of worms that could result in 
>>> performance and reliability inconsistency as the DB enables/disables the 
>>> cache based on heuristics when you don’t want it to. It being off by 
>>> default seems good enough to me.
>>> 
>>> RE forking, we could create a GitHub org for OHC and then add people to it. 
>>> There are some examples of dependencies that haven’t been contributed to 
>>> the project that live outside like CCM and JAMM.
>>> 
>>> Ariel
>>> 
>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>> 
>>> I would avoid taking away a feature even if it works in a narrow set of 
>>> use-cases. I would instead suggest -
>>> 
>>> 1. Leave it disabled by default.
>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>> it off. Cassandra should ideally detect this and do it automatically.
>>> 3. Move to Caffeine instead of OHC.
>>> 
>>> I would suggest having this as the middle ground.
>>> 
>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>> 
>>> 
>>> 
>>> 
>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>>> later release
>>> 
>>> 
>>> 
>>> 
>>> I'm for deprecating and removing it.
>>> It constantly trips users up and just causes pain.
>>> 
>>> Yes it works in some very narrow situations, but those situations often 
>>> change over time and again just bites the user.  Without the row-cache I 
>>> believe users would quickly find other, more suitable and lasting, 
>>> solutions.
>>> 
>>> 
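Dinesh's second point (warn when the row cache hit rate is low) could look roughly like the sketch below. It is a hypothetical illustration, not existing Cassandra code: RowCacheHitRateMonitor and its thresholds are assumptions, and it takes cumulative hit/request counters of the kind a cache metrics source would expose. In line with Ariel's concern about auto enabling/disabling, it only produces a warning and never turns the cache off itself.

```java
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical sketch: warn the operator when the row cache hit rate over the
 * last sampling window is low. It only warns; it never disables the cache.
 */
final class RowCacheHitRateMonitor {
    private final double minHitRate;   // e.g. 0.2 means warn below a 20% hit rate
    private final long minRequests;    // ignore windows with too few requests to judge
    private long lastHits;
    private long lastRequests;

    RowCacheHitRateMonitor(double minHitRate, long minRequests) {
        this.minHitRate = minHitRate;
        this.minRequests = minRequests;
    }

    /**
     * Feed cumulative counters; returns a warning if the hit rate since the
     * previous sample is below the threshold.
     */
    Optional<String> sample(long cumulativeHits, long cumulativeRequests) {
        long hits = cumulativeHits - lastHits;
        long requests = cumulativeRequests - lastRequests;
        lastHits = cumulativeHits;
        lastRequests = cumulativeRequests;

        // Too little traffic in this window to say anything useful.
        if (requests <= 0 || requests < minRequests)
            return Optional.empty();

        double hitRate = (double) hits / requests;
        if (hitRate >= minHitRate)
            return Optional.empty();

        return Optional.of(String.format(Locale.ROOT,
            "Row cache hit rate %.1f%% over the last %d requests is below %.1f%%; consider disabling it for this table",
            hitRate * 100, requests, minHitRate * 100));
    }
}
```

Sampling deltas rather than lifetime totals keeps an old, high hit rate from masking a recent drop, and the minimum-request floor avoids warning on idle tables.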


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-15 Thread Josh McKenzie
> First of all - when you want to have a parameterized test case you do not 
> have to make the whole test class parameterized - it is per test case. Also, 
> each method can have different parameters.
This is a pretty compelling improvement to me, having just had to use the 
somewhat painful and blunt instrument of our current framework's 
parameterization; pretty clunky and broad.

It also looks like they moved to a "test engine abstracted away from test 
identification" approach to their architecture in 5 w/the "vintage" model 
providing native unchanged backwards-compatibility w/junit 4. Assuming they 
didn't bork up their architecture that *should* lower risk of the framework 
change leading to disruption or failure (famous last words...).

A brief perusal shows jqwik, which integrates with JUnit 5, taking a fairly 
interesting annotation-based approach to property testing. Curious if you've 
looked into or used that at all David (Capwell)? (link for the lazy: 
https://jqwik.net/docs/current/user-guide.html#detailed-table-of-contents).

On Tue, Dec 12, 2023, at 11:39 AM, Jacek Lewandowski wrote:
> First of all - when you want to have a parameterized test case you do not 
> have to make the whole test class parameterized - it is per test case. Also, 
> each method can have different parameters.
> 
> For the extensions - we can have extensions which provide Cassandra 
> configuration, extensions which provide a running cluster and others. We 
> could for example apply some extensions to all test classes externally 
> without touching those classes, something like logging the begin and end of 
> each test case. 
> 
> 
> 
> wt., 12 gru 2023 o 12:07 Benedict  napisał(a):
>> 
>> Could you give (or link to) some examples of how this would actually benefit 
>> our test suites?
>> 
>> 
>>> On 12 Dec 2023, at 10:51, Jacek Lewandowski  
>>> wrote:
>>> 
>>> I have two major pros for JUnit 5:
>>> - much better support for parameterized tests
>>> - global test hooks (automatically detectable extensions) + 
>>> multi-inheritance
>>> 
>>> 
>>> 
>>> 
>>> pon., 11 gru 2023 o 13:38 Benedict  napisał(a):
 
 Why do we want to move to JUnit 5? 
 
 I’m generally opposed to churn unless well justified, which it may be - 
 just not immediately obvious to me.
 
 
> On 11 Dec 2023, at 08:33, Jacek Lewandowski  
> wrote:
> 
> Nobody referred so far to the idea of moving to JUnit 5, what are the 
> opinions?
> 
> 
> 
> niedz., 10 gru 2023 o 11:03 Benedict  napisał(a):
>> 
>> Alex’s suggestion was that we meta randomise, ie we randomise the config 
>> parameters to gain better rather than lesser coverage overall. This 
>> means we cover these specific configs and more - just not necessarily on 
>> any single commit.
>> 
>> I strongly endorse this approach over the status quo.
>> 
>> 
>>> On 8 Dec 2023, at 13:26, Mick Semb Wever  wrote:
>>> 
>>>  
>>>  
>>>  
 
> I think everyone agrees here, but…. these variations are still 
> catching failures, and until we have an improvement or replacement we 
> do rely on them.   I'm not in favour of removing them until we have 
> proof /confidence that any replacement is catching the same failures. 
>  Especially oa, tries, vnodes. (Note tries and offheap are being 
> replaced with "latest", which will be a valuable simplification.)  
 
 What kind of proof do you expect? I cannot imagine how we could prove 
 that because the ability of detecting failures results from the 
 randomness of those tests. That's why when such a test fails you 
 usually cannot reproduce that easily.
>>> 
>>> 
>>> Unit tests that fail consistently but only on one configuration, should 
>>> not be removed/replaced until the replacement also catches the failure.
>>>  
 We could extrapolate that to - why do we only have those configurations? 
 Why don't we test trie / oa + compression, or CDC, or system memtable? 
>>> 
>>> 
>>> Because, along the way, people have decided a certain configuration 
>>> deserves additional testing and it has been done this way in lieu of 
>>> any other more efficient approach.
>>> 
>>> 
>>> 
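Alex's meta-randomisation idea can be sketched as picking one value per configuration dimension from a seed on every run, and logging that seed so a failure can be replayed. The dimension names and values below are illustrative placeholders, not actual cassandra.yaml options, and RandomizedTestConfig is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch of meta-randomised test configuration: each run picks one
 * value per dimension from a seed, and the seed is logged for replay.
 */
final class RandomizedTestConfig {
    // Illustrative dimensions only; real option names/values would come from cassandra.yaml.
    static final Map<String, List<String>> DIMENSIONS = new LinkedHashMap<>();
    static {
        DIMENSIONS.put("sstable_format", List.of("big", "bti"));
        DIMENSIONS.put("memtable", List.of("skiplist", "trie"));
        DIMENSIONS.put("num_tokens", List.of("1", "16"));
    }

    /** Same seed, same configuration: this is what makes a failure reproducible. */
    static Map<String, String> pick(long seed) {
        Random random = new Random(seed);
        Map<String, String> chosen = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : DIMENSIONS.entrySet()) {
            List<String> values = e.getValue();
            chosen.put(e.getKey(), values.get(random.nextInt(values.size())));
        }
        return chosen;
    }

    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        // Log the seed first so a failing run can be replayed exactly.
        System.out.println("config seed=" + seed + " -> " + pick(seed));
    }
}
```

Any single commit sees only one combination, but across many runs every combination gets exercised, which matches the "not necessarily on any single commit" trade-off Benedict describes. A consistent failure on one configuration would show up as a set of failing seeds that all map to the same value.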


Re: Custom FSError and CommitLog Error Handling

2023-12-15 Thread Josh McKenzie
Adding a poison-pill error option on finding of corrupt data makes sense to me. 
Not sure if there's enough demand / other customization being done in this 
space to justify the user-customizable aspect; do any other approaches 
immediately come to mind? If not, this isn't an area of the code that's changed all that 
much, so just adding a new option seems surgical and minimal to me.

On Tue, Dec 12, 2023, at 4:21 AM, Claude Warren, Jr via dev wrote:
> I can see this as a strong improvement in Cassandra management and support 
> it. 
> 
> +1 non binding
> 
> On Mon, Dec 11, 2023 at 8:28 PM Raymond Huffman  
> wrote:
>> Hello All,
>> 
>> On our fork of Cassandra, we've implemented some custom behavior for 
>> handling CommitLog and SSTable Corruption errors. Specifically, if a node 
>> detects one of those errors, we want the node to stop itself, and if the 
>> node is restarted, we want initialization to fail. This is useful in 
>> Kubernetes when you expect nodes to be restarted frequently and makes our 
>> corruption remediation workflows less error-prone. I think we could make 
>> this behavior more pluggable by allowing users to provide custom 
>> implementations of the FSErrorHandler, and the error handler that's 
>> currently implemented at 
>> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in 
>> the same way one can provide custom Partitioners and 
>> Authenticators/Authorizers.
>> 
>> Would you take as a contribution one of the following?
>> 1. user provided implementations of FSErrorHandler and 
>> CommitLogErrorHandler, set via config; and/or
>> 2. new commit failure and disk failure policies that write a poison pill 
>> file to disk and fail on startup if that file exists
>> 
>> The poison pill implementation is what we currently use - we call this a 
>> "Non Transient Error" and we want these states to always require manual 
>> intervention to resolve, including manual action to clear the error. I'd be 
>> happy to contribute this if other users would find it beneficial. I had 
>> initially shared this question in Slack, but I'm now sharing it here for 
>> broader visibility.
>> 
>> -Raymond Huffman
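A minimal sketch of the poison-pill policy Raymond describes (option 2), assuming a marker file in a node-local directory. PoisonPill, the file name, and where it would be called from (for example a disk or commit failure policy) are hypothetical; this is not the implementation from Raymond's fork.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;

/**
 * Hypothetical sketch: on a non-transient error, record a marker file before
 * stopping; on startup, refuse to proceed while that file exists. Clearing the
 * error is manual: an operator deletes the file after remediation.
 */
final class PoisonPill {
    static final String FILE_NAME = "NON_TRANSIENT_ERROR"; // hypothetical name

    static void mark(Path dir, String reason) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve(FILE_NAME + ".tmp");
        String body = Instant.now() + " " + reason + System.lineSeparator();
        Files.write(tmp, body.getBytes(StandardCharsets.UTF_8));
        // Rename into place so a crash mid-write never leaves a half-written marker.
        Files.move(tmp, dir.resolve(FILE_NAME), StandardCopyOption.ATOMIC_MOVE);
    }

    static void checkOnStartup(Path dir) throws IOException {
        Path pill = dir.resolve(FILE_NAME);
        if (Files.exists(pill)) {
            String reason = new String(Files.readAllBytes(pill), StandardCharsets.UTF_8).trim();
            throw new IllegalStateException(
                "Refusing to start: " + pill + " exists (" + reason + "); remove it after remediation");
        }
    }
}
```

Keeping the marker on disk, rather than in memory, is what makes the error survive the restarts Kubernetes would otherwise perform automatically.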


Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-15 Thread Josh McKenzie
> Goals
>  • Introduce a Cascades(2) query optimizer with rules easily extendable 
>  • Improve query performance for most common queries
>  • Add support for EXPLAIN and EXPLAIN ANALYZE to help with query 
> optimization and troubleshooting
>  • Lay the groundwork for the addition of features like joins, subqueries, 
> OR/NOT and index ordering
>  • Put in place some performance benchmarks to validate query optimizations
I think these are sensible goals. We're possibly going to face a chicken-or-egg 
problem with a feature like this that so heavily intersects with other 
as-yet-unwritten features where much of the value is in their intersection; if we 
continue down the current "one heuristic to rule them all" query planning 
approach we have now, we'll struggle to meaningfully explore or conceptualize 
the value of potential alternatives different optimizers could present us. Flip 
side, to Benedict's point, until SAI hits and/or some other potential future 
things we've all talked about, this CBO would likely fall directly into the 
same path that we effectively have hard-coded today (primary index path only).

One thing I feel pretty strongly about: even if the only outcome of all this 
work were to tighten up inconsistencies in our grammar and provide more robust 
EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would 
be highly valuable. This path of "only" would be predicated on us not having 
successful introduction of a robust secondary index implementation and a 
variety of other things we have a lot of interest in, so I find it unlikely, 
but worth calling out.

re: the removal of ALLOW FILTERING - is there room for compromise here and 
instead convert it to a guardrail that defaults to being enabled? That could 
theoretically give us a more gradual path to migration to a cost-based 
guardrail for instance, and would preserve the current robustness of the system 
while making it at least a touch more configurable.
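One way to picture the compromise is to model today's ALLOW FILTERING rule as a guardrail whose default reproduces current behaviour, with a cost-based mode as a possible later step. Everything here is hypothetical (FilteringGuardrail, its modes, and the row-estimate input, which would have to come from the optimizer); it is not Cassandra's existing guardrails API.

```java
import java.util.Optional;

/** Hypothetical sketch: the ALLOW FILTERING rule expressed as a configurable guardrail. */
final class FilteringGuardrail {
    enum Mode { REQUIRE_KEYWORD, COST_BASED, OFF }

    private final Mode mode;
    private final double maxEstimatedRows; // only consulted in COST_BASED mode

    FilteringGuardrail(Mode mode, double maxEstimatedRows) {
        this.mode = mode;
        this.maxEstimatedRows = maxEstimatedRows;
    }

    /** Default keeps today's behaviour: filtering queries need an explicit ALLOW FILTERING. */
    static FilteringGuardrail defaults() {
        return new FilteringGuardrail(Mode.REQUIRE_KEYWORD, Double.POSITIVE_INFINITY);
    }

    /** Returns a rejection reason, or empty if the query may run. */
    Optional<String> check(boolean needsFiltering, boolean hasAllowFiltering, double estimatedRowsScanned) {
        // Queries that need no filtering, or that opt in explicitly, always pass.
        if (!needsFiltering || hasAllowFiltering)
            return Optional.empty();
        switch (mode) {
            case REQUIRE_KEYWORD:
                return Optional.of("Query requires filtering; add ALLOW FILTERING to run it");
            case COST_BASED:
                if (estimatedRowsScanned <= maxEstimatedRows)
                    return Optional.empty();
                return Optional.of("Estimated " + (long) estimatedRowsScanned
                                   + " rows scanned exceeds limit " + (long) maxEstimatedRows);
            default: // OFF
                return Optional.empty();
        }
    }
}
```

Because an explicit ALLOW FILTERING is honoured in every mode, each step only relaxes the rule, which is what would make the migration gradual.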

On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
> Thanks for taking the time to address concerns. At least with initial versions, as 
> long as there is a way to replace it with noop or disable it I would be 
> happy. This is pretty standard practice with features nowadays but I wanted 
> to highlight it as this might require some pretty tight coupling.
> 
> Chris
> 
> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer  wrote:
>> Hey Chris,
>> You raise some valid points.
>> 
>> I believe that there are 3 points that you mentioned:
>> 1) CQL restrictions are some form of safety net and should be kept
>> 2) A lot of Cassandra features do not scale and/or are too easy to misuse in 
>> ways that can make the whole system collapse. We should not add more to 
>> that list. Especially not joins.
>> 
>> 3) Should we not start to fix features like secondary index rather than 
>> adding new ones? Which is heavily linked to 2).
>> 
>> Feel free to correct me if I got them wrong or missed one.
>> 
>> Regarding 1), I believe that you refer to the "Removing unnecessary CQL 
>> query limitations and inconsistencies" section. We are not planning to 
>> remove any safety net here.
>> What we want to remove is a certain amount of limitations which make things 
>> confusing for a user trying to write a query for no good reason. Like "why 
>> can I define a column alias but not use it anywhere in my query?" or "Why 
>> can I not create a list with 2 bind parameters?". While refactoring some CQL 
>> code, I kept on finding those types of exceptions that we can easily remove 
>> while simplifying the code at the same time.
>> 
>> For 2), I agree that at a certain scale or for some scenarios, some features 
>> simply do not scale or catch users by surprise. The goal of the CEP is to 
>> improve things in 2 ways. One is by making Cassandra smarter in the way it 
>> chooses how to process queries, hopefully improving its overall scalability. 
>> The other by being transparent about how Cassandra will execute the queries 
>> through the use of EXPLAIN. One problem of GROUP BY for example is that most 
>> users do not realize what is actually happening under the hood and therefore 
>> its limitations. I do not believe that EXPLAIN will change everything but it 
>> will help people to get a better understanding of the limitations of some 
>> features.
>> 
>> I do not know which features will be added in the future to C*. That will be 
>> discussed through some future CEPs. Nevertheless, I do not believe that it 
>> makes sense to write a CEP for a query optimizer without taking into account 
>> that we might at some point add some level of support for joins or 
>> subqueries. We have been too often delivering features without looking at 
>> what could be the possible evolutions which resulted in code where adding 
>> new features was more complex than it should have been. I do not want to 
>> make the same mistake. I want to create an optimizer that can be improved 
>> easily and 

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-08 Thread Josh McKenzie
> Unit tests that fail consistently but only on one configuration, should not 
> be removed/replaced until the replacement also catches the failure.

> along the way, people have decided a certain configuration deserves 
> additional testing and it has been done this way in lieu of any other more 
> efficient approach.

Totally agree with these sentiments as well as the framing of our current unit 
tests as "bad fuzz-tests thanks to non-determinism".

To me, this reinforces my stance on a "pre-commit vs. post-commit" approach to 
testing *with our current constraints:*
 • Test the default configuration on all supported JDKs pre-commit
 • Post-commit, treat *consistent* failures on non-default configurations as 
immediate interrupts to the author that introduced them
 • Pre-release, push for no consistent failures on any suite in any 
configuration, and no regressions in flaky tests from prior release (in ASF CI 
env).
I think there's value in having the non-default configurations, but I'm not 
convinced the benefits outweigh the costs *specifically in terms of pre-commit 
work* due to flakiness in the execution of the software env itself, not to 
mention hardware env variance on the ASF side today.

All that said - if we got to a world where we could run our jvm-based tests 
deterministically within the simulator, my intuition is that we'd see a lot of 
the test-specific, non-defect flakiness reduced drastically. In such a world 
I'd be in favor of running :allthethings: pre-commit as we'd have *much* higher 
confidence that those failures were actually attributable to the author of 
whatever diff the test is run against. 

On Fri, Dec 8, 2023, at 8:25 AM, Mick Semb Wever wrote:
>  
>  
>  
>> 
>>> I think everyone agrees here, but…. these variations are still catching 
>>> failures, and until we have an improvement or replacement we do rely on 
>>> them.   I'm not in favour of removing them until we have proof /confidence 
>>> that any replacement is catching the same failures.  Especially oa, tries, 
>>> vnodes. (Not tries and offheap is being replaced with "latest", which will 
>>> be valuable simplification.)  
>> 
>> What kind of proof do you expect? I cannot imagine how we could prove that 
>> because the ability of detecting failures results from the randomness of 
>> those tests. That's why when such a test fails you usually cannot reproduce 
>> that easily.
> 
> 
> Unit tests that fail consistently but only on one configuration, should not 
> be removed/replaced until the replacement also catches the failure.
>  
>> We could extrapolate that to - why do we only have those configurations? Why 
>> don't we test trie / oa + compression, or CDC, or system memtable? 
> 
> 
> Because, along the way, people have decided a certain configuration deserves 
> additional testing and it has been done this way in lieu of any other more 
> efficient approach.
> 
> 
> 


Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Josh McKenzie
Congrats Mike! Good to see this recognition of your contributions to the 
project!

On Fri, Dec 8, 2023, at 10:02 AM, Patrick McFadin wrote:
> Yay! Congratulations Mike. Well deserved!
> 
> On Fri, Dec 8, 2023 at 7:00 AM Andrés de la Peña  wrote:
>> Congrats Mike!
>> 
>> On Fri, 8 Dec 2023 at 14:53, Jeremiah Jordan  
>> wrote:
>>> Congrats Mike!  Thanks for all your work on SAI and Vector index.  Well 
>>> deserved!
>>> 
>>> On Dec 8, 2023 at 8:52:07 AM, Brandon Williams  wrote:
 Congratulations Mike!
 
 Kind Regards,
 Brandon
 
 On Fri, Dec 8, 2023 at 8:41 AM Benjamin Lerer  wrote:
> 
> The PMC members are pleased to announce that Mike Adamson has accepted
> the invitation to become a committer.
> 
> Thanks a lot, Mike, for everything you have done for the project.
> 
> Congratulations and welcome
> 
> The Apache Cassandra PMC members


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Josh McKenzie
> that may be long-running and that could be run indefinitely
Perfect. That was the distinction I wasn't aware of. Also means having the burn 
target as part of regular CI runs is probably a mistake, yes? i.e. if someone 
adds a burn test that runs indefinitely, are there any guardrails or built-in 
checks or timeouts to keep it from running right up to job timeout and then 
failing?
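On the guardrail question: one common pattern is to give each burn test a wall-clock budget, defaulting to something safely below the CI job timeout, and to run unbounded only when explicitly asked. The sketch below assumes that pattern; BurnRunner and the property-driven budget are hypothetical, not how Cassandra's burn tests are currently wired.

```java
import java.time.Duration;
import java.util.Random;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch of a time-boxed burn runner: bounded by default (for CI),
 * unbounded only when a property explicitly says "forever".
 */
final class BurnRunner {
    /** Reads a budget in seconds from a system property; "forever" means effectively unbounded. */
    static Duration budgetFromProperty(String property, Duration ciDefault) {
        String value = System.getProperty(property);
        if (value == null) return ciDefault;
        if (value.equals("forever")) return Duration.ofNanos(Long.MAX_VALUE);
        return Duration.ofSeconds(Long.parseLong(value));
    }

    /** Runs iterations until the budget is spent; returns how many completed. */
    static long run(long baseSeed, Duration budget, LongConsumer iteration) {
        Random seeds = new Random(baseSeed);
        long maxNanos = budget.toNanos();
        long start = System.nanoTime();
        long count = 0;
        while (System.nanoTime() - start < maxNanos) {
            long seed = seeds.nextLong();
            try {
                iteration.accept(seed);
            } catch (RuntimeException | AssertionError e) {
                // Report both seeds so the failing iteration can be replayed directly.
                throw new AssertionError("Burn iteration " + count + " failed; base seed="
                                         + baseSeed + ", iteration seed=" + seed, e);
            }
            count++;
        }
        return count;
    }
}
```

Deriving each iteration's seed from a logged base seed keeps a failure reproducible even though the number of iterations varies from run to run.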

On Thu, Nov 30, 2023, at 1:11 PM, Benedict wrote:
> 
> A burn test is a randomised test targeting broad coverage of a single system, 
> subsystem or utility, that may be long-running and that could be run 
> indefinitely, each run providing incrementally more assurance of quality of 
> the system.
> 
> A long test is a unit test that sometimes takes a long time to run, no more 
> no less. I’m not sure any of these offer all that much value anymore, and 
> perhaps we could look to deprecate them.
> 
>> On 30 Nov 2023, at 17:20, Josh McKenzie  wrote:
>> 
>> Strongly agree. I started working on a declarative refactor out of our CI 
>> configuration so circle, ASFCI, and other systems could inherit from it (for 
>> instance, see pre-commit pipeline declaration here 
>> <https://github.com/apache/cassandra/pull/2554/files#diff-a4c4d1d91048841f76d124386858bda9944644cfef8ccb4ab84319cedaf5b3feR71-R89>);
>>  I had to set that down while I finished up implementing an internal CI 
>> system since the code in neither the ASF CI structure nor circle structure 
>> (.sh embedded in .yml /cry) was re-usable in their current form.
>> 
>> Having a jvm.options and cassandra.yaml file per suite and referencing them 
>> from a declarative job definition 
>> <https://github.com/apache/cassandra/pull/2554/files#diff-a4c4d1d91048841f76d124386858bda9944644cfef8ccb4ab84319cedaf5b3feR237-R267>
>>  would make things a lot easier to wrap our heads around and maintain I 
>> think.
>> 
>> As for what qualifies as burn vs. long... /shrug couldn't tell you. Would 
>> have to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone 
>> else on-list knows.
>> 
>> On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
>>> Hi,
>>> 
>>> I'm getting a bit lost - what are the exact differences between those test 
>>> scenarios? What are the criteria for qualifying a test to be part of a 
>>> certain scenario?
>>> 
>>> I'm working a little bit with tests and build scripts and the number of 
>>> different configurations for which we have a separate target in the build 
>>> starts to be problematic, I cannot imagine how problematic it is for a new 
>>> contributor.
>>> 
>>> It is not urgent, but we should at least have a plan on how to simplify and 
>>> unify things.
>>> 
>>> I'm in favour of reducing the number of test targets to the minimum - for 
>>> different configurations I think we should provide a parameter pointing to 
>>> jvm options file and maybe to cassandra.yaml. I know that we currently do 
>>> some super hacky things with cassandra yaml for different configs - like 
>>> concatenating parts of it. I presume it is not necessary - we can have a 
>>> default test config yaml and a directory with overriding yamls; while 
>>> building we could have a tool which is able to load the default 
>>> configuration, apply the override and save the resulting yaml somewhere in 
>>> the build/test/configs for example. That would allow us to easily use 
>>> those yamls in IDE as well - currently it is impossible.
>>> 
>>> What do you think?
>>> 
>>> Thank you, and my apologies for bothering you with lower-priority stuff while 
>>> we have a 5.0 release headache...
>>> 
>>> Jacek
>>> 
>> 


Re: Removal of deprecations added in Cassandra 3.x

2023-11-30 Thread Josh McKenzie
> Personally, I think the removal of the deprecated code which was marked like 
> that in 3.x is quite safe to do in 5.x, but I have to ask the broader audience 
> to reach a consensus.
Safe for us, sure. Safe for our users, not so much. No amount of including it 
in release notes guarantees they'll see it, and to Mick's point:

> Anything that is public (user-facing) and is isolated code having little cost 
> to it should just be left.
Strong +1 to this sentiment.

On Thu, Nov 30, 2023, at 10:33 AM, Mick Semb Wever wrote:
>> Personally, I think the removal of the deprecated code which was marked like 
>> that in 3.x is quite safe to do in 5.x, but I have to ask the broader audience 
>> to reach a consensus.
> 
> 
> Strawman:
> Evaluate the cost and risk to us by having to keep the code.
> Weigh that against the effort it takes for users to adjust their prod 
> systems, and assume they are many orders of magnitude more than us.
> 
> Anything that is public (user-facing) and is isolated code having little cost 
> to it should just be left.
>   
>>  I think that what is "private" might go away in 5.x easily.
> 
> 
> Yes.
> 
> 
> 


Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-11-30 Thread Josh McKenzie
Strongly agree. I started working on a declarative refactor out of our CI 
configuration so circle, ASFCI, and other systems could inherit from it (for 
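instance, see pre-commit pipeline declaration here 
<https://github.com/apache/cassandra/pull/2554/files#diff-a4c4d1d91048841f76d124386858bda9944644cfef8ccb4ab84319cedaf5b3feR71-R89>);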
 I had to set that down while I finished up implementing an internal CI system 
since the code in neither the ASF CI structure nor circle structure (.sh 
embedded in .yml /cry) was re-usable in their current form.

Having a jvm.options and cassandra.yaml file per suite and referencing them 
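from a declarative job definition 
<https://github.com/apache/cassandra/pull/2554/files#diff-a4c4d1d91048841f76d124386858bda9944644cfef8ccb4ab84319cedaf5b3feR237-R267>
 would make things a lot easier to wrap our heads around and maintain, I think.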

As for what qualifies as burn vs. long... /shrug couldn't tell you. Would have 
to go down the git blame + dev ML + JIRA rabbit hole. :) Maybe someone else 
on-list knows.

On Thu, Nov 30, 2023, at 4:25 AM, Jacek Lewandowski wrote:
> Hi,
> 
> I'm getting a bit lost - what are the exact differences between those test 
> scenarios? What are the criteria for qualifying a test to be part of a 
> certain scenario?
> 
> I'm working a little bit with tests and build scripts and the number of 
> different configurations for which we have a separate target in the build 
> starts to be problematic, I cannot imagine how problematic it is for a new 
> contributor.
> 
> It is not urgent, but we should at least have a plan on how to simplify and 
> unify things.
> 
> I'm in favour of reducing the number of test targets to the minimum - for 
> different configurations I think we should provide a parameter pointing to 
> jvm options file and maybe to cassandra.yaml. I know that we currently do 
> some super hacky things with cassandra yaml for different configs - like 
> concatenating parts of it. I presume it is not necessary - we can have a 
> default test config yaml and a directory with overriding yamls; while 
> building we could have a tool which is able to load the default 
> configuration, apply the override and save the resulting yaml somewhere in 
> the build/test/configs for example. That would allow us to easily use those 
> yamls in IDE as well - currently it is impossible.
> 
> What do you think?
> 
> Thank you, and my apologies for bothering you with lower-priority stuff while we 
> have a 5.0 release headache...
> 
> Jacek
> 
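Jacek's default-plus-override idea amounts to a recursive map merge. A minimal sketch, assuming the YAML has already been parsed into nested maps (for example with SnakeYAML, which Cassandra already uses to load cassandra.yaml); YamlOverlay is hypothetical and not an existing build tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: apply an override config on top of a default test config. */
final class YamlOverlay {
    /**
     * Returns a new map: base with override applied. Nested maps merge key by key;
     * any other override value (scalar or list) replaces the base value outright.
     * Neither input is modified.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> override) {
        Map<String, Object> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> e : override.entrySet()) {
            Object baseValue = result.get(e.getKey());
            Object overrideValue = e.getValue();
            if (baseValue instanceof Map && overrideValue instanceof Map)
                result.put(e.getKey(), merge((Map<String, Object>) baseValue,
                                             (Map<String, Object>) overrideValue));
            else
                result.put(e.getKey(), overrideValue);
        }
        return result;
    }
}
```

Lists in the override replace the base value rather than being concatenated; a real tool would need to pick and document one rule. Writing the merged map back out under build/test/configs would give IDE run configurations a concrete file to point at.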


Re: [VOTE] Release Harry 0.0.2

2023-11-29 Thread Josh McKenzie
+1

On Wed, Nov 29, 2023, at 7:03 AM, Brandon Williams wrote:
> +1
> 
> Kind Regards,
> Brandon
> 
> On Wed, Nov 29, 2023 at 5:15 AM Alex Petrov  wrote:
> >
> > Even though we would like to bring harry in-tree, this is not an immediate 
> > priority. Meanwhile, we need to unblock RPM builds for trunk, which means 
> > no custom jars. We will have at least one more Harry release with 
> > outstanding features to avoid any blocking.
> >
> > Proposing the test build of cassandra-harry 0.0.2 for release, for TCM 
> > purposes.
> >
> > Repository:
> > https://gitbox.apache.org/repos/asf?p=cassandra-harry.git;a=shortlog;h=refs/tags/0.0.2
> >
> > Candidate SHA:
> > https://github.com/apache/cassandra-harry/commit/37761ce599242a34b027baff520e1185b7a7c3af
> > tagged with 0.0.2
> >
> > Artifacts:
> > https://repository.apache.org/content/repositories/orgapachecassandra-1320
> >
> > Key signature: A4C465FEA0C552561A392A61E91335D77E3E87CB
> >
> > Prominent changes:
> >
> > [CASSANDRA-18768] Improvements / changes required for Transactional 
> > Metadata testing:
> >   * Add an ability to run sequential r/w for more deterministic 
> > results
> >   * Implement Network Topology Strategy
> >   * Add all pds iterator to ops selector
> >   * Make sure to log when detecting that a run starts against a 
> > dirty table
> >   * Fix a concurrency issue with reorder buffer
> >   * Add some safety wheels / debugging instruments
> >   * Add a pd selector symmetry test
> >   * Make it simpler to write and log
> >   * Rename sequential rw to write before read
> >   * Avoid starving writers by readers and vice versa
> >   * Add a minimal guide for debugging falsifications
> >   * Fix select peers query for local state checker
> >   * Add examples for programmatic configuration
> >
> > [CASSANDRA-18318] Implement parsing schema provider
> > [CASSANDRA-18315] Trigger exception if we run out of partitions
> > [CASSANDRA-17603] Allow selecting subsets of columns and wildcard queries.
> > [CASSANDRA-17603] Open API for hand-crafting both mutation and read queries
> > [CASSANDRA-17603] Make it possible to run multiple Harry runners 
> > concurrently against the same keyspace
> > [CASSANDRA-17603] Implement concurrent quiescent checker
> > [CASSANDRA-17603] Pull in token util from Cassandra to avoid circular 
> > dependency
> > [CASSANDRA-17603] Pull in Cassandra concurrent utils until there is a 
> > common shared library
> >
> > The vote will be open for 24 hours. Everyone who has tested the build
> > is invited to vote. Votes by PMC members are considered binding. A
> > vote passes if there are at least three binding +1s.
> 

Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-28 Thread Josh McKenzie
Congrats Francisco!

On Tue, Nov 28, 2023, at 2:37 PM, Melissa Logan wrote:
> Congrats Francisco!
> 
> On Tue, Nov 28, 2023 at 11:34 AM Vinay Chella  wrote:
>> Congratulations Francisco !!
>> 
>> Thanks,
>> Vinay Chella
>> 
>> 
>> On Tue, Nov 28, 2023 at 11:24 AM Mick Semb Wever  wrote:
>>> 
>>> 
>>> On Tue, 28 Nov 2023 at 19:54, Dinesh Joshi  wrote:
>>>> The PMC members are pleased to announce that Francisco Guerrero Hernandez 
>>>> has accepted the invitation to become committer today.
>>>> 
>>>> Congratulations and welcome!
>>> 
>>> 
>>> Congrats !!


Re: [DISCUSS] CASSANDRA-19113: Publishing dtest-shaded JARs on release

2023-11-28 Thread Josh McKenzie
Building these jars every time we run every CI job is just silly.

+1.

On Tue, Nov 28, 2023, at 2:08 PM, Francisco Guerrero wrote:
> Hi Abe,
> 
> I'm +1 on this. Several Cassandra-ecosystem projects build the dtest jar in 
> CI. We'd very much prefer to just consume shaded dtest jars from Cassandra 
> releases for testing purposes.
> 
> Best,
> - Francisco
> 
> On 2023/11/28 19:02:17 Abe Ratnofsky wrote:
> > Hey folks - wanted to raise a separate thread to discuss publishing of 
> > dtest-shaded JARs on release.
> > 
> > Currently, adjacent projects that want to use the jvm-dtest framework need 
> > to build the shaded JARs themselves. This is a decent amount of work, and 
> > is duplicated across each project. This is mainly relevant for projects 
> > like Sidecar and Driver. Currently, those projects need to clone and build 
> > apache/cassandra themselves, run ant dtest-jar, and move the JAR into the 
> > appropriate place. Different build systems treat local JARs differently, 
> > and the whole process can be a bit complicated. Would be great to be able 
> > to treat these as normal dependencies.
> > 
> > https://issues.apache.org/jira/browse/CASSANDRA-19113
> > 
> > Any objections?
> > 
> > --
> > Abe
> 


Re: CEP-21 - Transactional cluster metadata merged to trunk

2023-11-27 Thread Josh McKenzie
> on our internal CI system
Some more context:

This environment adheres to the requirements we laid out in pre-commit CI on 
Cassandra, with a couple of required differences. We don't yet include the resource 
restriction detail in the test report; it's on my backlog of things to do but I 
can confirm that less CPU and <= equivalent ASFCI memory is being allocated for 
each test suite. I also had to go the route of extracting a blend of what's in 
circle and what's in ASF CI (in terms of test suites, filtering, etc) since 
neither represented a complete view of our CI ecosystem; there are currently 
things executed in either environment not executed in the other.

I've been tracking the upstreaming of that declarative combination in 
CASSANDRA-18731 but have had some other priorities take front-seat (i.e. 
getting a new CI system based on that working since neither upstream ASF CI nor 
circle are re-usable in their current form) and will be upstreaming that ASAP. 
https://issues.apache.org/jira/browse/CASSANDRA-18731

I've left a pretty long comment on CASSANDRA-18731 about the structure of 
things and where my opinion falls; *I think we need a separate DISCUSS thread 
on the ML about CI and what we require for pre-commit smoke* suites: 
https://issues.apache.org/jira/browse/CASSANDRA-18731?focusedCommentId=17790270=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17790270

The TL;DR:
> With an *incredibly large* patch in the form of TCM (88k+ LoC, 900+ files 
> touched), we have less than a .002% test failure injection rate using the 
> above restricted smoke heuristic, and many of them look to be circle ci env 
> specific and not asf ci.

From a cursory inspection it looks like most of the breakages being tracked on 
the ticket Sam linked for TCM are likely to be circle env specific (new *nix 
optimized deletion having a race, OOMs, etc). The TCM merge is actually a 
great forcing function for us to surface anything env specific in terms of 
timing and resourcing up-front; I'm glad we have this opportunity but it's 
unfortunate that it's been interpreted as merging w/out passing CI as opposed 
to having some env-difference specific kinks to work out.

*This was an incredibly huge merge.* For comparison, I just did a --stat on the 
merge for CASSANDRA-8099:
> 645 files changed, 49381 insertions(+), 42227 deletions(-)

TCM from the C* repo:
>  934 files changed, 66185 insertions(+), 21669 deletions(-)
My gut tells me it's basically impossible to have a merge of this size that 
doesn't disrupt what it's merging into, or the authors just end up slowly dying 
in rebase hell. Or both. This was a massive undertaking and compared to our 
past on this project, has had an incredibly low impact on the target it was 
merged into and the authors are rapidly burning down failures.

To the authors - great work, and thanks for being so diligent on following up 
on any disruptions this body of work has caused to other contributors' 
environments.

To the folks who were disrupted - I get it. This is deeply frustrating; green 
CI has long been a white whale for many of us, and having something merge over a 
US holiday week with an incredibly active project where we don't all have time 
to keep up with everything can make things like this feel like a huge surprise. 
It's incredibly unfortunate that the timing on us transitioning to this new CI 
system and working out the kinks is when this behemoth of a merge needed to 
come through, but there's a silver lining.

We're making great strides. Let's not lose sight of our growth because of the 
pain in the moment of it.

~Josh

p.s. - for the record, I don't think we should hold off on merging things just 
because some folks are on holiday. :)

On Mon, Nov 27, 2023, at 3:38 PM, Sam Tunnicliffe wrote:
> I ought to clarify, we did actually have green CI modulo 3 flaky tests on our 
> internal CI system. I've attached the test artefacts to CASSANDRA-18330 
> now[1][2]: 2 of the 3 failures are upgrade dtests, with 1 other python dtest 
> failure noted. None of these were reproducible in a dev setup, so we 
> suspected them to be environmental and intended to merge before returning to 
> confirm that. The "known" failures that we mentioned in the email that 
> started this thread were ones observed by Mick running the cep-21-tcm branch 
> through Circle before merging.  
> 
> As the CEP-21 changeset was approaching 88k LoC touching over 900 files, 
> permanently rebasing as we tried to eradicate every flaky test was simply 
> unrealistic, especially as other significant patches continued to land in 
> trunk. With that in mind, we took the decision to merge so that we could 
> focus on actually removing any remaining instability.
> 
> [1] https://issues.apache.org/jira/secure/attachment/13064727/ci_summary.html
> [2] 
> 

Re: [DISCUSS] Harry in-tree

2023-11-25 Thread Josh McKenzie
Strong +1 to including harry in-tree and further, integrating a harry stress 
soak into our pre-commit and post-commit CI.

On Fri, Nov 24, 2023, at 5:10 PM, Alex Petrov wrote:
> Unfortunately my Harry talk got declined. Of course I’ll be happy to talk 
> about Harry and how it can be useful for contributors and about people’s 
> expectations. My talk is going to be about TCM again this time.
> 
> I will make sure examples are in place and are expressive by the summit.
> 
> On Fri, Nov 24, 2023, at 6:18 PM, Jeremy Hanna wrote:
>> I'm excited for Harry to come in-tree to improve the project stability and 
>> quality.  I know you're doing a talk at the Cassandra Summit about Harry to 
>> go over it.  If there's anything that can be done as part of this process to 
>> improve onboarding for Harry too, that would be great.  I'm just thinking 
>> about examples and things like that so people new to Harry can more easily 
>> write and run tests, test new features, and have a standard process for 
>> reporting findings.
>> 
>> Thanks Alex and all involved!
>> 
>> Jeremy
>> 
>>> On Nov 24, 2023, at 9:43 AM, Alex Petrov  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> With TCM landed, there will be way more Harry tests in-tree: we are using 
>>> it for many coordination tests, and there's now a simulator test that uses 
>>> Harry. During development, Harry has allowed us to uncover and resolve 
>>> numerous elusive edge cases.
>>> 
>>> I had conversations with several folks, and wanted to propose to move 
>>> harry-core to Cassandra test tree. This will substantially 
>>> simplify/streamline co-development of Cassandra and Harry. With a new 
>>> HistoryBuilder API that has helped to find and trigger [1] [2] and [3], it 
>>> will also be much more approachable.
>>> 
>>> Besides making it easier for everyone to develop new fuzz tests, it will 
>>> also substantially lower the barrier to entry. Currently, debugging an 
>>> issue found by Harry involves a cumbersome process of rebuilding and 
>>> transferring jars between Cassandra and Harry, depending on which side you 
>>> modify. This not only hampers efficiency but also deters broader adoption. 
>>> By merging harry-core into the Cassandra test tree, we eliminate this 
>>> barrier.
>>> 
>>> Thank you,
>>> --Alex
>>> 
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19011
>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-18993
>>> [3] https://issues.apache.org/jira/browse/CASSANDRA-18932
> 


Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

2023-11-04 Thread Josh McKenzie
> I think before we cut a beta we need to have diagnosed and fixed 18993 
> (assuming it is a bug).
Before a beta? I could see that for rc or GA definitely, but having a known 
(especially non-regressive) data loss bug in a beta seems like it's compatible 
with the guarantees we're providing for it: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

> This release is recommended for test/QA clusters where short (order of 
> minutes) downtime during upgrades is not an issue


On Sat, Nov 4, 2023, at 12:56 PM, Ekaterina Dimitrova wrote:
> Totally agree with the others. Such an issue on its own should be a priority 
> in any release. Looking forward to the reproduction test mentioned on the 
> ticket.
> 
> Thanks to Alex for his work on harry!
> 
> On Sat, 4 Nov 2023 at 12:47, Benedict  wrote:
>> Alex can confirm but I think it actually turns out to be a new bug in 5.0, 
>> but either way we should not cut a release with such a serious potential 
>> known issue.
>> 
>> > On 4 Nov 2023, at 16:18, J. D. Jordan  wrote:
>> > 
>> > Sounds like 18993 is not a regression in 5.0? But present in 4.1 as well? 
>> >  So I would say we should fix it with the highest priority and get a new 
>> > 4.1.x released. Blocking 5.0 beta voting is a secondary issue to me if we 
>> > have a “data not being returned” issue in an existing release?
>> > 
>> >> On Nov 4, 2023, at 11:09 AM, Benedict  wrote:
>> >> 
>> >> I think before we cut a beta we need to have diagnosed and fixed 18993 
>> >> (assuming it is a bug).
>> >> 
>> >>> On 4 Nov 2023, at 16:04, Mick Semb Wever  wrote:
>> >>> 
>> >>> 
>> >>>> 
>> >>>> With the publication of this release I would like to switch the
>> >>>> default 'latest' docs on the website from 4.1 to 5.0.  Are there any
>> >>>> objections to this ?
>> >>> 
>> >>> 
>> >>> I would also like to propose the next 5.0 release to be 5.0-beta1
>> >>> 
>> >>> With the aim of reaching GA for the Summit, I would like to suggest we
>> >>> work towards the best-case scenario of 5.0-beta1 in two weeks and
>> >>> 5.0-rc1 first week Dec.
>> >>> 
>> >>> I know this is a huge ask with lots of unknowns we can't actually
>> >>> commit to.  But I believe it is a worthy goal, and possible if nothing
>> >>> sideswipes us – but we'll need all the help we can get this month to
>> >>> make it happen.
>> >>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-02 Thread Josh McKenzie
r want to modify or 
> clarify what’s meant. This just falls naturally out of how we do things here 
> I think, and is how we go about a lot of business already. It retains the 
> agility you were talking about, setting norms cheaply.
> 
> It isn’t however a tightly held policy or legislative cudgel, it’s just what 
> those who were talking and paying attention at the time agreed. It can be 
> chucked out or rewoven at zero cost, but if the norms have taken hold and are 
> broadly understood in the same way, it won’t change much or at all, because 
> the actual glue is the norm, not the words, which only serve to broadcast 
> some formulation of the norm.
> 
>> On 1 Nov 2023, at 23:41, Josh McKenzie  wrote:
>> 
>>> but binding to the same extent 2 committers reviewing something we later 
>>> need to revert is binding.
>> To elaborate a bit - what I mean is "it's a bar we apply to help establish a 
>> baseline level of consensus but it's very much a 2-way door". Obviously 2 
>> committers +1'ing code is a formal agreed upon voting mechanism.
>> 
>> On Wed, Nov 1, 2023, at 7:26 PM, Josh McKenzie wrote:
>>>> Community voting is also entirely by consensus, there is no such thing as 
>>>> a simple majority community vote, technical or otherwise.
>>> Ah hah! You're absolutely correct in that this isn't one of our "blessed" 
>>> ways we vote. There's nothing written down about "committers are binding, 
>>> simple majority" for any specific category of discussion.
>>> 
>>> Are we ok with people creatively applying different ways to vote for things 
>>> where there's not otherwise guidance if they feel it helps capture 
>>> sentiment and engagement? Obviously the outcome of that isn't binding in 
>>> the same way other votes by the pmc are, but binding to the same extent 2 
>>> committers reviewing something we later need to revert is binding.
>>> 
>>> I'd rather we have a bunch of committers weigh in if we're talking about 
>>> changing import ordering, or Config.java structure, or refactoring out 
>>> singletons, or gatekeeping CI - things we've had come up over the years 
>>> where we've had a lot of people chime in and we benefit from more than just 
>>> "2 committers agree on it" but less than "We need a CEP or pmc vote for 
>>> this".
>>> 
>>> 
>>> On Wed, Nov 1, 2023, at 5:10 PM, Benedict wrote:
>>>> 
>>>> The project governance document does not list any kind of general purpose 
>>>> technical change vote. There are only three very specific kinds of 
>>>> community vote: code contributions, CEP and release votes.  Community 
>>>> voting is also entirely by consensus, there is no such thing as a simple 
>>>> majority community vote, technical or otherwise. I suggest carefully 
>>>> re-reading the document we both formulated!
>>>> 
>>>> If it is a technical contribution, as you contest, we only need a normal 
>>>> technical contribution vote to override it - i.e. two committer +1s. If 
>>>> that’s how we want to roll with it, I guess we’re not really in 
>>>> disagreement.
>>>> 
>>>> None of this really fundamentally changes anything. There’s a strong norm 
>>>> for a commit gate on CI, and nobody is going to go about breaking this 
>>>> norm willy-nilly. But equally there’s no need to panic and waste all this 
>>>> time debating hypothetical mechanisms to avoid this supposedly ironclad 
>>>> rule.
>>>> 
>>>> We clearly need to address confusion over governance though. The idea that 
>>>> agreeing things carefully costs us agility is one I cannot endorse. The 
>>>> project has leaned heavily into the consensus side of the Apache Way, as 
>>>> evidenced by our governance document. That doesn’t mean things can’t 
>>>> change quickly, it just means *before those changes become formal 
>>>> requirements* there needs to be *broad* consensus, as defined in the 
>>>> governing document. That’s it.
>>>> 
>>>> The norm existed before the vote, and it exists whether the vote was valid 
>>>> or not. That is how things evolve on the project, we just formalise them a 
>>>> little more slowly.
>>>> 
>>>> 
>>>>> On 1 Nov 2023, at 20:07, Josh McKenzie  wrote:
>>>>> 
>>>>> First off, I appreciate your time and attention on this stuff. Want to be 

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-01 Thread Josh McKenzie
> but binding to the same extent 2 committers reviewing something we later need 
> to revert is binding.
To elaborate a bit - what I mean is "it's a bar we apply to help establish a 
baseline level of consensus but it's very much a 2-way door". Obviously 2 
committers +1'ing code is a formal agreed upon voting mechanism.

On Wed, Nov 1, 2023, at 7:26 PM, Josh McKenzie wrote:
>> Community voting is also entirely by consensus, there is no such thing as a 
>> simple majority community vote, technical or otherwise.
> Ah hah! You're absolutely correct in that this isn't one of our "blessed" 
> ways we vote. There's nothing written down about "committers are binding, 
> simple majority" for any specific category of discussion.
> 
> Are we ok with people creatively applying different ways to vote for things 
> where there's not otherwise guidance if they feel it helps capture sentiment 
> and engagement? Obviously the outcome of that isn't binding in the same way 
> other votes by the pmc are, but binding to the same extent 2 committers 
> reviewing something we later need to revert is binding.
> 
> I'd rather we have a bunch of committers weigh in if we're talking about 
> changing import ordering, or Config.java structure, or refactoring out 
> singletons, or gatekeeping CI - things we've had come up over the years where 
> we've had a lot of people chime in and we benefit from more than just "2 
> committers agree on it" but less than "We need a CEP or pmc vote for this".
> 
> 
> On Wed, Nov 1, 2023, at 5:10 PM, Benedict wrote:
>> 
>> The project governance document does not list any kind of general purpose 
>> technical change vote. There are only three very specific kinds of community 
>> vote: code contributions, CEP and release votes.  Community voting is also 
>> entirely by consensus, there is no such thing as a simple majority community 
>> vote, technical or otherwise. I suggest carefully re-reading the document we 
>> both formulated!
>> 
>> If it is a technical contribution, as you contest, we only need a normal 
>> technical contribution vote to override it - i.e. two committer +1s. If 
>> that’s how we want to roll with it, I guess we’re not really in disagreement.
>> 
>> None of this really fundamentally changes anything. There’s a strong norm 
>> for a commit gate on CI, and nobody is going to go about breaking this norm 
>> willy-nilly. But equally there’s no need to panic and waste all this time 
>> debating hypothetical mechanisms to avoid this supposedly ironclad rule.
>> 
>> We clearly need to address confusion over governance though. The idea that 
>> agreeing things carefully costs us agility is one I cannot endorse. The 
>> project has leaned heavily into the consensus side of the Apache Way, as 
>> evidenced by our governance document. That doesn’t mean things can’t change 
>> quickly, it just means *before those changes become formal requirements* 
>> there needs to be *broad* consensus, as defined in the governing document. 
>> That’s it.
>> 
>> The norm existed before the vote, and it exists whether the vote was valid 
>> or not. That is how things evolve on the project, we just formalise them a 
>> little more slowly.
>> 
>> 
>>> On 1 Nov 2023, at 20:07, Josh McKenzie  wrote:
>>> 
>>> First off, I appreciate your time and attention on this stuff. Want to be 
>>> up front about that since these kinds of discussions can get prickly all 
>>> too easily. I'm *at least* as guilty as anyone else about getting my back 
>>> up on stuff like this. Figuring out the right things to "harden" as shared 
>>> contractual ways we behave and what to leave loose and case-by-case is 
>>> going to continue to be a challenge for us as we grow.
>>> 
>>> The last thing I personally want is for us to have too many extraneous 
>>> rules formalizing things that just serve to slow down peoples' ability to 
>>> contribute to the project. The flip side of that - for all of us to work in 
>>> a shared space and collectively remain maximally productive, some 
>>> individual freedoms (ability to merge a bunch of broken code and/or ninja 
>>> in things as we see fit, needing 2 committers' eyes on things, etc) will 
>>> have to be given up.
>>> 
>>> At its core the discussion we had was prompted by divergence between 
>>> circle and ASF CI and our release process dragging on repeatedly during the 
>>> "stabilize ASF CI" phase. The "do we require green ci before merge of 
>>> tickets"

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-01 Thread Josh McKenzie
> Community voting is also entirely by consensus, there is no such thing as a 
> simple majority community vote, technical or otherwise.
Ah hah! You're absolutely correct in that this isn't one of our "blessed" ways 
we vote. There's nothing written down about "committers are binding, simple 
majority" for any specific category of discussion.

Are we ok with people creatively applying different ways to vote for things 
where there's not otherwise guidance if they feel it helps capture sentiment 
and engagement? Obviously the outcome of that isn't binding in the same way 
other votes by the pmc are, but binding to the same extent 2 committers 
reviewing something we later need to revert is binding.

I'd rather we have a bunch of committers weigh in if we're talking about 
changing import ordering, or Config.java structure, or refactoring out 
singletons, or gatekeeping CI - things we've had come up over the years where 
we've had a lot of people chime in and we benefit from more than just "2 
committers agree on it" but less than "We need a CEP or pmc vote for this".


On Wed, Nov 1, 2023, at 5:10 PM, Benedict wrote:
> 
> The project governance document does not list any kind of general purpose 
> technical change vote. There are only three very specific kinds of community 
> vote: code contributions, CEP and release votes.  Community voting is also 
> entirely by consensus, there is no such thing as a simple majority community 
> vote, technical or otherwise. I suggest carefully re-reading the document we 
> both formulated!
> 
> If it is a technical contribution, as you contest, we only need a normal 
> technical contribution vote to override it - i.e. two committer +1s. If 
> that’s how we want to roll with it, I guess we’re not really in disagreement.
> 
> None of this really fundamentally changes anything. There’s a strong norm for 
> a commit gate on CI, and nobody is going to go about breaking this norm 
> willy-nilly. But equally there’s no need to panic and waste all this time 
> debating hypothetical mechanisms to avoid this supposedly ironclad rule.
> 
> We clearly need to address confusion over governance though. The idea that 
> agreeing things carefully costs us agility is one I cannot endorse. The 
> project has leaned heavily into the consensus side of the Apache Way, as 
> evidenced by our governance document. That doesn’t mean things can’t change 
> quickly, it just means *before those changes become formal requirements* 
> there needs to be *broad* consensus, as defined in the governing document. 
> That’s it.
> 
> The norm existed before the vote, and it exists whether the vote was valid or 
> not. That is how things evolve on the project, we just formalise them a 
> little more slowly.
> 
> 
>> On 1 Nov 2023, at 20:07, Josh McKenzie  wrote:
>> 
>> First off, I appreciate your time and attention on this stuff. Want to be up 
>> front about that since these kinds of discussions can get prickly all too 
>> easily. I'm *at least* as guilty as anyone else about getting my back up on 
>> stuff like this. Figuring out the right things to "harden" as shared 
>> contractual ways we behave and what to leave loose and case-by-case is going 
>> to continue to be a challenge for us as we grow.
>> 
>> The last thing I personally want is for us to have too many extraneous rules 
>> formalizing things that just serve to slow down peoples' ability to 
>> contribute to the project. The flip side of that - for all of us to work in 
>> a shared space and collectively remain maximally productive, some individual 
>> freedoms (ability to merge a bunch of broken code and/or ninja in things as 
>> we see fit, needing 2 committers' eyes on things, etc) will have to be given 
>> up.
>> 
>> At its core the discussion we had was prompted by divergence between circle 
>> and ASF CI and our release process dragging on repeatedly during the 
>> "stabilize ASF CI" phase. The "do we require green ci before merge of 
>> tickets" seems like it came along as an intuitive rider; best I can recall 
>> my thinking was "how else could we have a manageable load to stabilize in 
>> ASF CI if we didn't even require green circle before merging things in", but 
>> we didn't really dig into details; from a re-reading now, that portion of 
>> the discussion was just taken for granted as us being in alignment. Given it 
>> was codifying a norm and everyone else in the discussion generally agreed, 
>> I don't think I or anyone thought to question it.
>> 
>> 
>>> “Votes on project structure *and governance”*. Governance, per Wikipedia, 
>>> is "the way rules, n

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-01 Thread Josh McKenzie
per Wikipedia, is 
> "the way rules, norms and actions are structured and sustained.”
> 
> I do not see any ambiguity here. The community side provides no basis for a 
> vote of this kind, while the PMC side specifically reserves this kind of 
> decision. But evidently we need to make this clearer.
> 
> Regarding the legitimacy of questioning this now: I have not come up against 
> this legislation before. The norm of requiring green CI has been around for a 
> lot longer than this vote, so nothing much changed until we started 
> questioning the *specifics* of this legislation. At this point, the 
> legitimacy of the decision also matters. Clearly there is broad support for a 
> policy of this kind, but is this specific policy adequate?
> 
> While I endorse the general sentiment of the policy, I do not endorse a 
> policy that has no wiggle room. I have made every effort in all of my 
> policy-making to ensure there are loosely-defined escape hatches for the 
> community to use, in large part to minimise this kind of legalistic logjam, 
> which is just wasted cycles.
> 
>> On 1 Nov 2023, at 15:31, Josh McKenzie  wrote:
>> 
>>> That vote thread also did not reach the threshold; it was incorrectly 
>>> counted, as committer votes are not binding for procedural changes. I 
>>> counted at most 8 PMC +1 votes. 
>> This piqued my curiosity.
>> 
>> Link to how we vote: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
>> *STATUS: Ratified 2020/06/25*
>> 
>> Relevant bits here:
>>> On dev@:
>>> 
>>>  1. Discussion / binding votes on releases (Consensus: min 3 PMC +1, no -1)
>>>  2. Discussion / binding votes on project structure and governance changes 
>>> (adopting subprojects, how we vote and govern, etc). (super majority)
>> 
>> The thread where we voted on the CI bar Jeremiah referenced: 
>> https://lists.apache.org/thread/2shht9rb0l8fh2gfqx6sz9pxobo6sr60
>> Particularly relevant bit:
>>> Committer / pmc votes binding. Simple majority passes.
>> I think you're arguing that voting to change our bar for merging when it 
>> comes to CI falls under "votes on project structure"? I think when I called 
>> that vote I was conceptualizing it as a technical discussion about a shared 
>> norm on how we as committers deal with code contributions, where the 
>> "committer votes are binding, simple majority" applies.
>> 
>> I can see credible arguments in either direction, though I'd have expected 
>> those concerns or counter-arguments to have come up back in Jan of 2022 when 
>> we voted on the CI changes, not almost 2 years later after us operating 
>> under this new shared norm. The sentiments expressed on the discuss and vote 
>> thread were consistently positive and uncontentious; this feels to me like 
>> it falls squarely under the spirit of lazy consensus only at a much larger 
>> buy-in level than usual: 
>> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>> 
>> We've had plenty of time to call this vote and merge bar into question (i.e. 
>> every ticket we merge we're facing the "no regressions" bar), and the only 
>> reason I'd see us treating TCM or Accord differently would be because 
>> they're much larger bodies of work at merge so it's going to be a bigger 
>> lift to get to non-regression CI, and/or we would want a release cut from a 
>> formal branch rather than a feature branch for preview.
>> 
>> An alternative approach to keep this merge and CI burden lower would have 
>> been more incremental work merged into trunk periodically, an argument many 
>> folks in the community have made in the past. I personally have mixed 
>> feelings about it; there's pros and cons to both approaches.
>> 
>> All that said, I'm in favor of us continuing with this as a valid and 
>> ratified vote (technical norms == committer binding + simple majority). If 
>> we want to open a formal discussion about instead considering that a 
>> procedural change and rolling things back based on those grounds I'm fine 
>> with that, but we'll need to discuss that and think about the broader 
>> implications since things like changing import ordering, tooling, or other 
>> ecosystem-wide impacting changes (CI systems we all share, etc) would 
>> similarly potentially run afoul of needing supermajority pmc participation 
>> if we categorize that type of work as "project structure" as per the 
>> governance rules.
>> 
>> On T

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-01 Thread Josh McKenzie
>>>>>>>>>>>>>> On 25 Oct 2023, at 21:55, Benedict  wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am surprised this needs to be said, but - especially for 
>>>>>>>>>>>>>> long-running CEPs - you must involve yourself early, and 
>>>>>>>>>>>>>> certainly within some reasonable time of being notified the work 
>>>>>>>>>>>>>> is ready for broader input and review. In this case, more than 
>>>>>>>>>>>>>> six months ago.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This isn’t the first time this has happened, and it is 
>>>>>>>>>>>>>> disappointing to see it again. Clearly we need to make this 
>>>>>>>>>>>>>> explicit in the guidance docs.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regarding the release of 5.1, I understood the proposal to be 
>>>>>>>>>>>>>> that we cut an actual alpha, thereby sealing the 5.1 release 
>>>>>>>>>>>>>> from new features. Only features merged before we cut the alpha 
>>>>>>>>>>>>>> would be permitted, and the alpha should be cut as soon as 
>>>>>>>>>>>>>> practicable. What exactly would we be waiting for? 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If we don’t have a clear and near-term trigger for branching 5.1 
>>>>>>>>>>>>>> for its own release, shortly after Accord and TCM merge, then I 
>>>>>>>>>>>>>> am in favour of instead delaying 5.0.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 25 Oct 2023, at 19:40, Mick Semb Wever  
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm open to the suggestions of not branching cassandra-5.1 
>>>>>>>>>>>>>>> and/or naming a preview release something other than 5.1-alpha1.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But… the codebases and release process (and upgrade tests) do 
>>>>>>>>>>>>>>> not currently support releases with qualifiers that are not 
>>>>>>>>>>>>>>> alpha, beta, or rc.  We can add a preview qualifier to this 
>>>>>>>>>>>>>>> list, but I would not want to block getting a preview release 
>>>>>>>>>>>>>>> out only because this wasn't yet in place.  
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hence the proposal used 5.1-alpha1 simply because that's what 
>>>>>>>>>>>>>>> we know we can do today.  An alpha release also means 
>>>>>>>>>>>>>>> (typically) the branch.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is anyone up for looking into adding a "preview" qualifier to 
>>>>>>>>>>>>>>> our release process? 
>>>>>>>>>>>>>>> This may also solve our previous discussions and desire to have 
>>>>>>>>>>>>>>> quarterly releases that folk can use through the trunk dev 
>>>>>>>>>>>>>>> cycle.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Personally, with my understanding of timelines in front of us 
>>>>>>>>>>>>>>> to fully review and stabilise tcm, it makes sense to branch it 
>>>>>>>>>>>>>>> as soon as it's merged.  It's easiest to stabilise it on a 
>>>>>>>>>>>>>>> branch, and there's definitely the desire and de

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-28 Thread Josh McKenzie
> ACCORD in particular was hyped in numerous talks and presentations and no one 
> cautioned it might not hit 5.0, quite the opposite
We need to be very careful in the future about how we communicate the 
availability of future novel work, especially when the ones promoting the 
delivery of that work and timelines aren't the ones actively working on the 
code. And to be explicit - I don't think there are any bad actors here; I think 
this is a natural consequence of specialization of skills and focus in the 
community as well as a disconnect between different groups of people.

Also, it's become clear to me that we still weren't all in alignment on our 
view of "do we ship 5.0 based on a date or do we ship 5.0 based on feature 
availability". Since we're still going through some evolution in our release 
philosophy (train vs. feature, etc), this is to be expected. We're getting 
there.

Having a marketing working group has helped bridge this gap, and getting more 
participation from other people in the community on that effort would help 
align more of us.


On Fri, Oct 27, 2023, at 5:00 PM, German Eichberger via dev wrote:
> Definitely want to second Josh. When I reached out on the ACCORD channel 
> about testing folks were super helpful and transparent about bugs, etc.
> 
> Frankly, I was pretty frustrated that ACCORD+TCM slipped. I was looking 
> forward to it and felt let down - but I also haven't done anything to help 
> other than trying it out. So, I only have myself to blame... 
> 
> That its slipping came as a surprise to many of us is an indication there 
> wasn't enough communication - we should probably rethink how we 
> communicate progress, especially on long-running and highly anticipated 
> initiatives. Maybe a paragraph in the "Project Status Update" (but then we 
> need more frequent updates), or a separate update email, or, as Maxim 
> is suggesting, posts to some newly created release list.
> 
> A highly anticipated feature has more visibility, and we need to account for 
> that with more communication beyond the usual channels. ACCORD in 
> particular was hyped in numerous talks and presentations and no one cautioned 
> it might not hit 5.0, quite the opposite, so we need to ask ourselves how 
> people who go on stage as Cassandra experts are not aware that it could slip. 
> That's where I think more communication could help.
> 
> 
> Thanks,
> German
> 
> *From:* Josh McKenzie 
> *Sent:* Friday, October 27, 2023 10:13 AM
> *To:* dev 
> *Subject:* [EXTERNAL] Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and 
> cut an immediate 5.1-alpha1)
>  
> Lots of threads of thought have popped up here. The big one I feel needs to 
> be clearly addressed and inspected is the implication of development not 
> happening transparently and not being inclusive or available for 
> participation by the community on major features.
> 
> The CEP process + dedicated development channels on ASF slack + public JIRAs 
> + feature branches in the ASF repo we've seen with specifically TCM and 
> Accord are the most transparent this kind of development has *ever been* on 
> this project, and I'd argue that's right at the sweet spot, or past the point 
> where reaching out to external parties for feedback not only hits 
> diminishing returns but starts to actively hurt a small group of people's 
> ability to make rapid progress on something.
> 
> No-one can expect to review everything, and no-one can expect to follow every 
> JIRA, commit, or update. This is why we have the role of a committer; a 
> person in this community we've publicly communicated we trust based on earned 
> merit (and in our project's case, at least 2 people whose opinion we trust) 
> to do quality work, validate it, and reach our expected bar for correctness, 
> performance, and maintainability. If a CEP is voted in and 2 committers have 
> an implementation they feel meets the goals, CI is green, and nobody has a 
> serious technical concern that warrants a binding -1, we're good. It doesn't, 
> and shouldn't, matter who currently employs or sponsors their work. It 
> doesn't, and shouldn't, matter whether individuals on the project who were 
> interested in collaborating on that work missed one or multiple 
> announcements, or whether they saw those announcements and just didn't have 
> the cycles to engage when they wanted to.
> 
> Now - we can always improve. We can always try and be proactive, knowing each 
> other and our interests and reaching out to specific folks to make sure 
> they're aware that work has hit a collaboration point or inflection point. I 
> can do (apparently much) better about sending out more consistent project 
> status updates with calls to action around whe

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-27 Thread Josh McKenzie
grown and a lot of things are going on in parallel. 
>>>> >> There are also more interdependencies between the different projects. 
>>>> >> In my opinion what we are lacking is a global overview of the different 
>>>> >> things going on in the project and some rough ideas of the status of 
>>>> >> the different significant pieces. It would allow us to better organize 
>>>> >> ourselves.
>>>> >>
>>>> >> On Thu, Oct 26, 2023 at 00:26, Benedict  wrote:
>>>> >>>
>>>> >>> I have spoken privately with Ekaterina, and to clear up some possible 
>>>> >>> ambiguity: I realise nobody has demanded a delay to this work to 
>>>> >>> conduct additional reviews; a couple of folk have however said they 
>>>> >>> would prefer one.
>>>> >>>
>>>> >>>
>>>> >>> My point is that, as a community, we need to work on ensuring folk 
>>>> >>> that care about a CEP participate at an appropriate time. If they 
>>>> >>> aren’t able to, the consequences of that are for them to bear.
>>>> >>>
>>>> >>>
>>>> >>> We should be working to avoid surprises as CEPs start to land. To this 
>>>> >>> end, I think we should work on some additional paragraphs for the 
>>>> >>> governance doc covering expectations around the landing of CEPs.
>>>> >>>
>>>> >>>
>>>> >>> On 25 Oct 2023, at 21:55, Benedict  wrote:
>>>> >>>
>>>> >>> 
>>>> >>>
>>>> >>> I am surprised this needs to be said, but - especially for 
>>>> >>> long-running CEPs - you must involve yourself early, and certainly 
>>>> >>> within some reasonable time of being notified the work is ready for 
>>>> >>> broader input and review. In this case, more than six months ago.
>>>> >>>
>>>> >>>
>>>> >>> This isn’t the first time this has happened, and it is disappointing 
>>>> >>> to see it again. Clearly we need to make this explicit in the guidance 
>>>> >>> docs.
>>>> >>>
>>>> >>>
>>>> >>> Regarding the release of 5.1, I understood the proposal to be that we 
>>>> >>> cut an actual alpha, thereby sealing the 5.1 release from new 
>>>> >>> features. Only features merged before we cut the alpha would be 
>>>> >>> permitted, and the alpha should be cut as soon as practicable. What 
>>>> >>> exactly would we be waiting for?
>>>> >>>
>>>> >>>
>>>> >>> If we don’t have a clear and near-term trigger for branching 5.1 for 
>>>> >>> its own release, shortly after Accord and TCM merge, then I am in 
>>>> >>> favour of instead delaying 5.0.
>>>> >>>
>>>> >>>
>>>> >>> On 25 Oct 2023, at 19:40, Mick Semb Wever  wrote:
>>>> >>>
>>>> >>> 
>>>> >>> I'm open to the suggestions of not branching cassandra-5.1 and/or 
>>>> >>> naming a preview release something other than 5.1-alpha1.
>>>> >>>
>>>> >>> But… the codebases and release process (and upgrade tests) do not 
>>>> >>> currently support releases with qualifiers that are not alpha, beta, 
>>>> >>> or rc.  We can add a preview qualifier to this list, but I would not 
>>>> >>> want to block getting a preview release out only because this wasn't 
>>>> >>> yet in place.
>>>> >>>
>>>> >>> Hence the proposal used 5.1-alpha1 simply because that's what we know 
>>>> >>> we can do today.  An alpha release also means (typically) the branch.
>>>> >>>
>>>> >>> Is anyone up for looking into adding a "preview" qualifier to our 
>>>> >>> release process?
>>>> >>> This may also solve our previous discussions and desire to have 
>>>> >>> quarterly releases that folk can use through the trunk dev cycle.
>>>> >>>
>>>> >>> Personally, with my understanding of timelines in front of us to fully 
>>>>

Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Josh McKenzie
In case you're keeping score on how frequently these are coming out: *please 
stop*. ;)

Silver lining - looks like we have a lot to discuss this round! Last update was 
late July and we've been churning through the 5.0 freeze and stabilization 
phase.


*[New Contributors Getting Started]*
Check out https://the-asf.slack.com, channel #cassandra-dev. Reply directly to 
me on this email if you need an invite for your account, and reach out to the 
@cassandra_mentors alias in the channel if you need to get oriented.

We have a list of curated "getting started" tickets you can find here, filtered 
to "ToDo" (i.e. not yet worked): 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=2160=2162=2652.

*Helpful links:*
- Getting Started with Development on C*: 
https://cassandra.apache.org/_/development/gettingstarted.html
- Building and IDE integration (worktrees are your friend; msg me on slack if 
you need pointers): https://cassandra.apache.org/_/development/ide.html
- Code Style: https://cassandra.apache.org/_/development/code_style.html


*[Dev mailing list]*
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-7-20%7Cdto=2023-10-27:

My last email of shame was 35 threads. Drumroll for this one...
91. *Yeesh*. Let me stick to highlights.

Ekaterina pushed through dropping JDK8 support and adding JDK17 support... back 
in July. If you didn't know about it by now, consider yourself doubly 
notified. :) https://lists.apache.org/thread/9pwz3vtpf88fly27psc7yxvcv0lwbz8k 
I think I can speak on behalf of all of us when I say: **Thank You Ekaterina.**

This came up recently on another thread about when to branch 5.1, but we 
discussed our freeze plans and exception rules for TCM and Accord here: 
https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw. Mick was 
essentially looking for a similar waiver for Vector search since it was well 
abstracted, depended on SAI and external libs, and in general shouldn't be too 
big of a disruption to get into 5.0. General consensus at the time was "sure", 
and the work has since been completed. But here's the reminder and link for 
posterity (and in case you missed it).

Jaydeep reached out about a potential short-term solution to detecting 
token-ownership mismatch while we don't yet have TCM; this seems more pressing 
now as we're looking at a 5.0 without yet having TCM in it. The dev ML thread 
is here: https://lists.apache.org/thread/4p0orhom42g36osnknqj3fqmqhvqml1g, and 
he created https://issues.apache.org/jira/browse/CASSANDRA-18758 dealing with 
the topic. There's a relatively modest (7 files, just over 300 lines) PR 
available here: https://github.com/apache/cassandra/pull/2595/files; I haven't 
looked into it, but it might be worth considering getting this into 5.0 since 
it looks like we're moving to cutting w/out TCM. Any thoughts?

We had a pretty good discussion about automated repair scheduling, discussing 
whether it should live in the DB proper vs. in the sidecar, pros and cons, 
pressures, etc. Not sure if things moved beyond that; I know there are at least a 
few implementations out there that haven't yet made their way back to the ASF 
project proper. Thread: 
https://lists.apache.org/thread/glvmkwknf91rxc5l6w4d4m1kcvlr6mrv. My hope is we 
can avoid the gridlock we hit for a long time with the sidecar where there are 
multiple implementations with different tradeoffs and everyone's 
disincentivized from accepting a solution different from their own in-house one 
since it'd theoretically require re-tooling. Tough problem with no easy 
solutions, but would love to see this become a first class citizen in the 
ecosystem.

Paulo brought up a discussion about moving to disk_access_mode = 
mmap_index_only on 5.0. There seemed to be consensus there, but I'm not sure we 
actually changed that in the 5.0 branch? Thread: 
https://lists.apache.org/thread/nhp6vftc4kc3dxskngxy5rpo1lp19drw. I just pulled 
cassandra-5.0, and it looks like auto + hasLargeAddressSpace() still resolves to 
mmap rather than mmap_index_only.

David Capwell worked on adding some retries to repair messages when they're 
failing to make the process more robust: 
https://lists.apache.org/thread/wxv6k6slljqcw73xcmpxj4kn5lz95jd1. Reception was 
positive enough that he went so far as to back-port it and also work on some 
for IR (incremental repair). Looks like he could use a reviewer here: 
https://issues.apache.org/jira/browse/CASSANDRA-18962 - it's Patch 
Available.

Mike Adamson reached out about adding / taking a dependency on jvector: 
https://lists.apache.org/thread/zkqg7mk9hp35zn0cf1tvywc2m3l63jrn. The general 
gist of it was "looks good, written by committer(s) / PMC members, permissively 
licensed. Go for it". There was some discussion about copyright holders and whether that 
matters from an ASF perspective, and we've further had some good discussion 
about the application of generative AI tooling to not just code contributed to 
the ASF, but also in dependencies we bring into the 

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-25 Thread Josh McKenzie
> If we cannot meet at least that quality level (Green CI) we should not merge
We should probably make it a formally agreed upon point to not merge things 
unless we're sure they won't destabilize, and thus block release of, a branch. 
So green CI for a feature (excepting feature-specific tests if it's still a 
work in progress), plus an experimental flag if we don't consider it 
production-ready, should be the absolute bare minimum for anything to merge, IMO.

On Wed, Oct 25, 2023, at 4:17 AM, Benjamin Lerer wrote:
> The proposal includes 3 things:
> 1. Do not include TCM and Accord in 5.0 to avoid delaying 5.0
> 2. The next release will be 5.1 and will include only Accord and TCM
> 3. Merge TCM and Accord right now in 5.1 (making an initial release)
> 
> I am fine with point 1 and do not have a strong opinion on which way to go.
> 2. Means that every new feature will have to wait for post 5.1 even if it is 
> ready before 5.1 is stabilized and shipped. If we do a 5.1 release why not 
> take it as an opportunity to release more things? I am not saying that we 
> will. Just that we should leave that door open.
> 3. There is a need to merge TCM and Accord as maintaining those separate 
> branches is costly in terms of time and energy. I fully understand that. On 
> the other hand merging TCM and Accord will make the TCM review harder and I 
> do believe that this second round of review is valuable as it already 
> uncovered a valid issue. Nevertheless, I am fine with merging TCM as soon as 
> it passes CI and continuing the review after the merge. If we cannot meet at 
> least that quality level (green CI) we should not merge just to create a 
> 5.1-alpha release for the summit.
> 
> Now, I am totally fine with a preview release without numbering and with big 
> warnings that will only serve as a preview for the summit.
> 
> On Wed, Oct 25, 2023 at 06:33, Berenguer Blasi  wrote:
>> I also think there are many good new features in 5.0 already; they'd make a 
>> good release even on their own. My 2 cts.
>> 
>> On 24/10/23 23:20, Brandon Williams wrote:
>> > The catch here is that we don't publish docker images currently.  The
>> > C* docker images available are not made by us.
>> >
>> > Kind Regards,
>> > Brandon
>> >
>> > On Tue, Oct 24, 2023 at 3:31 PM Patrick McFadin  wrote:
>> >> Let me make that really easy. Hell yes
>> >>
>> >> Not everybody runs CCM, I've tried but I've met resistance.
>> >>
>> >> Compiling your own version usually involves me saying the words "Yes, ant 
>> >> realclean exists. I'm not trolling you"
>> >>
>> >> docker pull  works on every OS and curates a single node 
>> >> experience.
>> >>
>> >>
>> >>
>> >> On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie  
>> >> wrote:
>> >>> In order for the project to advertise the release outside the dev@ list 
>> >>> it needs to be a formal release.
>> >>>
>> >>> That's my reading as well:
>> >>> https://www.apache.org/legal/release-policy.html#release-definition
>> >>>
>> >>> I wonder if there'd be value in us having a cronned job that'd do 
>> >>> nightly docker container builds on trunk + feature branches, archived 
>> >>> for N days, and we make that generally known to the dev@ list here so 
>> >>> folks that want to poke at the current state of trunk or other branches 
>> >>> could do so with very low friction. We'd probably see more engagement on 
>> >>> feature branches if it were turn-key easy for other C* devs to spin them 
>> >>> up and check them out.
>> >>>
>> >>> For what you're talking about here Patrick (a docker image for folks 
>> >>> outside the dev@ audience and more user-facing), we'd want to vote on it 
>> >>> and go through the formal process.
>> >>>
>> >>> On Tue, Oct 24, 2023, at 3:10 PM, Jeremiah Jordan wrote:
>> >>>
>> >>> In order for the project to advertise the release outside the dev@ list 
>> >>> it needs to be a formal release.  That just means that there was a 
>> >>> release vote and at least 3 PMC members +1’ed it, and there are more +1 
>> >>> than there are -1, and we follow all the normal release rules.  The ASF 
>> >>> release process doesn’t care what branch you cut the artifacts from or 
>> >>> what version you call it.
>> >>>
>> >>> So the project can cut artif

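The release rule Jeremiah describes in the quoted message above (at least three PMC members +1, and more +1 than -1 votes) can be sketched as a small check. This is a hypothetical illustration, not project tooling: the Vote type and the release_vote_passes function are assumed names, and the sketch follows the wording in the thread by requiring three binding (PMC) +1 votes while comparing all +1 and -1 votes for the majority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on a release thread (hypothetical model)."""
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True if the voter is a PMC member


def release_vote_passes(votes: list[Vote]) -> bool:
    """Sketch of the rule as worded in the thread: at least three
    binding (PMC) +1 votes, and more +1 votes than -1 votes."""
    binding_plus = sum(1 for v in votes if v.binding and v.value == 1)
    plus = sum(1 for v in votes if v.value == 1)
    minus = sum(1 for v in votes if v.value == -1)
    return binding_plus >= 3 and plus > minus


if __name__ == "__main__":
    votes = [
        Vote("pmc-a", 1, True),
        Vote("pmc-b", 1, True),
        Vote("pmc-c", 1, True),
        Vote("contributor-d", -1, False),
    ]
    print(release_vote_passes(votes))  # True: three binding +1s, and 3 > 1
```

Note that nothing here depends on which branch the artifacts came from or what the version string is, which matches the point that the ASF process doesn't care about either. ASF release votes also cannot be vetoed, so a single -1 only counts against the majority.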
Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-10-25 Thread Josh McKenzie
Maxim, is the primary pain point you're trying to solve getting a second committer 
reviewer? And/or making the review process simpler and cleaner for 
someone?

On Wed, Oct 18, 2023, at 5:06 PM, Maxim Muzafarov wrote:
> Hello everyone,
> 
> It has been a long time since the last update on this thread, so I
> wanted to share some status updates: The issue is still awaiting
> review, but all my hopes are pinned on Benjamin :-)
> 
> My question here is, is there anything I can do to facilitate the
> review for anyone who wants to delve into the patch?
> 
> I have a few thoughts to follow:
> - CEPify the changes - this will allow us to see the result of the
> discussion on a single page without having to re-read the whole
> thread;
> - Write a blog post with possible design solutions - this will both
> reveal the results of the discussion and potentially draw some
> attention from the community;
> - Present and discuss slides at one of the Cassandra Town Halls;
> 
> I lean toward the 1st and/or 2nd points. What are the best practices we
> have here for such cases, though? Any thoughts?
> 
> On Tue, 11 Jul 2023 at 15:51, Maxim Muzafarov  wrote:
> >
> > Thank you for your comments and for sharing the ticket targeting
> > strategy, I'm really happy to see this page where I have found all the
> > answers to the questions I had. So, I tend towards your view and will
> > just land this ticket on the 5.0 release only for now as it makes
> > sense for me as well.
> >
> > I didn't add the feature flag for this feature because for 99% of the
> > source code changes it only works with Cassandra internals leaving the
> > public API unchanged. A few remarks on this are:
> > - the display format of the vtable property has changed to match the
> > yaml configuration style, this doesn't mean that we are displaying
> > property values in a completely different way in fact the formats
> > match with only 4 exceptions mentioned in the message above (this
> > should be fine for the major release I hope);
> > - a new column, which we've agreed to add (I'll fix the PR shortly);
> >
> >
> > I would also like to mention the follow-up todos required by this
> > issue to set the right expectations. Currently, we've brought a few
> > properties under the framework to make them updateable with the
> > SettingsTable, so that you can keep focusing on the framework itself
> > rather than on tagging the configuration properties themselves with
> > the @Mutable annotation. Although the solution is self-sufficient for
> > the already tagged properties, we still need to bring the rest of them
> > under the framework afterwards. I'll create an issue and do that right
> > after we're done with the initial patch.
> >
> >
> > On Fri, 7 Jul 2023 at 20:37, Josh McKenzie  wrote:
> > >
> > > This really is great work Maxim; definitely appreciate all the hard work 
> > > that's gone into it and I think the users will too.
> > >
> > > In terms of where it should land, we discussed this type of question at 
> > > length on the ML a while ago and ended up codifying it in the wiki: 
> > > https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases
> > >
> > > When working on a ticket, use the following guideline to determine which 
> > > branch to apply it to (Note: See How To Commit for details on the commit 
> > > and merge process)
> > >
> > > Bugfix: apply to oldest applicable LTS and merge up through latest GA to 
> > > trunk
> > >
> > > In the event you need to make changes on the merge commit, merge with -s 
> > > ours and revise the commit via --amend
> > >
> > > Improvement: apply to trunk only (next release)
> > >
> > > Note: refactoring and removing dead code qualifies as an Improvement; our 
> > > priority is stability on GA lines
> > >
> > > New Feature: apply to trunk only (next release)
> > >
> > > Our priority is to keep the 2 LTS releases and latest GA stable while 
> > > releasing new "latest GA" on a cadence that provides new improvements and 
> > > functionality to users soon enough to be valuable and relevant.
> > >
> > >
> > > So in this case, target whatever unreleased next feature release (i.e. 
> > > SEMVER MAJOR || MINOR) we have on deck.
> > >
> > > On Thu, Jul 6, 2023, at 1:21 PM, Ekaterina Dimitrova wrote:
> > >
> > > Hi,
> > >
> > > First of all, thank you for all the work!
> > > I personal

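The branch-targeting guideline quoted above (bugfixes go to the oldest applicable LTS and merge up through latest GA to trunk; improvements and new features go to trunk only) can be expressed as a small function. This is a hypothetical sketch: the TicketType enum, target_branches, and the branch names in the example are illustrative assumptions, and the branch list is assumed to hold only maintained branches ordered oldest to newest and ending with trunk.

```python
from enum import Enum
from typing import Optional


class TicketType(Enum):
    BUGFIX = "bugfix"
    IMPROVEMENT = "improvement"
    NEW_FEATURE = "new feature"


def target_branches(ticket_type: TicketType,
                    branches: list[str],
                    oldest_applicable: Optional[str] = None) -> list[str]:
    """Return the branches a ticket should land on, in merge order.

    branches: maintained branches ordered oldest -> newest, ending with "trunk".
    oldest_applicable: for a bugfix, the oldest maintained branch it affects.
    """
    if ticket_type is TicketType.BUGFIX:
        if oldest_applicable is None:
            raise ValueError("a bugfix needs the oldest applicable branch")
        # Apply to the oldest applicable branch, then merge up through trunk.
        return branches[branches.index(oldest_applicable):]
    # Improvements (including refactoring and dead-code removal) and new
    # features land on trunk only.
    return ["trunk"]


if __name__ == "__main__":
    maintained = ["cassandra-4.0", "cassandra-4.1", "cassandra-5.0", "trunk"]
    print(target_branches(TicketType.BUGFIX, maintained, "cassandra-4.1"))
    # ['cassandra-4.1', 'cassandra-5.0', 'trunk']
```

Returning the branches in merge order mirrors the "apply to oldest, merge up" workflow rather than treating each branch as an independent commit target.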
Re: CASSANDRA-18775 (Cassandra supported OSs)

2023-10-25 Thread Josh McKenzie
+1 to drop if we're not using.

On Fri, Oct 20, 2023, at 6:58 PM, Ekaterina Dimitrova wrote:
> +1 on removing the whole lib if we are sure we don’t need it. Nothing better 
> than some healthy house cleaning 
> 
>  -1 on partial removals
> 
> On Fri, 20 Oct 2023 at 17:34, David Capwell  wrote:
>> +1 to drop the whole lib… 
>> 
>> 
>>> On Oct 20, 2023, at 7:55 AM, Jeremiah Jordan  
>>> wrote:
>>> 
>>> Agreed.  -1 on selectively removing any of the libs.  But +1 for removing 
>>> the whole thing if it is no longer used.
>>> 
>>> -Jeremiah
>>> 
>>> On Oct 20, 2023 at 9:28:55 AM, Mick Semb Wever  wrote:
> Does anyone see any reason _not_ to do this?
 
 
 Thanks for bringing this to dev@
 
 I see a reason not to do it: folk do submit patches for other archs despite 
 us not formally maintaining and testing the code for those archs.  Some 
 examples are PPC64 Big Endian (CASSANDRA-7476), s390x (CASSANDRA-17723), 
 PPC64 Little Endian (CASSANDRA-7381), sparcv9 (CASSANDRA-6628).  Wrote 
 this on the ticket too.
 
 +1 for removing sigar altogether (as Brandon points out). 


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Josh McKenzie
und
>>>>> for that sort of bug.
>>>>> 
>>>>> So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
>>>>> plan mainly because I don't think 5.1 is (or should be) a fast follow.
>>>>> 
>>>>> For the user community, the communication should be straightforward. TCM +
>>>>> Accord are turning out to be much more complicated than was originally
>>>>> scoped, and for good reasons. Our first principle is to provide a stable
>>>>> and reliable system, so as a result, we'll be de-coupling TCM + Accord 
>>>>> from
>>>>> 5.0 into a 5.1 branch, which is available in parallel to 5.0 while
>>>>> additional hardening and testing is done. We can communicate this in a 
>>>>> blog post.
>>>>> 
>>>>> To make this much more palatable to our user community, if we can get a
>>>>> build and docker image available ASAP with Accord, it will allow 
>>>>> developers
>>>>> to start playing with the syntax. Up to this point, that hasn't been 
>>>>> widely
>>>>> available unless you compile the code yourself. Developers need to
>>>>> understand how this will work in an application, and up to this point, the
>>>>> syntax is text they see in my slides. We need to get some hands-on and 
>>>>> that
>>>>> will get our user community engaged on Accord this calendar year. The
>>>>> feedback may even uncover some critical changes we'll need to make. Lack
>>>>> of access to Accord by developers is a critical problem we can fix soon;
>>>>> there will be plenty of excitement there, and developers can start
>>>>> building use cases before the final code ships.
>>>>> 
>>>>> I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
>>>>> but maybe one for my birthday?
>>>>> 
>>>>> Patrick
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Oct 24, 2023 at 7:23 AM Josh McKenzie  
>>>>> wrote:
>>>>> 
>>>>> > Maybe it won't be a glamorous release but shipping
>>>>> > 5.0 mitigates our worst case scenario.
>>>>> >
>>>>> > I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
>>>>> > memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
>>>>> > accurate, all combine to make 5.0 a pretty glamorous release IMO
>>>>> > independent of TCM and Accord. Accord is a true paradigm-shift 
>>>>> > game-changer
>>>>> > so it's easy to think of 5.0 as uneventful in comparison, and TCM helps
>>>>> > resolve one of the biggest pain-points in our system for over a decade, 
>>>>> > but
>>>>> > I think 5.0 is a very meaty release in its own right today.
>>>>> >
>>>>> > Anyway - I agree with you Brandon re: timelines. If things take longer
>>>>> > than we'd hope (which, if I think back, they do roughly 100% of the 
>>>>> > time on
>>>>> > this project), blocking on these features could both lead to a 
>>>>> > significant
>>>>> > delay in 5.0 going out as well as increasing pressure and risk of 
>>>>> > burnout
>>>>> > on the folks working on it. While I believe we all need some balanced
>>>>> > urgency to do our best work, being under the gun for something with a 
>>>>> > hard
>>>>> > deadline or having an entire project drag along blocked on you is not 
>>>>> > where
>>>>> > I want any of us to be.
>>>>> >
>>>>> > Part of why we talked about going to primarily annual calendar-based
>>>>> > releases was to avoid precisely this situation, where something that
>>>>> > *feels* right at the cusp of merging leads us to delay a release
>>>>> > repeatedly. We discussed this a couple times this year:
>>>>> > 1: https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3,
>>>>> > where Mick proposed a "soft-freeze" for everything w/out exception and 
>>>>> > 1st
>>>>> > week October "hard-freeze", and there was assumed to be lazy consensus
>>>>> > 2: https://lists.a

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Josh McKenzie
> Maybe it won't be a glamorous release but shipping
> 5.0 mitigates our worst case scenario.
I disagree with this characterization of 5.0 personally. UCS, SAI, Trie 
memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are 
accurate, all combine to make 5.0 a pretty glamorous release IMO independent of 
TCM and Accord. Accord is a true paradigm-shift game-changer so it's easy to 
think of 5.0 as uneventful in comparison, and TCM helps resolve one of the 
biggest pain-points in our system for over a decade, but I think 5.0 is a very 
meaty release in its own right today.

Anyway - I agree with you Brandon re: timelines. If things take longer than 
we'd hope (which, if I think back, they do roughly 100% of the time on this 
project), blocking on these features could both lead to a significant delay in 
5.0 going out as well as increasing pressure and risk of burnout on the folks 
working on it. While I believe we all need some balanced urgency to do our best 
work, being under the gun for something with a hard deadline or having an 
entire project drag along blocked on you is not where I want any of us to be.

Part of why we talked about going to primarily annual calendar-based releases 
was to avoid precisely this situation, where something that *feels* right at 
the cusp of merging leads us to delay a release repeatedly. We discussed this a 
couple times this year:
1: https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3, where Mick 
proposed a "soft-freeze" for everything w/out exception and 1st week October 
"hard-freeze", and there was assumed to be lazy consensus
2: https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw, where we 
kept to what we discussed in 1 but added CEP-30 to be waived in as 
well.

So. We're at a crossroads here where we need to either follow through with what 
we all agreed to earlier this year, or acknowledge that our best intention of 
calendar-based releases can't stand up to our optimism and desire to get these 
features into the next major.

There's no immediate obvious better path to me in terms of what's best for our 
users. This is a situation of risk tolerance with a lot of unknowns that could 
go either way.

Any light that folks active on TCM and Accord could shed in terms of their best 
and worst-case scenarios on timelines for those features might help us narrow 
this down a bit. Otherwise, I'm inclined to defer to our past selves and fall 
back to "we agreed to yearly calendar releases for good reason. Let's stick to 
our guns."

On Tue, Oct 24, 2023, at 9:37 AM, Brandon Williams wrote:
> The concern I have with this is that it's a big slippery 'if' that
> involves development time estimation, and if it tends to take longer
> than we estimate (as these things tend to do), then I can see a future
> where 5.0 is delayed until the middle of 2024, and I really don't want
> that to happen.  Maybe it won't be a glamorous release but shipping
> 5.0 mitigates our worst case scenario.
> 
> Kind Regards,
> Brandon
> 
> On Mon, Oct 23, 2023 at 4:02 PM Dinesh Joshi  wrote:
> >
> > I have a strong preference to move out the 5.0 date to have accord and TCM. 
> > I don’t see the point in shipping 5.0 without these features especially if 
> > 5.1 is going to follow close behind it.
> >
> > Dinesh
> >
> > On Oct 23, 2023, at 4:52 AM, Mick Semb Wever  wrote:
> >
> > 
> >
> > The TCM work (CEP-21) is in its review stage but being well past our 
> > cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would 
> > like to propose the following.
> >
> > We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut 
> > an immediate 5.1-alpha1 release.
> >
> > I see this as a win-win scenario for us, considering our current situation. 
> >  (Though it is unfortunate that Accord is included in this scenario because 
> > we agreed it would be based upon TCM.)
> >
> > This will mean…
> >  - We get to focus on getting 5.0 to beta and GA, which already has a ton 
> > of features users want.
> >  - We get an alpha release with TCM and Accord into users' hands quickly for 
> > broader testing and feedback.
> >  - We isolate GA efforts on TCM and Accord – giving oss and downstream 
> > engineers time and patience reviewing and testing.  TCM will be the biggest 
> > patch ever to land in C*.
> >  - Give users a choice for a more incremental upgrade approach, given just 
> > how many new features we're putting on them in one year.
> >  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all 
> > 4.x versions, just as if it had landed in 5.0.
> >
> >
> > The risks/costs this introduces are
> >  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch, 
> > and at some point decide to undo this work, while we can throw away the 
> > cassandra-5.1 branch we would need to do a bit of work reverting the 
> > changes in trunk.  This is a _very_ edge case, as confidence levels on the 
> > design and 

Re: CASSANDRA-18941 produce size bounded SSTables from CQLSSTableWriter

2023-10-24 Thread Josh McKenzie
> to 4.0 and up to trunk
Think the proposal is all supported branches from 4.0 up.

+1 here.

On Mon, Oct 23, 2023, at 10:19 PM, guo Maxwell wrote:
> +1, but I want to know why only trunk and 4.0? Why not all the versions 
> involved, like 4.1 and 5.0?
> 
> On Tue, Oct 24, 2023 at 07:47, Francisco Guerrero wrote:
>> +1 (nb). I think this is a great addition to offline tools that use SSTable 
>> writer in general.
>> 
>> On 2023/10/23 23:21:13 Yifan Cai wrote:
>> > Hi,
>> > 
>> > I want to propose merging the patch in CASSANDRA-18941 to 4.0 and up to
>> > trunk and hope we are all OK with it.
>> > 
>> > In CASSANDRA-18941, I am adding the capability to produce size-bounded
>> > SSTables in CQLSSTableWriter for sorted data. It can greatly benefit
>> > Cassandra Analytics (https://github.com/apache/cassandra-analytics) for
>> > bulk writing SSTables, since it avoids buffering and sorting on flush,
>> > given the data source is sorted already in the bulk write process.
>> > Cassandra Analytics supports Cassandra 4.0 and depends on the cassandra-all
>> > 4.0.x library. Therefore, we are mostly interested in using the new
>> > capability in 4.0.
>> > 
>> > CQLSSTableWriter is only used in offline tools and never in the code path
>> > of Cassandra server.
>> > 
>> > Any objections to merging the patch to 4.0 and up to trunk?
>> > 
>> > - Yifan
>> >
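A minimal sketch of the idea Yifan describes, in Java: when the input is already sorted, rows can be streamed straight into the current output and the writer simply rolls over to a new output once a size bound would be exceeded, so nothing is buffered or re-sorted on flush. SizeBoundedSortedWriter, its methods, and the "row = byte count" abstraction are hypothetical illustrations, not the CQLSSTableWriter API or the CASSANDRA-18941 patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the CQLSSTableWriter API): splits an already-sorted
 * stream of rows into consecutive outputs whose total size stays within a bound.
 */
final class SizeBoundedSortedWriter {
    private final long maxBytesPerOutput;
    private final List<List<String>> finishedOutputs = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private long currentBytes = 0;
    private String lastKey = null;

    SizeBoundedSortedWriter(long maxBytesPerOutput) {
        if (maxBytesPerOutput <= 0) throw new IllegalArgumentException("bound must be positive");
        this.maxBytesPerOutput = maxBytesPerOutput;
    }

    /** Appends a row; rows must arrive in ascending key order because nothing is buffered for sorting. */
    void addRow(String key, long sizeInBytes) {
        if (lastKey != null && key.compareTo(lastKey) < 0)
            throw new IllegalStateException("input not sorted: " + key + " after " + lastKey);
        // Roll over before this row would push the current output past the bound
        // (a single oversized row still gets an output of its own).
        if (!current.isEmpty() && currentBytes + sizeInBytes > maxBytesPerOutput)
            roll();
        current.add(key);
        currentBytes += sizeInBytes;
        lastKey = key;
    }

    private void roll() {
        finishedOutputs.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
    }

    /** Finishes the last partially filled output and returns all outputs in order. */
    List<List<String>> close() {
        if (!current.isEmpty()) roll();
        return finishedOutputs;
    }

    public static void main(String[] args) {
        SizeBoundedSortedWriter writer = new SizeBoundedSortedWriter(100);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) writer.addRow(key, 40);
        System.out.println(writer.close()); // [[a, b], [c, d], [e]]
    }
}
```

The sortedness check is what makes the no-buffering approach safe: if the source ever violates the ordering assumption, the sketch fails loudly instead of producing overlapping outputs.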


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Josh McKenzie
> If I had to pick a month of the year to release software used by large 
> enterprises, it probably would be something like March instead of December.
That's... a good point. If we end up on a cadence of majors in December (since 
we slipped to then for 4.1 and inherit that from that calendar year "pressure") 
we're setting ourselves up to release right in the largest consistent 
change-freeze window I know of for most users.

> It will be another 2.2 release.
Let me live on with the stories I tell myself about the hordes of Windows users 
that appreciated Windows support before the Storage Engine rewrite, thank you 
very much. :D

On Mon, Oct 23, 2023, at 1:57 PM, Caleb Rackliffe wrote:
> ...or like the end of January. Either way, feel free to ignore the "aside" :)
> 
> On Mon, Oct 23, 2023 at 12:53 PM Caleb Rackliffe  
> wrote:
>> Kind of in the same place as Benedict/Aleksey.
>> 
>> If we release a 5.1 in, let's say...March of next year, the number of 5.0 
>> users is going to be very minimal. Nobody is going to upgrade anything 
>> important from now through the first half of January anyway, right? They're 
>> going to be making sure their existing clusters aren't exploding.
>> 
>> (We still want TCM/Accord to be available to people to test by Summit, but 
>> that feels unrelated to whether we cut a 5.1 branch...)
>> 
>> Aside: If I had to pick a month of the year to release software used by 
>> large enterprises, it probably would be something like March instead of 
>> December. I have no good research to back that up, of course... 
>> 
>> On Mon, Oct 23, 2023 at 12:19 PM Benedict  wrote:
>>> 
>>> To be clear, I’m not making an argument either way about the path forwards 
>>> we should take, just concurring about a likely downside of this proposal. I 
>>> don’t have a strong opinion about how we should proceed.
>>> 
>>> 
>>>> On 23 Oct 2023, at 18:16, Benedict  wrote:
>>>> 
>>>> 
>>>> I agree. If we go this route we should essentially announce an immediate 
>>>> 5.1 alpha at the same time as 5.0 GA, and I can’t see many people 
>>>> rolling out 5.0 with 5.1 so close on its heels.
>>>> 
>>>> 
>>>>> On 23 Oct 2023, at 18:11, Aleksey Yeshchenko  wrote:
>>>>> I’m not so sure that many folks will choose the 4.0->5.0->5.1 path 
>>>>> instead of just waiting longer for TCM+Accord to be in, and go 4.0->5.1 
>>>>> in one hop.
>>>>> 
>>>>> Nobody likes going through these upgrades. So I personally expect 5.0 to 
>>>>> be a largely ghost release if we go this route, adopted by few, just a 
>>>>> permanent burden on the merge path to trunk.
>>>>> 
>>>>> Not to say that there isn’t valuable stuff in 5.0 without TCM and Accord 
>>>>> - there most certainly is - but with the expectation that 5.1 will follow 
>>>>> up reasonably shortly after with all that *and* two highly anticipated 
>>>>> features on top, I just don’t see the point. It will be another 2.2 
>>>>> release.
>>>>> 
>>>>> 
>>>>>> On 23 Oct 2023, at 17:43, Josh McKenzie  wrote:
>>>>>> 
>>>>>> We discussed that at length in various other mailing threads Jeff - kind 
>>>>>> of settled on "we're committing to cutting a major (semver MAJOR or 
>>>>>> MINOR) every 12 months but want to remain flexible for exceptions when 
>>>>>> appropriate".
>>>>>> 
>>>>>> And then we discussed our timeline for 5.0 this year and settled on the 
>>>>>> "let's try and get it out this calendar year so it's 12 months after 
>>>>>> 4.1, but we'll grandfather in TCM and Accord past freeze date if they 
>>>>>> can make it by October".
>>>>>> 
>>>>>> So that's the history for how we landed here.
>>>>>> 
>>>>>>> 2) Do we drop support for 3.0 and 3.11 after 5.0.0 is out or after 
>>>>>>> 5.1.0 is?
>>>>>> My understanding is after 5.0.0. Deprecation and support drop is 
>>>>>> predicated on the 5.0 release, not any specific features or anything.
>>>>>> 
>>>>>> On Mon, Oct 23, 2023, at 12:29 PM, Jeff Jirsa wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Oct 23, 2023 at 4:52 AM Mick Semb Wever  wrote:
>>>>>>>> 
>>>>>>>> The TCM work (CEP-21) is in its review stage but being well past our 
>>>>>>>> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I 
>>>>>>>> would like to propose the following.
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I think this presumes that 5.0 GA is date driven instead of feature 
>>>>>>> driven.
>>>>>>> 
>>>>>>> I'm sure there's a conversation elsewhere, but why isn't this date 
>>>>>>> movable?


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Josh McKenzie
We discussed that at length in various other mailing threads Jeff - kind of 
settled on "we're committing to cutting a major (semver MAJOR or MINOR) every 
12 months but want to remain flexible for exceptions when appropriate".

And then we discussed our timeline for 5.0 this year and settled on the "let's 
try and get it out this calendar year so it's 12 months after 4.1, but we'll 
grandfather in TCM and Accord past freeze date if they can make it by October".

So that's the history for how we landed here.

> 2) Do we drop support for 3.0 and 3.11 after 5.0.0 is out or after 5.1.0 
> is?
My understanding is after 5.0.0. Deprecation and support drop is predicated on 
the 5.0 release, not any specific features or anything.

On Mon, Oct 23, 2023, at 12:29 PM, Jeff Jirsa wrote:
> 
> 
> On Mon, Oct 23, 2023 at 4:52 AM Mick Semb Wever  wrote:
>> 
>> The TCM work (CEP-21) is in its review stage but being well past our cut-off 
>> date¹ for merging, and now jeopardising 5.0 GA efforts, I would like to 
>> propose the following.
>> 
> 
> 
> I think this presumes that 5.0 GA is date driven instead of feature driven.
> 
> I'm sure there's a conversation elsewhere, but why isn't this date movable?
> 
>  


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Josh McKenzie
> This has to be in 5.0, even if it’s alpha and ships after December, or this 
> is going to be a disaster that will take us much longer to unravel. 
I'm curious to hear more about this.

> What is it going to take to get it into 5.0? What is off track and how did we 
> get here?
I'm going to crystal-ball a combination of "we're in mythical man-month 
territory" and "we're doing something that's never been done before on a 
code-base that's a decade and a half old. In a distributed system. That takes 
(often unpredictable) time."

I'm +1 to what you've laid out here Mick. The idea of having another branch to 
merge through makes me sad, but it's worth it to both get 5.0 into users' hands 
around our committed cadence + also get an alpha of these new features into 
their hands as well IMO.

On Mon, Oct 23, 2023, at 11:14 AM, Paulo Motta wrote:
> From a user perspective I have to say I was excited to see Accord/TCM being 
> released on 5.0 but at the same time a little nervous about seeing so many 
> overhauling features being shipped on the same release.
> 
> I think rushing last minute features hurts the stability goals we set for the 
> project. As far as I understand, we have agreed to have a "release train" 
> model where everything ready by the release date is shipped and anything else 
> slips to the next version.
> 
> 5.0 will bring a number of exciting innovations and I don't think not 
> including TCM/Accord can be considered a disaster. I think letting the 
> community test the currently shipped features separately from TCM/Accord will 
> be a benefit from a stability perspective without hurting the project 
> momentum.
> 
> I think TCM/Accord are such major and long-awaited improvements to the 
> project that they deserve their own exclusive release; otherwise they could 
> easily overshadow the other exciting features being shipped. I don't see any 
> issue in performing an earlier release next year if TCM/Accord is ready by then.
> 
> Regarding the versioning scheme, if we follow the versioning scheme we have 
> defined "by the book" then TCM/Accord would belong to a 6.0 version, which I 
> have to admit feels a bit weird but it would signal to the user community 
> that a major change is being introduced. I don't feel strongly about this so 
> would be fine with a 5.1 even though it would be a departure from the new 
> versioning scheme we have agreed upon.
> 
> On Mon, Oct 23, 2023 at 10:55 AM Patrick McFadin  wrote:
>> I’m going to be clearer in my statement. 
>> 
>> This has to be in 5.0, even if it’s alpha and ships after December, or this 
>> is going to be a disaster that will take us much longer to unravel. 
>> 
>> On Mon, Oct 23, 2023 at 7:49 AM Jeremiah Jordan  
>> wrote:
>>> +1 from me assuming we have tickets and two committer +1’s on them for 
>>> everything being committed to trunk, and CI is working/passing before it 
>>> merges.  The usual things, but I want to make sure we do not compromise on 
>>> any of them as we try to “move fast” here.
>>> 
>>> -Jeremiah Jordan
>>> 
>>> On Oct 23, 2023 at 8:50:46 AM, Sam Tunnicliffe  wrote:
 
 +1 from me too. 
 
 Regarding Benedict's point, backwards incompatibility should be minimal; 
 we modified snitch behaviour slightly, so that local snitch config only 
 relates to the local node; all peer info is fetched from cluster metadata. 
 There is also a minor change to the way failed bootstraps are handled, as 
 with TCM they require an explicit cancellation step (running a nodetool 
 command). 
 
 Whether consensus decrees that this constitutes a major bump or not, I 
 think decoupling these major projects from 5.0 is the right move. 
  
 
> On 23 Oct 2023, at 12:57, Benedict  wrote:
> 
> 
> I’m cool with this.
> 
> We may have to think about numbering as I think TCM will break some 
> backwards compatibility and we might technically expect the follow-up 
> release to be 6.0
> 
> Maybe it’s not so bad to have such rapid releases either way.
> 
> 
>> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>> 
>> 
>> The TCM work (CEP-21) is in its review stage but being well past our 
>> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would 
>> like to propose the following.
>> 
>> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and 
>> cut an immediate 5.1-alpha1 release.
>> 
>> I see this as a win-win scenario for us, considering our current 
>> situation.  (Though it is unfortunate that Accord is included in this 
>> scenario because we agreed it would be based upon TCM.)
>> 
>> This will mean…
>>  - We get to focus on getting 5.0 to beta and GA, which already has a 
>> ton of features users want.
>>  - We get an alpha release with TCM and Accord into users' hands quickly 
>> for broader testing and feedback.
>>  - We isolate GA efforts on TCM and 

Re: [DISCUSS] CommitLog default disk access mode

2023-10-18 Thread Josh McKenzie
+1 to adding the feature, clear and easy configurability, and if after a major 
cycle we can say with confidence it's beating the status quo in the vast 
majority of general cases, flip default. I mean, logically it *should* be, but 
infra software at the scale we do requires great care. :)

This is great work Amit - well done.

On Mon, Oct 16, 2023, at 4:28 PM, Dinesh Joshi wrote:
> I haven't looked at the patch yet so take whatever I say here with a pinch of 
> salt.
> 
> Philosophically, defaults should not change unless there is a clear 
> demonstrable benefit in majority cases for our users. In this case DirectIO 
> should have clear benefits. That said, this is a new feature and I would 
> personally default it to off. We should document it and allow for our users 
> to enable it. This derisks the project in case there is an inadvertent change 
> in behavior.
> 
> Dinesh
> 
>> On Oct 15, 2023, at 11:34 PM, Pawar, Amit  wrote:
>> 
>> Hi,
>>  
>> CommitLog uses mmap (memory-mapped) segments by default. A Direct-IO feature 
>> is proposed in a new PR [1] to improve CommitLog IO speed. Enabling 
>> it by default could help address the IO bottleneck seen during 
>> peak load.
>>  
>> I'd like your input on changing this default. Please share your thoughts.
>>  
>> https://issues.apache.org/jira/browse/CASSANDRA-18464
>>  
>> thanks,
>> Amit Pawar
>>  
>> [1] - https://github.com/apache/cassandra/pull/2777
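Dinesh and Josh both lean towards shipping the new mode opt-in: clearly configurable and documented, with the current mmap default kept until a major cycle of evidence says otherwise. A minimal Java sketch of that configuration shape; CommitLogAccessMode, resolveAccessMode, and the accepted values are hypothetical names, not the option names used in the PR.

```java
import java.util.Locale;

/** Hypothetical commitlog disk access modes; names are illustrative, not Cassandra's config keys. */
enum CommitLogAccessMode { MMAP, DIRECT }

final class CommitLogConfigSketch {
    /** The status quo stays the default until the new mode has proven itself broadly. */
    static final CommitLogAccessMode DEFAULT_MODE = CommitLogAccessMode.MMAP;

    /**
     * Resolves the configured mode. A missing or blank value keeps the default;
     * an unknown value fails fast instead of silently changing I/O behaviour.
     */
    static CommitLogAccessMode resolveAccessMode(String configured) {
        if (configured == null || configured.isBlank()) return DEFAULT_MODE;
        try {
            return CommitLogAccessMode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown commitlog disk access mode: " + configured, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveAccessMode(null));     // MMAP (direct IO is opt-in only)
        System.out.println(resolveAccessMode("direct")); // DIRECT
    }
}
```

Failing fast on an unrecognised value is a deliberate choice here: a typo in the config should not quietly fall back to a different I/O path than the operator intended.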


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Josh McKenzie
> If some piece of code is not used anymore then simplifying the code is the 
> best thing to do
In the case of unused / unreferenced, sure. In the case of "other things use 
this but we shouldn't add any more dependencies on this because we need to 
remove it", a @Deprecated annotation w/version, reason, etc could be pretty 
useful.

Also - my instinct is that we have a lot of stuff in our ecosystem that depends 
on public methods in the codebase (I assume sidecar, bulk writer / reader, CDC 
clients though I tried to provide a formal API there, etc. etc) and I for one 
would be receptive to discussions on dev@ for the things people in the 
ecosystem have taken dependencies on so we can discuss whether or not to a) 
formally support those, and/or b) wrap an actual API around them so we can 
decouple those signatures from implementation.

Our lack of rigor around what's a public API and what's not combined with our 
historic default posture of "none of it's an API, if you depend on it it's on 
you and we'll break it, also we don't provide many public extension points nor 
do we provide more than the core functionality of the DB in our ecosystem so 
have fun" *may not be* the optimal posture for us in terms of ecosystem 
adoption + long-term maintenance burden. I realize we've done this in the name 
of us being able to be as productive as possible working on the core DB itself, 
but I'm not entirely convinced it's actually the most productive path tbh.

Go slow to go fast, invest to reap returns, etc.
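A rough idea of what the "round #3" tool Stefan proposes in this thread could check, sketched in Java. It swaps the ant-task source parser for reflection (java.lang.Deprecated has runtime retention, so since() and forRemoval() can be read that way), and it assumes a hypothetical removal rule: forRemoval = true plus at least N majors since the deprecation. N = 1 matches the "deprecated in 5.0 means eligible for deletion in 6.0" reading stated elsewhere in this thread; a more conservative policy could use N = 2. DeprecationScanner and ExampleApi are made-up names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: lists methods whose @Deprecated metadata makes them removable in a target release. */
final class DeprecationScanner {
    /** Parses the major version out of strings such as "4.0" or "5.0.1". */
    static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    /**
     * Returns names of methods declared on {@code type} that are marked forRemoval and were
     * deprecated at least {@code majorsBeforeRemoval} majors before {@code targetVersion}.
     * Methods with no "since" are collected in {@code missingSince} for manual triage.
     */
    static List<String> removable(Class<?> type, String targetVersion, int majorsBeforeRemoval,
                                  List<String> missingSince) {
        List<String> result = new ArrayList<>();
        Method[] methods = type.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // reflection order is unspecified
        for (Method m : methods) {
            Deprecated d = m.getAnnotation(Deprecated.class);
            if (d == null) continue;
            if (d.since().isEmpty()) {
                missingSince.add(m.getName());
                continue;
            }
            if (d.forRemoval() && major(d.since()) + majorsBeforeRemoval <= major(targetVersion))
                result.add(m.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = new ArrayList<>();
        System.out.println(removable(ExampleApi.class, "6.0", 1, missing)); // [newer, oldest]
        System.out.println(missing);                                       // [unknownAge]
    }
}

/** Made-up class standing in for real code. */
class ExampleApi {
    @Deprecated(since = "4.0", forRemoval = true) public void oldest() {}
    @Deprecated(since = "5.0", forRemoval = true) public void newer() {}
    @Deprecated(since = "4.0") public void keptForCompatibility() {}
    @Deprecated public void unknownAge() {}
    public void current() {}
}
```

Note that keptForCompatibility is never reported: without forRemoval = true, deprecation alone does not make something removable, in line with the point made in this thread that removal cannot be the default.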

On Fri, Oct 13, 2023, at 9:16 AM, Miklosovic, Stefan via dev wrote:
> I forgot the round #3.
> 
> That would consist of an ant task which would scan the source. Since we 
> enforce at compile time that each @Deprecated annotation has its "since", 
> we can write a parser in that task which would tell you what you have 
> to do in order to be sure that your next release will not contain any stuff 
> which should not be there. E.g. when we release 6.0, all 4.0 stuff can go 
> away, etc.
> 
> 
> From: Miklosovic, Stefan via dev 
> Sent: Friday, October 13, 2023 15:00
> To: dev@cassandra.apache.org
> Cc: Miklosovic, Stefan
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
> 
> OK. So here we are ... round 1 will be to map how bad it is, round 2 will be 
> the removal of what should not be there. I am not sure if round 2 will be 
> done before 5.0 is out (that would be ideal, to release 5.0 without a lot of 
> baggage like that) so it will be better if we split this effort into two 
> parts.
> 
> 
> From: Benjamin Lerer 
> Sent: Friday, October 13, 2023 14:45
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
> 
> Ok, thanks Stefan I understand the context better now. Looking at the PR.
> Some make sense also for serialization reasons but some make no sense to me.
> 
> 
> On Fri, Oct 13, 2023 at 14:26, Benjamin Lerer 
> <b.le...@gmail.com> wrote:
> I’ve been told in the past not to remove public methods in a patch release 
> though.
> 
> Then I am curious to get the rationale behind that. If some piece of code is 
> not used anymore then simplifying the code is the best thing to do. It makes 
> maintenance easier and avoids mistakes.
> On Fri, Oct 13, 2023 at 14:11, Miklosovic, Stefan via dev 
> <dev@cassandra.apache.org> wrote:
> Maybe for better understanding what we talk about, there is the PR which 
> implements the changes suggested here (1)
> 
> It is clear that @Deprecated is not used exclusively on JMX / Configuration 
> but we use it internally as well. This is a very delicate topic and we need 
> to go, basically, one by one.
> 
> I get that there might be some kind of "nervousness" around this, as we 
> strive not to break things unnecessarily, so there might be a lot of 
> exceptions etc. I completely understand that, but what I lack is clear 
> visibility into what we plan to do with it (if anything).
> 
> There is deprecated stuff as old as Cassandra 1.2 / 2.0 (!!!) and it is 
> really questionable whether we shouldn't just get rid of it once and for all. 
> I am OK with keeping it there if we decide that, but we should provide some 
> additional information, like when it was deprecated and why it is necessary 
> to keep it around; otherwise the code-base will bloat and bloat ...
> 
> (1) 
> https://github.com/apache/cassandra/pull/2801/files
> 
> 
> From: Mick Semb Wever <m...@apache.org>
> Sent: 

Re: Avoiding pushes to broken branches

2023-10-10 Thread Josh McKenzie
What about having a nag-bot that notifies #cassandra-dev on ASF slack hourly if 
the builds are broken?

On Tue, Oct 10, 2023, at 8:02 AM, Mick Semb Wever wrote:
> I'd like to suggest some improvements for identifying and announcing
> broken branches and to avoid pushing commits to broken branches.
> 
> For CI we should have two gates.
> 
> The first is pre-commit testing, which we have already discussed, e.g.
> either ci-cassandra or circleci can be used (and repeated tests are
> expected to be run on circleci).
> 
> The second gate is whether the branch the commit is being merged to is
> in a healthy state.  Our canonical CI system is ci-cassandra, and for
> post-commit health it is our only CI.  Last week ci-cassandra was
> broken on all branches from 4.0 up.  There were two causes,
> neither our fault: debian packaging (CASSANDRA-18910) and xerces2 xml
> file processing OOM (cassandra-builds:8d11eea).
> 
> To know whether a branch is broken (before pushing), just check
> ci-cassandra.apache.org.
> 
> Folk have suggested this is not enough, and that a message to the dev@
> would also help, but part of the problem is that there's no one place
> that people check before pushing (there's no requirement on anyone to
> be keeping up to date with dev@ at all times).
> 
> To summarise, I feel we currently don't have good practices for
> - identifying and announcing a broken CI,
> - knowing who is investigating it,
> - labelling it with the cause,
> - knowing who is working on a fix
> 
> 
> The suggested actions I'm proposing for us all to adopt are:
> 
> 1. Before committing please check dev@ and ci-cassandra.a.o
> 2. If you see ci-cassandra is red on a branch, and no dev@ thread has
> been started, please start the dev@ thread and create the ticket,
> 3. Put the ticket id into the description of the first red build
> ("Add description")
> 4. By default, hold off on pushing to broken branches.
> 
> 
> WRT (2), we should just be able to send the automated build failures
> to dev@ instead of builds@, but failed builds are often not sending
> such notifications, so this isn't something we can rely on yet.
> 
> 
> Reference slack threads:
> - https://the-asf.slack.com/archives/CK23JSY2K/p1696693542832489
> - https://the-asf.slack.com/archives/CK23JSY2K/p1696599019480519
> - https://the-asf.slack.com/archives/CK23JSY2K/p1696351208371029
> - https://the-asf.slack.com/archives/CK23JSY2K/p1695878499669699
> 
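Josh's nag-bot and Mick's step 4 rest on the same check: is the latest post-commit build on a branch red? A minimal Java sketch of just that decision logic, assuming the latest result per branch has already been fetched from ci-cassandra somehow. Fetching from Jenkins and posting to Slack are deliberately left out, and BuildStatus, BrokenBranchNag, and the message text are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical status of the latest post-commit build on a branch. */
enum BuildStatus { GREEN, RED, UNKNOWN }

final class BrokenBranchNag {
    /** Mick's step 4: by default, hold off on pushing to a branch whose latest build is red. */
    static boolean holdOffPushing(Map<String, BuildStatus> latest, String branch) {
        return latest.getOrDefault(branch, BuildStatus.UNKNOWN) == BuildStatus.RED;
    }

    /**
     * Builds the nag text listing red branches (sorted for stable output),
     * or returns empty when nothing is red so the bot stays quiet.
     */
    static Optional<String> nagMessage(Map<String, BuildStatus> latest) {
        StringBuilder red = new StringBuilder();
        for (Map.Entry<String, BuildStatus> e : new TreeMap<>(latest).entrySet()) {
            if (e.getValue() == BuildStatus.RED) {
                if (red.length() > 0) red.append(", ");
                red.append(e.getKey());
            }
        }
        return red.length() == 0
                ? Optional.empty()
                : Optional.of("ci-cassandra is red on: " + red + ". Please check dev@ before pushing.");
    }

    public static void main(String[] args) {
        Map<String, BuildStatus> latest = Map.of(
                "cassandra-4.0", BuildStatus.RED,
                "cassandra-5.0", BuildStatus.GREEN,
                "trunk", BuildStatus.RED);
        System.out.println(nagMessage(latest).orElse("all green"));
        System.out.println(holdOffPushing(latest, "cassandra-5.0")); // false
    }
}
```

To run it hourly, the JDK's ScheduledExecutorService.scheduleAtFixedRate(task, 0, 1, TimeUnit.HOURS) is one option. An unknown branch is deliberately not treated as red, so the bot only nags on confirmed failures.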


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-10 Thread Josh McKenzie
Sounds like we're relitigating the basics of how @Deprecated, forRemoval, 
since, and javadoc @link all intersect to make deprecation less painful ;)

So:
 1. Built-in java.lang.Deprecated: required
 2. Can use since and forRemoval if you have that info handy and think it'd be 
useful (would make it a lot easier to grep for things to pull before a major)
 3. If it's being replaced by something, you should {@link #} the javadoc for 
it so people know where to bounce over to
I've been leaning pretty heavily on the functionality of point 3 for 
documenting cross-module implicit dependencies as I come across them lately so 
that one resonates with me.
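Put together, the three points might look like the following. This is a made-up Java class (CompactionSettings and both accessors are hypothetical, not Cassandra code): the built-in annotation carries since and forRemoval, and the javadoc @deprecated tag records the reason and links to the replacement.

```java
/** Hypothetical example class; not taken from the Cassandra codebase. */
final class CompactionSettings {
    private int throughputMiBPerSec = 64;

    /**
     * Returns the compaction throughput.
     *
     * @deprecated since 5.0 because the unit was ambiguous; use {@link #getThroughputMiBPerSec()} instead.
     */
    @Deprecated(since = "5.0", forRemoval = true)
    public int getThroughput() {
        // Delegate so the old and new accessors cannot drift apart.
        return getThroughputMiBPerSec();
    }

    /** Replacement accessor that the deprecated method links to. */
    public int getThroughputMiBPerSec() {
        return throughputMiBPerSec;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Deprecated d = CompactionSettings.class.getMethod("getThroughput").getAnnotation(Deprecated.class);
        System.out.println(d.since() + " forRemoval=" + d.forRemoval()); // 5.0 forRemoval=true
    }
}
```

With since filled in consistently, finding removal candidates before a major can be as simple as grepping for forRemoval = true together with the relevant since value.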

On Tue, Oct 10, 2023, at 4:38 AM, Miklosovic, Stefan wrote:
> OK.
> 
> Let's go with in-built java.lang.Deprecated annotation. If somebody wants to 
> document that in more detail, there are Javadocs as mentioned. Let's just 
> stick with the standard stuff.
> 
> I will try to implement this for 5.0 (versions since it was deprecated) with 
> my take on what should be removed (forRemoval = true) but that should be 
> definitely cross-checked on review as Mick mentioned.
> 
> 
> From: Mick Semb Wever 
> Sent: Monday, October 9, 2023 10:55
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
> 
> Tangential question to this is if everything we deprecated is eligible for 
> removal? In other words, are there any cases when forRemoval would be false? 
> Could you elaborate on that and give such examples or do you all think that 
> everything which is deprecated will be eventually removed?
> 
> 
> Removal cannot be default.  This came up in the subtickets of CASSANDRA-18306.
> 
> I suggest that adding " forRemoval = true" and the later actual removal of 
> the code both require broader consensus.  I'm open to that being on the 
> ticket or needing a thread on the ML.  Small stuff, common sense says on the 
> ticket is enough, but a few folk have already stated that deprecated code 
> that has minimal maintenance overhead should not be removed.
> 


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-06 Thread Josh McKenzie
Might be nice to support a 3rd param that's a String for the reason it's 
deprecated. i.e. "Replaced by X",  "Unmaintained", "Obsolete", "See 
CASSANDRA-N", link to a dev ML thread on pony mail, etc. That way if 
someone comes across it in the codebase they have some context to follow up on 
if it's the shape of a thing they need w/out having to go full-bore w/git blame 
and JQL.
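For reference, java.lang.Deprecated itself only has the since and forRemoval elements, so a free-text reason would have to live in the javadoc or in a separate annotation. A minimal Java sketch of the latter, using a hypothetical project-local @DeprecationReason annotation and a made-up LegacyTool class:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical companion to java.lang.Deprecated that keeps the "why" next to the code. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR})
@interface DeprecationReason {
    /** Free text, e.g. "Replaced by X", "Unmaintained", "Obsolete", or a ticket / ML thread reference. */
    String value();
}

/** Made-up class showing both annotations side by side. */
final class LegacyTool {
    @Deprecated(since = "5.0")
    @DeprecationReason("Unmaintained; see the dev@ discussion before adding new callers")
    public static void run() {}

    public static void main(String[] args) throws NoSuchMethodException {
        DeprecationReason reason = LegacyTool.class.getMethod("run").getAnnotation(DeprecationReason.class);
        System.out.println(reason.value());
    }
}
```

Runtime retention is chosen so that tooling can read the reason alongside since and forRemoval; a javadoc-only reason is simpler but is invisible to such tooling.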

On Fri, Oct 6, 2023, at 4:43 AM, Miklosovic, Stefan wrote:
> Hi list,
> 
> I have a ticket to discuss (1). 
> 
> When we deprecate APIs / methods etc, what I want to suggest is that we might 
> start to explicitly add the version when that happened. For example, if you 
> deprecate something which goes into 5.0, would you be so kind as to do this?
> 
> @Deprecated(since = "5.0") 
> 
> Similarly, that annotation offers one more field - forRemoval, so using it 
> like this: 
> 
> @Deprecated(since = "5.0", forRemoval = true) 
> 
> means that this is eligible to be deleted in Cassandra 6.0. 
> 
> With this information, it is way more comfortable to just "grep" where we are 
> at when it comes to deprecations eligible to be deleted in the next version. 
> Currently, we basically have to go one by one and figure out whether it is old 
> enough to remove. I believe this would bring more transparency into what is 
> planned to be removed and when, and it will also be clearly visible what should 
> and should not be removed in the next version. 
> 
> Tangential question to this is if everything we deprecated is eligible for 
> removal? In other words, are there any cases when forRemoval would be false? 
> Could you elaborate on that and give such examples or do you all think that 
> everything which is deprecated will be eventually removed?
> 
> (1) https://issues.apache.org/jira/browse/CASSANDRA-18912
> 
> Thanks and regards


Re: [VOTE] Accept java-driver

2023-10-03 Thread Josh McKenzie
> I see now this will likely be instead apache/cassandra-java-driver
I was wondering about that. apache/java-driver seemed pretty broad. :)

From the linked page:
Check that all active committers have a signed CLA on record. TODO – attach list
I've been part of these discussions and work so am familiar with the status of 
it (as well as guidance and clearance from the foundation re: folks we couldn't 
reach) - but might be worthwhile to link to the sheet or perhaps instead 
provide a summary of the 49 java contributors, their CLA signing status, 
attempts to reach out, etc for other PMC members that weren't actively involved 
back when we were working through it.

As for my vote: +1

Thanks everyone for the hard work getting to this point. This really is a 
significant contribution to the project.

On Tue, Oct 3, 2023, at 6:48 AM, Brandon Williams wrote:
> +1
> 
> Kind Regards,
> Brandon
> 
> On Mon, Oct 2, 2023 at 11:53 PM Mick Semb Wever  wrote:
> >
> > The donation of the java-driver is ready for its IP Clearance vote.
> > https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
> >
> > The SGA has been sent to the ASF.  This does not require acknowledgement 
> > before the vote.
> >
> > Once the vote passes, and the SGA has been filed by the ASF Secretary, we 
> > will request ASF Infra to move the datastax/java-driver as-is to 
> > apache/java-driver
> >
> > This means all branches and tags, with all their history, will be kept.  A 
> > cleaning effort has already cleaned up anything deemed not needed.
> >
> > Background for the donation is found in CEP-8: 
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> >
> > PMC members, please take note of (and check) the IP Clearance requirements 
> > when voting.
> >
> > The vote will be open for 72 hours (or longer). Votes by PMC members are 
> > considered binding. A vote passes if there are at least three binding +1s 
> > and no -1's.
> >
> > regards,
> > Mick
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Josh McKenzie
> it may be better to support most cloud storage
> It simply only supports S3, which feels a bit customized for a certain user 
> and is not universal enough. Am I right?
I agree w/the eventual goal (and constraint on design now) of supporting most 
popular cloud storage vendors, but if we have someone with an itch to scratch 
and at the end of that we end up with first steps in a compatible direction to 
ultimately supporting decoupled / abstracted storage systems, that's fantastic.

To Jeff's point - so long as we can think about and chart a general path of 
where we want to go, if Claude has the time and inclination to handle 
abstracting out the API in that direction and one implementation, that's 
fantastic IMO.

I know there's some other folks out there who've done some interception / 
refactoring of the FileChannel stuff to support disaggregated storage; curious 
what their experiences were like.
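Claude's description of the goal (pluggable factories that can hand out FileChannels for any storage system) can be sketched roughly as follows in Java. ChannelFactory, LocalChannelFactory, and ChannelFactoryDemo are hypothetical names, not the CEP-36 API, and only the local-disk case is shown. Because the code is written against java.nio.file.Path, and FileChannel.open delegates to the provider of the Path's FileSystem, a java.nio.FileSystem implementation for remote storage (the route Claude mentions) could in principle slot in without changing callers, provided that provider supports FileChannel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical pluggable hook (not the CEP-36 API): map a path to a FileChannel on some store. */
interface ChannelFactory {
    FileChannel openForRead(Path path) throws IOException;

    FileChannel openForWrite(Path path) throws IOException;
}

/** Default implementation: FileChannel.open routes to whatever FileSystem provider owns the Path. */
final class LocalChannelFactory implements ChannelFactory {
    @Override
    public FileChannel openForRead(Path path) throws IOException {
        return FileChannel.open(path, StandardOpenOption.READ);
    }

    @Override
    public FileChannel openForWrite(Path path) throws IOException {
        return FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }
}

final class ChannelFactoryDemo {
    /** Writes text through the factory, reads it back, then deletes the file. */
    static String roundTrip(ChannelFactory factory, Path file, String text) throws IOException {
        ByteBuffer src = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        try (FileChannel out = factory.openForWrite(file)) {
            while (src.hasRemaining()) out.write(src); // write() may be partial, so loop
        }
        try (FileChannel in = factory.openForRead(file)) {
            ByteBuffer dst = ByteBuffer.allocate((int) in.size());
            while (dst.hasRemaining()) {
                if (in.read(dst) < 0) break; // stop at end of file
            }
            dst.flip();
            return StandardCharsets.UTF_8.decode(dst).toString();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("channel-factory-demo");
        ChannelFactory factory = new LocalChannelFactory(); // a real system would pick this from config
        System.out.println(roundTrip(factory, dir.resolve("Data.db"), "hello")); // hello
        Files.deleteIfExists(dir);
    }
}
```

Callers only ever see ChannelFactory, which is the property the CEP is after: swapping the storage backend becomes a configuration decision rather than a code change at every call site.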


On Tue, Sep 26, 2023, at 4:20 AM, Claude Warren, Jr via dev wrote:
> The intention of the CEP is to lay the groundwork to allow development of 
> ChannelProxyFactories that are pluggable in Cassandra.  In this way any 
> storage system can be a candidate for Cassandra storage provided FileChannels 
> can be created for the system. 
> 
> As I stated before I think that there may be a need for a java.nio.FileSystem 
> implementation for  the proxies but I have not had the time to dig into it 
> yet.
> 
> Claude
> 
> 
> On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:
>> In my mind, it may be better to support most cloud storage: AWS, Azure,
>> GCP, Aliyun and so on. We could make it pluggable. But in that case, it
>> seems we may need a filesystem interface layer for object storage. And
>> should we support distributed systems like HDFS, or something else? We
>> should first discuss what should be done and what should not be done. It
>> simply only supports S3, which feels a bit customized for a certain user and
>> is not universal enough. Am I right?
>> 
>> On Tue, Sep 26, 2023 at 14:36, Claude Warren, Jr wrote:
>>> My intention is to develop an S3 storage system using  
>>> https://github.com/carlspring/s3fs-nio 
>>> 
>>> There are several issues yet to be solved:
>>>  1. There are some internal calls that create files in the table directory 
>>> that do not use the channel proxy.  I believe that these are making calls 
>>> on File objects.  I think those File objects are Cassandra File objects not 
>>> Java I/O File objects, but am unsure.
>>>  2. Determine if the carlspring s3fs-nio library will be performant enough 
>>> to work in the long run.  There may be issues with it:
>>>1. Downloading entire files before using them rather than using views 
>>> into larger remotely stored files.
>>>2. Requiring a complete file to upload rather than using the partial 
>>> upload capability of the S3 interface.
>>> 
>>> 
>>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
 "Rather than building this piece by piece, I think it'd be awesome if 
 someone drew up an end-to-end plan to implement tiered storage, so we can 
 make sure we're discussing the whole final state, and not an 
 implementation detail of one part of the final state?"
 
 I agree with Jeff on this. If these features can be supported in OSS 
 Cassandra, I think they will be very popular, whether in a private 
 deployment environment or a public cloud service (our experience bears 
 this out). They are also a cost-cutting option for users.
 
 On Tue, Sep 26, 2023 at 00:11, Jeff Jirsa wrote:
> 
> - I think this is a great step forward. 
> - Being able to move sstables around between tiers of storage is a 
> feature Cassandra desperately needs, especially if one of those tiers is 
> some sort of object storage
> - This looks like it's a foundational piece that enables that. Perhaps by 
> a team that's already implemented this end to end? 
> - Rather than building this piece by piece, I think it'd be awesome if 
> someone drew up an end-to-end plan to implement tiered storage, so we can 
> make sure we're discussing the whole final state, and not an 
> implementation detail of one part of the final state?
> 
> 
> 
> 
> 
> 
> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev 
>  wrote:
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside 
>> of the standard storage space.  
>> 
>> There are two desires driving this change:
>>  1. The ability to temporarily move some keyspaces/tables to storage on
>> another disk, outside the normal directory tree, so that compaction can
>> occur in situations where there is not enough disk space for compaction
>> and processing of the moved data cannot be suspended.
>>  2. The ability to store infrequently used data on slower, cheaper
>> storage layers.
>> I have a working POC implementation [2] though there are 

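CEP-36, as described in the thread above, aims to make ChannelProxy factories pluggable so that any storage system able to produce file channels could hold Cassandra data. The Python sketch below only illustrates the general shape of such a plug-in point, a registry keyed by URI scheme; it is not Cassandra's API. `register_factory`, `open_channel`, `InMemoryObjectStore` and the `mem://` scheme are hypothetical, and the in-memory store merely stands in for something like S3.

```python
import io
from typing import BinaryIO, Callable, Dict
from urllib.parse import urlparse

# A factory receives a location string and an open mode and returns a stream.
ChannelFactory = Callable[[str, str], BinaryIO]

_FACTORIES: Dict[str, ChannelFactory] = {}


def register_factory(scheme: str, factory: ChannelFactory) -> None:
    """Plug in the factory that handles locations with the given URI scheme."""
    _FACTORIES[scheme] = factory


def open_channel(location: str, mode: str = "rb") -> BinaryIO:
    """Dispatch to the registered factory; plain paths map to the "file" scheme."""
    scheme = urlparse(location).scheme or "file"
    factory = _FACTORIES.get(scheme)
    if factory is None:
        raise ValueError(f"no channel factory registered for scheme {scheme!r}")
    return factory(location, mode)


def _local_file_factory(location: str, mode: str) -> BinaryIO:
    """Default factory: ordinary files on local disk."""
    return open(urlparse(location).path, mode)


class _CommitOnClose(io.BytesIO):
    """Buffers all writes and hands the complete payload to a callback on close."""

    def __init__(self, commit: Callable[[bytes], None]) -> None:
        super().__init__()
        self._commit = commit

    def close(self) -> None:
        if not self.closed:
            self._commit(self.getvalue())
        super().close()


class InMemoryObjectStore:
    """Toy stand-in for a remote object store such as S3."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def factory(self, location: str, mode: str) -> BinaryIO:
        parsed = urlparse(location)
        key = parsed.netloc + parsed.path
        if "w" in mode:
            return _CommitOnClose(lambda data: self.objects.__setitem__(key, data))
        return io.BytesIO(self.objects[key])


register_factory("file", _local_file_factory)

if __name__ == "__main__":
    store = InMemoryObjectStore()
    register_factory("mem", store.factory)
    with open_channel("mem://bucket/ks/tbl/data.db", "wb") as ch:
        ch.write(b"sstable bytes")
    with open_channel("mem://bucket/ks/tbl/data.db") as ch:
        print(ch.read())  # b'sstable bytes'
```

Writes are buffered and committed as a whole object on close, which mirrors the full-file upload limitation Claude raises about s3fs-nio; a design aimed at large SSTables would probably want ranged reads and partial (multipart) uploads instead.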
Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-22 Thread Josh McKenzie
> I highly doubt liability works like that in all jurisdictions
That's a fantastic point. When speculating there, I overlooked the fact that 
there are literally dozens of legal jurisdictions in which this project is used 
and the foundation operates.

As a PMC let's take this to legal.

On Fri, Sep 22, 2023, at 9:16 AM, Jeff Jirsa wrote:
> To do that, the cassandra PMC can open a legal JIRA and ask for a (durable, 
> concrete) opinion.
> 
> 
> On Fri, Sep 22, 2023 at 5:59 AM Benedict  wrote:
>> 
>>>>  1. my understanding is that with the former the liability rests on the 
>>>> provider of the lib to ensure it's in compliance with their claims to 
>>>> copyright
>> I highly doubt liability works like that in all jurisdictions, even if it 
>> might in some. I can even think of some historic cases related to Linux 
>> where patent trolls went after users of Linux, though I’m not sure where 
>> that got to and I don’t remember all the details.
>> 
>> But anyway, none of us are lawyers and we shouldn’t be depending on this 
>> kind of analysis. At minimum we should invite legal to proffer an opinion on 
>> whether dependencies are a valid loophole to the policy.
>> 
>> 
>> 
>>> On 22 Sep 2023, at 13:48, J. D. Jordan  wrote:
>>> 
>>> 
>>> This Gen AI generated code use thread should probably be its own mailing 
>>> list DISCUSS thread?  It applies to all source code we take in, and accept 
>>> copyright assignment of, not to jars we depend on and not only to vector 
>>> related code contributions.
>>> 
>>>> On Sep 22, 2023, at 7:29 AM, Josh McKenzie  wrote:
>>>> 
>>>> So if we're going to chat about GenAI on this thread here, 2 things:
>>>>  1. A dependency we pull in != a code contribution (I am not a lawyer but 
>>>> my understanding is that with the former the liability rests on the 
>>>> provider of the lib to ensure it's in compliance with their claims to 
>>>> copyright and it's not sticky). Easier to transition to a different dep if 
>>>> there's something API compatible or similar.
>>>>  2. With code contributions we take in, we take on some exposure in terms 
>>>> of copyright and infringement. git revert can be painful.
>>>> For this thread, here's an excerpt from the ASF policy:
>>>>> a recommended practice when using generative AI tooling is to use tools 
>>>>> with features that identify any included content that is similar to parts 
>>>>> of the tool’s training data, as well as the license of that content.
>>>>> 
>>>>> Given the above, code generated in whole or in part using AI can be 
>>>>> contributed if the contributor ensures that:
>>>>> 
>>>>>  1. The terms and conditions of the generative AI tool do not place any 
>>>>> restrictions on use of the output that would be inconsistent with the 
>>>>> Open Source Definition (e.g., ChatGPT’s terms are inconsistent).
>>>>>  2. At least one of the following conditions is met:
>>>>>1. The output is not copyrightable subject matter (and would not be 
>>>>> even if produced by a human)
>>>>>2. No third party materials are included in the output
>>>>>3. Any third party materials that are included in the output are being 
>>>>> used with permission (e.g., under a compatible open source license) of 
>>>>> the third party copyright holders and in compliance with the applicable 
>>>>> license terms
>>>>>  3. A contributor obtain reasonable certainty that conditions 2.2 or 2.3 
>>>>> are met if the AI tool itself provides sufficient information about 
>>>>> materials that may have been copied, or from code scanning results
>>>>>1. E.g. AWS CodeWhisperer recently added a feature that provides 
>>>>> notice and attribution
>>>>> When providing contributions authored using generative AI tooling, a 
>>>>> recommended practice is for contributors to indicate the tooling used to 
>>>>> create the contribution. This should be included as a token in the source 
>>>>> control commit message, for example including the phrase “Generated-by
>>>>> 
>>>> 
>>>> I think the real challenge right now is ensuring that the output from an 
>>>> LLM doesn't include a string of tokens that's identical to something in 
>>>> its input training dataset if it's trained on non-pe

Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-22 Thread Josh McKenzie
So if we're going to chat about GenAI on this thread here, 2 things:
 1. A dependency we pull in != a code contribution (I am not a lawyer but my 
understanding is that with the former the liability rests on the provider of 
the lib to ensure it's in compliance with their claims to copyright and it's 
not sticky). Easier to transition to a different dep if there's something API 
compatible or similar.
 2. With code contributions we take in, we take on some exposure in terms of 
copyright and infringement. git revert can be painful.
For this thread, here's an excerpt from the ASF policy:
> a recommended practice when using generative AI tooling is to use tools with 
> features that identify any included content that is similar to parts of the 
> tool’s training data, as well as the license of that content.
> 
> Given the above, code generated in whole or in part using AI can be 
> contributed if the contributor ensures that:
> 
>  1. The terms and conditions of the generative AI tool do not place any 
> restrictions on use of the output that would be inconsistent with the Open 
> Source Definition (e.g., ChatGPT’s terms are inconsistent).
>  2. At least one of the following conditions is met:
>1. The output is not copyrightable subject matter (and would not be even 
> if produced by a human)
>2. No third party materials are included in the output
>3. Any third party materials that are included in the output are being 
> used with permission (e.g., under a compatible open source license) of the 
> third party copyright holders and in compliance with the applicable license 
> terms
>  3. A contributor obtain reasonable certainty that conditions 2.2 or 2.3 are 
> met if the AI tool itself provides sufficient information about materials 
> that may have been copied, or from code scanning results
>1. E.g. AWS CodeWhisperer recently added a feature that provides notice 
> and attribution
> When providing contributions authored using generative AI tooling, a 
> recommended practice is for contributors to indicate the tooling used to 
> create the contribution. This should be included as a token in the source 
> control commit message, for example including the phrase “Generated-by
> 

I think the real challenge right now is ensuring that the output from an LLM 
doesn't include a string of tokens that's identical to something in its input 
training dataset if it's trained on non-permissively licensed inputs. That plus 
the risk of, at least in the US, the courts landing on the side of saying that 
not only is the output of generative AI not copyrightable, but that there's 
legal liability on either the users of the tools or the creators of the models 
for some kind of copyright infringement. That can be sticky; if we take PRs 
that end up with that liability exposure, we end up in a place where either the 
foundation could be legally exposed and/or we'd need to revert some pretty 
invasive code / changes.

For example, Microsoft and OpenAI have publicly committed to paying legal fees 
for people sued for copyright infringement for using their tools: 
https://www.verdict.co.uk/microsoft-to-pay-legal-fees-for-customers-sued-while-using-its-ai-products/?cf-view.
 Pretty interesting, and not a step a provider would take in an environment 
where things were legally clear and settled.

So while the usage of these things is apparently incredibly pervasive right 
now, "everybody is doing it" is a pretty high risk legal defense. :)

On Fri, Sep 22, 2023, at 8:04 AM, Mick Semb Wever wrote:
> 
> 
> On Thu, 21 Sept 2023 at 10:41, Benedict  wrote:
>> 
>> At some point we have to discuss this, and here’s as good a place as any. 
>> There’s a great news article published talking about how generative AI was 
>> used to assist in developing the new vector search feature, which is itself 
>> really cool. Unfortunately it *sounds* like it runs afoul of the ASF legal 
>> policy on use for contributions to the project. This proposal is to include 
>> a dependency, but I’m not sure if that avoids the issue, and I’m equally 
>> uncertain how much this issue is isolated to the dependency (or affects it 
>> at all?)
>> 
>> Anyway, this is an annoying discussion we need to have at some point, so 
>> raising it here now so we can figure it out.
>> 
>> [1] 
>> https://thenewstack.io/how-ai-helped-us-add-vector-search-to-cassandra-in-6-weeks/
>>  
>> 
>> [2] https://www.apache.org/legal/generative-tooling.html
>> 
> 
> 
> My reading of the ASF's GenAI policy is that any generated work in the 
> jvector library (and CEP-30?) is not copyrightable, and that makes it OK 
> for us to include.
> 
> If there was a trace to copyrighted work, or the tooling imposed a copyright 
> or restrictions, we would then have to take 

Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-21 Thread Josh McKenzie
Oops; thought I'd already +1'ed earlier in the thread.

In case it wasn't clear: +1 on inclusion as-is.

On Thu, Sep 21, 2023, at 4:00 PM, Josh McKenzie wrote:
> My .02 re: the copyright: the library is licensed ASL v2.0. Who it's 
> originally copyrighted by / to (Jonathan personally, DataStax as a corporate 
> entity, Santa Claus, my dog :)) doesn't really have any impact on the 
> legalities of our ability to make use of it or the durability or safety of 
> the code in our ecosystem.
> 
> Especially for an optional feature with clear alternative implementations, 
> this doesn't bother me at all. It's well within ASF policy to include 
> permissively licensed code copyrighted by other people or entities.
> 
> On Thu, Sep 21, 2023, at 1:02 PM, Mick Semb Wever wrote:
>> 
>>> I am confused by your +1 here. You are +1 on including it, but only if the 
>>> copyright were different?  Given DataStax wrote the library I don’t see how 
>>> that will change?
>>  
>> 
>> No blocker on including the library.  I'm hoping we can address concerns in 
>> parallel, I don't want to hold things up.  (They might become a blocker on 
>> the next release, depending on where discussions go, so we should start 'em.)
> 


Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-21 Thread Josh McKenzie
My .02 re: the copyright: the library is licensed ASL v2.0. Who it's originally 
copyrighted by / to (Jonathan personally, DataStax as a corporate entity, Santa 
Claus, my dog :)) doesn't really have any impact on the legalities of our 
ability to make use of it or the durability or safety of the code in our 
ecosystem.

Especially for an optional feature with clear alternative implementations, this 
doesn't bother me at all. It's well within ASF policy to include permissively 
licensed code copyrighted by other people or entities.

On Thu, Sep 21, 2023, at 1:02 PM, Mick Semb Wever wrote:
> 
>> I am confused by your +1 here. You are +1 on including it, but only if the 
>> copyright were different?  Given DataStax wrote the library I don’t see how 
>> that will change?
>  
> 
> No blocker on including the library.  I'm hoping we can address concerns in 
> parallel, I don't want to hold things up.  (They might become a blocker on 
> the next release, depending on where discussions go, so we should start 'em.)


Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-09-19 Thread Josh McKenzie
I support including this in 5.0.

This looks to me like a significant correctness and stabilization effort, very 
similar to other large bodies of work we merged post-freeze for testing and 
stabilizing 4.0.

On Tue, Sep 19, 2023, at 5:42 PM, Chris Lohfink wrote:
> I absolutely love the idea of this being in 5.0, I am +1 for what it is worth
> 
> On Tue, Sep 19, 2023 at 4:04 PM David Capwell  wrote:
>> To try to get repair more stable, I added optional retry logic (patch is 
>> still in review) to a handful of critical repair verbs.  This patch is 
>> disabled by default but allows you to opt-in to retries so ephemeral issues 
>> don’t cause a repair to fail after running for a long time (assuming they 
>> resolve within the retry window). There are 2 protocol level changes to 
>> enable this: VALIDATION_RSP and SYNC_RSP now send an ACK (if the sender 
>> doesn’t attach a callback, these ACKs get ignored in all versions; see 
>> org.apache.cassandra.net.ResponseVerbHandler#doVerb and Verb.REPAIR_RSP).  
>> Given that we have already forked, I believe we would need to give a waiver 
>> to allow this patch due to this change.
>> 
>> The patch was written on trunk, but I figured backporting to 5.0 would be rather 
>> trivial, and this was brought up during the review, so I'm floating this to a 
>> wider audience.
>> 
>> If you look at the patch you will see that it is very large, but this is 
>> only to make testing of repair coordination easier and deterministic; the 
>> biggest code changes are:
>> 
>> 1) Moving from ActiveRepairService.instance to 
>> ActiveRepairService.instance() (this is the main reason so many files were 
>> touched; this was needed so unit tests don’t load the whole world)
>> 2) Repair no longer reaches into global space and instead is provided the 
>> subsystems needed to perform repair; this change is local to repair code
>> 
>> Both of these changes were only for testing as they allow us to simulate 1k 
>> repairs in around 15 seconds with 100% deterministic execution.
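The opt-in retry behaviour David describes, where VALIDATION_RSP and SYNC_RSP are now acknowledged so the sender can resend until an ACK arrives or a retry window runs out, can be pictured as a generic retry loop. This is a hedged illustration, not the CASSANDRA-18816 patch: `send_with_retries`, its parameters and the exponential backoff policy are assumptions. The clock and sleep functions are injected so the loop runs deterministically under a fake clock, in the spirit of the deterministic repair simulation mentioned above.

```python
import time
from typing import Callable, Optional


def send_with_retries(
    send: Callable[[int], bool],
    max_attempts: int,
    retry_window_s: float,
    base_backoff_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Resend a message until the receiver acknowledges it.

    send(attempt) delivers the message and returns True if an ACK arrived
    within its own timeout. Returns the 1-based attempt that was acknowledged,
    or None once max_attempts or the retry window is exhausted.
    """
    deadline = clock() + retry_window_s
    for attempt in range(1, max_attempts + 1):
        if attempt > 1 and clock() >= deadline:
            break  # retry window used up: give up and fail the repair
        if send(attempt):
            return attempt
        if attempt < max_attempts:
            # Exponential backoff, capped so we never sleep past the deadline.
            backoff = base_backoff_s * 2 ** (attempt - 1)
            sleep(max(0.0, min(backoff, deadline - clock())))
    return None


class FakeClock:
    """Deterministic clock: sleeping just advances simulated time."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    # Ephemeral failure: the first two attempts time out, the third is ACKed.
    result = send_with_retries(lambda attempt: attempt >= 3, max_attempts=5,
                               retry_window_s=10.0, clock=clock.time,
                               sleep=clock.sleep)
    print(result, clock.now)  # 3 0.75
```

Capping each sleep at the remaining window means a persistent failure gives up roughly `retry_window_s` after the first attempt (plus however long the sends themselves take), while a transient one, as in the example, recovers without failing the whole repair.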


Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread Josh McKenzie
> I am strongly in favour of permitting the table definition forbidding nulls - 
> and perhaps even defaulting to this behaviour. But I don’t think we should 
> have types that are inherently incapable of being null.
I'm with Benedict. Seems like this could help prevent whatever "nulls in 
primary key columns" problems Aleksey was alluding to on those tickets back in 
the day that pushed us towards making the new types non-emptiable as well (i.e. 
primary keys are non-null in table definition).

Furthering Alex's question, having a default value for unset fields in any 
non-collection context seems... quite surprising to me in a database. I could 
see the argument for making container / collection types non-nullable, maybe, 
but that just keeps us in a potential straddle case (some types nullable, some 
not).

On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
> 
> If I understand this suggestion correctly it is a whole can of worms, as 
> types that can never be null prevent us ever supporting outer joins that 
> return these types.
> 
> I am strongly in favour of permitting the table definition forbidding nulls - 
> and perhaps even defaulting to this behaviour. But I don’t think we should 
> have types that are inherently incapable of being null. I also certainly 
> don’t think we should have bifurcated our behaviour between types like this.
> 
> 
> 
>> On 19 Sep 2023, at 11:54, Alex Petrov  wrote:
>> 
>> To make sure I understand this right; does that mean there will be a default 
>> value for unset fields? Like 0 for numerical values, and an empty vector (I 
>> presume) for the vector type?
>> 
>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>> Hi everybody,
>>> 
>>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>>> input representing null.
>>> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started making 
>>> types non-emptiable. This approach makes more sense to me as having to 
>>> deal with empty value is error prone in my opinion.
>>> I also think that it would be good to standardize on one approach to avoid 
>>> confusion.
>>> 
>>> Should we make the Vector type non-emptiable and stick to it for the new 
>>> types?
>>> 
>>> I'd like to hear your opinions.
>> 
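To make the two options in this thread concrete, here is a sketch of a decoder for a fixed-dimension float vector that either treats an empty buffer as null (the behaviour Benjamin observed in the new Vector type) or rejects it (the non-emptiable approach used since TINYINT and SMALLINT). It assumes the serialized value is the elements' 4-byte big-endian IEEE 754 floats laid end to end; `decode_float_vector` and its `allow_empty` flag are hypothetical and not Cassandra's code.

```python
import struct
from typing import List, Optional


def decode_float_vector(buf: bytes, dimension: int, allow_empty: bool) -> Optional[List[float]]:
    """Decode a serialized vector<float, dimension>.

    allow_empty=True: an empty buffer decodes to None (null), i.e. an "emptiable" type.
    allow_empty=False: an empty buffer is rejected like any other malformed value.
    """
    if not buf:
        if allow_empty:
            return None
        raise ValueError("empty value is not valid for a non-emptiable vector type")
    expected = 4 * dimension  # assumed layout: 4 bytes per float, no length prefix
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes for {dimension} floats, got {len(buf)}")
    return list(struct.unpack(f">{dimension}f", buf))


if __name__ == "__main__":
    payload = struct.pack(">3f", 1.0, 2.5, -4.0)
    print(decode_float_vector(payload, 3, allow_empty=False))  # [1.0, 2.5, -4.0]
    print(decode_float_vector(b"", 3, allow_empty=True))       # None
```

Under the non-emptiable option, null would still be expressed as an absent value rather than as empty bytes, which is also where a table-level rule forbidding nulls, as Benedict suggests, would apply.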


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
> There's also tests that hardcode
I started mentally twitching when I hit that point in the sentence.

**Kill them with fire.**

On Sun, Aug 13, 2023, at 4:51 PM, Mick Semb Wever wrote:
>> 
>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L717-L719
>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/db/DirectoriesTest.java#L757-L759
>> 
>> Can I open a ticket to track fixes for these and any other issues I run into 
>> while moving to using "build/tmp"?
> 
> 
> Go for it. :-) 
> There's also tests that hardcode other paths that break the use of 
> `build.dir`


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
> I think we want/need relative paths, e.g. "build/tmp", and if the path is in 
> a mounted volume there can be another container still running.
Sure. The specifics of *what* path isn't interesting to me.

The pattern of:
1. Let env declare where TEMP lives
2. Write things to TEMP
3. Delete things from TEMP every time we run a new suite or do "ant clean"

Is.

Could also take it a step further and let env declare RESULTS_PATH for things 
they want to be durable and add an "ant clean-results" target.
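The three-step pattern above (the environment declares a scratch location, tests write there, and each new suite or clean wipes it), plus a separate durable results location, might look like this as a Python sketch. `CASS_BUILD_TMP` is borrowed from the proposal elsewhere in this thread and `RESULTS_PATH` from the message above; the `build/tmp` default follows Mick's suggestion, while `build/results` and all function names are assumptions, not existing build.xml behaviour.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping


def temp_root(env: Mapping[str, str] = os.environ) -> Path:
    """Step 1: the environment declares where scratch files live."""
    return Path(env.get("CASS_BUILD_TMP", "build/tmp"))


def results_root(env: Mapping[str, str] = os.environ) -> Path:
    """Durable outputs (reports, logs worth keeping) live outside the scratch space."""
    return Path(env.get("RESULTS_PATH", "build/results"))


def start_suite(env: Mapping[str, str] = os.environ) -> Path:
    """Step 3: wipe whatever the previous run left in scratch, then recreate it.

    The same function could back an `ant clean`-style target.
    """
    root = temp_root(env)
    if root.exists():
        shutil.rmtree(root)  # errors propagate rather than leaving silent leftovers
    root.mkdir(parents=True)
    return root


def clean_results(env: Mapping[str, str] = os.environ) -> None:
    """The separate, explicit cleanup for durable results (a "clean-results" step)."""
    shutil.rmtree(results_root(env), ignore_errors=True)
```

Step 2 is simply tests writing beneath the path `start_suite()` returns. Because the scratch root comes from the environment, a real version should refuse obviously dangerous values such as `/` or the home directory before calling `rmtree`.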

On Sun, Aug 13, 2023, at 11:33 AM, Derek Chen-Becker wrote:
> Never mind, I found "tmp.dir"
> 
> On Sun, Aug 13, 2023 at 9:29 AM Derek Chen-Becker  
> wrote:
>> Cool,
>> 
>> I'm a little confused. Is "tmp.dir" a custom Java property that we expose? I 
>> thought that the standard property was "java.io.tmpdir". Let me take a stab 
>> at setting tmp.dir to build/tmp and see if I run into any issues (or still 
>> see any files in /tmp).
>> 
>> Cheers,
>> 
>> Derek
>> 
>> On Sun, Aug 13, 2023 at 8:24 AM Mick Semb Wever  wrote:
>>> 
 While doing some local testing, I noticed that my /tmp drive completely 
 filled with test artifact files (e.g. data directories, logs, commit logs, 
 etc). Mick pointed out that we do attempt to do some "find" based cleanup 
 in CI 
 (https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/cassandra_job_dsl_seed.groovy#L437-439),
  but I was wondering if it might be better to do the following for direct 
 ant builds:
 
 1. If TMPDIR is set, use it. It does not appear to be honored, currently, 
 so I need to do some analysis of what would need to be done here
 2. If TMPDIR is not set, use "mktemp" to create a temp directory and set 
 TMPDIR with that directory
 3. Update the "ant clean" task to delete TMPDIR when we've generated it, 
 or attempt the find-based cleanup if TMPDIR was provided
 
 Does anyone know if there are any hard-coded assumptions that test files 
 will live directly under /tmp?
>>> 
>>> 
>>> This will need testing with in-tree scripts, ci-cassandra, and circleci  :(
>>>  
>>> What comes to mind:
>>>  - TMPDIR works best today with the python and scripting stuff
>>>  - setting TMPDIR can break tests, hence the unit test script instead sets 
>>> $TMP_DIR, which is passed to `-Dtmp.dir=…`
>>>  - /tmp is often set up to be a more appropriate fs (and volume size)
>>>  - it is hard to customise everything
>>>  - it needs to work locally on your machine as well as in docker 
>>> containers, as well as CI
>>> 
>>> If we want something that is wiped by `ant clean` I would suggest using the 
>>> build/tmp directory by default.
>>> In-tree scripts do this for unit tests: 
>>> https://github.com/apache/cassandra/blob/trunk/.build/run-tests.sh#L160
>>>  but are not yet doing it for the dtests: 
>>> https://github.com/apache/cassandra/blob/trunk/.build/run-python-dtests.sh#L58
>>>  
>>> 
>>> So I don't think we need (3). If the caller has specified TMPDIR it is then 
>>> their responsibility to clean it.
>>> 
>>> We can probably avoid trying to set TMPDIR, instead defaulting the 
>>> `tmp.dir` property to  the build/tmp directory.
>>> 
>>> The goal of any changes in build.xml should be, in addition to providing 
>>> the best dev exp, to simplify the testing and CI layers above it.
>>> 
>> 
>> 
>> --
>> +---+
>> | Derek Chen-Becker |
>> | GPG Key available at https://keybase.io/dchenbecker and   |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---+
>> 
> 
> 
> 


Re: [Discuss] cleaning up build temp files

2023-08-13 Thread Josh McKenzie
Why not use "/${CASS_BUILD_TMP}/cassandra." on a given run and then on 
subsequent runs "rm -rf /${CASS_BUILD_TMP}/cassandra.*"? If CASS_BUILD_TMP is 
not defined, default to /tmp.

"ant clean" can also wipe it.

If it's a safe assumption that we only ever need 1 instance of data in that 
space (i.e. we won't have 2 builds / tests running in a single container 
concurrently) it seems the above would solve the problem. Different 
environments (circle, ASF, etc) could define CASS_BUILD_TMP differently if 
needed for their env and problem is solved.
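The per-run variant proposed above can be sketched in Python as well. The message leaves the suffix after `cassandra.` unspecified, so the random `mktemp`-style suffix here is an assumption, as is the function name `fresh_run_dir`; like the proposal, it falls back to /tmp and relies on only one run using the base directory at a time.

```python
import glob
import os
import shutil
import tempfile
from typing import Mapping


def fresh_run_dir(env: Mapping[str, str] = os.environ) -> str:
    """Delete earlier runs' cassandra.* entries, then create a directory for this run.

    Assumes a single build/test run per base directory at a time, as the
    proposal does; two concurrent runs would delete each other's data.
    """
    base = env.get("CASS_BUILD_TMP") or "/tmp"
    # Roughly: rm -rf ${CASS_BUILD_TMP}/cassandra.*
    for stale in glob.glob(os.path.join(base, "cassandra.*")):
        if os.path.isdir(stale) and not os.path.islink(stale):
            shutil.rmtree(stale)
        else:
            os.remove(stale)
    # Roughly: mktemp -d ${CASS_BUILD_TMP}/cassandra.XXXXXX
    return tempfile.mkdtemp(prefix="cassandra.", dir=base)
```

Called without CASS_BUILD_TMP set, it would clear every /tmp/cassandra.* entry, so CI environments that share /tmp between jobs would want to set the variable to something job-specific.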

On Sun, Aug 13, 2023, at 10:23 AM, Mick Semb Wever wrote:
> 
>> While doing some local testing, I noticed that my /tmp drive completely 
>> filled with test artifact files (e.g. data directories, logs, commit logs, 
>> etc). Mick pointed out that we do attempt to do some "find" based cleanup in 
>> CI 
>> (https://github.com/apache/cassandra-builds/blob/trunk/jenkins-dsl/cassandra_job_dsl_seed.groovy#L437-439),
>>  but I was wondering if it might be better to do the following for direct 
>> ant builds:
>> 
>> 1. If TMPDIR is set, use it. It does not appear to be honored, currently, so 
>> I need to do some analysis of what would need to be done here
>> 2. If TMPDIR is not set, use "mktemp" to create a temp directory and set 
>> TMPDIR with that directory
>> 3. Update the "ant clean" task to delete TMPDIR when we've generated it, or 
>> attempt the find-based cleanup if TMPDIR was provided
>> 
>> Does anyone know if there are any hard-coded assumptions that test files 
>> will live directly under /tmp?
> 
> 
> This will need testing with in-tree scripts, ci-cassandra, and circleci  :(
>  
> What comes to mind:
>  - TMPDIR works best today with the python and scripting stuff
>  - setting TMPDIR can break tests, hence the unit test script instead sets 
> $TMP_DIR, which is passed to `-Dtmp.dir=…`
>  - /tmp is often set up to be a more appropriate fs (and volume size)
>  - it is hard to customise everything
>  - it needs to work locally on your machine as well as in docker containers, 
> as well as CI
> 
> If we want something that is wiped by `ant clean` I would suggest using the 
> build/tmp directory by default.
> In-tree scripts do this for unit tests: 
> https://github.com/apache/cassandra/blob/trunk/.build/run-tests.sh#L160
>  but are not yet doing it for the dtests: 
> https://github.com/apache/cassandra/blob/trunk/.build/run-python-dtests.sh#L58
>  
> 
> So I don't think we need (3). If the caller has specified TMPDIR it is then 
> their responsibility to clean it.
> 
> We can probably avoid trying to set TMPDIR, instead defaulting the `tmp.dir` 
> property to  the build/tmp directory.
> 
> The goal of any changes in build.xml should be, in addition to providing the 
> best dev exp, to simplify the testing and CI layers above it.


Re: Tokenization and SAI query syntax

2023-08-07 Thread Josh McKenzie
 2023 at 00:23, Jon Haddad  
>>>>> wrote:
>>>>>> Assuming SAI is a superset of SASI, and we were to set up something so 
>>>>>> that SASI indexes auto convert to SAI, this gives even more weight to my 
>>>>>> point regarding how differing behavior for the same syntax can lead to 
>>>>>> issues.  Imo the best case scenario results in the user not even 
>>>>>> noticing their indexes have changed.
>>>>>> 
>>>>>> A (maybe better?) alternative is to add a flag to the index 
>>>>>> configuration for "compatibility mode", which might address the concerns 
>>>>>> around using an equality operator when it actually is a partial match.
>>>>>> 
>>>>>> For what it's worth, I'm in agreement that = should mean full equality 
>>>>>> and not token match.
>>>>>> 
>>>>>> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
>>>>>> > For what it's worth, I'd very much like to completely remove SASI from 
>>>>>> > the
>>>>>> > codebase for 6.0. The only remaining functionality gaps at the moment 
>>>>>> > are
>>>>>> > LIKE (prefix/suffix) queries and its limited tokenization
>>>>>> > capabilities, both of which already have SAI Phase 2 Jiras.
>>>>>> >
>>>>>> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
>>>>>> > wrote:
>>>>>> >
>>>>>> > > SASI just uses “=” for the tokenized equality matching, which is the 
>>>>>> > > exact
>>>>>> > > thing this discussion is about changing/not liking.
>>>>>> > >
>>>>>> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
>>>>>> > > > 
>>>>>> > > wrote:
>>>>>> > > >
>>>>>> > > > I do not think LIKE actually applies here. LIKE is used for 
>>>>>> > > > prefix,
>>>>>> > > contains, or suffix searches in SASI depending on the index type.
>>>>>> > > >
>>>>>> > > > This is about exact matching of tokens.
>>>>>> > > >
>>>>>> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
>>>>>> > > >> 
>>>>>> > > wrote:
>>>>>> > > >>
>>>>>> > > >> Certain bits of functionality also already exist on the SASI 
>>>>>> > > >> side of
>>>>>> > > things, but I'm not sure how much overlap there is.  Currently, 
>>>>>> > > there's a
>>>>>> > > LIKE keyword that handles token matching, although it seems to have 
>>>>>> > > some
>>>>>> > > differences from the feature set in SAI.
>>>>>> > > >>
>>>>>> > > >> That said, there seems to be enough of an overlap that it would 
>>>>>> > > >> make
>>>>>> > > sense to consider using LIKE in the same manner, doesn't it?  I 
>>>>>> > > think it
>>>>>> > > would be a little odd if we have different syntax for different 
>>>>>> > > indexes.
>>>>>> > > >>
>>>>>> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
>>>>>> > > >>
>>>>>> > > >> I think one complication here is that there seems to be a desire, 
>>>>>> > > >> that
>>>>>> > > I very much agree with, to expose as much of the underlying 
>>>>>> > > flexibility of
>>>>>> > > Lucene as possible.  If it means we use Caleb's suggestion, 
>>>>>> > > I'd ask
>>>>>> > > that the queries that SASI and SAI both support use the same syntax, 
>>>>>> > > even
>>>>>> > > if it means there's two ways of writing the same query.  To use 
>>>>>> > > Caleb's
>>>>>> > > example, this would mean supporting both LIKE and the `expr` column.
>>>>>> > > >>
>>>>>> > > >> Jon

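The core disagreement in this thread is whether `=` on an analyzed column means full-value equality or a match on analyzed tokens. A toy Python sketch shows how the same predicate gives different answers under each reading. The analyzer here (lowercase, split on non-alphanumerics) is an assumption for illustration; SAI and SASI analyzers are configurable and behave differently, and none of these functions are Cassandra code.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def full_equality(stored: str, query: str) -> bool:
    """'=' read as: the whole stored value equals the query value."""
    return stored == query


def token_match(stored: str, query: str) -> bool:
    """'=' read as: every analyzed query token occurs in the analyzed stored value."""
    query_tokens = analyze(query)
    stored_tokens = set(analyze(stored))
    return bool(query_tokens) and all(tok in stored_tokens for tok in query_tokens)


if __name__ == "__main__":
    stored = "The Quick Brown Fox"
    print(full_equality(stored, "quick fox"), token_match(stored, "quick fox"))  # False True
```

The same query string matches under one reading and not the other, which is why reusing `=` for token matching risks surprising users, especially if SASI indexes were ever converted to SAI behind the scenes.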
Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-08-07 Thread Josh McKenzie
Merge path for bugs on 3.0 is pretty brutal at this point. Good thing 2 will 
drop off when we GA 5.0.

Updated wiki w/new branches plus some examples: link 


On Mon, Aug 7, 2023, at 11:18 AM, Mick Semb Wever wrote:
> 
> Forward merging cassandra-4.1 … cassandra-5.0 … trunk is now required ! 
> 
> trunk still has 5.0 in the build.xml, but that's only temporary until 
> 18705 lands, and of no harm i believe… (i could easily be wrong, but not 
> AFAIK)
> 
> 
> On Mon, 7 Aug 2023 at 13:38, Brandon Williams  wrote:
>> Is this intended to be used now and change the merge order?  I ask
>> because 18705 mentions bumping build.xml and CHANGES.txt amongst
>> others that haven't been done which is leading to confusion.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Sat, Aug 5, 2023 at 4:46 PM Mick Semb Wever  wrote:
>> >
>> >
>> > With no objections, and everything folk mentioned above in, the 
>> > cassandra-5.0 branch is cut.
>> >
>> > Next steps are bumping trunk to 5.1 and then cutting a 5.0-alpha1
>> >
>> > The bumping to 5.1 has a few steps involved in it, but the initial in-tree 
>> > PRs are ready for review, with CI being run, see CASSANDRA-18705
>> >
>> >
>> >
>> > On Sat, 29 Jul 2023 at 00:00, Brandon Williams  wrote:
>> >>
>> >> +1 to everything stated here.
>> >>
>> >> Kind Regards,
>> >> Brandon
>> >>
>> >> On Wed, Jul 26, 2023 at 5:28 PM Mick Semb Wever  wrote:
>> >> >
>> >> >
>> >> > The previous thread¹ on when to freeze 5.0 landed on freezing the first 
>> >> > week of August, with a waiver in place for TCM and Accord to land later 
>> >> > (but before October).
>> >> >
>> >> > With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 
>> >> > work that hasn't landed is Vector search (CEP-30).
>> >> >
>> >> > Are there any objections to a waiver on Vector search?  All the 
>> >> > groundwork: SAI and the vector type; has been merged, with all 
>> >> > remaining work expected to land in August.
>> >> >
>> >> > I'm keen to freeze and see us shift gears – there's already SO MUCH in 
>> >> > 5.0 and a long list of flakies.  It takes time and patience to triage 
>> >> > and identify the bugs that hit us before GA.  The freeze is about being 
>> >> > "mostly feature complete",  so we have room for things before our first 
>> >> > beta (precedent is to ask).   If we hope for a GA by December, account 
>> >> > for the 6 weeks turnaround time for cutting and voting on one alpha, 
>> >> > one beta, and one rc release, and the quiet period that August is, we 
>> >> > really only have September and October left.
>> >> >
>> >> > I already feel this is asking a bit of a miracle from us given how 4.1 
>> >> > went (and I'm hoping I will be proven wrong).
>> >> >
>> >> > In addition, are there any objections to cutting a 5.0-alpha1 release 
>> >> > as soon as we freeze?
>> >> >
>> >> > This is on the understanding vector, tcm and accord will become 
>> >> > available in later alphas.  Originally the discussion¹ was waiting for 
>> >> > Accord for alpha1, but a number of folk off-list have requested earlier 
>> >> > alphas to help with testing.
>> >> >
>> >> >
>> >> > ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Josh McKenzie
> >>
>> >> On Thu, 3 Aug 2023 at 17:11, Jeremiah Jordan  
>> >> wrote:
>> >> >
>> >> > I don’t think anyone wants to remove the javadocs.  This thread is 
>> >> > about removing the broken ant task which generates html files from them.
>> >> >
>> >> > +1 from me on removing the ant task.  If someone feels the task is 
>> >> > useful they can always implement one that does not crash and add it 
>> >> > back.
>> >> >
>> >> > -Jeremiah
>> >> >
>> >> > On Aug 3, 2023 at 9:59:55 AM, "Claude Warren, Jr via dev" 
>> >> >  wrote:
>> >> >>
>> >> >> I think that we can get more developers interested if there are 
>> >> >> available javadocs.  While many of the core classes are not going to 
>> >> >> be touched by someone just starting, being able to understand what the 
>> >> >> external touch points are and how they interact with other bits of the 
>> >> >> system can be invaluable, particularly when you don't have the entire 
>> >> >> code base in front of you.
>> >> >>
>> >> >> For example, I just wrote a tool that explores the distribution of 
>> >> >> keys across multiple sstables; I needed some of the tools classes but 
>> >> >> not much more.  Javadocs would have made that easy if I did not have 
>> >> >> the source code in front of me.
>> >> >>
>> >> >> I am -1 on removing the javadocs.
>> >> >>
>> >> >> On Thu, Aug 3, 2023 at 4:35 AM Josh McKenzie  
>> >> >> wrote:
>> >> >>>
>> >> >>> If anything, the codebase could use a little more 
>> >> >>> package/class/method markup in some places
>> >> >>>
>> >> >>> I am impressed with how diplomatic and generous you're being here 
>> >> >>> Derek. :D
>> >> >>>
>> >> >>> On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
>> >> >>>
>> >> >>> That is a good idea. I would like to have Javadocs valid when going 
>> >> >>> through them in the IDE. To enforce it, we would have to fix it first. If 
>> >> >>> we find a way to validate Javadocs without actually rendering 
>> >> >>> them, that would be cool.
>> >> >>>
>> >> >>> There is a lot of legacy, and rewriting the custom-crafted formatting 
>> >> >>> of some comments might be quite a tedious task if they are required 
>> >> >>> to be valid. I am in general for valid documentation and even 
>> >> >>> enforcing it, but what to do with what is already there ...
>> >> >>>
>> >> >>> 
>> >> >>> From: Jacek Lewandowski 
>> >> >>> Sent: Wednesday, August 2, 2023 23:38
>> >> >>> To: dev@cassandra.apache.org
>> >> >>> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
>> >> >>>
>> >> >>> With or without outputting JavaDoc to HTML, there are some errors 
>> >> >>> which we should maybe fix. We want to keep the documentation, but 
>> >> >>> there can be syntax errors which may prevent the IDE from generating a proper 
>> >> >>> preview. So, the question is - should we validate the JavaDoc 
>> >> >>> comments as a precommit task? Can it be done without actually 
>> >> >>> generating HTML output?
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Jacek
>> >> >>>
>> >> >>> On Wed, Aug 2, 2023, 22:24 Derek Chen-Becker wrote:
>> >> >>> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the 
>> >> >>> tool and/or its output (not th

Re: [DISCUSS] Creating a 5.0 landing page

2023-08-03 Thread Josh McKenzie
We actually already have an events page: 
https://cassandra.apache.org/_/events.html. Not sure if you were saying we 
should add one, Ekaterina, or that we should add this content there. +1 to the 
content there and having a landing page that points there + integrating 
meetups, town halls, etc.

Community -> Events on the menu up top in case someone missed it.

On Thu, Aug 3, 2023, at 4:21 PM, Ekaterina Dimitrova wrote:
> 
> Hi Hugh,
> 
> Thank you for reaching out. I think this is a great idea. Also, great timing, 
> considering the community is discussing a potential 5.0 alpha release soon. 
> 
> It seems to me you actually suggest more than one page?
> 1) 5.0 and new features - could this be an update of the What’s new page? - 
> https://cassandra.apache.org/doc/trunk/cassandra/new/index.html
> Adding also links to some of the talks sounds great to me.
> 2) Dedicated events page? We were using the Blogs page before but I don’t 
> think it is a bad idea to split Blog posts from Events page.
> 
> Thank you
> Ekaterina
> 
> 
> On Wed, 2 Aug 2023 at 21:03, Hugh Lashbrooke  wrote:
>> With the upcoming release of Apache Cassandra 5.0, I’d like to create a 
>> landing page for the release and discuss what that could look like.
>> 
>> The landing page would be intended to educate users about what is coming up 
>> in this important release, highlighting why upgrading will be valuable to 
>> them, as well as guiding them into more community activities, such as Town 
>> Halls, where they can learn more and become further involved.
>> 
>> The 5.0 landing page could include:
>>  • An overview of the release with a brief summary of the major features
>>  • A page for each CEP that is likely to be included, with key features, 
>> implementation information, and other technical details. These pages can 
>> also include recordings of relevant Contributor Meetings. Here is an example 
>> for CEP-28 - Spark Bulk Analytics Library.
>>  • CTAs to community platforms and activities - Slack, Meetups, Town Halls, 
>> Contributor Meetings, etc.
>> Let’s discuss! Does this sound valuable? If so, I will create a Jira ticket 
>> and am happy to get started. What other things do you think should be 
>> included in a page like this?
>> 


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-02 Thread Josh McKenzie
> If anything, the codebase could use a little more package/class/method markup 
> in some places
I am impressed with how diplomatic and generous you're being here Derek. :D

On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
> That is a good idea. I would like to have Javadocs valid when going through 
> them in the IDE. To enforce it, we would have to fix it first. If we find a way 
> to validate Javadocs without actually rendering them, that would be cool.
> 
> There is a lot of legacy, and rewriting the custom-crafted formatting of some 
> comments might be quite a tedious task if they are required to be valid. I am 
> in general for valid documentation and even enforcing it, but what to do with 
> what is already there ...
> 
> 
> From: Jacek Lewandowski 
> Sent: Wednesday, August 2, 2023 23:38
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
> 
> With or without outputting JavaDoc to HTML, there are some errors which we 
> should maybe fix. We want to keep the documentation, but there can be syntax 
> errors which may prevent the IDE from generating a proper preview. So, the question is 
> - should we validate the JavaDoc comments as a precommit task? Can it be done 
> without actually generating HTML output?
> 
> Thanks,
> Jacek
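On Jacek's question of whether Javadoc can be checked without generating HTML: javac's doclint (`-Xdoclint`) checks comment syntax and references during ordinary compilation, so a build could pass that flag to its javac step instead of running the javadoc tool. The sketch below shows that behavior through the standard `javax.tools` API. It assumes a JDK 11+ (not a bare JRE), and `DoclintCheck` and `Sample` are hypothetical names for illustration, not existing Cassandra code or a proposed ant target.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Compiles a single source string with doclint enabled and counts Javadoc errors. */
final class DoclintCheck {

    /** In-memory source file, so nothing but the class output touches disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns how many ERROR diagnostics javac reports for {@code code} under -Xdoclint:all. */
    static int countDocErrors(String className, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when running on a JRE
        if (javac == null) {
            throw new IllegalStateException("a JDK with javac is required");
        }
        Path classesDir;
        try {
            classesDir = Files.createTempDirectory("doclint-classes"); // .class files only, no HTML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options = List.of("-Xdoclint:all", "-d", classesDir.toString());
        javac.getTask(null, null, diagnostics, options, null,
                List.of(new StringSource(className, code))).call();
        int errors = 0;
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors++;
            }
        }
        return errors;
    }
}
```

Counting only ERROR diagnostics leaves "missing comment" findings as warnings, which may matter for a legacy codebase where enforcing complete coverage up front is unrealistic.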
> 
> On Wed, Aug 2, 2023, 22:24 Derek Chen-Becker wrote:
> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the tool 
> and/or its output (not the markup itself) :P If anything, the codebase could 
> use a little more package/class/method markup in some places, so I'm 
> definitely only in favor of getting rid of the ant task. I should amend my 
> statement to be "...I suspect most people are not opening their browsers and 
> looking at Javadoc..." :)
> 
> Cheers,
> 
> Derek
> 
> 
> 
> On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie wrote:
> most people are not looking at Javadoc when working on the codebase.
> I definitely use it extensively inside the IDE. But never as a compiled set 
> of external docs.
> 
> Which is to say, I'm +1 on removing the target and I'd ask everyone to keep 
> javadoccing your classes and methods where things are non-obvious or there's 
> a logical coupling with something else in the system. :)
> 
> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
> +1. If a need comes up for Javadoc we can fix it at that point, but I suspect 
> most people are not looking at Javadoc when working on the codebase.
> 
> Cheers,
> 
> Derek
> 
> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams wrote:
> I don't think even if it works anyone is going to use the output, so
> I'm good with removal.
> 
> Kind Regards,
> Brandon
> 
> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova wrote:
> >
> > Hi everyone,
> > We were looking into a user report around our ant javadoc task recently.
> > That made us realize it is not run in CI; it finishes successfully even if 
> > there are hundreds of errors, some potentially breaking doc pages.
> >
> > There was a ticket discussion where a few community members mentioned that 
> > this task was probably unnecessary. Can we remove it, or shall we fix it?
> >
> > Best regards,
> > Ekaterina
> 
> 
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 
> 
> 


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-02 Thread Josh McKenzie
> most people are not looking at Javadoc when working on the codebase.
I definitely use it extensively **inside the IDE**. But never as a compiled set 
of external docs.

Which is to say, I'm +1 on removing the target and I'd ask everyone to keep 
javadoccing your classes and methods where things are non-obvious or there's a 
logical coupling with something else in the system. :)

On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
> +1. If a need comes up for Javadoc we can fix it at that point, but I suspect 
> most people are not looking at Javadoc when working on the codebase.
> 
> Cheers,
> 
> Derek
> 
> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams  wrote:
>> I don't think even if it works anyone is going to use the output, so
>> I'm good with removal.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova
>>  wrote:
>> >
>> > Hi everyone,
>> > We were looking into a user report around our ant javadoc task recently.
>> > That made us realize it is not run in CI; it finishes successfully even if 
>> > there are hundreds of errors, some potentially breaking doc pages.
>> >
>> > There was a ticket discussion where a few community members mentioned that 
>> > this task was probably unnecessary. Can we remove it, or shall we fix it?
>> >
>> > Best regards,
>> > Ekaterina
> 
> 
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-27 Thread Josh McKenzie
+1 to what you've stated here Mick with a question: where did we land on 
flagging new features as experimental? Seems like it's an "at author's 
discretion" - search of the list turned up not too much structure there. Had a 
statement to that effect from Benjamin here.

I ask because I think there's *a lot* in 5.0 and it's not clear to me that the 
various features have been exercised in comparable ways to gain confidence in 
their prod readiness. For 5.0 we have (or will likely have):
 1. Trie Memtables
 2. Trie Indexed Tables
 3. UCS
 4. SAI
 5. Vector Search
 6. Transactional Metadata
 7. Accord
While 6 (TrM) can't fall under that umbrella for architectural reasons, for the 
rest the label might be useful.

Flip side, of course, is the argument that the features should be exercised 
heavily enough (unit, dtest, harry, simulator, in qa env, etc) before merge 
that nothing merges in that's not prod ready; don't know if that's true or not 
for all the features above (genuinely don't know; not looking to spread FUD), 
and it's not immediately obvious to me that that's the optimal way for us to 
balance innovation vs. stabilization.

Basically I think it'd be nice if "Experimental" fostered more safe innovation 
and expansion in our ecosystem rather than being something of a pariah dumping 
ground to retroactively apply to things that have proven unstable or not 
completed. :) Plus there'd be value in signaling that to users.

Hm. I think that implies feature categories of "alpha (experimental, may be 
removed (see MVs))", "beta (API stable and expected to be prod-hardened)", and 
then prod ready. Bit of a Pandora's box now that I type this out; sorry.

On Wed, Jul 26, 2023, at 6:38 PM, J. D. Jordan wrote:
> 
> I think this plan seems reasonable to me. +1
> 
> -Jeremiah
> 
>> On Jul 26, 2023, at 5:28 PM, Mick Semb Wever  wrote:
>> 
>> 
>> The previous thread¹ on when to freeze 5.0 landed on freezing the first week 
>> of August, with a waiver in place for TCM and Accord to land later (but 
>> before October).
>> 
>> With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work 
>> that hasn't landed is Vector search (CEP-30).  
>> 
>> Are there any objections to a waiver on Vector search?  All the groundwork: 
>> SAI and the vector type; has been merged, with all remaining work expected 
>> to land in August.
>> 
>> I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 
>> and a long list of flakies.  It takes time and patience to triage and 
>> identify the bugs that hit us before GA.  The freeze is about being "mostly 
>> feature complete",  so we have room for things before our first beta 
>> (precedent is to ask).   If we hope for a GA by December, account for the 6 
>> weeks turnaround time for cutting and voting on one alpha, one beta, and one 
>> rc release, and the quiet period that August is, we really only have 
>> September and October left.  
>> 
>> I already feel this is asking a bit of a miracle from us given how 4.1 went 
>> (and I'm hoping I will be proven wrong). 
>> 
>> In addition, are there any objections to cutting a 5.0-alpha1 release as 
>> soon as we freeze?  
>> 
>> This is on the understanding vector, tcm and accord will become available in 
>> later alphas.  Originally the discussion¹ was waiting for Accord for alpha1, 
>> but a number of folk off-list have requested earlier alphas to help with 
>> testing.
>> 
>> 
>> ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-07-27 Thread Josh McKenzie
+1 to the change pre 5.0.

Any committers have bandwidth to review 
https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-14667?

PR can be found here: https://github.com/apache/cassandra/pull/2238/files

On Thu, Jul 27, 2023, at 7:59 AM, Maxim Muzafarov wrote:
> Bump this topic up for visibility as the code freeze is coming soon.
> 
> This seems like a good change to include in 5.0 as this kind of
> library upgrade is more natural when the major version changes. It is
> still possible to postpone it to 6.0, but the main concern here is
> that the current version of the Dropwizard Metrics library is obsolete and
> no longer supported and it is better to avoid emergencies that could
> arise (like the panic with log4j library upgrade some time ago).
> 
> The change itself is straightforward and deserves more eyes on it from
> my point of view.
> 
> On Fri, 21 Jul 2023 at 14:51, Maxim Muzafarov  wrote:
> >
> > Hello everyone,
> >
> > It still needs a pair of eyes to push it forward.
> >
> >
> > I came across another good thing that might help us to overcome the
> > difficulties with the dropwizard metrics dependency upgrade. The
> > change relates to the driver itself and reuses the same approach that
> > was used to deal with the driver's netty dependencies. We need to
> > shade the dropwizard metrics classes and no longer rely on the
> > cassandra classpath at least for the 3.x version of the java driver,
> > and make the next 3.11.4 release of the java driver accordingly.
> >
> > The changes for the driver are here:
> > https://github.com/datastax/java-driver/pull/1685
> >
> > This will give us (and users as well) the confidence to move forward
> > with this change to 5.x while still using the 3.11 version of the
> > driver. Looking forward to your thoughts.
> >
> > Changes for the Cassandra part are here:
> > https://github.com/apache/cassandra/pull/2238/files
> >
> > On Mon, 3 Jul 2023 at 15:15, Maxim Muzafarov  wrote:
> > >
> > > I'd like to mention the approach we took here: to untangle the driver
> > > update in tests with the dropwizard library version (cassandra-driver
> > > 3.11 requires the "old" JMXReporter classes in the classpath) we have
> > > copied the classes into the tests themselves, as it is allowed by the
> > > Apache License 2.0. This way we can update the metrics library itself
> > > and then update the driver used in the tests afterwards.
> > >
> > > If there are no objections, we need another committer to take a look
> > > at these changes:
> > > https://issues.apache.org/jira/browse/CASSANDRA-14667
> > > https://github.com/apache/cassandra/pull/2238/files
> > >
> > > Thanks in advance for your help!
> > >
> > > On Wed, 28 Jun 2023 at 16:04, Bowen Song via dev
> > >  wrote:
> > > >
> > > > IMHO, anyone upgrading software between major versions should expect to
> > > > see breaking changes. Introducing breaking or major changes is the whole
> > > > point of bumping major version numbers.
> > > >
> > > > Since the library upgrade needs to happen sooner or later, I don't see
> > > > any reason why it should not happen in the 5.0 release.
> > > >
> > > >
> > > > On 27/06/2023 19:21, Maxim Muzafarov wrote:
> > > > > Hello everyone,
> > > > >
> > > > >
> > > > > We use the Dropwizard Metrics 3.1.5 library, which provides a basic
> > > > > set of classes to easily expose Cassandra internals to a user through
> > > > > various interfaces (the most common being JMX). We want to upgrade
> > > > > this library version in the next major release 5.0 up to the latest
> > > > > stable 4.2.19 for the following reasons:
> > > > > - the 3.x (and 4.0.x) Dropwizard Metrics library is no longer
> > > > > supported, which means that if we face a critical CVE, we'll still
> > > > > need to upgrade, so it's better to do it sooner and more calmly;
> > > > > - as of 4.2.5 the library supports jdk11, jdk17, so we will be in-sync
> > > > > [1] as well as having some of the compatibility fixes mentioned in the
> > > > > related JIRA [2];
> > > > > - there have been a few user-related requests [3][4] whose
> > > > > applications collide with the old version of the library, we want to
> > > > > help them;
> > > > >
> > > > >
> > > > > The problem
> > > > >
> > > > > The problem with simply upgrading is that the JmxReporter class of the
> > > > > library has moved from the com.codahale.metrics package in the 3.x
> > > > > release to the com.codahale.metrics.jmx package in the 4.x release.
> > > > > This is a problem for applications/tools that rely on the cassandra
> > > > > classpath (lib/jars) as after the upgrade they may be looking for the
> > > > > JmxReporter class which has changed its location.
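One way a tool that loads classes from Cassandra's lib/ directory could tolerate the move is to resolve the reporter reflectively under either package name. A minimal sketch, assuming only the two fully qualified names given above; `JmxReporterLocator` is a hypothetical helper, not part of Cassandra or the driver, and it only locates the class (other API differences between 3.x and 4.x are not handled).

```java
import java.util.List;
import java.util.Optional;

/** Resolves Dropwizard's JmxReporter under either its 4.x or 3.x package name. */
final class JmxReporterLocator {
    // 4.x location first, then the 3.x location.
    static final List<String> CANDIDATES = List.of(
            "com.codahale.metrics.jmx.JmxReporter",
            "com.codahale.metrics.JmxReporter");

    /** Returns the first candidate visible to {@code loader}, without initializing it. */
    static Optional<Class<?>> find(ClassLoader loader) {
        for (String name : CANDIDATES) {
            try {
                Class<?> found = Class.forName(name, false, loader);
                return Optional.of(found);
            } catch (ClassNotFoundException e) {
                // Not at this location; try the next one.
            }
        }
        return Optional.empty();
    }
}
```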
> > > > >
> > > > > A good example of the problem that we (or a user) may face after the
> > > > > upgrade is our tests and the cassandra-driver-core 3.1.1, which uses
> > > > > the old 3.x version of the library in tests. Of course, in this case,
> > > > > we can upgrade the cassandra driver up to 4.x [5][6] 

Re: [Discuss] Repair inside C*

2023-07-27 Thread Josh McKenzie
> The idea that your data integrity needs to be opt-in has never made sense to 
> me from the perspective of either the product or the end user.
I could not agree with this more. 100%.

> The current (and past) state of things where running the DB correctly 
> *requires* running a separate process (either community maintained or 
> official C* sidecar) is incredibly painful for folks.
I'm 50/50 on this (and I have some opinions here; bear with me :D ).

To me this goes beyond the question of just "where do we coordinate repair" 
into "what role does a node play vs. the sidecar and how does that intersect 
w/the industry today".

Having just 1 process you run on N machines is much nicer from an operations 
standpoint and it's *much* cleaner and easier for us as a project to not have 
to deal with signaling, shmem, and going down the IPC rabbit hole. A modular 
monolith, if you will.

That said, I feel like the zeitgeist has been all-in in terms of microservices and 
control planes, whether they're the right solution or not. The affordances on 
being able to build out independent teams and large organization dev velocity, 
never mind the ideal of being able to cleanly upgrade or rewrite internal 
components, is attractive enough on paper that it feels like most groups have 
gone that direction and accepted the perceived costs; I view Cassandra as being 
something of an architectural anachronism at this point. And to call back to 
the prior paragraph, I *think* you get all those positive affordances w/a 
modular monolith. Sadly, Google Trends don't really give me a lot of hope there.

In an ideal world operators (or better yet, an automated operations process) 
would be able to dynamically adjust resource allocation to nodes based on 
"burstiness of the buffering" (i.e. lots of data building up in CLs needing to 
be flushed, or compaction need, or repair need); it's not immediately obvious 
to me how we'd gracefully do that in a single process paradigm in containers 
w/out becoming a noisy neighbor but it's not impossible. Kind of goes meta 
outside C*'s scope into how you're coordinating your hardware and software 
interactions; maybe that's the cleaner route: we clearly signal metrics for 
each major operation the DB needs to do to indicate their backlog and an 
external orchestration process / system / ??? handles the resource allocation. 
i.e. we don't take that on.

Certainly we can do a lot better than we do today when it comes to internal 
scheduling of DB operations relative to one another (start using cql rate limiting, 
dynamically determine a rolling average of needs to smooth out burst requests, 
make byte-based rate-limiting an option, user-space threads w/loom and some 
kind of QoS prioritization based on backlogs, etc).
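To make the byte-based rate-limiting idea concrete, here is a minimal token-bucket sketch in which the budget is measured in bytes rather than operations, so a handful of large requests is throttled the same as many small ones of equal total size. This is a generic technique rather than Cassandra's existing limiter; `ByteRateLimiter` and its parameters are hypothetical, and the clock is injected only to make the behavior deterministic.

```java
import java.util.function.LongSupplier;

/** Token bucket whose tokens are bytes rather than operations. */
final class ByteRateLimiter {
    private final long bytesPerSecond;
    private final long burstBytes;
    private final LongSupplier nanoClock; // injectable so tests can control time
    private double available;
    private long lastRefillNanos;

    ByteRateLimiter(long bytesPerSecond, long burstBytes, LongSupplier nanoClock) {
        this.bytesPerSecond = bytesPerSecond;
        this.burstBytes = burstBytes;
        this.nanoClock = nanoClock;
        this.available = burstBytes; // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Consumes budget and returns true if {@code bytes} may proceed now; otherwise returns false. */
    synchronized boolean tryAcquire(long bytes) {
        refill();
        if (bytes > available) {
            return false;
        }
        available -= bytes;
        return true;
    }

    /** Credits bytes earned since the last call, capped at the burst size. */
    private void refill() {
        long now = nanoClock.getAsLong();
        double earned = (now - lastRefillNanos) / 1e9 * bytesPerSecond;
        available = Math.min(burstBytes, available + earned);
        lastRefillNanos = now;
    }
}
```

As written, a request larger than `burstBytes` can never be admitted; a real implementation would have to split it or let the bucket go into debt.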

I personally view moving maintenance tasks into the sidecar as a reasonable 
"first step satisficing compromise". If anything, that'd potentially give us 
some breathing room to get our house in order on the "I/O" process (as opposed 
to sidecar as "maintenance process") to then reintegrate things back in a 
cleaner, more planned fashion with some better tools to do it right.

~Josh


On Wed, Jul 26, 2023, at 7:20 PM, C. Scott Andreas wrote:
> I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> 
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.
> 
> Once TCM has landed, we’ll have much stronger primitives for repair 
> orchestration in the database itself. But I don’t think that should block 
> progress on a repair scheduling solution in the sidecar, and there is nothing 
> that would prevent someone from continuing to use a sidecar-based solution in 
> perpetuity if they preferred.
> 
> - Scott
> 
> > On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> > 
> > I'm 100% in favor of repair being part of the core DB, not the sidecar.  
> > The current (and past) state of things where running the DB correctly 
> > *requires* running a separate process (either community maintained or 
> > official C* sidecar) is incredibly painful for folks.  The idea that your 
> > data integrity needs to be opt-in has never made sense to me from the 
> > perspective of either the product or the end user.
> > 
> > I've worked with way too many teams that have either configured this 
> > incorrectly or not at all.  
> > 
> > Ideally Cassandra would ship with repair built in and on by default.  Power 
> > users can disable if they want to continue to maintain their own repair 
> > tooling for some reason.
> > 
> > Jon
> > 
> >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> >> All,
> >> We had a brief discussion in [2] 

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Josh McKenzie
+1 to the "on by default" camp.

> What comes to mind is how we brought down people's clusters and made sstables 
> unreadable with the introduction of the chunk_length configuration in 1.0
I think a key difference here is that changing chunk length is something that 
materially changes behavior and expectations w/a coupled system, whereas 
switching crypto providers has the much smaller failure mode of "the 
implementations aren't binary compatible even though they're supposed to be, 
and are very heavily tested TO be".

Totally agree that a "surprise! it didn't load so now your nodes won't start" 
approach would be a Very Bad Experience for users. Falling back from ACCP and 
squawking about its absence might actually be nice, since it would tell folks 
where it doesn't load / work / etc. to look into it. It really makes a material 
difference.
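A rough sketch of that fallback, with a boolean standing in for the yaml setting (discussed further down the thread) that decides whether startup should fail when ACCP can't be enabled. It assumes ACCP's provider class name and its static `install()` method as shown in the ACCP README; `CryptoProviderSelector` and `requireAccp` are hypothetical names, not the patch under review.

```java
import java.util.function.Consumer;

/** Picks ACCP when it loads, otherwise fails or falls back depending on configuration. */
final class CryptoProviderSelector {
    // Assumed from the ACCP README; verify against the ACCP version actually bundled.
    static final String ACCP_CLASS =
            "com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider";

    enum Outcome { ACCP_INSTALLED, FELL_BACK_TO_JDK }

    /**
     * @param requireAccp hypothetical stand-in for a yaml flag: true means refuse to start without ACCP
     * @param warn        where the loud fallback warning goes (a logger in real code)
     */
    static Outcome select(boolean requireAccp, Consumer<String> warn) {
        try {
            Class<?> accp = Class.forName(ACCP_CLASS);
            // Static install() registers ACCP as a JCA provider (per its README).
            accp.getMethod("install").invoke(null);
            return Outcome.ACCP_INSTALLED;
        } catch (ReflectiveOperationException | LinkageError e) {
            // A missing jar, a class-initialization failure, or an exception from install() lands here.
            if (requireAccp) {
                throw new IllegalStateException("ACCP is required but could not be enabled", e);
            }
            warn.accept("ACCP could not be enabled (" + e + "); using the default JDK providers");
            return Outcome.FELL_BACK_TO_JDK;
        }
    }
}
```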

On Wed, Jul 26, 2023, at 4:02 PM, Jordan West wrote:
> It sounds like some of the concerns have shifted then. I would like to better 
> understand the YAML one. Like Jeremiah said it may be a better topic for the 
> ticket. Would appreciate an example exception or error people are concerned 
> about. 
> 
> If the issue is the “fail fast” on start I’m sure we can find a solution 
> everyone accepts and move forward. 
> 
> If we are agreed “on by default” is the way to go that’s awesome! 
> 
> Jordan 
> 
> On Wed, Jul 26, 2023 at 12:59 Jeremiah Jordan  
> wrote:
>> I had a discussion with Mick on slack.  His concern is not with enabling 
>> ACCP.  His concern is around the testing of the new C* yaml config code 
>> which is included in the patch that is used to decide if ACCP should be 
>> enabled or not, and if startup should fail if it can’t be enabled.
>> 
>> I agree.  We should make sure that the new C* yaml config code is solid 
>> before we commit this patch, especially when it has the possibility of causing 
>> node startup to fail on purpose.  But that should be a discussion for the 
>> ticket I think, not for this thread.
>> 
>> So I think we are back to the original question.  Should ACCP be used by 
>> default in trunk?  From what I have seen, I do not see anyone who is against 
>> that?
>> 
>> -Jeremiah
>> 
>> 
>> On Jul 26, 2023 at 2:53:02 PM, Jordan West  wrote:
>>> +1 Scott. And agreed all involved are looking out for the best interests of 
>>> C* users. And I appreciate those with concerns contributing to addressing 
>>> them. 
>>> 
>>> I’m all for making upgrades smooth bc I do them so often. A huge portion of 
>>> our 4.1 qualification is “will it break on upgrade”? Because of that I’m 
>>> confident in this patch and concerned about many other areas. I think it’s 
>>> commendable to want to reach a point where teams have the trust in the 
>>> community to have done that for them but that starts w better test coverage 
>>> and concrete evidence. 
>>> 
>>> Given all that, I think we should move forward w Ayushi’s proposal to make 
>>> it on by default. 
>>> 
>>> Jordan 
>>> 
>>> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas  wrote:
 I think these concerns are well-intended, but they feel rooted in 
 uncertainty rather than in factual examples of areas where risk is 
 present. I would appreciate elaboration on the specific areas of risk that 
 folks imagine.
 
 I would encourage those who express skepticism to try the patch, and I 
 endorse Ayushi's proposal to enable it by default.
 
 
 – Scott
 
> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" 
>  wrote:
> 
> 
> We can make it opt-in, wait one major to see what bugs pop up and we 
> might do that opt-out eventually. We do not need to hurry up with this. I 
> understand everybody's expectations and excitement but it really boils 
> down to one line change in yaml. People who are so much after the 
> performance will be definitely aware of this knob to turn on to squeeze 
> even more perf ...
> 
> I looked around the dtests Jeremiah mentioned, but I would just move on and 
> make it opt-in if we are not 100% persuaded about it _yet_.
> 
> 
> From: Mick Semb Wever 
> Sent: Wednesday, July 26, 2023 20:48
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
> 
> What comes to mind is how we brought down people's clusters and made 
> sstables unreadable with the introduction of the chunk_length 
> configuration in 1.0. It wasn't about how tested the compression 
> libraries were, but about the new configuration itself. Introducing 
> silent defaults has more surface area for bugs than introducing explicit 
> defaults that only apply to new clusters and are thus opt-in for existing 
> clusters.
> 

Re: Tokenization and SAI query syntax

2023-07-24 Thread Josh McKenzie
> `column CONTAINS term`. Contains is used by both Java and Python for 
> substring searches, so at least some users will be surprised by term-based 
> behavior.
I wonder whether users are in their "programming language" headspace or in 
their "querying a database" headspace when interacting with CQL? i.e. this 
would only present confusion if we expected users to be thinking in the idioms 
of their respective programming languages. If they're thinking in terms of SQL, 
MATCHES would probably end up confusing them a bit since it doesn't match the 
general structure of the MATCH operator.

That said, I also think CONTAINS loses something important that you allude to 
here Jonathan:
> with corresponding query-time tokenization and analysis.  This means that the 
> query term is not always a substring of the original string!  Besides obvious 
> transformations like lowercasing, you have things like PhoneticFilter 
> available as well.
So to me, neither MATCHES nor CONTAINS are particularly great candidates.

So +1 to the "I don't actually hate it" sentiment on:
> `column : term`. Inspired by Lucene’s syntax
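
The distinction Jonathan draws can be illustrated with a minimal Python sketch. It assumes a toy analyzer (lowercase, split on non-alphanumerics, drop a small hypothetical stopword list) that is far simpler than Lucene's StandardAnalyzer, and the function names are hypothetical. Because the same analysis is applied to both the stored value and the query term, a term match is neither a substring test nor exact equality.

```python
import re

# Hypothetical stopword list for illustration only; Lucene's real lists differ.
STOPWORDS = {"a", "an", "is", "the"}

def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.split(r"[^0-9a-z]+", text.lower())
    return [t for t in tokens if t and t not in STOPWORDS]

def term_match(column_value, query_term):
    """Match if every analyzed query token appears among the column's tokens."""
    column_tokens = set(analyze(column_value))
    query_tokens = analyze(query_term)
    return bool(query_tokens) and all(t in column_tokens for t in query_tokens)

def substring_match(column_value, query_term):
    """Plain substring semantics, like Java's String.contains or Python's `in`."""
    return query_term in column_value

value = "The Quick Brown Fox"
print(term_match(value, "quick"), substring_match(value, "quick"))  # True False
print(term_match(value, "Bro"), substring_match(value, "Bro"))      # False True
```

Here "quick" matches as a term but not as a case-sensitive substring, while "Bro" is a substring but not a term: exactly the mismatch with Java/Python `contains` semantics raised in this thread.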

On Mon, Jul 24, 2023, at 8:35 AM, Benedict wrote:
> 
> I have a strong preference not to use the name of an SQL operator, since it 
> precludes us later providing the SQL standard operator to users.
> 
> What about CONTAINS TOKEN term? Or CONTAINS TERM term?
> 
> 
>> On 24 Jul 2023, at 13:34, Andrés de la Peña  wrote:
>> 
>> `column = term` is definitely problematic because it creates an ambiguity 
>> when the queried column belongs to the primary key. For some queries we 
>> wouldn't know whether the user wants a primary key query using regular 
>> equality or an index query using the analyzer.
>> 
>> `term_matches(column, term)` seems quite clear and hard to misinterpret, but 
>> it's quite long to write and its implementation will be challenging since we 
>> would need a bunch of special casing around SelectStatement and functions.
>> 
>> LIKE, MATCHES and CONTAINS could be a bit misleading since they seem to 
>> evoke different behaviours to what they would have.
>> 
>> `column LIKE :term:` seems a bit redundant compared to just using `column : 
>> term`, and we are still introducing a new symbol.
>> 
>> I think I like `column : term` the most, because it's brief, it's similar to 
>> the equivalent Lucene syntax, and it doesn't seem to clash with other 
>> different meanings that I can think of.
>> 
>> On Mon, 24 Jul 2023 at 13:13, Jonathan Ellis  wrote:
>>> Hi all,
>>> 
>>> With phase 1 of SAI wrapping up, I’d like to start the ball rolling on 
>>> aligning around phase 2 features.
>>> 
>>> In particular, we need to nail down the syntax for doing non-exact string 
>>> matches.  We have a proof of concept that includes full Lucene analyzer and 
>>> filter functionality – just the text transformation pieces, none of the 
>>> storage parts – which is the gold standard in this space.  For example, the 
>>> StandardAnalyzer [1] lowercases all terms and removes stopwords (common 
>>> words like “a”, “is”, “the” that are usually not useful to search against). 
>>>  Lucene also has classes that offer stemming, special case handling for 
>>> email, and many languages besides English [2].
>>> 
>>> What syntax should we use to express “rows whose analyzed tokens match this 
>>> search term?”
>>> 
>>> The syntax must be clear that we want to look for this term within the 
>>> column data using the configured index with corresponding query-time 
>>> tokenization and analysis.  This means that the query term is not always a 
>>> substring of the original string!  Besides obvious transformations like 
>>> lowercasing, you have things like PhoneticFilter available as well.
>>> 
>>> Here are my thoughts on some of the options:
>>> 
>>> `column = term`.  This is what the POC does today and it’s super confusing 
>>> to overload = to mean something other than exact equality.  I am not a fan.
>>> 
>>> `column LIKE term` or `column LIKE %term%`. The closest SQL operator, but 
>>> neither the wildcarded nor unwildcarded syntax matches the semantics of 
>>> term-based search.
>>> 
>>> `column MATCHES term`. I rather like this one, although Mike points out 
>>> that “match” has a meaning in the context of regular expressions that could 
>>> cause confusion here.
>>> 
>>> `column CONTAINS term`. Contains is used by both Java and Python for 
>>> substring searches, so at least some users will be surprised by term-based 
>>> behavior.
>>> 
>>> `term_matches(column, term)`. Postgresql FTS makes you use functions like 
>>> this for everything.  It’s pretty clunky, and we would need to make the 
>>> amazingly hairy SelectStatement even hairier to handle “use a function 
>>> result in a predicate” like this.
>>> 
>>> `column : term`. Inspired by Lucene’s syntax.  I don’t actually hate it.
>>> 
>>> `column LIKE :term:`. Stick with the LIKE operator but add a new symbol to 
>>> indicate term matching.  Arguably more SQL-ish than 

Cassandra project status, 2023-07-19

2023-07-19 Thread Josh McKenzie
In case you were wondering, if you switch your ToDo list structure enough times 
you're bound to have something slip through the cracks. Like, say, a periodic 
project status update email. If you're curious where I landed: "All tools are 
comparably insufficient in different ways". /sigh

Don't be like me; be better.

Last update was about 7 weeks ago. Have we all been lounging around all summer, 
taking vacation and doing nothing useful? We released 4.0.11 yesterday and the 
vote is ongoing for 4.1.3, so I'm going to go with a "No". :) Either that or 
we're the kind of folks who work on open-source code for fun on vacation. 
¯\_(ツ)_/¯

Here's the CHANGES.txt for 4.1.3: 
https://github.com/apache/cassandra/blob/4.1.3-tentative/CHANGES.txt
Hit up the dev list to vote!

There's a community survey! Go take it. Please. :) From Patrick's thread on the 
dev ML:
> There are only 2 questions required, and the rest are all optional, so
> answer whatever you can. It’s all helpful information.
> 
> https://forms.gle/KVNd7UmUfcBuoNvF7
> 
> The survey will run until July 29, 2023. Once completed, the results will
> be anonymized and the results posted on http://cassandra.apache.org


*[New Contributors Getting Started]*
#cassandra-dev on https://the-asf.slack.com is where we congregate (reply to 
me on this email if you need an invite for your account). Reach out to the 
@cassandra_mentors alias with any questions about the code. Or just ask a 
public question about JMX vs. REST and watch the hornet's nest come to life! :D

For a "where should I get started?" list of curated work, you can check that 
out here: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=2160=2162.
 Anything in the "ToDo" column is a great place to get started.

Some useful links:
- Getting Started with Development on C*: 
https://cassandra.apache.org/_/development/gettingstarted.html
- Building and IDE integration (worktrees are your friend; msg me on slack if 
you need pointers): https://cassandra.apache.org/_/development/ide.html
- Code Style: https://cassandra.apache.org/_/development/code_style.html


*[Dev mailing list]*
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-5-30|dto=2023-7-20:

Hm; 35 topics compared to the 52 from the last "oops I took too long" update. 
Looks like some were reasonably dense. What have we been up to?

Stefan Miklosovic was looking into the addition of wiremock into the project as 
part of CASSANDRA-16555: 
https://lists.apache.org/thread/s4q0159wsrw19kcco555xjnt6b60z8jx. No updates on 
that in about a month; there didn't seem to be much concern about it, just some 
questions about documentation since we already have a couple of mocking libs. This 
was merged in as part of https://issues.apache.org/jira/browse/CASSANDRA-16555.

Jordan West reached out about changing our default SSL provider to either ACCP 
or tc-native: https://lists.apache.org/thread/lx43340ycmm2t22ds13wmjzxj5vvjly9. 
Some users have seen pretty significant reduction in CPU usage by switching 
over; this is being tracked in 
https://issues.apache.org/jira/browse/CASSANDRA-18624.

Some changes in the DropWizard JMXReporter were making a whole slew of people 
sad. APIs, people! They're like marriage, or better yet, having children: It's 
a long-term commitment. Maxim Muzafarov has a patch available in 
CASSANDRA-14667 for this. Dev thread: 
https://lists.apache.org/thread/g3v06c3xtrgtcsr4x52nwj6v7dj9l6dl, JIRA: 
https://issues.apache.org/jira/browse/CASSANDRA-14667

As a heads up to everyone, ant generate-idea-files now generates for JDK17 as 
well. ML thread: 
https://lists.apache.org/thread/0hnf7wtcf86pn6b2lydd4kqcfjw8n4t9, JIRA: 
https://issues.apache.org/jira/browse/CASSANDRA-18467

Jacek Lewandowski hit the list to update folks w/his plans around limiting 
paging by bytes and queries by memory; we're working through this on 
CASSANDRA-11745, though likely not before a freeze for 5.0 whenever that may 
be. ML thread: 
https://lists.apache.org/thread/9zymojzgq5sw9jpmghf36nh8yn1kwz6k, JIRA: 
https://issues.apache.org/jira/browse/CASSANDRA-11745

Brad Schoening reached out about a potential change in argument parsing for 
cassandra-stress over to the Apache Commons CLI library here: 
https://lists.apache.org/thread/hbg25nn1jobx69zrl1syv1vl3qc318tn. Ekaterina 
pointed out we need to update it but otherwise it's been pretty radio silent 
for 9 days; probably have a lazy consensus there. Changes here *could* disrupt 
folks that have automated work using cassandra-stress which is worth keeping in 
mind. Given how a) aggressively internal this tool is, and b) the number of 
different stress tools out there targeting end-users (tlp-stress, NoSQLBench, 
etc), my personal .02 is that we go for it and unify those options. If anyone 
else disagrees, now's the time.

Looks like we're going to deprecate the CloudstackSnitch snitch in 5.0 and let 
users know they're more or less on their own if they're using it since it's 
Very 

Re: Changing the output of tooling between majors

2023-07-13 Thread Josh McKenzie
> I just find it ridiculous we can not change "someProperty: 10" to "Some 
> Property: 10" and there is so much red tape about that.
Well, we're talking about programmatic parsing here. This feels like 
complaining about a compiler that won't let you build if you're missing a ;

We *can* change it, but that doesn't mean the aggregate cost/benefit across our 
entire ecosystem is worth it. The value of correcting a typo is pretty small, 
and the cost for everyone downstream is not. This is why we should spellcheck 
things in APIs before we release them. :)
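
A minimal Python sketch of the parsing point, assuming a hypothetical script that scrapes a `someProperty: 10` line from console output (the field name and output text are illustrative, not real nodetool output). A purely cosmetic relabel silently breaks the regex, whereas a consumer of a structured JSON format depends only on the key name, which the project could document as stable.

```python
import json
import re

# Hypothetical console output before and after a cosmetic rename.
OLD_OUTPUT = "someProperty: 10\n"
NEW_OUTPUT = "Some Property: 10\n"

def scrape_some_property(text):
    """Brittle: depends on the exact human-readable label."""
    m = re.search(r"^someProperty: (\d+)$", text, re.MULTILINE)
    return int(m.group(1)) if m else None

def parse_some_property(json_text):
    """Stable as long as the documented JSON key is kept across releases."""
    return json.loads(json_text)["someProperty"]

print(scrape_some_property(OLD_OUTPUT))            # 10
print(scrape_some_property(NEW_OUTPUT))            # None: the label changed
print(parse_some_property('{"someProperty": 10}'))  # 10
```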

On Wed, Jul 12, 2023, at 2:45 PM, Miklosovic, Stefan wrote:
> Eric,
> 
> I appreciate your feedback on this, especially the additional background about 
> where you are coming from in the second paragraph.
> 
> I think we are on the same page after all. I definitely understand that people 
> are depending on this output and we need to be careful. That is why I propose 
> to change it only in each major. What I feel is that everybody's usage and 
> expectations are a little bit different, the outputs of the commands are very 
> diverse, and it is hard to balance this so everybody is happy.
> 
> I am trying to come up with a solution which would not change the most 
> important commands unnecessarily while also having some free room to tweak 
> the existing commands where we see it appropriate. I just find it ridiculous 
> we can not change "someProperty: 10" to "Some Property: 10" and there is so 
> much red tape about that.
> 
> If I had to summarize this whole discussion, the best conclusion I can think 
> of is to not change what is used the most (this would probably need to be 
> defined more explicitly) and if we have to change something else we better 
> document that extensively and provide json/yaml for people to be able to 
> divorce from the parsing of human-readable format (which probably all agree 
> should not happen in the first place).
> 
> What I am afraid of is that in order to satisfy these conditions, if, for 
> example, we just want to fix a typo or the format of a key of some value, then 
> we would need to deliver a JSON/YAML format as well if there is none yet, and 
> that would mean that a change of such triviality would require way more 
> work in terms of implementing the JSON/YAML output. Some commands 
> are quite sophisticated and I do not want to be blocked from changing a field 
> in the human-readable output because providing the corresponding JSON/YAML 
> format would be a gigantic portion of the work in itself.
> 
> From what I see, you guys want to make any change conditional on offering 
> json/yaml as well, and I don't know whether that is simply too much.
> 
> 
> 
> From: Eric Evans 
> Sent: Wednesday, July 12, 2023 19:48
> To: dev@cassandra.apache.org
> Subject: Re: Changing the output of tooling between majors
> 
> On Wed, Jul 12, 2023 at 1:54 AM Miklosovic, Stefan 
> <stefan.mikloso...@netapp.com> wrote:
> I agree with Jackson that having a different output format (JSON/YAML) in 
> order to be able to change the default output resolves nothing in practice.
> 
> As Jackson said, "operators who maintain these scripts aren’t going to 
> re-write them just because a better way of doing them is newly available, 
> usually they’re too busy with other work and will keep using those old 
> scripts until they stop working".
> 
> This is true. If this approach is adopted, what will happen in practice is 
> that we change the output and we provide a different format and then a user 
> detects this change because his scripts broke. As he has an existing solution 
> in place which parses the human-readable output, he will try to fix 
> that; he will not suddenly convert all his scripting to parsing JSON just 
> because we added it. Starting with JSON parsing might be done if he has no 
> scripting in place yet but then we would not cover already existing 
> deployments.
> 
> I think this is quite an extreme conclusion to draw.  If tooling had stable, 
> structured output formats, and if we documented an expectation that 
> human-readable console output was unstable, then presumably it would be safe 
> to assume that any new scripters would avail themselves of the stable 
> formats, or expect breakage later.  I think it's also fair to assume that at 
> least some people would spend the time to convert their scripts, particularly 
> if forced to revisit them (for example, after a breaking change to console 
> output).  As someone who manages several large-scale mission-critical 
> Cassandra clusters under constrained resources, this is how I would approach 
> it.
> 
> TL;DR Don't let perfect be the enemy of 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
> Revert for only trunk patches right? 
> I’d say we need to completely stabilize the environment, no noise before we 
> go into that direction.
Hm. Is the concern multi-branch reverts w/merge commits being awful? Because I 
hear that. Starting trunk-only would be reasonable enough I think, especially 
since we'd be bugfix only on other branches anyway and expect less test 
destabilization. This is tickling my memory a bit; I think we talked about... 
something? Different on how we handle CI and vetting on trunk compared to other 
branches. I'll have to dig around later and see if I can surface that.

I think completely stabilizing the environment is going to be something of a 
chicken / egg problem. Until we move away from our heterogeneous execution 
environment w/constant degraded and failing agents and/or get more automated 
robustness (re-run stage w/just timed out tests for example), I don't think 
we'll be able to get to a completely stabilized environment. And IMO the "if 
you break it you buy it (revert)" approach would strictly serve to help us in 
our move in that direction.

As I type this out, it strikes me that this feels similar to being on-call for 
the code you write. When there's real-world stakes / pain / discomfort that 
*will be applied* to you if you're not thorough in your consideration, you 
think about things differently and it improves the quality of your work as a 
result.

I suspect the risk of having personal delivery timelines slip because your code 
introduced test failures would be a pretty strong incentive both to be more 
careful about how you work and to chip in on the CI environment as well, to 
prevent any CI-stack-specific errors in the future.

I think about this in terms of where the tax is being paid. If the pressure is 
applied to the person who contributed the code, they have to pay the tax. If we 
allow these kinds of failures to rest in the system, the entire rest of the dev 
community pays the tax. The former seems like a lower aggregate cost to us as a 
project than the latter, to me?

On Wed, Jul 12, 2023, at 9:10 AM, Ekaterina Dimitrova wrote:
> Revert for only trunk patches right? 
> I’d say we need to completely stabilize the environment, no noise before we 
> go into that direction.
> 
> On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski  
> wrote:
>> Would it be re-opening the ticket or creating a new ticket with "revert of 
>> fix" ?
>> 
>> 
>> 
>> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova  
>> napisał(a):
>>> jenkins_jira_integration 
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>>  script updating the JIRA ticket with test results if you cause a 
>>> regression + us building a muscle around reverting your commit if they 
>>> break tests.“
>>> 
>>> I am not sure people finding the time to fix their breakages will be solved 
>>> but at least they will be pinged automatically. Hopefully many follow Jira 
>>> updates.
>>> 
>>> “  I don't take the past as strongly indicative of the future here since 
>>> we've been allowing circle to validate pre-commit and haven't been 
>>> multiplexing.”
>>> I am interested to compare how many tickets for flaky tests we will have 
>>> pre-5.0 now compared to pre-4.1.
>>> 
>>> 
>>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>>> (This response ended up being a bit longer than intended; sorry about that)
>>>> 
>>>>> What is more common though is packaging errors,
>>>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>>>> upgrade tests, being less responsive post-commit as you already
>>>>> moved on
>>>> *Two that should be resolved in the new regime:*
>>>> * Packaging errors should be caught pre as we're making the artifact 
>>>> builds part of pre-commit.
>>>> * I'm hoping to merge the commit log segment allocation so CDC allocator 
>>>> is the only one for 5.0 (and just bypasses the cdc-related work on 
>>>> allocation if it's disabled thus not impacting perf); the existing 
>>>> targeted testing of cdc specific functionality should be sufficient to 
>>>> confirm its correctness as it doesn't vary from the primary allocation 
>>>> path when it comes to mutation space in the buffer
>>>> * Upgrade tests are going to be part of the pre-commit suite
>>>> 
>>>> *Outstanding issues:*
>>>> * compression. If we just run with defaults we won't test all cases so 
>>>> errors could

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
I have a weak preference for re-opening the original ticket and tracking the 
revert + fix there. Keeps the workflow in one place. "Downside" is having 
multiple commits with "CASSANDRA-XX" in the message but that might actually 
be a nice thing grepping through to see what changes were made for a specific 
effort.

> I am not sure people finding the time to fix their breakages will be solved 
> but at least they will be pinged automatically.
That's where the "muscle around git revert" comes in. If we all agree to revert 
patches that break tests, fix them, and then re-merge them, I think that both 
keeps that work in the "original mental bucket required to be done", and also 
pressures all of us to take our pre-commit CI seriously and continue to refine 
it until such breakages don't occur, or occur so rarely they reach an 
acceptable level.

We also will offer the ability to run the pre-commit suite pre-merge or the 
post-commit suite pre-merge for folks who would prefer that approach to 
investment (machine time vs. risk of human time).

On Wed, Jul 12, 2023, at 8:52 AM, Jacek Lewandowski wrote:
> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
> 
> 
> 
> śr., 12 lip 2023 o 14:51 Ekaterina Dimitrova  
> napisał(a):
>> jenkins_jira_integration 
>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>  script updating the JIRA ticket with test results if you cause a regression 
>> + us building a muscle around reverting your commit if they break tests.“
>> 
>> I am not sure people finding the time to fix their breakages will be solved 
>> but at least they will be pinged automatically. Hopefully many follow Jira 
>> updates.
>> 
>> “  I don't take the past as strongly indicative of the future here since 
>> we've been allowing circle to validate pre-commit and haven't been 
>> multiplexing.”
>> I am interested to compare how many tickets for flaky tests we will have 
>> pre-5.0 now compared to pre-4.1.
>> 
>> 
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>> (This response ended up being a bit longer than intended; sorry about that)
>>> 
>>>> What is more common though is packaging errors,
>>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>>> upgrade tests, being less responsive post-commit as you already
>>>> moved on
>>> *Two that should be resolved in the new regime:*
>>> * Packaging errors should be caught pre as we're making the artifact builds 
>>> part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so CDC allocator is 
>>> the only one for 5.0 (and just bypasses the cdc-related work on allocation 
>>> if it's disabled thus not impacting perf); the existing targeted testing of 
>>> cdc specific functionality should be sufficient to confirm its correctness 
>>> as it doesn't vary from the primary allocation path when it comes to 
>>> mutation space in the buffer
>>> * Upgrade tests are going to be part of the pre-commit suite
>>> 
>>> *Outstanding issues:*
>>> * compression. If we just run with defaults we won't test all cases so 
>>> errors could pop up here
>>> * system_ks_directory related things: is this still ongoing or did we have 
>>> a transient burst of these types of issues? And would we expect these to 
>>> vary based on different JDK's, non-default configurations, etc?
>>> * Being less responsive post-commit: My only ideas here are a combination 
>>> of the jenkins_jira_integration 
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>>  script updating the JIRA ticket with test results if you cause a 
>>> regression + us building a muscle around reverting your commit if they 
>>> break tests.
>>> 
>>> To quote Jacek:
>>>> why don't run dtests w/wo sstable compression x w/wo internode encryption 
>>>> x w/wo vnodes, 
>>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
>>>> think this is a matter of cost vs result. 
>>> 
>>> I think we've organically made these decisions and tradeoffs in the past 
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now t

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Josh McKenzie
(This response ended up being a bit longer than intended; sorry about that)

> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
*Two that should be resolved in the new regime:*
* Packaging errors should be caught pre-commit, as we're making the artifact 
builds part of pre-commit.
* I'm hoping to merge the commit log segment allocation so CDC allocator is the 
only one for 5.0 (and just bypasses the cdc-related work on allocation if it's 
disabled thus not impacting perf); the existing targeted testing of cdc 
specific functionality should be sufficient to confirm its correctness as it 
doesn't vary from the primary allocation path when it comes to mutation space 
in the buffer
* Upgrade tests are going to be part of the pre-commit suite

*Outstanding issues:*
* compression. If we just run with defaults we won't test all cases so errors 
could pop up here
* system_ks_directory related things: is this still ongoing or did we have a 
transient burst of these types of issues? And would we expect these to vary 
based on different JDKs, non-default configurations, etc.?
* Being less responsive post-commit: My only ideas here are a combination of 
the jenkins_jira_integration 
<https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
 script updating the JIRA ticket with test results if you cause a regression + 
us building a muscle around reverting your commit if they break tests.

To quote Jacek:
> why don't run dtests w/wo sstable compression x w/wo internode encryption x 
> w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 

I think we've organically made these decisions and tradeoffs in the past 
without being methodical about it. If we can:
1. Multiplex changed or new tests
2. Tighten the feedback loop of "tests were green, now they're *consistently* 
not, you're the only one who changed something", and
3. Instill a culture of "if you can't fix it immediately revert your commit"

Then I think we'll only be vulnerable to flaky failures introduced across 
different non-default configurations as side effects in tests that aren't 
touched, which *intuitively* feels like a lot less than we're facing today. We 
could even get clever as a day 2 effort and define packages in the primary 
codebase where changes take place and multiplex (on a smaller scale) their 
respective packages of unit tests in the future if we see problems in this area.
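
Point 2 in the list above can be sketched roughly as follows. This is a hypothetical Python illustration, not the project's jenkins_jira_integration script, and the `(commit_id, passed)` history format is an assumption: given one test's post-commit results in commit order, it reports the commit where the test went from passing to failing and then kept failing for a chosen number of consecutive runs, so a one-off flaky failure is not attributed to anyone.

```python
def find_culprit(history, window=3):
    """Return the commit where a test flipped from pass to a sustained failure.

    history: list of (commit_id, passed) tuples in commit order (hypothetical format).
    window:  how many consecutive failures count as "consistently" failing.
    Returns None if no pass is followed by `window` failures in a row.
    """
    for i in range(1, len(history)):
        prev_passed = history[i - 1][1]
        run = history[i:i + window]
        if prev_passed and len(run) == window and not any(p for _, p in run):
            return history[i][0]
    return None

runs = [("a1", True), ("b2", True), ("c3", False), ("d4", True),  # c3: one-off flake
        ("e5", False), ("f6", False), ("g7", False)]                # e5: sustained
print(find_culprit(runs))  # e5
```

The `window` parameter is the knob that trades false blame on flaky tests against how quickly a real regression gets routed back to its author.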

Flaky tests are a giant pain in the ass and a huge drain on productivity, 
don't get me wrong. *And* we have to balance how much cost we're paying before 
each commit with the benefit we expect to gain from that. I don't take the past 
as strongly indicative of the future here since we've been allowing circle to 
validate pre-commit and haven't been multiplexing.
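
To put rough numbers on the cost side, here is a small Python sketch that counts the configurations in the matrix Jacek lists (ignoring his "etc.") and multiplies by the 500-iteration multiplexing figure Brandon mentions elsewhere in this thread. The axis labels are shorthand taken from his list; nothing here reflects an actual CI configuration.

```python
from math import prod

# Axes from Jacek's list; his "etc." would only make this larger.
MATRIX = {
    "sstable compression": ["on", "off"],
    "internode encryption": ["on", "off"],
    "vnodes": ["on", "off"],
    "off-heap buffers": ["on", "off"],
    "jdk": ["8", "11", "17"],
    "cdc": ["on", "off"],
    "distro": ["RedHat", "Debian", "SUSE"],
}

def config_count(matrix):
    """Number of distinct configurations in the full cross product."""
    return prod(len(values) for values in matrix.values())

def multiplex_runs(iterations_per_config, configs):
    """Total runs needed to multiplex one changed test across every config."""
    return iterations_per_config * configs

configs = config_count(MATRIX)
print(configs)                       # 288
print(multiplex_runs(500, configs))  # 144000
print(multiplex_runs(100, 5))        # 500, i.e. 5 configs x 100 repetitions
```

Even before the "etc.", multiplexing a single changed test across the full matrix is several hundred times the cost of one 500-iteration run, which is why the thread converges on a small pre-commit subset.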

Does the above make sense? Are there things you've seen in the trenches that 
challenge or invalidate any of those perspectives?

On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
> Isn't novnodes a special case of vnodes with n=1 ?
> 
> We should rather select a subset of tests for which it makes sense to run 
> with different configurations. 
> 
> The set of configurations against which we run the tests currently is still 
> only the subset of all possible cases. 
> I could ask - why don't we run dtests w/wo sstable compression x w/wo internode 
> encryption x w/wo vnodes, 
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I 
> think this is a matter of cost vs result. 
> This equation contains the likelihood of failure in configuration X, given 
> there was no failure in the default 
> configuration, the cost of running those tests, the time we delay merging, 
> the likelihood that we wait for 
> the test results so long that our branch diverges and we will have to rerun 
> them or accept the fact that we merge 
> code which was tested on an outdated base. Finally, the overall new 
> contributor experience - whether they 
> want to participate in the future.
> 
> 
> 
> śr., 12 lip 2023 o 07:24 Berenguer Blasi  
> napisał(a):
>> On our 4.0 release I remember a number of such failures but not recently. 
>> What is more common though is packaging errors, 
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, 
>> being less responsive post-commit as you already moved on,... Either the 
>> smoke pre-commit has approval steps for everything or we should give imo a 
>> devBranch-like job to the dev pre-commit. I find it terribly useful. My 
>> 2cts.
>> 
>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>>

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-11 Thread Josh McKenzie
> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at 
> reviewer's discretion
In general, maybe offering a dev the option of choosing either "pre-commit 
smoke" or "post-commit full" at their discretion for any work would be the 
right play.

A follow-on thought: even with something as significant as Accord, TCM, Trie 
data structures, etc, I'd be a bit surprised to see tests fail on JDK17 that 
didn't on 11, or with vs. without vnodes, in ways that didn't make it immediately 
clear the patch had stumbled across something surprising, and that weren't 
trivially attributable, if not fixable. *In theory* the things we're talking 
about excluding from the pre-commit smoke test suite are all things that are 
supposed to be identical across environments and thus opaque / interchangeable 
by default (JDK version, apart from the build check, which we will run; vnodes 
vs. non-vnodes; etc).

Has that not proven to be the case in your experience?

On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
> A strong +1 to getting to a single CI system. CircleCI definitely has some 
> niceties and I understand why it's currently used, but right now we get 2 CI 
> systems for twice the price. +1 on the proposed subsets.
> 
> Derek
> 
> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie  wrote:
>> I'm personally not thinking about CircleCI at all; I'm envisioning a world 
>> where all of us have 1 CI *software* system (i.e. reproducible on any env) 
>> that we use for pre-commit validation, and then post-commit happens on 
>> reference ASF hardware.
>> 
>> So:
>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, 
>> merge.
>> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link back 
>> to the JIRA where the commit took place
>> 
>> Circle would need to remain in lockstep with the requirements for point 1 
>> here.
>> 
>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>> +1 to Josh which is exactly my line of thought as well. But that is only 
>>> valid if we have a solid Jenkins that will eventually run all test configs. 
>>> So I think I lost track a bit here. Are you proposing:
>>> 
>>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD) 
>>> config of tests
>>> 
>>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in 
>>> case of problems?
>>> 
>>> Or something different, like having 1 also in Jenkins?
>>> 
>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>> I think 500 runs combining all configs could be reasonable, since it's 
>>>> unlikely to have config-specific flaky tests. As in five configs with 100 
>>>> repetitions each.
>>>> 
>>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>>>>> Maybe. Kind of depends on how long we write our tests to run doesn't it? 
>>>>> :)
>>>>> 
>>>>> But point taken. Any non-trivial test would start to be something of a 
>>>>> beast under this approach.
>>>>> 
>>>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie  
>>>>>> wrote:
>>>>>> > 3. Multiplexed tests (changed, added) run against all JDKs and a 
>>>>>> > broader range of configs (no-vnode, vnode default, compression, etc)
>>>>>> 
>>>>>> I think this is going to be too heavy...we're taking 500 iterations
>>>>>> and multiplying that by like 4 or 5?
>>>>>> 
>>>>> 
>> 
> 
> 
> 


Re: Removal of CloudstackSnitch

2023-07-10 Thread Josh McKenzie
> 2) keep it there in 5.0 but mark it @Deprecated
I'd say deprecate it and log warnings that it's neither supported nor 
maintained, that people use it at their own risk, and that it's going to be 
removed.

That is, assuming the maintenance burden of it isn't high. I assume not since, 
as Brandon said, they're quite pluggable and well modularized.
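A minimal sketch of the "deprecate and warn" option, assuming a plain Java 
class standing in for the snitch. `DeprecatedSnitchExample` is hypothetical and 
does not implement Cassandra's snitch interfaces; Cassandra itself logs through 
SLF4J, and `java.util.logging` is used here only to keep the example 
dependency-free. The `since = "5.0"` value follows the deprecate-in-5.0, 
remove-in-6.0 rule Ekaterina describes below.

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for a snitch slated for removal; not Cassandra's snitch API. */
@Deprecated(since = "5.0", forRemoval = true)
public class DeprecatedSnitchExample {
    private static final Logger logger =
            Logger.getLogger(DeprecatedSnitchExample.class.getName());

    // Wording mirrors the suggestion above: unsupported, unmaintained, use at own risk, going away.
    static final String WARNING =
            "CloudstackSnitch is deprecated, neither supported nor maintained, "
            + "used at your own risk, and will be removed in a future major release.";

    public DeprecatedSnitchExample() {
        // Emitted whenever the snitch is instantiated, i.e. when a node is configured to use it.
        logger.warning(WARNING);
    }

    public static void main(String[] args) {
        new DeprecatedSnitchExample();
    }
}
```

Because snitches are pluggable, anyone who still needs the class after removal 
could keep a copy like this on their own classpath.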

On Mon, Jul 10, 2023, at 9:57 AM, Brandon Williams wrote:
> I agree with Ekaterina, but also want to point out that snitches are
> pluggable, so whatever we do should be pretty safe.  If someone
> discovers after the removal that they need it, they can just plug it
> back in.
> 
> Kind Regards,
> Brandon
> 
> On Mon, Jul 10, 2023 at 8:54 AM Ekaterina Dimitrova
>  wrote:
> >
> > Hi Stefan,
> >
> > I think we should follow our deprecation rules and deprecate it in 5.0, 
> > potentially remove in 6.0. (Deprecate in one major, remove in the next 
> > major)
> > Maybe the deprecation can come with a note on your findings for the users, 
> > just in case someone somewhere uses it and did not follow the user mailing 
> > list?
> >
> > Thank you
> > Ekaterina
> >
> > On Mon, 10 Jul 2023 at 9:47, Miklosovic, Stefan 
> >  wrote:
> >>
> >> Hi list,
> >>
> >> I want to ask about the future of CloudstackSnitch.
> >>
> >> This snitch was added 9 years ago (1). I contacted the original author of 
> >> that snitch, Pierre-Yves Ritschard, who is currently the CEO of the company 
> >> he coded that snitch for.
> >>
> >> In a nutshell, Pierre answered that he does not think this snitch is 
> >> relevant anymore and that the company now uses a different way to fetch 
> >> metadata from a node, rendering CloudstackSnitch, as is, irrelevant for 
> >> them.
> >>
> >> I also wrote an email to the user ML (2) about two weeks ago and nobody 
> >> answered that they are using it either.
> >>
> >> The current implementation uses this approach (3), but I think it is 
> >> already obsolete, because the snitch appends a path to the parsed metadata 
> >> service IP which is probably not present at all in the default 
> >> implementation of the Cloudstack data server.
> >>
> >> What also bothers me is that we, as a community, seem to not be able to 
> >> test the functionality of this snitch as I do not know anybody with a 
> >> Cloudstack deployment who would be able to test this reliably.
> >>
> >> For completeness, in (1), Brandon expressed his opinion that unless users 
> >> come forward for this snitch, he thinks retiring it is the best option.
> >>
> >> For all cloud-based snitches, we refactored the code in 16555 and we are 
> >> working on an improvement in 18438, which introduces a generic way to call 
> >> metadata services. Plugging in custom logic or reusing a default 
> >> implementation of a cloud connector is very easy, further making this 
> >> snitch less relevant.
> >>
> >> This being said, should we:
> >>
> >> 1) remove it in 5.0
> >> 2) keep it there in 5.0 but mark it @Deprecated
> >> 3) keep it there
> >>
> >> Regards
> >>
> >> (1) https://issues.apache.org/jira/browse/CASSANDRA-7147
> >> (2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
> >> (3) 
> >> https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns
> 


Re: [DISCUSS] When to run CheckStyle and other verifications

2023-07-10 Thread Josh McKenzie
>  • Remove the checkstyle dependency from "jar" and "test"
>  • Create a single "check" target that includes all the checks we expect to 
> pass in the CI (currently Checkstyle, RAT, and Eclipse-Warnings), making this 
> task the default.
+1 here.

(of note: I haven't forgotten the request from this thread to share my local 
env; I just got sidetracked by things and also realized how little I've 
actually modified locally, since I just run most of the linting against 
delta'ed files only to keep my changed work in compliance. It's still a very 
noisy mess when SpotBugs is run against the entire codebase proper)

On Mon, Jul 10, 2023, at 7:13 AM, Brandon Williams wrote:
> On Mon, Jul 10, 2023 at 6:07 AM Jacek Lewandowski
>  wrote:
> > Remove the checkstyle dependency from "jar" and "test"
> > Create a single "check" target that includes all the checks we expect to 
> > pass in the CI (currently Checkstyle, RAT, and Eclipse-Warnings), making 
> > this task the default.
> 
> I support this.  Having checkstyle run when building is clearly
> constant friction for many, even though you can disable it.
> 


Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-10 Thread Josh McKenzie
I'm personally not thinking about CircleCI at all; I'm envisioning a world 
where all of us have 1 CI *software* system (i.e. reproducible on any env) that 
we use for pre-commit validation, and then post-commit happens on reference ASF 
hardware.

So:
1: Pre-commit subset of tests (suites + matrices + env) runs. On green, merge.
2: Post-commit tests (all suites, matrices, env) runs. If failure, link back to 
the JIRA where the commit took place

Circle would need to remain in lockstep with the requirements for point 1 here.
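A rough sketch of the two-step flow above, assuming hypothetical types: 
`RunResult` and the `CiFlowSketch` methods are illustrative only and are not 
part of any existing CI tooling, and the ticket key and suite name in `main` 
are placeholders. It encodes only the gating logic: the pre-commit subset must 
be green to merge, and a post-commit failure reports back to the originating 
JIRA ticket.

```java
import java.util.List;

public class CiFlowSketch {
    // Hypothetical outcome of one CI run: the suites that failed, if any.
    record RunResult(List<String> failedSuites) {
        boolean green() {
            return failedSuites.isEmpty();
        }
    }

    // Step 1: the pre-commit subset (suites + matrices + env) gates the merge.
    static boolean mayMerge(RunResult preCommit) {
        return preCommit.green();
    }

    // Step 2: post-commit runs everything; a failure links back to the ticket.
    static String postCommitReport(String jiraKey, RunResult postCommit) {
        if (postCommit.green()) {
            return jiraKey + ": post-commit green";
        }
        return jiraKey + ": post-commit failed " + postCommit.failedSuites()
                + ", see https://issues.apache.org/jira/browse/" + jiraKey;
    }

    public static void main(String[] args) {
        String key = "CASSANDRA-00000"; // placeholder ticket key
        RunResult pre = new RunResult(List.of());
        RunResult post = new RunResult(List.of("example-suite")); // placeholder suite name
        if (mayMerge(pre)) {
            System.out.println(postCommitReport(key, post));
        }
    }
}
```

Whatever runs step 1 (Circle or otherwise) would only need to agree on what 
`mayMerge` checks; step 2 stays on the reference hardware.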

On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
> +1 to Josh which is exactly my line of thought as well. But that is only 
> valid if we have a solid Jenkins that will eventually run all test configs. 
> So I think I lost track a bit here. Are you proposing:
> 
> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD) config 
> of tests
> 
> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you in 
> case of problems?
> 
> Or something different, like having 1 also in Jenkins?
> 
> On 7/7/23 17:55, Andrés de la Peña wrote:
>> I think 500 runs combining all configs could be reasonable, since it's 
>> unlikely to have config-specific flaky tests. As in five configs with 100 
>> repetitions each.
>> 
>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>>> Maybe. Kind of depends on how long we write our tests to run doesn't it? :)
>>> 
>>> But point taken. Any non-trivial test would start to be something of a 
>>> beast under this approach.
>>> 
>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie  wrote:
>>>> > 3. Multiplexed tests (changed, added) run against all JDKs and a 
>>>> > broader range of configs (no-vnode, vnode default, compression, etc)
>>>> 
>>>> I think this is going to be too heavy...we're taking 500 iterations
>>>> and multiplying that by like 4 or 5?
>>>> 
>>> 


Re: Changing the output of tooling between majors

2023-07-08 Thread Josh McKenzie
 are
> 
> describecluster
> describering
> failuredetector
> gcstats
> getauditlog
> getauthcacheconfig
> getconcurrency
> getendpoints
> getfullquerylog
> getlogginglevels
> getseeds
> info
> listpendinghints
> netstats
> profileload (replacement of toppartition (which should be removed in 5.0, 
> actually))
> proxyhistograms
> rangekeysample
> repair
> repair_admin
> ring
> status
> statusautocompaction
> statusbinary
> statusgossip
> tablehistograms
> toppartitions
> viewbuildstatus
> 
> From these, if one asks which ones actually make sense to try to tweak the 
> output of, they might be
> 
> describecluster
> describering
> info
> listpendinghints
> netstats
> proxyhistograms
> repair_admin (if somebody wants to list stuff in json)
> ring
> status
> tablehistograms
> viewbuildstatus
> 
> The point I want to make is that I do not think the problem of changing the 
> output is too hot. There are at most about 15 commands for which the output 
> matters, because they have no CQL equivalent or JSON / YAML output.
> 
> If we have been providing CQL / JSON / YAML for a couple of years, I do not 
> believe that the argument "let's not break it for folks in nodetool" is still 
> relevant. CQL output has been there since 4.0 at least (at least!) and YAML / 
> JSON is also not something completely new. It is not like we are suddenly 
> forcing people to change their habits; there was enough time to update the 
> stuff to CQL / json / yaml etc ...
> 
> But really, the question I still don't have an answer for is who is actually 
> parsing the output. I think I'll ping the user ML to probe the situation a 
> little bit.
> 
> (1) https://gist.github.com/smiklosovic/3f4ea8ccae53ad503af13c53789815be
> (2) https://gist.github.com/smiklosovic/f9a681016c22e2dfe88c883b6881cb7c
> 
> 
> From: Josh McKenzie 
> Sent: Saturday, July 8, 2023 14:47
> To: dev
> Subject: Re: Changing the output of tooling between majors
> 
> 
> 
> Once there is, we are free to change the default output however we want.
> One thing I always try to keep in mind on discussions like this. A thought 
> experiment (with very hand-wavy numbers; try not to get hung up on them):
> 
> * Let's say there are 5,000 discrete "users" of C* out there (different 
> groups of people using the DB)
> * And assume 5% have written some kind of scripting / automation to parse our 
> tooling output (250)
> * And let's assume it'd take 18 developer hours (a few days at 6 hours/day) 
> to retool to the new output, validate and test correctness, and then roll it 
> out to qa, test, validate, and then to prod, test, validate
> 
> You're looking at 250 * 18 hours, 4,500 hours, 112.5 40-hour work weeks (2+ 
> years for some poor sod without vacations) worth of work from what seems to 
> be a simple change.
> 
> Now, that estimate could be off by an order of magnitude either way, but the 
> motion of the exercise is valuable, I think. There's a real magnified 
> downstream cost to our community when we make changes to APIs and we need to 
> weigh that against the cost to the project in terms of maintaining those 
> interfaces.
> 
> The above mental exercise really strongly applies to the periodic discussions 
> where we talk about deprecating JMX support.
> 
> Not saying we should or shouldn't change things here for the record, just 
> want to call this out for anyone that might not have been thinking about 
> things this way.
> 
> On Fri, Jul 7, 2023, at 3:23 PM, Brandon Williams wrote:
> On Fri, Jul 7, 2023 at 2:20 PM Miklosovic, Stefan
> <stefan.mikloso...@netapp.com> wrote:
> >
> > Great thanks. That might work.
> >
> > So we do not change the default output unless there is json / yaml 
> > equivalent.
> >
> > Once there is, we are free to change the default output however we want.
> 
> Yes, exactly.  Then we have the best of both worlds: programmatic
> access that isn't flimsy, and a pretty display however we want it.
> 
> 
> 


Re: Changing the output of tooling between majors

2023-07-08 Thread Josh McKenzie
> Once there is, we are free to change the default output however we want.
One thing I always try to keep in mind on discussions like this. A thought 
experiment (with very hand-wavy numbers; try not to get hung up on them):

* Let's say there are 5,000 discrete "users" of C* out there (different groups 
of people using the DB)
* And assume 5% have written some kind of scripting / automation to parse our 
tooling output (250)
* And let's assume it'd take 18 developer hours (a few days at 6 hours/day) to 
retool to the new output, validate and test correctness, and then roll it out 
to qa, test, validate, and then to prod, test, validate

You're looking at 250 * 18 hours, 4,500 hours, 112.5 40-hour work weeks (2+ 
years for some poor sod without vacations) worth of work from what seems to be 
a simple change.
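The same back-of-the-envelope arithmetic as a small sketch. The inputs are the 
hand-wavy assumptions from the thought experiment above, not measured data, and 
the class name is illustrative.

```java
import java.util.Locale;

public class DownstreamCostEstimate {
    // Assumed inputs from the thought experiment; not measured data.
    static final int USERS = 5_000;             // discrete groups using C*
    static final double SCRIPTING_SHARE = 0.05; // share parsing tooling output
    static final int HOURS_PER_RETOOL = 18;     // ~3 days at 6 hours/day
    static final int HOURS_PER_WORK_WEEK = 40;

    static int affectedUsers() {
        return (int) Math.round(USERS * SCRIPTING_SHARE);
    }

    static int totalHours() {
        return affectedUsers() * HOURS_PER_RETOOL;
    }

    static double workWeeks() {
        return (double) totalHours() / HOURS_PER_WORK_WEEK;
    }

    public static void main(String[] args) {
        // 52 work weeks per year, i.e. no vacations for our poor sod.
        System.out.println(String.format(Locale.ROOT,
                "%d users x %d h = %d h = %.1f work weeks (~%.1f years)",
                affectedUsers(), HOURS_PER_RETOOL, totalHours(), workWeeks(), workWeeks() / 52));
    }
}
```

Scaling `SCRIPTING_SHARE` or `HOURS_PER_RETOOL` by 10x in either direction shows 
how wide the "order of magnitude" band is.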

Now, that estimate could be off by an order of magnitude either way, but the 
motion of the exercise is valuable, I think. There's a real magnified 
downstream cost to our community when we make changes to APIs and we need to 
weigh that against the cost to the project in terms of maintaining those 
interfaces.

The above mental exercise *really strongly* applies to the periodic discussions 
where we talk about deprecating JMX support.

Not saying we should or shouldn't change things here for the record, just want 
to call this out for anyone that might not have been thinking about things this 
way.

On Fri, Jul 7, 2023, at 3:23 PM, Brandon Williams wrote:
> On Fri, Jul 7, 2023 at 2:20 PM Miklosovic, Stefan
>  wrote:
> >
> > Great thanks. That might work.
> >
> > So we do not change the default output unless there is json / yaml 
> > equivalent.
> >
> > Once there is, we are free to change the default output however we want.
> 
> Yes, exactly.  Then we have the best of both worlds: programmatic
> access that isn't flimsy, and a pretty display however we want it.
> 


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-07 Thread Josh McKenzie
This really is great work Maxim; definitely appreciate all the hard work that's 
gone into it and I think the users will too.

In terms of where it should land, we discussed this type of question at length 
on the ML a while ago and ended up codifying it in the wiki: 
https://cwiki.apache.org/confluence/display/CASSANDRA/Patching%2C+versioning%2C+and+LTS+releases

> When working on a ticket, use the following guideline to determine which 
> branch to apply it to (Note: See *How To Commit* for details
> on the commit and merge process)
> 
>  • Bugfix: apply to oldest applicable LTS and merge up through latest GA to 
> trunk
>    • In the event you need to make changes on the merge commit, merge with 
> *-s ours* and revise the commit via *--amend*
>  • Improvement: apply to *trunk only (next release)*
>• *Note: refactoring and removing dead code qualifies as an Improvement; 
> our priority is stability on GA lines*
>  • New Feature: apply to *trunk only (next release)*
> Our priority is to keep the 2 LTS releases and latest GA stable while 
> releasing new "latest GA" on a cadence that provides new improvements and 
> functionality to users soon enough to be valuable and relevant.
> 

So in this case, target whatever unreleased next feature release (i.e. SEMVER 
MAJOR || MINOR) we have on deck.

On Thu, Jul 6, 2023, at 1:21 PM, Ekaterina Dimitrova wrote:
> Hi,
> 
> First of all, thank you for all the work! 
> I personally think that it should be ok to add a new column.
> 
> I will be very happy to see this landing in 5.0. 
> I am personally against porting this patch to 4.1. To be clear, I am sure you 
> did a great job and my response would be the same to every single person - 
> the configuration is quite wide-spread and the devil is in the details. I do 
> not see a good reason for an exception here except convenience. There is no 
> feature flag for these changes too, right?
> 
> Best regards,
> Ekaterina
> 
> On Thursday, 6 July 2023, Miklosovic, Stefan 
> wrote:
>> Hi Maxim,
>> 
>> I went through the PR and added my comments. I think David also reviewed it. 
>> All points you mentioned make sense to me but I humbly think it is necessary 
>> to have at least one additional pair of eyes on this as the patch is 
>> relatively impactful.
>> 
>> I would like to see an additional column in system_views.settings named 
>> "mutable" of type "boolean" to see which fields I am actually allowed to 
>> update as an operator.
>> 
>> It seems to me you agree with the introduction of this column (1) but there 
>> is no clear agreement on where we actually want to put it. You want this 
>> whole feature to be committed to the 4.1 branch as well, which is an 
>> interesting proposal. I was thinking that this work would go to 5.0 only. I 
>> am not completely sure it is necessary to backport this feature, but your 
>> argumentation here (2) is worth discussing further.
>> 
>> If we introduce this change to 4.1, that field would not be there but in 5.0 
>> it would. So that way we will not introduce any new column to 
>> system_views.settings.
>> We could also go with the introduction of this column to 4.1 if people are 
>> ok with that.
>> 
>> For simplicity, I am slightly leaning towards introducing this feature 
>> to 5.0 only.
>> 
>> (1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
>> (2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041
>> 
>> 
>> From: Maxim Muzafarov 
>> Sent: Friday, June 23, 2023 13:50
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change 
>> running configuration
>> 
>> 
>> 
>> 
>> Hello everyone,
>> 
>> 
>> As there is a lack of feedback on which option to go with, and having 
>> discussed the pros and cons of each option, I tend to agree with 
>> the vision of this problem proposed by David :-) After a lot of 
>> discussion on Slack, we came to the @ValidatedBy annotation, which 
>> points to a validation method of a property; this addresses all 
>> our concerns and issues with validation.
>> 
>> I'd like to raise the visibility of these changes and try to find one
>> more committer to look at them:
>> https://issues.apache.org/jira/browse/CASSANDRA-15254
>> https://github.com/apache/cassandra/pull/2334/files
>> 
>> I'd really appreciate any kind of review. Thanks in advance.
>> 
>> 
>> Despite the number of changes +2,043 −302 and the fact that most of
>> these additions are related to the tests themselves, I would like to
>> highlight the crucial design points which are required to make the
>> SettingsTable virtual table updatable. Some of these have already been
>> discussed in this thread, and I would like to provide a brief 

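A minimal sketch of the validate-then-apply idea behind the `@ValidatedBy` 
annotation and the boolean `mutable` column discussed in the settings-table 
thread above. Everything here is hypothetical: the annotation, the `Config` 
fields, and the `update` helper are illustrative stand-ins, and the real 
annotation in the linked PR may take different parameters. A setting counts as 
mutable only if it names a validator, and that validator runs before the new 
value is written.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class SettingsUpdateSketch {
    // Hypothetical stand-in for the proposed annotation: names a static validator method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ValidatedBy {
        String value();
    }

    // Hypothetical config holder; only annotated fields are treated as updatable.
    static class Config {
        @ValidatedBy("validateThroughput")
        int throughputMiB = 64;

        int tokenCount = 16; // no validator, so reported as not mutable

        static void validateThroughput(int value) {
            if (value < 0) {
                throw new IllegalArgumentException("throughputMiB must be >= 0, got " + value);
            }
        }
    }

    // What a boolean "mutable" column could report for a setting.
    static boolean isMutable(String name) {
        try {
            return Config.class.getDeclaredField(name).isAnnotationPresent(ValidatedBy.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("unknown setting " + name, e);
        }
    }

    // Validate first, then apply, so a rejected value never reaches the live config.
    static void update(Config config, String name, int value) {
        if (!isMutable(name)) {
            throw new UnsupportedOperationException(name + " is not mutable");
        }
        try {
            Field field = Config.class.getDeclaredField(name);
            Method validator = Config.class.getDeclaredMethod(
                    field.getAnnotation(ValidatedBy.class).value(), int.class);
            validator.setAccessible(true);
            validator.invoke(null, value);
            field.setAccessible(true);
            field.setInt(config, value);
        } catch (InvocationTargetException e) {
            // Rethrow the validator's own exception, e.g. IllegalArgumentException.
            if (e.getCause() instanceof RuntimeException re) {
                throw re;
            }
            throw new IllegalStateException(e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Config config = new Config();
        update(config, "throughputMiB", 128);
        System.out.println("throughputMiB=" + config.throughputMiB
                + " mutable(tokenCount)=" + isMutable("tokenCount"));
    }
}
```

Pointing at a validator method, rather than encoding constraints in the 
annotation itself, keeps arbitrary validation logic next to the property it 
guards.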
Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-07 Thread Josh McKenzie
Maybe. Kind of depends on how long we write our tests to run doesn't it? :)

But point taken. Any non-trivial test would start to be something of a beast 
under this approach.
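To put numbers on Brandon's concern quoted below, a quick sketch: giving each 
of 4 or 5 configs the full 500 iterations multiplies the runs per changed test, 
whereas Andrés's counter-proposal elsewhere in this thread keeps a 500-run 
budget split across five configs (100 each). Class and method names are 
illustrative.

```java
public class MultiplexBudget {
    // Every config gets the full iteration count: cost grows linearly with configs.
    static int runsWithFullIterationsPerConfig(int iterations, int configs) {
        return iterations * configs;
    }

    // A fixed total budget split evenly across configs (integer division).
    static int runsPerConfigFromBudget(int totalBudget, int configs) {
        return totalBudget / configs;
    }

    public static void main(String[] args) {
        int iterations = 500;
        for (int configs = 4; configs <= 5; configs++) {
            System.out.println(iterations + " iterations x " + configs + " configs = "
                    + runsWithFullIterationsPerConfig(iterations, configs) + " runs");
        }
        System.out.println("500-run budget over 5 configs = "
                + runsPerConfigFromBudget(500, 5) + " runs per config");
    }
}
```

The total cost per test is then still proportional to how long that test runs, 
which is the point about how long we write our tests to run.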

On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie  wrote:
> > 3. Multiplexed tests (changed, added) run against all JDKs and a broader 
> > range of configs (no-vnode, vnode default, compression, etc)
> 
> I think this is going to be too heavy...we're taking 500 iterations
> and multiplying that by like 4 or 5?
> 

