Re: Welcome Chris Bannister, James Hartig, Jackson Flemming and João Reis, as cassandra-gocql-driver committers

2024-09-12 Thread Dinesh Joshi
Congratulations, everyone!

On Thu, Sep 12, 2024 at 4:40 AM Mick Semb Wever  wrote:

> The PMC's members are pleased to announce that Chris Bannister, James
> Hartig, Jackson Flemming and João Reis have accepted invitations to
> become committers on the Drivers subproject.
>
> Thanks a lot for everything you have done with the gocql driver all these
> years.  We are very excited to see the driver now inside the Apache
> Cassandra project.
>
> Congratulations and welcome!!
>
> The Apache Cassandra PMC
>


Re: [DISCUSS] CASSANDRA-13704 Safer handling of out of range tokens

2024-09-12 Thread Dinesh Joshi
My 2c are below –

We have a patch that is preventing a known data loss issue. People may or
may not know they're suffering from this issue so this should go in all
supported versions of Cassandra with it enabled by default. Will this cause
issues for operators? Sure. Is it worth keeping this feature off to avoid
issues for operators? No. We can mitigate any upgrade related issues by
putting in a warning in the release notes.

I'm +1 on this patch landing in all supported branches of Cassandra and
the feature being on by default with adequate warnings for the operator.

Thanks,

Dinesh
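
The reject-versus-log choice debated in this thread can be illustrated with a minimal sketch. This is not the CASSANDRA-13704 patch: the integer ring, the `(start, end]` range convention, and the names `owns_token` and `handle_write` are all assumptions made for illustration.

```python
from typing import List, Tuple

# A token range (start, end]; start >= end means the range wraps around the ring.
Range = Tuple[int, int]


def owns_token(token: int, owned: List[Range]) -> bool:
    """Return True if the token falls inside any owned (start, end] range."""
    for start, end in owned:
        if start < end:
            if start < token <= end:
                return True
        else:
            # Wrapping range such as (90, 5]; start == end covers the whole ring.
            if token > start or token <= end:
                return True
    return False


def handle_write(token: int, owned: List[Range], reject: bool, log: List[str]) -> bool:
    """Return True if the write is accepted, False if it is rejected.

    Out-of-range writes are always logged; they are refused only when
    `reject` is True (the "rejection by default" option in the thread).
    """
    if owns_token(token, owned):
        return True
    log.append(f"out-of-range token {token}")
    return not reject
```

With `reject=False` the write is still applied and only the log entry records the problem, which is the situation Caleb describes: the operator is alerted after the data has already been mislocated.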



On Thu, Sep 12, 2024 at 12:56 PM Josh McKenzie  wrote:

> I'd like to propose we treat all data loss bugs as "fix by default on all
> supported branches even if that might introduce user-facing changes".
>
> Even if only N of M people on a thread have experienced it.
> Even if we only uncover it through testing (looking at you Harry).
>
> My gut tells me this is something we should have a clear cultural value
> system around as a project, and that value system should be "Above all
> else, we don't lose data". Just because users aren't aware it might be
> happening doesn't mean it's not a *massive* problem.
>
> I would bet good money that there are *a lot* of user-felt pains using
> this project that we're all unfortunately insulated from.
>
> On Thu, Sep 12, 2024, at 3:35 PM, Mick Semb Wever wrote:
>
> Great that the discussion explores the issue as well.
>
> So far we've heard three* companies being impacted, and four times in
> total…?  Info is helpful here.
>
> *) Jordan, you say you've been hit by _other_ bugs _like_ it.  Jon, I'm
> assuming the company you refer to doesn't overlap. JD we know it had
> nothing to do with range movements and could/should have been prevented far
> simpler with operational correctness/checks.
>
> In the extreme, when no writes have gone to any of the replicas, what
> happened ? Either this was CL.*ONE, or it was an operational failure (not
> C* at fault).  If it's an operational fault, both the coordinator and the
> node can be wrong.  With CL.ONE, just the coordinator can be wrong and the
> problem still exists (and with rejection enabled the operator is now more
> likely to ignore it).
>
> WRT the remedy, is it not to either run repair (when 1+ replica has
> it), or to load flushed and recompacted sstables (from the period in
> question) to their correct nodes?  This is not difficult, but it is
> understandably time-intensive and costs sleep.
>
> Neither of the above two points I feel are that material to the outcome,
> but I think it helps keep the discussion on track and informative.   We
> also know there are many competent operators out there that do detect data
> loss.
>
>
>
> On Thu, 12 Sept 2024 at 20:07, Caleb Rackliffe 
> wrote:
>
> If we don’t reject by default, but log by default, my fear is that we’ll
> simply be alerting the operator to something that has already gone very
> wrong that they may not be in any position to ever address.
>
> On Sep 12, 2024, at 12:44 PM, Jordan West  wrote:
>
> 
> I’m +1 on enabling rejection by default on all branches. We have been bitten
> by silent data loss (due to other bugs like the schema issues in 4.1) from
> lack of rejection on several occasions, and short of writing extremely
> specialized tooling it's unrecoverable. While both lack of availability and
> data loss are critical, I will always pick lack of availability over data
> loss. It's better to fail a write that will be lost than silently lose it.
>
> Of course, a change like this requires very good communication in NEWS.txt
> and elsewhere, but I think it's well worth it. While it may surprise some
> users, I think they would be more surprised that they were silently losing
> data.
>
> Jordan
>
> On Thu, Sep 12, 2024 at 10:22 Mick Semb Wever  wrote:
>
> Thanks for starting the thread Caleb, it is a big and impacting patch.
>
> Appreciate the criticality, in a new major release rejection by default is
> obvious.   Otherwise the logging and metrics is an important addition to
> help users validate the existence and degree of any problem.
>
> Also worth mentioning that rejecting writes can cause degraded
> availability in situations that pose no problem.  This is a coordination
> problem on a probabilistic design, it's choose your evil: unnecessary
> degraded availability or mislocated data (eventual data loss).   Logging
> and metrics makes alerting on and handling the data mislocation possible,
> i.e. avoids data loss with manual intervention.  (Logging and metrics also
> face the same problem with false positives.)
>
> I'm +0 for rejection default in 5.0.1, and +1 for only logging default in
> 4.x
>
>
> On Thu, 12 Sept 2024 at 18:56, Jeff Jirsa  wrote:
>
> This patch is so hard for me.
>
> The safety it adds is critical and should have been added a decade ago.
> Also it’s a huge patch, and touches “everything”.
>
> It definitely belongs in 5.0. I’d probably reject by default in 5.0.1.
>
> 4.0 

Re: Welcome Jordan West and Stefan Miklosovic as Cassandra PMC members!

2024-08-30 Thread Dinesh Joshi
Congratulations to both of you. Thank you for being valuable
members of the Apache Cassandra community and all the hard work you
have put in over the years!

Dinesh

On Fri, Aug 30, 2024 at 1:21 PM Jon Haddad  wrote:
>
> The PMC's members are pleased to announce that Jordan West and Stefan 
> Miklosovic have accepted invitations to become PMC members.
>
> Thanks a lot, Jordan and Stefan, for everything you have done for the project 
> all these years.
>
> Congratulations and welcome!!
>
> The Apache Cassandra PMC


Re: 【DISCUSS】The configuration of Commitlog archiving

2024-08-30 Thread Dinesh Joshi
Thanks for bringing this to the mailing list. I quickly skimmed the
feature and I agree with you that having an arbitrary command executed
could be dangerous. However, this is a 12-year-old feature, so I am
guessing there are people using it.

As far as locking down the feature, I don't think it is feasible to
lock it down as it allows execution of arbitrary scripts.

Dinesh

On Fri, Aug 30, 2024 at 9:14 AM guo Maxwell  wrote:
>
> Commitlog can archive its log files; see CommitLogArchiver.java. Archiving
> and restoring the commitlog is configured through archive_command and
> restore_command in commitlog_archiving.properties. Both can be any
> linux/unix shell command. However, I found that the command can actually be
> any script, even "rm -rf". I tested this, and my test file was indeed
> deleted.
>
> Personally, I think this is dangerous behavior, because there are no
> system-level restrictions and users are allowed to do anything in these shell
> commands. So I want to discuss whether it is necessary to impose restrictions
> on their use, or whether we need a new way of archiving/restoring commitlog.
>
> Of course, before that, I would also like to ask, how many people are using 
> archive and restore of commitlog? It seems that the commitlog archive code 
> has not been updated for a long time.
>
> I have two ideas.
> One is to restrict the command based on the existing usage, such as strictly
> allowing only the current cp/mv/ln %path to %name form. Other redundant
> strings in the command would not be allowed.
> The other comes from a rough look at archiving in mysql and pg. They do not
> give users much freedom (I mean letting users define their own archiving
> command) and archive directly to a designated location. For us, I feel we
> can follow C*'s incremental backup of sstables and add a hardlink to the
> commitlog in the specified location. This may change the original
> configuration method, for example by setting the node's archive and restore
> locations through nodetool and deprecating the
> commitlog_archiving.properties configuration.
>
> I am just putting forward some views  here, and looking forward to your 
> feedback. 😀
>
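
The first idea in the quoted message (only allowing the cp/mv/ln %path to %name form) could look roughly like the sketch below. It is an illustration under assumptions, not an existing Cassandra check: the function name `is_allowed_archive_command` and the exact pattern are hypothetical, and a real restriction would also need to consider things such as `..` in the destination path.

```python
import re

# Hypothetical allowlist: an optional absolute directory prefix on the binary,
# one of cp/mv/ln, the %path placeholder, and an absolute destination that
# ends in the %name placeholder. Nothing else may appear in the command.
_ALLOWED = re.compile(r"(?:/[\w./-]+/)?(?:cp|mv|ln) %path /[\w./-]*%name")


def is_allowed_archive_command(command: str) -> bool:
    """Return True only if the whole command matches the allowlisted form."""
    return _ALLOWED.fullmatch(command.strip()) is not None
```

Because the whole string must match, anything appended after `%name` (for example `; rm -rf /`) causes the command to be refused.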


Welcome Doug Rohrer as Cassandra Committer

2024-08-23 Thread Dinesh Joshi
The Apache Cassandra PMC is thrilled to announce that Doug Rohrer has
accepted the invitation to become a committer!

Doug has worked on several aspects of Cassandra, Sidecar, and
Analytics. Congratulations and welcome!

The Apache Cassandra PMC members


Re: Implement details of Protocol v5 framing

2024-08-14 Thread Dinesh Joshi
Hi Vincent,

This is the Cassandra users' mailing list. You can engage the Cassandra
developers on dev@cassandra.apache.org mailing list. I have bcc'd the user
list and added the dev list.

To answer your question, the spec may have drifted and may require a fix.
Please feel free to raise a jira and contribute a patch to the
documentation.

On a side note, why are you implementing your own driver?

Thanks,

Dinesh

On Sun, Aug 11, 2024 at 8:29 AM Vincent Rischmann 
wrote:

> Hello,
>
> this may not be the best place to ask this, feel free to redirect me.
>
> I'm working on writing a Cassandra client in my spare time and am
> currently implementing the framing that has been added in protocol v5.
>
> I followed the spec available here:
> https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L97
> but I hit an issue regarding the CRC32: when I tried to decode a frame
> generated by cqlsh (which I captured using wireshark), I couldn't get the
> right checksum.
>
> After debugging for a while I realized that the CRC32 hash is always
> initialized with 4 "magic" bytes:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/Crc.java#L38-L54
>
> Shouldn't this be added to the specification ?
>
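
The behaviour Vincent describes, a CRC32 that is fed some fixed bytes before the payload, can be sketched as follows. This is only an illustration of the technique: the function name is hypothetical, and the actual four byte values must be taken from the linked Crc.java, so they are passed in as a parameter here rather than hard-coded.

```python
import zlib


def crc32_with_initial_bytes(payload: bytes, initial_bytes: bytes) -> int:
    """CRC32 over initial_bytes followed by payload, as an unsigned 32-bit int.

    Feeding the initial bytes first and then continuing with the payload is
    equivalent to checksumming the concatenation of the two.
    """
    crc = zlib.crc32(initial_bytes)
    return zlib.crc32(payload, crc) & 0xFFFFFFFF
```

A client that checksums only the payload gets a different value, which would explain the mismatch seen when decoding the captured frame.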


Re: [VOTE] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-08-07 Thread Dinesh Joshi
+1

I think this is a low risk change which is going to avoid duplication of
effort so it is worth it.

On Sat, Aug 3, 2024 at 11:19 PM Yifan Cai  wrote:

> Hi,
>
> I am proposing backporting CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0.
>
> There is a discussion thread on the topic. In summary, the backport would
> benefit Cassandra Analytics by
> providing a unified solution, and the patch is considered low-risk. While
> there are concerns about adding features to 4.0 and 4.1, there is generally
> support for 5.0.
>
> The vote will be open for 72 hours (longer if needed). Votes by PMC
> members are considered binding. A vote passes if there are at least three
> binding +1s and no -1's.
>
> Kind regards,
> Yifan Cai
>


Re: [DISCUSS] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-07-31 Thread Dinesh Joshi
On Tue, Jul 30, 2024 at 11:47 AM Mick Semb Wever  wrote:

> This also incentivises intentionally not introducing support for that api
> in older mainlines.  We KISS, if the user wants that ecosystem benefit they
> need to upgrade to at least mainline X.
>
> Once older mainlines have it then we have this problem.  An alternative to
> the risk of having to always update all the mainlines, is to let the
> ecosystem branch to provide support for the different mainlines as/when
> needed.  Both are painful.
>

In order to support upgrades where clusters may be running the current
major - 1 version, we need to support two versions of Cassandra in the
Analytics library or Sidecar (due to rolling upgrades). So it makes sense
to backport to the current major and the prior major release.


Re: [DISCUSS] Backport CASSANDRA-19800 to Cassandra-4.0, 4.1 and 5.0

2024-07-30 Thread Dinesh Joshi
Any change we bring to stable releases, even if it is a non-user-facing
change, brings the possibility of introducing bugs and/or unintended
side effects. Therefore it is important to carefully consider the trade-offs
when we make changes to older releases. We also have a precedent for
making changes to older releases for testing.

In this particular case Yifan's patch is touching a fairly isolated class
and presents a low risk. The benefit here is that the analytics sub-project
would avoid additional workarounds for Cassandra 4 & 4.1. I am fine with
granting an exception for backporting to 4.1 and 4.0 in this case.

thanks,

Dinesh

On Tue, Jul 30, 2024 at 9:52 AM Yifan Cai  wrote:

> Here is my 2 cents. Maybe we need to differentiate the user-facing
> improvements and ecosystem-internal improvements, or have a discussion
> about it.
> I guess when the current policy of "improvements and new features on trunk
> only" was made, it was to target the user-facing improvements. The internal
> changes are not exposed to cassandra users directly.
> As Josh pointed out, with more projects (sidecar and analytics have
> dependency on cassandra public interface) in the ecosystem, we are more
> likely to encounter the scenarios where we want to modify the mainline
> branches for integration purposes.
>
> The downside of preventing the integration updates to the older branches
> is having different solutions per Cassandra version in the other projects
> under the Cassandra umbrella. It is a maintenance pain and potentially
> causes errors. It is my original motivation of backporting the patch to the
> other branches.
>
> - Yifan
>
> On Tue, Jul 30, 2024 at 6:04 AM Josh McKenzie 
> wrote:
>
>> Some thoughts:
>>
>>    1. Most of our PMC votes are majority-based, not binding -1. So Jeff
>>    being -1 doesn't mean the whole PMC being -1. So don't take his -1 as
>>    being a show stopper or indicative of everyone on the PMC (and don't
>>    take me saying this as the converse ;))
>>    2. I expect we have a lot of debt when it comes to our ecosystem
>>    integrations on older branches. Bringing those projects into the ASF
>>    umbrella and into the project ecosystem is at odds with a hard policy of
>>    "we don't add improvements or new features to old branches",
>>    *specifically* in cases like this where the desire is to get uniform
>>    support for ecosystem projects across all supported branches of C*
>>    3. We're moving into a world where we will likely more frequently
>>    modify the mainline branch with new functionality to integrate with
>>    ecosystem changes (sidecar, analytics, drivers?). It's probably at least
>>    worth a conversation as to whether our current policy (improvements and
>>    new features main branch only) is optimal across everything equally or
>>    if there should be nuance for ecosystem integrations.
>>    4. To Jeff's point: everyone is always going to have some minor
>>    improvement they'd like to back-port to older branches.
>>
>> I haven't thought deeply enough about this specific situation to have a
>> well formed opinion, but figured calling out the above things is worth
>> doing. This probably won't be the last time we look at our supported
>> branches and have some pain we'd like to address based on the inconsistent
>> ecosystem support and API piece across them.
>>
>> On Mon, Jul 29, 2024, at 1:32 PM, Yifan Cai wrote:
>>
>> It sounds like we are all good with backporting to 5.0.
>>
>> Thank you all for the feedback.
>>
>> - Yifan
>>
>> On Fri, Jul 26, 2024 at 12:21 PM Jeff Jirsa  wrote:
>>
>>
>>
>>
>> On Jul 26, 2024, at 11:09 AM, Yifan Cai  wrote:
>>
>> 
>> Thanks Jeff for restating the policy.
>>
>> According to the release lifecycle doc
>>
>>
>>- Missing features from newer generation releases are back-ported on
>>per - PMC vote basis.
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>>
>> We do not have a policy to prevent new features strictly for the branches
>> in maintenance state.
>>
>> IMO, the patch qualifies as the missing feature. (As said, it is useful
>> for Cassandra Analytics, and it is good to have the same bridge
>> implementation amongst different cassandra versions)
>>
>> Therefore, I would like to call for a vote.
>>
>>
>> Sure
>>
>> I’m -1 on 4.0 and 4.1
>>
>> - Jeff
>>
>>
>> On Fri, Jul 26, 2024 at 10:25 AM Jeff Jirsa  wrote:
>>
>> Everyone has a low risk change they want to backport to every branch, 4.0
>> and 4.1 in particular are way past the point we should be adding features
>>
>> The policy exists and it’s a pure feature not a regression
>>
>>
>>
>>
>>
>> > On Jul 26, 2024, at 9:59 AM, Brandon Williams  wrote:
>> >
>> > Given how low risk this is, I don't see an issue with backporting it
>> > and I'm sure the usefulness outweighs what risk there is. +1 (5.0.1
>> > though, not 5.0.0)
>> >
>> > Kind Regards,
>> > Brandon
>> >
>> >> On Fri, Jul 26, 2024 at 11:52 AM Yifan Cai  wrote:
>> 

Re: Welcome Joey Lynch as Cassandra PMC member

2024-07-24 Thread Dinesh Joshi
Congratulations, Joey!

On Wed, Jul 24, 2024 at 12:26 PM Patrick McFadin  wrote:

> Every once in a while you see an announcement and think "Wasn't that
> already a thing?" and then realize it wasn't. This is one of those times.
> Congratulations Joey on the thing I thought you already were so clearly you
> deserve this!
>
>
> On Wed, Jul 24, 2024 at 11:57 AM David Capwell  wrote:
>
>> Congrats!
>>
>> On Jul 24, 2024, at 10:41 AM, Jon Haddad  wrote:
>>
>> Congrats Joey!
>>
>> On Wed, Jul 24, 2024 at 9:56 AM Francisco Guerrero 
>> wrote:
>>
>>> Congrats, Joey!
>>>
>>> On 2024/07/24 14:12:11 Benjamin Lerer wrote:
>>> >  The PMC's members are pleased to announce that Joey Lynch has accepted
>>> > the invitation to become a PMC member.
>>> >
>>> > Thanks a lot, Joey, for everything you have done for the project all
>>> > these years.
>>> >
>>> > Congratulations and welcome
>>> >
>>> > The Apache Cassandra PMC members
>>> >
>>>
>>
>>


Re: [DISCUSS] Replace airlift/airline library with Picocli

2024-07-20 Thread Dinesh Joshi
On Tue, Jul 16, 2024 at 8:48 AM Jeff Jirsa  wrote:

> if it’s unmaintained, let’s remove it before we’re doing it on fire.
>

+1

Fire drills are never pleasant.

CLI parsing isn't a huge area of personal interest to me. However, it
presents a non-trivial attack surface as input processing is a ripe target
for vulnerabilities. I don't know if there are vulnerabilities lying around
in hiding but if / when they are reported we will need to address them
outside of the library or migrate to a maintained library at that time.
Neither option is very appealing at that point. So I am of the opinion we
should transition to a maintained library with healthy community support.
Both picocli and commons-cli have good adoption and community around them.


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-08 Thread Dinesh Joshi
I agree about picking libraries on their merit, but a major factor any
open source project should consider today is the possibility of
unfavorable/hostile licensing changes.

On Mon, Jul 8, 2024 at 1:15 PM Jon Haddad  wrote:

> Without getting into the pros and cons of both libraries, I have to point
> out there's something unsettling about making decisions about libraries we
> used based on arbitrary rules an employer has put into place on its
> employees.  The project isn't governed by Apple, it's governed by
> individual contributors to open source.
>
> We need to pick libraries based on their merits.  Apple's draconian rules
> should not prevent us from using the best option available.
>
> Jon
>
>
> On Mon, Jul 8, 2024 at 12:57 PM Dinesh Joshi  wrote:
>
>> I agree, having a DISCUSS thread with a specific subject line is less
>> likely to be overlooked.
>>
>> One thing I'd like to note here is PicoCLI and Airline 2 are independent
>> projects that are ALv2 licensed. A subset of the Cassandra contributors may
>> have difficulty contributing to such projects due to preexisting policies
>> that their employers may have in place.
>>
>> I am concerned about hostile licensing changes in the future which will
>> necessitate another migration for us. That said, is there a specific reason
>> to not consider Apache Commons CLI[1]?
>>
>> Dinesh
>>
>> [1] https://commons.apache.org/proper/commons-cli/
>>
>> On Mon, Jul 8, 2024 at 10:22 AM David Capwell  wrote:
>>
>>> I don't think that a separate thread would add extra visibility
>>>
>>>
>>> Disagree.  This thread is about adding a feature branch, so many could
>>> ignore if they don’t care.  The fact you are switching the library (and
>>> which one) is something we have to hunt for.  By having a new DISCUSS
>>> thread it makes it clear which library you wish to add, and people can sign
>>> off if they care or not.
>>>
>>> I wouldn’t create this thread until you settle on which one you wish to
>>> move forward with.
>>>
>>> Is adding the PicoCLI library as a project dependency getting any objections
>>> from the Community?
>>>
>>>
>>> That's the point of the new DISCUSS thread.  By being very clear you wish
>>> to add PicoCLI people can either validate we are allowed to, or raise any
>>> objections.  I have not really seen any pushback so far outside of 1 case
>>> that wasn’t legally allowed to be used.
>>>
>>> Take a look at previous threads about adding different libraries.
>>>
>>> On Jul 8, 2024, at 7:58 AM, Caleb Rackliffe 
>>> wrote:
>>>
>>> +1 on picocli
>>>
>>> RE the feature branch, I would just maintain the feature branch in your
>>> own fork to break out whatever "reviewable units" of code you want. When
>>> all the incremental review is done (I have no problem going back and
>>> forth), squash everything together, do whatever additional testing you
>>> need, and commit.
>>>
>>> On Fri, Jul 5, 2024 at 10:40 AM Maxim Muzafarov 
>>> wrote:
>>>
>>>> > Once you are happy with your chosen library, we need a DISCUSS thread
>>>> to add this new library (current protocol).
>>>>
>>>> Thanks, David. This is a good point, do we need a separate DISCUSS
>>>> thread or can we just use this one? I'm in favour of keeping the
>>>> discussion in one place, especially when topics are closely related. I
>>>> don't think that a separate thread would add extra visibility, but if
>>>> that is the way the community has adopted - no problem at all, I'll
>>>> repost.
>>>>
>>>>
>>>> The reasons for replacing the Airlift/Airline [1] with the PicoCli [2]
>>>> are as follows (in order of priority):
>>>>
>>>> 1. The library is under the Apache-2.0 License
>>>> https://github.com/remkop/picocli?tab=Apache-2.0-1-ov-file#readme
>>>>
>>>> 2. The project is active and well-maintained (last release on 8 May
>>>> 2024)
>>>> https://github.com/remkop/picocli/releases
>>>>
>>>> 3. The library has ZERO dependencies, in some of the cases a single
>>>> file can just be dropped into the sources (it's even pointed out in
>>>> the documentation)
>>>> https://picocli.info/#_add_as_source
>>>>
>>>> 4. Compared to the Airlift library, the PicoC

Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-08 Thread Dinesh Joshi
s don’t.  This
>> actually will slow you down as each commit now must be a JIRA, you go
>> through review of each, must show a success CI, etc.
>> >
>> > Now, if you wish to split this into multiple steps that is fine, but
>> the list of places is basically node tool (kinda has to go in at once) and
>> small CLIs.  If you wish to migrate the small ones in isolation first, I am
>> cool with that merging to w/e branch the logic is targeting, but you won’t
>> be able to break up node tool without breaking everything… but if you did
>> this in your own fork then no one cares.
>> >
>> > I won’t block a feature branch, but just don’t see a clear “why” and
>> only see cons.
>> >
>> > We are changing the command markup library, so there are two extra
>> > things to be checked:
>> > - We parse CLI arguments in the same way (as the parser is different
>> > in a new library);
>> > - The command help output is the same so that the user won't see any
>> difference;
>> >
>> >
>> > Personally I would POC a limited node tool change with JVM dtest as we
>> require passing the output to the test (the prototypes you listed don’t
>> include JVM Dtest integration).  If one library makes this more annoying,
>> then do we care about fancy new features we don’t use when it makes the
>> features we do use harder?  If you start with the smaller tools first then
>> spend a ton of time migrating node tool then find JVM dtest is broken, then
>> you will spend so much more time fixing this, I would strongly recommend
>> doing some throw away POC to make sure w/e way you go won’t break JVM
>> Dtest’s node tool support.
>> >
>> > Once you are fine with your selected library, we will need a DISCUSS
>> thread to add that new library (current protocol).  This mostly just makes
>> the pick more visible, and normally we only check simple things like “are
>> we legally allowed to use” and “is this project dead?”.
>> >
>> >
>> > On Jul 3, 2024, at 6:06 AM, Maxim Muzafarov  wrote:
>> >
>> > Thank you all for your comments,
>> >
>> > I want to stress, that these changes won't affect the input/output
>> > formatting of commands, ensuring everything is the same.
>> >
>> > We are changing the command markup library, so there are two extra
>> > things to be checked:
>> > - We parse CLI arguments in the same way (as the parser is different
>> > in a new library);
>> > - The command help output is the same so that the user won't see any
>> difference;
>> >
>> > Additional tests cover both cases.
>> >
>> > On Mon, 1 Jul 2024 at 20:08, Dinesh Joshi  wrote:
>> >
>> >
>> > I don't personally think there is a strong need for a feature branch.
>> If it makes it easy for you, please go ahead with a feature branch.
>> >
>> > One thing I had raised in the past was the desire to have a flag that
>> would generate machine readable output for nodetool commands. If this can
>> be done with a minor incremental effort, it would definitely reduce the
>> burden on operators / integrations that rely on the nodetool output. As I
>> have indicated in the past, relying on human readable output for
>> CLI tools like nodetool is fragile and providing a JSON output as an
>> alternative is a great first step in eliminating that dependency. I'm just
>> curious about the level of effort. If it is too much or too invasive, we
>> can consider producing JSON output for inclusion in the next major release.
>> >
>> > On Fri, Jun 28, 2024 at 6:47 AM Maxim Muzafarov 
>> wrote:
>> >
>> >
>> > Hello everyone,
>> >
>> >
>> > The nodetool relies on the airlift/airline library to mark up the CLI
>> > commands used to manage Cassandra, which are part of our public API.
>> > This library is no longer maintained, so we need to update it anyway,
>> > and the good news is that we already have several good alternatives:
>> > airline-2 [3] or picocli [2].
>> >
>> > In this message, I'm mainly talking about CASSANDRA-17445 [4], which
>> > refers to the problem and is a prerequisite for a larger CEP-38 CQL
>> > Management API [5]. It doesn't make sense to use annotations from the
>> > deprecated library to build a new API, so this is another reason to
>> > update the library as soon as possible and do some inherently small
>> > code refactoring required for the CEP-38.
&

Re: [VOTE] CEP-42: Constraints Framework

2024-07-01 Thread Dinesh Joshi
+1

On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg  wrote:

> Hi,
>
> I am +1 on CEP-42 with the latest updates to the CEP to clarify syntax,
> error messages, constraint naming and generated naming, alter/drop,
> describe etc.
>
> I think this now tracks very closely to how other SQL databases define
> constraints and the syntax is easily extensible to multi-column and
> multi-table constraints.
>
> Ariel
>
> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>
> With all the feedback that came in the discussion thread after the call
> for votes, I’d like to extend the period another 72 hours starting today.
>
> As before, a vote passes if there are at least 3 binding +1s and no
> binding vetoes.
>
> Thanks,
> Bernardo Botella
>
> On Jun 24, 2024, at 7:17 AM, Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
> Hi everyone,
>
> I would like to start the voting for CEP-42.
>
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
> Discussion:
> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>
> The vote will be open for 72 hours. A vote passes if there are at least 3
> binding +1s and no binding vetoes.
>
> Thanks,
> Bernardo Botella
>
>
>


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-01 Thread Dinesh Joshi
I don't personally think there is a strong need for a feature branch. If it
makes it easy for you, please go ahead with a feature branch.

One thing I had raised in the past was the desire to have a flag that would
generate machine readable output for nodetool commands. If this can be done
with a minor incremental effort, it would definitely reduce the burden on
operators / integrations that rely on the nodetool output. As I have
indicated in the past, relying on human readable output for CLI
tools like nodetool is fragile and providing a JSON output as an
alternative is a great first step in eliminating that dependency. I'm just
curious about the level of effort. If it is too much or too invasive, we
can consider producing JSON output for inclusion in the next major release.
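
To make the fragility argument concrete, here is a minimal sketch that renders the same data as a human-readable table and as JSON. Nothing here is nodetool's actual output or API: the `NodeStatus` fields, the sample layout, and both render functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class NodeStatus:
    # Hypothetical fields, chosen only for illustration.
    address: str
    state: str
    load_bytes: int


def render_human(nodes: List[NodeStatus]) -> str:
    """Column-aligned text: easy to read, but parsers depend on the layout."""
    lines = [f"{'Address':<12}{'State':<7}{'Load':>12}"]
    for n in nodes:
        lines.append(f"{n.address:<12}{n.state:<7}{n.load_bytes:>12}")
    return "\n".join(lines)


def render_json(nodes: List[NodeStatus]) -> str:
    """Machine-readable form: consumers look fields up by name."""
    return json.dumps([asdict(n) for n in nodes], sort_keys=True)
```

A consumer of the JSON form reads `state` by key, so adding or reordering columns in the human-readable table does not break it.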

On Fri, Jun 28, 2024 at 6:47 AM Maxim Muzafarov  wrote:

> Hello everyone,
>
>
> The nodetool relies on the airlift/airline library to mark up the CLI
> commands used to manage Cassandra, which are part of our public API.
> This library is no longer maintained, so we need to update it anyway,
> and the good news is that we already have several good alternatives:
> airline-2 [3] or picocli [2].
>
> In this message, I'm mainly talking about CASSANDRA-17445 [4], which
> refers to the problem and is a prerequisite for a larger CEP-38 CQL
> Management API [5]. It doesn't make sense to use annotations from the
> deprecated library to build a new API, so this is another reason to
> update the library as soon as possible and do some inherently small
> code refactoring required for the CEP-38.
>
> In addition to being widely used and well supported, the Picocli
> library offers the following advantages for us:
> - We can detach the jmx-specific parameters from the commands so that
> they can be reused in other APIs (e.g. without host, port) while
> remaining backwards compatible;
> - We can set up nodetool's autocompletion after the migration with
> minimal effort;
> - There is a good Picocli ecosystem of tools that we can use to
> simplify our codebase, e.g. generate man pages tool to make our CLIs
> more Unix friendly [7];
>
>
> = Prototype =
>
> I have a working prototype [8] that shows what the result will look
> like. The prototype includes:
> - Tests between the execution of commands via the nodetool and nodetoolv2;
> - 5 out of 164 nodetool commands have been moved so far, to show the
> refactoring we need to do to the command's body;
> - The command help output for the nodetoolv2 is the same as it
> is currently for the nodetool and this is the default, however a
> "cassandra.cli.picocli.layout" is added to switch to the Picocli
> defaults;
> - You can also see that the colour scheme is applied by the Picocli
> out of the box, and this is how it looks [9];
> - The nodetoolv2 is called first when the shell is triggered, and if
> the nodetoolv2 doesn't contain the command it needs yet, it falls back
> to the nodetool and the old argument parser;
>
>
> Since the number of commands is quite large (164), I'd like to create
> a feature branch and move all the commands one at a time, while
> keeping the output backwards compatible by applying additional tests at
> the same time and checking that the CI is always green. I think the "feature
> branch" approach will be less stressful for us since it focuses on
> requiring a review of only tedious changes to the feature branch,
> rather than reviewing the 15k line patch.
>
>
> Anyway, I am open to any suggestions and advice based on your
> experience and best practices for this case. Looking forward to your
> thoughts and suggestions.
>
>
>
> [1] https://github.com/airlift/airline
> [2] https://picocli.info/
> [3] https://github.com/rvesse/airline
> [4] https://issues.apache.org/jira/browse/CASSANDRA-17445
> [5]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API
> [6]
> https://github.com/apache/cassandra/pull/2497/files#diff-acdd5f29d28df5c02f4bfc933528f084508b4923112e312e68a4aff7df973bce
> [7] https://picocli.info/man/gen-manpage.html
> [8] https://github.com/apache/cassandra/pull/2497/files
> [9]
> https://github.com/apache/cassandra/assets/3415046/57b14ae0-ff59-43d2-b542-10d3218ae075
>
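
A minimal sketch of the fallback behaviour described for the prototype above, where the new parser is tried first and the old one handles whatever has not been migrated yet. It is written in Python rather than the prototype's Java, and every name in it (`make_dispatcher`, the `V2` and `LEGACY` tables, the handler output) is hypothetical; nothing here is taken from the patch or from the picocli or airline APIs.

```python
from typing import Callable, Dict, List

# A handler takes the remaining arguments and returns the command output.
Handler = Callable[[List[str]], str]


def make_dispatcher(v2_commands: Dict[str, Handler],
                    legacy_commands: Dict[str, Handler]) -> Handler:
    """Build a dispatcher that prefers migrated commands over legacy ones."""

    def dispatch(argv: List[str]) -> str:
        if not argv:
            raise ValueError("no command given")
        name, args = argv[0], argv[1:]
        # Try the new (migrated) command table first.
        handler = v2_commands.get(name)
        if handler is None:
            # Not migrated yet: fall back to the old parser's table.
            handler = legacy_commands.get(name)
        if handler is None:
            raise KeyError(f"unknown command: {name}")
        return handler(args)

    return dispatch


# Hypothetical command tables: "status" has been migrated, "repair" has not.
V2: Dict[str, Handler] = {
    "status": lambda args: "v2:status",
}
LEGACY: Dict[str, Handler] = {
    "status": lambda args: "legacy:status",
    "repair": lambda args: " ".join(["legacy:repair", *args]),
}

dispatch = make_dispatcher(V2, LEGACY)
```

With these tables, `dispatch(["status"])` is served by the new table and `dispatch(["repair", "ks1"])` by the old one, so commands can be moved one at a time without changing how the tool is invoked.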


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-29 Thread Dinesh Joshi
The read time constraint application is going to be expensive and possibly
complicated to implement with low RoI. Therefore my suggestion is to defer
it. If there are situations where it appears to be helpful, we can always
reconsider it.

On Tue, Jun 25, 2024 at 3:34 PM Yifan Cai  wrote:

> - Alter and Drop constraints are as follows
>> ALTER CONSTRAINT [name] CHECK new_condition
>> DROP CONSTRAINT [name]
>>
>
> I think you mean the following syntax to modify existing constraints,
> since constraints are part of the table definition.
> ALTER TABLE [keyspace_name.]table_name ALTER CONSTRAINT [constraint_name]
> CHECK check_expression
>
> Dinesh's proposal to check on read is a good addition. I think it is
> *optional* and should be enabled/disabled w/ configuration. The extra
> check may not be desirable in some circumstances, e.g. when the use case
> never changes the constraints and writes data only through CQL.
> Since the original CEP defines that the constraints are applied at the
> write time, we need to update the CEP if we decide to include the check on
> read.
>
> - Yifan
>
>
> On Tue, Jun 25, 2024 at 1:13 PM Štefan Miklošovič 
> wrote:
>
>> I wonder how often it is that users will apply the constraints on tables
>> with data while they know their data is probably not compliant with the
>> constraint configuration. I humbly think that people are aware of this in
>> advance and what usually happens is that there is some kind of a job which
>> consolidates the data (or migrates them to a new table) before admins put a
>> "lid" on that so moving forward nobody puts there anything which would
>> violate it.
>>
>> I probably have not kept myself up to date with the discussion but I was
>> thinking that constraints are effectively there just on the write path.
>> Whatever is read is not a job of a constraint to refuse to return.
>>
>> On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi  wrote:
>>
>>> Abe, that's a good point. We need to call out distinct use-cases here.
>>> When a fresh cluster is set up with constraints we don't have any issues
>>> because the data written and read back is going to be compliant with the
>>> constraint(s). For existing data in a cluster where new constraints are
>>> applied or existing constraints changed in such a way that may render
>>> existing data unreadable, we need a good user experience. This is what I
>>> propose –
>>>
>>> 1. When a constraint is added or changed in such a way that existing
>>> data could be rendered unreadable, we should warn the user.
>>>
>>> 2. Give the user a choice: either the data is rendered unreadable and an
>>> error is issued, or a warning is issued when a read violates the
>>> constraint but the data is still readable. New data going in will meet
>>> the constraint, but old data would need to be rewritten by the
>>> application to make it compliant.
>>>
>>> With this approach the application developer can decide what is right
>>> for their particular use-case. In many cases the application developer may
>>> decide to rewrite the data when they see a warning.
>>>
>>>
>>> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:
>>>
>>>> If we're going to introduce a feature that looks like SQL constraints,
>>>> we should make sure it's "reasonably" compliant. In particular, we should
>>>> avoid situations where a user creates a constraint, writes some data, then
>>>> reads data that violates that constraint, unless they've expressed that
>>>> violations on read would be acceptable.
>>>>
>>>> For Postgres, when adding a new constraint you can specify NOT VALID to
>>>> avoid scanning all existing relevant data[1]. If we want to avoid
>>>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>>>
>>>> As we've already discussed, constraints must deal with operations that
>>>> appear within limits on the write path, but once reconciled on read or
>>>> during compaction can lead to a violation. Adding to non-frozen collections
>>>> is one example. Expecting users to understand the write path for
>>>> collections feels unrealistic to me; I wonder if we should express in the
>>>> constraint itself that it only applies during write.
>>>>
>>>> Anything that uses "nodetool import" (including cassandra-analytics)
>>>> could theoretically p

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
Abe, that's a good point. We need to call out distinct use-cases here. When
a fresh cluster is set up with constraints we don't have any issues because
the data written and read back is going to be compliant with the
constraint(s). For existing data in a cluster where new constraints are
applied or existing constraints changed in such a way that may render
existing data unreadable, we need a good user experience. This is what I
propose –

1. When a constraint is added or changed in such a way that existing data
could be rendered unreadable, we should warn the user.

2. Give the user a choice: either the data is rendered unreadable and an
error is issued, or a warning is issued when a read violates the constraint
but the data is still readable. New data going in will meet the constraint,
but old data would need to be rewritten by the application to make it
compliant.

With this approach the application developer can decide what is right for
their particular use-case. In many cases the application developer may
decide to rewrite the data when they see a warning.
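
The two read-time behaviours in point 2 could look roughly like this. It is a sketch in Python, not Cassandra code; `Constraint`, `check_on_read` and the `on_violation` values are made-up names for illustration, and the CEP does not define any such option.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Dict


class ConstraintViolation(Exception):
    """Raised when a read hits non-compliant data in "error" mode."""


@dataclass(frozen=True)
class Constraint:
    column: str
    description: str
    predicate: Callable[[Any], bool]


def check_on_read(row: Dict[str, Any], constraint: Constraint,
                  on_violation: str = "warn") -> Dict[str, Any]:
    """Return the row, warning or failing if it violates the constraint."""
    value = row.get(constraint.column)
    if constraint.predicate(value):
        return row
    message = (f"column {constraint.column!r} value {value!r} "
               f"violates: {constraint.description}")
    if on_violation == "error":
        # Old data becomes unreadable until it is rewritten.
        raise ConstraintViolation(message)
    if on_violation == "warn":
        # Old data stays readable; the application is told to rewrite it.
        warnings.warn(message)
        return row
    raise ValueError(f"unknown on_violation mode: {on_violation!r}")


# Hypothetical constraint, loosely based on the subnet_mask example in the CEP.
SUBNET_MASK = Constraint(
    column="subnet_mask",
    description="subnet_mask > 0 AND subnet_mask < 32",
    predicate=lambda v: v is not None and 0 < v < 32,
)
```

In "warn" mode a row written before the constraint existed still comes back, which gives the application developer the signal to rewrite it; in "error" mode the same read fails.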


On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:

> If we're going to introduce a feature that looks like SQL constraints, we
> should make sure it's "reasonably" compliant. In particular, we should
> avoid situations where a user creates a constraint, writes some data, then
> reads data that violates that constraint, unless they've expressed that
> violations on read would be acceptable.
>
> For Postgres, when adding a new constraint you can specify NOT VALID to
> avoid scanning all existing relevant data[1]. If we want to avoid
> scan-on-DDL, this tradeoff needs to be made clear to a user.
>
> As we've already discussed, constraints must deal with operations that
> appear within limits on the write path, but once reconciled on read or
> during compaction can lead to a violation. Adding to non-frozen collections
> is one example. Expecting users to understand the write path for
> collections feels unrealistic to me; I wonder if we should express in the
> constraint itself that it only applies during write.
>
> Anything that uses "nodetool import" (including cassandra-analytics) could
> theoretically push constraint-violating mutations to a table. We could
> update import to scan table contents first, or add a flag to trust the data
> in imported SSTables and make cassandra-analytics executors aware of
> table-level constraints.
>
> Some client implementations read the system_schema tables to build their
> object mappers; I'd like to confirm that nothing will require clients to be
> aware of these new schema constructs.
>
> Overall, I'm supportive of the distinctions discussed between constraints
> and guardrails and like the direction this is heading; I'd just like to
> make sure the more detailed semantics aren't confusing or misleading for
> our users, and semantics are much harder to change in the future.
>
> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
>
>


Re: [VOTE][IP CLEARANCE] GoCQL driver

2024-06-25 Thread Dinesh Joshi
+1

Thank you Mick and everyone else involved in this effort.

On Tue, Jun 25, 2024 at 11:12 AM Jeff Jirsa  wrote:

> +1
>
> Thank you for being explicit about which authors of gocql have signed the
> ICLA
>
> > Where The Gocql Authors for copyright purposes are below. Those marked
> with
> > asterisk have agreed to donate (copyright assign) their contributions to
> the
> > Apache Software Foundation, signing CLAs when appropriate.
>
> On Jun 25, 2024, at 10:32 AM, Mick Semb Wever  wrote:
>
>
>
>> The vote will be open for 72 hours (or longer). Votes by PMC members are
>> considered binding. A vote passes if there are at least three binding
>> +1s and no -1's.
>>
>
>
> +1
>
>
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
On Tue, Jun 25, 2024 at 10:59 AM Josh McKenzie  wrote:

>
> My intuition is the vote got called a *smidge* early but that things are
> very much moving in the right direction and are very close.
>

Agreed and the vote thread got us more feedback which is valuable :)


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
+1 on Doug's suggestion. The operator sets a limit that application
developers should not be allowed to violate. This is precisely the type of
safety that we should strive for.

To Jordan's point, I also agree that the read before write type of
constraints should be avoided but if there is a very good case for it, we
can discuss it.

We should also consider adding a NOT NULL constraint on columns. This will
allow applications to model columns that are mandatory for INSERTs and
UPDATEs.




On Tue, Jun 25, 2024 at 9:24 AM Ariel Weisberg  wrote:

> Hi,
>
> I am also +1 on Doug's distinction between things that can be managed by
> operators and things that can be managed by applications.
>
> One thing to note about the syntax is that there are parens around the
> condition in SQL. In your example there are multiple anonymous constraints
> on the same column, how are anonymous constraints handled? Does the
> database automatically generate a named constraint for them so they can be
> referenced later? Do we allow multiple constraints on the same column and
> AND them together?
>
> Ariel
>
>
>
> On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
>
> Hi Ariel and Jon,
>
> Let me address your question first. Yes, AND is supported in the proposal.
> Below you can find some examples of different constraints applied to the
> same column.
>
> As for using the name LENGTH instead of sizeOf as in the proposal, I am
> also not opposed to it if it is more consistent with terminology in the
> databases universe.
>
> So, to recap, there seems to be general agreement on the usefulness of the
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called,
> I see there are three different proposals for syntax:
>
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> 2.-
> As Jon suggested, leaving these definitions to more specific Guardrails at
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>
> 3.-
> As Ariel suggested, having the CHECK keyword added to align with SQL.
> Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
>
> For the guardrails vs cql syntax, I think that keeping the conceptual
> separation that has been explored in this thread, and perfectly recapped by
> Doug, is closer to what we are trying to achieve with this framework. In my
> opinion, having them in the CQL schema definition provides those
> application level constraints that Doug mentions in a more accessible way
> than having to configure such specific guardrails.
>
> For the addition of the CHECK keyword, I'm definitely not opposed to it if
> it helps Cassandra users coming from other databases understand concepts
> that were already familiar to them.
>
> I hope this helps move the conversation forward,
> Bernardo
>
>
>
> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>
> Hi,
>
> I see a vote for this has been called. I should have provided feedback
> sooner.
>
> I am a strong +1 on column level constraints being a good thing to
> add. I'm not too concerned about row/partition/table level constraints, but
> I would like to change the syntax before I would be +1 on this CEP.
>
> It would be good to align the syntax as closely as possible to our
> existing syntax, and if not that then MySQL/Postgres. For example it looks
> like we don't have a string length function so maybe add `LENGTH`
> (consistent with MySQL/Postgres) to also use with column level constraints.
>
> It looks like there are generally two forms of constraint syntax, one is
> expressed as part of the column definition, and the other is a named or
> anonymous constraint on the table.
> https://www.w3schools.com/sql/sql_check.asp
>
> Can we align with having these column level ones as `CHECK` constraints
> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if
> creating a named or multi-column constraint?
>
> Will column level check constraints support `AND` so that you can specify
> multiple constraints on the column? I am not sure if that is supported in
> other databases, but it would be good to align on that as well.
>
> RE some implementation things to keep in mind:
>
> If TCM is in use and the constraints are defined in the schema data
> structure this should work fine with Accord because all coordinators
> (regular, recovery) will deterministically agree on the constraints being
> enforced BUT... this also has to map to how/when constraints are enforced.
>
> Both Accord and Paxos work best when the constraints are enforced when the
> final mutation to be applied is created 

Re: [DISCUSS] spark-cassandra-connector donation to Analytics subproject

2024-06-24 Thread Dinesh Joshi
This would be a great contribution to have for the Analytics subproject.
The current bulk functionality in the Analytics subproject complements the
spark-cassandra-connector so I see it as a good fit for donation.

On Mon, Jun 24, 2024 at 12:32 AM Mick Semb Wever  wrote:

>
> What are folks' thoughts on accepting a donation of
> the spark-cassandra-connector project into the Analytics subproject ?
>
> A number of folks have requested this, stating that they cannot contribute
> to the project while it is under DataStax.  The project has largely been in
> maintenance mode the past few years.  Under ASF I believe that it will
> attract more attention and contributions, and offline discussions I have
> had indicate that the spark-cassandra-connector remains an important
> complement to the bulk analytics component.
>


Re: Cassandra PMC Chair Rotation, 2024 Edition

2024-06-20 Thread Dinesh Joshi
Thank you everybody. I hope to do my best in this role. A big thanks to
Josh who has been a great PMC Chair!

On Thu, Jun 20, 2024 at 11:40 AM Yifan Cai  wrote:

> Thank you for the service, Josh!
> Congrats, Dinesh!
>
> On Thu, Jun 20, 2024 at 11:32 AM Jean-Armel Luce 
> wrote:
>
>> Josh, thanks for the job
>> Dinesh, congrats!!
>>
>> On Thu, Jun 20, 2024 at 7:42 PM David Capwell wrote:
>>
>>> Congrats!
>>>
>>> On Jun 20, 2024, at 9:10 AM, Melissa Logan 
>>> wrote:
>>>
>>> Josh, thank you for your time as chair + congrats Dinesh!
>>>
>>> On Thu, Jun 20, 2024 at 9:08 AM Abe Ratnofsky  wrote:
>>>
>>>> Congrats Dinesh! Thank you Josh!
>>>>
>>>> On Jun 20, 2024, at 11:53 AM, Jeremiah Jordan <
>>>> jeremiah.jor...@gmail.com> wrote:
>>>>
>>>> Welcome to the Chair role Dinesh!  Congrats!
>>>>
>>>> On Jun 20, 2024 at 10:50:37 AM, Josh McKenzie 
>>>> wrote:
>>>>
>>>>> Another PMC Chair baton pass incoming! On behalf of the Apache
>>>>> Cassandra Project Management Committee (PMC) I would like to welcome and
>>>>> congratulate our next PMC Chair Dinesh Joshi (djoshi).
>>>>>
>>>>> Dinesh has been a member of the PMC for a few years now and many of
>>>>> you likely know him from his thoughtful, measured presence on many of our
>>>>> collective discussions as we've grown and evolved over the past few years.
>>>>>
>>>>> I appreciate the project trusting me as liaison with the board over
>>>>> the past year and look forward to supporting Dinesh in the role in the
>>>>> future.
>>>>>
>>>>> Repeating Mick (repeating Paulo's) words from last year: The chair is
>>>>> an administrative position that interfaces with the Apache Software
>>>>> Foundation Board, by submitting regular reports about project status and
>>>>> health. Read more about the PMC chair role on Apache projects:
>>>>> - https://www.apache.org/foundation/how-it-works.html#pmc
>>>>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>>>>> -
>>>>> https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>>>>
>>>>> The PMC as a whole is the entity that oversees and leads the project
>>>>> and any PMC member can be approached as a representative of the committee.
>>>>> A list of Apache Cassandra PMC members can be found on:
>>>>> https://cassandra.apache.org/_/community.html
>>>>>
>>>>
>>>>
>>>


Re: [VOTE] CEP-24 Password validation / generation

2024-06-17 Thread Dinesh Joshi
+1.

I have some minor feedback on how the configuration of different character
classes works but that can be handled during the patch review.

On Mon, Jun 17, 2024 at 2:32 AM Štefan Miklošovič 
wrote:

> Hi everyone,
>
> I would like to start the voting for CEP-24 as all feedback in the
> discussion threads seem to be addressed.
>
> Proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=228494146
> JIRA under which it will be delivered:
> https://issues.apache.org/jira/browse/CASSANDRA-17457
> Draft implementation:
> https://github.com/instaclustr/cassandra/tree/CEP-24-simplified
>
> Discuss threads:
>
> https://lists.apache.org/thread/1hs27lx2pw9lmp7rw499vn0m7vl2bgt1
> https://lists.apache.org/thread/1hs27lx2pw9lmp7rw499vn0m7vl2bgt1
>
> The reason there are two threads is that I replied to the first one after
> that CEP was dormant for a very long time and it just created a new thread
> for that, most probably an issue with my e-mail client ...
>
> The vote will be open for 72 hours (longer if needed). A vote passes if
> there are at least 3 binding +1s and no binding vetoes.
>
> Thanks,
>
> Stefan Miklosovic
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Dinesh Joshi
On Thu, Jun 6, 2024 at 1:50 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> I will update the CEP being specific with the two specific Constraint
> types I will be adding, which are size and value (the ones shown in the
> example).
>

Could you identify constraints for the most common data types? It would be
nice to ship a good set of default constraints. For example, it would be
nice to constrain numeric & date data types within a range, text could
comply with a pattern, etc.

One question, which I'm not sure has come up, is whether a column could
have multiple constraints.

Dinesh
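
To make the question concrete, here is a rough sketch of range, pattern and NOT NULL checks, with several checks attached to one column and ANDed together. It is Python for illustration only; the helper names (`not_null`, `in_range`, `matches`, `violations`) and the sample columns are assumptions, not CEP-42 syntax or implementation.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any], bool]


def not_null() -> Check:
    return lambda value: value is not None


def in_range(low: Any, high: Any) -> Check:
    # Works for any ordered type (ints, dates, ...). Nulls pass here;
    # combine with not_null() to make the column mandatory.
    return lambda value: value is None or low <= value <= high


def matches(pattern: str) -> Check:
    compiled = re.compile(pattern)
    # The whole text value must match the pattern; nulls pass.
    return lambda value: value is None or compiled.fullmatch(value) is not None


def violations(row: Dict[str, Any],
               constraints: Dict[str, List[Tuple[str, Check]]]) -> List[str]:
    """Return the failed checks; a row is valid only if the list is empty."""
    failed = []
    for column, checks in constraints.items():
        for name, check in checks:
            if not check(row.get(column)):
                failed.append(f"{column}: {name}")
    return failed


# Hypothetical table-level definition: two checks on one column, one on another.
CONSTRAINTS: Dict[str, List[Tuple[str, Check]]] = {
    "subnet_mask": [("not null", not_null()),
                    ("between 1 and 31", in_range(1, 31))],
    "hostname": [("lowercase label", matches(r"[a-z0-9-]+"))],
}
```

Keeping each check small and attaching a list of them per column is one way to get AND semantics without a separate expression language; whether the CEP should go that way is exactly the open question.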


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Dinesh Joshi
On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> It is interesting to see this feedback. When I look at CEP-24 where I am
> obsessing about a user being able to misconfigure the password validation
> strength so if a user hits a "weak" node then she would be able to bypass
> it, and I see what is our approach here, then I am not sure what I was
> waiting so long for and I should probably be just more aggressive with the
> CEP and all the "caveats" could be just overlooked and deferred to
> "sometimes later".
>

Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread.
Had I paid attention, I would have pointed out that waiting on TCM doesn't
make the feature any different; it only makes the feature less likely to be
misconfigured in a cluster. CEP-24 is valuable and password compliance with
policies is a super useful feature which IMO shouldn't have been held back
due to lack of TCM.


Re: [VOTE] Release Apache Cassandra Java Driver 4.18.1 (2nd attempt)

2024-05-21 Thread Dinesh Joshi
+1

verified hashes and signatures.

On Tue, May 21, 2024 at 4:00 PM Bret McGuire  wrote:

>Greetings all!  We're going to give this another go.
>
>Apologies for the confusion that sprang out of our last attempt.  It
> appears that the Nexus staging repository for the 4.18.1 release was
> accidentally released shortly after it was created.  As a result Maven
> artifacts for this release are already out in the wild,
> so this vote will be a little unusual.  We'll be voting normally, but if
> the vote is successful we'll simply leave the Maven artifacts exactly where
> they are.  If the vote is unsuccessful these artifacts will be removed from
> Maven Central and we'll try again with 4.18.2.
>
>Big thanks to mck and driftx for their help in untangling these issues.
>
>With all of that said let's get to it.  I’m proposing the test build
> of Cassandra Java Driver 4.18.1 for release.
>
> sha1: cbdde2878786fa6c4077a21352cbe738875f2106
>
> Git: https://github.com/apache/cassandra-java-driver/tree/4.18.1
>
> Maven Artifacts: As discussed above
>
>
>The Source release and Binary convenience artifacts are available here:
>
>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/4.18.1/
>
>
>The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
>One additional note: this is my first time doing a release of the Java
> driver so if you're so inclined you can find my PGP key information in the
> KEYS file.
>
>
>Thanks all!
>
>
>- Bret -
>


Re: [VOTE] Release Apache Cassandra 4.1.5

2024-05-17 Thread Dinesh Joshi
+1

On Fri, May 17, 2024 at 10:53 AM Brandon Williams <
brandonwilli...@apache.org> wrote:

> Friendly reminder that this vote is still open and lacks one binding
> vote to pass.
>
> On Thu, May 2, 2024 at 11:36 AM Brandon Williams
>  wrote:
> >
> > Proposing the test build of Cassandra 4.1.5 for release.
> >
> > sha1: 6b134265620d6b39f9771d92edd29abdfd27de6a
> > Git: https://github.com/apache/cassandra/tree/4.1.5-tentative
> > Maven Artifacts:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1329/org/apache/cassandra/cassandra-all/4.1.5/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and
> > repositories, are available here:
> > https://dist.apache.org/repos/dist/dev/cassandra/4.1.5/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who
> > has tested the build is invited to vote. Votes by PMC members are
> > considered binding. A vote passes if there are at least three binding
> > +1s and no -1's.
> >
> > [1]: CHANGES.txt:
> > https://github.com/apache/cassandra/blob/4.1.5-tentative/CHANGES.txt
> > [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/4.1.5-tentative/NEWS.txt
>


Re: Is there appetite to maintain the gocql driver (in the drivers subproject) ?

2024-05-15 Thread Dinesh Joshi
On Wed, May 15, 2024 at 12:09 AM Mick Semb Wever  wrote:

> Yes Dinesh.   João Reis managed to get hold of both Chris and Martin.
> Responses have been slow, but everyone is on board.  This is not to be
> considered a hostile fork, despite in all likelihood not being able to do a
> full IP donation.
>
Great! I have no concerns at this point.


Re: Is there appetite to maintain the gocql driver (in the drivers subproject) ?

2024-05-14 Thread Dinesh Joshi
On Tue, May 14, 2024 at 10:05 AM Mick Semb Wever  wrote:

>
> Ok, so we've got confidence now on how to approach this, confirmation from
> the project's maintainers supporting it, and a handful of people
> interested in maintaining and contributing to the project.
>

Did you talk to the current maintainers off list or did I miss some thread
where the maintainers indicated their support in maintaining this project?


Fwd: Save the Date: Apache At Visa Summit on May 16, 2024 ( Registration Open.)

2024-05-10 Thread Dinesh Joshi
FYI, if anybody is interested in this event today is the registration
deadline.

-- Forwarded message -
From: Battula, Brahma Reddy 
Date: Mon, Apr 29, 2024 at 10:27 AM
Subject: Re: Save the Date: Apache At Visa Summit on May 16, 2024 (
Registration Open.)
To: d...@community.apache.org 
CC: Brahma Reddy Battula 


Hello again ASF Community,



We're happy to announce that registration is now open for *Apache at Visa
Summit*!

The event is free for all ASF Committers to attend. We look forward to
seeing you there.

*Sign up today at* *https://cvent.me/Ymn97E *


Best,

*Brahma Reddy Battula*

*Principal Data Engineer, Visa; VP (PMC chair) Apache Ambari; PMC member:
Ambari, Hadoop*







*From: *Battula, Brahma Reddy 
*Date: *Tuesday, 23 April 2024 at 10:25 PM
*To: *d...@community.apache.org 
*Cc: *Brahma Reddy Battula 
*Subject: *Save the Date: Apache At Visa Summit on May 16, 2024



Dear ASF community,



We're thrilled to invite you to the *Apache At Visa Summit*, a one-day event
on *Thursday, May 16, 2024*, hosted at the Visa Headquarters in Foster
City, California.  This is a unique opportunity to explore the fascinating
journey of open source, learn insights from the ASF community
representatives (PMC members, project VPs, ASF Board members, and more) and
network with fellow community members.



Starting at 9:00 AM, the day will kick off with warm welcomes from key Visa
executives - Sam Hamilton and Gary Slater, followed by a keynote from Rajat
Taneja.



*Craig Russell* will take us through the 25-year journey of ASF,
highlighting how the ASF’s longevity lies in the individuals behind its
operations, projects, and initiatives. *Ellen Friedman*, an independent
technologist and co-author of several books on machine learning and
analytics, will demonstrate how individuals outside of the traditional
coder profile can successfully contribute to projects, build their
communities, and attract and inspire future developers.



*Ted Dunning* will share how ASF, as the home of “Big Data”, is poised to
incubate the newest breakthroughs in AI technologies with the assurance
that development of future mission-critical innovations is fully open, The
Apache Way. *Julian Hyde*, the original creator of Apache Calcite and a
senior staff engineer at Google, will discuss how Open Source, standards,
academia, and industry are advancing databases and analytics.



*Jun Rao*, a co-founder of Confluent and original co-creator of Apache
Kafka, will present on large language models and real-time GenAI data
streaming architectures using Apache Kafka and Flink.  *Phil Steitz*, the
former Global CTO at American Express and former ASF Chairman, will explain
how financial services institutions can leverage the “full estate” to gain
competitive advantage, meet critical code requirements, and benefit the
community at-large.



We can't wait to share this exciting day with you. Mark your calendar and
prepare for a day of learning, sharing, and contributing to the open source
community! The invitation to register for the event will be forthcoming
soon.



Best,

*Brahma Reddy Battula*

*Principal Data Engineer, Visa; VP (PMC chair) Apache Ambari; PMC member:
Ambari, Hadoop*


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-29 Thread Dinesh Joshi
On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> reason why I called out binary level verification out of initial scope is
> because of these two reasons: 1) Calculating digest for each file may
> increase CPU utilisation and 2) Disk would also be under pressure as
> complete disk content will also be read to calculate digest. As called out
> in the discussion, I think we can't
>

We should have a digest / checksum for each of the file components computed
and stored on disk so this doesn't need to be recomputed each time. Most
files / components are immutable and therefore their checksum won't change.
There are some components which may be mutated and therefore their checksum
may need to be recomputed. However, data integrity is not something we can
compromise on. On the receiving node, CPU utilization is not a big issue as
that node isn't servicing traffic.

I was too lazy to dig into the code and someone who is more familiar with
the SSTable components / file format can help shed light on checksums.
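
As a rough illustration of the "compute once, store on disk, verify on the receiving side" idea, here is a sketch in Python. The `.sha256` sidecar file name, the use of SHA-256, and the function names are assumptions made for this example; this is not a description of Cassandra's on-disk format or of the Sidecar's API.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large components fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def stored_digest(path: Path) -> str:
    """Return the component's digest, reusing a stored one when possible.

    The digest is kept next to the component in a hypothetical
    "<name>.sha256" file and is recomputed only when that file is missing
    or older than the component (i.e. the component may have been mutated).
    """
    digest_path = path.with_name(path.name + ".sha256")
    if (digest_path.exists()
            and digest_path.stat().st_mtime_ns >= path.stat().st_mtime_ns):
        return digest_path.read_text().strip()
    digest = file_digest(path)
    digest_path.write_text(digest + "\n")
    return digest


def verify_transfer(received: Path, expected_digest: str) -> bool:
    """On the receiving node: recompute and compare with the sender's digest."""
    return file_digest(received) == expected_digest
```

The sending node pays the read cost once per immutable component, and only the receiving node, which is not serving traffic, recomputes the digest for every file it receives.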


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-25 Thread Dinesh Joshi
I am not familiar with ECS but if we’re going to go for it I would prefer
it to be a sub project really. Jon, what do you think?

On Thu, Apr 25, 2024 at 2:44 PM Brandon Williams  wrote:

> I want to begin by saying I am generally +1 on this because I have
> become a fan of easy-cass-stress after using it, but I am curious if
> this is intended to be a subproject, or replace cassandra-stress?  If
> the latter, we are going to have to reconcile the build systems
> somehow.  I don't really want to drag ECS back to ant, but I also
> don't want two different build systems in-tree.
>
> Kind Regards,
> Brandon
>
> On Thu, Apr 25, 2024 at 9:38 AM Jon Haddad  wrote:
> >
> > I've been asked by quite a few people, both in person and in JIRA [1]
> about contributing easy-cass-stress [2] to the project.  I've been happy to
> maintain the project myself over the years but given its widespread use I
> think it makes sense to make it more widely available and under the
> project's umbrella.
> >
> > My goal with the project was always to provide something that's easy to
> use.  Up and running in a couple minutes, using the parameters to shape the
> workload rather than defining everything through configuration.  I was
> happy to make this tradeoff since Cassandra doesn't have very many types of
> queries and it's worked well for me over the years.
> >
> > Obviously I would continue working on this project, and I hope this
> would encourage others to contribute.  I've heard a lot of good ideas that
> other teams have implemented in their forks.  I'd love to see those ideas
> make it into the project, and it sounds like it would be a lot easier for
> teams to get approval to contribute if it was under the project umbrella.
> >
> > Would love to hear your thoughts.
> >
> > Thanks,
> > Jon
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-18661
> > [2] https://github.com/rustyrazorblade/easy-cass-stress
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Dinesh Joshi
On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg  wrote:

>
> If there is a faster/better way to replace a node why not  have Cassandra
> support that natively without the sidecar so people who aren’t running the
> sidecar can benefit?
>

I am not the author of the CEP so take whatever I say with a pinch of salt.
Scott and Jordan have pointed out some benefits of doing this in the
Sidecar vs Cassandra.

Today Cassandra is able to do fast node replacements. However, this CEP is
addressing an important corner case when Cassandra is unable to start up
due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
on old hardware? Sure. However, you would still need operator intervention
to start it up in some special mode both on the old and new node so the new
node can peer with the old node, copy over its data and join the ring. This
would still require some orchestration outside the database. The Sidecar
can do that orchestration for the operator. The point I'm making here is
that the CEP addresses a real issue. The way it is currently built can
improve over time with improvements in Cassandra.

Dinesh


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Dinesh Joshi
On Fri, Apr 19, 2024 at 3:12 PM Jon Haddad  wrote:

> I haven't looked at streaming over TLS, so I might be way off base here,
> but our own docs (
> https://cassandra.apache.org/doc/latest/cassandra/architecture/streaming.html)
> say ZCS is not available when using encryption, and if we have to bring the
> data into the JVM then I'm not sure how it would even work.  sendfile is a
> direct file descriptor to file descriptor copy.  How are we simultaneously
> doing kernel-only operations while also performing encryption in the JVM?
>

Yes, the 'zero copy' aspect of streaming is not available when we stream
over TLS as we're required to bring in those bytes into the JVM to encrypt.
However, we still get the benefit of copying entire files and skipping the
non-trivial ser/deser & GC overhead associated with streaming individual
partitions. Cassandra will handle this transparently[1] depending on
whether you enable TLS or not.

Dinesh

[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/AsyncStreamingOutputPlus.java#L159
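
A minimal Python sketch of the distinction described above, not Cassandra's
actual implementation: without encryption the whole file can be handed to a
platform fast-copy routine, while with encryption every chunk has to pass
through user space first. The function names, the chunk size and the XOR
"cipher" are hypothetical stand-ins; the XOR step is not real cryptography.

```python
import shutil

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def toy_encrypt(chunk: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical stand-in for TLS encryption (XOR, NOT real crypto)."""
    return bytes(b ^ key for b in chunk)


def stream_entire_file(src_path: str, dst_path: str, encrypted: bool) -> str:
    """Copy a whole file and report which path was taken."""
    if not encrypted:
        # No transformation needed: let the platform copy the bytes.
        # CPython's shutil.copyfile may use an in-kernel fast copy
        # (e.g. sendfile on Linux) when one is available.
        shutil.copyfile(src_path, dst_path)
        return "fast-copy"

    # Encryption needs the bytes in user space: read, transform, write.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            dst.write(toy_encrypt(chunk))
    return "user-space"
```

Either way the unit of work is a whole file rather than individual
partitions, which is where the ser/deser and GC savings come from.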


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Dinesh Joshi
On Mon, Apr 8, 2024 at 10:23 AM Jon Haddad  wrote:

> This seems like a lot of work to create an rsync alternative.  I can't
> really say I see the point.  I noticed your "rejected alternatives"
> mentions it with this note:
>

I want to point out a few things before dismissing it as an 'rsync
alternative' -

1. rsync is dangerous for many reasons. Top reason is security. rsync
executed over ssh offers a much broader access than is necessary for this
use-case. Operators also have to maintain multiple sets of credentials for
AuthN/AuthZ - ssh being just one of them. Finally, ssh simply isn't allowed
in some environments.

2. rsync is an incomplete solution. You still need to wrap rsync in a
script that will ensure that it does the right thing for each version of
Cassandra, accounts for failures, retries, etc.

The way I see it is if this solves a problem and adds value for even a
subset of our users it would be valuable to accept it.

Dinesh


Re: Is there appetite to maintain the gocql driver (in the drivers subproject) ?

2024-04-08 Thread Dinesh Joshi
If we take this on - are there any active contributors that can be raised
as committers to maintain this project?

On Wed, Apr 3, 2024 at 2:36 PM Nate McCall  wrote:

> We've talked through this before. Benjamin sussed out the main issue,
> IIRC.
> tl;dr:
> - The AUTHORS lists everyone who ever made a commit (
> https://github.com/gocql/gocql/blob/master/AUTHORS)
> - The license is BSD-3 and explicitly says the copyright is owned by the
> authors (https://github.com/gocql/gocql/blob/master/LICENSE#L1)
> - We had a previous discussion about 6 years ago:
> https://www.mail-archive.com/dev@cassandra.apache.org/msg13008.html
>
> We can open an issue with LEGAL to see what they say at least?
>
> -N
>
> On Tue, Feb 6, 2024 at 10:25 AM Mick Semb Wever  wrote:
>
>>
>> The current sole maintainer of the gocql driver has stated the project is
>> essentially in attic mode and is asking for new maintainers.
>>
>> https://groups.google.com/g/gocql/c/v0FruczBb2w
>>
>> No one has suggested the repo be donated to the ASF yet, but before
>> anyone should raise any such suggestion we should check if we have folk in
>> the project that would be willing to help out with such a donation.
>>
>


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-04-08 Thread Dinesh Joshi
hi folks - sorry to have dropped the ball on responding to this thread.

My 2 cents are as follows -

1. Having a separate JIRA project for each sub-project will add management
overhead. This option, however, allows us to model unique workflows for the
sub-project.

2. Managing the sub-project as part of the Cassandra JIRA project would
imply less management overhead but the sub-project would need to conform to
the same workflows.

I would pick option 2 unless there is a strong reason and desire to manage
a separate Jira project. We can always split out the Java Driver project if
things don't work out. OTOH merging a Jira project is harder.

Thanks,

Dinesh

On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:

> CEP-8 proposes using separate Jira projects per Cassandra sub-project:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>
> > We suggest distinct Jira projects, one per driver, all to be created.
>
> I don't see any discussion changing that from the [DISCUSS] or vote
> threads:
> https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
> https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>
> But looks like upon acceptance that was changed:
> https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>
> > New issues will be tracked under the CASSANDRA project on Apache’s JIRA <
> https://issues.apache.org/jira/projects/CASSANDRA> under the component
> ‘Client/java-driver’.
>
> I'm in favor of using the same Jira as Cassandra proper. Committership is
> project-wide, so having a standardized process (same ticket flow, review
> rules, labels, etc.) is beneficial. But multiple votes happened based on
> the content of the CEP, so we should stick to what was voted on and move to
> a separate Jira.
>
> --
> Abe
>


Re: [Discuss] Introducing Flexible Authentication in Cassandra via Feature Flag

2024-02-12 Thread Dinesh Joshi
Hi Gaurav,

Thank you for the document. I read through it and wasn't entirely clear
about the problem you're trying to solve.

If you're talking about enabling authentication for the very first time on
a cluster which does not have any authentication then there are different
ways of handling it.

> Minimized downtime: Seamless authentication rollout without service
interruptions.

I am not sure why the app or the cluster would take any downtime, as rolling
restarts in Cassandra are safe and do not cause downtime. If Cassandra
doesn't require authentication, specifying credentials in the driver should
not cause issues. So users are safe to transition from an unauthenticated
to an authenticated cluster.

> Enhanced security: Detailed logs for improved threat detection and
troubleshooting.

Allowing client connections to proceed with invalid credentials poses a
serious security risk. Cassandra 4.0+ has Audit Logging which IIRC does log
invalid login attempts (if not, it would be trivial to add them).

CASSANDRA-11471 allows for improved protocol negotiation which would allow
you to specify a list of acceptable authentication mechanisms but I don't
think this is being worked on at the moment.

That said, right now you can certainly implement your own Authenticator
which implements your custom logic without any changes to Cassandra itself.

Dinesh


On Mon, Feb 12, 2024 at 11:45 AM Gaurav Agarwal 
wrote:

> Dear Cassandra Community,
>
> I'm excited to share a proposal for a new feature that I believe would
> significantly enhance the platform's security and operational flexibility: *a
> flexible authentication mechanism implemented through a feature flag *.
>
> Currently, enforcing authentication in Cassandra requires a disruptive,
> full-cluster restart, posing significant risks in live environments. My
> proposal, the *auth_enforcement_flag*, addresses this challenge by
> offering three modes:
>
> *Hard:* Enforces strict authentication with detailed logging.
> *Soft:* Monitors connection attempts (valid and invalid) without
> enforcing authentication.
> *None:* Maintains the current Cassandra behavior.
>
> This flag enables:
> *Minimized downtime: *Seamless authentication rollout without service
> interruptions.
> *Enhanced security:* Detailed logs for improved threat detection and
> troubleshooting.
> *Gradual adoption:* Phased implementation with real-world feedback
> integration.
>
> I believe this feature provides substantial benefits for both users and
> administrators. Please see the detailed proposal here: Introducing
> flexible authentication mechanism
> 
>
> I warmly invite the community to review this proposal and share your
> valuable feedback. I'm eager to discuss its potential impact and
> collaborate on making Cassandra even better.
>
> Thank you for your time and consideration.
>
> Sincerely,
> Gaurav Agarwal
> Software Engineer at Uber
>
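
The three modes of the proposed auth_enforcement_flag quoted above can be
sketched roughly as follows. This is a hypothetical Python illustration of
the semantics as described in this thread, not Cassandra code; the enum and
function names are invented for the example, and "None" is modelled here as
a cluster that performs no authentication checks.

```python
import logging
from enum import Enum

log = logging.getLogger("auth_sketch")


class AuthEnforcement(Enum):
    HARD = "hard"  # enforce authentication and log attempts
    SOFT = "soft"  # log valid and invalid attempts, never reject
    NONE = "none"  # assumed here: no checks and no extra logging


def allow_connection(mode: AuthEnforcement, credentials_valid: bool) -> bool:
    """Return True if the connection may proceed under the given mode."""
    if mode is AuthEnforcement.NONE:
        return True
    log.info("login attempt: valid=%s mode=%s", credentials_valid, mode.value)
    if mode is AuthEnforcement.SOFT:
        # Monitoring only. This is the case flagged in the reply as a
        # security risk: invalid credentials are still let through.
        return True
    return credentials_valid
```

In this sketch only the hard mode ever rejects a client, so moving from soft
to hard is the step at which misconfigured applications would start failing.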


Re: CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-23 Thread Dinesh Joshi
Hi Shylaja,

If you'd like we can continue this on the ticket you opened. Here are my
concerns -

1. QPL Java Library[1] (JNI bindings to Intel's QPL) does not have any
license information on the repo. This needs to be corrected. Please see the
types of licenses we can use[2] for further information.

2. Can you describe how the compressor will behave when the cluster is made
up of heterogeneous hardware? For example, let's say we have a mix of
machines where some support Intel's IAA and some don't?

3. Does QPL have checksumming built in?

thanks,

Dinesh

[1] https://github.com/intel/qpl-java
[2] https://www.apache.org/legal/resolved.html#category-a

On Mon, Jan 22, 2024 at 6:37 PM Kokoori, Shylaja 
wrote:

> Dinesh & Abe,
>
> Thank you very much for your feedback.
>
>
>
> The algorithm used by this HW compressor is compatible with Deflate but
> there is a constraint of 4K window size. Therefore the concern is that
> existing data may not decompress correctly as is. That is why we chose the
> path of adding a new compressor.
>
> Another reason is that, there are some additional features available in
> the hardware which are not compatible with zlib. With this approach we
> could enable those features as well.
>
>
>
> We are also planning to accelerate existing compressors, if that is the
> preferred approach we will try to come up with a solution to work around
> the 4k window limitation.
>
>
>
> Thank you,
>
> Shylaja
>
>
>
> *From:* Dinesh Joshi 
> *Sent:* Monday, January 22, 2024 11:18 AM
> *To:* dev@cassandra.apache.org
> *Subject:* Re: CASSANDRA-19268: Improve Cassandra compression performance
> using hardware accelerators
>
>
>
> Shylaja,
>
>
>
> Cassandra uses ZStd, LZ4 and other compression libraries via JNI to
> compress data. The intel hardware accelerator support is integrated into
> those libraries and we can benefit from it. If there are special parameters
> that need to be passed in to these libraries we can make those changes on
> the database but as such Cassandra does not directly implement the
> compression algorithms itself.
>
>
>
> Dinesh
>
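
The window-size concern raised in this thread can be reproduced with stock
zlib from Python. This is only an analogy, assuming the hardware path
behaves like a Deflate codec limited to a 4K window (wbits=12); it does not
use QPL or any Intel API, and the helper names are made up for the example.
Streams written with a 4K window are readable by a 32K-window decoder, but a
decoder limited to 4K cannot be relied on to read 32K-window streams.

```python
import zlib

WINDOW_4K = 12   # log2(4096)
WINDOW_32K = 15  # zlib's default window (zlib.MAX_WBITS)


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with an explicit window size (zlib container format)."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


def can_inflate(blob: bytes, wbits: int) -> bool:
    """True if a decoder limited to the given window can read the blob."""
    try:
        zlib.decompress(blob, wbits=wbits)
        return True
    except zlib.error:
        return False
```

Calling can_inflate(deflate(data, WINDOW_32K), WINDOW_4K) shows the
incompatible direction, which mirrors why existing data would need either a
separate compressor or a workaround for the window limit.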


Re: CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-22 Thread Dinesh Joshi
Shylaja,

Cassandra uses ZStd, LZ4 and other compression libraries via JNI to
compress data. The intel hardware accelerator support is integrated into
those libraries and we can benefit from it. If there are special parameters
that need to be passed in to these libraries we can make those changes on
the database but as such Cassandra does not directly implement the
compression algorithms itself.

Dinesh


Re: Custom FSError and CommitLog Error Handling

2023-12-17 Thread Dinesh Joshi
> On Dec 11, 2023, at 11:27 AM, Raymond Huffman  
> wrote:
> 
> On our fork of Cassandra, we've implemented some custom behavior for handling 
> CommitLog and SSTable Corruption errors. Specifically, if a node detects one 
> of those errors, we want the node to stop itself, and if the node is 
> restarted, we want initialization to fail. This is

This is the correct behavior if you can reliably detect disk / memory failure 
which is usually the cause of corruption.

> FSErrorHandler, and the error handler that's currently implemented at 
> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in 
> the same way one can provide custom Partitioners and 
> Authenticators/Authorizers.

How would you implement this custom FSErrorHandler? Would it significantly vary 
between operators of Cassandra? If improperly implemented it may lead to 
serious outages.

> Would you take as a contribution one of the following?
> 1. user provided implementations of FSErrorHandler and CommitLogErrorHandler, 
> set via config; and/or
> 2. new commit failure and disk failure policies that write a poison pill file 
> to disk and fail on startup if that file exists

Maybe this can be added as a feature to Cassandra without the need to customize 
it or make it pluggable. It appears to be useful as described. If you have a 
branch with the proposed behavior, it might make it easier to clarify any questions.

Dinesh
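
For illustration, the second option quoted above (a poison pill file that
blocks startup) could look roughly like the Python sketch below. The marker
file name, directory layout and function names are hypothetical and are not
taken from the fork or from Cassandra.

```python
import os
import time

POISON_PILL = "CORRUPTION_DETECTED"  # hypothetical marker file name


class StartupBlocked(RuntimeError):
    """Raised when a poison pill from an earlier failure is present."""


def write_poison_pill(data_dir: str, reason: str) -> str:
    """Record a corruption event so later restarts refuse to start."""
    path = os.path.join(data_dir, POISON_PILL)
    with open(path, "w", encoding="utf-8") as marker:
        marker.write(f"{int(time.time())} {reason}\n")
        marker.flush()
        os.fsync(marker.fileno())  # make sure the marker survives a crash
    return path


def check_startup(data_dir: str) -> None:
    """Fail initialization if a poison pill exists in data_dir."""
    path = os.path.join(data_dir, POISON_PILL)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as marker:
            detail = marker.read().strip()
        raise StartupBlocked(f"refusing to start: {detail}")
```

In such a scheme an operator would remove the marker only after replacing
the faulty hardware or verifying the data.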

Re: [ATTENTION] Forced push on cassandra-5.0 branch !!!

2023-12-16 Thread Dinesh Joshi
thanks for the heads up. Is there anything we could do to avoid bad merges in 
the future?

Dinesh

> On Dec 16, 2023, at 3:26 PM, Mick Semb Wever  wrote:
> 
> 
> The cassandra-5.0 branch accidentally got 229 trunk merge commits brought 
> into it.
> 
> This has been fixed now, but required a forced push.  I've gone ahead and 
> done this quickly for the sake of avoiding most folk from seeing it.
> 
> The fix was
> 
> git switch cassandra-5.0
> git reset --hard 2fc2be5
> git push --force origin cassandra-5.0
> 
> 



Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 5:35 PM, Paulo Motta  wrote:
> 
> This could be a potential hook for out-of-process caching.
> 
> Would something like this be valuable/feasible?

It is certainly feasible. I am not sure about its value.

Dinesh

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
I would avoid taking away a feature even if it works in a narrow set of 
use-cases. I would instead suggest -

1. Leave it disabled by default.
2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
off. Cassandra should ideally detect this and do it automatically.
3. Move to Caffeine instead of OHC.

I would suggest having this as the middle ground.
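
Point 2 could be approximated with a simple hit-rate check like the Python
sketch below. The thresholds and names are assumptions chosen for the
example; Cassandra's own cache metrics are not used here.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        """Count one cache lookup."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


def should_warn(stats: CacheStats,
                min_requests: int = 10_000,
                min_hit_rate: float = 0.3) -> bool:
    """Warn only once enough traffic has been seen to trust the ratio."""
    return stats.requests >= min_requests and stats.hit_rate < min_hit_rate
```

The minimum request count keeps a freshly started node with a cold cache
from triggering the warning.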

> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
> 
>   
>   
>> 
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>> later release
> 
> 
> 
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
> 
> Yes it works in some very narrow situations, but those situations often 
> change over time and again just bites the user.  Without the row-cache I 
> believe users would quickly find other, more suitable and lasting, solutions.



Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg  wrote:
> 
> 1. Fork OHC and start publishing under a new package name and continue to use 
> it

Who would fork it? Where would you fork it? My first instinct is that this 
would not be a viable path forward.

> 2. Replace OHC with a different cache implementation like Caffeine which 
> would move it on heap

Doesn’t seem optimal but given the advent of newer garbage collectors, we might 
be able to run Cassandra with larger heap sizes and moving this to heap may be 
a non-issue. Someone needs to try it out and measure the performance impact 
with ZGC or Shenandoah.

> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
> later release

In my experience, Row cache has historically helped in narrow workloads where 
you have really hot rows but in other workloads it can hurt performance. So 
keeping it around may be fine as long as people can disable it.

Moving it on-heap using Caffeine may be the easiest option here.


Dinesh

Re: [VOTE] Release Apache Cassandra Java Driver 4.18.0

2023-12-12 Thread Dinesh Joshi
+1

> On Dec 8, 2023, at 11:43 PM, Mick Semb Wever  wrote:
>
> Proposing the test build of Cassandra Java Driver 4.18.0 for release.
>
> sha1: 105d378fce16804a8af4c26cf732340a0c63b3c9
> Git: https://github.com/apache/cassandra-java-driver/tree/4.18.0
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1322
>
> The Source release and Binary convenience artifacts are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/4.18.0/
>
> This is the first release post-donation of the Java Driver.  The maven
> coordinates have changed from com.datastax.oss to org.apache.cassandra, while
> all package names remain the same.  There is still work to be done on a
> number of fronts, e.g. vendor-neutrality, covered under CASSANDRA-18611.
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.


Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-28 Thread Dinesh Joshi
The PMC members are pleased to announce that Francisco Guerrero Hernandez has 
accepted the invitation to become a committer today.

Congratulations and welcome!

The Apache Cassandra PMC members

Re: [DISCUSSION] CEP-38: CQL Management API

2023-11-17 Thread Dinesh Joshi
Hi Maxim,

Thanks for putting this CEP together! This is a great start. I have gone over 
the CEP and there is one thing that stuck out to me.

Among the 'basic requirements', I see you have this -

> A dedicated admin port with the native protocol behind it, 
> allowing only admin commands, to address the concerns when
> the native protocol is disabled in certain circumstances 
> e.g. the disablebinary command is executed;

I understand what you're trying to achieve here. However, there are a few reasons we 
should probably offer some choice to our users w.r.t. using a dedicated port 
for management functions.

Today Cassandra exposes several ports - 9042, 9142, 7000 and 7001. The sidecar 
runs on port 9043. That's a lot of ports. I would prefer to allow users to 
access management functionality over one of the existing ports.

I realize that this would mean a subtle change in behavior for disablebinary 
when we offer it over port 9042 and not when the operator decides to use a 
dedicated port.

More importantly, I think having this functionality exposed over the storage 
ports may be even better. The storage ports are typically firewalled off from 
the end users. Operators and tooling, however, usually have access to these 
ports. This especially makes sense from a security standpoint where we'd like 
to limit users from accessing management functionality.

What do others think about this approach?

thanks,

Dinesh

> On Nov 13, 2023, at 10:08 AM, Maxim Muzafarov  wrote:
> 
> Hello everyone,
> 
> While we are still waiting for the review to make the settings virtual
> table updatable (CASSANDRA-15254), which will improve the
> configuration management experience for users, I'd like to take
> another step forward and improve the C* management approach we have as
> a whole. This approach aims to make all Cassandra management commands
> accessible via CQL, but not only that.
> 
> The problem of making commands accessible via CQL presents a complex
> challenge, especially if we aim to minimize code duplication across
> the implementation of management operations for different APIs and
> reduce the overall maintenance burden. The proposal's scope goes
> beyond simply introducing a new CQL syntax. It encompasses several key
> objectives for C* management operations, beyond their availability
> through CQL:
> - Ensure consistency across all public APIs we support, including JMX
> MBeans and the newly introduced CQL. Users should see consistent
> command specifications and arguments, irrespective of whether they're
> using an API or a CLI;
> - Reduce source code maintenance costs. With this new approach, when a
> new command is implemented, it should automatically become available
> across JMX MBeans, nodetool, CQL, and Cassandra Sidecar, eliminating
> the need for additional coding;
> - Maintain backward compatibility, ensuring that existing setups and
> workflows continue to work the same way as they do today;
> 
> I would suggest discussing the overall design concept first, and then
> diving into the CQL command syntax and other details once we've found
> common ground on the community's vision. However, regardless of these
> details, I would appreciate any feedback on the design.
> 
> I look forward to your comments!
> 
> Please, see the design document: CEP-38: CQL Management API
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API



Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Dinesh Joshi
I have a strong preference to move out the 5.0 date to have Accord and TCM. I 
don’t see the point in shipping 5.0 without these features especially if 5.1 is 
going to follow close behind it.

Dinesh

> On Oct 23, 2023, at 4:52 AM, Mick Semb Wever  wrote:
> 
> 
> 
> The TCM work (CEP-21) is in its review stage but being well past our cut-off 
> date¹ for merging, and now jeopardising 5.0 GA efforts, I would like to 
> propose the following.
> 
> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut an 
> immediate 5.1-alpha1 release.
> 
> I see this as a win-win scenario for us, considering our current situation.  
> (Though it is unfortunate that Accord is included in this scenario because we 
> agreed it to be based upon TCM.)
> 
> This will mean…
>  - We get to focus on getting 5.0 to beta and GA, which already has a ton of 
> features users want.
>  - We get an alpha release with TCM and Accord into users hands quickly for 
> broader testing and feedback.
>  - We isolate GA efforts on TCM and Accord – giving oss and downstream 
> engineers time and patience reviewing and testing.  TCM will be the biggest 
> patch ever to land in C*.
>  - Give users a choice for a more incremental upgrade approach, given just 
> how many new features we're putting on them in one year.
>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all 4.x 
> versions, just as if it had landed in 5.0.
> 
> 
> The risks/costs this introduces are
>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch, and 
> at some point decide to undo this work, while we can throw away the 
> cassandra-5.1 branch we would need to do a bit of work reverting the changes 
> in trunk.  This is a _very_ edge case, as confidence levels on the design and 
> implementation of both are already tested and high.
>  - We will have to maintain an additional branch.  I propose that we treat 
> the 5.1 branch in the same maintenance window as 5.0 (like we have with 3.0 
> and 3.11).  This also adds the merge path overhead.
>  - Reviewing of TCM and Accord will continue to happen post-merge.  This is 
> not our normal practice, but this work will have already received its two +1s 
> from committers, and such ongoing review effort is akin to GA stabilisation 
> work on release branches.
> 
> 
> I see no other ok solution in front of us that gets us at least both the 5.0 
> beta and TCM+Accord alpha releases this year.  Keeping in mind users demand 
> to start experimenting with these features, and our Cassandra Summit in 
> December.
> 
> 
> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
> 
> 


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Dinesh Joshi
I haven't looked at the patch yet so take whatever I say here with a pinch of 
salt.

Philosophically, defaults should not change unless there is a clear 
demonstrable benefit in majority cases for our users. In this case DirectIO 
should have clear benefits. That said, this is a new feature and I would 
personally default it to off. We should document it and allow for our users to 
enable it. This derisks the project in case there is an inadvertent change in 
behavior.

Dinesh

> On Oct 15, 2023, at 11:34 PM, Pawar, Amit  wrote:
> 
> [Public]
> 
> 
> Hi,
>  
> CommitLog uses mmap (memory mapped) segments by default. A Direct-IO feature 
> is proposed through a new PR[1] to improve the CommitLog IO speed. Enabling 
> this by default could be a useful way to address the IO bottleneck seen during 
> peak load.
>  
> Need your input regarding changing this default. Please suggest.
>  
> https://issues.apache.org/jira/browse/CASSANDRA-18464
>  
> thanks,
> Amit Pawar
>  
> [1] - https://github.com/apache/cassandra/pull/2777



Re: [VOTE] Accept java-driver

2023-10-03 Thread Dinesh Joshi
+1

This is great for the project. Thank you for all the hard work everyone put 
into this! It has been a long journey to get to this point.

Dinesh

> On Oct 2, 2023, at 9:53 PM, Mick Semb Wever  wrote:
> 
> 
> The donation of the java-driver is ready for its IP Clearance vote.
> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
> 
> The SGA has been sent to the ASF.  This does not require acknowledgement 
> before the vote.
> 
> Once the vote passes, and the SGA has been filed by the ASF Secretary, we 
> will request ASF Infra to move the datastax/java-driver as-is to 
> apache/java-driver
> 
> This means all branches and tags, with all their history, will be kept.  A 
> cleaning effort has already cleaned up anything deemed not needed.
> 
> Background for the donation is found in CEP-8: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> 
> PMC members, please take note of (and check) the IP Clearance requirements 
> when voting.
> 
> The vote will be open for 72 hours (or longer). Votes by PMC members are 
> considered binding. A vote passes if there are at least three binding +1s and 
> no -1's.
> 
> regards,
> Mick


Re: [VOTE] Release dtest-api 0.0.16

2023-08-19 Thread Dinesh Joshi
With 3 binding +1s and no -1s, the vote passes. Thank you everybody.

> On Aug 19, 2023, at 9:50 AM, Blake Eggleston  wrote:
> 
> +1
> 
>> On Aug 17, 2023, at 12:37 AM, Alex Petrov  wrote:
>> 
>> 
>> +1
>> 
>> On Thu, Aug 17, 2023, at 4:46 AM, Brandon Williams wrote:
>>> +1
>>> 
>>> Kind Regards,
>>> Brandon
>>> 
>>> On Wed, Aug 16, 2023 at 4:34 PM Dinesh Joshi  wrote:
>>> >
>>> > Proposing the test build of in-jvm dtest API 0.0.16 for release.
>>> >
>>> > Repository:
>>> > https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>>> >
>>> > Candidate SHA:
>>> > https://github.com/apache/cassandra-in-jvm-dtest-api/commit/1ba6ef93d0721741b5f6d6d72cba3da03fe78438
>>> > tagged with 0.0.16
>>> >
>>> > Artifacts:
>>> > https://repository.apache.org/content/repositories/orgapachecassandra-1307/org/apache/cassandra/dtest-api/0.0.16/
>>> >
>>> > Key signature: 53371F9B1B425A336988B6A03B6042413D323470
>>> >
>>> > Changes since last release:
>>> >
>>> > * CASSANDRA-18727 - JMXUtil.getJmxConnector should retry connection 
>>> > attempts
>>> >
>>> > The vote will be open for 24 hours. Everyone who has tested the build
>>> > is invited to vote. Votes by PMC members are considered binding. A
>>> > vote passes if there are at least three binding +1s.
>>> >
>>> 



[VOTE] Release dtest-api 0.0.16

2023-08-16 Thread Dinesh Joshi
Proposing the test build of in-jvm dtest API 0.0.16 for release.

Repository:
https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git

Candidate SHA:
https://github.com/apache/cassandra-in-jvm-dtest-api/commit/1ba6ef93d0721741b5f6d6d72cba3da03fe78438
tagged with 0.0.16

Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1307/org/apache/cassandra/dtest-api/0.0.16/

Key signature: 53371F9B1B425A336988B6A03B6042413D323470

Changes since last release:

* CASSANDRA-18727 - JMXUtil.getJmxConnector should retry connection attempts

The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.



Re: [Discuss] CEP-35: Add PIP support for CQLSH

2023-08-09 Thread Dinesh Joshi
Brad,

Thanks for starting this discussion. My understanding is that we're
simply adding pip support for cqlsh and Apache Cassandra project will
officially publish a cqlsh pip package. This is a good goal but other
than having an official pip package, what is it that we're gaining?
Please don't interpret this as push back on your proposal but I am
unclear on what we're trying to solve by making this official
distribution. There are several distribution channels and it is
untenable to officially support all of them.

If we do adopt this, there will be non-zero overhead of the release
process. This is fine but we need volunteers to run this process. My
understanding is that they need to be ideally PMC or at least Committers
on the project to go through all the steps to successfully release a new
artifact for our users.

I would have liked this CEP to go a bit further than just packaging
cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
doesn't need to live in the cassandra repo. Extracting cqlsh into its
own repo would allow us to truly decouple cqlsh from the server.
This is already true for the most part as we rely on the Python driver
which is compatible with several cassandra releases. As it stands today
it is not possible for us to update cqlsh without making a Cassandra
release.

If you truly want to go a bit further, we should consider rewriting
cqlsh in Java so we can easily share code from the server. We can then
potentially use Java Native Image[1] to produce a truly platform
independent binary like golang. Python has its strengths but it does get
hairy as it expects certain runtime components on the target. With Java
Native Image we make things very simple from a user's perspective, very
similar to how golang produces statically linked binaries. This might be
a very far out thought but it is worth exploring. I believe GraalVM's
license might allow us to produce binaries that we can incorporate in
our release but IANAL so maybe we can ask ASF legal on their opinion.

Giving cqlsh its own identity as a sub-project might help us build a
roadmap and evolve it along these lines.

I would like other folks to chime in with their opinions.

Dinesh

On 8/9/23 09:18, Brad wrote:
> 
> As per the CEP process guidelines, I'm starting a formal DISCUSS thread
> to resume the conversation started here[1]. 
> 
> The developers who maintain the Python CQLSH client on the official
> Python PYPI repository would like to integrate and donate their open
> source work to the Apache Cassandra project so it can be more tightly
> and seamlessly integrated.
> 
> The Apache Cassandra project pre-dates the adoption in Python 3.4 of
> PyPI as the default package manager. As a result, an unofficial
> distribution has been provided by a group of developers who have
> maintained the repository there since October 2013. 
> 
> The installable version of CQLSH on PyPI.org allows end users to install
> a cqlsh client with PIP - no tarball or path setup required. I.e.,
> 
>           $ pip install cqlsh
> 
> This popular package has 50K downloads per month and is today maintained
> by Jeff Widman and Brad Schoening. The PYPI package is updated upon
> every major release by simply repackaging the CQLSH that ships with
> every Cassandra release.
> 
> CQLSH PyPI Repository:  https://pypi.org/project/cqlsh/
> 
> 
> 
> This CEP Proposal suggests incorporating PYPI as a regular part of the
> Cassandra release process and making the CQLSH project on PYPI an
> official distribution point.
> 
> The full CEP can be reviewed at:
> 
> Wiki: CEP-35: Add PIP support for CQLSH
> 
> 
> Jira: CASSANDRA-18654
> 
> 
> 
> But in brief, the proposal will:
> 
>   * Add PyPI.org as an official distribution point for CQLSH
>   * Allow end users to install CQLSH with simply 'pip install cqlsh' on
> MacOS, Windows and Linux platforms.
>   * Donate the modest amount of existing configuration files by the
> authors to Apache Cassandra
>   * This only involves the Python CQLSH client, no changes to
> distribution of Java server side code and tools are involved.
> 
> We welcome further discussion and suggestions regarding this proposal on
> the  mailing list here.
> 
> Regards,
> 
> Jeff Widman &
> Brad Schoening
> 
> [1] https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d
> 



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-29 Thread Dinesh Joshi
+1 to on by default.

I see the concern about breaking users by introducing 'silent defaults'. IMO 
ACCP itself is a non-breaking change. If I have missed something please point 
it out and I'll be happy to reconsider my position.

The advantages of having ACCP on by default far outweigh the risk of not. One 
tangible benefit is that people will have _noticeable_ performance gains. There 
are big users of ACCP on this list that have years of experience running this 
in prod and if there were any breakages, I can assure you that you would see a 
lot of jiras being filed along with patches.

Regarding change to the YAML, the best path forward is to ensure that we all 
test these changes thoroughly. That's the point of the branching and releasing 
test artifacts. I don't consider this specific change to the YAML to be a high 
risk one (famous last words...? lol).

Dinesh

> On Jul 26, 2023, at 2:13 PM, Josh McKenzie  wrote:
> 
> +1 to the "on by default" camp.
> 
>> What comes to mind is how we brought down people's clusters and made sstables 
>> unreadable with the introduction of the chunk_length configuration in 1.0
> I think a key difference here is that changing chunk length is something that 
> materially changes behavior and expectations w/a coupled system, whereas 
> switching crypto providers has the much smaller failure mode of "the 
> implementations aren't binary compatible even though they're supposed to be, 
> and are very heavily tested TO be".
> 
> Totally agree that a "surprise! it didn't load so now your nodes won't start" 
> approach would be a Very Bad Experience for users. Falling back from ACCP and 
> squawking about the lack might actually be nice to help folks where it 
> doesn't load / work / etc know to look into it. It really makes a material 
> difference.
> 
> On Wed, Jul 26, 2023, at 4:02 PM, Jordan West wrote:
>> It sounds like some of the concerns have shifted then. I would like to 
>> better understand the YAML one. Like Jeremiah said it may be a better topic 
>> for the ticket. Would appreciate an example exception or error people are 
>> concerned about. 
>> 
>> If the issue is the “fail fast” on start I’m sure we can find a solution 
>> everyone accepts and move forward. 
>> 
>> If we are agreed “on by default” is the way to go that’s awesome! 
>> 
>> Jordan 
>> 
>> On Wed, Jul 26, 2023 at 12:59 Jeremiah Jordan wrote:
>> I had a discussion with Mick on slack.  His concern is not with enabling 
>> ACCP.  His concern is around the testing of the new C* yaml config code 
>> which is included in the patch that is used to decide if ACCP should be 
>> enabled or not, and if startup should fail if it can’t be enabled.
>> 
>> I agree.  We should make sure that the new C* yaml config code is solid 
>> before we commit this patch, especially when it has the possibility of causing 
>> node startup to fail on purpose.  But that should be a discussion for the 
>> ticket I think, not for this thread.
>> 
>> So I think we are back to the original question.  Should ACCP be used by 
>> default in trunk.  From what I have seen I do not see anyone who is against 
>> that?
>> 
>> -Jeremiah
>> 
>> 
>> On Jul 26, 2023 at 2:53:02 PM, Jordan West wrote:
>>> +1 Scott. And agreed all involved are looking out for the best interests of 
>>> C* users. And I appreciate those with concerns contributing to addressing 
>>> them. 
>>> 
>>> I’m all for making upgrades smooth bc I do them so often. A huge portion of 
>>> our 4.1 qualification is “will it break on upgrade”? Because of that I’m 
>>> confident in this patch and concerned about many other areas. I think it’s 
>>> commendable to want to reach a point where teams have the trust in the 
>>> community to have done that for them but that starts w better test coverage 
>>> and concrete evidence. 
>>> 
>>> Given all that, I think we should move forward w Ayushi’s proposal to make 
>>> it on by default. 
>>> 
>>> Jordan 
>>> 
>>> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas wrote:
>>> I think these concerns are well-intended, but they feel rooted in 
>>> uncertainty rather than in factual examples of areas where risk is present. 
>>> I would appreciate elaboration on the specific areas of risk that folks 
>>> imagine.
>>> 
>>> I would encourage those who express skepticism to try the patch, and I 
>>> endorse Ayushi's proposal to enable it by default.
>>> 
>>> 
>>> – Scott
>>> 
>>>> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" 
>>>> <stefan.mikloso...@netapp.com> wrote:
>>>> 
>>>> 
>>>> We can make it opt-in, wait one major to see what bugs pop up and we might 
>>>> do that opt-out eventually. We do not need to hurry up with this. I 
>>>> understand everybody's expectations and excitement but it really boils 
>>>> down to one line change in yaml. People who are so much after the 
>>>> performance will be definitely aware of this knob to turn on to

Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-26 Thread Dinesh Joshi
Mick,

This sounds like a good plan. CEP-33 and 34 are ready to go. We're running into 
CI related issues but once they clear up we'll merge them. I anticipate we'll 
be done in a week's time.

Thanks,

Dinesh

> On Jul 26, 2023, at 3:27 PM, Mick Semb Wever  wrote:
> 
> 
> The previous thread¹ on when to freeze 5.0 landed on freezing the first week 
> of August, with a waiver in place for TCM and Accord to land later (but 
> before October).
> 
> With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work that 
> hasn't landed is Vector search (CEP-30).  
> 
> Are there any objections to a waiver on Vector search?  All the groundwork 
> (SAI and the vector type) has been merged, with all remaining work expected to 
> land in August.
> 
> I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 
> and a long list of flakies.  It takes time and patience to triage and 
> identify the bugs that hit us before GA.  The freeze is about being "mostly 
> feature complete",  so we have room for things before our first beta 
> (precedent is to ask).   If we hope for a GA by December, account for the 6 
> weeks turnaround time for cutting and voting on one alpha, one beta, and one 
> rc release, and the quiet period that August is, we really only have 
> September and October left.  
> 
> I already feel this is asking a bit of a miracle from us given how 4.1 went 
> (and I'm hoping I will be proven wrong). 
> 
> In addition, are there any objections to cutting a 5.0-alpha1 release as 
> soon as we freeze?  
> 
> This is on the understanding vector, tcm and accord will become available in 
> later alphas.  Originally the discussion¹ was waiting for Accord for alpha1, 
> but a number of folk off-list have requested earlier alphas to help with 
> testing.
> 
> 
> ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3



Re: [Discuss] Repair inside C*

2023-07-26 Thread Dinesh Joshi
I concur, repair is an intrinsic part of the database and belongs inside it. We 
can certainly expose a REST control plane API via the sidecar for triggering it 
on demand, scheduling, etc.

That said, there are various implementations of repair scheduling and 
orchestration that a lot of organizations maintain in their proprietary 
sidecars. It would be beneficial in the interim to consolidate on a common 
solution in the sidecar. Eventually we need a version of repair in the database 
that just works without the need of any operator intervention.


> On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> 
> I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
> current (and past) state of things where running the DB correctly *requires* 
> running a separate process (either community maintained or official C* 
> sidecar) is incredibly painful for folks.  The idea that your data integrity 
> needs to be opt-in has never made sense to me from the perspective of either 
> the product or the end user.
> 
> I've worked with way too many teams that have either configured this 
> incorrectly or not at all.  
> 
> Ideally Cassandra would ship with repair built in and on by default.  Power 
> users can disable if they want to continue to maintain their own repair 
> tooling for some reason. 
> 
> Jon
> 
> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> All,
>> 
>> We had a brief discussion in [2] about the Uber article [1] where they talk 
>> about having integrated repair into Cassandra and how great that is. I 
>> expressed my disappointment that they didn't work with the community on that 
>> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
>> already had the idea and wrote the code [3] - so I wanted to start a 
>> discussion to gauge interest and maybe how to revive that effort.
>> 
>> Thanks,
>> German
>> 
>> [1] 
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>> 



Re: [VOTE] CEP-34: mTLS based client and internode authenticators

2023-07-21 Thread Dinesh Joshi
+1

> On Jul 21, 2023, at 11:07 AM, Francisco Guerrero  wrote:
> 
> +1 (nb). This is a very valuable enhancement for the project.
> 
> Thanks for the contribution, Jyothsna!
> 
> On 2023/07/21 16:57:45 Jyothsna Konisa wrote:
>> Hi Everyone!
>> 
>> I would like to start a vote thread for CEP-34.
>> 
>> Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34%3A+mTLS+based+client+and+internode+authenticators
>> JIRA   :
>> https://issues.apache.org/jira/browse/CASSANDRA-18554
>> Draft Implementation : https://github.com/apache/cassandra/pull/2372
>> Discussion :
>> https://lists.apache.org/thread/pnfg65r76rbbs70hwhsz94ds6yo2042f
>> 
>> The vote will be open for 72 hours. A vote passes if there are at least 3
>> binding +1s and no binding vetoes.
>> 
>> Thanks,
>> Jyothsna Konisa.
>> 
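The passing rule stated in the vote email above (at least 3 binding +1s and no binding vetoes) can be written down as a small check. This is only an illustrative sketch: the function name `vote_passes` and the `(value, is_binding)` tuple layout are assumptions made for the example and are not part of any ASF tooling.

```python
def vote_passes(votes, min_binding_plus_ones=3):
    """Return True if a vote passes under the rule quoted in the thread.

    `votes` is a list of (value, is_binding) tuples, where value is
    +1, 0 or -1. Non-binding votes are recorded but do not count.
    """
    binding_values = [value for value, is_binding in votes if is_binding]
    enough_support = binding_values.count(1) >= min_binding_plus_ones
    no_vetoes = binding_values.count(-1) == 0
    return enough_support and no_vetoes


# Hypothetical tallies, not the real results of any vote.
passing = [(1, True), (1, True), (1, True), (1, False)]
vetoed = [(1, True), (1, True), (1, True), (-1, True)]
too_few = [(1, True), (1, True), (1, False), (1, False)]
```

With these sample tallies, `vote_passes(passing)` is true, while a single binding -1 or fewer than three binding +1s makes the check fail regardless of how many non-binding votes were cast.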



Re: Cassandra Sidecar CI is now green!

2023-07-21 Thread Dinesh Joshi
Thanks Francisco, Mick and Yifan for making this happen!

> On Jul 20, 2023, at 4:00 PM, Francisco Guerrero  wrote:
> 
> Hi list,
> 
> I wanted to bring some visibility into the Cassandra Sidecar CI health [1].
> It seems like it has been broken for quite a while and we have finally fixed
> it today.
> 
> Special thanks to Mick for noticing the issue and bringing it up to me. Also,
> thanks to Yifan and Dinesh for reviewing the PR [2] and helping me iterate
> over the PR.
> 
> Best,
> - Francisco
> 
> [1] https://ci-cassandra.apache.org/job/cassandra~sidecar/
> [2] https://issues.apache.org/jira/browse/CASSANDRASC-66



Re: [VOTE] Release Apache Cassandra 4.1.3

2023-07-20 Thread Dinesh Joshi
+1


> On Jul 18, 2023, at 11:28 PM, Miklosovic, Stefan 
>  wrote:
> 
> Proposing the test build of Cassandra 4.1.3 for release.
> 
> sha1: 2a4cd36475de3eb47207cd88d2d472b876c6816d
> Git: https://github.com/apache/cassandra/tree/4.1.3-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1304/org/apache/cassandra/cassandra-all/4.1.3/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.1.3/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.1.3-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.1.3-tentative/NEWS.txt


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-19 Thread Dinesh Joshi
Does anybody have any questions / comments?

Dinesh

> On Jul 17, 2023, at 12:37 PM, Dinesh Joshi  wrote:
> 
> Hi folks,
> 
> Given the feedback received, we thought it would be best to do a CEP. Here's 
> the link: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34
> 
> It incorporates the feedback we've received. Please let me know if there are 
> any other comments. We'll wait for a bit and start a VOTE thread for it.
> 
> Thanks,
> 
> Dinesh
> 
>> On Jul 12, 2023, at 12:13 AM, Dinesh Joshi  wrote:
>> 
>> I can certainly start a VOTE thread for the CQL syntax addition. There
>> hasn't been any feedback that suggests that there is an unaddressed
>> concern to the changes we are making.
>> 
>> That said, I'm not sure if there was an explicit decision that has resulted
>> in an update to the project's governance to reflect this requirement? If
>> there is I seem to have missed it. There was a discussion in the past
>> about notifying the dev list to ensure there is visibility to changes
>> but I don't recall whether there was an explicit voting requirement.
>> 
>> On 7/11/23 19:17, Yuki Morishita wrote:
>>>> folks - I think we’ve achieved lazy consensus here. Please continue
>>> with feedback on the jira.
>>> 
>>> Hi Dinesh,
>>> 
>>> As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?
>>> 
>>> For the future reference, in my opinion, adding new CQL syntax should
>>> have a CEP as it is not something we can easily change once defined.
>> 
> 



Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-17 Thread Dinesh Joshi
Hi folks,

Given the feedback received, we thought it would be best to do a CEP. Here's 
the link: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34

It incorporates the feedback we've received. Please let me know if there are 
any other comments. We'll wait for a bit and start a VOTE thread for it.

Thanks,

Dinesh

> On Jul 12, 2023, at 12:13 AM, Dinesh Joshi  wrote:
> 
> I can certainly start a VOTE thread for the CQL syntax addition. There
> hasn't been any feedback that suggests that there is an unaddressed
> concern to the changes we are making.
> 
> That said, I'm not sure if there was an explicit decision that has resulted
> in an update to the project's governance to reflect this requirement? If
> there is I seem to have missed it. There was a discussion in the past
> about notifying the dev list to ensure there is visibility to changes
> but I don't recall whether there was an explicit voting requirement.
> 
> On 7/11/23 19:17, Yuki Morishita wrote:
>>> folks - I think we’ve achieved lazy consensus here. Please continue
>> with feedback on the jira.
>> 
>> Hi Dinesh,
>> 
>> As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?
>> 
>> For the future reference, in my opinion, adding new CQL syntax should
>> have a CEP as it is not something we can easily change once defined.
> 



Re: [VOTE] Release Apache Cassandra 4.0.11

2023-07-13 Thread Dinesh Joshi
+1

> On Jul 13, 2023, at 12:12 AM, Miklosovic, Stefan 
>  wrote:
> 
> Proposing the test build of Cassandra 4.0.11 for release.
> 
> sha1: f8584b943e7cd62ed4cb66ead2c9b4a8f1c7f8b5
> Git: https://github.com/apache/cassandra/tree/4.0.11-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1303/org/apache/cassandra/cassandra-all/4.0.11/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.0.11/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.0.11-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.0.11-tentative/NEWS.txt



Re: Changing the output of tooling between majors

2023-07-13 Thread Dinesh Joshi
This adds maintenance overhead but is a potential alternative. I would only 
flip the flag. I would prefer to make the default "legacy" output and innovate 
behind a "--output-format=v2" flag. That way tools do not break or have to 
change to pass in the new flag.

Ideally we should always version our output format - structured or not.
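To make the suggestion concrete, here is a minimal sketch of a CLI that keeps the legacy output as the default and puts the new, versioned format behind an opt-in flag. Everything in it is hypothetical: `exampletool`, the `--output-format` choices, and the sample status fields are stand-ins for illustration and do not describe how nodetool is implemented.

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(prog="exampletool")
    # Legacy stays the default so existing scripts keep working unchanged.
    parser.add_argument(
        "--output-format",
        choices=["legacy", "v2"],
        default="legacy",
        help="output format version (default: legacy)",
    )
    return parser


def render(status, output_format):
    if output_format == "legacy":
        # Unstructured "key: value" lines, as older scripts would expect.
        return "\n".join(f"{key}: {value}" for key, value in status.items())
    # The new format carries an explicit version so it can evolve later.
    return json.dumps({"format_version": 2, "data": status}, sort_keys=True)


def main(argv=None):
    args = build_parser().parse_args(argv)
    status = {"Load": "1.2 GiB", "Status": "UN"}  # made-up sample data
    return render(status, args.output_format)
```

Calling `main([])` yields the legacy text, and `main(["--output-format", "v2"])` yields the versioned JSON, so callers only see a change when they ask for it.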

Dinesh

> On Jul 13, 2023, at 9:08 AM, German Eichberger via dev 
>  wrote:
> 
> Let's take this discussion in a different direction: If we add a --legacy 
> argument where we are supporting an old version for those who 
> need/want it but have the (breaking) changes on the default this feels like a 
> compromise - and then we can deprecate the legacy format without impacting 
> innovation. We can also flip this with requiring a flag for the changed 
> format if we feel this is better.
> 
> This lets us innovate without breaking anyone. Thoughts?
> 
> Thanks,
> German


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-12 Thread Dinesh Joshi
I can certainly start a VOTE thread for the CQL syntax addition. There
hasn't been any feedback that suggests that there is an unaddressed
concern to the changes we are making.

That said, I'm not sure if there was an explicit decision that has resulted
in an update to the project's governance to reflect this requirement? If
there is I seem to have missed it. There was a discussion in the past
about notifying the dev list to ensure there is visibility to changes
but I don't recall whether there was an explicit voting requirement.

On 7/11/23 19:17, Yuki Morishita wrote:
>> folks - I think we’ve achieved lazy consensus here. Please continue
> with feedback on the jira.
> 
> Hi Dinesh,
> 
> As Jeremiah commented on JIRA, shouldn't we have a vote in the ML?
> 
> For the future reference, in my opinion, adding new CQL syntax should
> have a CEP as it is not something we can easily change once defined.


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-11 Thread Dinesh Joshi
folks - I think we’ve achieved lazy consensus here. Please continue with
feedback on the jira.

Thanks,

Dinesh

> On Jul 7, 2023, at 12:23 PM, Jyothsna Konisa  wrote:
> 
> Hi Yuki, Jeremiah & Christopher,
> 
> Thank you very much for the feedback. Regarding removing superuser check for
> adding/removing identities, I have relaxed that check and added permissions
> check instead. With this change only users with appropriate permissions to
> add/drop identities can perform that action.
> 
> About extending `Create Role` cqlsh statement, we have a couple of reasons
> for not doing that. We designed the mTLS authenticator in such a way that a
> single role can be associated with multiple identities, EX: there can be
> several identities which are read_only users. Also, having a separate cqlsh
> statement for identities makes it more pluggable and independent. If we still
> think that extending the create role statement would be a convenient feature,
> we can add it as required in the followup patches.
> 
> Christopher, I will be acting upon your feedback regarding having identity in
> the cassandra.yaml optionally configurable.
> 
> Thanks,
> Jyothsna Konisa.
> 
> On Thu, Jul 6, 2023 at 5:30 PM Dinesh Joshi <djo...@apache.org> wrote:
>>> On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan <jerem...@datastax.com> wrote:
>>> 
>>> I don’t think users necessarily need to be able to update their own
>>> identities.  I just don’t want to have to use the super user role.  The
>>> super user role has all power over all things in the data base.  I don’t
>>> want to have to give that much power to the person who manages identities,
>>> I just want to give them the power to manage identities.
>> 
>> Makes sense. I think Jyothsna already pushed an update to the PR to relax
>> the restriction. Please feel free to take a look at it.
>> 
>> Dinesh
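The mapping rules discussed in this thread (a role may have several identities, but each identity belongs to exactly one role) can be sketched with a dictionary keyed by identity. This is an in-memory illustration only, not the patch's code: the class name, method names, and the SPIFFE-style identity strings are hypothetical, and rejecting a conflicting add is a choice made for this sketch, since the thread does not say how the real statement handles that case.

```python
class IdentityToRole:
    """In-memory stand-in for a table whose primary key is the identity."""

    def __init__(self):
        self._role_by_identity = {}

    def add_identity(self, identity, role):
        existing = self._role_by_identity.get(identity)
        # Assumption of this sketch: an identity already bound to another
        # role is rejected rather than silently reassigned.
        if existing is not None and existing != role:
            raise ValueError(f"{identity!r} is already mapped to {existing!r}")
        self._role_by_identity[identity] = role

    def drop_identity(self, identity):
        self._role_by_identity.pop(identity, None)

    def role_for(self, identity):
        return self._role_by_identity.get(identity)

    def identities_for(self, role):
        return sorted(i for i, r in self._role_by_identity.items() if r == role)


mapping = IdentityToRole()
mapping.add_identity("spiffe://example.org/app/reader-1", "readonly_user")
mapping.add_identity("spiffe://example.org/app/reader-2", "readonly_user")
```

Because the identity is the key, looking up the role for a presented certificate identity is a single lookup, while listing the identities of a role requires scanning the mapping.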






Re: Changing the output of tooling between majors

2023-07-09 Thread Dinesh Joshi
> On Jul 8, 2023, at 8:43 AM, Miklosovic, Stefan  
> wrote:
>  
> If we are providing CQL / JSON / YAML for couple years, I do not believe that 
> the argument "lets not break it for folks in nodetool" is still relevant. CQL 
> output is there from times of 4.0 at least (at least!) and YAML / JSON is 
> also not something completely new. It is not like we are suddenly forcing 
> people to change their habits, there was enough time to update the stuff to 
> CQL / json / yaml etc ...

What % of Cassandra users are using 4.0+? Operators who upgrade to 4.0 and 
beyond may still use their existing scripts. Therefore keeping things stable is 
important. Until nodetool can support JSON as output format for all interaction 
and there is a significant adoption in the user community, I would strongly 
advise against making breaking changes to the CLI output.

Dinesh

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-06 Thread Dinesh Joshi
> On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan  wrote:
> 
> I don’t think users necessarily need to be able to update their own 
> identities.  I just don’t want to have to use the super user role.  The super 
> user role has all power over all things in the data base.  I don’t want to 
> have to give that much power to the person who manages identities, I just 
> want to give them the power to manage identities.

Makes sense. I think Jyothsna already pushed an update to the PR to relax the 
restriction. Please feel free to take a look at it.

Dinesh





Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-06 Thread Dinesh Joshi
> It is surprising to me that we load the identity from the keystore vs 
> explicitly setting an expected value in cassandra.yaml. I get that an error 
> is thrown if the identity doesn't match those of other nodes in the cluster, 
> but does it make sense to prevent startup should the value in the keystore 
> deviate from a (currently nonexistent) value in cassandra.yaml?

We can make it optionally configurable. The concern about adding identities in 
a yaml is that it generally requires a bounce for Cassandra to pick up new 
values.

> It feels like there is a parallel to how we set the cluster name in 
> cassandra.yaml even though the value is also present within our local 
> sstables and leads to startup errors should they differ.

I can see the parallels here. Thanks for the feedback.

Dinesh

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-30 Thread Dinesh Joshi
CQL syntax, setting up and migration
>> would be greatly appreciated.
>>
>>
>> On Wed, Jun 21, 2023 at 4:13 AM Jyothsna Konisa
>> <jyothsna1...@gmail.com> wrote:
>>
>> Hi Yuki,
>>
>> Sorry I missed answering your other question in
>> the above reply. Regarding checking what
>> identities are associated with a given role, one
>> can make a query to list identities for a given
>> role to the table. Also note that, addition or
>> removal of identities from the table can only be
>> performed by the super user only. Not even
>> read-write users can perform modifications to the
>> table.
>>
>> Also, If others have no concerns regarding this
>> patch, can we move forward with the merge? or do
>> we need voting on this one?
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>>
>> On Mon, Jun 19, 2023 at 4:00 PM Jyothsna Konisa
>> <jyothsna1...@gmail.com> wrote:
>>
>> Hi Yuki,
>> You are right regarding adding a custom
>> validator. If one wants to implement a CN
>> based validator, they can do that and
>> configure that validator in Cassandra.yaml in
>> "authenticator.parameters.validator_class_name".
>>
>> Regarding a role having multiple identities,
>> yes a role can have multiple identities
>> associated with it. For example, there can be
>> several read_only users for a given cluster,
>> so the role `readonly_user` can be associated
>> with multiple identities.
>>
>> Regarding the uniqueness of identity, each
>> identity should be associated with only one
>> role. For example, a single identity can not
>> be both admin user and a read only user.
>>
>> We have ensured this by carefully designing
>> the schema of the new table for storing
>> identity information by making identity as the
>> primary key which guarantees that each
>> identity is unique and the same role can have
>> multiple identities.
>>
>> Thanks,
>> Jyothsna Konisa.
>>
>> On Sun, Jun 18, 2023 at 5:42 PM Yuki Morishita
>> <mor.y...@gmail.com> wrote:
>>
>> Hi,
>>
>> I was discussing with users the other day
>> regarding a similar feature.
>> They were thinking of implementing the
>> custom Authenticator similar to what MySQL
>> offers:
>>
>>     CREATE USER 'jeffrey'@'localhost'
>>       REQUIRE SUBJECT '/C=SE/ST=Stockholm/L=Stockholm/
>>         O=MySQL demo client certificate/
>>         CN=client/emailAddress=cli...@example.com';
>>
>> (https://dev.mysql.com/doc/refman/8.0/en/create-user.html#create-user-tls)
>>
>> I think they can implement a custom
>> Validator that validates the identity (for
>> their case, CN) associated with a role
>> using the certificate'

Re: [VOTE] CEP 33 - CIDR filtering authorizer

2023-06-27 Thread Dinesh Joshi
+1

> On Jun 27, 2023, at 1:23 PM, Josh McKenzie  wrote:
> 
> +1
> 
> On Tue, Jun 27, 2023, at 1:17 PM, Shailaja Koppu wrote:
>> Hi Team,
>> 
>> (Starting a new thread for VOTE instead of reusing the DISCUSS thread, to
>> follow usual procedure).
>> 
>> Please vote on CEP 33 - CIDR filtering authorizer
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-33%3A+CIDR+filtering+authorizer
>> 
>> Thanks,
>> Shailaja

Re: [DISCUSS] Using ACCP or tc-native by default

2023-06-22 Thread Dinesh Joshi
This would be a good addition and would make Cassandra more performant out of
the box.

Dinesh

> On Jun 22, 2023, at 9:45 PM, Jordan West  wrote:
> 
> Glad to see there is support for this! I think ACCP would be a good choice
> since there seems to be a lot of experience deploying it. I’ve opened
> https://issues.apache.org/jira/browse/CASSANDRA-18624. I should have some
> time to work on the patch soon and I will try to provide some graphs that
> show the performance benefit from a recent benchmark.
> 
> Jordan
> 
> On Thu, Jun 22, 2023 at 19:28 Fleming, Jackson  wrote:
>> We run ACCP in production on 1000s of nodes across Cassandra 3.11 and 4
>> with great results.
>> 
>> Would love to see it baked into Cassandra.
>> 
>> Jackson
>> 
>> From: David Capwell
>> Date: Friday, 23 June 2023 at 9:22 am
>> To: dev
>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>> 
>> +1 to ACCP
>> 
>>> On Jun 22, 2023, at 3:05 PM, C. Scott Andreas  wrote:
>>> 
>>> +1 for ACCP and can attest to its results. ACCP also optimizes for a
>>> range of hash functions and other cryptographic primitives beyond TLS
>>> acceleration for Netty.
>>> 
>>>> On Jun 22, 2023, at 2:07 PM, Jeff Jirsa  wrote:
>>>> 
>>>> Either would be better than today.
>>>> 
>>>> On Thu, Jun 22, 2023 at 1:57 PM Jordan West  wrote:
>>>>> Hi,
>>>>> 
>>>>> I’m wondering if there is appetite to change the default SSL provider
>>>>> for Cassandra going forward to either ACCP [1] or tc-native in Netty?
>>>>> Our deployment as well as others I’m aware of make this change in
>>>>> their fork and it can lead to significant performance improvement.
>>>>> When recently qualifying 4.1 without using ACCP (by accident) we
>>>>> noticed p99 latencies were 2x higher than 3.0 w/ ACCP. Wiring up ACCP
>>>>> can be a bit of a pain and also requires some amount of
>>>>> customization. I think it could be great for the wider community to
>>>>> adopt it.
>>>>> 
>>>>> The biggest hurdle I foresee is licensing but ACCP is Apache 2.0
>>>>> licensed. Anything else I am missing before opening a JIRA and
>>>>> submitting a patch?
>>>>> 
>>>>> Jordan
>>>>> 
>>>>> [1] https://github.com/corretto/amazon-corretto-crypto-provider

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-17 Thread Dinesh Joshi
Folks, any feedback here?

On 6/15/23 12:46, Jyothsna Konisa wrote:
> Hi Everyone!
> 
> We are adding the following CQL queries in this patch for adding and dropping 
> identities in the new `system_auth.identity_to_role` table.
> 
> ADD IDENTITY 'testIdentity' TO ROLE 'testRole';
> DROP IDENTITY 'testIdentity';
> 
> Please let us know if anyone has any concerns!
> 
> Thanks,
> Jyothsna Konisa.
> 
> 
> On Sat, Jun 3, 2023 at 7:18 AM Derek Chen-Becker <de...@chen-becker.org> wrote:
> 
> Sounds great, thanks for the clarification!
> 
> Cheers,
> 
> Derek
> 
> On Sat, Jun 3, 2023 at 12:48 AM Dinesh Joshi <djo...@apache.org> wrote:
> 
>> On Jun 2, 2023, at 9:06 PM, Derek Chen-Becker
>> <de...@chen-becker.org> wrote:
>>
>> This certainly looks like a nice addition to the operator's
>> tools for securing cluster access. Out of curiosity, is there
>> anything in this work that would *preclude* a different
>> authentication scheme for internode at some point in the
>> future? Has there ever been discussion of pluggability similar
>> to the client protocol?
> 
> This is a pluggable implementation so it's not mandatory to use
> it and doesn't preclude one from using a different mechanism in
> the future. We haven't explicitly discussed pluggability i.e.
> part of protocol negotiation in the past for internode
> connections. However, this work also does not preclude us from
> implementing such changes. If we do add negotiation this could
> be one of the authentication mechanisms. So it would be
> complementary.
> 
> 
>> Also, am I correct in understanding that this would allow for
>> multiple certificates for the same identity (e.g. distinct
>> cert per node)? I certainly understand the decision to keep
>> things simple and have all nodes share identity from the
>> perspective of operational simplicity, but I also don't want
>> to get in a situation where a single compromised node would
>> require an invalidation and redeployment on all nodes in the
>> cluster.
> 
> I don't recommend all nodes share the same certificate. Each
> node in the cluster should obtain a unique certificate with the
> same SPIFFE. In the event a node is compromised, the operator
> can revoke that node's certificate without having to redeploy to
> all nodes in the cluster.
> 
> thanks,
> 
> Dinesh
> 
> 
> 
> -- 
> Derek Chen-Becker
> GPG Key available at https://keybase.io/dchenbecker and
> https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org
> Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC
> 



Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Dinesh Joshi
+1

Dinesh

> On Jun 13, 2023, at 7:15 AM, Jeremy Hanna  wrote:
> 
> 
> Calling for a vote on CEP-8 [1].
> 
> To clarify the intent, as Benjamin said in the discussion thread [2], the 
> goal of this vote is simply to ensure that the community is in favor of the 
> donation. Nothing more.
> The plan is to introduce the drivers, one by one. Each driver donation will 
> need to be accepted first by the PMC members, as it is the case for any 
> donation. Therefore the PMC should have full control on the pace at which new 
> drivers are accepted.
> 
> If this vote passes, we can start this process for the Java driver under the 
> direction of the PMC.
> 
> Jeremy
> 
> 1. 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-02 Thread Dinesh Joshi
> On Jun 2, 2023, at 9:06 PM, Derek Chen-Becker  wrote:
> 
> This certainly looks like a nice addition to the operator's tools for 
> securing cluster access. Out of curiosity, is there anything in this work 
> that would *preclude* a different authentication scheme for internode at some 
> point in the future? Has there ever been discussion of pluggability similar 
> to the client protocol?

This is a pluggable implementation so it's not mandatory to use it and doesn't 
preclude one from using a different mechanism in the future. We haven't 
explicitly discussed pluggability i.e. part of protocol negotiation in the past 
for internode connections. However, this work also does not preclude us from 
implementing such changes. If we do add negotiation this could be one of the 
authentication mechanisms. So it would be complementary.


> Also, am I correct in understanding that this would allow for multiple 
> certificates for the same identity (e.g. distinct cert per node)? I certainly 
> understand the decision to keep things simple and have all nodes share 
> identity from the perspective of operational simplicity, but I also don't 
> want to get in a situation where a single compromised node would require an 
> invalidation and redeployment on all nodes in the cluster.

I don't recommend all nodes share the same certificate. Each node in the 
cluster should obtain a unique certificate with the same SPIFFE. In the event a 
node is compromised, the operator can revoke that node's certificate without 
having to redeploy to all nodes in the cluster.

thanks,

Dinesh

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-02 Thread Dinesh Joshi
> On Jun 2, 2023, at 1:56 PM, Christopher Bradford  wrote:
> 
> I am not sure what you mean by this would be used alongside internode and 
> client TLS? The mutual TLS authentication allows the server to authenticate 
> the client's identity using a client TLS certificate. The authenticators 
> we're adding enable this functionality. There isn't an expectation that the 
> same certificates be used. In fact, clients should not use the same 
> certificates as the internode.
> 
> My apologies if questions (1 and 2) were a bit convoluted. I'm going to walk 
> through both client and internode below as I perceive the PR.
> 
> Client TLS connections have the client certificate checked against the trust 
> store (if client_encryption_options.require_client_auth is set to true). It 
> looks like the authenticator in the PR checks the identity in the subject 
> alternative name part of the certificate against identity / roles 
> relationships specified via CQL. This all makes sense to me, but to be clear 
> my question was whether the certificate used by the client is the same as the 
> certificate used to secure the connection. My initial reading here is that 
> yes the certificates are the same, there's only ever one certificate, we're 
> just looking in two locations for trust. We would first be checking the 
> certificate's trust before the request is ever processed (is the CA or this 
> certificate in the trust store). Then the SAN of the certificate is utilized 
> to determine who the request is being performed by which is then matched up 
> with a role and request processing continues as usual.

Correct.


> Internode communication is where I started to get confused. It wasn't clear 
> where we were authorizing identities as trusted. We still have the 
> server_encryption_options.require_client_auth (similar to 
> client_encryption_options.require_client_auth) boolean to force checking the 
> trust of the provided certificate against our trust store. I was looking for 
> a way to specify either an allowed list of identities or a pattern to match 
> on. Rereading the PR showed me that we are extracting the valid identities 
> from the outbound keystore (reference link). This doesn't seem correct as the 
> associated documentation in cassandra.yaml indicates this is where the public 
> and private key information is stored for a node's outbound (client) 
> connections to other nodes. Should this instead be the 
> server_encryption_options.truststore alongside trusted CA certificates? In 
> either case it seems as though we would need to load the public certificate 
> for all servers in the cluster (including the specified SPIFFE SAN). Is that 
> correct? This means there's no way (yet) to match against a specific pattern 
> of identities, instead they must all be explicitly allowed.

The reason we use the keystore is that the node extracts its own identity and 
expects other nodes in the cluster to share the same identity. This default 
behavior makes it easy to avoid configuring individual identities of nodes in 
the cluster. It's critical to recognize that if we had a separate identity for 
each node in the cluster, then we would need to update all nodes in the cluster 
when a new node is added or removed. This way all nodes in the cluster can have 
a shared identity while simultaneously preventing unnecessary operational pain 
of adding and removing identities each time a node is added or removed from the 
cluster.
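
A minimal sketch of the shared-identity check described above, assuming the certificate is represented the way Python's `ssl.SSLSocket.getpeercert()` returns it (a dict whose `subjectAltName` entry is a tuple of `(kind, value)` pairs). The helper names `spiffe_id` and `accept_peer` and the `spiffe://example.org/...` identities are hypothetical and are not taken from the patch; the real authenticator is Java code in Cassandra.

```python
from typing import Optional


def spiffe_id(cert: dict) -> Optional[str]:
    """Return the first SPIFFE URI found in the certificate's SAN, if any."""
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None


def accept_peer(own_cert: dict, peer_cert: dict) -> bool:
    """Accept a peer only if it presents the same identity as this node."""
    own = spiffe_id(own_cert)
    return own is not None and spiffe_id(peer_cert) == own


# Distinct certificates (different serial numbers), one shared identity.
# All values here are made up for illustration.
node_a = {"serialNumber": "01",
          "subjectAltName": (("URI", "spiffe://example.org/cassandra"),)}
node_b = {"serialNumber": "02",
          "subjectAltName": (("URI", "spiffe://example.org/cassandra"),)}
intruder = {"serialNumber": "03",
            "subjectAltName": (("URI", "spiffe://example.org/other"),)}
```

Because the comparison is on the identity rather than on the certificate, adding or removing a node does not require touching the configuration of the other nodes.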


> Back to my original question, it appears as though we are using a single 
> certificate for internode where we first check the trust chain of the 
> certificate then check the subject against a valid list in some store. This 
> is the same behavior for client certificates. The genesis of my question was 
> whether there would be separate certificates for the SPIFFE work on top of 
> the certificates used for the base TLS communication. There are not (which is 
> good IMO) instead a SAN is expected to be included in the certificate which 
> is then used for checking identity. (Note internode and client certificates 
> may still be separate and in most cases should be).

Correct.

> Given I've sorted out questions 1 and 2 in my mental model, question 3 is a 
> little different. I think the question is more how do I manage certificates 
> and trust given these changes? I think this can be answered with:
> 
> For clients we provide a CA certificate in the client trust store and 
> identities via the new CQL syntax for mapping roles to SPIFFE identities. 
> Easy.
> 
> Internode communication is handled by adding a CA certificate to the server 
> trust store and outbound node client certificates to a store (which is not 
> clear at the moment per above) which include a SPIFFE identity as part of 
> their SAN.

I hope my explanation above clarifies this confusion.

> 
> Please let me know if this is accurate.
> 
> To recap open questions from above:
> • 
> Should we be using server_encryption_op

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-02 Thread Dinesh Joshi
> Is there an expectation that this would be used alongside internode and 
> client TLS? Would the certificates be the same, different, or is that an 
> implementation detail for the specific deployment to determine?

I am not sure what you mean by this would be used alongside internode and 
client TLS? The mutual TLS authentication allows the server to authenticate the 
client's identity using a client TLS certificate. The authenticators we're 
adding enable this functionality. There isn't an expectation that the same 
certificates be used. In fact, clients should not use the same certificates as 
the internode.

> For some reason I'm having trouble understanding the internode authentication 
> portion of this ticket (authenticating a client with a certificate makes 
> sense vs just authenticating the connection). Why is this needed on top of 
> the connection-level TLS we have now?

Current connection-level TLS only secures the TLS connection. It doesn't 
authenticate the peer. This adds the ability to authenticate the peer in 
addition to securing the TLS connection.

> When an operator or DBA is looking to add a new identity is that just handled 
> as part of the new CQL statement or is there some certificate management 
> required on the nodes? I assume it's just the CA that needs to be placed on 
> the nodes to establish trust in the certificate itself then authz happening 
> within C* after determining the certificate can be trusted, but want to be 
> certain.

Each deployment is different, so certificate management isn't in scope for the 
database itself. However, the operator can rotate the certificates using an 
external agent and Cassandra will pick them up through SSL hot reloading. We 
don't just rely on a CA trusting the client certificate, we extract the 
identity from the certificate (for example: SPIFFE) and ensure that it is 
allowed to access the cluster. This means someone can't just obtain a 
certificate signed by a CA that the Cassandra cluster trusts and connect to it.
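
As a rough sketch of that two-step check, assuming the TLS layer has already reported whether the certificate chain is trusted, and assuming the identity-to-role mapping created via CQL is available as a plain dict. The function name `client_role`, the role name and the SPIFFE URIs are hypothetical and only illustrate the idea:

```python
from typing import Dict, Optional


def client_role(chain_trusted: bool, cert: dict,
                identity_to_role: Dict[str, str]) -> Optional[str]:
    """Return the role for a client certificate, or None to reject it.

    Step 1: the chain must be trusted (CA present in the trust store).
    Step 2: an identity from the SAN must be explicitly mapped to a role.
    """
    if not chain_trusted:
        return None
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "URI" and value in identity_to_role:
            return identity_to_role[value]
    return None


# Hypothetical mapping and certificates, in the dict shape returned by
# Python's ssl.SSLSocket.getpeercert().
roles = {"spiffe://example.org/app/reporting": "reporting_role"}
known = {"subjectAltName": (("URI", "spiffe://example.org/app/reporting"),)}
unknown = {"subjectAltName": (("URI", "spiffe://example.org/app/rogue"),)}
```

A certificate signed by a trusted CA but carrying an unmapped identity (`unknown` above) is still rejected.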

> As a minor nit, should we include static certificates in the test data? I see 
> they expire in 2033 which is a fair way off, but I wonder if it would make 
> sense to generate the certificates as part of the test setup.
Thanks for the feedback!

CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-02 Thread Dinesh Joshi
Hi dev@,

We're planning to add mTLS client authentication as well as internode 
authentication in CASSANDRA-18554. While this is all backward compatible, we 
thought it would be a good idea to notify the dev list. If anybody is 
interested please take a look at the JIRA.

Thanks,

Dinesh

Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-26 Thread Dinesh Joshi
This is exciting. Thank you for all your hard work on getting ICLAs from 
contributors. I am in favor of moving forward.

> On May 26, 2023, at 5:54 AM, Jeremy Hanna  wrote:
> 
> To add to a somewhat crowded [DISCUSS] party, I'd like to restart the 
> discussion around CEP-8.
> 
> This is the original thread from July 2020: 
> https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> 
> At the time, several good points were discussed and the CEP has been updated 
> with many of them.  One point in particular was that we should start with the 
> DataStax Java driver as it is the reference implementation of the CQL 
> protocol, a dependency of the project, and the most used of the 7 drivers 
> discussed.  Other points were about package naming evolution and DataStax 
> specific functionality.  I believe everyone agreed that we should take the 
> first step of contributing the drivers as-is to minimize user disruption.  
> That way we get through the legal and procedural process for the first 
> driver.  Then we can proceed with discussing how it will be managed and by 
> whom.
> 
> As the next step in donating the Java driver to the project and as we talked 
> about at ApacheCon last year, we needed to verify that we had all of the CLAs 
> for all of the codebase.  Over the last year, Greg, Benjamin, Mick, Josh, 
> Scott, and I have been tracking down all of the contributors of the DataStax 
> Java driver that had not signed the DataStax CLA and asked them to please 
> sign that one or the ASF CLA.  After having discussed the CLAs with the PMC 
> and ASF legal, we believe we are ready to proceed.
> 
> At this point, we'd like to propose CEP-8 for consideration, starting the 
> process to accept the DataStax Java driver as an official ASF project.
> 
> CEP-8: Datastax Drivers Donation - CASSANDRA - Apache Software Foundation
> cwiki.apache.org
> 
> Jeremy

Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Dinesh Joshi
+1

> On May 25, 2023, at 8:45 AM, Jonathan Ellis  wrote:
> 
> Let's make this official.
> 
> CEP: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
> 
> POC that demonstrates all the big rocks, including distributed queries: 
> https://github.com/datastax/cassandra/tree/cep-vsearch
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced


Re: [CASSANDRA-11471] Authentication mechanism negotiation (OPTIONS/SUPPORTED)

2023-05-25 Thread Dinesh Joshi
Leaving the naming aside (the hardest part of any software), I am generally 
positive about your idea. A protocol version bump may be avoidable like you 
suggested. Perhaps a prototype of this idea is in order to help shape the idea? 
Would you like to take it on?

> On May 21, 2023, at 4:21 AM, Derek Chen-Becker  wrote:
> 
> We had a recent discussion in Slack about how to potentially use the OPTIONS 
> and SUPPORTED messages in the existing CQL protocol to allow the server to 
> advertise more than one authentication method and allow the client to then 
> choose which authenticator to use. The primary use case here is to allow 
> seamless migration to a new authenticator without having to have all parties 
> involved agree on a single class (and avoid a disruptive change). There's 
> already a ticket open that was focused on making a change to the binary 
> protocol (https://issues.apache.org/jira/browse/CASSANDRA-11471) but I think 
> that we can accomplish this in a backwards compatible way that avoids a 
> change to the protocol itself.
> 
> What I propose is to allow a server configured for this graceful auth change 
> to send an additional value in the [string multimap] body of the SUPPORTED 
> message that indicates which authenticators are supported, in descending 
> priority order. For example, if I wanted to migrate my server to support both 
> PlainTextAuthProvider and some new MyAwesomeAuthProvider, I would configure 
> my client to query options and the server would respond with
> 
> 'AUTHENTICATORS': ['MyAwesomeAuthProvider', 'PlainTextAuthProvider']
> 
> The client can then choose from its own supported providers and send it as 
> part of the STARTUP message [string map] body:
> 
> 'AUTHENTICATOR': 'MyAwesomeAuthProvider'
> 
> I'm not good with naming so feel free to propose a different key for either 
> of these map entries. In any case, the server then validates that the 
> client-chosen authenticator is actually supported and would then proceed with 
> the AUTHENTICATE message. In the case where the client sends an 
> invalid/unsupported authenticator choice, the server can simply respond with 
> an AUTHENTICATE using the most-preferred configured authenticator.
> 
> I think this is a better approach than changing the binary protocol because 
> the mechanism already exists for negotiating options and this seems like a 
> natural use case that avoids having to create an entirely new version of the 
> protocol. It does not appear to conflict with the existing protocol 
> definition but I'm not 100% certain. Section 4.1.1 discusses "Possible 
> options"  for the STARTUP message 
> (https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec#L296),
>  but that's an unfortunate use of English that's ambiguous as to whether it 
> means "The only ones supported" or "Supported but not exclusively".
> 
> I've taken a look at the Java and Python driver source so far and I can't 
> find anything that would seem to cause a problem by returning a SUPPORTED 
> multimap entry that the client isn't aware of (in both they would appear to 
> ignore it), but I'll also admit that this is the first time I've looked at 
> this part of the client code and I could be missing something. Is anyone 
> aware of possible problems that would be caused by using this approach? In 
> particular, if there are clients that strictly validate all entries in the 
> SUPPORTED map then this could cause a problem. 
> 
> Worst case, we may still need a protocol version bump if the enumeration of 
> STARTUP options is intended to be strict, but at least this would not require 
> a new message type and would fit into the existing framework for negotiation 
> between client and server.
> 
> Thoughts, questions, or concerns would be appreciated.
> 
> Cheers,
> 
> Derek
> 
> -- 
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 
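
The negotiation proposed in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `AUTHENTICATORS` and `AUTHENTICATOR` keys are the ones floated in the proposal and are not part of the current native protocol, the function names are hypothetical, and treating a missing `AUTHENTICATOR` entry the same as an unsupported one is an assumption. `CQL_VERSION` is the existing mandatory STARTUP option.

```python
from typing import Dict, List, Optional


def supported_body(server_auths: List[str]) -> Dict[str, List[str]]:
    """SUPPORTED [string multimap]: authenticators in descending priority."""
    return {"AUTHENTICATORS": list(server_auths)}


def choose(supported: Dict[str, List[str]],
           client_auths: List[str]) -> Optional[str]:
    """Client side: pick the server's most-preferred mutually supported one."""
    for name in supported.get("AUTHENTICATORS", []):
        if name in client_auths:
            return name
    return None  # older server, or nothing in common


def startup_body(choice: Optional[str]) -> Dict[str, str]:
    """STARTUP [string map]; the extra key is sent only when one was chosen."""
    body = {"CQL_VERSION": "3.0.0"}
    if choice is not None:
        body["AUTHENTICATOR"] = choice
    return body


def server_select(startup: Dict[str, str], server_auths: List[str]) -> str:
    """Server side: honor a valid choice, else fall back to the preferred one."""
    requested = startup.get("AUTHENTICATOR")
    if requested in server_auths:
        return requested
    return server_auths[0]


server = ["MyAwesomeAuthProvider", "PlainTextAuthProvider"]
legacy_client = ["PlainTextAuthProvider"]
picked = choose(supported_body(server), legacy_client)
selected = server_select(startup_body(picked), server)
```

A client that only knows `PlainTextAuthProvider` ends up with that authenticator, while a client that sends an unknown name is answered with the server's most-preferred one, as the proposal describes.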



Re: [VOTE] Release dtest-api 0.0.15

2023-05-25 Thread Dinesh Joshi
With 5 +1s and no -1s, the vote passes. Thanks everybody.

> On May 24, 2023, at 9:58 AM, Jon Meredith  wrote:
> 
> +1
> 
> On Wed, May 24, 2023 at 10:13 AM Francisco Guerrero <fran...@apache.org> wrote:
>> +1 (nb)
>> 
>> On 2023/05/24 15:38:54 Alex Petrov wrote:
>> > +1
>> > 
>> > On Wed, May 24, 2023, at 5:36 PM, Doug Rohrer wrote:
>> > > +1 (nb)
>> > > 
>> > > > On May 24, 2023, at 11:32 AM, Brandon Williams <dri...@gmail.com> wrote:
>> > > > 
>> > > > +1
>> > > > 
>> > > > Kind Regards,
>> > > > Brandon
>> > > > 
>> > > > On Wed, May 24, 2023 at 10:31 AM Dinesh Joshi <djo...@apache.org> wrote:
>> > > >> 
>> > > >> Proposing the test build of in-jvm dtest API 0.0.15 for release.
>> > > >> 
>> > > >> Repository:
>> > > >> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>> > > >> 
>> > > >> Candidate SHA:
>> > > >> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/48af78d1d4b5f285d3dd4991afd4df3101e3983a
>> > > >> tagged with 0.0.15
>> > > >> 
>> > > >> Artifacts:
>> > > >> https://repository.apache.org/content/repositories/orgapachecassandra-1290/org/apache/cassandra/dtest-api/0.0.15/
>> > > >> 
>> > > >> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
>> > > >> 
>> > > >> Changes since last release:
>> > > >> 
>> > > >> * CASSANDRA-18537: Add JMX utility class to in-jvm dtest to ease
>> > > >> development of new tests using JMX
>> > > >> 
>> > > >> The vote will be open for 24 hours. Everyone who has tested the build
>> > > >> is invited to vote. Votes by PMC members are considered binding. A
>> > > >> vote passes if there are at least three binding +1s.
>> > > 
>> > > 



Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Dinesh Joshi
> On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> 
> Any objections to adding the concurrent wrapper and switching out agrona for 
> fastutil?

How does fastutil compare to agrona in terms of memory profile and runtime 
performance? How invasive would it be to switch?

[VOTE] Release dtest-api 0.0.15

2023-05-24 Thread Dinesh Joshi
Proposing the test build of in-jvm dtest API 0.0.15 for release.

Repository:
https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git

Candidate SHA:
https://github.com/apache/cassandra-in-jvm-dtest-api/commit/48af78d1d4b5f285d3dd4991afd4df3101e3983a
tagged with 0.0.15

Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1290/org/apache/cassandra/dtest-api/0.0.15/

Key signature: 53371F9B1B425A336988B6A03B6042413D323470

Changes since last release:

* CASSANDRA-18537: Add JMX utility class to in-jvm dtest to ease
development of new tests using JMX

The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.


Re: [VOTE] Release dtest-api 0.0.14

2023-05-16 Thread Dinesh Joshi
Vote passes with 7 +1s and no -1s.

thanks everybody.

On 5/15/23 15:12, Dinesh Joshi wrote:
> Proposing the test build of in-jvm dtest API 0.0.14 for release.
> 
> Repository:
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
> 
> Candidate SHA:
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/ea4b44e0ed0a4f0bbe9b18fb40ad927b49a73a32
> tagged with 0.0.14
> 
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1289/org/apache/cassandra/dtest-api/0.0.14/
> 
> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
> 
> Changes since last release:
> 
> * CASSANDRA-18511: Add support for JMX in jvm-dtest
> 
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.



[VOTE] Release dtest-api 0.0.14

2023-05-15 Thread Dinesh Joshi
Proposing the test build of in-jvm dtest API 0.0.14 for release.

Repository:
https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git

Candidate SHA:
https://github.com/apache/cassandra-in-jvm-dtest-api/commit/ea4b44e0ed0a4f0bbe9b18fb40ad927b49a73a32
tagged with 0.0.14

Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1289/org/apache/cassandra/dtest-api/0.0.14/

Key signature: 53371F9B1B425A336988B6A03B6042413D323470

Changes since last release:

* CASSANDRA-18511: Add support for JMX in jvm-dtest

The vote will be open for 24 hours. Everyone who has tested the build
is invited to vote. Votes by PMC members are considered binding. A
vote passes if there are at least three binding +1s.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Dinesh Joshi
> On May 12, 2023, at 11:36 AM, Caleb Rackliffe  wrote:
> 
> [POLL] Centralize existing syntax or create new syntax?
> 
> 1.) CREATE INDEX ... USING ... WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but adds 
> LOCAL keyword for clarity and separation from future GLOBAL indexes)
> 
> (In both cases, we deprecate w/ client warnings CREATE CUSTOM INDEX)

2.

> 
> 
> [POLL] Should there be a default? (YES/NO)

Yes.


> [POLL] What to do with the default?
> 
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by 
> default)

1 or 2.

3 and 4 are bad options IMHO.

As a user I expect defaults to remain consistent across installations with the 
same major version. Allowing configurable defaults will change CQL behavior 
based on Cassandra's configuration. This makes things very unpredictable and at 
that point it is better to force the user to explicitly select their index 
implementation.

Imagine a user's surprise when they run the same DDL script to set up a schema 
on two clusters and they end up with a _different_ index because the clusters 
had different defaults. This is not the user experience we should be aiming for.
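
To make the concern concrete, here is a small sketch, assuming a hypothetical per-cluster default and hypothetical implementation names (`legacy_2i`, `sai`); `resolve_index_impl` is not a Cassandra function, it only models how a configurable default would be applied when the DDL does not name an implementation.

```python
import re


def resolve_index_impl(ddl: str, cluster_default: str) -> str:
    """Return the index implementation a statement would end up with.

    An explicit USING '<impl>' clause wins; otherwise the cluster's
    configured default decides.
    """
    match = re.search(r"\bUSING\s+'([^']+)'", ddl, flags=re.IGNORECASE)
    return match.group(1) if match else cluster_default


implicit = "CREATE INDEX my_index ON ks.tbl(my_text_col)"
explicit = "CREATE INDEX my_index ON ks.tbl(my_text_col) USING 'sai'"

# Same script, two clusters with different (hypothetical) defaults.
on_cluster_a = resolve_index_impl(implicit, cluster_default="legacy_2i")
on_cluster_b = resolve_index_impl(implicit, cluster_default="sai")
```

The implicit statement resolves differently on the two clusters, whereas the explicit one resolves to the same implementation everywhere.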

> 
> On Fri, May 12, 2023 at 12:39 PM Mick Semb Wever  wrote:
>>> 
>>> Given it seems most DBs have a default index (see Postgres, etc.), I tend 
>>> to lean toward having one, but that's me...
>> 
>>  
>> I'm for it too.  Would be nice to enforce the setting is globally uniform to 
>> avoid the per-node problem. Or add a keyspace option. 
>> 
>> For users replaying <5 DDLs this would just require they set the default 
>> index to 2i.
>> This is not a headache, it's a one-off action that can be clearly expressed 
>> in NEWS.
>> It acts as a deprecation warning too.
>> This prevents new uneducated users from creating the unintended index, it 
>> supports existing users, and it does not present SAI as the battle-tested 
>> default.
>> 
>> Agree with the poll, there's a number of different PoVs here already.  I'm 
>> not fond of the LOCAL addition,  I appreciate what it informs, but it's just 
>> not important enough IMHO (folk should be reading up on the index type).



Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Dinesh Joshi
I agree. 5.0 is a major release and provides an opportunity to switch defaults.

> On May 9, 2023, at 7:00 PM, Jonathan Ellis  wrote:
> 
> +1 for this, especially in the long term.  CREATE INDEX should do the right 
> thing for most people without requiring extra ceremony.
> 
> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan  wrote:
>> If the consensus is that SAI is the right default index, then we should just 
>> change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>> 
>> 
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe  wrote:
>>> 
>>> Earlier today, Mick started a thread on the future of our index creation 
>>> DDL on Slack:
>>> 
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>> 
>>> At the moment, there are two ways to create a secondary index.
>>> 
>>> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> 
>>> This creates an optionally named legacy 2i on the provided table and column.
>>> 
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>> 
>>> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
>>> <class> [WITH OPTIONS = <options map>]
>>> 
>>> This creates a secondary index on the provided table and column using the 
>>> specified 2i implementation class and (optional) parameters.
>>> 
>>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
>>> 'StorageAttachedIndex'
>>> 
>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
>>> shorthand for the fully-qualified class name, which is also valid.)
>>> 
>>> So what is there to discuss?
>>> 
>>> The concern Mick raised is...
>>> 
>>> "...just folk continuing to use CREATE INDEX  because they think CREATE 
>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users 
>>> doing 2i (when they think they are, and/or we definitely want them to be, 
>>> using SAI)"
>>> 
>>> To paraphrase, we want people to use SAI once it's available where 
>>> possible, and the default behavior of CREATE INDEX could be at odds w/ that.
>>> 
>>> The proposal we seem to have landed on is something like the following:
>>> 
>>> For 5.0:
>>> 
>>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>> 
>>> (Note: How this would interact w/ the existing secondary_indexes_enabled 
>>> YAML options isn't clear yet.)
>>> 
>>> Post-5.0:
>>> 
>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity 
>>> w/ it.
>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
>>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This 
>>> would both be flexible enough to accommodate index implementation selection 
>>> and prescriptive enough to force the user to make a decision (and wouldn't 
>>> change the legacy behavior of the existing CREATE INDEX). In this world, 
>>> creating a legacy 2i might look something like CREATE INDEX...USING 
>>> `legacy`.
>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>> 
>>> Eventually we would have a single enabled DDL statement for index creation 
>>> that would be minimal but also explicit/able to handle some evolution.
>>> 
>>> What does everyone think?
>> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com 
> @spyced



Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Dinesh Joshi
+1

> On May 8, 2023, at 1:52 AM, Piotr Kołaczkowski  wrote:
> 
> Let's vote.
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> 
> Piotr Kołaczkowski
> e. pkola...@datastax.com
> w. www.datastax.com



Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-06 Thread Dinesh Joshi
+1

> On May 4, 2023, at 9:46 AM, Doug Rohrer  wrote:
> 
> Hello all,
> 
> I’d like to put CEP-28 to a vote.
> 
> Proposal:
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
> 
> Jira:
> https://issues.apache.org/jira/browse/CASSANDRA-16222
> 
> Draft implementation:
> 
> - Apache Cassandra Spark Analytics source code: 
> https://github.com/frankgh/cassandra-analytics
> - Changes required for Sidecar: 
> https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
> 
> Discussion:
> https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3
> 
> The vote will be open for 72 hours. 
> A vote passes if there are at least three binding +1s and no binding vetoes. 
> 
> 
> Thanks,
> 
> Doug Rohrer
> 
> 



Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Dinesh Joshi
Hi Guo,

I would expect that there would be release artifacts for the sidecar as well as 
the library once this functionality is available.

Dinesh

> On May 4, 2023, at 12:03 AM, guo Maxwell  wrote:
> 
> This is very meaningful work, thanks, but I would like to ask a question 
> that is not particularly related to the CEP project's code design itself but 
> to project engineering management: what is the future development and 
> release plan of this project? 
> As far as I know, the Cassandra Sidecar project does not actually have a 
> final release version. I think nobody wants the project code to be merged 
> and then remain unreleased for a long time because this project relies on 
> Cassandra Sidecar.
> 
> On Thu, May 4, 2023 at 02:35, Dinesh Joshi <djo...@apache.org> wrote:
>> If there aren't additional questions / comments I will start the VOTE thread 
>> on this CEP tonight.
>> 
>> On 2023/05/01 19:50:12 Dinesh Joshi wrote:
>> > Does anybody have any questions that we could answer about this proposal?
> 
> 
> -- 
> you are the apple of my eye !



Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-03 Thread Dinesh Joshi
If there aren't additional questions / comments I will start the VOTE thread on 
this CEP tonight.

On 2023/05/01 19:50:12 Dinesh Joshi wrote:
> Does anybody have any questions that we could answer about this proposal?


Re: [POLL] Vector type for ML

2023-05-02 Thread Dinesh Joshi
I'm also in favor of having a general data type that is not tied to numeric 
data types alone.
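
A minimal sketch of what such a general type could look like, assuming the properties agreed in the quoted thread below (fixed length, no nulls) and pushing the float-only restriction down to the index level. The class and function names are hypothetical, and the big-endian 4-byte float layout in `compose_for_float` is an assumption loosely modeled on the `composeForFloat(ByteBuffer)` helper mentioned in the thread, not the actual Cassandra encoding.

```python
import struct
from typing import List, Sequence, Tuple


class VectorType:
    """Fixed-length, null-free vector over an arbitrary element type."""

    def __init__(self, element_type: type, dimension: int):
        self.element_type = element_type
        self.dimension = dimension

    def validate(self, values: Sequence) -> Tuple:
        if len(values) != self.dimension:
            raise ValueError("expected %d elements" % self.dimension)
        for v in values:
            if v is None:
                raise ValueError("vectors may not contain nulls")
            if not isinstance(v, self.element_type):
                raise TypeError("wrong element type")
        return tuple(values)


def can_build_ann_index(vtype: VectorType) -> bool:
    """The numeric restriction lives at the index level, not the type level."""
    return vtype.element_type is float


def compose_for_float(buf: bytes, dimension: int) -> List[float]:
    """Decode `dimension` big-endian 32-bit floats (layout is an assumption)."""
    return list(struct.unpack(">%df" % dimension, buf))


floats = VectorType(float, 3)
words = VectorType(str, 2)
```

Here `words` is a perfectly valid vector type, but only `floats` would be accepted by an ANN index.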

On 2023/05/02 22:27:24 Jonathan Ellis wrote:
> I had a call with David.  We agreed that we want a "vector" data type with
> these properties
> 
> - Fixed length
> - No nulls
> - Random access not supported
> 
> Where we disagreed was on my proposal to restrict vectors to only numeric
> data.  David's points were that
> 
> (1) He has a use case today for a data type with the other vector
> properties,
> (2) It doesn't seem reasonable to create two data types with the same
> properties, one of which is restricted to numerics, and
> (3) The restrictions that I want for numeric vectors make more sense at the
> index and function level, than at the type level.
> 
> I'm ready to concede that David has the better case here and move forward
> with a vector implementation without that restriction.
> 
> On Tue, May 2, 2023 at 4:03 PM David Capwell  wrote:
> 
> >  How about it, David? Did you already make this?
> >
> >
> > I checked out the patch, fixed serialize/deserialize, added the
> > constraints, then added a composeForFloat(ByteBuffer), with this the impact
> > to the POC patch was the following
> >
> > 1) move away from VectorType.instance.serializer().deserialize(bb) to
> > type.composeForFloat(bb), both return float[]
> > 2) change the index validate logic to move away from checking VectorType
> > and instead check for that plus the element type == FloatType.  I didn’t
> > bother to do this as it's trivial
> >
> > David. End this argument. SHOW THE CODE!
> >
> >
> > If this argument ends and people are cool with vector supporting abstract
> > type, more than glad to help get this in.
> >
> > On May 2, 2023, at 1:53 PM, Jeremy Hanna 
> > wrote:
> >
> > I'm all for bringing more functionality to the masses sooner, but the
> > original idea has a very very specific use case.  Do we have use cases for
> > a general purpose Vector/Array data structure?  If so, awesome.  I just
> > wondered if generalizing provides value, beyond being straightforward to
> > implement.  I'm just trying to be sensitive to the database code
> > maintenance and driver support for general types versus a single type for a
> > specific, well defined purpose.
> >
> > If it could easily be a plugin, that's great - but the full picture
> > involves drivers that need to support it or you end up getting binary blobs
> > you have to decode client side and then do stuff with.  So ideally if you
> > have a well defined use case that you can build into the database, having
> > it just be part of the database and associated drivers - that makes the
> > experience much much better.
> >
> > I'm not trying to say B couldn't be valuable or that a plugin couldn't be
> > feasible.  I'm just trying to enlarge the picture a bit to see what that
> > means for this use case and for the supporting drivers/clients.
> >
> > On May 2, 2023, at 3:04 PM, Benedict  wrote:
> >
> > But it’s so trivial it was already implemented by David in the span of ten
> > minutes? If anything, we’re slowing progress down by refusing to do the
> > extra types, as we’re busy arguing about it rather than delivering a
> > feature?
> >
> > FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
> > support types beyond float. Not that we should start with float.
> >
> > So, this whole debate is a mess, I think. But hey ho.
> >
> > On 2 May 2023, at 20:57, Patrick McFadin  wrote:
> >
> > 
> > I'll speak up on that one. If you look at my ranked voting, that is where
> > my head is. I get accused of scope creep (a lot) and looking at the initial
> > proposal Jonathan put on the ML it was mostly "Developers are adopting
> > vector search at a furious pace and I think I have a simple way of adding
> > support to keep Cassandra relevant for these use cases" Instead of just
> > focusing on this use case, I feel the arguments have bike shedded into
> > scope creep which means it will take forever to get into the project.
> >
> > My preference is to see one thing validated with an MVP and get it into
> > the hands of developers sooner so we can continue to iterate based on
> > actual usage.
> >
> > It doesn't say your points are wrong or your opinions are broken, I'm
> > voting for what I think will be awesome for users sooner.
> >
> > Patrick
> >
> > On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:
> >
> >> Could folk voting against a general purpose type (that could well be
> >> called a vector) briefly explain their reasoning?
> >>
> >> We established in the other thread that it’s technically trivial, meaning
> >> folk must think it is strictly superior to only support float rather than
> >> eg all numeric types (note: for the type, not the ANN).
> >>
> >> I am surprised, and the blurbs accompanying votes so far don’t seem to
> >> touch on this, mostly just endorsing the idea of a vector.
> >>
> >>
> >> On 2 May 2023, at 20:20, Patrick McFadin  wrote:
> >>
> >> 
> >> A > B > C 
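
The vector properties discussed in the thread above (fixed length, no nulls, no random access, element type not restricted to numerics) can be illustrated with a minimal sketch. This is not the Cassandra implementation: `FixedVector` and `is_float_vector` are hypothetical names, and the float-only check merely stands in for the idea that numeric restrictions belong at the index and function level rather than at the type level.

```python
from typing import Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")


class FixedVector(Generic[T]):
    """Hypothetical fixed-length, null-free vector of any element type."""

    def __init__(self, dimension: int, elements: Sequence[T]) -> None:
        # Fixed length: the element count must match the declared dimension.
        if len(elements) != dimension:
            raise ValueError(f"expected {dimension} elements, got {len(elements)}")
        # No nulls: reject None anywhere in the vector.
        if any(e is None for e in elements):
            raise ValueError("vector elements must not be null")
        self.dimension = dimension
        self._elements: Tuple[T, ...] = tuple(elements)

    def elements(self) -> Tuple[T, ...]:
        # The whole value is read at once; there is deliberately no
        # per-index accessor (random access is not supported).
        return self._elements


def is_float_vector(v: FixedVector) -> bool:
    # Index-level restriction (assumption): only all-float vectors
    # would be accepted by an ANN index.
    return all(isinstance(e, float) for e in v.elements())
```

In this sketch a `FixedVector(2, ["a", "b"])` is a valid value of the type, but `is_float_vector` would reject it for indexing, which mirrors the "restrict at the index, not the type" position.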

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-02 Thread Dinesh Joshi
We're reusing existing Cassandra code so the performance characteristics for 
parsing should be the same as Cassandra. I will need to check if we have 
benchmarks. If we do, we'll add it to the CEP wiki page.

On 2023/05/02 19:52:28 Sebastian Estevez wrote:
> Hey Dinesh,
> 
> Yeah it makes sense that the sstable streaming is network bound since it's
> mostly just moving files.
> 
> Do you have any performance stats on the sstable parsing side inside spark?
> 
> --Seb
> 
> On Tue, May 2, 2023 at 3:31 PM Dinesh Joshi  wrote:
> 
> > It is line rate / network bound. We have a patch out in vert.x that should
> > use the zero copy path for it. But it's not a strict prereq for it.


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-02 Thread Dinesh Joshi
It is line rate / network bound. We have a patch out in vert.x that should use 
the zero copy path for it. But it's not a strict prereq for it.

On 2023/05/02 15:39:02 Sebastian Estevez wrote:
> Hi folks,
> 
> Great stuff thanks for sharing.
> 
> The performance numbers I've seen so far are for the sidecar streaming
> sstables (seems like this is just network bound?). What kind of perf are
> you seeing at the Spark executors (at the per task level)?
> 
> --Seb
> 
> On Mon, May 1, 2023 at 3:50 PM Dinesh Joshi  wrote:
> 
> > Does anybody have any questions that we could answer about this proposal?
> >
> > On Apr 27, 2023, at 1:24 PM, Francisco Guerrero 
> > wrote:
> >
> > Hi folks,
> >
> > We have updated the confluence page with the source code for CEP-28.
> > There are two repositories with contributions. One is the patch [1]
> > for Cassandra Sidecar with the bulk APIs that enable the Cassandra
> > Spark Analytics library. The second is a new repository [2] with
> > contributions to the Cassandra Spark Analytics code
> >
> > We also have a README markdown file that you can follow to give the
> > code a try:
> >
> >
> > https://github.com/frankgh/cassandra-analytics/blob/trunk/cassandra-analytics-core-example/README.md
> >
> > Best,
> > - Francisco
> >
> > [1] Apache Cassandra Sidecar bulk APIs source code:
> > https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
> > [2] Apache Cassandra Spark Analytics source code:
> > https://github.com/frankgh/cassandra-analytics
> >
> >
> > On 2023/04/05 15:18:07 Doug Rohrer wrote:
> > > Sorry for the delay in responding here - yes, we can add some diagrams
> > > to the CEP - I’ll try to get that done by end-of-week.
> > >
> > > Thanks,
> > >
> > > Doug
> > >
> > > > On Mar 28, 2023, at 1:14 PM, J. D. Jordan wrote:
> > > >
> > > > Maybe some data flow diagrams could be added to the cep showing some
> > > > example operations for read/write?
> > > >
> > > >> On Mar 28, 2023, at 11:35 AM, Yifan Cai wrote:
> > > >>
> > > >> A lot of great discussions!
> > > >>
> > > >> On the sidecar front, especially what the role sidecar plays in
> > > >> terms of this CEP, I feel there might be some confusion. Once the
> > > >> code is published, we should have clarity.
> > > >> Sidecar does not read sstables nor do any coordination for analytics
> > > >> queries. It is local to the companion Cassandra instance. For bulk
> > > >> read, it takes snapshots and streams sstables to spark workers to
> > > >> read. For bulk write, it imports the sstables uploaded from spark
> > > >> workers. All commands are existing jmx/nodetool functionalities from
> > > >> Cassandra. Sidecar adds the http interface to them. It might be an
> > > >> over simplified description. The complex computation is performed in
> > > >> spark clusters only.
> > > >>
> > > >> In the long run, Cassandra might evolve into a database that does
> > > >> both OLTP and OLAP. (Not what this thread aims for)
> > > >> At the current stage, Spark is very suited for analytic purposes.
> > > >>
> > > >> On Tue, Mar 28, 2023 at 9:06 AM Benedict <bened...@apache.org> wrote:
> > > >>> I disagree with the first claim, as the process has all the
> > > >>> information it chooses to utilise about which resources it’s using
> > > >>> and what it’s using those resources for.
> > > >>>
> > > >>> The inability to isolate GC domains is something we cannot address,
> > > >>> but also probably not a problem if we were doing everything with
> > > >>> memory management as well as we could be.
> > > >>>
> > > >>> But, not worth detailing this thread for. Today we do very little
> > > >>> well on this front within the process, and a separate process is
> > > >>> well justified given the state of play.
> > > >>>
> > > >>>> On 28 Mar 2023, at 16:38, Derek Chen-Becker
> > > >>>> <de...@chen-becker.org> wrote:
> > > >>>>
> > > >>>> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch
> > > >>>> <joe.e.ly...@gmail.com> wrote:
> > > >>>> ...
> > > >>>>
> > > >>>>> I think we might be underselling how valuable JVM isolation is,
> > > >>>>> especially for analytics queries that are going to pass the entire
> > > >>>>> dataset through heap somewhat constantly.
> > > >>>>
> > > >>>> Big +1 here. The JVM simply does not have significant granularity
> > > >>>> of control for resource utilization, but this is explicitly a
> > > >>>> feature of separate processes. Add in being able to separate GC
> > > >>>> domains and you can avoid a lot of noisy neighbor in-VM behavior
> > > >>>> for the disparate workloads.
> > > >>>>
> > > >>>> Cheers,
> > > >>>>
> > > >>>> Derek
> > > >>>>
> > > >>>> --
> > > >>>> +---+
> > > >>>> | Derek Chen-Becker |
> > > >>>> | GPG Key available at https://keybase.io/dchenbecker and |
> > > >>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > > >>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
> > > >>>> +---+
> > >
> > --
> > Francisco Guerrero
> >
> >
> >
> 
> -- 
> All the best,
> 
> Sebastián
> 


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-01 Thread Dinesh Joshi
Does anybody have any questions that we could answer about this proposal?

> On Apr 27, 2023, at 1:24 PM, Francisco Guerrero  
> wrote:
> 
> Hi folks,
> 
> We have updated the confluence page with the source code for CEP-28.
> There are two repositories with contributions. One is the patch [1]
> for Cassandra Sidecar with the bulk APIs that enable the Cassandra
> Spark Analytics library. The second is a new repository [2] with
> contributions to the Cassandra Spark Analytics code
> 
> We also have a README markdown file that you can follow to give the
> code a try:
> 
> https://github.com/frankgh/cassandra-analytics/blob/trunk/cassandra-analytics-core-example/README.md
> 
> Best,
> - Francisco
> 
> [1] Apache Cassandra Sidecar bulk APIs source code: 
> https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
> [2] Apache Cassandra Spark Analytics source code: 
> https://github.com/frankgh/cassandra-analytics
> 
> 
> On 2023/04/05 15:18:07 Doug Rohrer wrote:
> > Sorry for the delay in responding here - yes, we can add some diagrams
> > to the CEP - I’ll try to get that done by end-of-week.
> >
> > Thanks,
> >
> > Doug
> >
> > > On Mar 28, 2023, at 1:14 PM, J. D. Jordan
> > > <jeremiah.jor...@gmail.com> wrote:
> > >
> > > Maybe some data flow diagrams could be added to the cep showing some
> > > example operations for read/write?
> > >
> > >> On Mar 28, 2023, at 11:35 AM, Yifan Cai <yc25c...@gmail.com> wrote:
> > >>
> > >> A lot of great discussions!
> > >>
> > >> On the sidecar front, especially what the role sidecar plays in
> > >> terms of this CEP, I feel there might be some confusion. Once the
> > >> code is published, we should have clarity.
> > >> Sidecar does not read sstables nor do any coordination for analytics
> > >> queries. It is local to the companion Cassandra instance. For bulk
> > >> read, it takes snapshots and streams sstables to spark workers to
> > >> read. For bulk write, it imports the sstables uploaded from spark
> > >> workers. All commands are existing jmx/nodetool functionalities from
> > >> Cassandra. Sidecar adds the http interface to them. It might be an
> > >> over simplified description. The complex computation is performed in
> > >> spark clusters only.
> > >>
> > >> In the long run, Cassandra might evolve into a database that does
> > >> both OLTP and OLAP. (Not what this thread aims for)
> > >> At the current stage, Spark is very suited for analytic purposes.
> > >>
> > >> On Tue, Mar 28, 2023 at 9:06 AM Benedict wrote:
> > >>> I disagree with the first claim, as the process has all the
> > >>> information it chooses to utilise about which resources it’s using
> > >>> and what it’s using those resources for.
> > >>>
> > >>> The inability to isolate GC domains is something we cannot address,
> > >>> but also probably not a problem if we were doing everything with
> > >>> memory management as well as we could be.
> > >>>
> > >>> But, not worth detailing this thread for. Today we do very little
> > >>> well on this front within the process, and a separate process is
> > >>> well justified given the state of play.
> > >>>
> > >>>> On 28 Mar 2023, at 16:38, Derek Chen-Becker wrote:
> > >>>>
> > >>>> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch wrote:
> > >>>> ...
> > >>>>
> > >>>>> I think we might be underselling how valuable JVM isolation is,
> > >>>>> especially for analytics queries that are going to pass the entire
> > >>>>> dataset through heap somewhat constantly.
> > >>>>
> > >>>> Big +1 here. The JVM simply does not have significant granularity
> > >>>> of control for resource utilization, but this is explicitly a
> > >>>> feature of separate processes. Add in being able to separate GC
> > >>>> domains and you can avoid a lot of noisy neighbor in-VM behavior
> > >>>> for the disparate workloads.
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>> Derek
> > >>>>
> > >>>> --
> > >>>> +---+
> > >>>> | Derek Chen-Becker |
> > >>>> | GPG Key available at https://keybase.io/dchenbecker and |
> > >>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > >>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
> > >>>> +---+
> >
> -- 
> Francisco Guerrero



Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-21 Thread Dinesh Joshi
Interesting proposal Jonathan. Will grok it over the weekend and play around 
with the branch.

Do you intend to make this part of CEP-7 or as an incremental update to SAI 
once it is committed?

> On Apr 21, 2023, at 2:19 PM, Jonathan Ellis  wrote:
> 
> Happy Friday, everyone!
> 
> Rich text formatting ahead, I've attached a PDF for those who prefer that.
> 
> I propose adding approximate nearest neighbor (ANN) vector search capability 
> to Apache Cassandra via storage-attached indexes (SAI). This is a 
> medium-sized effort that will significantly enhance Cassandra’s 
> functionality, particularly for AI use cases. This addition will not only 
> provide a new and important feature for existing Cassandra users, but also 
> attract new users to the community from the AI space, further expanding 
> Cassandra’s reach and relevance.
> 
> Introduction
> 
> Vector search is a powerful document search technique that enables developers 
> to quickly find relevant content within an extensive collection of documents, 
> which is useful as a standalone technique, but it is particularly hot now 
> because it significantly enhances the performance of LLMs.
> 
> Vector search uses ML models to match the semantics of a question rather than 
> just the words it contains, avoiding the classic false positives and false 
> negatives associated with term-based search.  Alessandro Benedetti gives some 
> good examples in his excellent talk.
> 
> You can search across any set of vectors, which are just ordered sets of 
> numbers.  In the context of natural language queries and document search, we 
> are specifically concerned with a type of vector called an embedding.  
> 
> An embedding is a high-dimensional vector that captures the underlying 
> semantic relationships and contextual information of words or phrases. 
> Embeddings are generated by ML models trained for this purpose; OpenAI 
> provides an API to do this, but open-source and self-hostable models like 
> BERT are also popular. Creating more accurate and smaller embeddings is an 
> active research area in ML.
> 
> Large language models (LLMs) can be described as a mile wide and an inch 
> deep. They are not experts on any narrow domain (although they will 
> hallucinate that they are, sometimes convincingly).  You can remedy this by 
> giving the LLM additional context for your query, but the context window is 
> small (4k tokens for GPT-3.5, up to 32k for GPT-4), so you want to be very 
> selective about giving the LLM the most relevant possible information.
> 
> Vector search is red-hot now because it allows us to easily answer the 
> question “what are the most relevant documents to provide as context” by 
> performing a similarity search between the embeddings vector of the query, 
> and those of your document universe.  Doing exact search is prohibitively 
> expensive, since you necessarily have to compare with each and every 
> document; this is intractable when you have billions or trillions of docs.  
> However, there are well-understood algorithms for turning this into a 
> logarithmic problem if you are willing to accept approximately the most 
> similar documents.  This is the “approximate nearest neighbor” problem.  (You 
> will see these referred to as kNN – k nearest neighbors – or ANN.)
> 
> Pinecone DB has a good example of what this looks like in Python code.
> 
> Vector search is the foundation underlying effectively all of the AI 
> applications that are launching now.  This is particularly relevant to Apache 
> Cassandra users, who tend to manage the types of large datasets that benefit 
> the most from fast similarity search. Adding vector search to Cassandra’s 
> unique strengths of scale, reliability, and low latency, will further enhance 
> its appeal and effectiveness for these users while also making it more 
> attractive to newcomers looking to harness AI’s potential.  The faster we 
> deliver vector search, the more valuable it will be for this expanding user 
> base.
> 
> Requirements
> 
> - Perform vector search as outlined in the Pinecone example above
> - Support Float32 embeddings in the form of a new DENSE FLOAT32 cql type
>   - This is also useful for “classic” ML applications that derive and
>     serve their own feature vectors
> - Add ANN (approximate nearest neighbor) search.
> - Work with normal Cassandra data flow
>   - Inserting one row at a time is fine; cannot require batch ingest
>   - Updating, deleting rows is also fine
> - Must compose with other SAI predicates as well as partition keys
> 
> Not requirements
> 
> - Other datatypes besides Float32
>   - Pinecone supports only Float32 and it’s hugely ahead in mindshare so
>     let’s make things easy on ourselves and follow their precedent.
> - I don’t want to scope creep beyond ANN. In particular, I don’t want to
>   wait for ORDER BY to get exact search in as well.
> 
> How we can do this
> 
> There is exactly one prod
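
The proposal quoted above notes that exact search has to compare the query embedding with each and every document. A minimal sketch of that brute-force baseline follows, assuming cosine similarity as the metric and non-zero vectors. The function names `cosine_similarity` and `exact_top_k` are made up for illustration and are not part of Cassandra, SAI, or Pinecone.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes both vectors have the same length and are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], docs: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    # Exact search: score every document against the query, so the cost
    # grows linearly with the number of documents.
    scored = [(cosine_similarity(query, d), i) for i, d in enumerate(docs)]
    scored.sort(reverse=True)
    # Return (document index, similarity) for the k best matches.
    return [(i, s) for s, i in scored[:k]]
```

An ANN index trades this full scan for an approximate answer at roughly logarithmic cost, which is the motivation given in the proposal for not doing exact search over billions of documents.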

Re: [VOTE] Release Apache Cassandra 4.0.9

2023-04-06 Thread Dinesh Joshi
-1 as well. We need to upgrade Zstd.

> 
> On Apr 6, 2023, at 4:57 AM, Mick Semb Wever  wrote:
> 
> 
> 
>  
>> Up to you to fail the vote and we realistically release 4.0.9 after Easter
> 
> 
> -1 to the vote. 
> 
> I support your initial veto and reasoning, and it appears you are willing to 
> recut once 18429 is resolved.


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Dinesh Joshi
I’m strongly in favor of leaving terminology as-is.

> On Apr 6, 2023, at 7:20 AM, Bowen Song via dev wrote:
>
> > I'm quite happy to leave things as they are if that is the consensus.
>
> +1 to the above
>
> On 06/04/2023 14:54, Mike Adamson wrote:
> > My apologies. I started this discussion off the back of a usability
> > discussion around new user accessibility to Cassandra and the premise
> > that there is an initial steep learning curve for new users. Including
> > new users who have worked for a long time in the traditional DBMS field.
> >
> > On the basis of the reason for the discussion, TABLEGROUP doesn't sit
> > well because of user types / functions / indexes etc. which are not
> > strictly tables and is also yet another Cassandra only term.
> >
> > NAMESPACE could work but its different usage in other systems could be
> > just as confusing to new users.
> >
> > And, I certainly don't think having multiple names for the same thing
> > just to satisfy different parties is a good idea at all.
> >
> > I'm quite happy to leave things as they are if that is the consensus.
> >
> > On Thu, 6 Apr 2023 at 14:16, Josh McKenzie wrote:
> > > > KEYSPACE is fine. If we want to introduce a standard nomenclature
> > > > like DATABASE that’s also fine. Inventing brand new ones is not
> > > > fine, there’s no benefit.
> > >
> > > I'm with Benedict in principle, with Aleksey in practice; I think
> > > KEYSPACE and SCHEMA are actually fine enough.
> > >
> > > If and when we get to any kind of multi-tenancy, having a more
> > > metaphorical abstraction that users are familiar with like these
> > > becomes more valuable; it's pretty clear that things in different
> > > keyspaces, different databases, or even different schemas could have
> > > different access rules, resourcing, etc from one another.
> > >
> > > While the off-the-cuff logical TABLEGROUP thing is a literal
> > > statement about what the thing is, it'd be another unique term to us;
> > > we have enough things in our system where we've charted our own path.
> > > My personal .02 is we don't need to go adding more. :)
> > >
> > > On Thu, Apr 6, 2023, at 8:54 AM, Mick Semb Wever wrote:
> > > > > … but that should be a different discussion about how we evolve
> > > > > config.
> > > >
> > > > I disagree. Nomenclature being difficult can benefit from holistic
> > > > and forward thinking.
> > > >
> > > > Sure you can label this off-topic if you like, but I value our
> > > > discuss threads being collaborative in an open-mode. Sometimes the
> > > > best idea is on the tail end of a sequence of bad and/or unpopular
> > > > ideas.
> >
> > --
> > Mike Adamson
> > Engineering
> > +1 650 389 6000 | datastax.com


Re: Welcome our next PMC Chair Josh McKenzie

2023-03-24 Thread Dinesh Joshi
Thank you Mick for all the work you did!

Welcome Josh and congratulations!

On 3/23/23 01:22, Mick Semb Wever wrote:
> It is time to pass the baton on, and on behalf of the Apache Cassandra
> Project Management Committee (PMC) I would like to welcome and
> congratulate our next PMC Chair Josh McKenzie (jmckenzie).
> 
> Most of you already know Josh, especially through his regular and
> valuable project oversight and status emails, always presenting a
> balance and understanding to the various views and concerns incoming. 
> 
> Repeating Paulo's words from last year: The chair is an administrative
> position that interfaces with the Apache Software Foundation Board, by
> submitting regular reports about project status and health. Read more
> about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> 
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> 
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
> 
> 
> The PMC as a whole is the entity that oversees and leads the project and
> any PMC member can be approached as a representative of the committee. A
> list of Apache Cassandra PMC members can be found
> on: https://cassandra.apache.org/_/community.html
> 


