Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread Patrick McFadin
This is a great feature addition to CQL! I get asked about it from time to
time, and then people end up figuring out a workaround. It will be great to
just have it available.

And right on, Simon! I think the only project I had as a high school senior
was figuring out how many parties I could go to and still maintain a
passing grade. Thanks for your work here.

Patrick

On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:

> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
> We plan to add support for it in conditions in a separate patch.
>
> The patch is available.
>
> As a side note, Simon chose to do his high school senior project
> contributing to Apache Cassandra. This patch is his first contribution for
> his senior project (and his second feature contribution to Apache Cassandra).
>
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-23 Thread Patrick McFadin
I finally got a chance to digest this CEP and am happy to see it raised.
This feature has been left to the end user for far too long.

It might get roasted for scope creep, but here goes. Something related that
I've heard requested for years is the ability to migrate a single keyspace
away from a set of hardware... online. It's a similar problem, but it needs
a lot more coordination:
 - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
 - Establish replication between keyspaces and sync schema
 - Move data from Cluster A to B
 - Decommission keyspace in Cluster A

In many cases, the presence of multiple tenants puts the cluster under too
much pressure. The best solution in that case is to migrate the largest
keyspace to a dedicated cluster.

It's live migration, but a bit more complicated. There's no chance of doing
this manually without some serious brain surgery on C* and downtime.

Patrick
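
Hari's reply quoted below agrees that matching file name + size is not enough and proposes a binary-level check: compute a digest of each file at the source and at the destination and compare them. A minimal Python sketch of that idea follows. It assumes SHA-256 (the thread does not name a hash algorithm), and every function name in it is hypothetical, not part of the Sidecar or of CEP-40.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks (1 MiB by default)
    so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_transfer(source: dict[str, str], destination: dict[str, str]) -> list[str]:
    """Return, sorted, the source files that are missing at the destination
    or whose digest differs. A name + size check alone would miss a
    same-size file with corrupted bytes."""
    return sorted(
        name for name, digest in source.items()
        if destination.get(name) != digest
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src, dst = Path(a), Path(b)
        (src / "Data.db").write_bytes(b"abcd")
        (dst / "Data.db").write_bytes(b"abce")  # same name and size, different bytes
        print(verify_transfer(build_manifest(src), build_manifest(dst)))  # ['Data.db']
```

Reading every byte to hash it is exactly the CPU and disk cost Hari raises, which is one argument for making the check optional, as he suggests.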


On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> Thank you all for the inputs and apologies for the late reply. I see good
> points raised in this discussion. *Please allow me to reply to each point
> individually.*
>
> To start with, let me focus in this reply on the point raised by Scott & Jon
> about verifying file content at the destination against the source.
> I agree that just verifying the file name + size is not foolproof. I
> called binary-level verification out of the initial scope for two
> reasons: 1) calculating a digest for each file may increase CPU
> utilisation, and 2) the disk would also be under pressure, as its complete
> content has to be read to calculate the digests. As called out in the
> discussion, though, I think we can't compromise on a binary-level check
> because of these two concerns. Let me update the CEP to include
> binary-level verification. During implementation, it can probably be made
> optional so that it can be skipped if someone doesn't want it.
>
> Thanks!
> Hari
>
> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
> dev@cassandra.apache.org> wrote:
>
>> We use backup/restore for our implementation of this concept. It has the
>> added benefit that the backup / restore path gets exercised much more
>> regularly than it would in normal operations, finding edge case bugs at a
>> time when you still have other ways of recovering rather than in a full
>> disaster scenario.
>>
>>
>>
>> Cheers
>>
>> Ben
>>
>> *From: *Jordan West 
>> *Date: *Sunday, 21 April 2024 at 05:38
>> *To: *dev@cassandra.apache.org 
>> *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar
>> for Live Migrating Instances
>>
>> I do really like the framing that replacing a node is restoring a node and
>> then kicking off a replace. That is effectively what we do internally.
>>
>>
>>
>> I also agree we should be able to do data movement well both internal to
>> Cassandra and externally for a variety of reasons.
>>
>>
>>
>> We’ve seen great performance with “ZCS+TLS” even though it’s not full
>> zero copy: nodes that previously took *days* to replace now take a few
>> hours. But we have seen it put pressure on nodes and drive up latencies,
>> which is the main reason we still rely on an external data movement system
>> by default, falling back to ZCS+TLS as needed.
>>
>>
>>
>> Jordan
>>
>>
>>
>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:
>>
>> Jeff, this is probably the best explanation and justification of the idea
>> that I've heard so far.
>>
>>
>>
>> I like it because
>>
>>
>>
>> 1) we really should have something official for backups
>>
>> 2) backups / object store would be great for analytics
>>
>> 3) it solves a much bigger problem than the single goal of moving
>> instances.
>>
>>
>>
>> I'm a huge +1 in favor of this perspective, with live migration being one
>> use case for backup / restore.
>>
>>
>>
>> Jon
>>
>>
>>
>>
>>
>> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa  wrote:
>>
>> I think Jordan and German had an interesting insight, or at least their
>> comment made me think about this slightly differently, and I’m going to
>> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>>
>>
>>
>> The CEP treats this as “move a live instance from one machine to
>> another”. I know why the author wants to do this.
>>
>>
>>
>> If you think of it instead as “change the backup/restore mechanism to be able
>> to safely restore from a running instance”, you may end up with a cleaner
>> abstraction that’s easier to think about (and may also be easier to
>> generalize in clouds where you have other tools available).
>>
>>
>>
>> I’m not familiar enough with the sidecar to know the state of
>> orchestration for backup/restore, but “ensure the original source node
>> isn’t running”, “migrate the config”, “choose and copy a snapshot”, maybe
>> “forcibly exclude the original instance from the cluster” are all things
>> the restore code is going to need 

Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Patrick McFadin
Congratulations, everyone. I am loving this new direction for the project!

On Wed, Apr 17, 2024 at 11:16 AM Yifan Cai  wrote:

> Congrats all
> --
> *From:* Josh McKenzie 
> *Sent:* Wednesday, April 17, 2024 11:05:29 AM
> *To:* dev 
> *Subject:* Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire,
> Olivier Michallat as Cassandra Committers
>
> Congrats everyone and thanks for all the hard work to get things to this
> point!
>
> On Wed, Apr 17, 2024, at 1:18 PM, Ekaterina Dimitrova wrote:
>
> Congrats and thank you for all your work on the drivers!
>
> On Wed, 17 Apr 2024 at 13:17, Francisco Guerrero 
> wrote:
>
> Congratulations everyone!
>
> On 2024/04/17 17:14:34 Abe Ratnofsky wrote:
> > Congrats everyone!
> >
> > > On Apr 17, 2024, at 1:10 PM, Benjamin Lerer  wrote:
> > >
> > > The Apache Cassandra PMC is pleased to announce that Alexandre Dutra,
> Andrew Tolbert, Bret McGuire and Olivier Michallat have accepted the
> invitation to become committers on the Java driver sub-project.
> > >
> > > Thanks for your contributions to the Java driver during all those
> years!
> > > Congratulations and welcome!
> > >
> > > The Apache Cassandra PMC members
> >
> >
>
>
>


Re: Welcome Brad Schoening as Cassandra Committer

2024-02-21 Thread Patrick McFadin
Yay! Congrats Brad!

On Wed, Feb 21, 2024 at 2:06 PM Jeremy Hanna 
wrote:

> Congratulations Brad!
>
> On Feb 21, 2024, at 3:59 PM, Leo Toff  wrote:
>
> Congratulations Brad! Thank you for helping me onboard 
>
> On Wed, Feb 21, 2024 at 1:56 PM Jeremiah Jordan 
> wrote:
>
>> Congrats!
>>
>> On Feb 21, 2024 at 2:46:14 PM, Josh McKenzie 
>> wrote:
>>
>>> The Apache Cassandra PMC is pleased to announce that Brad Schoening has
>>> accepted the invitation to become a committer.
>>>
>>> Your work on the integrated Python driver, launch script environment,
>>> and tests has been a big help to many. Congratulations and welcome!
>>>
>>> The Apache Cassandra PMC members
>>>
>>
>


Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Patrick McFadin
Congratulations Maxim! Thank you for all you've done in the project and
everything to come!

Patrick

On Mon, Jan 8, 2024 at 12:32 PM Miklosovic, Stefan via dev <
dev@cassandra.apache.org> wrote:

> Great news! Congratulations.
>
> 
> From: Josh McKenzie 
> Sent: Monday, January 8, 2024 19:19
> To: dev
> Subject: Welcome Maxim Muzafarov as Cassandra Committer
>
> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has
> accepted the invitation to become a committer.
>
> Thanks for all the hard work and collaboration on the project thus far,
> and we're all looking forward to working more with you in the future.
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>
>
>


Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2023-12-22 Thread Patrick McFadin
It was great having some more extended discussions about Harry in person
last week. Anything we can do to make it easier for anyone to test
Cassandra thoroughly is an easy +1 from me!

Thanks for all your efforts so far, Alex.

Patrick

On Fri, Dec 22, 2023 at 8:03 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Obviously +1
>
> Thank you Alex
>
> On Fri, Dec 22, 2023, 16:45 Sumanth Pasupuleti <
> sumanth.pasupuleti...@gmail.com> wrote:
>
>> +1, thank you for your efforts in bringing Harry in-tree. Anything that
>> improves the testing ecosystem for Cassandra, particularly around complex
>> scenarios / edge cases, goes a long way in improving reliability, and
>> having a powerful tool like Harry in-tree makes it a lot more accessible
>> to developers than it has been. Also, thank you for keeping in mind the
>> onboarding experience of developers.
>>
>> - Sumanth
>>
>> On Fri, Dec 22, 2023 at 1:11 AM Alex Petrov  wrote:
>>
>>> Some follow-up tickets to establish the project direction:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19229
>>>
>>> Two other things that we will work on in-tree are:
>>> https://issues.apache.org/jira/browse/CASSANDRA-18275 (model and in-JVM
>>> test for partition-restricted 2i queries)
>>> https://issues.apache.org/jira/browse/CASSANDRA-18667 (multi-threaded
>>> SAI read and write fuzz test)
>>>
>>> If you would like to get your recently added feature tested with a Harry
>>> model, please let me know!
>>>
>>> On Fri, Dec 22, 2023, at 12:41 AM, Joseph Lynch wrote:
>>>
>>> +1
>>>
>>> Sounds like a great change that will help us unify around a common
>>> testing paradigm, and even pave the path to in-tree load testing plus
>>> integrated correctness checking which would be extremely valuable!
>>>
>>> -Joey
>>>
>>> On Thu, Dec 21, 2023 at 1:35 PM Caleb Rackliffe <
>>> calebrackli...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Agree w/ all the justifications mentioned above.
>>>
>>> As a reviewer on CASSANDRA-19210, my goals were
>>> to a.) look at the directory, naming, and package structure of the ported
>>> code, b.) make sure IDE integration was working, and c.) make sure any
>>> modifications to existing code (rather than direct code movements from
>>> cassandra-harry) were straightforward.
>>>
>>> On Thu, Dec 21, 2023 at 3:23 PM Alex Petrov  wrote:
>>>
>>>
>>> Hey folks,
>>>
>>> I am mostly done with a patch that brings Harry in-tree [1]. I will
>>> trigger one more CI run overnight, and my intention was to merge it some
>>> time soon, but I wanted to give a fair warning here, since this is a
>>> relatively large patch.
>>>
>>> The good news for everyone is that it:
>>>   a) touches no production code whatsoever, only test code (namely in-jvm
>>> dtests) that was already using Harry;
>>>   b) only changes tests that used a duplicate version of the placement
>>> simulator, which we had both for testing TCM and in Harry;
>>>   c) in addition, converts 3 existing TCM tests to a new API to provide
>>> some base for examples/usage.
>>>
>>> Since we have effectively been relying on this code for a while now, and
>>> the intention now is to converge on:
>>>   a) fewer different generators, with a shareable version of the
>>> generators for everyone to use across the codebase
>>>   b) a testing tool that can be useful for both trivial cases and
>>> complex scenarios,
>>> myself and many other Cassandra contributors have expressed the opinion
>>> that bringing Harry in-tree will be highly beneficial.
>>>
>>> I strongly believe that bringing Harry in-tree will help to lower the
>>> barrier to fuzz testing and simplify co-development of Cassandra and Harry.
>>> Previously, it has been rather difficult to debug edge cases because I had
>>> to either re-compile an in-jvm dtest jar and bring it to Harry, or
>>> re-compile a Harry jar and bring it to Cassandra, both of which are tedious
>>> and time consuming. Moreover, I believe we have missed at the very least one
>>> RT regression [2] because Harry was not in-tree, as its tests would've
>>> caught the issue even with the model that existed.
>>>
>>> For other recently found issues, I think having Harry in-tree would have
>>> substantially lowered the turnaround time and allowed me to share repros
>>> with developers of the corresponding features much more quickly.
>>>
>>> I do expect a slight learning curve for Harry, but my intention is to
>>> build a web of simple tests (I already worked on some of them yesterday
>>> after a conversation with David), which can follow the in-jvm-dtest pattern
>>> of find-similar-test / copy / modify. There's already copious
>>> documentation, so I do not believe a lack of docs for Harry was ever an
>>> issue.
>>>
>>> You all are aware of my dedication to testing and quality of Apache
>>> Cassandra, and I hope you also see the benefits of having a model checker
>>> in-tree.
>>>
>>> Thank you and happy upcoming 

Can't make it to Cassandra Summit but want to see the talks?

2023-12-11 Thread Patrick McFadin
Hi everyone,

The Linux Foundation will be streaming all of the talks from the Cassandra
Summit. Finding the streams is very easy. Go to the conference schedule:

https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Each talk has a YouTube link associated with it. The Keynotes and each room
have their own stream. Find the time and the room, and show up!

If you miss the live stream, the talks will all be available on YouTube
afterward. Join us in the #cassandra-summit channel in the ASF Slack and
start a thread on any talk you have questions about. We'll try to get the
speakers to join in.

Patrick


Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Patrick McFadin
Yay! Congratulations Mike. Well deserved!

On Fri, Dec 8, 2023 at 7:00 AM Andrés de la Peña 
wrote:

> Congrats Mike!
>
> On Fri, 8 Dec 2023 at 14:53, Jeremiah Jordan 
> wrote:
>
>> Congrats Mike!  Thanks for all your work on SAI and Vector index.  Well
>> deserved!
>>
>> On Dec 8, 2023 at 8:52:07 AM, Brandon Williams  wrote:
>>
>>> Congratulations Mike!
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>> On Fri, Dec 8, 2023 at 8:41 AM Benjamin Lerer  wrote:
>>>
>>>
>>> The PMC members are pleased to announce that Mike Adamson has accepted
>>> the invitation to become a committer.
>>>
>>>
>>> Thanks a lot, Mike, for everything you have done for the project.
>>>
>>>
>>> Congratulations and welcome
>>>
>>>
>>> The Apache Cassandra PMC members
>>>
>>>


Re: Introducing the Cassandra Catalyst program!

2023-12-01 Thread Patrick McFadin
So excited for this program! It's been a long time coming but wow, what a
great way to recognize individuals advocating for Cassandra in their own
communities.

Let's get out there and start nominating!

Patrick

On Fri, Dec 1, 2023 at 9:51 AM Melissa Logan  wrote:

> The Cassandra community is excited to introduce the Cassandra Catalyst
> program, a new initiative that aims to recognize individuals who invest in
> the growth of the community by enthusiastically sharing their expertise,
> encouraging participation, and creating a welcoming environment.
>
> This is the first PMC-led community program of its kind within the Apache
> Software Foundation ecosystem and we’re honored to be the pioneer!
>
> What does it mean to be a Cassandra Catalyst?
>
> Catalysts are trustworthy, expert contributors with a passion for
> connecting and empowering others with Cassandra knowledge. The individuals
> must be able to demonstrate strong knowledge of Cassandra such as
> production deployments, educational material, conference talks or other
> ways. In broad terms, Catalysts can participate through Contribution and
> Promotion.
>
> Who can become a Cassandra Catalyst?
>
> Anyone can nominate an individual to become a Catalyst or apply themselves.
> This program applies to existing contributors who have been involved in
> Cassandra for years or those who are newcomers to the community.
>
> The program committee includes PMC members who will be reviewing Catalyst
> applications on a rolling basis. We’ll be recognizing the first group of
> Catalysts on the keynote stage at Cassandra Summit on Dec. 12-13 so apply
> early and be recognized for your contributions!
>
> Learn more and nominate someone/apply:
>
>
> https://cassandra.apache.org/_/blog/Introducing-the-Apache-Cassandra-Catalyst-Program.html
>
>
> If you have questions, feel free to ask here or on the #cassandra-comdev
> channel.
>
> Melissa
>


Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-30 Thread Patrick McFadin
Congratulations and welcome, Francisco!

On Thu, Nov 30, 2023 at 2:45 AM Maxim Muzafarov  wrote:

> My congratulations, Francisco! :-)
>
> On Wed, 29 Nov 2023 at 13:30, Andrés de la Peña 
> wrote:
> >
> > Congrats Francisco!
> >
> > On Wed, 29 Nov 2023 at 11:37, Benjamin Lerer  wrote:
> >>
> >> Congratulations!!! Well deserved!
> >>
> >> On Wed, Nov 29, 2023 at 07:31, Berenguer Blasi wrote:
> >>>
> >>> Welcome!
> >>>
> >>> On 29/11/23 2:24, guo Maxwell wrote:
> >>>
> >>> Congrats!
> >>>
> >>> On Wed, Nov 29, 2023 at 06:16, Jacek Lewandowski wrote:
> 
>  Congrats!!!
> 
>  On Tue, Nov 28, 2023, 23:08 Abe Ratnofsky wrote:
> >
> > Congrats Francisco!
> >
> > > On Nov 28, 2023, at 1:56 PM, C. Scott Andreas <
> sc...@paradoxica.net> wrote:
> > >
> > > Congratulations, Francisco!
> > >
> > > - Scott
> > >
> > >> On Nov 28, 2023, at 10:53 AM, Dinesh Joshi 
> wrote:
> > >>
> > >> The PMC members are pleased to announce that Francisco Guerrero
> Hernandez has accepted
> > >> the invitation to become a committer today.
> > >>
> > >> Congratulations and welcome!
> > >>
> > >> The Apache Cassandra PMC members
> >
>


Cassandra Summit: Engage those networks!

2023-11-29 Thread Patrick McFadin
Hi everyone,

We are a couple of weeks away from Cassandra Summit. People get busy and
forget to register or miss that there is even a summit happening. Let's
make sure everyone who wants to go gets a chance!

 - If you are going, get on the social media of your choice and let
everyone know you'll be there. Use the hashtag #cassandrasummit
 - If you aren't going, you can still remind other folks that it's
happening and the talks you think they should check out.

Either way, here is the basic info to include in your post:

*Schedule: https://events.linuxfoundation.org/cassandra-summit/program/schedule/
Register: https://events.linuxfoundation.org/cassandra-summit/register/#register-now
Discount code: 23CS20*

*One more thing! If you are going and reading this, reply to this email
with a "Going!" or "See you there!" I would love to see who will be there
in two weeks. *


*Patrick*


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Patrick McFadin
JD, that wasn't my point. It feels like we are treating a beta like an RC,
which it isn't. Ship Beta 1 now and Beta 2 later. We need people looking at
it today because they will find new bugs, and that signal is lost on an
alpha. An alpha is too YOLO for most people.

On Tue, Nov 28, 2023 at 10:36 AM Benjamin Lerer  wrote:

> -1 based on the problems raised by Caleb.
>
> I would be fine with releasing that version as an alpha as Jeremiah
> proposed.
>
>> As of this time, I'm also not aware of a user of the project operating a
>> build from the 5.0 branch at substantial scale to suss out the operational
>> side of what can be expected. If someone is running a build supporting
>> non-perf-test traffic derived from the 5.0 branch and has an experience
>> report to share it would be great to read.
>
>
> Some people at DataStax are working on such testing. It will take a bit of
> time before we get the final results though.
>
> On Tue, Nov 28, 2023 at 19:27, J. D. Jordan wrote:
>
>> That said, this is clearly better than the alpha and includes many fixes.
>> Would people be more comfortable if this cut was released as another
>> alpha and we do beta1 once the known fixes land?
>>
>> On Nov 28, 2023, at 12:21 PM, J. D. Jordan 
>> wrote:
>>
>> 
>> -0 (NB) on this cut. Given the concerns expressed so far in the thread I
>> would think we should re-cut beta1 at the end of the week.
>>
>> On Nov 28, 2023, at 12:06 PM, Patrick McFadin  wrote:
>>
>> 
>> I'm a +1 on a beta now vs maybe later. Beta doesn't imply perfect
>> especially if there are declared known issues. We need people outside of
>> this tight group using it and finding issues. I know how this rolls. Very
>> few people touch an alpha release. Beta is when the engine starts and we
>> need to get it started asap. Otherwise we are telling ourselves we have the
>> perfect testing apparatus and don't need more users testing. I don't think
>> that is the case.
>>
>> Scott, Ekaterina, and I are going to be on stage in 2 weeks talking about
>> Cassandra 5 in the keynotes. In that time, our call to action is going to
>> be to test the beta.
>>
>> Patrick
>>
>> On Tue, Nov 28, 2023 at 9:41 AM Mick Semb Wever  wrote:
>>
>>> The vote will be open for 72 hours (longer if needed). Everyone who has
>>>> tested the build is invited to vote. Votes by PMC members are considered
>>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>>>
>>>
>>>
>>> +1
>>>
>>> Checked
>>> - signing correct
>>> - checksums are correct
>>> - source artefact builds (JDK 11+17)
>>> - binary artefact runs (JDK 11+17)
>>> - debian package runs (JDK 11+17)
>>> - debian repo runs (JDK 11+17)
>>> - redhat* package runs (JDK 11+17)
>>> - redhat* repo runs (JDK 11+17)
>>>
>>>
>>> With the disclaimer: there are a few known bugs in SAI, e.g. 19011, with
>>> fixes to be available soon in 5.0-beta2.
>>>
>>>
>>>


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Patrick McFadin
I'm a +1 on a beta now vs maybe later. Beta doesn't imply perfect
especially if there are declared known issues. We need people outside of
this tight group using it and finding issues. I know how this rolls. Very
few people touch an alpha release. Beta is when the engine starts and we
need to get it started asap. Otherwise we are telling ourselves we have the
perfect testing apparatus and don't need more users testing. I don't think
that is the case.

Scott, Ekaterina, and I are going to be on stage in 2 weeks talking about
Cassandra 5 in the keynotes. In that time, our call to action is going to
be to test the beta.

Patrick

On Tue, Nov 28, 2023 at 9:41 AM Mick Semb Wever  wrote:

> The vote will be open for 72 hours (longer if needed). Everyone who has
>> tested the build is invited to vote. Votes by PMC members are considered
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>
>
> +1
>
> Checked
> - signing correct
> - checksums are correct
> - source artefact builds (JDK 11+17)
> - binary artefact runs (JDK 11+17)
> - debian package runs (JDK 11+17)
> - debian repo runs (JDK 11+17)
> - redhat* package runs (JDK 11+17)
> - redhat* repo runs (JDK 11+17)
>
>
> With the disclaimer: there are a few known bugs in SAI, e.g. 19011, with
> fixes to be available soon in 5.0-beta2.
>
>
>


Cassandra Summit: Early registration discount ends tomorrow

2023-11-20 Thread Patrick McFadin
Hi everyone,

If you've registered for Cassandra Summit, then ignore this email.

If not, it's time to get moving! The early registration deadline is tomorrow.

Link to register:
https://events.linuxfoundation.org/cassandra-summit/register/

Discount code: 23CS20 (Yes you can use it with the early registration price)

If you need motivation, look at this schedule!
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Let's get everyone gathered! This is our time!

Patrick


Re: Time to register for the Cassandra Summit 2023!

2023-11-09 Thread Patrick McFadin
One other important point. In talking to our friends at the Linux
Foundation, they reminded me about scholarships for attending Cassandra
Summit.
If you would like to apply for travel or ticket assistance, follow this
link and apply:
https://events.linuxfoundation.org/cassandra-summit/attend/travel-funding/

I hope many of you will take advantage of this program and join us in San
Jose!

Patrick


On Thu, Nov 9, 2023 at 7:15 AM Patrick McFadin  wrote:

> Hi everyone!
>
> I'm going to keep this short, but it's time to gather the Cassandra
> community. December 12-13 in San Jose. Early-bird registration pricing ends
> November 21, so don't delay.
>
> Registration page:
> https://events.linuxfoundation.org/cassandra-summit/register/
> Use my discount code for 20% off: 23CS20
>
> Need some motivation? Check out this schedule:
> https://events.linuxfoundation.org/cassandra-summit/program/schedule/
>
> If you are planning on sending a group (yes!), the Linux Foundation is
> offering a group discount. Email me and I can put you in touch with the
> right person.
>
> Let's get out and support our community!
>
> Patrick
>


Time to register for the Cassandra Summit 2023!

2023-11-09 Thread Patrick McFadin
Hi everyone!

I'm going to keep this short, but it's time to gather the Cassandra
community. December 12-13 in San Jose. Early-bird registration pricing ends
November 21, so don't delay.

Registration page:
https://events.linuxfoundation.org/cassandra-summit/register/
Use my discount code for 20% off: 23CS20

Need some motivation? Check out this schedule:
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

If you are planning on sending a group (yes!), the Linux Foundation is
offering a group discount. Email me and I can put you in touch with the
right person.

Let's get out and support our community!

Patrick


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-30 Thread Patrick McFadin
> >>>
> >>> On Wed, 25 Oct 2023 at 18:07, Jeremiah Jordan <jeremiah.jor...@gmail.com> wrote:
> >>>>>
> >>>>> If we do a 5.1 release, why not take it as an opportunity to release
> more things? I am not saying that we will, just that we should leave that
> door open.
> >>>>
> >>>>
> >>>> Agreed.  This is the reason I brought up the possibility of not
> branching off 5.1 immediately.
> >>>>
> >>>>
> >>>> On Oct 25, 2023 at 3:17:13 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
> >>>>>
> >>>>> The proposal includes 3 things:
> >>>>> 1. Do not include TCM and Accord in 5.0 to avoid delaying 5.0
> >>>>> 2. The next release will be 5.1 and will include only Accord and TCM
> >>>>> 3. Merge TCM and Accord right now in 5.1 (making an initial release)
> >>>>>
> >>>>> I am fine with question 1 and do not have a strong opinion on which
> way to go.
> >>>>> 2. Means that every new feature will have to wait for post 5.1 even
> if it is ready before 5.1 is stabilized and shipped. If we do a 5.1 release
> why not take it as an opportunity to release more things. I am not saying
> that we will. Just that we should let that door open.
> >>>>> 3. There is a need to merge TCM and Accord as maintaining those
> separate branches is costly in terms of time and energy. I fully understand
> that. On the other hand merging TCM and Accord will make the TCM review
> harder and I do believe that this second round of review is valuable as it
> already uncovered a valid issue. Nevertheless, I am fine with merging TCM
> as soon as it passes CI and continuing the review after the merge. If we
> cannot meet at least that quality level (green CI), we should not merge just
> to create a 5.1-alpha release for the summit.
> >>>>>
> >>>>> Now, I am totally fine with a preview release without numbering and
> with big warnings that will only serve as a preview for the summit.
> >>>>>
> >>>>> On Wed, Oct 25, 2023 at 06:33, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
> >>>>>>
> >>>>>> I also think there are many good new features in 5.0 already; they'd
> >>>>>> make a good release even on their own. My 2 cts.
> >>>>>>
> >>>>>> On 24/10/23 23:20, Brandon Williams wrote:
> >>>>>> > The catch here is that we don't publish docker images currently. The
> >>>>>> > C* docker images available are not made by us.
> >>>>>> >
> >>>>>> > Kind Regards,
> >>>>>> > Brandon
> >>>>>> >
> >>>>>> > On Tue, Oct 24, 2023 at 3:31 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> >>>>>> >> Let me make that really easy. Hell yes
> >>>>>> >>
> >>>>>> >> Not everybody runs CCM; I've tried, but I've met resistance.
> >>>>>> >>
> >>>>>> >> Compiling your own version usually involves me saying the words
> "Yes, ant realclean exists. I'm not trolling you"
> >>>>>> >>
> >>>>>> >> docker pull  works on every OS and curates a single node
> experience.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie <jmcken...@apache.org> wrote:
> >>>>>> >>> In order for the project to advertise the release outside the
> >>>>>> >>> dev@ list, it needs to be a formal release.
> >>>>>> >>>
> >>>>>> >>> That's my reading as well:
> >>>>>> >>>
> https://www.apache.org/legal/release-policy.html#release-definition

Re: Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Patrick McFadin
Sent you an invite Sam. Welcome to the community!

On Fri, Oct 27, 2023 at 10:31 AM Sam  wrote:

> Please can I have an invite to the Slack workspace on this email. I'd like
> to take a look through some of the items for first time contributors :-)
>
> Thanks!
>
> On Fri, 27 Oct 2023 at 18:10, Josh McKenzie  wrote:
>
>> In case you're keeping score on how frequently these are coming out: *please
>> stop*. ;)
>>
>> Silver lining - looks like we have a lot to discuss this round! Last
>> update was late July and we've been churning through the 5.0 freeze and
>> stabilization phase.
>>
>>
>>
>> *[New Contributors Getting Started]*
>> Check out https://the-asf.slack.com, channel #cassandra-dev. Reply
>> directly to me on this email if you need an invite for your account, and
>> reach out to the @cassandra_mentors alias in the channel if you need to get
>> oriented.
>>
>> We have a list of curated "getting started" tickets you can find here,
>> filtered to "ToDo" (i.e. not yet worked):
>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=2160=2162=2652
>> .
>>
>> *Helpful links:*
>> - Getting Started with Development on C*:
>> https://cassandra.apache.org/_/development/gettingstarted.html
>> - Building and IDE integration (worktrees are your friend; msg me on
>> slack if you need pointers):
>> https://cassandra.apache.org/_/development/ide.html
>> - Code Style: https://cassandra.apache.org/_/development/code_style.html
>>
>>
>>
>> *[Dev mailing list]*
>>
>> https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-7-20%7Cdto=2023-10-27
>> :
>>
>> My last email of shame was 35 threads. Drumroll for this one...
>> 91. *Yeesh*. Let me stick to highlights.
>>
>> Ekaterina pushed through dropping JDK8 support and adding JDK17
>> support... back in July. If you didn't know about it by now, consider
>> yourself doubly notified. :)
>> https://lists.apache.org/thread/9pwz3vtpf88fly27psc7yxvcv0lwbz8k I think
>> I can speak on behalf of all of us when I say: *Thank You Ekaterina.*
>>
>> This came up recently on another thread about when to branch 5.1, but we
>> discussed our freeze plans and exception rules for TCM and Accord here:
>> https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw. Mick
>> was essentially looking for a similar waiver for Vector search since it was
>> well abstracted, depended on SAI and external libs, and in general
>> shouldn't be too big of a disruption to get into 5.0. General consensus at
>> the time was "sure", and the work has since been completed. But here's the
>> reminder and link for posterity (and in case you missed it).
>>
>> Jaydeep reached out about a potential short-term solution to detecting
>> token-ownership mismatch while we don't yet have TCM; this seems more
>> pressing now as we're looking at a 5.0 without yet having TCM in it. The
>> dev ML thread is here:
>> https://lists.apache.org/thread/4p0orhom42g36osnknqj3fqmqhvqml1g, and he
>> created https://issues.apache.org/jira/browse/CASSANDRA-18758 dealing
>> with the topic. There's a relatively modest (7 files, just over 300 lines)
>> PR available here: https://github.com/apache/cassandra/pull/2595/files;
>> I haven't looked into it, but it might be worth considering getting this
>> into 5.0 since it looks like we're moving to cutting w/out TCM. Any
>> thoughts?
>>
>> We had a pretty good discussion about automated repair scheduling,
>> covering whether it should live in the DB proper vs. in the sidecar, pros
>> and cons, pressures, etc. Not sure if things moved beyond that; I know
>> there are at least a few implementations out there that haven't yet made
>> their way back to the ASF project proper. Thread:
>> https://lists.apache.org/thread/glvmkwknf91rxc5l6w4d4m1kcvlr6mrv. My
>> hope is we can avoid the gridlock we hit for a long time with the sidecar
>> where there are multiple implementations with different tradeoffs and
>> everyone's disincentivized from accepting a solution different from their
>> own in-house one since it'd theoretically require re-tooling. Tough problem
>> with no easy solutions, but would love to see this become a first class
>> citizen in the ecosystem.
>>
>> Paulo brought up a discussion about moving to disk_access_mode =
>> mmap_index_only on 5.0. Seemed to be a consensus there but I'm not sure we
>> actually changed that in the 5.0 branch? Thread:
>> https://lists.apache.org/thread/nhp6vftc4kc3dxskngxy5rpo1lp19drw. Just
>> pulled on cassandra-5.0 and it looks like auto + hasLargeAddressSpace() ==
>> .mmap rather than .mmap_index_only.
>>
>> David Capwell worked on adding some retries to repair messages when
>> they're failing to make the process more robust:
>> https://lists.apache.org/thread/wxv6k6slljqcw73xcmpxj4kn5lz95jd1.
>> Reception was positive enough that he went so far as to back-port it and
>> also work on some for IR. Looks like he could use a reviewer here:
>> https://issues.apache.org/jira/browse/CASSANDRA-18962 - and this is
>> patch available.
>>
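Josh's note on disk_access_mode turns on how the `auto` setting gets resolved to a concrete mode. The Python below is a simplified, hypothetical sketch of that resolution, not Cassandra's actual DatabaseDescriptor code: the enum lists only a subset of the `disk_access_mode` values, `large_address_space` stands in for the result of `hasLargeAddressSpace()`, and both the `standard` fallback for small address spaces and the `prefer_index_only` flag are assumptions made for illustration.

```python
from enum import Enum


class DiskAccessMode(Enum):
    # A subset of the cassandra.yaml disk_access_mode values mentioned on the thread.
    AUTO = "auto"
    MMAP = "mmap"
    MMAP_INDEX_ONLY = "mmap_index_only"
    STANDARD = "standard"


def resolve_disk_access_mode(configured: DiskAccessMode,
                             large_address_space: bool,
                             prefer_index_only: bool = False) -> DiskAccessMode:
    """Resolve `auto` to a concrete mode (hypothetical, simplified sketch)."""
    if configured is not DiskAccessMode.AUTO:
        # An explicitly configured mode is used as-is.
        return configured
    if not large_address_space:
        # Assumed fallback when large files cannot be mapped (e.g. a 32-bit JVM).
        return DiskAccessMode.STANDARD
    # prefer_index_only=False: the behaviour observed on cassandra-5.0 (mmap).
    # prefer_index_only=True: the proposed default (mmap_index_only).
    return DiskAccessMode.MMAP_INDEX_ONLY if prefer_index_only else DiskAccessMode.MMAP


print(resolve_disk_access_mode(DiskAccessMode.AUTO, large_address_space=True).value)  # prints "mmap"
```

With `prefer_index_only=False` the sketch mirrors what Josh observed on cassandra-5.0 (`auto` with a large address space resolves to `mmap`); flipping it models the `mmap_index_only` default proposed in Paulo's thread.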

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
Let me make that really easy. Hell yes.

Not everybody runs CCM. I've tried, but I've met resistance.

Compiling your own version usually involves me saying the words "Yes, ant
realclean exists. I'm not trolling you."

docker pull  works on every OS and curates a single node experience.



On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie  wrote:

> In order for the project to advertise the release outside the dev@ list
> it needs to be a formal release.
>
> That's my reading as well:
> https://www.apache.org/legal/release-policy.html#release-definition
>
> I wonder if there'd be value in us having a cronned job that'd do nightly
> docker container builds on trunk + feature branches, archived for N days,
> and we make that generally known to the dev@ list here so folks that want
> to poke at the current state of trunk or other branches could do so with
> very low friction. We'd probably see more engagement on feature branches if
> it were turn-key easy for other C* devs to spin them up and check them out.
>
> For what you're talking about here Patrick (a docker image for folks
> outside the dev@ audience and more user-facing), we'd want to vote on it
> and go through the formal process.
>
> On Tue, Oct 24, 2023, at 3:10 PM, Jeremiah Jordan wrote:
>
> In order for the project to advertise the release outside the dev@ list
> it needs to be a formal release.  That just means that there was a release
> vote and at least 3 PMC members +1’ed it, and there are more +1 than there
> are -1, and we follow all the normal release rules.  The ASF release
> process doesn’t care what branch you cut the artifacts from or what version
> you call it.
>
> So the project can cut artifacts for and release a 5.1-alpha1,
> 5.1-dev-preview1, whatever we want to version this thing, from trunk or
> any other branch name we want.
>
> -Jeremiah
>
> On Oct 24, 2023 at 2:03:41 PM, Patrick McFadin  wrote:
>
> I would like to have something for developers to use ASAP to try the
> Accord syntax. Very few people have seen it, and I think there's a learning
> curve we can start earlier.
>
> It's my understanding that ASF policy is that it needs to be a project
> release to create a docker image.
>
> On Tue, Oct 24, 2023 at 11:54 AM Jeremiah Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
> If we decide to go the route of not merging TCM to the 5.0 branch, do we
> actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
> things while it is in trunk and cut the 5.1 branch when we actually think
> we are near releasing?  I don’t see any reason we cannot cut “preview”
> artifacts from trunk?
>
> -Jeremiah
>
> On Oct 24, 2023 at 11:54:25 AM, Jon Haddad 
> wrote:
>
> I guess at the end of the day, shipping a release with a bunch of awesome
> features is better than holding it back.  If there are 2 big releases in 6
> months the community isn't any worse off.
>
> We either ship something, or nothing, and something is probably better.
>
> Jon
>
>
> On 2023/10/24 16:27:04 Patrick McFadin wrote:
>
> +1 to what you are saying, Josh. Based on the last survey, yes, everyone
> was excited about Accord, but SAI and UCS were pretty high on the list.
>
> Benedict and I had a good conversation last night, and now I understand
> more essential details for this conversation. TCM is taking far more work
> than initially scoped, and Accord depends on a stable TCM. TCM is months
> behind and that's a critical fact, and one I personally just learned of. I
> thought things were wrapping up this month, and we were in the testing
> phase. I get why that's a topic we are dancing around. Nobody wants to say
> ship dates are slipping because that's part of our culture. It's
> disappointing and, if it's new information, an unwelcome surprise, but none
> of us should be angry or in a blamey mood because I guarantee every one of
> us has shipped code late. My reaction yesterday was based on an incorrect
> assumption. Now that I have a better picture, my point of view is changing.
>
> Josh's point about what's best for users is crucial. Users deserve stable
> code with a regular cadence of features that make their lives easier. If we
> put 5.0 on hold for TCM + Accord, users will get neither for a very long
> time. And I mentioned a disaster yesterday. A bigger disaster would be
> shipping Accord with a major bug that causes data loss, eroding community
> trust. Accord has to be the most bulletproof of all bulletproof features.
> The pressure to ship is only going to increase and that's fertile ground
> for that sort of bug.
>
> So, taking a step back and with a clearer pictu

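Jeremiah's description of what makes a formal release (at least three PMC +1 votes and more +1 than -1 votes) can be read as a simple tally rule. The Python below is a minimal sketch of that rule under stated assumptions: the `Vote` record and `release_vote_passes` function are hypothetical, each voter is assumed to vote once, and only binding (PMC) votes are counted toward the majority.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    # Hypothetical record of one vote on a release thread.
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True when the voter is a PMC member


def release_vote_passes(votes: list[Vote]) -> bool:
    """Majority-approval check (simplified sketch, one vote per voter assumed).

    Passes when at least three binding +1 votes were cast and binding +1 votes
    outnumber binding -1 votes. Non-binding votes and 0 votes are ignored.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= 3 and plus > minus


three_pmc = [Vote("a", 1, True), Vote("b", 1, True), Vote("c", 1, True)]
print(release_vote_passes(three_pmc))
```

Because a release vote uses majority approval rather than consensus, a single -1 is not a veto in this model; it only blocks the release if binding -1 votes match or outnumber the binding +1 votes.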
Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
I would like to have something for developers to use ASAP to try the Accord
syntax. Very few people have seen it, and I think there's a learning curve
we can start earlier.

It's my understanding that ASF policy is that it needs to be a project
release to create a docker image.

On Tue, Oct 24, 2023 at 11:54 AM Jeremiah Jordan 
wrote:

> If we decide to go the route of not merging TCM to the 5.0 branch, do we
> actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
> things while it is in trunk and cut the 5.1 branch when we actually think
> we are near releasing?  I don’t see any reason we cannot cut “preview”
> artifacts from trunk?
>
> -Jeremiah
>
> On Oct 24, 2023 at 11:54:25 AM, Jon Haddad 
> wrote:
>
>> I guess at the end of the day, shipping a release with a bunch of awesome
>> features is better than holding it back.  If there are 2 big releases in 6
>> months the community isn't any worse off.
>>
>> We either ship something, or nothing, and something is probably better.
>>
>> Jon
>>
>>
>> On 2023/10/24 16:27:04 Patrick McFadin wrote:
>>
>> +1 to what you are saying, Josh. Based on the last survey, yes, everyone
>> was excited about Accord, but SAI and UCS were pretty high on the list.
>>
>> Benedict and I had a good conversation last night, and now I understand
>> more essential details for this conversation. TCM is taking far more work
>> than initially scoped, and Accord depends on a stable TCM. TCM is months
>> behind and that's a critical fact, and one I personally just learned of. I
>> thought things were wrapping up this month, and we were in the testing
>> phase. I get why that's a topic we are dancing around. Nobody wants to say
>> ship dates are slipping because that's part of our culture. It's
>> disappointing and, if it's new information, an unwelcome surprise, but none
>> of us should be angry or in a blamey mood because I guarantee every one of
>> us has shipped code late. My reaction yesterday was based on an incorrect
>> assumption. Now that I have a better picture, my point of view is changing.
>>
>> Josh's point about what's best for users is crucial. Users deserve stable
>> code with a regular cadence of features that make their lives easier. If we
>> put 5.0 on hold for TCM + Accord, users will get neither for a very long
>> time. And I mentioned a disaster yesterday. A bigger disaster would be
>> shipping Accord with a major bug that causes data loss, eroding community
>> trust. Accord has to be the most bulletproof of all bulletproof features.
>> The pressure to ship is only going to increase and that's fertile ground
>> for that sort of bug.
>>
>> So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
>> plan mainly because I don't think 5.1 is (or should be) a fast follow.
>>
>> For the user community, the communication should be straightforward. TCM +
>> Accord are turning out to be much more complicated than was originally
>> scoped, and for good reasons. Our first principle is to provide a stable
>> and reliable system, so as a result, we'll be de-coupling TCM + Accord from
>> 5.0 into a 5.1 branch, which is available in parallel to 5.0 while
>> additional hardening and testing is done. We can communicate this in a blog
>> post.
>>
>> To make this much more palatable to our user community, if we can get a
>> build and docker image available ASAP with Accord, it will allow developers
>> to start playing with the syntax. Up to this point, that hasn't been widely
>> available unless you compile the code yourself. Developers need to
>> understand how this will work in an application, and up to this point, the
>> syntax is text they see in my slides. We need to get some hands-on time, and
>> that will get our user community engaged on Accord this calendar year. The
>> feedback may even uncover some critical changes we'll need to make. Lack of
>> access to Accord by developers is a critical problem we can fix soon, and
>> there will be plenty of excitement as developers start building use cases
>> before the final code ships.
>>
>> I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
>> but maybe one

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
+1 to what you are saying, Josh. Based on the last survey, yes, everyone
was excited about Accord, but SAI and UCS were pretty high on the list.

Benedict and I had a good conversation last night, and now I understand
more essential details for this conversation. TCM is taking far more work
than initially scoped, and Accord depends on a stable TCM. TCM is months
behind and that's a critical fact, and one I personally just learned of. I
thought things were wrapping up this month, and we were in the testing
phase. I get why that's a topic we are dancing around. Nobody wants to say
ship dates are slipping because that's part of our culture. It's
disappointing and, if it's new information, an unwelcome surprise, but none
of us should be angry or in a blamey mood because I guarantee every one of
us has shipped code late. My reaction yesterday was based on an incorrect
assumption. Now that I have a better picture, my point of view is changing.

Josh's point about what's best for users is crucial. Users deserve stable
code with a regular cadence of features that make their lives easier. If we
put 5.0 on hold for TCM + Accord, users will get neither for a very long
time. And I mentioned a disaster yesterday. A bigger disaster would be
shipping Accord with a major bug that causes data loss, eroding community
trust. Accord has to be the most bulletproof of all bulletproof features.
The pressure to ship is only going to increase and that's fertile ground
for that sort of bug.

So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
plan mainly because I don't think 5.1 is (or should be) a fast follow.

For the user community, the communication should be straightforward. TCM +
Accord are turning out to be much more complicated than was originally
scoped, and for good reasons. Our first principle is to provide a stable
and reliable system, so as a result, we'll be de-coupling TCM + Accord from
5.0 into a 5.1 branch, which is available in parallel to 5.0 while
additional hardening and testing is done. We can communicate this in a blog
post.

To make this much more palatable to our user community, if we can get a
build and docker image available ASAP with Accord, it will allow developers
to start playing with the syntax. Up to this point, that hasn't been widely
available unless you compile the code yourself. Developers need to
understand how this will work in an application, and up to this point, the
syntax is text they see in my slides. We need to get some hands-on time, and
that will get our user community engaged on Accord this calendar year. The
feedback may even uncover some critical changes we'll need to make. Lack of
access to Accord by developers is a critical problem we can fix soon, and
there will be plenty of excitement as developers start building use cases
before the final code ships.

I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
but maybe one for my birthday?

Patrick



On Tue, Oct 24, 2023 at 7:23 AM Josh McKenzie  wrote:

> Maybe it won't be a glamorous release but shipping
> 5.0 mitigates our worst case scenario.
>
> I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
> memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
> accurate, all combine to make 5.0 a pretty glamorous release IMO
> independent of TCM and Accord. Accord is a true paradigm-shift game-changer
> so it's easy to think of 5.0 as uneventful in comparison, and TCM helps
> resolve one of the biggest pain-points in our system for over a decade, but
> I think 5.0 is a very meaty release in its own right today.
>
> Anyway - I agree with you Brandon re: timelines. If things take longer
> than we'd hope (which, if I think back, they do roughly 100% of the time on
> this project), blocking on these features could both significantly delay
> 5.0 going out and increase pressure and the risk of burnout on the folks
> working on it. While I believe we all need some balanced
> urgency to do our best work, being under the gun for something with a hard
> deadline or having an entire project drag along blocked on you is not where
> I want any of us to be.
>
> Part of why we talked about going to primarily annual calendar-based
> releases was to avoid precisely this situation, where something that
> *feels* right at the cusp of merging leads us to delay a release
> repeatedly. We discussed this a couple of times this year:
> 1: https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3,
> where Mick proposed a "soft-freeze" for everything w/out exception and 1st
> week October "hard-freeze", and there was assumed to be lazy consensus
> 2: https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw,
> where we stuck with what we discussed in 1 but also granted CEP-30 a
> waiver.
>
> So. We're at a crossroads here where we need to either follow through with
> what we all agreed to earlier this year, or acknowledge that our best
> 

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Patrick McFadin
I’m going to be clearer in my statement.

This has to be in 5.0, even if it’s alpha and ships after December, or this
is going to be a disaster that will take us much longer to unravel.

On Mon, Oct 23, 2023 at 7:49 AM Jeremiah Jordan 
wrote:

> +1 from me assuming we have tickets and two committer +1’s on them for
> everything being committed to trunk, and CI is working/passing before it
> merges.  The usual things, but I want to make sure we do not compromise on
> any of them as we try to “move fast” here.
>
> -Jeremiah Jordan
>
> On Oct 23, 2023 at 8:50:46 AM, Sam Tunnicliffe  wrote:
>
>> +1 from me too.
>>
>> Regarding Benedict's point, backwards incompatibility should be minimal;
>> we modified snitch behaviour slightly, so that local snitch config only
>> relates to the local node; all peer info is fetched from cluster metadata.
>> There is also a minor change to the way failed bootstraps are handled, as
>> with TCM they require an explicit cancellation step (running a nodetool
>> command).
>>
>> Whether consensus decrees that this constitutes a major bump or not, I
>> think decoupling these major projects from 5.0 is the right move.
>>
>>
>> On 23 Oct 2023, at 12:57, Benedict  wrote:
>>
>> I’m cool with this.
>>
>> We may have to think about numbering as I think TCM will break some
>> backwards compatibility and we might technically expect the follow-up
>> release to be 6.0.
>>
>> Maybe it’s not so bad to have such rapid releases either way.
>>
>> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>>
>>
>> The TCM work (CEP-21) is in its review stage, but as it is well past our
>> cut-off date¹ for merging and is now jeopardising 5.0 GA efforts, I would
>> like to propose the following.
>>
>> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
>> an immediate 5.1-alpha1 release.
>>
>> I see this as a win-win scenario for us, considering our current
>> situation.  (Though it is unfortunate that Accord is included in this
>> scenario because we agreed to base it upon TCM.)
>>
>> This will mean…
>>  - We get to focus on getting 5.0 to beta and GA, which already has a ton
>> of features users want.
>>  - We get an alpha release with TCM and Accord into users' hands quickly
>> for broader testing and feedback.
>>  - We isolate GA efforts on TCM and Accord, giving OSS and downstream
>> engineers time and patience for reviewing and testing.  TCM will be the
>> biggest patch ever to land in C*.
>>  - We give users a choice of a more incremental upgrade approach, given
>> just how many new features we're putting on them in one year.
>>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
>> 4.x versions, just as if it had landed in 5.0.
>>
>>
>> The risks/costs this introduces are:
>>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
>> and at some point decide to undo this work, while we can throw away the
>> cassandra-5.1 branch we would need to do a bit of work reverting the
>> changes in trunk.  This is a _very_ unlikely edge case, as the design and
>> implementation of both are already tested and confidence in them is high.
>>  - We will have to maintain an additional branch.  I propose that we
>> treat the 5.1 branch in the same maintenance window as 5.0 (like we have
>> with 3.0 and 3.11).  This also adds the merge path overhead.
>>  - Reviewing of TCM and Accord will continue to happen post-merge.  This
>> is not our normal practice, but this work will have already received its
>> two +1s from committers, and such ongoing review effort is akin to GA
>> stabilisation work on release branches.
>>
>>
>> I see no other OK solution in front of us that gets us at least both the
>> 5.0 beta and TCM+Accord alpha releases this year, keeping in mind users'
>> demand to start experimenting with these features and our Cassandra
>> Summit in December.
>>
>>
>> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>>
>>
>>
>>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Patrick McFadin
I'm really surprised to see this email. The last I heard everything was on
track for getting into 5.0, and TBH, Accord is what a majority of users
are expecting in 5.0. And how could this be a .1 release?

What is it going to take to get it into 5.0? What is off track and how did
we get here?

On Mon, Oct 23, 2023 at 6:51 AM Sam Tunnicliffe  wrote:

> +1 from me too.
>
> Regarding Benedict's point, backwards incompatibility should be minimal;
> we modified snitch behaviour slightly, so that local snitch config only
> relates to the local node; all peer info is fetched from cluster metadata.
> There is also a minor change to the way failed bootstraps are handled, as
> with TCM they require an explicit cancellation step (running a nodetool
> command).
>
> Whether consensus decrees that this constitutes a major bump or not, I
> think decoupling these major projects from 5.0 is the right move.
>
>
> On 23 Oct 2023, at 12:57, Benedict  wrote:
>
> I’m cool with this.
>
> We may have to think about numbering as I think TCM will break some
> backwards compatibility and we might technically expect the follow-up
> release to be 6.0.
>
> Maybe it’s not so bad to have such rapid releases either way.
>
> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>
>
> The TCM work (CEP-21) is in its review stage, but as it is well past our
> cut-off date¹ for merging and is now jeopardising 5.0 GA efforts, I would
> like to propose the following.
>
> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
> an immediate 5.1-alpha1 release.
>
> I see this as a win-win scenario for us, considering our current
> situation.  (Though it is unfortunate that Accord is included in this
> scenario because we agreed to base it upon TCM.)
>
> This will mean…
>  - We get to focus on getting 5.0 to beta and GA, which already has a ton
> of features users want.
>  - We get an alpha release with TCM and Accord into users' hands quickly
> for broader testing and feedback.
>  - We isolate GA efforts on TCM and Accord, giving OSS and downstream
> engineers time and patience for reviewing and testing.  TCM will be the
> biggest patch ever to land in C*.
>  - We give users a choice of a more incremental upgrade approach, given
> just how many new features we're putting on them in one year.
>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
> 4.x versions, just as if it had landed in 5.0.
>
>
> The risks/costs this introduces are:
>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
> and at some point decide to undo this work, while we can throw away the
> cassandra-5.1 branch we would need to do a bit of work reverting the
> changes in trunk.  This is a _very_ unlikely edge case, as the design and
> implementation of both are already tested and confidence in them is high.
>  - We will have to maintain an additional branch.  I propose that we treat
> the 5.1 branch in the same maintenance window as 5.0 (like we have with 3.0
> and 3.11).  This also adds the merge path overhead.
>  - Reviewing of TCM and Accord will continue to happen post-merge.  This
> is not our normal practice, but this work will have already received its
> two +1s from committers, and such ongoing review effort is akin to GA
> stabilisation work on release branches.
>
>
> I see no other OK solution in front of us that gets us at least both the
> 5.0 beta and TCM+Accord alpha releases this year, keeping in mind users'
> demand to start experimenting with these features and our Cassandra
> Summit in December.
>
>
> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>
>
>
>


Cassandra Summit Update!

2023-10-15 Thread Patrick McFadin
Hello Cassandra Community,

Below you'll find the updated announcement being posted on the Cassandra
website.

The short version for short attention spans:
 - Cassandra Summit will be co-located with the AI.dev conference. One
   ticket, two conferences.
 - The CFP for an AI track at Cassandra Summit is open for one more week:
   https://events.linuxfoundation.org/cassandra-summit/program/cfp/#suggested-topics
 - Register now! Early-bird pricing ends in a few weeks:
   https://events.linuxfoundation.org/cassandra-summit/register/

Cassandra Summit 2023 Gains AI.dev as Co-located Event; NEW AI + Cassandra Track

We are excited to announce that the new AI.dev: Open Source GenAI & ML
Summit 2023 conference will be co-located with Cassandra Summit this year!
This means that Cassandra Summit will welcome an expanded audience that
includes developers who are delving into the realm of open source generative
AI and machine learning.

And with the addition of AI.dev, a NEW AI + Cassandra track will be
featured at the event. The Call for Proposals is open until 9:00 AM PDT on
Monday, October 23.

Here’s what you need to know:

WHEN + WHERE IS THIS HAPPENING? Cassandra Summit + AI.dev will take place
December 12-13, 2023 at the McEnery Convention Center in San Jose, California.

WHO SHOULD ATTEND? Data practitioners, developers, engineers and
enthusiasts, plus developers who are interested in open source generative AI
and machine learning.

WHAT ARE THE CFP DETAILS? The CFP for the new AI + Cassandra track is now
open. This track will include lightning talks, conference sessions, panel
sessions and technical workshops that delve into distributed AI using
Cassandra and case studies that cover AI-powered applications using Apache
Cassandra. Submit a talk today!


HOW DO I REGISTER? Cassandra Summit and AI.dev will run simultaneously, and
attendees will have access to both events with a single registration. So
whether you’ve already registered or are planning
to register, you’ll gain access to both of these events for one price. To
learn more or to register, visit
https://events.linuxfoundation.org/cassandra-summit/register/

Cassandra Summit is where the community can connect to share best practices
and use cases, celebrate makers and users, forge critical relationships,
and learn about advancements in Apache Cassandra. With the addition of
AI.dev, we are excited to expand the community’s flagship event and include
talks that showcase how AI and Cassandra synergize, unlocking new
possibilities and enhancing data-driven solutions.

We hope to see you soon!


Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-08-10 Thread Patrick McFadin
Dinesh raises some good points.

> If we do adopt this, there will be non-zero overhead of the release
> process. This is fine but we need volunteers to run this process. My
> understanding is that they need to be ideally PMC or at least Committers
> on the project to go through all the steps to successfully release a new
> artifact for our users.

That was addressed in the Proposed Changes section of the CEP:

- A document detailing procedures for releasing to PyPI.org. This document
should include details on:

   1. How release to PyPI can be integrated into the build process. Can
   this be done with automation?
   2. How will credentials, permissions and ownership of packages on PyPI
   be managed?

My first thought was automation and integration into the build and release process.

Can you briefly outline the steps that need to be followed for a PyPI
release, Brad?

Patrick


On Wed, Aug 9, 2023 at 2:54 PM Abe Ratnofsky  wrote:

> I think it would be good for the project to have an official PyPI
> distribution, and the signal from users (40K downloads per month) is a
> clear indication that this is useful. Timely releases would help us get
> future improvements to cqlsh out faster, and moving this to an official
> distribution would protect users against any changes in this volunteer
> effort in case something happens in the future.
>
> +1 (nb)
>
> --
> Abe
>
> On Aug 9, 2023, at 1:33 PM, Brad  wrote:
>
> Hi Dinesh,
>
> You are correct that the scope of this CEP is practical, narrow and
> limited to having an official distribution of CQLSH on the official Python
> package repository. Cassandra end-users, who use the CQLSH command line,
> would benefit in several direct ways:
>
>  - A timely distribution of new CQLSH versions on the official Python
>    package repository, aligned with Apache Cassandra releases
>  - A trusted distribution overseen by Apache Cassandra instead of
>    third-party maintainers. Today, users can only take on faith that the
>    PyPI distribution of CQLSH matches the Apache open source one.
>  - A lightweight distribution of CQLSH, clocking in at 110KB vs.
>    downloading a 50MB tarball.
>
> Perhaps those are modest goals, but I would suggest they are big wins for
> the Cassandra user community. If you haven't tried it yet, please run '*pip
> install cqlsh*' on your desktop and see how nicely it works. Indeed, the
> return-on-investment of effort here should be really high, as the work is
> mostly already done, it's just run from a private repo at
> https://github.com/jeffwidman/cqlsh and has been maintained continually
> since 2013.
>
> Other initiatives such as subdividing the project(s) or re-writing the
> REPL in another language would be out-of-scope. It would be entirely
> appropriate to have a separate discussion on those two topics, if you wish
> to start that discussion.
>
> The process and degree of overhead required to publish to PyPI will
> require some discovery and discussion. Ideally, it would be possible to
> automate it. That is definitely a topic on which we need further input
> from the engineers involved in the build-release process.
>
> A pre-CEP discussion of this proposal was started by Jeff on the mailing
> list back in early July, see
> https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d.
>
> Regards,
>
> Brad
>
> On Wed, Aug 9, 2023 at 3:31 PM Dinesh Joshi  wrote:
>
>> Brad,
>>
>> Thanks for starting this discussion. My understanding is that we're
>> simply adding pip support for cqlsh and Apache Cassandra project will
>> officially publish a cqlsh pip package. This is a good goal but other
>> than having an official pip package, what is it that we're gaining?
>> Please don't interpret this as push back on your proposal but I am
>> unclear on what we're trying to solve by making this official
>> distribution. There are several distribution channels and it is
>> untenable to officially support all of them.
>>
>> If we do adopt this, there will be non-zero overhead of the release
>> process. This is fine but we need volunteers to run this process. My
>> understanding is that they need to be ideally PMC or at least Committers
>> on the project to go through all the steps to successfully release a new
>> artifact for our users.
>>
>> I would have liked this CEP to go a bit further than just packaging
>> cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
>> doesn't need to live in the cassandra repo. Extracting cqlsh into its
>> separate repo would allow us to truly decouple cqlsh from the server.
>> This is already true for the most part as we rely on the Python driver
>> which is compatible with several cassandra releases. As it stands today
>> it is not possible for us to update cqlsh without making a Cassandra
>> release.
>>
>> If you truly want to go a bit further, we should consider rewriting
>> cqlsh in Java so we can easily share code from the server. We can then
>> potentially use Java Native Image[1] to produce a truly platform
>> 

Raw results from User Survey

2023-08-01 Thread Patrick McFadin
Thanks to everyone who participated in this survey. We had enough
responses to give this legitimacy: 220 responses!

I wanted to get the raw results out first so everyone can participate with
the full picture. I'll work on a blog post to post on the Apache web site
after this is done.

Graphs (easy read)
https://docs.google.com/document/d/1Rbg-VP4Xdvgp8EKNczkqfhFYeKwfc_ZmMW0c5Gol9pk/edit?usp=sharing

Anonymized spreadsheet of responses (make your own graphs)
https://docs.google.com/spreadsheets/d/1pjhpjID5sEW4Vcff8tq0Atbcq8Cds18pXorM4CQcStk/edit?usp=sharing

I'll be giving a bit more discussion in the Cassandra marketing meeting
tomorrow if you want to come hear my thoughts.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883297

Now, what surprised you in the results?

Patrick


Who wants a free Cassandra t-shirt?

2023-07-21 Thread Patrick McFadin
We have about another week left on the user survey I posted last week. The
response has been slow, so it's time to get things in gear.

I found a box of Cassandra t-shirts that will make an excellent thank you
for anyone filling out the survey. Once the survey window closes, I'll pick
a random group of emails to receive a shirt. Given the tepid response so
far, your chances are decent to receive a shirt!

5-10 minutes. That's all it takes. Promote to your networks and let's get
some opinions known!

https://forms.gle/KVNd7UmUfcBuoNvF7

Thanks again,

Patrick


Apache Cassandra User Survey

2023-07-15 Thread Patrick McFadin
It’s been a long time since I’ve asked the community for feedback in a poll
or otherwise. A lot is changing in the data world, and we have an exciting
Cassandra release coming up with v5!
I would like to ask for five or ten minutes of your time to answer some
questions about how you use Cassandra and how we are doing as a community.
There are only 2 questions required, and the rest are all optional, so
answer whatever you can. It’s all helpful information.

https://forms.gle/KVNd7UmUfcBuoNvF7

The survey will run until July 29, 2023. Once completed, the results will
be anonymized and posted on http://cassandra.apache.org

Help spread the word by posting this invitation on social media, Slack
channels, or emailing colleagues. The bigger the N, the better the survey!
Here’s a sample to get you started:

I recently took the Apache Cassandra® 2023 survey, and I think you should
too! By sharing your answers, you can help shape the future of the
Cassandra project and contribute to the community. Your opinion matters!
https://forms.gle/KVNd7UmUfcBuoNvF7

Patrick


[DISCUSS] Conducting a User Survey

2023-07-10 Thread Patrick McFadin
For quite a few years, I have done Twitter polls to gather helpful
information about how people use Apache Cassandra. Twitter is no longer the
best place to conduct this kind of activity since it has become a ghost
town.

We should ask more comprehensive questions to get the pulse of our user
community. I want to do a simple Google Form survey that we can promote on
every channel for a few weeks. I'll anonymize the results and post them on
cassandra.apache.org.

Here are the proposed questions I have compiled. A pretty basic set of
questions, but it would be fun to know the answer to several of these:
https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing

Comments are open to all. Please let me know what you think.

Patrick


Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-10 Thread Patrick McFadin
I would say it helps a lot of people. 45k downloads in just the last month:
https://pypistats.org/packages/cqlsh

I feel like a CEP would be in order, along the lines of CEP-8:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

Unless anyone objects, I can help you get the CEP together and we can get a
vote, then a JIRA in place for any changes in trunk.

Patrick
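
Jeff's message quoted below describes the current release step: wait for a Cassandra release, copy the cqlsh code out of the source tree into a separate repo that holds the packaging files, then publish with standard Python tooling. As a rough illustration of what automating the copy step might look like, here is a minimal sketch. The source paths (`bin/cqlsh.py`, `pylib/cqlshlib`) and the function name `stage_cqlsh` are assumptions for illustration, not taken from the Cassandra build or from the jeffwidman/cqlsh repo.

```python
import shutil
from pathlib import Path

# Assumed locations of the cqlsh sources inside a Cassandra checkout.
# These are illustrative; adjust them to the real source-tree layout.
CQLSH_SOURCES = ["bin/cqlsh.py", "pylib/cqlshlib"]


def stage_cqlsh(cassandra_root: Path, staging_dir: Path) -> list[Path]:
    """Copy the cqlsh sources from a Cassandra checkout into a staging dir.

    Returns the staged paths. Packaging metadata and the actual upload are
    left to standard Python tooling and are not handled here.
    """
    staged = []
    for rel in CQLSH_SOURCES:
        src = cassandra_root / rel
        dst = staging_dir / Path(rel).name
        if src.is_dir():
            # dirs_exist_ok lets the step be re-run on an existing staging dir
            shutil.copytree(src, dst, dirs_exist_ok=True)
        elif src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        else:
            # Fail loudly rather than publishing an incomplete package
            raise FileNotFoundError(f"missing cqlsh source: {src}")
        staged.append(dst)
    return staged
```

After staging, a release job would still need to add the packaging metadata and build and upload the distribution with tools such as `build` and `twine`; how that would fit into the Apache release process is the open question raised in this thread.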

On Mon, Jul 10, 2023 at 4:58 PM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Same - really appreciate those efforts and also welcome the upstreaming
> and release automation...
>
> German
> --
> *From:* Jeff Widman 
> *Sent:* Sunday, July 9, 2023 1:44 PM
> *To:* Max C. 
> *Cc:* dev@cassandra.apache.org ; Brad Schoening
> 
> *Subject:* [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to
> PyPI as part of the release process
>
> Thanks Max, always encouraging to hear that the time I spend on open
> source is helping others.
>
> Your use case is very similar to what drove my original desire to get
> involved with the project. Being able to `pip install cqlsh` from a dev
> machine was so much lighter weight than the alternatives.
>
> Anyone else care to weigh in on this?
>
> What are the next steps to move to a decision?
>
> Cheers,
> Jeff
>
> On Sat, Jul 8, 2023, 7:23 PM Max C.  wrote:
>
> As a user, I really appreciate your efforts Jeff & Brad.  I would *love*
> for the C* project to officially support this.
>
> In our environment we have a lot of client machines that all share common
> NFS mounted directories.  It's much easier for us to create a Python
> virtual environment on a file server with the cqlsh PyPI package installed
> than it is to install the Cassandra RPMs on every single machine.  Before I
> discovered your PyPI package, our developers would need to log in to a
> Cassandra node in order to run cqlsh.  The cqlsh PyPI package, however, is
> in our standard "python dev tools" virtual environment -- along with
> Ansible, black, isort and various other Python packages; which means it's
> accessible to everyone, everywhere.
>
> I agree that this should not *replace* packaging cqlsh in the Cassandra
> RPM, so much as provide an additional *option* for installing cqlsh without
> the baggage of installing the full Cassandra package.
>
> Thanks again for your work Jeff & Brad.
>
> - Max
> On 7/6/2023 5:55 PM, Jeff Widman wrote:
>
> Brad Schoening and I currently maintain
> https://pypi.org/project/cqlsh/ which repackages CQLSH that ships with
> every Cassandra release.
>
> This way:
>
>- anyone who wants a lightweight client to talk to a remote cassandra
>can simply `pip install cqlsh` without having to download the full
>cassandra source, unzip it, etc.
>- it's very easy for folks to use it as scaffolding in their python
>scripts/tooling since they can simply include it in the list of their
>required dependencies.
>
> We currently handle the packaging by waiting for a release, then manually
> copy/pasting the code out of the cassandra source tree into
> https://github.com/jeffwidman/cqlsh which has some additional
> build/python package configuration files, then using standard
> python tooling to publish to PyPI.
>
> Given that our project is simply a build/packaging project, I wanted to
> start a conversation about upstreaming this into core Cassandra. I realize
> that Cassandra has no interest in maintaining lots of build targets... but
> given that cqlsh is written in Python and publishing to PyPI enables DBAs
> to share more complicated tooling built on top of it this seems like a
> natural fit for core cassandra rather than a standalone project.
>
> Goal:
> When a Cassandra release happens, the build/release process automatically
> publishes cqlsh to https://pypi.org/project/cqlsh/.
>
> Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There
> was some initial chatter about that in
> https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a
> lot of complexity, and I'm honestly not sure it's a great idea. Even if
> folks later want to go that route, the first hurdle is publishing to PyPI,
> so for now let's keep the scope of the discussion limited to treating PyPI
> purely as a release target, and not as an ingredient to a release.
>
> From an implementation perspective, this should be very straightforward.
> We don't have any differences from the CQLSH source that's in cassandra,
> instead we point folks to make changes to cqlsh in the Cassandra source. In
> fact we've made multiple contributions back to `cqlsh` ourselves and have
> drastically cleaned up the code:
> https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening=pullrequests.
> So the only real change is adding the package config files and the build 

Re: [VOTE] CEP 33 - CIDR filtering authorizer

2023-06-28 Thread Patrick McFadin
+1

On Wed, Jun 28, 2023 at 3:42 AM Brandon Williams  wrote:

> +1
>
> Kind Regards,
> Brandon
>
>
> On Tue, Jun 27, 2023 at 12:17 PM Shailaja Koppu  wrote:
> >
> > Hi Team,
> >
> > (Starting a new thread for VOTE instead of reusing the DISCUSS thread,
> to follow usual procedure).
> >
> > Please vote on CEP 33 - CIDR filtering authorizer
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-33%3A+CIDR+filtering+authorizer
> .
> >
> > Thanks,
> > Shailaja
>


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-14 Thread Patrick McFadin
+1

On Wed, Jun 14, 2023 at 2:39 PM Adam Holmberg 
wrote:

> +1
>
> (long time coming!)
>
> On Wed, Jun 14, 2023 at 3:51 AM Jorge Bay Gondra 
> wrote:
>
>> +1 nb
>>
>> On Wed, Jun 14, 2023 at 9:13 AM Sam Tunnicliffe  wrote:
>>
>>> +1
>>>
>>> On 13 Jun 2023, at 15:14, Jeremy Hanna 
>>> wrote:
>>>
>>> Calling for a vote on CEP-8 [1].
>>>
>>> To clarify the intent, as Benjamin said in the discussion thread [2],
>>> the goal of this vote is simply to ensure that the community is in
>>> favor of the donation. Nothing more.
>>> The plan is to introduce the drivers, one by one. Each driver donation
>>> will need to be accepted first by the PMC members, as it is the case for
>>> any donation. Therefore the PMC should have full control on the pace at
>>> which new drivers are accepted.
>>>
>>> If this vote passes, we can start this process for the Java driver under
>>> the direction of the PMC.
>>>
>>> Jeremy
>>>
>>> 1.
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>>>
>>>
>>>


Re: [VOTE] CEP-30 ANN Vector Search

2023-06-14 Thread Patrick McFadin
Andy,

Good to see you on the ML again! CEP-30 is slated for release with 5.0
later in the year. Until then, you'll need to do a local build or try it
out in a preview in Astra. A few of us have been talking about creating a
preview docker image since there is some interest in having it run in
k8ssandra. In any case, this is very alpha code and should be treated as
such. Reporting errors or unusual results would be greatly appreciated!

Patrick



On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <
a.e.cob...@dundee.ac.uk> wrote:

> Hi All,
>
>
>
> Great news this has gone through. I was wondering if we have a timescale for
> this making it to Beta or release? I’m asking because we have a project
> that would benefit from this approach.
>
>
>
> Andy
>
>
>
>
>
> *From: *Jonathan Ellis 
> *Date: *Tuesday, 30 May 2023 at 14:44
> *To: *dev 
> *Subject: *Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
>
> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
>
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
>
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Patrick McFadin
+1
Love the buzz this is creating with new users. Thanks for the work on this
Jonathan.

On Thu, May 25, 2023 at 8:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: Vector search demo, and query syntax

2023-05-23 Thread Patrick McFadin
| I first stumbled a bit with "there's no where clause and no filtering
allowed…"
| But I doubt that reaction from any experienced cql user will last more
than a moment.

I was also wondering about that, but this syntax looks good. More
importantly, it will be easy to explain to end users.

Patrick

On Tue, May 23, 2023 at 1:28 PM Jonathan Ellis  wrote:

> Yes, that's totally reasonable syntactically, but I'd prefer not to open
> the can of worms of ordering by some functions but not others (and I
> definitely don't want to try to tackle ordering by all functions).  "You
> can order by expressions involving SAI columns" is a pretty easy rule to
> explain.
>
> On Tue, May 23, 2023 at 12:53 PM David Capwell  wrote:
>
>> I am ok with the syntax, but wondering if a function may be better than a
>> CQL change?
>>
>> SELECT id, start, end, text
>> FROM {self.keyspace}.{self.table}
>> ORDER BY ANN(embedding, ?)
>> LIMIT ?
>>
>> Not really a common syntax, but could be useful down the line
>>
>> On May 23, 2023, at 12:37 AM, Mick Semb Wever  wrote:
>>
>>
>>> *I propose that we adopt `ORDER BY` syntax, supporting it for vector
>>> indexes first and eventually for all SAI indexes.  So this query would
>>> become: SELECT id, start, end, text FROM
>>> {self.keyspace}.{self.table} ORDER BY embedding ANN OF %s LIMIT %s*
>>>
>>
>>
>> LGTM.
>>
>> I first stumbled a bit with "there's no where clause and no filtering
>> allowed…"
>>
>> But I doubt that reaction from any experienced cql user will last more
>> than a moment.
>>
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
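
For readers new to the syntax: `ORDER BY embedding ANN OF ? LIMIT k` asks for the k rows whose `embedding` is closest to the query vector. Below is a minimal sketch of the exact, brute-force version of that result, assuming cosine similarity as the measure. CEP-30 proposes an HNSW index (via Lucene) that approximates this answer instead of scanning every row; the function names here are hypothetical.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_top_k(rows: dict[str, Sequence[float]], query: Sequence[float], k: int) -> list[str]:
    """Return the ids of the k rows most similar to `query`, best first.

    This scans every row, so it is the exact answer that an ANN index
    trades a little accuracy to approximate much faster.
    """
    return heapq.nlargest(k, rows, key=lambda rid: cosine_similarity(rows[rid], query))
```

For example, with rows `a=[1.0, 0.0]`, `b=[0.0, 1.0]`, `c=[0.7, 0.7]` and query `[1.0, 0.1]`, `exact_top_k(rows, query, 2)` returns `["a", "c"]`.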


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Patrick McFadin
1
Yes
4



On Mon, May 15, 2023 at 3:00 AM Benedict  wrote:

> 3: CREATE  INDEX (Otherwise 2)
> No
> If configurable, should be a distributed configuration. This is very
> different to other local configurations, as the 2i selected has semantic
> implications, not just performance (and the perf implications are also much
> greater)
>
> On 15 May 2023, at 10:45, Mike Adamson  wrote:
>
> 
>
>> [POLL] Centralize existing syntax or create new syntax?
>>
>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>
> [POLL] Should there be a default? (YES/NO)
>>
>
> Yes
>
> [POLL] What to do with the default?
>>
>> 1.) Allow a default, and switch it to SAI (no configurables)
>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>> 3.) YAML config to override default index (legacy 2i remains the default)
>> 4.) YAML config/guardrail to require index type selection (not required
>> by default)
>>
>
> 3.) YAML config to override default index (legacy 2i remains the default)
>
>
>
> On Mon, 15 May 2023 at 08:54, Mick Semb Wever  wrote:
>
>>
>>
>> [POLL] Centralize existing syntax or create new syntax?
>>>
>>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>>
>>
>>
>> (1) CREATE INDEX …
>>
>>
>>
>>> [POLL] Should there be a default? (YES/NO)
>>>
>>
>>
>> Yes (but see below).
>>
>>
>>
>>> [POLL] What to do with the default?
>>>
>>> 1.) Allow a default, and switch it to SAI (no configurables)
>>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>>> 3.) YAML config to override default index (legacy 2i remains the default)
>>> 4.) YAML config/guardrail to require index type selection (not required
>>> by default)
>>>
>>
>>
>> (4) YAML config. Commented out default of 2i.
>>
>> I agree that the default cannot change in 5.0, but our existing default
>> of 2i can be commented out.
>>
>> For the user this gives them the same feedback, and puts the same
>> requirement to edit one line of yaml, as when we disabled MVs and SASI in
>> 4.0
>> No one has complained about either of these, which is a clear signal folk
>> understood how to get their existing DDLs to work from 3.x to 4.x
>>
>
>
> --
> *Mike Adamson*
> Engineering
>
>
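
The four default-index options in the poll quoted above differ only in how CREATE INDEX picks an implementation when no USING clause is given. A hypothetical resolver makes those differences concrete. The parameter names and the index type strings (`legacy_local_table`, `sai`) are placeholders and should not be read as Cassandra's actual configuration keys.

```python
from typing import Optional


class IndexTypeRequired(Exception):
    """Raised when a (hypothetical) guardrail requires an explicit USING clause."""


def resolve_index_type(
    using: Optional[str],
    configured_default: Optional[str] = None,
    require_explicit_type: bool = False,
    builtin_default: str = "legacy_local_table",
) -> str:
    """Pick the index implementation for a CREATE INDEX statement.

    using: value of an explicit USING clause, if any
    configured_default: hypothetical YAML override of the default (option 3)
    require_explicit_type: hypothetical guardrail forcing USING (option 4)
    builtin_default: what CREATE INDEX does with no configuration (options 1/2)
    """
    if using is not None:
        return using  # an explicit USING clause always wins
    if require_explicit_type:
        raise IndexTypeRequired("CREATE INDEX requires USING <index type>")
    if configured_default is not None:
        return configured_default
    return builtin_default
```

Calling it with only `using=None` models option 2; passing `builtin_default="sai"` models option 1, `configured_default` models option 3, and `require_explicit_type=True` models option 4. In every case an explicit USING clause is honored, which is why the thread treats USING...WITH... as the common ground.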


Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
There will be a LOT of content around using SAI in 5.0.

CCing marketing ML

On Wed, May 10, 2023 at 8:38 PM Jeff Jirsa  wrote:

> Changes like this always scare me, but the benefits probably outweigh the
> risks. Probably obviously to whoever implements but please make sure if
> this happens is super visible in both NEWS and simultaneously updates the
> to-string / to-cql representation of the schema in cqlsh / drivers /
> snapshots
>
> On Wed, May 10, 2023 at 8:27 PM Patrick McFadin 
> wrote:
>
>> Having pulled a lot of developers out of the 2i fire, I would love it if
>> defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
>> seems like the right move for most developers that don't read docs and
>> assume behavior.
>>
>> As much as I hate that 2i would be the configured default, I get it. New
>> feature and this is the right thing for users. Would there be any way to
>> switch 2i to SAI for the same index declaration? That would make for a nice
>> upgrade for users moving to 5 without having to re-create indexes.
>>
>> Patrick
>>
>> On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:
>>
>>> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
>>> prefer allowing USING...WITH... for CREATE INDEX
>>>
>>>
>>> I have 0 issues with a new syntax to make this more clear
>>>
>>> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
>>> more or less what my original proposal was above (modulo the configurable
>>> default).
>>>
>>>
>>> I have 0 issues deprecating and producing a ClientWarning recommending
>>> the new syntax, but I would be against removing this syntax later on… it
>>> should be low effort to keep, so breaking a user would not be desirable for
>>> me.
>>>
>>> change only the fact that CREATE INDEX retains a configurable default
>>>
>>>
>>> This option allows users to control this behavior, and allows us to
>>> change the default over time.  For 5.0 I am strongly against SAI being the
>>> default (new features disabled by default), but I wouldn’t have issues in
>>> later versions changing the default once it's been out for a while.
>>>
>>> I’m not convinced by the changing defaults argument here. The
>>> characteristics of the two index types are very different, and users with
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>>
>>> In my mind this is no different from defaulting to BTI in a follow up
>>> release, but if this concern is that the legacy index leaked details such
>>> as index tables, so changing the default would have side effects in the
>>> public domain that users might not expect, then I get it… are there other
>>> concerns?
>>>
>>> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
>>> wrote:
>>>
>>> tl;dr If you take my original proposal and change only the fact that CREATE
>>> INDEX retains a configurable default, I think we get to the same place?
>>>
>>> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>>>
>>> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe <
>>> calebrackli...@gmail.com> wrote:
>>>
>>>> I see a broad desire here to have a configurable (YAML) default
>>>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>>>> the concept of a default index implementation is pretty standard for most
>>>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>>>> still need to either revert to CREATE CUSTOM INDEX or add the
>>>> USING...WITH... extensions to CREATE INDEX to override the default or
>>>> specify parameters, which will be in play once SAI supports basic text
>>>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>>>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>>>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>>>> that's more or less what my original proposal was above (modulo the
>>>> configurable default).
>>>>
>>>> Thoughts?
>>>>
>>>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>>>
>>>>> I’m not convinced by the changing defaults argument here. The
>>>>> characteristics of the two index types are very different, and users with
>>>>> scripts that make indexes today shouldn’t have their behaviour cha

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
Having pulled a lot of developers out of the 2i fire, I would love it if
defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
seems like the right move for most developers that don't read docs and
assume behavior.

As much as I hate that 2i would be the configured default, I get it. New
feature and this is the right thing for users. Would there be any way to
switch 2i to SAI for the same index declaration? That would make for a nice
upgrade for users moving to 5 without having to re-create indexes.

Patrick

On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:

> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
> prefer allowing USING...WITH... for CREATE INDEX
>
>
> I have 0 issues with a new syntax to make this more clear
>
> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
> more or less what my original proposal was above (modulo the configurable
> default).
>
>
> I have 0 issues deprecating and producing a ClientWarning recommending the
> new syntax, but I would be against removing this syntax later on… it should
> be low effort to keep, so breaking a user would not be desirable for me.
>
> change only the fact that CREATE INDEX retains a configurable default
>
>
> This option allows users to control this behavior, and allows us to change
> the default over time.  For 5.0 I am strongly against SAI being the default
> (new features disabled by default), but I wouldn’t have issues in later
> versions changing the default once it's been out for a while.
>
> I’m not convinced by the changing defaults argument here. The
> characteristics of the two index types are very different, and users with
> scripts that make indexes today shouldn’t have their behaviour change.
>
>
> In my mind this is no different from defaulting to BTI in a follow up
> release, but if this concern is that the legacy index leaked details such
> as index tables, so changing the default would have side effects in the
> public domain that users might not expect, then I get it… are there other
> concerns?
>
> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
> wrote:
>
> tl;dr If you take my original proposal and change only the fact that CREATE
> INDEX retains a configurable default, I think we get to the same place?
>
> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>
> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe 
> wrote:
>
>> I see a broad desire here to have a configurable (YAML) default
>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>> the concept of a default index implementation is pretty standard for most
>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>> still need to either revert to CREATE CUSTOM INDEX or add the
>> USING...WITH... extensions to CREATE INDEX to override the default or
>> specify parameters, which will be in play once SAI supports basic text
>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>> that's more or less what my original proposal was above (modulo the
>> configurable default).
>>
>> Thoughts?
>>
>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>
>>> I’m not convinced by the changing defaults argument here. The
>>> characteristics of the two index types are very different, and users with
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>> We could introduce new syntax that properly appreciates there’s no
>>> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
>>> these indexes involve a partition key or scatter gather
>>>
>>> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>>>
>>> 
>>> +1 , as we must Improve the image of your own default indexing ability.
>>>
>>> and As for *CREATE CUSTOM INDEX *, should we just left as it is and we
>>> can disable the ability for create SAI through  *CREATE CUSTOM INDEX*  in
>>> some version after 5.0?
>>>
>>> for as I know there may be users using this as a plugin-index interface,
>>> like https://github.com/Stratio/cassandra-lucene-index (though these
>>> project may be inactive, But if someone wants to do something similar in
>>> the future, we don't have to stop).
>>>
>>>
>>>
>>> On Wed, May 10, 2023 at 10:01 Jonathan Ellis wrote:
>>>
>>>> +1 for this, especially in the long term.  CREATE INDEX should do the
>>>> right thing for most people without requiring extra ceremony.
>>>>
>>>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>>>> jeremiah.jor...@gmail.com> wrote:
>>>>
>>>>> If the consensus is that SAI is the right default index, then we
>>>>> should just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM
>>>>> INDEX.
>>>>>
>>>>>
>>>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
>>>>> wrote:
>>>>>
>>>>> Earlier today, Mick started a thread on the future of our index
>>>>> creation DDL on Slack:
>>>>>
>>>>> 

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Patrick McFadin
+1

On Tue, May 9, 2023 at 10:58 AM Caleb Rackliffe 
wrote:

> +1
>
> On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski 
> wrote:
>
>> Let's vote.
>>
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>>
>> Piotr Kołaczkowski
>> e. pkola...@datastax.com
>> w. www.datastax.com
>>
>


Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Patrick McFadin
Under the goals section, there is this line:


   1. Scatter/gather across replicas, combining topK from each to get
   global topK.


But what I'm hearing is, exactly how will that happen? Maybe this is an SAI
question too. How is that verified in SAI?
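
One way to read the goal of "combining topK from each to get global topK" is as a coordinator-side merge, as in the minimal sketch below. It assumes each replica returns `(score, row_id)` pairs, with higher scores meaning more similar; `merge_top_k` is a hypothetical name, and the sketch says nothing about how SAI actually routes the subqueries. The merge is only as good as its inputs: if a token range gets no successful response (the failure case raised below), or if per-replica ANN results are themselves approximate, the global top-k can be missing rows.

```python
import heapq
from typing import Iterable, List, Tuple

# (score, row_id); a higher score means more similar
Hit = Tuple[float, str]


def merge_top_k(per_replica: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine per-replica top-k lists into a global top-k, best first.

    If each replica's local top-k is exact, any row in the global top-k is
    also in the local top-k of a replica that holds it, so keeping the k
    best of the union is enough, provided every token range was covered by
    at least one successful response. Duplicate row ids keep their best score.
    """
    best = {}
    for hits in per_replica:
        for score, row_id in hits:
            if row_id not in best or score > best[row_id]:
                best[row_id] = score
    return heapq.nlargest(k, ((score, row_id) for row_id, score in best.items()))
```

For example, merging `[(0.9, "a"), (0.5, "b")]`, `[(0.8, "c"), (0.7, "d")]` and `[(0.95, "e"), (0.1, "f")]` with `k=3` keeps `e`, `a` and `c`, in that order.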

On Tue, May 9, 2023 at 11:07 AM David Capwell  wrote:

> Approach section doesn’t go over how this will handle cross replica
> search, this would be good to flesh out… given results have a real ranking,
> the current 2i logic may yield incorrect results… so would think we need
> num_ranges / rf queries in the best case, with some new capability to sort
> the results?  If my assumption is correct, then how errors are handled
> should also be fleshed out… Example: 1k cluster without vnode and RF=3, so
> 333 queries fanned out to match, then coordinator needs to sort… if 1 of
> the queries fails and can’t fall back to peers… does the query fail (I
> assume so)?
>
> On May 8, 2023, at 7:20 PM, Jonathan Ellis  wrote:
>
> Hi all,
>
> Following the recent discussion threads, I would like to propose CEP-30 to
> add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached
> Indexes (SAI) to Apache Cassandra.
>
> The primary goal of this proposal is to implement ANN vector search
> capabilities, making Cassandra more useful to AI developers and
> organizations managing large datasets that can benefit from fast similarity
> search.
>
> The implementation will leverage Lucene's Hierarchical Navigable Small
> World (HNSW) library and introduce a new CQL data type for vector
> embeddings, a new SAI index for ANN search functionality, and a new CQL
> operator for performing ANN search queries.
>
> We are targeting the 5.0 release for this feature, in conjunction with the
> release of SAI. The proposed changes will maintain compatibility with
> existing Cassandra functionality and compose well with the already-approved
> SAI features.
>
> Please find the full CEP document here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>


Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
Derek, despite your preference, I would hang out with you at a party.

On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker 
wrote:

> Speaking as someone who likes Erlang, maybe that's why I also like NONNULL
> FROZEN>. It's unambiguous what Cassandra is going to do with that
> type. DENSE VECTOR means I need to go read docs (and then probably
> double-check in the source to be sure) to be sure what exactly is going on.
>
> Cheers,
>
> Derek
>
> On Fri, May 5, 2023 at 9:54 AM Patrick McFadin  wrote:
>
>> I hope we are willing to consider developers that use our system because
>> if I had to teach people to use "NON-NULL FROZEN" I'm pretty sure
>> the response would be:
>>
>> Did you tell me to go write a distributed map-reduce job in Erlang? I
>> beleive I did, Bob.
>>
>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie 
>> wrote:
>>
>>> Idiomatically, to my mind, there's a question of "what space are we
>>> thinking about this datatype in"?
>>>
>>> - In the context of mathematics, nullability in a vector would be 0
>>> - In the context of Cassandra, nullability tends to mean a tombstone (or
>>> nothing)
>>> - In the context of programming languages, it's all over the place
>>>
>>> Given many models are exploring quantizing to int8 and other data types,
>>> there's definitely the "support other data types easily in the future"
>>> piece to me we need to keep in mind.
>>>
>>> So with the above and the "meet the user where they are and don't make
>>> them understand more of Cassandra than absolutely critical to use it", I
>>> lean:
>>>
>>> 1. DENSE_VECTOR
>>> 2. VECTOR
>>> 3. type[dimension]
>>>
>>> This leaves the path open for us to expand on it in the future with
>>> sparse support and allows us to introduce some semantics that indicate
>>> idioms around nullability for the users coming from a different space.
>>>
>>> "NON-NULL FROZEN" is strictly correct, however it requires
>>> understanding idioms of how Cassandra thinks about data (nulls mean
>>> different things to us, we have differences between frozen and non-frozen
>>> due to constraints in our storage engine and materialization of data, etc)
>>> that get in the way of users doing things in the pattern they're familiar
>>> with without learning more about the DB than they're probably looking to
>>> learn. Historically this has been a challenge for us in adoption; the
>>> classic "Why can't I just write and delete and write as much as I want? Why
>>> are deletes filling up my disk?" problem comes to mind.
>>>
>>> I'd also be happy with us supporting:
>>> * NON-NULL FROZEN
>>> * DENSE_VECTOR as syntactic sugar for the above
>>>
>>> If getting into the "built-in syntactic sugar mapping for communities
>>> and specific use-cases" is something we're willing to consider.
>>>
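Josh's point that "null" means different things in mathematics and in Cassandra can be made concrete. In a dense vector every coordinate is present. A sparse representation stores only the non-zero coordinates, and an absent coordinate reads as 0.0, never as null. The helpers `to_dense` and `to_sparse` are hypothetical and only illustrate the two representations.

```python
def to_dense(sparse, dimension):
    """Expand a sparse {index: value} mapping into a dense list of floats.

    Coordinates absent from the mapping become 0.0, the mathematical
    reading of a "missing" coordinate, rather than None.
    """
    dense = [0.0] * dimension
    for index, value in sparse.items():
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} is outside dimension {dimension}")
        if value is None:
            raise ValueError(f"coordinate {index} is null")
        dense[index] = float(value)
    return dense


def to_sparse(dense):
    """Keep only the non-zero coordinates of a dense vector."""
    if any(value is None for value in dense):
        raise ValueError("a dense vector may not contain nulls")
    return {index: value for index, value in enumerate(dense) if value != 0.0}


dense = to_dense({1: 2.5, 3: -1}, dimension=4)
print(dense)             # [0.0, 2.5, 0.0, -1.0]
print(to_sparse(dense))  # {1: 2.5, 3: -1.0}
```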
>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>
>>> I think we are still discussing implementation here when I'm talking
>>> about developer experience. I want developers to adopt this quickly, easily
>>> and be successful. Vector search is already a thing. People use it every
>>> day. A successful outcome, in my view, is developers picking up this
>>> feature without reading a manual. (Because they don't anyway and get in
>>> trouble) I did some more extensive research about what other DBs are using
>>> for syntax. The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'
>>>
>>> Pinecone[1] - dense_vector, sparse_vector
>>> Elastic[2]: dense_vector
>>> Milvus[3]: float_vector, binary_vector
>>> pgvector[4]: vector
>>> Weaviate[5]: Different approach. All typed arrays can be indexed
>>>
>>> Based on that I'm advocating a similar syntax:
>>>
>>> - DENSE VECTOR
>>> or
>>> - VECTOR
>>>
>>> [1] https://docs.pinecone.io/docs/hybrid-search
>>> [2]
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
>>> [3] https://milvus.io/docs/create_collection.md
>>> [4] https://github.com/pgvector/pgvector
>>> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>>>
>>> On Fri, May 5, 2023 at 6:07 AM Mike Adamson 
>>> wrote:
>>>
>>> Then we can have the indexing apparatus only accept *frozen* for
>>> the HNSW cas

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
My vote is:
1. DENSE VECTOR
2. VECTOR
3. ARRAY


On Fri, May 5, 2023 at 9:43 AM David Capwell  wrote:

> Went through and created a spreadsheet of current votes… For Patrick and
> Mike, I don’t see a clear vote, so I put a ? where I “think” your
> preference is… for Mick, I only put one vote as the list looked like a
> summary, but you mentioned the first was your preference
>
> *Syntax*
>
> *Jonathan Ellis*
>
> *David Capwell*
>
> *Josh McKenzie*
>
> *Caleb Rackliffe*
>
> *Patrick McFadin*
>
> *Brandon Williams*
>
> *Mike Adamson*
>
> *Benedict*
>
> *Mick Semb Wever*
>
> VECTOR
>
> 1
>
> 2
>
> 2
>
>
>
> 1
>
> ?
>
> 3
>
>
> DENSE VECTOR
>
> 2
>
> 1
>
>
>
> ?
>
>
> ?
>
>
>
> type[dimension]
>
> 3
>
> 3
>
> 3
>
> 1
>
>
> 3
>
>
> 2
>
>
> DENSE_VECTOR
>
>
>
> 1
>
>
>
>
>
>
>
> NON NULL [dimension]
>
>
> 1
>
>
>
>
>
>
> 1
>
>
> VECTOR type[n]
>
>
>
>
>
>
> 2
>
>
>
> 1
>
> ARRAY
>
>
>
>
>
>
>
>
>
>
> NON-NULL FROZEN
>
>
>
>
>
>
>
>
>
>
>
> 1 = top pick
> 2 = second pick
> 3 = third pick
>
> Let me know if I am missing anyone, or if I have bad data
>
> On May 5, 2023, at 9:23 AM, Jonathan Ellis  wrote:
>
> +10 for not inflicting unwieldy keywords on ML users.
>
> Re Josh's summary, mostly agreed, my only objection to adding the DENSE
> keyword is that I don't see a foreseeable future where we also support
> sparse vectors, so it would end up being unnecessary extra verbosity.  So
> my preference would be
>
> 1. VECTOR
> 2. DENSE VECTOR (space instead of underscore, SQL isn't
> afraid of spaces)
> 3. type[dimension]
>
> On Fri, May 5, 2023 at 10:54 AM Patrick McFadin 
> wrote:
>
>> I hope we are willing to consider developers that use our system because
>> if I had to teach people to use "NON-NULL FROZEN" I'm pretty sure
>> the response would be:
>>
>> Did you tell me to go write a distributed map-reduce job in Erlang? I
>> believe I did, Bob.
>>
>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie 
>> wrote:
>>
>>> Idiomatically, to my mind, there's a question of "what space are we
>>> thinking about this datatype in"?
>>>
>>> - In the context of mathematics, nullability in a vector would be 0
>>> - In the context of Cassandra, nullability tends to mean a tombstone (or
>>> nothing)
>>> - In the context of programming languages, it's all over the place
>>>
>>> Given many models are exploring quantizing to int8 and other data types,
>>> there's definitely the "support other data types easily in the future"
>>> piece to me we need to keep in mind.
>>>
>>> So with the above and the "meet the user where they are and don't make
>>> them understand more of Cassandra than absolutely critical to use it", I
>>> lean:
>>>
>>> 1. DENSE_VECTOR
>>> 2. VECTOR
>>> 3. type[dimension]
>>>
>>> This leaves the path open for us to expand on it in the future with
>>> sparse support and allows us to introduce some semantics that indicate
>>> idioms around nullability for the users coming from a different space.
>>>
>>> "NON-NULL FROZEN" is strictly correct, however it requires
>>> understanding idioms of how Cassandra thinks about data (nulls mean
>>> different things to us, we have differences between frozen and non-frozen
>>> due to constraints in our storage engine and materialization of data, etc)
>>> that get in the way of users doing things in the pattern they're familiar
>>> with without learning more about the DB than they're probably looking to
>>> learn. Historically this has been a challenge for us in adoption; the
>>> classic "Why can't I just write and delete and write as much as I want? Why
>>> are deletes filling up my disk?" problem comes to mind.
>>>
>>> I'd also be happy with us supporting:
>>> * NON-NULL FROZEN
>>> * DENSE_VECTOR as syntactic sugar for the above
>>>
>>> If getting into the "built-in syntactic sugar mapping for communities
>>> and specific use-cases" is something we're willing to consider.
>>>
>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>
>>> I think we are still discussing implementation here when I'm t

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I hope we are willing to consider developers that use our system because if
I had to teach people to use "NON-NULL FROZEN" I'm pretty sure the
response would be:

Did you tell me to go write a distributed map-reduce job in Erlang? I
believe I did, Bob.

On Fri, May 5, 2023 at 8:05 AM Josh McKenzie  wrote:

> Idiomatically, to my mind, there's a question of "what space are we
> thinking about this datatype in"?
>
> - In the context of mathematics, nullability in a vector would be 0
> - In the context of Cassandra, nullability tends to mean a tombstone (or
> nothing)
> - In the context of programming languages, it's all over the place
>
> Given many models are exploring quantizing to int8 and other data types,
> there's definitely the "support other data types easily in the future"
> piece to me we need to keep in mind.
>
> So with the above and the "meet the user where they are and don't make
> them understand more of Cassandra than absolutely critical to use it", I
> lean:
>
> 1. DENSE_VECTOR
> 2. VECTOR
> 3. type[dimension]
>
> This leaves the path open for us to expand on it in the future with sparse
> support and allows us to introduce some semantics that indicate idioms
> around nullability for the users coming from a different space.
>
> "NON-NULL FROZEN" is strictly correct, however it requires
> understanding idioms of how Cassandra thinks about data (nulls mean
> different things to us, we have differences between frozen and non-frozen
> due to constraints in our storage engine and materialization of data, etc)
> that get in the way of users doing things in the pattern they're familiar
> with without learning more about the DB than they're probably looking to
> learn. Historically this has been a challenge for us in adoption; the
> classic "Why can't I just write and delete and write as much as I want? Why
> are deletes filling up my disk?" problem comes to mind.
>
> I'd also be happy with us supporting:
> * NON-NULL FROZEN
> * DENSE_VECTOR as syntactic sugar for the above
>
> If getting into the "built-in syntactic sugar mapping for communities and
> specific use-cases" is something we're willing to consider.
>
> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>
> I think we are still discussing implementation here when I'm talking about
> developer experience. I want developers to adopt this quickly, easily and
> be successful. Vector search is already a thing. People use it every day. A
> successful outcome, in my view, is developers picking up this feature
> without reading a manual. (Because they don't anyway and get in trouble) I
> did some more extensive research about what other DBs are using for syntax.
> The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'
>
> Pinecone[1] - dense_vector, sparse_vector
> Elastic[2]: dense_vector
> Milvus[3]: float_vector, binary_vector
> pgvector[4]: vector
> Weaviate[5]: Different approach. All typed arrays can be indexed
>
> Based on that I'm advocating a similar syntax:
>
> - DENSE VECTOR
> or
> - VECTOR
>
> [1] https://docs.pinecone.io/docs/hybrid-search
> [2]
> https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
> [3] https://milvus.io/docs/create_collection.md
> [4] https://github.com/pgvector/pgvector
> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>
> On Fri, May 5, 2023 at 6:07 AM Mike Adamson  wrote:
>
> Then we can have the indexing apparatus only accept *frozen* for
> the HNSW case.
>
> I'm inclined to agree with Benedict that the index will need to be
> specifically selected by option rather than inferred based on type. As such
> there is no real reason for the *frozen* requirement on the type. The
> hnsw index can be built just as easily from a non-frozen array.
>
> I am in favour of enforcing non-null on the elements of an array by
> default. I would prefer that allowing nulls in the array would be a later
> addition if and when a use case arose for it.
>
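Mike's preference for enforcing non-null elements by default, with nullable elements as a possible later addition, amounts to a validation rule on fixed-length values. A minimal sketch follows. `FixedArrayType` is a hypothetical class used only for illustration, not a Cassandra type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedArrayType:
    """Hypothetical fixed-length array type used only for illustration."""

    element: type
    dimension: int
    nullable: bool = False  # non-null elements unless explicitly opted in

    def validate(self, values):
        """Return the values as a list, or raise if they violate the type."""
        if len(values) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} elements, got {len(values)}"
            )
        for i, value in enumerate(values):
            if value is None:
                if not self.nullable:
                    raise ValueError(f"element {i} is null in a non-null type")
            elif not isinstance(value, self.element):
                raise TypeError(
                    f"element {i} is {type(value).__name__}, "
                    f"expected {self.element.__name__}"
                )
        return list(values)


strict = FixedArrayType(float, 3)
print(strict.validate([0.1, 0.2, 0.3]))    # [0.1, 0.2, 0.3]
lenient = FixedArrayType(float, 3, nullable=True)
print(lenient.validate([0.1, None, 0.3]))  # [0.1, None, 0.3]
```

Defaulting `nullable` to False keeps the later relaxation safe: data written under the non-null rule stays valid if nulls are permitted afterwards, whereas tightening the rule later could invalidate existing rows.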
> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe 
> wrote:
>
> Even in the ML case, sparse can just mean zeros rather than nulls, and
> they should compress similarly anyway.
>
> If we really want null values, I'd rather leave that in collections space.
>
> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe 
> wrote:
>
> I actually still prefer *type[dimension]*, because I think I intuitively
> read this as a primitive (meaning no null elements) array. Then we can have
> the indexing apparatus only accept *frozen* for the HNSW case.
>
> If that isn't intuitive to anyone else, I don't really have a strong
> opinion...but...conflating "frozen" and "de

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I think we are still discussing implementation here when I'm talking about
developer experience. I want developers to adopt this quickly, easily and
be successful. Vector search is already a thing. People use it every day. A
successful outcome, in my view, is developers picking up this feature
without reading a manual. (Because they don't anyway and get in trouble) I
did some more extensive research about what other DBs are using for syntax.
The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'

Pinecone[1] - dense_vector, sparse_vector
Elastic[2]: dense_vector
Milvus[3]: float_vector, binary_vector
pgvector[4]: vector
Weaviate[5]: Different approach. All typed arrays can be indexed

Based on that I'm advocating a similar syntax:

- DENSE VECTOR
or
- VECTOR

[1] https://docs.pinecone.io/docs/hybrid-search
[2]
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
[3] https://milvus.io/docs/create_collection.md
[4] https://github.com/pgvector/pgvector
[5] https://weaviate.io/developers/weaviate/config-refs/datatypes

On Fri, May 5, 2023 at 6:07 AM Mike Adamson  wrote:

> Then we can have the indexing apparatus only accept *frozen* for
>> the HNSW case.
>>
> I'm inclined to agree with Benedict that the index will need to be
> specifically selected by option rather than inferred based on type. As such
> there is no real reason for the *frozen* requirement on the type. The
> hnsw index can be built just as easily from a non-frozen array.
>
> I am in favour of enforcing non-null on the elements of an array by
> default. I would prefer that allowing nulls in the array would be a later
> addition if and when a use case arose for it.
>
> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe 
> wrote:
>
>> Even in the ML case, sparse can just mean zeros rather than nulls, and
>> they should compress similarly anyway.
>>
>> If we really want null values, I'd rather leave that in collections space.
>>
>> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe 
>> wrote:
>>
>>> I actually still prefer *type[dimension]*, because I think I
>>> intuitively read this as a primitive (meaning no null elements) array. Then
>>> we can have the indexing apparatus only accept *frozen* for
>>> the HNSW case.
>>>
>>> If that isn't intuitive to anyone else, I don't really have a strong
>>> opinion...but...conflating "frozen" and "dense" seems like a bad idea. One
>>> should indicate single vs. multi-cell, and the other the presence or
>>> absence of nulls/zeros/whatever.
>>>
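Caleb's distinction can be shown with a toy model of storage cells: "frozen" describes single-cell versus multi-cell storage, while "dense" or non-null describes whether elements may be absent. This is a deliberate simplification, not Cassandra's storage format. A frozen value is one cell that is rewritten as a whole; a non-frozen value has one cell per element. Neither layout says anything about nulls, which is why the two properties are orthogonal. The function names are hypothetical.

```python
def cells_for(values, frozen):
    """Toy model: map a value onto storage cells keyed by element path."""
    if frozen:
        return {(): tuple(values)}  # the whole value is a single cell
    return {(i,): v for i, v in enumerate(values)}  # one cell per element


def cells_to_write(values, index, new_value, frozen):
    """Cells that must be written to change one element of the value."""
    updated = list(values)  # copy, so the caller's list is not modified
    updated[index] = new_value
    if frozen:
        return cells_for(updated, frozen=True)  # rewrite everything
    return {(index,): new_value}  # touch only the changed element


v = [1.0, 2.0, 3.0]
print(cells_to_write(v, 1, 9.0, frozen=True))   # {(): (1.0, 9.0, 3.0)}
print(cells_to_write(v, 1, 9.0, frozen=False))  # {(1,): 9.0}
```

Whether `None` may appear in `values` is a separate validation check that applies equally to both layouts.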
>>> On Thu, May 4, 2023 at 12:51 PM Patrick McFadin 
>>> wrote:
>>>
>>>> I agree with David's reasoning and the use of DENSE (and maybe
>>>> eventually SPARSE). This is terminology well established in the data world,
>>>> and it would lead to much easier adoption from users. VECTOR is close, but
>>>> I can see having to create a lot of content around "How to use it and not
>>>> get in trouble." (I have a lot of that content already)
>>>>
>>>>  - We don't have to explain what it is. A lot of prior art out there
>>>> already [1][2][3]
>>>>  - We're matching an established term with what users would expect. No
>>>> surprises.
>>>>  - Shorter ramp-up time for users. Cassandra is being modernized.
>>>>
>>>> The implementation is flexible, but the interface should empower our
>>>> users to be awesome.
>>>>
>>>> Patrick
>>>>
>>>> 1 -
>>>> https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
>>>> 2 -
>>>> https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
>>>> 3 -
>>>> https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/
>>>>
>>>> On Thu,

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Patrick McFadin
As somebody who gave this talk: https://youtu.be/9xf_IXNylhM I love the
evolution of this topic. Excited to see this! ++1 nb

Patrick



On Thu, May 4, 2023 at 11:35 AM C. Scott Andreas 
wrote:

> +1nb.
>
> As someone familiar with this work, it's pretty hard to overstate the
> impact it has on completing Cassandra's HTAP story. Eliminating the
> overhead of bulk reads and writes on production OLTP clusters is
> transformative.
>
> – Scott
>
> On May 4, 2023, at 9:47 AM, Doug Rohrer  wrote:
>
>
> Hello all,
>
> I’d like to put CEP-28 to a vote.
>
> Proposal:
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>
> Jira:
> https://issues.apache.org/jira/browse/CASSANDRA-16222
>
> Draft implementation:
>
> - Apache Cassandra Spark Analytics source code:
> https://github.com/frankgh/cassandra-analytics
> - Changes required for Sidecar:
> https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
>
> Discussion:
> https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3
>
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding
> vetoes.
>
>
> Thanks,
>
> Doug Rohrer
>
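The passing rule stated in the vote call (at least three binding +1s and no binding vetoes) is simple enough to express directly. In this sketch the `Vote` record is hypothetical. "Binding" marks votes from PMC members, following Apache convention, and suffixes such as "nb" in the replies mark non-binding votes. The 72-hour window is assumed to be enforced separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True for PMC members, False for "nb" votes


def vote_passes(votes):
    """At least three binding +1s and no binding -1 (veto)."""
    binding = [vote for vote in votes if vote.binding]
    plus_ones = sum(1 for vote in binding if vote.value == 1)
    vetoes = sum(1 for vote in binding if vote.value == -1)
    return plus_ones >= 3 and vetoes == 0


votes = [
    Vote("a", 1, True),
    Vote("b", 1, True),
    Vote("c", 1, True),
    Vote("d", -1, False),  # a non-binding -1 is not a veto
]
print(vote_passes(votes))  # True
```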
>
>
>
>


Re: [POLL] Vector type for ML

2023-05-04 Thread Patrick McFadin
I agree with David's reasoning and the use of DENSE (and maybe eventually
SPARSE). This is terminology well established in the data world, and it
would lead to much easier adoption from users. VECTOR is close, but I can
see having to create a lot of content around "How to use it and not get in
trouble." (I have a lot of that content already)

 - We don't have to explain what it is. A lot of prior art out there
already [1][2][3]
 - We're matching an established term with what users would expect. No
surprises.
 - Shorter ramp-up time for users. Cassandra is being modernized.

The implementation is flexible, but the interface should empower our users
to be awesome.

Patrick

1 -
https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
2 -
https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
3 - https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/

On Thu, May 4, 2023 at 10:25 AM David Capwell  wrote:

> My views have changed over time on syntax and I feel type[dimension] may
> not be the best, so it has gone lower in my own personal ranking… this is
> my current preference
>
> 1) DENSE [dimension] | NON NULL [dimension]
> 2) VECTOR
> 3) type[dimension]
>
> My reasoning for this order
>
> * type[dimension] looks like syntax sugar for array, so
> users may assume list/array semantics, but we limit to non-null elements in
> a frozen array
> * I feel VECTOR as a prefix is out of place, but VECTOR as a direct type
> makes more sense… this also leads to a possible future of VECTOR
> which is the non-fixed length version of this type.  What makes VECTOR
> different from list/array? Non-null elements, and it is frozen. I don’t feel
> that VECTOR really tells users to expect non-null or frozen semantics, as
> there exists different VECTOR types for those reasons (sparse vs dense)…
> * DENSE may be confusing for people coming from languages where this just
> means “sequential layout”, which is what our frozen array/list already are…
> but since the target user is coming from a ML background, this shouldn’t
> offer much confusion.  DENSE just means FROZEN in Cassandra, with NON NULL
> elements (SPARSE allows for NULL and isn’t frozen)… So DENSE just acts as
> syntax sugar for frozen
>
>
> On May 4, 2023, at 4:13 AM, Brandon Williams  wrote:
>
> 1. VECTOR
> 2. VECTOR FLOAT[n]
> 3. FLOAT[N]   (Non null by default)
>
> Redundant or not, I think having the VECTOR keyword helps signify what
> the app is generally about and helps get buy-in from ML stakeholders.
>
> On Thu, May 4, 2023 at 3:45 AM Benedict  wrote:
>
>
> Hurrah for initial agreement.
>
> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
> think VECTOR should be used to simply imply non-null, as this would be very
> unintuitive. More logical would be NONNULL, if this is the only condition
> being applied. Alternatively for arrays we could default to NONNULL and
> later introduce NULLABLE if we want to permit nulls.
>
> If the word vector is to be used it makes more sense to make it look like
> a list, so VECTOR as here the word VECTOR is clearly not
> redundant.
>
> So, I vote:
>
> 1) (NON NULL) FLOAT[N]
> 2) FLOAT[N]   (Non null by default)
> 3) VECTOR
>
>
>
> On 4 May 2023, at 08:52, Mick Semb Wever  wrote:
>
> 
>
>
> Did we agree on a CQL syntax?
>
> I don’t believe there has been a poll on CQL syntax… my understanding
> reading all the threads is that there are ~4-5 options and none are -1ed, so
> I believe we are waiting for majority rule on this?
>
>
>
>
> Re-reading that thread, IIUC the valid choices remaining are…
>
> 1. VECTOR FLOAT[n]
> 2. FLOAT VECTOR[n]
> 3. VECTOR
> 4. VECTOR[n]
> 5. ARRAY
> 6. NON-NULL FROZEN
>
>
> Yes I'm putting my preference (1) first ;) because (banging on) if the
> future of CQL will have FLOAT[n] and FROZEN, where the VECTOR
> keyword is: for general cql users; just meaning "non-null and frozen",
> these gel best together.
>
> Options (5) and (6) are for those that feel we can and should provide this
> type without introducing the vector keyword.
>
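Several spellings in Mick's and Benedict's lists (FLOAT[N], (NON NULL) FLOAT[N], VECTOR FLOAT[n], FLOAT VECTOR[n]) describe the same underlying type. This is David's point that DENSE or VECTOR would act as syntax sugar. A toy normalizer makes that explicit. It is not CQL's grammar and handles only spellings quoted in this thread; the NULLABLE prefix follows Benedict's suggestion of defaulting to non-null and introducing NULLABLE later.

```python
import re

# Toy normalizer: each pattern maps a spelling to (element, dimension, nullable).
# Not CQL's grammar; only spellings quoted in this thread are handled.
_PATTERNS = [
    (re.compile(r"^\(?NON[ -]?NULL\)?\s+(\w+)\[(\d+)\]$", re.I), False),
    (re.compile(r"^NULLABLE\s+(\w+)\[(\d+)\]$", re.I), True),
    (re.compile(r"^VECTOR\s+(\w+)\[(\d+)\]$", re.I), False),
    (re.compile(r"^(\w+)\s+VECTOR\[(\d+)\]$", re.I), False),
    # Bare TYPE[N] is non-null by default; "VECTOR[N]" is rejected because
    # it names no element type.
    (re.compile(r"^(?!VECTOR\b)(\w+)\[(\d+)\]$", re.I), False),
]


def normalize(spelling):
    text = " ".join(spelling.split())  # collapse runs of whitespace
    for pattern, nullable in _PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group(1).lower(), int(match.group(2)), nullable
    raise ValueError(f"unrecognized type spelling: {spelling!r}")


for s in ["FLOAT[3]", "(NON NULL) FLOAT[3]", "VECTOR FLOAT[3]", "FLOAT VECTOR[3]"]:
    print(s, "->", normalize(s))  # each maps to ('float', 3, False)
```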
>
>
>


Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
\o/

Bring it in team. Group hug.

Now if you'll excuse me, I'm going to go build my preso on how Cassandra is
the only distributed database where you can do vector search in an ACID
transaction.

Patrick

On Tue, May 2, 2023 at 3:27 PM Jonathan Ellis  wrote:

> I had a call with David.  We agreed that we want a "vector" data type with
> these properties
>
> - Fixed length
> - No nulls
> - Random access not supported
>
> Where we disagreed was on my proposal to restrict vectors to only numeric
> data.  David's points were that
>
> (1) He has a use case today for a data type with the other vector
> properties,
> (2) It doesn't seem reasonable to create two data types with the same
> properties, one of which is restricted to numerics, and
> (3) The restrictions that I want for numeric vectors make more sense at
> the index and function level, than at the type level.
>
> I'm ready to concede that David has the better case here and move forward
> with a vector implementation without that restriction.
>
> On Tue, May 2, 2023 at 4:03 PM David Capwell  wrote:
>
>>  How about it, David? Did you already make this?
>>
>>
>> I checked out the patch, fixed serialize/deserialize, added the
>> constraints, then added a composeForFloat(ByteBuffer), with this the impact
>> to the POC patch was the following
>>
>> 1) move away from VectorType.instance.serializer().deserialize(bb) to
>> type.composeForFloat(bb), both return float[]
>> 2) change the index validate logic to move away from checking VectorType
>> and instead check for that plus the element type == FloatType.  I didn’t
>> bother to do this as it's trivial
>>
>> David. End this argument. SHOW THE CODE!
>>
>>
>> If this argument ends and people are cool with vector supporting abstract
>> type, more than glad to help get this in.
>>
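The agreed properties (fixed length, no nulls) mean a float vector needs neither per-element length prefixes nor null markers, so its encoded size follows from the type alone. David's patch deserializes straight to a `float[]` via `composeForFloat(ByteBuffer)`. The Python sketch below only mirrors the idea and assumes a big-endian IEEE-754 float32 layout; that layout is an assumption, not something taken from the patch, and the function names are hypothetical.

```python
import struct


def serialize_float_vector(values, dimension):
    """Encode a fixed-length, null-free float vector as packed float32s."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    if any(v is None for v in values):
        raise ValueError("vectors may not contain nulls")
    # No length prefix and no null markers: the type fixes the dimension.
    return struct.pack(f">{dimension}f", *values)


def deserialize_float_vector(data, dimension):
    """Decode packed big-endian float32s back into a list of floats."""
    if len(data) != 4 * dimension:
        raise ValueError(f"expected {4 * dimension} bytes, got {len(data)}")
    return list(struct.unpack(f">{dimension}f", data))


encoded = serialize_float_vector([1.0, 2.5, -3.0], dimension=3)
print(len(encoded))                                    # 12
print(deserialize_float_vector(encoded, dimension=3))  # [1.0, 2.5, -3.0]
```

Values that are not exactly representable in float32 (0.1, for example) come back slightly different, which is inherent to choosing float32 storage.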
>> On May 2, 2023, at 1:53 PM, Jeremy Hanna 
>> wrote:
>>
>> I'm all for bringing more functionality to the masses sooner, but the
>> original idea has a very very specific use case.  Do we have use cases for
>> a general purpose Vector/Array data structure?  If so, awesome.  I just
>> wondered if generalizing provides value, beyond being straightforward to
>> implement.  I'm just trying to be sensitive to the database code
>> maintenance and driver support for general types versus a single type for a
>> specific, well defined purpose.
>>
>> If it could easily be a plugin, that's great - but the full picture
>> involves drivers that need to support it or you end up getting binary blobs
>> you have to decode client side and then do stuff with.  So ideally if you
>> have a well defined use case that you can build into the database, having
>> it just be part of the database and associated drivers - that makes the
>> experience much much better.
>>
>> I'm not trying to say B couldn't be valuable or that a plugin couldn't be
>> feasible.  I'm just trying to enlarge the picture a bit to see what that
>> means for this use case and for the supporting drivers/clients.
>>
>> On May 2, 2023, at 3:04 PM, Benedict  wrote:
>>
>> But it’s so trivial it was already implemented by David in the span of
>> ten minutes? If anything, we’re slowing progress down by refusing to do the
>> extra types, as we’re busy arguing about it rather than delivering a
>> feature?
>>
>> FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
>> support types beyond float. Not that we should start with float.
>>
>> So, this whole debate is a mess, I think. But hey ho.
>>
>> On 2 May 2023, at 20:57, Patrick McFadin  wrote:
>>
>> 
>> I'll speak up on that one. If you look at my ranked voting, that is where
>> my head is. I get accused of scope creep (a lot) and looking at the initial
>> proposal Jonathan put on the mailing list, it was mostly "Developers are adopting
>> vector search at a furious pace and I think I have a simple way of adding
>> support to keep Cassandra relevant for these use cases" Instead of just
>> focusing on this use case, I feel the arguments have bike shedded into
>> scope creep which means it will take forever to get into the project.
>>
>> My preference is to see one thing validated with an MVP and get it into
>> the hands of developers sooner so we can continue to iterate based on
>> actual usage.
>>
>> It doesn't say your points are wrong or your opinions are broken, I'm
>> voting for what I think will be awesome for users sooner.
>>
>> Patrick
>>
>> On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:
>>
>>>

Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
Yeah, it's a bit of a mess but mailing list yo. People reading this would
have no idea we are friends. ;) (Which we are, for anyone reading this
later!)

I must have missed the point of this already being done. How about it,
David? Did you already make this?

"FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
support types beyond float. Not that we should start with float"
That is not my interpretation and I can definitely see how that may be
frustrating. If B is pretty much done then we are good. My concern, as
noted earlier, is the scope creep component that will delay this happening
for much longer.

David. End this argument. SHOW THE CODE!

Patrick


On Tue, May 2, 2023 at 1:04 PM Benedict  wrote:

> But it’s so trivial it was already implemented by David in the span of ten
> minutes? If anything, we’re slowing progress down by refusing to do the
> extra types, as we’re busy arguing about it rather than delivering a
> feature?
>
> FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
> support types beyond float. Not that we should start with float.
>
> So, this whole debate is a mess, I think. But hey ho.
>
> On 2 May 2023, at 20:57, Patrick McFadin  wrote:
>
> 
> I'll speak up on that one. If you look at my ranked voting, that is where
> my head is. I get accused of scope creep (a lot) and looking at the initial
> proposal Jonathan put on the mailing list, it was mostly "Developers are adopting
> vector search at a furious pace and I think I have a simple way of adding
> support to keep Cassandra relevant for these use cases" Instead of just
> focusing on this use case, I feel the arguments have bike shedded into
> scope creep which means it will take forever to get into the project.
>
> My preference is to see one thing validated with an MVP and get it into
> the hands of developers sooner so we can continue to iterate based on
> actual usage.
>
> It doesn't say your points are wrong or your opinions are broken, I'm
> voting for what I think will be awesome for users sooner.
>
> Patrick
>
> On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:
>
>> Could folk voting against a general purpose type (that could well be
>> called a vector) briefly explain their reasoning?
>>
>> We established in the other thread that it’s technically trivial, meaning
>> folk must think it is strictly superior to only support float rather than
>> eg all numeric types (note: for the type, not the ANN).
>>
>> I am surprised, and the blurbs accompanying votes so far don’t seem to
>> touch on this, mostly just endorsing the idea of a vector.
>>
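Benedict's note ("for the type, not the ANN") separates the type from the index. Elsewhere in this thread Jonathan concedes that numeric restrictions fit better at the index and function level, and David describes moving the element-type check into the index validation logic. The vector type can then accept any element type while the ANN index accepts only floats. This is a minimal sketch: `VectorType` and `AnnIndex` here are hypothetical Python classes, not Cassandra's Java classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    """Hypothetical general-purpose vector type: any element type allowed."""

    element: str  # e.g. "float", "int", "text"
    dimension: int


class AnnIndex:
    """Hypothetical ANN index that restricts itself to float vectors."""

    SUPPORTED_ELEMENTS = frozenset({"float"})

    def __init__(self, column_type):
        if not isinstance(column_type, VectorType):
            raise TypeError("an ANN index requires a vector column")
        if column_type.element not in self.SUPPORTED_ELEMENTS:
            raise TypeError(
                f"an ANN index supports float vectors, not {column_type.element}"
            )
        self.column_type = column_type


AnnIndex(VectorType("float", 768))  # accepted
try:
    AnnIndex(VectorType("text", 3))  # the type is legal; the index refuses it
except TypeError as err:
    print(err)  # an ANN index supports float vectors, not text
```

Keeping the restriction in the index means widening it later (to int8 vectors, say) changes one validation rule rather than introducing a new data type.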
>>
>> On 2 May 2023, at 20:20, Patrick McFadin  wrote:
>>
>> 
>> A > B > C on both polls.
>>
>> Having talked to several users in the community that are highly excited
>> about this change, this gets to what developers want to do at Cassandra
>> scale: store embeddings and retrieve them.
>>
>> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
>> wrote:
>>
>>> A > B > C
>>>
>>> I don't think that ML is such a niche application that it can't have its
>>> own CQL data type. Also, vectors are mathematical elements that have more
>>> applications than ML.
>>>
>>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>>>
>>>>
>>>>
>>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>>>
>>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>>> machine learning use cases, specifically feature and embedding vectors for
>>>>> training, inference, and vector search?
>>>>>
>>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>>> industry-leading special-purpose vector database ONLY supports that data
>>>>> type.
>>>>>
>>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>>> thread at
>>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>>
>>>>> Please rank the discussed options from most preferred option to least,
>>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>>
>>>>> (A) I am in favor of adding a vector type fo

Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
I'll speak up on that one. If you look at my ranked voting, that is where
my head is. I get accused of scope creep (a lot) and looking at the initial
proposal Jonathan put on the mailing list, it was mostly "Developers are adopting
vector search at a furious pace and I think I have a simple way of adding
support to keep Cassandra relevant for these use cases" Instead of just
focusing on this use case, I feel the arguments have bike shedded into
scope creep which means it will take forever to get into the project.

My preference is to see one thing validated with an MVP and get it into the
hands of developers sooner so we can continue to iterate based on actual
usage.

It doesn't say your points are wrong or your opinions are broken, I'm
voting for what I think will be awesome for users sooner.

Patrick

On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:

> Could folk voting against a general purpose type (that could well be
> called a vector) briefly explain their reasoning?
>
> We established in the other thread that it’s technically trivial, meaning
> folk must think it is strictly superior to only support float rather than
> eg all numeric types (note: for the type, not the ANN).
>
> I am surprised, and the blurbs accompanying votes so far don’t seem to
> touch on this, mostly just endorsing the idea of a vector.
>
>
> On 2 May 2023, at 20:20, Patrick McFadin  wrote:
>
> 
> A > B > C on both polls.
>
> I've talked to several users in the community who are highly excited
> about this change, and this gets to what developers want to do at Cassandra
> scale: store embeddings and retrieve them.
>
> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
> wrote:
>
>> A > B > C
>>
>> I don't think that ML is such a niche application that it can't have its
>> own CQL data type. Also, vectors are mathematical elements that have more
>> applications than ML.
>>
>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>>
>>>
>>>
>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>>
>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>> machine learning use cases, specifically feature and embedding vectors for
>>>> training, inference, and vector search?
>>>>
>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>> industry-leading special-purpose vector database ONLY supports that data
>>>> type.
>>>>
>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>> thread at
>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>
>>>> Please rank the discussed options from most preferred option to least,
>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>
>>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>>> we need to tie it to any particular implementation details.
>>>>
>>>> (B) I am okay with adding a vector type but I believe we must add array
>>>> types that compose with all Cassandra types first, and make vectors a
>>>> special case of arrays-without-null-elements.
>>>>
>>>> (C) I am not in favor of adding a built-in vector type.
>>>>
>>>
>>>
>>>
>>> A > B > C
>>>
>>> B is stated as "must add array types…".  I think this is a bit loaded.
>>> If B were (A + the implementation needs to be a non-null frozen float32
>>> array, serialisation forward compatible with other frozen arrays later
>>> implemented), I would put this before (A), especially because it's been
>>> shown already that this is easy to implement.
>>>
>>>
>>>
>>
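The ranked format the poll asks for (`>` meaning "preferred over", `=` meaning "approximately equal") is simple enough to read mechanically, which helps when tallying replies like the ones in this thread. A minimal Python sketch, assuming each ballot is a plain string of option letters; `parse_ranking` is a hypothetical helper, not tooling the project uses:

```python
def parse_ranking(ballot: str) -> list[set[str]]:
    """Split a ranked ballot such as 'C > B = A' into preference tiers.

    Each tier is a set of options ranked approximately equally;
    earlier tiers are preferred over later ones.
    """
    tiers = []
    for tier in ballot.split(">"):
        # '=' joins options that share the same tier
        options = {opt.strip() for opt in tier.split("=") if opt.strip()}
        if not options:
            raise ValueError(f"empty tier in ballot: {ballot!r}")
        tiers.append(options)
    seen = [opt for tier in tiers for opt in tier]
    if len(seen) != len(set(seen)):
        raise ValueError(f"option ranked twice in ballot: {ballot!r}")
    return tiers


print(parse_ranking("A > B > C"))  # [{'A'}, {'B'}, {'C'}]
print(parse_ranking("C > B = A"))  # second tier holds both 'A' and 'B'
```

Splitting on `>` first and `=` second mirrors how the notation groups: `=` joins options within a tier, while `>` separates tiers.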


Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
A > B > C on both polls.

I've talked to several users in the community who are highly excited
about this change, and this gets to what developers want to do at Cassandra
scale: store embeddings and retrieve them.

On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
wrote:

> A > B > C
>
> I don't think that ML is such a niche application that it can't have its
> own CQL data type. Also, vectors are mathematical elements that have more
> applications than ML.
>
> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>
>>
>>
>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>
>>> Should we add a vector type to Cassandra designed to meet the needs of
>>> machine learning use cases, specifically feature and embedding vectors for
>>> training, inference, and vector search?
>>>
>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>> types, with no nulls allowed, and with no need for random access. The ML
>>> industry overwhelmingly uses float32 vectors, to the point that the
>>> industry-leading special-purpose vector database ONLY supports that data
>>> type.
>>>
>>> This poll is to gauge consensus subsequent to the recent discussion
>>> thread at
>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>
>>> Please rank the discussed options from most preferred option to least,
>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>> = A (C is my preference, followed by B or A approximately equally.)
>>>
>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>> we need to tie it to any particular implementation details.
>>>
>>> (B) I am okay with adding a vector type but I believe we must add array
>>> types that compose with all Cassandra types first, and make vectors a
>>> special case of arrays-without-null-elements.
>>>
>>> (C) I am not in favor of adding a built-in vector type.
>>>
>>
>>
>>
>> A > B > C
>>
>> B is stated as "must add array types…".  I think this is a bit loaded.
>> If B were (A + the implementation needs to be a non-null frozen float32
>> array, serialisation forward compatible with other frozen arrays later
>> implemented), I would put this before (A), especially because it's been
>> shown already that this is easy to implement.
>>
>>
>>
>
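The poll's working definition of an ML vector (fixed dimension, float32 elements, no nulls), together with Mick's framing as a non-null frozen float32 array, comes down to a write-path check plus a fixed-width encoding. A minimal Python sketch, assuming big-endian IEEE-754 float32 packing with no length prefix because the dimension is part of the column type. `validate_vector`, `pack_vector`, and `unpack_vector` are hypothetical names, and this is not Cassandra's actual serializer:

```python
import struct
from typing import Optional, Sequence


def validate_vector(values: Sequence[Optional[float]], dimension: int) -> None:
    """Reject writes that do not match a fixed-dimension, non-null vector type."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    for i, v in enumerate(values):
        if v is None:
            raise ValueError(f"element {i} is null; nulls are not allowed")


def pack_vector(values: Sequence[float], dimension: int) -> bytes:
    """Encode as `dimension` big-endian float32 values, 4 bytes each.

    Assumption: the dimension comes from the type, so no length prefix is written.
    """
    validate_vector(values, dimension)
    return struct.pack(f">{dimension}f", *values)


def unpack_vector(data: bytes, dimension: int) -> list[float]:
    """Decode bytes produced by pack_vector back into Python floats."""
    if len(data) != 4 * dimension:
        raise ValueError(f"expected {4 * dimension} bytes, got {len(data)}")
    return list(struct.unpack(f">{dimension}f", data))


encoded = pack_vector([0.5, -1.0, 2.25], dimension=3)
print(len(encoded))                         # 12
print(unpack_vector(encoded, dimension=3))  # [0.5, -1.0, 2.25]
```

A value that float32 cannot represent exactly, such as 0.1, comes back as roughly 0.10000000149. That is the precision trade-off a float32-only type accepts.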


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Patrick McFadin
>
> So is the goal here to provide something specific and idiomatic for the ML
> community or is the goal to make a primitive that's C*-centric that then
> another layer can write to? I personally argue for the former; I don't see
> this specific data type going away any time soon.


+1 on this concept. We could invite an entirely new class of users into
Cassandra by using familiar syntax. I was surprised that DENSE got nuked so
quickly since it is used in the ML world. [1][2][3]

Patrick

1. https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.linalg.DenseVector.html
2. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
3. https://www.pinecone.io/learn/dense-vector-embeddings-nlp/
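For readers outside ML, "dense" in the references above means every component of the vector is stored. A sparse representation, by contrast, keeps only the nonzero (index, value) pairs. A small Python sketch of both representations, with a dot product over each. The helper names are illustrative only:

```python
def to_sparse(dense: list[float]) -> dict[int, float]:
    """Keep only the nonzero components, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def dense_dot(a: list[float], b: list[float]) -> float:
    """Dot product of two dense vectors of equal, fixed dimension."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors; missing positions count as zero."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller map
    return sum(v * b[i] for i, v in a.items() if i in b)


u = [0.0, 1.5, 0.0, 2.0]
w = [3.0, 2.0, 0.0, 0.5]
print(to_sparse(u))                            # {1: 1.5, 3: 2.0}
print(dense_dot(u, w))                         # 4.0
print(sparse_dot(to_sparse(u), to_sparse(w)))  # 4.0
```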

On Thu, Apr 27, 2023 at 5:49 PM Josh McKenzie  wrote:

> From a machine learning perspective, vectors are a well-known concept that
> are effectively immutable fixed-length n-dimensional values that are then
> later used either as part of a model or in conjunction with a model after
> the fact.
>
> While we could have this be non-frozen and not call it a vector, I'd be
> inclined to still make the argument for a layer of syntactic sugar on top
> that met ML users where they were with concepts they understood rather than
> forcing them through the cognitive lift of figuring out the Cassandra-specific
> contortions to replicate something that's ubiquitous in their
> space. We did the same "Cassandra-first" approach with our JSON support and
> that didn't do us any favors in terms of adoption and usage as far as I
> know.
>
> So is the goal here to provide something specific and idiomatic for the ML
> community or is the goal to make a primitive that's C*-centric that then
> another layer can write to? I personally argue for the former; I don't see
> this specific data type going away any time soon.
>
> On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>
> but as you point out it has the problem of allowing nulls.
>
>
> If nulls are not allowed for the elements, then either we need  a) a new
> type, or b) add some way to say elements may not be null…. As much as I do
> like b, I am leaning towards new type for this use case.
>
> So, to flesh out the type requirements I have seen so far
>
> 1) represents a fixed size array of element type
> * on write path we will need to validate this
> 2) element may not be null
> * on write path we will need to validate this
> 3) “frozen” (is this really a requirement for the type or is this
> just simpler for the ANN work?  I feel that this shouldn’t be a requirement)
> 4) works for all types (my requirement; original proposal is float only,
> but could logically expand to primitive types)
>
> Anything else?
>
> The key thing about a vector is that unlike lists or tuples you really
> don't care about individual elements, you care about doing vector and
> matrix multiplications with the thing as a unit.
>
>
> That may be true for this use case, but “should” this be true for the type
> itself?  I feel like no… if a user wants the Nth element of a vector why
> would we block them?  I am not saying the first patch, or even 5.0 adds
> support for index access, I am just trying to push back saying that the
> type should not block this.
>
> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT
> VECTOR[N].)
>
>
> Now that nulls are not allowed, I have mixed feelings about FLOAT[N], I
> prefer this syntax but that limitation may not be desired for all use
> cases… we could always add LIST and ARRAY later
> to address that case.
>
> In terms of syntax I have seen, here is my ordered preference:
>
> 1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
> 2) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this
> semantic…. Could even be NON NULL TYPE[size]
>
> On Apr 27, 2023, at 9:00 AM, Benedict  wrote:
>
>
> That’s a bounded ring buffer, not a fixed-length array.
>
> This definitely isn’t a tuple because the types are all the same, which is
> pretty crucial for matrix operations. Matrix libraries generally work on
> arrays of known dimensionality, or sparse representations.
>
> Whether we draw any semantic link between the frozen list and whatever we
> do here, it is fundamentally a frozen list with a restriction on its size.
> What we’re defining here are “statically” sized arrays, whereas a frozen
> list is essentially a dynamically sized array.
>
> I do not think vector is a good name because vector is used in some other
> popular languages to mean a (dynamic) list, which is confusing when we also
> have a list concept.
>
> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link
> with list. Though it is a bit strange that this particular type declaration
> looks so different to other collection types.
>
> On 27 Apr 2023, at 16:48, Jeff Jirsa  wrote:
>
> 
>
>
> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis  wrote:
>
> It's been a while, so I may be missing something, but do we already have
> 

Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-26 Thread Patrick McFadin
I guess this is an excellent example for exploring the lower bound of what
constitutes a CEP. So far, CEPs have been large changes, so where does
something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think
I did.)

This is a list of everything needed for a CEP:

Status
Scope
Goals
Approach
Timeline
Mailing list / Slack channels
Related JIRA tickets
Motivation
Audience
Proposed Changes
New or Changed Public Interfaces
Compatibility, Deprecation, and Migration Plan
Test Plan
Rejected Alternatives

This is a big enough change to provide information for each element. Going
back to the spirit of why we started CEPs, we wanted to avoid a mega-commit
without some shaping and agreement before code goes into trunk. I don't
have a clear indication of where that line lies. From our own wiki: "It is
highly recommended to pursue a CEP for significant user-facing or changes
that cut across multiple subsystems." That seems to fit here. Part of my
motivation is being clear with potential new contributors by example and
encouraging more awesomeness.

The changes for operators:
 - New drivers
 - New guardrails?
 - Indexing == storage requirements

Patrick

On Tue, Apr 25, 2023 at 10:53 PM Mick Semb Wever  wrote:

> I was soo happy when I saw this, I know many users are going to be
> thrilled about it.
>
>
> On Wed, 26 Apr 2023 at 05:15, Patrick McFadin  wrote:
>
>> Not sure if this is what you are saying, Josh, but I believe this needs
>> to be its own CEP. It's a change in CQL syntax and changes how clusters
>> operate. The change needs to be documented and voted on. Jonathan, you know
>> how to find me if you want me to help write it. :)
>>
>
> I'd be fine with just a DISCUSS thread to agree to the CQL change, since
> it (`DENSE FLOAT32`) appears to be a minimal change, and the overall patch
> builds on SAI. As Henrik mentioned, there are other SAI extensions being
> added too without CEPs.  Can you elaborate on how you see this changing how
> the cluster operates?
>
> This will be easier to decide once we have a patch to look at, but that
> depends on a CEP-7 base (e.g. no feature branch exists). If we do want a
> CEP we need to allow a few weeks to get it through, but that can happen in
> parallel and maybe drafting up something now will be valuable anyway for an
> eventual CEP that proposes the more complete features (e.g.
> cosine_similarity(…)).
>
>
>


Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-25 Thread Patrick McFadin
Not sure if this is what you are saying, Josh, but I believe this needs to
be its own CEP. It's a change in CQL syntax and changes how clusters
operate. The change needs to be documented and voted on. Jonathan, you know
how to find me if you want me to help write it. :)

As a side comment to all of this, at the last ApacheCon in New Orleans,
Jordan West, Alex Petrov, and I were sitting in the hall track and were
having a discussion about just what we have in Cassandra. There is no other
system like Cassandra, and for scale and distributed data, it stands alone.
What do we do with a robust baseline like this?

This! This is a great example!

Patrick


On Tue, Apr 25, 2023 at 5:03 PM Josh McKenzie  wrote:

> To be fair Dinesh kind of primed that:
>
> Do you intend to make this part of CEP-7 or as an incremental update to
> SAI once it is committed?
>
> ;)
>
> I think this body of work more than stands on its own. Great work
> Jonathan, Mike, and Zhao; having native support for more ML-oriented
> workloads in C* would be a big win for a bunch of our users and plays into
> our architectural strengths in a lot of ways too.
>
> On Tue, Apr 25, 2023, at 7:35 PM, Henrik Ingo wrote:
>
> Jonathan what a great proposal/code. An enjoyable read. And at least for
> me educational! (Which is notable, as you're on my turf, I'm a Data Science
> major.)
>
> Sorry for splitting hairs but CEP-7 (as a spec, and wiki page) is approved
> and voted on and I assume there's no proposal to change that. That said,
> work of course continues beyond CEP-7 and this is not the only SAI feature
> that adds on top of the CEP-7 foundation.
>
> I just wanted to clarify so there's no confusion later.
>
> henrik
>
> On Sat, Apr 22, 2023 at 10:41 PM Jonathan Ellis  wrote:
>
> My guess is that I will be able to get this ready to upstream before the
> rest of CEP-7 goes in, so it would make sense to me to roll it into that.
>
> On Fri, Apr 21, 2023 at 5:34 PM Dinesh Joshi  wrote:
>
> Interesting proposal Jonathan. Will grok it over the weekend and play
> around with the branch.
>
> Do you intend to make this part of CEP-7 or as an incremental update to
> SAI once it is committed?
>
> On Apr 21, 2023, at 2:19 PM, Jonathan Ellis  wrote:
>
> Happy Friday, everyone!
>
> Rich text formatting ahead, I've attached a PDF for those who prefer that.
>
>
> I propose adding approximate nearest neighbor (ANN) vector search
> capability to Apache Cassandra via storage-attached indexes (SAI). This is
> a medium-sized effort that will significantly enhance Cassandra’s
> functionality, particularly for AI use cases. This addition will not only
> provide a new and important feature for existing Cassandra users, but also
> attract new users to the community from the AI space, further expanding
> Cassandra’s reach and relevance.
> Introduction
> Vector search is a powerful document search technique that enables
> developers to quickly find relevant content within an extensive collection
> of documents, which is useful as a standalone technique, but it is
> particularly hot now because it significantly enhances the performance of
> LLMs.
>
> Vector search uses ML models to match the semantics of a question rather
> than just the words it contains, avoiding the classic false positives and
> false negatives associated with term-based search.  Alessandro Benedetti
> gives some good examples in his *excellent talk*.
>
> You can search across any set of vectors, which are just ordered sets of
> numbers.  In the context of natural language queries and document search,
> we are specifically concerned with a type of vector called an *embedding*.
>
> An embedding is a high-dimensional vector that captures the underlying
> semantic relationships and contextual information of words or phrases.
> Embeddings are generated by ML models trained for this purpose; OpenAI
> provides an API to do this, but open-source and self-hostable models like
> BERT are also popular. More accurate embeddings and smaller embeddings are
> both active research areas in ML.
>
> Large language models (LLMs) can be described as a mile wide and an inch
> deep. They are not experts on any narrow domain (although they will
> hallucinate that they are, sometimes convincingly).  You can remedy this by
> giving the LLM additional context for your query, but the context window is
> small (4k tokens for GPT-3.5, up to 32k for GPT-4), so you want to be very
> selective about giving the LLM the most relevant possible information.
>
> Vector search is red-hot now because it allows us to easily answer the
> question “what are the most relevant documents to provide as context” by
> performing a similarity search between the embedding vector of the query
> and those of your document universe.  Doing exact search is prohibitively
> expensive, 
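
The similarity search described in the proposal can be made concrete with its exact, brute-force form: score the query embedding against every document embedding and keep the k best. A minimal Python sketch, assuming cosine similarity as the metric and small in-memory vectors. Each query costs O(n * d) work for n documents of dimension d. That cost is what an approximate index such as the proposed HNSW graph avoids. `cosine_similarity` and `exact_top_k` are illustrative names, not a Cassandra API:

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, nonzero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def exact_top_k(query: list[float], docs: dict[str, list[float]],
                k: int) -> list[tuple[str, float]]:
    """Score every document (O(n * d) work) and return the k most similar."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
for doc_id, score in exact_top_k([1.0, 0.1, 0.0], docs, k=2):
    print(doc_id, round(score, 3))
# doc-a 0.995
# doc-b 0.774
```

An approximate index gives up a little recall in exchange for not touching every stored vector on each query.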

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-09 Thread Patrick McFadin
I love the debate that surfaces occasionally, but I have to agree that
KEYSPACE and SCHEMA are doing the job. There is a learning curve with
Cassandra data modeling, and keywords are a minor problem.

Issues that hit every user:
1. Creating the correct primary key
2. Avoiding the urge to index all-the-things (see item 1)
3. Migrating schema because of 1 and 2

4th bonus issue. Grokking consistency level. "EACH_QUORUM sounds perfect
for me."
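Part of what makes consistency levels hard to grok is that each one implies a different number of replica acknowledgements. A rough Python sketch, assuming NetworkTopologyStrategy with a replication factor per datacenter and covering only a few levels. A quorum is floor(RF / 2) + 1. QUORUM counts replicas across all datacenters, while LOCAL_QUORUM counts only the coordinator's datacenter. EACH_QUORUM needs a local quorum in every datacenter, so losing a quorum of replicas in any single datacenter fails the request. The function names are illustrative:

```python
def quorum(replicas: int) -> int:
    """Majority of `replicas`: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, rf_per_dc: dict[str, int], local_dc: str) -> dict[str, int]:
    """Replica acknowledgements a request at `level` waits for.

    Returns a map from datacenter name (or 'any' for a cluster-wide count)
    to the number of replicas that must respond.
    """
    if level == "ONE":
        return {"any": 1}
    if level == "QUORUM":
        return {"any": quorum(sum(rf_per_dc.values()))}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if level == "ALL":
        return {"any": sum(rf_per_dc.values())}
    raise ValueError(f"level not covered by this sketch: {level}")


rf = {"dc1": 3, "dc2": 3}
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, rf, local_dc="dc1"))
```

With two datacenters at RF 3, this prints 2 acknowledgements for LOCAL_QUORUM, 4 cluster-wide for QUORUM, and 2 in each datacenter for EACH_QUORUM.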

I was trying to remember when SCHEMA got added to the CQL parser. With a
quick 'git blame' I was taken back to this beast:
https://issues.apache.org/jira/browse/CASSANDRA-14825

One huge area that was never addressed in the Jira: any documentation that
the official CQL parser now supported SCHEMA. So if anything, we should use
this opportunity to update some docs.

Patrick


On Thu, Apr 6, 2023 at 5:28 PM Dinesh Joshi  wrote:

> I’m strongly in favor of leaving terminology as-is.
>
> On Apr 6, 2023, at 7:20 AM, Bowen Song via dev 
> wrote:
>
> 
>
> *> I'm quite happy to leave things as they are if that is the consensus.*
>
> +1 to the above
>
>
> On 06/04/2023 14:54, Mike Adamson wrote:
>
> My apologies. I started this discussion off the back of a usability
> discussion around new user accessibility to Cassandra and the premise that
> there is an initial steep learning curve for new users, including new users
> who have worked for a long time in the traditional DBMS field.
>
> On the basis of the reason for the discussion,  TABLEGROUP doesn't sit
> well because of user types / functions / indexes etc. which are not
> strictly tables and is also yet another Cassandra only term.
>
> NAMESPACE could work but its different usage in other systems could be
> just as confusing to new users.
>
> And, I certainly don't think having multiple names for the same thing just
> to satisfy different parties is a good idea at all.
>
> I'm quite happy to leave things as they are if that is the consensus.
>
> On Thu, 6 Apr 2023 at 14:16, Josh McKenzie  wrote:
>
>> KEYSPACE is fine. If we want to introduce a standard nomenclature like
>> DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no
>> benefit.
>>
>> I'm with Benedict in principle, with Aleksey in practice; I think
>> KEYSPACE and SCHEMA are actually fine enough.
>>
>> If and when we get to any kind of multi-tenancy, having a more
>> metaphorical abstraction that users are familiar with like these becomes
>> more valuable; it's pretty clear that things in different keyspaces,
>> different databases, or even different schemas could have different access
>> rules, resourcing, etc. from one another.
>>
>> While the off-the-cuff logical TABLEGROUP thing is a *literal* statement
>> about what the thing is, it'd be another unique term to us;  we have enough
>> things in our system where we've charted our own path. My personal .02 is
>> we don't need to go adding more. :)
>>
>> On Thu, Apr 6, 2023, at 8:54 AM, Mick Semb Wever wrote:
>>
>>
>> … but that should be a different discussion about how we evolve config.
>>
>>
>>
>> I disagree. Nomenclature being difficult can benefit from holistic and
>> forward thinking.
>> Sure you can label this off-topic if you like, but I value our discuss
>> threads being collaborative in an open-mode. Sometimes the best idea is on
>> the tail end of a sequence of bad and/or unpopular ideas.
>>
>>
>>
>>
>>
>>
>
> --
> *Mike Adamson*
> Engineering
> datastax.com
>
>


Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-06 Thread Patrick McFadin
I love that this is finally coming to Cassandra. Absolutely hate that, once
again, we'll be endorsing the use of ALLOW FILTERING. This is an
anti-pattern that keeps getting legitimized.

Hot take: Should we just not do Milestones 1 and 2 and wait for an
index-only Milestone 3?

Patrick

On Thu, Apr 6, 2023 at 10:04 AM David Capwell  wrote:

> Overall I welcome this feature; I was trying to use this around 1-2 months
> back and found we didn’t support it, so I’m glad to see it coming!
>
> From a testing point of view, I think we would want to have good fuzz
> testing covering complex types (frozen/non-frozen collections, tuples, udt,
> etc.), and reverse ordering; both sections tend to cause the most problems
> for new features (and existing ones).
>
> We also will want a way to disable this feature, and optionally disable at
> different sections (such as m2’s NOT IN for partition keys).
>
> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski 
> wrote:
> >
> > Hi everyone!
> >
> > I created a new CEP for adding NOT support to the query language and
> > want to start discussion around it:
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> >
> > Happy to get your feedback.
> > --
> > Piotr
>
>


Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-06 Thread Patrick McFadin
+1

Thanks to Lorina for getting people excited about it at Cassandra Forward!

On Thu, Apr 6, 2023 at 10:37 AM Mick Semb Wever  wrote:

> +1
>
> On Thu, 6 Apr 2023 at 19:32, Francisco Guerrero 
> wrote:
>
>> +1 (nb)
>>
>> On 2023/04/06 17:30:37 Josh McKenzie wrote:
>> > +1
>> >
>> > On Thu, Apr 6, 2023, at 12:18 PM, Joseph Lynch wrote:
>> > > +1
>> > >
>> > > This proposal looks really exciting!
>> > >
>> > > -Joey
>> > >
>> > > On Wed, Apr 5, 2023 at 2:13 AM Aleksey Yeshchenko 
>> wrote:
>> > > >
>> > > > +1
>> > > >
>> > > > On 4 Apr 2023, at 16:56, Ekaterina Dimitrova 
>> wrote:
>> > > >
>> > > > +1
>> > > >
>> > > > On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer 
>> wrote:
>> > > >>
>> > > >> +1
>> > > >>
>> > > >> Le mar. 4 avr. 2023 à 17:17, Andrés de la Peña <
>> adelap...@apache.org> a écrit :
>> > > >>>
>> > > >>> +1
>> > > >>>
>> > > >>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna <
>> jeremy.hanna1...@gmail.com> wrote:
>> > > 
>> > >  +1 nb, will be great to have this in the codebase - it will make
>> nearly every table's compaction work more efficiently.  The only possible
>> exception is tables that are well suited for TWCS.
>> > > 
>> > >  On Apr 4, 2023, at 8:00 AM, Berenguer Blasi <
>> berenguerbl...@gmail.com> wrote:
>> > > 
>> > >  +1
>> > > 
>> > >  On 4/4/23 14:36, J. D. Jordan wrote:
>> > > 
>> > >  +1
>> > > 
>> > >  On Apr 4, 2023, at 7:29 AM, Brandon Williams 
>> wrote:
>> > > 
>> > >  
>> > >  +1
>> > > 
>> > >  On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov 
>> wrote:
>> > > >
>> > > > Hi everyone,
>> > > >
>> > > > I would like to put CEP-26 to a vote.
>> > > >
>> > > > Proposal:
>> > > >
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>> > > >
>> > > > JIRA and draft implementation:
>> > > > https://issues.apache.org/jira/browse/CASSANDRA-18397
>> > > >
>> > > > Up-to-date documentation:
>> > > >
>> https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
>> > > >
>> > > > Discussion:
>> > > >
>> https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
>> > > >
>> > > > The vote will be open for 72 hours.
>> > > > A vote passes if there are at least three binding +1s and no
>> binding vetoes.
>> > > >
>> > > > Thanks,
>> > > > Branimir
>> > > 
>> > > 
>> > > >
>> > >
>>
>
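The passing rule in the vote call (at least three binding +1s and no binding vetoes, counted once the 72-hour window has closed) is mechanical enough to sketch. A minimal Python illustration, assuming each vote is recorded together with whether it is binding. `Vote` and `vote_passes` are hypothetical, and non-binding votes such as the "+1 (nb)" replies above are recorded but never change the outcome:

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # assumption: the caller knows whether this vote is binding


def vote_passes(votes: list[Vote]) -> bool:
    """At least three binding +1s and no binding -1 (veto)."""
    binding = [v for v in votes if v.binding]
    plus_ones = sum(1 for v in binding if v.value == 1)
    vetoes = sum(1 for v in binding if v.value == -1)
    return plus_ones >= 3 and vetoes == 0


votes = [
    Vote("voter-1", 1, True),
    Vote("voter-2", 1, True),
    Vote("voter-3", 1, False),  # a "+1 (nb)": recorded, but not counted
    Vote("voter-4", 1, True),
]
print(vote_passes(votes))                                # True
print(vote_passes(votes + [Vote("voter-5", -1, True)]))  # False
```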


Re: Google Season of Docs

2023-04-03 Thread Patrick McFadin
It hardly feels like a loss looking at the fantastic projects that were
selected. Thanks for leading this charge Lorina!

Patrick

On Mon, Apr 3, 2023 at 11:39 AM lorinapoland  wrote:

> Sadly, I am informing the community that our grant application to GSoD was
> unsuccessful.
>
> If you would like to see the list of winning projects, check out
> https://developers.google.com/season-of-docs/docs/participants.
>
> Lorina
>
>
>
>
>


Re: Welcome our next PMC Chair Josh McKenzie

2023-03-24 Thread Patrick McFadin
Congrats Josh. This is an excellent acknowledgment of your awesome
contributions to the Cassandra project.

Mick, you left some big shoes to fill. Thank you for your service and for
being an endless advocate for the project.

Patrick

On Fri, Mar 24, 2023 at 8:03 AM Paulo Motta 
wrote:

> Thanks Mick and congratulations Josh!! :)
>
> On Thu, Mar 23, 2023 at 5:33 PM Erick Ramirez 
> wrote:
>
>> Thanks Mick for everything you've done and continue to do for the project!
>> Congratulations Josh and thanks for stepping up! The community is in good
>> shape! 
>>
>


Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Patrick McFadin
No more nodetool createepochunsafe! +1

This is going to be another big merge. Just bookmarking the discussions
last week on CEP-15.

On Mon, Feb 6, 2023 at 9:57 AM Jeff Jirsa  wrote:

> +1
>
>
> On Mon, Feb 6, 2023 at 8:16 AM Sam Tunnicliffe  wrote:
>
>> Hi everyone,
>>
>> I would like to start a vote on this CEP.
>>
>> Proposal:
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
>>
>> Discussion:
>> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
>>
>> The vote will be open for 72 hours.
>> A vote passes if there are at least three binding +1s and no binding
>> vetoes.
>>
>> Thanks,
>> Sam
>>
>


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-05 Thread Patrick McFadin
Thank you everyone for all the well wishes here and in other parts of the
interwebs. It's always a privilege to work with the people in our community.

Patrick

On Fri, Feb 3, 2023 at 11:24 AM C. Scott Andreas 
wrote:

> Congratulations, Patrick!
>
> On Feb 2, 2023, at 9:46 PM, Berenguer Blasi 
> wrote:
>
>
> Welcome!
> On 3/2/23 4:09, Vinay Chella wrote:
>
> Well deserved one, Congratulations, Patrick.
>
> On Fri, Feb 3, 2023 at 4:01 AM Josh McKenzie  wrote:
>
>> Congrats Patrick! Well deserved.
>>
>> On Thu, Feb 2, 2023, at 5:25 PM, Molly Monroy wrote:
>>
>> Congrats, Patrick... much deserved!
>>
>> On Thu, Feb 2, 2023 at 1:59 PM Derek Chen-Becker 
>> wrote:
>>
>> Congrats!
>>
>> On Thu, Feb 2, 2023 at 10:58 AM Benjamin Lerer  wrote:
>>
>> The PMC members are pleased to announce that Patrick McFadin has accepted
>> the invitation to become a committer today.
>>
>> Thanks a lot, Patrick, for everything you have done for this project and
>> its community through the years.
>>
>> Congratulations and welcome!
>>
>> The Apache Cassandra PMC members
>>
>>
>>
>> --
>> +---+
>> | Derek Chen-Becker |
>> | GPG Key available at https://keybase.io/dchenbecker and   |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---+
>>
>>
>> --
>
>
> Thanks,
> Vinay Chella
>
>


Important news about Cassandra Summit

2023-02-03 Thread Patrick McFadin
Hello Cassandra Community,

We all see what’s happening in tech right now. Cuts are being made, and
budgets are frozen. For Cassandra Summit, this has translated to low
sponsorship and registrations. The program committee has been discussing
options with the Linux Foundation events team, and the decision was made to
move Cassandra Summit to December 12-13. You’ll see something official from
the Linux Foundation soon.

This isn’t what anyone wanted. It’s a challenging time for our community to
gather, and that’s entirely the point of a Cassandra Summit. Hopefully, this
provides enough space to have the Summit we want and need.

Between now and December, the DataStax community team is ramping up a plan B
to keep up the project momentum during this downturn and facilitate community
information sharing. Cassandra 5.0 is coming, and it’s going to be
game-changing. No way we are waiting until December to talk about it! The
plan is to have a virtual event (online) on March 14 and a series of
city-specific Cassandra Days in the coming months. It’s hard for our
community to get out, so we’ll come to you. More information will follow in
the next few days.

I want to reassure you this isn’t specific to our community. I’ve been
hearing from many of you who were trying anything to get to San Jose in
March, but budgets wouldn’t allow for any non-essential travel. When I
started hearing the same thing from speakers, then sponsors, I knew this was
a large-scale problem.

We all know people impacted by layoffs, and I’m sure many are personally
affected. Let’s come together as a community and help each other. If you
have open positions, call them out in this email thread or in #cassandra on
the ASF Slack.

I want to thank the Linux Foundation Events team personally. They are
exceptional professionals and worked quickly to get us back on track. There
was a rush of events trying to postpone to later in the year, but they were
able to get us a new date. They are as protective of conference uptime as
you are of database uptime.

More info to follow.

Thanks,
Patrick


Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Patrick McFadin
API changes are near and dear to my world. The scope of changes could be
minor or major, so I think B is the right way forward.

Not to throw off the momentum, but could this even warrant a separate CEP
in some cases? For example, CEP-15 is a huge change, but the CQL syntax
will continuously evolve with more use. Being judicious in those changes is
good for end users. It's also a good reference to point back to after the
fact.

Patrick

On Thu, Feb 2, 2023 at 6:01 AM Ekaterina Dimitrova 
wrote:

> “ Only that it locks out of the conversation anyone without a Jira login”
> Very valid point I forgot about - since recently, people need an invitation
> in order to create an account…
> Then I would say C until we clarify the scope. Thanks
>
> On Thu, 2 Feb 2023 at 8:54, Benedict  wrote:
>
>> I think lazy consensus is fine for all of these things. If a DISCUSS
>> thread is crickets, or just positive responses, then definitely it can
>> proceed without further ceremony.
>>
>> I think “with heads-up to the mailing list” is very close to B? Only that
>> it locks out of the conversation anyone without a Jira login.
>>
>> On 2 Feb 2023, at 13:46, Ekaterina Dimitrova 
>> wrote:
>>
>> 
>>
>> While I do agree with you, I am thinking that if we include many things
>> that we would expect lazy consensus on, I would probably have a different
>> preference.
>>
>> I definitely don’t mean to stall this though so in that case:
>> I’d say combination of A+C (jira with heads up on the ML if someone is
>> interested into the jira) and regular log on API changes separate from
>> CHANGES.txt or we can just add labels to entries in CHANGES.txt as some
>> other projects. (I guess this is a detail we can agree on later on, how to
>> implement it, if we decide to move into that direction)
>>
>> On Thu, 2 Feb 2023 at 8:12, Benedict  wrote:
>>
>>> I think it’s fine to separate the systems from the policy? We are
>>> agreeing a policy for systems we want to make guarantees about to our users
>>> (regarding maintenance and compatibility)
>>>
>>> For me, this is (at minimum) CQL and virtual tables. But I don’t think
>>> the policy differs based on the contents of the list. Given how long this
>>> topic has stalled, and given the primary point of contention seems to be
>>> the *policy* and not the list, I think it’s time to express our opinions
>>> numerically so we can move the conversation forwards.
>>>
>>> This isn’t binding, it just reifies the community sentiment.
>>>
>>> On 2 Feb 2023, at 13:02, Ekaterina Dimitrova 
>>> wrote:
>>>
>>> 
>>>
>>> “ So we can close out this discussion, let’s assume we’re only
>>> discussing any interfaces we want to make promises for. We can have a
>>> separate discussion about which those are if there is any disagreement.”
>>> May I suggest we first clear up this topic and then move to voting? I
>>> would say I see confusion, not so much disagreement. Should we raise a
>>> discussion for every feature flag, for example? In another thread, virtual
>>> tables were brought in. I also saw other examples where people expressed
>>> uncertainty. I personally feel I’ll be able to make a more informed
>>> decision and vote if I first see this clarified.
>>>
>>> I will be happy to put together a document and bring it for discussion if
>>> people agree with that.
>>>
>>>
>>>
>>> On Thu, 2 Feb 2023 at 7:33, Aleksey Yeshchenko 
>>> wrote:
>>>
 Bringing light to newly proposed APIs is no less important, if not more so,
 for reasons already mentioned in this thread: it’s not easy to change
 them later.

 Voting B.


 On 2 Feb 2023, at 10:15, Andrés de la Peña 
 wrote:

 If it's a breaking change, like removing a method or property, I think
 we would need a DISCUSS API thread prior to making changes. However, if the
 change is an addition, like adding a new yaml property or a JMX method, I
 think JIRA suffices.





Re: [ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-30 Thread Patrick McFadin
This is really game-changing and an important change for the Cassandra
community. I would like to think that creating a governance structure like
this will help get more ecosystem projects under the umbrella of Apache
Cassandra.

Thank you, PMC, for spending the time to create this much-needed framework.

Patrick

On Mon, Jan 30, 2023 at 11:02 AM Jeff Jirsa  wrote:

> Usually requires an offer to donate from the current owner, an acceptance
> of that offer (PMC vote), and then the work to ensure that contributions
> are acceptable from a legal standpoint (e.g. like the incubator -
> https://incubator.apache.org/guides/transitioning_asf.html - "For
> contributions composed of patches from individual contributors, it is safe
> to import the code once the major contributors (by volume) have completed
> ICLAs or SGAs.").
>
>
>
> On Mon, Jan 30, 2023 at 10:53 AM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
>> Great news indeed. I am wondering what it would take to include projects
>> everyone is using, like medusa, reaper, cassandra-ldap, etc., as subprojects.
>>
>> Thanks,
>> German
>> --
>> *From:* Francisco Guerrero 
>> *Sent:* Friday, January 27, 2023 9:46 AM
>> *To:* dev@cassandra.apache.org 
>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] Evolving governance in the
>> Cassandra Ecosystem
>>
>> Great news! I'm very happy to see these changes coming soon.
>>
>> Thanks to everyone involved in this work.
>>
>> On 2023/01/26 21:21:01 Josh McKenzie wrote:
>> > The Cassandra PMC is pleased to announce that we're evolving our
>> governance procedures to better foster subprojects under the Cassandra
>> Ecosystem's umbrella. Astute observers among you may have noticed that the
>> Cassandra Sidecar is already a subproject of Apache Cassandra as of CEP-1 (
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224)
>> and Cassandra-14395 (
>> https://issues.apache.org/jira/browse/CASSANDRASC-24),
>> however up until now we haven't had any structure to accommodate raising
>> committers on specific subprojects or clarity on the addition or governance
>> of future subprojects.
>> >
>> > Further, with the CEP for the driver donation in motion (
>> https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
>> the need for a structured and sustainable way to expand the Cassandra
>> Ecosystem is pressing.
>> >
>> > We'll document these changes in the Confluence wiki, as well as the
>> sidecar as our first formal subproject, after any discussion on this email
>> thread. The new governance process is as follows:
>> > -
>> >
>> > Subproject Governance
>> > 1. The Apache Cassandra PMC is responsible for governing the broad
>> Cassandra Ecosystem.
>> > 2. The PMC will vote on inclusion of new interested subprojects using
>> the existing procedural change vote process documented in the confluence
>> wiki (Super majority voting: 66% of votes must be in favor to pass.
>> Requires 50% participation of roll call).
>> > 3. New committers for these subprojects will be nominated and raised,
>> both at inclusion as a subproject and over time. Nominations can be brought
>> to priv...@cassandra.apache.org. Typically we're looking for a mix of
>> commitment and contribution to the community and project, be it through
>> code, documentation, presentations, or other significant engagement with
>> the project.
>> > 4. While the commit-bit is ecosystem-wide, code modification rights and
>> voting rights (technical contribution, binding -1, CEPs) are granted per
>> subproject.
>> >  4a. Individuals are trusted to exercise prudence and only commit
>> or claim binding votes on approved subprojects. Repeated violations of this
>> social contract will result in losing committer status.
>> >  4b. Members of the PMC have 

Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-26 Thread Patrick McFadin
Thanks for the positive reception on email and slack.

We are going to have our first gathering next Wednesday at 8AM PT.

Link to calendar event:
https://calendar.google.com/calendar/event?action=TEMPLATE=MDVoY3VucnMwaWViaXA1amFmdXAzcnN0dTYga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com



On Tue, Jan 24, 2023 at 3:35 AM Mick Semb Wever  wrote:

> The market...@cassandra.apache.org list is created.
>
> To subscribe, send an email to marketing-subscr...@cassandra.apache.org
> from the email address you want to subscribe from.
>
> If you are a committer you can alternately use Whimsy:
> https://whimsy.apache.org/committers/subscribe
>
> regards,
> Mick
>


Cassandra Summit update for 2023-01-24

2023-01-24 Thread Patrick McFadin
Hello Cassandra Community!

Quick take:
 - Register before 1/28 to get discount pricing.
   https://events.linuxfoundation.org/cassandra-summit/register/
 - Use code CS23DS20 to get 20% off.
 - Make sure to sign up for the training day on March 12.
 - Tell everyone you’re going on social media and use #CassandraSummit in
   your posts.

Longer version:

If you have been watching what’s happening with the Cassandra Summit and
thinking about going, I’m here to convince you that now is the time to
register. The early registration discount ends this Saturday, January 28th.

It might be helpful to clarify some misconceptions I keep hearing. Every
previous Cassandra Summit (except Cassandra Summit Tokyo) was planned and
run by DataStax. To create a more neutral ground that better reflects our
community, Linux Foundation Events has taken on the considerable task of
running Cassandra Summit in 2023. We are very grateful they took a chance on
our community, and we will be better for it.

When DataStax ran the event, we could deeply discount tickets because we
treated it as a marketing expense. I’ve been DMed and Slacked quite a few
times for free passes. Since this is a Linux Foundation event, unfortunately,
there are no complimentary passes, as ticket sales are a key part of
recouping their costs. You can get a 20% discount by using this code:
CS23DS20

Why is this important to mention? Our community needs an independent
Cassandra Summit, and right now, it needs your support in attending the
event. Let’s show the Linux Foundation that Cassandra Summit is something we
value as a community. I know budgets are tight, and it’s hard to get
approval. If you are able, make the case and register today. Next year, when
there are thousands of attendees at Cassandra Summit, you can tell everyone
what they missed in 2023. If making the trip isn’t something you can do, a
virtual pass is only $30 with the discount code and is also a great way to
show support.

The other important thing you can help with? Getting the word out about
Cassandra Summit. Tell your colleagues and co-workers that this is a hot tip
and you are hooking them up. If you are going, tell everyone you’ve
registered and use the hashtag #CassandraSummit. Point out sessions you are
interested in and share the love. If you can convince a couple of people to
go, you’ve made a difference. If you need a little more motivation, just
look at this schedule!
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Thanks, and I hope to see you there!
Patrick


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-20 Thread Patrick McFadin
I would be happy to be one of the moderators. Not sure if that's singular
or plural. :D Just need to know how to do it.

Patrick

On Fri, Jan 20, 2023 at 1:44 AM Mick Semb Wever  wrote:

> *To achieve this, we are proposing the formation of a Publicity &
>> Marketing Working Group, and we are requesting your participation.*
>>
>
>
> +1 to the proposal and everything you write Patrick!
>
> I've submitted the request for the ML (can take 24 hours). Who would like
> to be a moderator for the list?
>
> Otherwise let's give this a few days for any concerns, questions,
> objections to be raised.
>
>


[DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-19 Thread Patrick McFadin
Hello Cassandra Community!

We are at a pivotal moment for the Cassandra community, with the first
Cassandra Summit in 7 years coming up on March 13th, and a major release
coming later this year with Cassandra 5.0. It is important that we come
together to set the publicity strategy and direction for these important
moments, and that we work together to define how Cassandra shows up across
the technology industry.

To achieve this, we are proposing the formation of a Publicity & Marketing
Working Group, and we are requesting your participation.

What is the Publicity & Marketing Working Group?

This is a working group open to community members who have the insight and
skills to help define Cassandra’s public narrative and participate in our
marketing strategy and execution. The group will meet once a month for an
hour to discuss important marketing topics. You can find us on
#cassandra-events. We also propose adding a mailing list,
marketing@cassandra.a.o, to handle day-to-day marketing needs and async
communication. Our publicity and marketing partners from Constantia, Molly
Monroy and Melissa Logan, will work with us to build this working group.

What will this group be responsible for?

Our initial vision for this group is to accelerate how we do marketing &
publicity for Cassandra. We will refine and advance the tech industry’s
perception of Cassandra, to show how Cassandra has grown, innovated, and
revitalized itself as a community. We will do this through:
 - Participating in marketing strategy for major moments (in particular, C*
   Summit in March and the Cassandra 5.0 release later this year)
 - Expanding our local meetup and events presence
 - Sourcing end-user case studies for marketing and PR collateral
 - Making sure the Cassandra community shows up at third-party events
 - Contributing content, from blogs to documentation, to ensure we have a
   robust stream of content for our end users

Our first two orders of business will be:
 1. Jointly determine the operating model and governance, and get input and
    alignment on the above goals/responsibilities.
 2. Discuss marketing for Cassandra Summit, primarily defining the news we
    will share at the event from the project directly and from our sponsors.
    This is coming up quickly, and we will need community assistance to
    achieve our publicity goals.

As this is a community-driven group, please share ideas and feedback on the
purpose of this group and what we need to achieve.

When is the meeting?

We are proposing the meetings take place on the 4th Wednesday of each month.
We will alternate times of day to try to accommodate everyone, and we can
adjust based on member attendance.
 - Jan, March, May, July, Sept, Nov: 4th Wed of the month, 8a PT
 - Feb, April, June, August, October, Dec: 4th Wed of the month, 4p PT

We will create a centralized document to share and document information
about the working group, including meeting minutes, monthly tasks, and
priorities. Decisions will be discussed and finalized using the project
mailing list.

Patrick


Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-30 Thread Patrick McFadin
> Why are we still debating build tooling? I think you’re wrong, but I’ve
> conceded - on the assumption that we can get enough volunteers willing to
> adopt responsibility for the new world order.

Not debating. I am just throwing in my support since I have been in the
Camp of Ant.

On Wed, Nov 30, 2022 at 1:29 AM Benedict  wrote:

> Why are we still debating build tooling? I think you’re wrong, but I’ve
> conceded - on the assumption that we can get enough volunteers willing to
> adopt responsibility for the new world order.
>
> I suggest five long term contributors nominate themselves as the build
> file maintainers, and collectively manage a safe and painless migration for
> the rest of us - and agree to maintain and develop the new build file going
> forwards, and support the community as they adopt it.
>
> On the topic of over-exuberant linting I will continue to push back. I
> think linting our brace rules could make sense since they are atypical, but
> more formatting rules than this likely just leads to atrophying style.
> Authorship involves thinking about how to present your code; I don’t want
> to either encourage lazy authorship or prevent experimentation with
> presentation. Both would be bad, and I expect we would struggle to evolve
> our style guide again in future as the language evolves. Our brace rules
> are a good example: everyone unilaterally ignored them when lambdas
> arrived, as we all recognised they materially harmed the brevity benefits,
> and we eventually codified this.
>
> On migration: auto formatters applied to code that was not written with
> the rules in mind will almost unerringly make a mess of it, so having a
> tool do this is not acceptable IMO.
>
> The idea of checkstyle being the source of truth continues to be untenable
> and anyone still pushing for this should please engage with my earlier
> points on this.
>

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-29 Thread Patrick McFadin
I'm going to +1 what Stefan has said. I've heard on many occasions from
newcomers to the project that having to use Ant is a deterrent. As a matter
of fact, a few weeks ago, I spent a Sunday afternoon helping somebody try
to build Cassandra, and Ant caused a ton of problems. "Ok. ant really
super clean this time"

Sure, it still works for people who have been doing this for years. I drive
a 20-year-old Toyota truck, but my kids often remind me that it's not
cool. So in that spirit, I feel that saying we need to keep Ant is like
saying "You kids get off my lawn!" If it's something that will help attract
new contributors, I'm all for it.

Patrick

On Fri, Nov 25, 2022 at 2:22 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> I agree with what you wrote. How I understand it is that migrating to
> Maven/Gradle makes the project more "attractive" for newcomers. If a
> project is built on "that old un-cool Ant", it might be a little bit
> off-putting and questionable if we are "stuck in the past on build systems
> and not progressing".
>
> So in that sense I agree this is more "marketing" rather than
> technological question but on the other hand, does not Maven/Gradle allow
> us to modularize the project better? Maybe we would like to modularize but
> nobody is up to that because the build system makes it impossible or at least
> quite inconvenient to do so. Do you really think there are not any
> significant benefits to switch even if it "just works" now?
>
> 
> From: Benedict 
> Sent: Friday, November 25, 2022 11:07
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
>
> There’s always a handful of people asking for it, but notably few if any
> of the full time contributors doing the majority of the core development of
> Cassandra. It strikes me as something very appealing to others, but less so
> to those wanting to get on with development.
>
> I never really see a good argument articulated for the migration, besides
> general hand waving that ant is old, and people like newer build systems.
> Ant is working fine, so there isn’t a strong technical reason to replace
> it, and there are good organisational reasons not to.
>
> Why do you consider a migration inevitable?
>
>
>
> > On 25 Nov 2022, at 09:58, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
> >
> > Interesting take on Ant / no-Ant, Benedict. I am very curious how this
> unfolds. My long-term perception is that changing it to something else is
> more or less inevitable but if there is a broader consensus to not do that
>  well.
> >
> > 
> > From: Benedict 
> > Sent: Friday, November 25, 2022 10:52
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
> >
> >
> > I was in a bit of a rush last night. I should say that I’m of course +1
> on a general endeavour to clean this up, and to expand our use of linters, and
> I appreciate your volunteering to help out in this way Maxim.
> >
> > However, responding to Stefan, I’m pretty -1 migrating from ant to
> another build system without really good reason. Migration has a real cost
> to productivity for all existing contributors, and the phantom of
> increasing new contributions has never paid off historically. I’m all for
> easing people into participation, but not at penalty to the existing
> contributor base.
> >
> > If the only reason is to make it easier to open in a different IDE, we
> can perhaps have some basic build files outlining code structure for
> importing, that are compatible with our canonical ant build? We could
> perhaps even generate them.
> >
> >
> >> On 25 Nov 2022, at 09:35, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
> >>
> >> For the record, I was testing that same combo Claude mentioned and it
> did not work out of the box but it is definitely possible to set up
> successfully. I do not remember the details.
> >>
> >> To reply to Maxim, it all seems good to me, roughly, but I humbly
> think it all boils down to Maven/Gradle refactoring and on top of that we
> can do all else.
> >>
> >> For example, there is (1) where the solution, besides fixing the tests,
> is to introduce an Ant task which would check this on build. That being
> said, what is that going to look like when we change Ant for something else?
> That stuff suddenly becomes obsolete.
> >>
> >> This may apply to other problems we want to solve as well. I
> do not want to do something tailored for one build system just to rewrite
> it all or to 

Cassandra Summit CFP update

2022-11-29 Thread Patrick McFadin
Hi everyone,

An update on the current CFP process for Cassandra Summit. There are
currently 23 talk submissions, which is far short of what we need. Two days
of tracks means we need 60 approved talks. Ideally, we need over 100
submissions to ensure we have a good pool of quality talks. We already have
quite a few vendor pitches that have nothing to do with Cassandra. Think of
it as CFP spam.

https://events.linuxfoundation.org/cassandra-summit/program/cfp/

The deadline is December 11th. That is 12 days away! If you are assuming it
will get pushed out, don’t. We have a tight schedule before March 13th.
Speakers must be notified of talk acceptance by the beginning of January to
book travel in time. The full schedule will be published by mid-January.

That being said, I have talked to quite a few people who are working on a
submission. Thank you for being willing to create a talk! How can I help
you get it completed? Again, here is my Calendly link if you need to talk
it over:

https://calendly.com/patrick-mcfadin/15-minute-cassandra-summit-cfp-consult

This is our conference! Let’s make it a festival of the database we love and
the things we build with it.

One more thing: we need sponsors! If your employer can, this is a great
opportunity to get your brand out in front of people building the future.

I’ll be back. Go submit a talk. You’ll be happy you did!

Patrick


Speak at the Cassandra Summit

2022-11-21 Thread Patrick McFadin
Hello Cassandra Community!

Hopefully, you’ve seen the news that we are having a Cassandra Summit on
March 13, 2023. It’s been years since we have done something this big in
the community. We’re all a little out of practice. In an open source
community like ours, one of the most important things we can do is share
information. A summit is where we can concentrate all that sharing in one
place, record what’s said, and have great conversations. We need everyone
to submit a talk proposal by December 11th, 2022. We are accepting *in-person
or remote*! Here’s the link:
https://events.linuxfoundation.org/cassandra-summit/program/cfp

If you are ready and clicked the link, then we are good. Thanks for
reading, and see you in March. For those of you who still need to get
ready, read on.

The barriers to submitting a talk proposal fall into two buckets: 1) *My
employer won’t let me (or won’t pay for travel).* 2) *I don’t think I have
anything worthy of a talk at Cassandra Summit.*

*For employer issues*

   - Sharing information is a valuable contribution to an open source project
     that you use, and it’s one of the best ways to give back to the
     community.
   - Cassandra Summit is a vendor-independent event run by the Linux
     Foundation. This event is not an endorsement of a particular vendor.
   - Getting a talk accepted gets you a free ticket. In addition, the amount
     of learning you will get just as an attendee will upgrade whatever you
     are doing with Cassandra.


*Doubt over topics*

   - Every use case of Cassandra is unique, and within a 20-30 minute talk,
     sharing two or three key things can go fast.
   - Some of the best talks are “Things we did wrong.” Talks don’t always
     have to show off new things.
   - Expand your vision with some ecosystem! A talk about using product X
     with Cassandra will be very popular.


More general advice on what to think about before filling out the CFP form.
Cassandra Summit is our opportunity to show the world how Cassandra is
moving into the future. Cassandra is the database of choice for cloud
native applications. Some topic areas that will help us all tell that story:


   - Developing Applications with Cassandra
   - Cloud-native Deployments and Strategies
   - Ecosystem Tools that Leverage Cassandra
   - What’s Coming for Future Cassandra Versions
   - Use Cases and Sharing about Best Practices


And finally, I want to make sure you feel supported for any challenges you
may have. I have set up a unique Calendly link if you want a quick
consultation. You can choose Google Meet or Zoom and find a time that works
for you.

https://calendly.com/patrick-mcfadin/15-minute-cassandra-summit-cfp-consult

You can also find me on the ASF slack or email if you prefer that method.


The deadline is December 11, 2022, at midnight PST. Act now so it doesn’t
catch you by surprise. Notification of acceptance status will happen in
early January, so this will go quickly!

Patrick


Re: Shall 4.2 become 5.0 ?

2022-09-30 Thread Patrick McFadin
Removing JDK 8 alone is worth it. CEP-21 is going to be the real breaking
change, though, and with CEP-15 tied to it, I can't see another path.

Patrick

On Mon, Sep 26, 2022 at 1:46 PM Caleb Rackliffe 
wrote:

> Mick,
>
> Ignore me. I misread your original post.
>
>
>
> On Mon, Sep 26, 2022 at 2:01 PM Ekaterina Dimitrova 
> wrote:
>
>> We agreed long ago to drop the JavaScript UDFs, they were already
>> deprecated in CASSANDRA-17280
>> That was decided around Nashorn and JDK17 and there is ticket
>> CASSANDRA-17281 to cover that effort.
>>
>> On Mon, 26 Sep 2022 at 14:05, Mick Semb Wever  wrote:
>>
>>>
>>> It's obviously still in progress, but CASSANDRA-16052 may introduce
>>> some breaking changes to the 2i API.
>>>
>>>
>>> Can you elaborate Caleb? To my understanding, we do not want (and this
>>> thread is not about) permitting breaking changes from one version to the
>>> next. Are there deprecated 2i APIs in 4.x that you will need removed for
>>> 16052 ?
>>>
>>>


Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Patrick McFadin
I'm also working on different use cases and syntax for Accord :D

I'm +1 on this change and leaving the door open for maybe a few more as we
test this out. It needs to be functionally useful for developers in v1, and
I think it's worth the changes to get it right.

One other thing Caleb and I have been discussing is that, when you run a
transaction, the statement returns with no message. In CQLSH you have no
idea if anything happened unless you select from the tables and look for
changes. Even an LWT returns an [applied] column with true|false.

Patrick

On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates. I'll
> use an example to show it better:
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
>     UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?
>
> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev 
> wrote:
>
> I wasn't referring to specific syntax but to the concept. If a SQL dialect
> (or better, the standard) has a way to select data into a variable, let's
> adopt it.
>
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is
> my preference.
>
> On 8/22/22 19:13, Patrick McFadin wrote:
>
> The quoting got mangled pretty badly in the replies.
> When you say: "Agree it's better to reuse existing syntax than invent new
> syntax."
>
> Which syntax are you referring to?
>
> Patrick
>
>
> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <
> dev@cassandra.apache.org> wrote:
>
>> Agree it's better to reuse existing syntax than invent new syntax.
>>
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev  [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT  INTO  FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say, you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>  ?column? | ?column? | ?column?
>> >> ----------+----------+----------
>> >>  f        | t        |
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
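Avi's point, that `=` is already a comparison operator inside a select list, can be reproduced without PostgreSQL. The sketch below uses Python's built-in `sqlite3` module as an assumed stand-in (SQLite, not PostgreSQL or CQL), where booleans come back as 0/1 rather than f/t and `NULL = NULL` evaluates to NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparisons are ordinary selectors: each "=" yields a boolean-valued column.
comparisons = conn.execute("SELECT 1 = 2, 3 = 3, NULL = NULL").fetchone()
print(comparisons)  # (0, 1, None)

# An aliased comparison is the SELECT spelling of Avi's "LET q = a = b".
(q,) = conn.execute("SELECT (2 = 2) AS q").fetchone()
print(q)  # 1

conn.close()
```

If LET also uses `=` for assignment, the grammar has to decide which `=` binds the variable name, which is the incompatibility described above.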

Re: [Discuss] CASSANDRA-17896, Gossip, and foot guns

2022-09-20 Thread Patrick McFadin
IIRC that is something you can already change in JMX? If that's the case, I
say leave that as the barrier to entry into the "parameters of doom."

CEP-21 is the right path forward. It addresses the root cause instead of
creating more ways to fix how you got there. This is the best thing for end
users.

Patrick

On Tue, Sep 20, 2022 at 10:09 AM Josh McKenzie  wrote:

> Ticket for reference:
> https://issues.apache.org/jira/browse/CASSANDRA-17896
>
> Context: "We should expose a system env (-D) param to advanced operators
> to have the ability to specify the replace_addresses_token to be used
> during host replacement in cases where Gossip gets into a bad state."
>
> My question for the dev list: *should* we expose this parameter and
> functionality even if it's heavily documented as being highly unsafe and a
> big foot gun? Clusters can get into states where you effectively can't
> bootstrap a replacement without nuking it and starting over and manually
> intervening / twiddling with peers tables, which this allows us to work
> around a bit more gracefully as operators, but if you do this the wrong way
> it opens up a world of hurt.
>
> Given CEP-21 is on the horizon (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> ) I'm leaning towards closing this out as Won't Fix but leaving the
> branch linked in the event someone runs into this and wants to hotfix it
> into a local build or something; I'm assuming CEP-21 will land before the
> next major which would make this redundant.
>
> What does everyone else think?
>
> ~Josh
>


Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Patrick McFadin
The quoting got mangled pretty badly in the replies.
When you say: "Agree it's better to reuse existing syntax than invent new
syntax."

Which syntax are you referring to?

Patrick


On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev 
wrote:

> Agree it's better to reuse existing syntax than invent new syntax.
>
> On 8/21/22 16:52, Konstantin Osipov wrote:
> > * Avi Kivity via dev  [22/08/14 15:59]:
> >
> > MySQL supports SELECT  INTO  FROM ... WHERE
> > ...
> >
> > PostgreSQL supports pretty much the same syntax.
> >
> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> > MySQL/PostgreSQL SELECT ... INTO?
> >
> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> >>> 
> >>> I’ll do my best to express with my thinking, as well as how I would
> >>> explain the feature to a user.
> >>>
> >>> My mental model for LET statements is that they are simply SELECT
> >>> statements where the columns that are selected become variables
> >>> accessible anywhere in the scope of the transaction. That is to say, you
> >>> should be able to run something like s/LET/SELECT and
> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> >>> and produce a valid SELECT statement, and vice versa. Both should
> >>> perform identically.
> >>>
> >>> e.g.
> >>> SELECT pk AS key, v AS value FROM table
> >>>
> >>> =>
> >>> LET key = pk, value = v FROM table
> >>
> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> >> supports selecting comparisons:
> >>
> >>
> >> $ psql
> >> psql (14.3)
> >> Type "help" for help.
> >>
> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
> >>  ?column? | ?column? | ?column?
> >> ----------+----------+----------
> >>  f        | t        |
> >> (1 row)
> >>
> >>
> >> Using "=" as a syntactic element in LET would make SELECT and LET
> >> incompatible once comparisons become valid selectors. Unless they become
> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
> >> comparison).
> >>
> >>
> >> I personally prefer the nested query syntax:
> >>
> >>
> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> >>
> >>
> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> >> immediately recognizable by everyone as a query, LET is not.
> >>
> >>
> >>> Identical form, identical behaviour. Every statement should be directly
> >>> translatable with some simple text manipulation.
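Benedict's claim that a LET statement and its SELECT counterpart should be mechanically translatable can be sketched in Python. These are hypothetical helpers, not Cassandra code, and they use a simple split rather than the sed expression quoted above; they assume a flat column list, expressions without commas, and a plain identifier as each variable name.

```python
def let_to_select(stmt: str) -> str:
    """Rewrite 'LET name = expr, ... FROM ...' as 'SELECT expr AS name, ... FROM ...'."""
    head, sep, rest = stmt.partition(" FROM ")
    if not head.startswith("LET "):
        raise ValueError("expected a LET statement")
    # Split each column on its first "=": left is the name, right is the expression.
    pairs = [col.split("=", 1) for col in head[len("LET "):].split(",")]
    columns = ", ".join(f"{expr.strip()} AS {name.strip()}" for name, expr in pairs)
    return f"SELECT {columns}{sep}{rest}"


def select_to_let(stmt: str) -> str:
    """Rewrite 'SELECT expr AS name, ... FROM ...' as 'LET name = expr, ... FROM ...'."""
    head, sep, rest = stmt.partition(" FROM ")
    if not head.startswith("SELECT "):
        raise ValueError("expected a SELECT statement")
    # Split each column on its last " AS ": left is the expression, right is the alias.
    pairs = [col.rsplit(" AS ", 1) for col in head[len("SELECT "):].split(",")]
    assignments = ", ".join(f"{name.strip()} = {expr.strip()}" for expr, name in pairs)
    return f"LET {assignments}{sep}{rest}"


let_stmt = "LET key = pk, value = v FROM table"
select_stmt = let_to_select(let_stmt)
print(select_stmt)  # SELECT pk AS key, v AS value FROM table
assert select_to_let(select_stmt) == let_stmt
```

The round trip only works because the helper quietly assumes the name never contains `=`; a real grammar would have to state that rule explicitly.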
> >>>
> >>> We can then make this more powerful for users by simply expanding SELECT
> >>> statements, e.g. by permitting them to declare constants and tuples in
> >>> the column results. In this scheme LET x = * is simply syntactic sugar
> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
> >>> and 5 all at once, consistently alongside each other.
> >>>
> >>> Option 6 is in fact very similar, but is strictly less flexible for the
> >>> user as they have no way to declare multiple scalar variables without
> >>> scoping them inside a tuple.
> >>>
> >>> e.g.
> >>> LET key = pk, value = v FROM table
> >>> IF key > 1 AND value > 1 THEN...
> >>>
> >>> =>
> >>> LET row = SELECT pk AS key, v AS value FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> However, both are expressible in the existing proposal, as if you prefer
> >>> this naming scheme you can simply write
> >>>
> >>> LET row = (pk AS key, v AS value) FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> With respect to auto converting single column results to a scalar, we do
> >>> need a way for the user to say they care whether the row was null or the
> >>> column. I think an implicit conversion here could be surprising. However
> >>> we could implement tuple expressions anyway and let the user explicitly
> >>> declare v as a tuple as Caleb has suggested for the existing proposal as
> >>> well.
> >>>
> >>> Assigning constants 

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Patrick McFadin
I am +1 on

IS NOT NULL/IS NULL instead of EXISTS/NOT EXISTS

Not requiring (but allowing) SELECT on LET

Patrick

On Mon, Aug 15, 2022 at 11:01 AM Caleb Rackliffe 
wrote:

> Monday Morning Caleb has digested, and here's where I am...
>
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and
> to be honest, this may make some implementation things easier for me (i.e.
> the encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong
> feeling that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of
> the notational convenience of the previous proposal. Again, though, I don't
> have a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of
> supporting a single level of deconstruction to begin with.
>
> Having said all that, on Friday I finished a prototype (based on some of
> Blake's previous work) of the syntax/grammar we've more or less agreed upon
> here, including an implementation of what I described as option #5 above:
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
>
> To look at specific examples, see these tests:
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
>
> There are only two things that aren't yet congruent w/ our discussion
> above, but they should both be trivial to fix:
>
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
>
> If I were to just fix those two items, would we be in agreement on this
> being both the core of the syntax we want and compatible w/ the wish list
> for future items?
>
>
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> 
>> 
>>
>> Verbose version:
>> LET (a) = SELECT val FROM table
>> IF a > 1 THEN...
>>
>> Less verbose version:
>> LET a = SELECT val FROM table
>> IF a.val > 1 THEN...
>>
>>
>>
>> My intention is that these are actually two different ways of expressing
>> the same thing, both supported and neither intended to be more or less
>> verbose than the other. The advantage of permitting both is that you can
>> also write
>>
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>>
>> Alternatively, for non-queries:
>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>> or less verbose:
>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>
>>
>> I personally prefer clarity over any arbitrary verbosity/succinct
>> distinction, but we’re in general “taste” territory here. Since this syntax
>> includes the SELECT on the RHS, it makes sense to only require this for
>> situations where a query is being performed. Though I think if SELECT
>> without a FROM is supported then we will likely end up supporting *all
>> of the above*.
>>
>> Weighing in on the "SELECT without a FROM," I think that is fine and, as
>> Avi stated
>>
>>
>> Yep, definitely fine. Question is just whether we bother to offer it.
>> Also, evidently, whether we support LET *without* a SELECT on the RHS. I
>> am strongly in favour of this, as *requiring* a SELECT even when there’s
>> no table involved is counter-intuitive to me, as LET is now a distinct
>> concept that looks like variable declaration in other languages.
>>
>> Nested:
>> LET (x, y) = SELECT x, y FROM…
>>
>>
>> Deconstruction here refers to the above, i.e. extracting variables x and
>> y from the tuple on the RHS
>>
>> Nesting is just a question of whether we support either nested tuple
>> declarations, or nested deconstruction, which might include any of the
>> following:
>>
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM.., SELECT y, x FROM…))
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>>
>> IMO, once you start supporting features they need to be sort of
>> intuitively discoverable by users, so that a concept can be used in all
>> places you might expect.
>>
>> But I would be fine with an arbitrary restriction of at most one SELECT
>> on the RHS, or even ONLY a SELECT *or* some other tuple, and at most one
>> level of deconstruction of the RHS.
>>
>>
>&

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Patrick McFadin
ing FROM optional, as it's recognized by other SQL
> dialects.
>
>
> 
> Also since LET is only binding variables, is there any reason we shouldn’t
> support multiple SELECT assignments in a single LET?, e.g.
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>
>
> What if an inner select returns a tuple? Would y be a tuple?
>
>
> I think this is redundant and atypical enough to not be worth supporting.
> Most people would use separate LETs.
>
>
> 
> Also whether we support tuples in SELECT statements anyway, e.g.
> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
> IF tuple1.a > 1 AND tuple2.d > 1…
>
>
> Absolutely, this just flows naturally from having tuples. There's no
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
>
>
>
> 
> and whether we support nested deconstruction, e.g.
> LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
> IF a > 1 AND d > 1…
>
>
> I think this can be safely deferred. Most people would again separate it
> into separate LETs.
>
>
> I'd add (to the specification) that LETs cannot override a previously
> defined variable, just to reduce ambiguity.
>
>
>
>
>
>
>
>
> On 14 Aug 2022, at 13:55, Avi Kivity via dev 
>  wrote:
>
>
> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>
> 
> I’ll do my best to express with my thinking, as well as how I would
> explain the feature to a user.
>
> My mental model for LET statements is that they are simply SELECT
> statements where the columns that are selected become variables accessible
> anywhere in the scope of the transaction. That is to say, you should be
> able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS
> \1\3/g on the columns of a LET statement and produce a valid SELECT
> statement, and vice versa. Both should perform identically.
>
> e.g.
> SELECT pk AS key, v AS value FROM table
>
> =>
> LET key = pk, value = v FROM table
>
>
> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> supports selecting comparisons:
>
>
> $ psql
> psql (14.3)
> Type "help" for help.
>
> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>  ?column? | ?column? | ?column?
> ----------+----------+----------
>  f        | t        |
> (1 row)
>
>
> Using "=" as a syntactic element in LET would make SELECT and LET
> incompatible once comparisons become valid selectors. Unless they become
> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
> comparison).
>
>
> I personally prefer the nested query syntax:
>
>
> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>
>
> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> immediately recognizable by everyone as a query, LET is not.
>
>
>
> Identical form, identical behaviour. Every statement should be directly
> translatable with some simple text manipulation.
>
> We can then make this more powerful for users by simply expanding SELECT
> statements, e.g. by permitting them to declare constants and tuples in the
> column results. In this scheme LET x = * is simply syntactic sugar for LET
> x = (pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at
> once, consistently alongside each other.
>
> Option 6 is in fact very similar, but is strictly less flexible for the
> user as they have no way to declare multiple scalar variables without
> scoping them inside a tuple.
>
> e.g.
> LET key = pk, value = v FROM table
> IF key > 1 AND value > 1 THEN...
>
> =>
> LET row = SELECT pk AS key, v AS value FROM table
> IF row.key > 1 AND row.value > 1 THEN…
>
> However, both are expressible in the existing proposal, as if you prefer
> this naming scheme you can simply write
>
> LET row = (pk AS key, v AS value) FROM table
> IF row.key > 1 AND row.value > 1 THEN…
>
> With respect to auto converting single column results to a scalar, we do
> need a way for the user to say they care whether the row was null or the
> column. I think an implicit conversion here could be surprising. However we
> could implement tuple expressions anyway and let the user explicitly
> declare v as a tuple as Caleb has suggested for the existing proposal as
> well.
>
> Assigning constants or other values not selected from a table would also
> be a little clunky:
>
> LET v1 = someFunc(), v2 = someOtherFunc(?)
> IF v1 > 1 AND v2 > 1 THEN…
>
> =>
> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
> IF row.v1 > 1 AND row.v2 > 1 THEN...
>
> That said, the proposals are *close* to ide

Re: CEP-15 multi key transaction syntax

2022-08-13 Thread Patrick McFadin
I'm really happy to see CEP-15 getting closer to a final implementation.
I'm going to walk through my reasoning on your proposals with an eye to
explaining this to somebody new.

Looking at all the options, the first thing that comes up for me is the
Cassandra project's complicated relationship with NULL. We have prior art
with EXISTS/NOT EXISTS when creating new tables (CREATE TABLE IF NOT
EXISTS), and IS NULL/IS NOT NULL is used in materialized views similarly to
proposals 2, 4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
  AS SELECT [ (column_list) ]
  FROM [keyspace_name.]table_name
  [ WHERE column_name IS NOT NULL
  [ AND column_name IS NOT NULL ... ] ]
  [ AND relation [ AND ... ] ]
  PRIMARY KEY ( column_list )
  [ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

Based on that, I believe 1 and 3 would just confuse users, so -1 on those.

Trying to explain the difference between row and column operations with
LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds more syntax.

#5 is verbose but, in my opinion, easier to reason about when writing a query.
Thinking top down, I need to know if these exact rows and/or column values
exist before changing them, so I'll define them first. Then I'll iterate
over the state I created in my actual changes so I know I'm changing
precisely what I want.

#5 could go a bit further to be clearer to somebody who doesn't write CQL
queries daily, so they wouldn't have to memorize subtle differences. It
should be similar to all the other syntax, so learning a little about CQL
lets you move into more without completely re-learning the new syntax.

So I propose #6)
BEGIN TRANSACTION
  LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
  LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1, row2
  IF row1 IS NULL AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

I added the SELECT in the LET so it's straightforward to read: it's just
like doing a regular select, but you are assigning the result to a
variable.

I removed the confusing 'row1.v' and replaced it with 'row1'. I can't see
why you would need the '.v' vs. having the complete variable I created in
the statement above.
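The row-versus-column distinction behind proposal #6 can be modelled in a few lines of Python. This is an illustrative, single-threaded sketch with a hypothetical dict standing in for ks.tbl; it says nothing about how Accord actually executes transactions. A LET over a missing row yields NULL (None), while a row that exists but has a null column yields a row whose v is None:

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for ks.tbl: primary key (k, c) -> non-key columns.
Table = Dict[Tuple[int, int], Dict[str, Optional[int]]]


def let_select(table: Table, k: int, c: int) -> Optional[Dict[str, Optional[int]]]:
    """LET rowN = SELECT * FROM ks.tbl WHERE k=? AND c=?  (None when no row matches)."""
    row = table.get((k, c))
    return dict(row) if row is not None else None


def proposal_6(table: Table) -> bool:
    """Do both reads first, then apply the INSERT only if the IF condition holds."""
    row1 = let_select(table, 0, 0)
    row2 = let_select(table, 1, 0)
    # row1 IS NULL: the whole row is missing.
    # row2.v = 3: not true when row2 is missing or its v column is null.
    if row1 is None and row2 is not None and row2["v"] == 3:
        table[(0, 0)] = {"v": 1}  # INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1)
        return True
    return False


tbl: Table = {(1, 0): {"v": 3}}
print(proposal_6(tbl))  # True: row (0, 0) was missing, so it is inserted
print(proposal_6(tbl))  # False: row (0, 0) now exists

# A present row with a null column is not the same as a missing row.
tbl2: Table = {(0, 0): {"v": None}, (1, 0): {"v": 3}}
print(let_select(tbl2, 0, 0))  # {'v': None}
```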

EOL

Patrick

On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe 
wrote:

> ...and one more option...
>
> 5.) Introduce tuple assignments, removing all ambiguity around row vs.
> column operations.
>
> BEGIN TRANSACTION
>   LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
>   LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1.v, row2.v
>   IF row1 IS NULL AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
>
>
> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe 
> wrote:
>
>> via Benedict, here is a 4th option:
>>
>> 4.) Similar to #2, but don't rely on the key element being NULL.
>>
>> If the read returns no result, x effectively becomes NULL. Otherwise, it
>> remains true/NOT NULL.
>>
>> BEGIN TRANSACTION
>>   LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>   SELECT x, row2_v
>>   IF x IS NULL AND row2_v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> Hello again everyone!
>>>
>>> I've been working on a prototype in
>>> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed
>>> on in this thread. One thing that isn't immediately obvious to me is how
>>> the LET syntax handles cases where we want to check for the plain existence
>>> of a row in IF. For example, in this hybrid of the originally proposed
>>> syntax and something more like what we've agreed on (and the RETURNING just
>>> to distinguish between that and SELECT), this could be pretty
>>> straightforward:
>>>
>>> BEGIN TRANSACTION
>>>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>   RETURNING row1.v, row2.v
>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>   END IF
>>> COMMIT TRANSACTION
>>>
>>> The NOT EXISTS operator has row1 to work with. On the other hand, w/
>>> the LET syntax and no naming of reads, it's not clear what the best
>>> solution would be. Here are a few possibilities:
>>>
>>> 1.) Provide a few built-in functions that operate on a whole result row.
>>> If we assume a SQL style IS NULL and IS NOT NULL (see my last post here)
>>> for operations on particular columns, this probably eliminates the need for
>>> EXISTS/NOT EXISTS as well.
>>>
>>> BEGIN TRANSACTION
>>>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>>>   LET row2_v = v 

Re: help for a side project

2022-06-29 Thread Patrick McFadin
Hi Norman,

In short, any time you add a node to a cluster there will be a
redistribution of data, and how much moves depends on how many nodes are
already in the cluster. VNodes just create smaller chunks and distribute
them around the cluster more. If you have a 3-node cluster with RF=1 (for
simplicity's sake) and add 1 node, every existing node has to reduce its
responsibility from 1/3 of the cluster data to 1/4. The new node will need
to accept 1/4 of the total cluster data as part of joining. That's the
basics, but you can extrapolate from there.
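The arithmetic above can be checked with a toy token ring in Python. This is a simplified sketch under stated assumptions: RF=1, 256 pseudo-random vnode tokens per node derived from MD5 (Cassandra's default partitioner is Murmur3, and newer versions can use a token-allocation algorithm instead of random tokens), and hypothetical node names n1 to n4. It illustrates why adding vnodes does not reshuffle everything: a new node's tokens only split existing ranges, so the only data that moves is the slices the new node takes over, roughly 1/4 of the ring here, taken a little from every existing node.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple

RING_SIZE = 2**64
Ring = Tuple[List[int], List[str]]  # sorted tokens, and the node owning each token


def token(label: str) -> int:
    """Place a label on the ring. MD5 is an illustrative choice, not Murmur3."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


def build_ring(nodes: List[str], num_tokens: int) -> Ring:
    """Give every node num_tokens pseudo-random vnode tokens."""
    pairs = sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(num_tokens))
    return [t for t, _ in pairs], [n for _, n in pairs]


def owner(tokens: List[int], owners: List[str], key_token: int) -> str:
    """A key belongs to the first vnode token >= its token, wrapping past the end."""
    i = bisect.bisect_left(tokens, key_token)
    return owners[i % len(tokens)]


def ownership(tokens: List[int], owners: List[str]) -> Dict[str, float]:
    """Fraction of the ring per node; vnode i owns the range (tokens[i-1], tokens[i]]."""
    share: Dict[str, float] = {}
    for i, t in enumerate(tokens):
        width = (t - tokens[i - 1]) % RING_SIZE  # i == 0 wraps around to the last token
        share[owners[i]] = share.get(owners[i], 0.0) + width / RING_SIZE
    return share


before = build_ring(["n1", "n2", "n3"], num_tokens=256)
after = build_ring(["n1", "n2", "n3", "n4"], num_tokens=256)

keys = [token(f"key-{i}") for i in range(10_000)]
moved = [k for k in keys if owner(*before, k) != owner(*after, k)]

# Existing nodes never trade data with each other: every moved key goes to n4.
assert all(owner(*after, k) == "n4" for k in moved)
print(f"moved {len(moved) / len(keys):.1%} of sampled keys")
print({node: round(share, 3) for node, share in sorted(ownership(*after).items())})
```

More tokens per node make each node's share land closer to exactly 1/4, at the cost of more, smaller ranges to stream and track.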

I would be happy to get on zoom and talk it over. Here's my scheduling
link: https://calendly.com/patrick-mcfadin/30min_zoom

Patrick

On Wed, Jun 29, 2022 at 5:13 AM Norman Menfel  wrote:

> Hi all,
>
> apologies for writing to this mailing list but I tried the user mailing
> list, 2 slack channels, reddit and 3 discord channels and got horrible and
> confused answers.
>
> I'm working on a school project trying to reproduce the tokens
> distribution algorithm described in the dynamo db paper. All I want to
> build is a cluster where nodes can join/leave managing vnodes distribution
> just like in Cassandra (I don't care about r/w, replication,)
>
> I believe I understand how everything works without vnodes. But everything
> stops making sense when introducing vnodes. For example, when a new node
> joins a cluster
> new vnodes need to be created. Why adding vnodes does not create a massive
> redistribution of data in the cluster? afterall, adding vnodes means that
> every vnode in the cluster has to "give up" some data to other vnodes in
> order to keep a balanced load across the cluster.
>
> From the documentation it seems like only the portion of the ring
> associated with the node should suffer this redistribution, but why does a
> node have a portion of the partition ring associated with it when the
> vnodes stored on the node may be from any portion of the ring?
>
> As you can see, I'm quite confused! I understand that to give me a full
> answer may take you too much time but if you could just point me in the
> right direction, tell me where should I look in the source code, or share
> some links (I've already read everything on the Apache
> website/DataStax. I've even read Riak documentation trying to find clues)
> that would be amazing!
>
> Thanks a lot for your time and keep up the great work, I love Cassandra!
> Norman
>


Re: The Apache Cassandra(R) Corner Podcast

2022-06-27 Thread Patrick McFadin
Apache Cassandra(R) Corner Podcast doesn't exactly flow. Why not keep
"Cassandra Corner"? That falls under the nominative use guidelines for
Apache Software Foundation trademarks.

Patrick

On Mon, Jun 27, 2022 at 12:18 PM Nate McCall  wrote:

>
>
>>
>> Are there any objections to this approach?
>>
>>
> Linking to a podcast is just like linking to a presentation or a blog
> entry back in the day. As long as the trademark keeps being respected (good
> effort so far!) I don't see any issue with it. Thanks for taking initiative
> on this, Aaron.
>
> -Nate
>


Re: CEP-15 multi key transaction syntax

2022-06-11 Thread Patrick McFadin
I think the syntax is evolving into something pretty complicated, which may
be warranted but I wanted to take a step back and be a bit more reflective
on what we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL
transactions across different systems. That's my personal bias when I see
the word "database transaction" in this case. When you start a SQL
transaction, you are creating a branch of your data that you can operate
on until you reach your desired state and then merge it back with a
commit. Or if you don't like what you see, use a rollback and act like it
never happened. That was the thinking when I asked about interactive
sessions. If you are using a driver, that all happens in a batch. I realize
that is out of scope here, but that's probably knowledge that is
pre-installed in the majority of the user community.
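For readers who haven't lived with SQL transactions, the branch/commit/rollback model described above looks like this with Python's built-in `sqlite3` module. SQLite is only an illustrative stand-in for "a SQL database"; this is neither CQL nor the proposed Accord syntax:

```python
import sqlite3

# isolation_level=None disables the module's implicit transactions, so BEGIN,
# COMMIT and ROLLBACK are issued explicitly, as in an interactive SQL session.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tbl (k INTEGER PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO tbl VALUES (1, 0)")


def read_v(k: int) -> int:
    return conn.execute("SELECT v FROM tbl WHERE k = ?", (k,)).fetchone()[0]


# Branch the data, inspect it, then throw the branch away.
conn.execute("BEGIN")
conn.execute("UPDATE tbl SET v = 42 WHERE k = 1")
inside = read_v(1)          # the session sees its own uncommitted change
conn.execute("ROLLBACK")
after_rollback = read_v(1)  # back to the pre-transaction value

# Same change, but this time merge it back with COMMIT.
conn.execute("BEGIN")
conn.execute("UPDATE tbl SET v = 42 WHERE k = 1")
conn.execute("COMMIT")
after_commit = read_v(1)

print(inside, after_rollback, after_commit)  # 42 0 42
```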

Getting to the point: developer experience. I'm seeing a philosophical
fork in the road, which hopefully will generate some comments from the
larger user community.

Path 1)
Mimic what's already been available in the SQL community, using existing
CQL syntax. (SQL Example using JDBC:
https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people
on what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org 
wrote:

> This might also permit us to remove one result set (the success/failure
> one) and return instead an exception if the transaction is aborted. This is
> also more consistent with SQL, if memory serves. That might conflict with
> returning the other result sets in the event of abort (though that’s up to
> us ultimately), but it feels like a nicer API for the user – depending on
> how these exceptions are surfaced in client APIs.
>
>
>
> *From: *bened...@apache.org 
> *Date: *Friday, 10 June 2022 at 19:59
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> So, thinking on it myself some more, I think if there’s an option that
> *doesn’t* require the user to reason about the point at which the read
> happens in order to understand how the condition is applied would probably
> be better.
>
>
>
> What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?
>
>
>
> It’s compatible with more advanced IF functionality later, and probably
> not much trickier to implement?
>
>
>
> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we
> only get one chance to make this API right.
>
>
>
>
>
> *From: *Blake Eggleston 
> *Date: *Friday, 10 June 2022 at 18:56
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Yeah I think that’s intuitive enough. I had been thinking about multiple
> condition branches, but was thinking about something closer to
>
> IF select.column=5
>   UPDATE ... SET ... WHERE key=1;
> ELSE IF select.column=6
>   UPDATE ... SET ... WHERE key=2;
> ELSE
>   UPDATE ... SET ... WHERE key=3;
> ENDIF
> COMMIT TRANSACTION;
>
> Which would make the proposed COMMIT IF we're talking about now a
> shorthand. Of course this would be follow on work.
>
>
>
>
> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
>
>
>
> I imagine that conditions would be evaluated against the state prior to
> the execution of statement against which it is being evaluated, but after
> the prior statements. I *think* that should be OK to reason about.
>
>
>
> i.e. we might have a contrived example like:
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>
>
>
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>
>
>
> I think this is probably adequately intuitive? It is a bit atypical to
> have conditions that wrap the whole transaction though.
>
>
>
> We have another option, of course: offering IF x ROLLBACK TRANSACTION,
> which is closer to SQL. It would translate the above to:
>
>
>
> BEGIN TRANSACTION
>
> SELECT a FROM tbl WHERE k = 1 AS q0
>
> IF q0.a != 0 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> IF q1.a != 1 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION
>
>
>
> This is less succinct, but might be more familiar to users. We could also
> eschew the ability to read from UPDATE statements entirely in this scheme,
> as this would then look very much like SQL.
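The ROLLBACK variant can be sketched the same way, again as a hypothetical Python model rather than a real API. Each IF ... ROLLBACK check reads the current staged state. This is the SQL-like reading, where no values are read from UPDATE statements. Any failed check discards every staged write.

```python
import copy
from typing import Callable, Dict, List

Table = Dict[int, Dict[str, int]]
Step = Callable[[Table], None]

class Rollback(Exception):
    """Signals IF ... ROLLBACK TRANSACTION: discard all staged writes."""

def run(table: Table, steps: List[Step]) -> bool:
    staged = copy.deepcopy(table)  # writes are staged until COMMIT
    try:
        for step in steps:
            step(staged)
    except Rollback:
        return False               # `table` is left untouched
    table.clear()
    table.update(staged)           # COMMIT TRANSACTION
    return True

def rollback_unless(pred: Callable[[Table], bool]) -> Step:
    def step(state: Table) -> None:
        if not pred(state):
            raise Rollback
    return step

def set_a(key: int, fn: Callable[[int], int]) -> Step:
    def step(state: Table) -> None:
        state[key]["a"] = fn(state[key]["a"])
    return step

steps = [
    rollback_unless(lambda s: s[1]["a"] == 0),  # SELECT ... AS q0; IF q0.a != 0 ROLLBACK
    set_a(1, lambda a: 1),                      # UPDATE tbl SET a = 1 WHERE k = 1
    rollback_unless(lambda s: s[1]["a"] == 1),  # re-read; IF a != 1 ROLLBACK
    set_a(1, lambda a: a + 1),                  # final UPDATE, modeled as a = a + 1
]

tbl = {1: {"a": 0}}
print(run(tbl, steps), tbl)      # True {1: {'a': 2}}
other = {1: {"a": 5}}
print(run(other, steps), other)  # False {1: {'a': 5}}
```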
>
>
>
>
>
> *From: *Blake Eggleston 
> *Date: *Wednesday, 8 June 2022 at 20:59
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> > It affects not just RETURNING but also conditions that are evaluated
> against the row, and if we in future permit using the values from one
> select in a function call / write to another table (which I imagine we
> will).
>
> I hadn’t thought about that... using intermediate or 

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin
Love the Oracle/Postgres RETURNING syntax and just generally +1 to adding
INSERT and DELETE.


And for each DML type...
>  - INSERT ... RETURNING returns inserted data (useful for defaulted or
> autoincrement columns).
>  - UPDATE ... RETURNING returns the modified data.
>  - DELETE ... RETURNING returns the now-deleted data.
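Below is a toy Python model of the three RETURNING behaviors listed above. The `Table` class and its method names are hypothetical stand-ins. They are not a Cassandra or driver API.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]

class Table:
    """Toy single-key table illustrating RETURNING semantics (hypothetical)."""

    def __init__(self, defaults: Row):
        self.defaults = defaults         # values for columns the INSERT omits
        self.rows: Dict[int, Row] = {}

    def insert_returning(self, key: int, **values: Any) -> Row:
        row = {**self.defaults, **values}
        self.rows[key] = row
        return dict(row)                 # inserted data, defaulted columns included

    def update_returning(self, key: int, **values: Any) -> Row:
        self.rows[key].update(values)
        return dict(self.rows[key])      # the modified data

    def delete_returning(self, key: int) -> Optional[Row]:
        return self.rows.pop(key, None)  # the now-deleted data, if any

t = Table(defaults={"status": "new"})
print(t.insert_returning(1, name="pinto"))   # {'status': 'new', 'name': 'pinto'}
print(t.update_returning(1, status="sold"))  # {'status': 'sold', 'name': 'pinto'}
print(t.delete_returning(1))                 # {'status': 'sold', 'name': 'pinto'}
```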
>
>


Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin
Oops. I missed this part: "full primary key or a limit of 1"

Still curious what the end-user would see if there is more than one row
returned.

On Sat, Jun 4, 2022 at 5:46 PM Patrick McFadin  wrote:

> I've been waiting for this email! I'll echo what Jeff said about how
> exciting this is for the project.
>
> On the SELECT inside the transaction:
>
> In the first example, I'm assuming that you are doing a select on a
> partition key and expect only one result, but is any valid CQL SELECT
> allowed here? If 'model' were an indexed non-partition-key column, then you
> could potentially get multiple rows back, and that isn't an allowed
> operation. Are only partition key lookups allowed, or is there some logic
> that checks for a single row?
>
> I'm asking because I can see this in reverse time-series models, where you
> select the latest temperature:
>   SELECT temperature FROM weather_station WHERE id=1234 AND
> DATE='2022-06-04' LIMIT 1;
>
> (also, horrible example. Everyone knows that the return value for a
> Pinto.is_running will always evaluate to FALSE)
>
> On COMMIT TRANSACTION:
>
> So much to unpack here. In the case that the condition is met, is the
> mutation applied at that point, or has it already happened and there is
> something like a rollback segment? What happens when the condition is
> not met, and what is presented to the end user? More importantly, what
> happens with respect to the A & I in ACID when the transaction is applied?
>
> If UPDATE is used, returning the number of rows changed would be helpful.
>
> Is this something that can be done interactively in cqlsh or does it all
> have to be submitted in one statement block?
>
> I'll stop here for now.
>
> Patrick
>
> On Sat, Jun 4, 2022 at 3:34 PM bened...@apache.org 
> wrote:
>
>> > The returned result set is after the updates are applied?
>>
>> Returning the prior values is probably more powerful, as you can perform
>> unconditional updates and respond to the prior state, which you otherwise
>> would not know. It’s also simpler to implement.
>>
>>
>>
>> My inclination is to require that SELECT statements are declared first,
>> so that we leave open the option of (in future) supporting SELECT
>> statements in any place in the transaction, returning the values as of
>> their position in a sequential execution of the statements.
>>
>>
>>
>> > And would you allow a transaction that had > 1 named select and no
>> modification statements, but commit if 1=1 ?
>>
>>
>>
>> My preference is that the IF condition be optional anyway, as omitting it
>> is much more obvious to a user than concocting some always-true condition.
>> But yes, read-only transactions involving multiple tables will definitely
>> be supported.
>>
>>
>>
>>
>>
>> *From: *Jeff Jirsa 
>> *Date: *Saturday, 4 June 2022 at 22:49
>> *To: *dev@cassandra.apache.org 
>> *Subject: *Re: CEP-15 multi key transaction syntax
>>
>>
>> And would you allow a transaction that had > 1 named select and no
>> modification statements, but commit if 1=1 ?
>>
>> > On Jun 4, 2022, at 2:45 PM, Jeff Jirsa  wrote:
>> >
>> > 
>> >
>> >> On Jun 3, 2022, at 8:39 AM, Blake Eggleston 
>> wrote:
>> >>
>> >> Hi dev@,
>> >
>> > First, I’m ridiculously excited to see this.
>> >
>> >>
>> >> I’ve been working on a draft syntax for Accord transactions and wanted
>> to bring what I have to the dev list to solicit feedback and build
>> consensus before moving forward with it. The proposed transaction syntax is
>> intended to be an extended batch syntax. Basically batches with selects,
>> and an optional condition at the end. To facilitate conditions against an
>> arbitrary number of select statements, you can also name the statements,
>> and reference columns in the results. To cut down on the number of
>> operations needed, select values can also be used in updates, including
>> some math operations. Parameterization of literals is supported the same as
>> other statements.
>> >>
>> >> Here's an example selecting a row from 2 tables, and issuing updates
>> for each row if a condition is met:
>> >>
>> >> BEGIN TRANSACTION;
>> >> SELECT * FROM users WHERE name='blake' AS user;
>> >> SELECT * from cars WHERE model='pinto' AS car;
>> >> UPDATE users SET miles_driven = user.miles_driven + 30 WHERE
>> name='blake';
>> >> UPDATE cars SET miles_driven =

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin
I've been waiting for this email! I'll echo what Jeff said about how
exciting this is for the project.

On the SELECT inside the transaction:

In the first example, I'm assuming that you are doing a select on a
partition key and expect only one result, but is any valid CQL SELECT
allowed here? If 'model' were an indexed non-partition-key column, then you
could potentially get multiple rows back, and that isn't an allowed
operation. Are only partition key lookups allowed, or is there some logic
that checks for a single row?

I'm asking because I can see this in reverse time-series models, where you
select the latest temperature:
  SELECT temperature FROM weather_station WHERE id=1234 AND
DATE='2022-06-04' LIMIT 1;

(also, horrible example. Everyone knows that the return value for a
Pinto.is_running will always evaluate to FALSE)

On COMMIT TRANSACTION:

So much to unpack here. In the case that the condition is met, is the
mutation applied at that point, or has it already happened and there is
something like a rollback segment? What happens when the condition is
not met, and what is presented to the end user? More importantly, what
happens with respect to the A & I in ACID when the transaction is applied?

If UPDATE is used, returning the number of rows changed would be helpful.

Is this something that can be done interactively in cqlsh or does it all
have to be submitted in one statement block?

I'll stop here for now.

Patrick

On Sat, Jun 4, 2022 at 3:34 PM bened...@apache.org 
wrote:

> > The returned result set is after the updates are applied?
>
> Returning the prior values is probably more powerful, as you can perform
> unconditional updates and respond to the prior state, which you otherwise
> would not know. It’s also simpler to implement.
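This minimal Python sketch shows why returning the prior values may be the more powerful choice. An unconditional write that hands back the previous value lets the client react to a state it could not otherwise observe. Echoing the new value, by contrast, tells the client nothing it did not already send. The lease example and the function name are hypothetical.

```python
from typing import Dict, Optional

def set_returning_prior(table: Dict[str, str], key: str, value: str) -> Optional[str]:
    """Unconditionally write `value` and return what was stored before (if anything)."""
    prior = table.get(key)
    table[key] = value
    return prior

leases: Dict[str, str] = {}
print(set_returning_prior(leases, "job-42", "node-a"))  # None: there was no previous holder
print(set_returning_prior(leases, "job-42", "node-b"))  # node-a: node-b learns whom it displaced
```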
>
>
>
> My inclination is to require that SELECT statements are declared first, so
> that we leave open the option of (in future) supporting SELECT statements
> in any place in the transaction, returning the values as of their position
> in a sequential execution of the statements.
>
>
>
> > And would you allow a transaction that had > 1 named select and no
> modification statements, but commit if 1=1 ?
>
>
>
> My preference is that the IF condition be optional anyway, as omitting it
> is much more obvious to a user than concocting some always-true condition.
> But yes, read-only transactions involving multiple tables will definitely
> be supported.
>
>
>
>
>
> *From: *Jeff Jirsa 
> *Date: *Saturday, 4 June 2022 at 22:49
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
>
> And would you allow a transaction that had > 1 named select and no
> modification statements, but commit if 1=1 ?
>
> > On Jun 4, 2022, at 2:45 PM, Jeff Jirsa  wrote:
> >
> > 
> >
> >> On Jun 3, 2022, at 8:39 AM, Blake Eggleston 
> wrote:
> >>
> >> Hi dev@,
> >
> > First, I’m ridiculously excited to see this.
> >
> >>
> >> I’ve been working on a draft syntax for Accord transactions and wanted
> to bring what I have to the dev list to solicit feedback and build
> consensus before moving forward with it. The proposed transaction syntax is
> intended to be an extended batch syntax. Basically batches with selects,
> and an optional condition at the end. To facilitate conditions against an
> arbitrary number of select statements, you can also name the statements,
> and reference columns in the results. To cut down on the number of
> operations needed, select values can also be used in updates, including
> some math operations. Parameterization of literals is supported the same as
> other statements.
> >>
> >> Here's an example selecting a row from 2 tables, and issuing updates
> for each row if a condition is met:
> >>
> >> BEGIN TRANSACTION;
> >> SELECT * FROM users WHERE name='blake' AS user;
> >> SELECT * from cars WHERE model='pinto' AS car;
> >> UPDATE users SET miles_driven = user.miles_driven + 30 WHERE
> name='blake';
> >> UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE
> model='pinto';
> >> COMMIT TRANSACTION IF car.is_running;
> >>
> >> This can be simplified by naming the updates with an AS  syntax.
> If updates are named, a corresponding read is generated behind the scenes
> and its values inform the update.
> >>
> >> Here's an example; the query is functionally identical to the previous
> query. In the case of the user update, a read is still performed behind the
> scenes to enable the calculation of miles_driven + 30, but it doesn't need
> to be named since it's not referenced anywhere else.
> >>
> >> BEGIN TRANSACTION;
> >> UPDATE users SET miles_driven += 30 WHERE name='blake';
> >> UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
> >> COMMIT TRANSACTION IF car.is_running;
> >>
> >> Here’s another example, performing the canonical bank transfer:
> >>
> >> BEGIN TRANSACTION;
> >> UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
> >> UPDATE accounts SET balance -= 100 WHERE name='benedict' AS 
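
Blake's proposal combines named reads, values from those reads feeding `+=` updates, and a final COMMIT ... IF over the named results. These semantics can be modeled in a few lines of Python. This is a hypothetical sketch, not the Accord implementation. It assumes that a named UPDATE exposes the row as read before its own write, which matches Benedict's inclination to return prior values. It also assumes that nothing is applied when the condition fails.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple

Db = Dict[str, Dict[str, Dict[str, Any]]]  # table -> key -> row
# (table, key, column, delta, alias): UPDATE table SET column += delta WHERE key [AS alias]
Increment = Tuple[str, str, str, int, Optional[str]]

def transaction(db: Db, updates: List[Increment],
                condition: Callable[[Dict[str, Dict[str, Any]]], bool]) -> bool:
    staged = copy.deepcopy(db)
    named: Dict[str, Dict[str, Any]] = {}
    for table, key, column, delta, alias in updates:
        row = staged[table][key]      # += always needs a read behind the scenes
        if alias is not None:
            named[alias] = dict(row)  # ...but only named reads are visible to IF
        row[column] += delta
    if not condition(named):
        return False                  # COMMIT ... IF failed: nothing is applied
    db.clear()
    db.update(staged)
    return True

db = {
    "users": {"blake": {"miles_driven": 100}},
    "cars": {"pinto": {"miles_driven": 5000, "is_running": False}},
}
updates = [
    ("users", "blake", "miles_driven", 30, None),  # UPDATE users SET miles_driven += 30 ...
    ("cars", "pinto", "miles_driven", 30, "car"),  # UPDATE cars ... AS car
]
print(transaction(db, updates, lambda n: n["car"]["is_running"]))  # False
```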

Re: Adding RSS feed to the Apache Cassandra website.

2022-05-31 Thread Patrick McFadin
+1 on option 2. Eliminating human steps is how it stays up to date.

On Tue, May 31, 2022 at 5:25 AM Brandon Williams  wrote:

> +1 to Anthony, that seems like the best path to me too.
>
> On Tue, May 31, 2022, 7:15 AM Anthony Grasso 
> wrote:
>
>> This is a good idea!
>>
>> I think option 2 is the best way to go. Currently, there are manual steps
>> involved to publish a post to the blog. I would like to avoid adding more
>> manual work.
>>
>> We could implement option 2 either by:
>>
>>    - Bolting on JavaScript for Antora to use to generate the RSS XML
>>    - Adding a Python script that gets called (probably after the HTML is
>>      generated) to generate the RSS XML
>>
>> The Python script would be the easiest to implement and maintain.
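For reference, here is a minimal sketch of what such a post-build Python script might look like. It uses only the standard library to emit RSS 2.0. Several details are assumptions: the post metadata fields (`title`, `path`, `date`), the channel title and description, the example URL, and passing posts in as a list. A real script would read this data from wherever the blog sources are stored.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, List

def build_rss(site_url: str, posts: List[Dict]) -> str:
    """Return an RSS 2.0 document listing `posts`, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Apache Cassandra Blog"  # assumed title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "News from the Apache Cassandra project"
    for post in sorted(posts, key=lambda p: p["date"], reverse=True):
        link = site_url.rstrip("/") + "/" + post["path"].lstrip("/")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link
        # RSS pubDate uses RFC 822 dates; usegmt=True requires a UTC datetime
        ET.SubElement(item, "pubDate").text = format_datetime(post["date"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")

posts = [
    {"title": "Older post", "path": "older.html",
     "date": datetime(2022, 5, 1, tzinfo=timezone.utc)},
    {"title": "Newer post", "path": "newer.html",
     "date": datetime(2022, 5, 31, tzinfo=timezone.utc)},
]
print(build_rss("https://example.org/blog", posts))  # hypothetical site URL
```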
>>
>> Regards,
>>
>> On Tue, 31 May 2022 at 10:12, Erick Ramirez 
>> wrote:
>>
>>> Thanks for coordinating this. I'm happy to incorporate the manual
>>> process (option 1) into my workflow for reviewing/publishing blog PRs
>>> right away, as a quick solution, if our intention is to go with option 2.
>>>
>>> FWIW by "workflow" I mean step 6 of the Pipeline Overview documented in
>>> the wiki here -- https://cwiki.apache.org/confluence/x/-6rkCw. Cheers!
>>>
>>


Apache Cassandra Marketing Meeting - 5/26

2022-05-24 Thread Patrick McFadin
Hi everyone,

It's been almost a month since our last meeting, and a lot is happening.
4.1 is moving along. Cassandra World Party is ready to celebrate this
release, and promotions have started. We had a great turnout at the last
meeting with good participation.

Everyone is invited to participate. The only prerequisite is a desire to
get the good word out about our favorite database.

Here's our notes doc from the last meeting with an updated agenda:
https://docs.google.com/document/d/11ANJcz1BuXcTPoVZNo4o00SMp6smVzQufvXsrqs0IlE/edit?usp=sharing

When: May 26th, 0900 PST - 1700 GMT
Calendar link:
https://calendar.google.com/event?action=TEMPLATE=NTNsaTZnZ3FoNTZpdjJiY21lbGltamFyc3Ega2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com

Where - direct link to the zoom: https://datastax.zoom.us/j/390839037

See you Thursday,

Patrick


Re: Cassandra World Party in July; CFP open!

2022-05-21 Thread Patrick McFadin
I'm really looking forward to this event. Last year was great and had the
same casual fun as a meetup. With the addition of some in-person meetups
this will be a great time to get out and meet some of your fellow Cassandra
community members.

An important footnote: this event is meant to coincide with the latest
release, which will be 4.1. By July it may be fully released or still in beta
or RC; it doesn't matter. Hopefully we'll have a few lightning talks about
some of the new features. Since these are lightning talks, a perfectly valid
title could be "Why I'm excited about Guardrails", followed by a story about
the time someone put a secondary index on every column in a 25-column table.
You might get a few knowing nods from others watching (and a few laughs).

KubeCon EU was last week, and I heard that there were some organic
Cassandra community gatherings here and there. We have a lot to share with
each other, and this event gives us a place to do it. Hope to see you there.

Patrick

On Fri, May 20, 2022 at 3:54 PM Whitney True  wrote:

> Apache Cassandra Community,
>
> We are excited to announce the second annual Apache Cassandra World Party
> will be held *Wednesday, July 20, 2022*. The goal of the event is to
> celebrate the forthcoming release and the Cassandra community.
>
> *Event Details:*
> The Cassandra World Party 2022 will closely mirror the 2021 Cassandra
> World Party (https://cassandra.apache.org/_/blog/World-Party.html) with
> an in-person meetup option in select cities.*
>
> The event will include three (3) 1-hour virtual events that consist of
> moderated lightning talks. In addition, multiple physical meetups will be
> hosted for anyone who would like to attend the virtual event with
> community members who are located in the area or who are interested in
> traveling.
>
> *Community sponsors will host the in-person meetups. (If you are
> interested in becoming a sponsor, please email events (at) constantia (dot)
> io.)
>
> *The CFP is officially open and will close Sunday, June 19 at 11:59pm PT*.
> Please submit talk proposals here:
> https://sessionize.com/apache-cassandra-4-1-world-party/
>
> We encourage and welcome contributors, end users and community members to
> submit fun, fast-paced, 5-minute “lightning”-style talks that focus on:
>
>- What you love about Cassandra
>- Why you’re passionate about Cassandra
>- What you’ve contributed to Cassandra - especially 4.1
>- What you’ve learned using Cassandra
>- What you’ve built with Cassandra
>- How you’ve successfully scaled Cassandra
>
> For those who have recently submitted an Apache Cassandra-related talk for
> ApacheCon 2022, please consider using this as an opportunity to present a
> super condensed (and fast!) version of that talk.
>
> *Event Times:*
>
>- July 20, 2022, 6 am PT / 1 pm UTC
>- July 20, 2022, 2 pm PT / 9 pm UTC
>- July 20, 2022, 10 pm PT / July 21, 6 am UTC
>
> *Event Overview:*
>
>- 15 min: opening from moderators / contributors
>- 30 min: 3-5 lightning talks on all things Cassandra
>- 10 min: discussion or interaction
>- 5 min: closing
>
> We are very excited for this year's event and hope you will be, too!
>
> -- The Constantia team
>


Re: Appetite for a 4.1-alpha1 ?

2022-05-18 Thread Patrick McFadin
Nothing promotable like a shiny new release tag!

On Wed, May 18, 2022 at 10:50 AM C. Scott Andreas 
wrote:

> Yep, supportive of anything that has the potential to increase the number
> of users evaluating a pre-release build. Alpha sounds great.
>
> On May 18, 2022, at 9:29 AM, David Capwell  wrote:
>
>
> Works for me
>
> On May 18, 2022, at 7:36 AM, Josh McKenzie  wrote:
>
> +1 from me on the grounds that I expect users to be more inclined to test
> an alpha build of 4.1 rather than finding and pulling down a nightly.
> Expectations of stability differ.
>
> On Wed, May 18, 2022, at 10:18 AM, Stefan Miklosovic wrote:
>
> Hi Mick,
>
> I do not mind having alpha1 out. It will help me set up all the build
> pipelines for our plugins/tools/libraries; right now I cannot build
> them because no snapshot is released anywhere, nor can I depend on it
> in Maven projects, for example.
>
> So yeah, +1 from me.
>
> Regards
>
> On Wed, 18 May 2022 at 11:40, Mick Semb Wever  wrote:
> >
> > Our release lifecycle docs¹ imply that we can release alphas despite
> > flaky test failures, which means we can cut and vote on a 4.1-alpha1
> > release today. This is also on the presumption that point (2) on our
> > Cassandra CI Process docs² does not apply to pre-beta releases.
> >
> > Is there an appetite for this?
> > Any objections?
> > Any tickets about to land folk want us to wait on?
> >
> > regards,
> > Mick
> >
> >
> > 1) https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
> >
> > 2) https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+CI+Process
>
>
>


Apache Cassandra Marketing Meeting

2022-04-29 Thread Patrick McFadin
Hi everyone,

There are several community members working on various marketing efforts to
get out the good word on Cassandra. I've been asked by a few to help
facilitate a meeting where we can share and coordinate.

I've taken the first step by creating a calendar item with a zoom link.

When: May 5th, 0900 PST - 1700 GMT

The basic agenda for the first meeting I've collected so far:

   - 4.1 release press events
   - Cassandra World Party planning
   - ApacheCon - Getting the word out

Here's a handy (but huge) calendar link:
https://calendar.google.com/event?action=TEMPLATE=NTNsaTZnZ3FoNTZpdjJiY21lbGltamFyc3Ega2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com

Direct link to the zoom: https://datastax.zoom.us/j/390839037

Hope to see you there!

Patrick


Re: [DISCUSS] List Apache Cassandra as a "company" on LinkedIn

2022-03-30 Thread Patrick McFadin
Oh and +1 to the idea of making Apache Cassandra a company on LinkedIn.
Same energy as the Twitter handle. Outgoing updates from the project.

On Wed, Mar 30, 2022 at 2:41 PM Patrick McFadin  wrote:

> I agree that is a problem. In the past, I have tried to make these
> meetings as inclusive as possible by offering multiple time zones,
> recording every meeting, and posting the recordings on YouTube with an
> email sent to dev@. What we can't substitute on a mailing list is the
> energy that comes from live brainstorming, which is something of a
> hallmark of the people interested in this kind of work for the project,
> much like meeting once or twice a year at ApacheCon. It's part of an
> overall package of community vibrancy, and none of it is exclusive.
>
> One thing that occurred to me is that everything is getting dumped on dev@.
> Is it time for a marketing@ list for Cassandra?
>
> Patrick
>
> On Wed, Mar 30, 2022 at 12:00 PM Eric Evans  wrote:
>
>>
>>
>> On Wed, Mar 30, 2022 at 3:35 AM Benjamin Lerer  wrote:
>>
>>> Thanks, Erick, for raising the discussion.
>>> My apologies for not responding sooner. The original thread raised
>>> several questions for me and I needed time to think about them.
>>> One question is the LinkedIn Company vs. Group one. I must admit that it
>>> makes sense, but the whole story made me realize my lack of understanding
>>> of how LinkedIn works, and I wanted to explore that more deeply than I did
>>> before creating the group.
>>> Another thing that the thread made me realize is that there are several
>>> people interested in being involved in C* marketing/public relations and
>>> that we probably need to do things in a more mature and open way.
>>> Patrick and I would like to organize a contributor meeting focused on
>>> Apache Cassandra marketing to give everybody a chance to join and
>>> discuss how we could do things better, if people are interested.
>>> I feel that it would help us evolve in this area.
>>>
>>
>> Please bear in mind that contributor meetings like these are exclusive by
>> nature: there is no time suitable for every time zone, not everyone has
>> equal connectivity, and those who don't natively speak English might
>> struggle to keep pace in real time (to name just a few reasons).  This is
>> probably why the ASF is so adamant about the use of email.
>>
>> --
>> Eric Evans
>> eev...@apache.org
>>
>


Re: [DISCUSS] List Apache Cassandra as a "company" on LinkedIn

2022-03-30 Thread Patrick McFadin
I agree that is a problem. In the past, I have tried to make these
meetings as inclusive as possible by offering multiple time zones,
recording every meeting, and posting the recordings on YouTube with an
email sent to dev@. What we can't substitute on a mailing list is the
energy that comes from live brainstorming, which is something of a
hallmark of the people interested in this kind of work for the project,
much like meeting once or twice a year at ApacheCon. It's part of an
overall package of community vibrancy, and none of it is exclusive.

One thing that occurred to me is that everything is getting dumped on dev@.
Is it time for a marketing@ list for Cassandra?

Patrick

On Wed, Mar 30, 2022 at 12:00 PM Eric Evans  wrote:

>
>
> On Wed, Mar 30, 2022 at 3:35 AM Benjamin Lerer  wrote:
>
>> Thanks, Erick, for raising the discussion.
>> My apologies for not responding sooner. The original thread raised
>> several questions for me and I needed time to think about them.
>> One question is the LinkedIn Company vs. Group one. I must admit that it
>> makes sense, but the whole story made me realize my lack of understanding
>> of how LinkedIn works, and I wanted to explore that more deeply than I did
>> before creating the group.
>> Another thing that the thread made me realize is that there are several
>> people interested in being involved in C* marketing/public relations and
>> that we probably need to do things in a more mature and open way.
>> Patrick and I would like to organize a contributor meeting focused on
>> Apache Cassandra marketing to give everybody a chance to join and
>> discuss how we could do things better, if people are interested.
>> I feel that it would help us evolve in this area.
>>
>
> Please bear in mind that contributor meetings like these are exclusive by
> nature: there is no time suitable for every time zone, not everyone has
> equal connectivity, and those who don't natively speak English might
> struggle to keep pace in real time (to name just a few reasons).  This is
> probably why the ASF is so adamant about the use of email.
>
> --
> Eric Evans
> eev...@apache.org
>


Re: New Apache Cassandra Group on LinkedIn

2022-03-14 Thread Patrick McFadin
I think that is a fair perspective based on the history of this project.
I'm not ready to give up on trying to figure it out, though. I worry about
the Cassandra project looking isolated when that's really not the case.

If there is a clear policy on something like a retweet with clear
intentions and within established guidelines, I feel this is possible.

1. A retweet is not an endorsement and is simply for awareness of the
community at large.
2. Back to basics: if the project or product follows ASF trademark policy,
then it is good to go ("Powered by Apache Cassandra", etc.).
3. Content that doesn't adhere to trademark rules is ignored (and possibly
flagged by the PMC for follow-up).

Incentivize the good behaviors and show a thriving ecosystem.
Disincentivize the bad behaviors and help bring more into compliance.

Patrick



On Mon, Mar 14, 2022 at 10:19 AM Eric Evans  wrote:

>
>
> On Wed, Mar 9, 2022 at 12:39 PM Patrick McFadin 
> wrote:
>
>> I'm not sure if they can merge groups, but from what I'm reading that
>> wouldn't work either. What I'm seeing is a desire to not "promote vendors",
>> which I believe is working against the project's self-interest. LinkedIn is
>> the perfect place to do it. The allergic reaction the project has had to
>> vendors has made our ecosystem look weak when that's not really the case.
>> Temporal, Prometheus, Feast, Orkes (to name just a few) all have Cassandra
>> integrations, but you would never know that by looking at any official
>> Cassandra communication, because ecosystem == vendor == bad. The result is
>> that Cassandra looks like an island, and that will never help this project
>> grow.
>>
>
> The problem comes when you try to balance being informative versus
> (creating the impression of) advocacy, while being fair and equitable.
> We've been here, done this, and were never able to walk these lines in a
> way that satisfied a consensus.  It's not worth it IMO.
>
>
>>
>> On Wed, Mar 9, 2022 at 9:20 AM Jeremy Hanna 
>> wrote:
>>
>>> Is it possible to ask someone at LinkedIn to merge the groups together
>>> so that it's managed by the PMC but with the explicit permission of the
>>> people running the other group?  In the past, I know that Twitter does
>>> things like that in terms of handles and followers.  Is that a desirable
>>> outcome?
>>>
>>> On Mar 9, 2022, at 11:00 AM, Benjamin Lerer  wrote:
>>>
>>> Hi Patrick,
>>> Thanks for reaching out. Indeed, the discussion happened among the PMC
>>> members.
>>> To explain the context, we wanted to have an official group on LinkedIn
>>> to publish news about the project as we do through the @cassandra handle
>>> on Twitter. We wanted a group that was vendor-independent and focused on
>>> Apache Cassandra and its ecosystem.
>>> To be fully transparent, we had no idea that you were in charge of the
>>> Apache Cassandra Users group, as it appears to be managed by Lynn Bender
>>> and Joanna Kapel. The group also appears to promote different vendors,
>>> which is something that we wanted to avoid.
>>> Having to post things under Lynn's name was also an issue for us, as we
>>> wanted the credit to go to the right people.
>>>
>>> Now, I am sure that we can work out some solution that will benefit the
>>> community. :-)
>>>
>>> On Wed, Mar 9, 2022 at 3:56 PM Patrick McFadin  wrote:
>>>
>>>> I feel like this needs to be a discussion held on the public mailing
>>>> list. I have been running the Apache Cassandra Users group on LinkedIn for
>>>> years after taking it over from Lynn Bender.
>>>> https://www.linkedin.com/groups/3803052/
>>>>
>>>> The group has over 7,500 members and has had its ups and downs, but it's
>>>> been pretty consistent as a professional resource on LinkedIn. I'm not sure what
>>>> there is to gain by creating competing groups. If we need more managers in
>>>> the group that's fine but somebody just needed to ask. It's clear that this
>>>> discussion happened somewhere else and this was just an announcement.
>>>>
>>>> Patrick
>>>>
>>>> On Thu, Mar 3, 2022 at 3:41 AM Benjamin Lerer 
>>>> wrote:
>>>>
>>>>> Hi everybody,
>>>>>
>>>>> We just created a new Apache Cassandra group on LinkedIn (
>>>>> https://www.linkedin.com/groups/9159443/).
>>>>>
>>>>> This group will be managed by our community and will respect vendor
>>>>> neutrality.
>>>>> Do not hesitate to join and share your experiences or blog posts with
>>>>> us :-)
>>>>>
>>>>
>>>


Re: New Apache Cassandra Group on LinkedIn

2022-03-09 Thread Patrick McFadin
I'm not sure if they can merge groups, but from what I'm reading that
wouldn't work either. What I'm seeing is a desire to not "promote vendors",
which I believe is working against the project's self-interest. LinkedIn is
the perfect place to do it. The allergic reaction the project has had to
vendors has made our ecosystem look weak when that's not really the case.
Temporal, Prometheus, Feast, Orkes (to name just a few) all have Cassandra
integrations, but you would never know that by looking at any official
Cassandra communication, because ecosystem == vendor == bad. The result is
that Cassandra looks like an island, and that will never help this project
grow.

On Wed, Mar 9, 2022 at 9:20 AM Jeremy Hanna 
wrote:

> Is it possible to ask someone at LinkedIn to merge the groups together so
> that it's managed by the PMC but with the explicit permission of the people
> running the other group?  In the past, I know that Twitter does things like
> that in terms of handles and followers.  Is that a desirable outcome?
>
> On Mar 9, 2022, at 11:00 AM, Benjamin Lerer  wrote:
>
> Hi Patrick,
> Thanks for reaching out. Indeed, the discussion happened among the PMC
> members.
> To explain the context, we wanted to have an official group on LinkedIn to
> publish news about the project as we do through the @cassandra handle on
> Twitter. We wanted a group that was vendor-independent and focused on
> Apache Cassandra and its ecosystem.
> To be fully transparent, we had no idea that you were in charge of the
> Apache Cassandra Users group, as it appears to be managed by Lynn Bender
> and Joanna Kapel. The group also appears to promote different vendors,
> which is something that we wanted to avoid.
> Having to post things under Lynn's name was also an issue for us, as we
> wanted the credit to go to the right people.
>
> Now, I am sure that we can work out some solution that will benefit the
> community. :-)
>
> On Wed, Mar 9, 2022 at 3:56 PM Patrick McFadin  wrote:
>
>> I feel like this needs to be a discussion held on the public mailing
>> list. I have been running the Apache Cassandra Users group on LinkedIn for
>> years after taking it over from Lynn Bender.
>> https://www.linkedin.com/groups/3803052/
>>
>> The group has over 7,500 members and has had its ups and downs, but it's
>> been pretty consistent as a professional resource on LinkedIn. I'm not sure what there
>> is to gain by creating competing groups. If we need more managers in the
>> group that's fine but somebody just needed to ask. It's clear that this
>> discussion happened somewhere else and this was just an announcement.
>>
>> Patrick
>>
>> On Thu, Mar 3, 2022 at 3:41 AM Benjamin Lerer  wrote:
>>
>>> Hi everybody,
>>>
>>> We just created a new Apache Cassandra group on LinkedIn (
>>> https://www.linkedin.com/groups/9159443/).
>>>
>>> This group will be managed by our community and will respect vendor
>>> neutrality.
>>> Do not hesitate to join and share your experiences or blog posts with us
>>> :-)
>>>
>>
>


Re: New Apache Cassandra Group on LinkedIn

2022-03-09 Thread Patrick McFadin
I feel like this needs to be a discussion held on the public mailing list.
I have been running the Apache Cassandra Users group on LinkedIn for years
after taking it over from Lynn Bender.
https://www.linkedin.com/groups/3803052/

The group has over 7,500 members and has had its ups and downs, but it's
been pretty consistent as a professional resource on LinkedIn. I'm not sure what there
is to gain by creating competing groups. If we need more managers in the
group that's fine but somebody just needed to ask. It's clear that this
discussion happened somewhere else and this was just an announcement.

Patrick

On Thu, Mar 3, 2022 at 3:41 AM Benjamin Lerer  wrote:

> Hi everybody,
>
> We just created a new Apache Cassandra group on LinkedIn (
> https://www.linkedin.com/groups/9159443/).
>
> This group will be managed by our community and will respect vendor
> neutrality.
> Do not hesitate to join and share your experiences or blog posts with us
> :-)
>


Re: [DISCUSS] CASSANDRA-17292 Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Patrick McFadin
I'm going to put up a red flag about making config file changes of this scale
on a dot release. This should really be a 5.0 consideration.

With that, I would propose a #5. 5.0 nodes will only read the new config
files and reject old config files. If any of you went through the config
file changes from Apache HTTPd 1.3 -> 2.0 you know how much of a lifesaver
that can be for ops. Make it a part of the total upgrade to a new major
version, not a radical change inside of a dot version, and make it a clean
break. No "legacy config" lying around. That's just a recipe for surprises
later if there are new required config values and somebody doesn't even
realize they have some old 4.x yaml files lying around.

Patrick

On Tue, Feb 22, 2022 at 11:51 AM Tibor Répási 
wrote:

> Glad we agree on #4. That feature could be added anytime.
>
> If a version element is added to the YAML, then it is not necessary to
> change the filename, thus we could end up with #3. The value of the version
> element could default to 1 in the first phase, which does not need any
> change for legacy format configuration. New config format must include
> version: 2. When in some later version the support for legacy configuration
> is removed, the default for the version element could be changed to 2 or
> removed.
>
> On 22. Feb 2022, at 19:30, Caleb Rackliffe 
> wrote:
>
> My initial preference would be something like combining #1 and #4. We
> could add something like a simple "version: <1|2>" element to the YAML that
> would eliminate any possible confusion about back-compat *within* a given
> file.
>
> Thanks for enumerating these!
>
> On Tue, Feb 22, 2022 at 10:42 AM Tibor Répási 
> wrote:
>
>> Hi,
>>
>> I like the idea of having cassandra.yaml better structured. As an
>> operator, my primary concern is the transition. How would we change the
>> config structure from legacy to the new one during a rolling upgrade? My
>> thoughts on this:
>>
>> 1. Legacy and new configuration are stored in different files. Cassandra
>> will read the legacy file on startup if it exists, the new one otherwise.
>> It may raise a warning on startup when the legacy file was used.
>>   pros:
>>   - separate files for separate formats
>>   - clean and operator-controlled switch to new format
>>   - already known procedure, e.g. change from PropertyFileSnitch to
>>     GossipingPropertyFileSnitch
>>   cons:
>>   - name of the config file would change from cassandra.yaml to
>>     something else (cassandra_v2.yaml, config.yaml ???)
>>   - would need considerable work to get config to the new format
>>   - format translation not solved
>>
>> 2. An offline configuration converter tool may be available to convert
>> the legacy format to the new one. During package upgrade, if a legacy
>> config is found, the upgrade process should convert the config file to
>> the new format.
>>   pros:
>>   - seamless upgrade process
>>   - tool can be tested properly beforehand
>>   cons:
>>   - may interact badly with configuration management tools controlling
>>     the contents of cassandra.yaml
>>   - poor transparency for operators
>>
>> 3. Cassandra could read both formats, and may warn on startup when the
>> legacy format is found.
>>   pros:
>>   - no filename change
>>   - operator-controlled switch to new format
>>   cons:
>>   - higher complexity in implementation and testing
>>   - format translation not solved
>>
>> 4. An online tool, e.g. a nodetool command, to export the configuration
>> the Cassandra node is currently running with, with a filtering option to
>> suppress default settings.
>>   pros:
>>   - such a nodetool command would be useful independently of changing
>>     the config format, could be added before and support any format
>>   - the bare information is already available in system_views.settings
>>   - could be combined with #1 or #3 to support the format translation
>>   cons: ?
>>
>>
>> My favourite would be #3 + #4, while I would most dislike #2.
>>
>> Tibor
>>
>>
>> On 17. Feb 2022, at 23:13, Caleb Rackliffe 
>> wrote:
>>
>> Hey everyone,
>>
>> There has already been some Slack discussion
>>  around
>> this, but for anyone who doesn't follow that closely, I'd like to lobby
>> more widely for my proposal in CASSANDRA-17292
>>  to eventually
>> move cassandra.yaml toward a more nested structure.
>>
>> The proposal itself is here
>> ,
>> and there has already been some inline discussion, but feel free to drop
>> any feedback there, in the Jira, or here, depending on what you're most
>> comfortable with.
>>
>> Given where we are in the lead-up to 4.1, I have no intention of pushing
>> to adopt any of this for existing config in that release. However, what I
>> think *would* be nice is if we could come to a rough consensus in time
>> to inform work on 

Re: Welcome Anthony Grasso, Erick Ramirez and Lorina Poland as Cassandra committers

2022-02-15 Thread Patrick McFadin
This is a great day for the project. These are three people that have been
contributing continuously to the success of Cassandra users for so many
years I can't even guess. Really makes me happy to see the project mature
into a place where a diversity of contributions are recognized.

Congratulations Lorina, Erick, and Anthony!

Patrick

On Tue, Feb 15, 2022 at 10:30 AM Brandon Williams  wrote:

> Congratulations, well deserved!
>
> On Tue, Feb 15, 2022 at 12:13 PM Benjamin Lerer  wrote:
> >
> > The PMC members are pleased to announce that Anthony Grasso, Erick
> Ramirez and Lorina Poland have accepted the invitation to become committers.
> >
> > Thanks a lot, Anthony, Erick and Lorina for all the work you have done
> on the website and documentation.
> >
> > Congratulations and welcome
> >
> > The Apache Cassandra PMC members
>


Re: [DISCUSS] Non Coding Committers

2022-02-08 Thread Patrick McFadin
Sharan has done a good job shining a spotlight on something that has
created a weird bottleneck in the project. Docs and the Cassandra website
are all in-tree, but it takes somebody who probably isn't even working on
those things to commit any changes. Dinesh nailed it. It's silly. I'm sure
the PMC can come up with a reasonable solution that can be done quickly.
There are a lot of us that love this project that contribute in ways that
don't get compiled into a jar file. This is something that needs to be
solved for the sake of project velocity.

Patrick

On Sun, Feb 6, 2022 at 10:28 PM Mick Semb Wever  wrote:

>
> Thank you Sharan for sharing.
>
>
>> So here in Apache Cassandra I see there is a whole lot of activity
>> happening around the website, marketing, project promotion, blogs, social
>> media - these activities are all contributions to the project. If there are
>> contributions happening in the project that need a committer to action,
>> then it could make sense to consider having committers that are focussed
>> around the 'non coding' parts.
>>
>>
>>
>
> This is so true for us. We are spending a lot of extra time getting these
> non-code contributions across the finish line. The context switching and
> wait time involved in just one more handover, and often across time zones,
> is hurting. And regardless, totally agree we should be formally recognising
> the ongoing work that goes into these non-coding contributions.
>
>


Re: Cassandra project biweekly status update 2022-01-03

2022-01-03 Thread Patrick McFadin
What Ellis said.

On Mon, Jan 3, 2022 at 1:48 PM Jonathan Ellis  wrote:

> +10
>
> Could we post these on the blog as well to reach a wider audience?
>
> On Mon, Jan 3, 2022 at 3:16 PM Mick Semb Wever  wrote:
>
>>
>>
>> /wave Happy 2022 everyone! …
>>> …
>>>
>>> It's been incredibly encouraging to see how active the project has been
>>> in 2021 and I look forward to seeing how things evolve with some of the
>>> upcoming significant CEPs and features this year. Thanks, everyone!
>>
>>
>>
>> These updates are really awesome Josh and also a big part of the
>> project's new activity. Thanks for keeping at them!
>>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: Cassandra project biweekly status update 2021-11-08

2021-11-08 Thread Patrick McFadin
Since I have been replaying Ghost of Tsushima, I felt a haiku would be
appropriate.

my cluster is failing
jConsole to the rescue
now I am failing

On Mon, Nov 8, 2021 at 12:46 PM Joshua McKenzie 
wrote:

> First off - Congrats again to Sumanth Pasupuleti on becoming a committer on
> the project! Well deserved; looking forward to working with you further.
>
> It looks like ponymail got an upgrade; I didn't even realize that was
> possible at this point. :) So caveat emptor: the links I put in here to
> individual email threads are different than in the past but appear to be
> working.
>
> [New contributors getting started]
> There's been some discussion about whether the #cassandra-dev channel with
> 600 people in it is the best place for new contributors to get involved and
> publicly ask beginner questions or whether we should start a new channel
> with a somewhat more limited scope. Please chime in on that dev mailing
> list thread if you have an opinion:
> https://lists.apache.org/thread/x8fx9b22nfll3gd40w4o971cyznckxrz
>
> For new contributors, we recommend starting in one of two places: failing
> tests, or starter tickets we label "lhf" (low-hanging fruit).
> Query for failing tests:
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496=2252
> Query for unassigned starter tickets:
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=2162=2160
>
> We're up from 18 unassigned test failures to 22 in the past couple of
> weeks. David Capwell, Berenguer Blasi, and Ekaterina Dimitrova (and
> others!) have been doing some great work both surfacing failures as well as
> fixing things - thank you!
>
> For unassigned lhf, we're up from 10 to 11 on 4.0.2 (our next minor
> release) and up from 13 to 14 on 4.1.0 (our next major release). Feel free
> to self-select from that list, hit up this email thread or list if you want
> some guidance on where to get involved, ping in the #cassandra-dev slack
> channel on the-asf.slack.com server, or email or message me directly if
> you want any help.
>
> [Dev list discussions in the past 14 days]
> https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:
>
> We have an ongoing discussion about what it means to have a releasable
> trunk and what steps, if any, it'd take to get there. Given the scale and
> complexity of this project and its testing infrastructure, I'm curious to
> hear what other experiences people have had with applying select CI and CD
> principles to an ecosystem like this:
> https://lists.apache.org/thread/kyyo5k3my2nx160mfgy0xkwo8xjh2qpv
>
> As mentioned above, there's an ongoing discussion about how to make the
> cassandra dev community more welcoming for newcomers:
> https://lists.apache.org/thread/x8fx9b22nfll3gd40w4o971cyznckxrz
>
> Andres surfaced CEP-3 for guardrails in which we all professed our
> continued love for JMX (especially you Patrick). It'd be great to see more
> operators chime in with their experience running clusters at scale and the
> usage anti-patterns that destabilize clusters, since guardrails
> would be a great way to protect against frequently occurring
> patterns that scale poorly, among other things (tombstone-heavy workloads
> and thousands of tables, anyone?)
>
>
> CEP-18: Improving Modularity is going to be deprecated in favor of
> module-specific refactors and optional implementations.
>
> CEP-17: SSTable format API is evolving nicely:
> https://lists.apache.org/thread/boqb5trkq1q38rmb50p4lsw95hyv053m
>
> And these are just the highlights!
>
> [Tickets in the past 14 days]
> On the 4.0.2 front we've closed out 5 tickets compared to 9 in the prior 2
> weeks. Looks like permissions, some timeouts during replica failure,
> website updates, etc.
>
> For 4.1.0 we've closed out 8 issues down from 14. Some stability in schema
> pulls, commit log stability during testing, a slew of test fixes, and a new
> feature to allow denying access to configured partition keys for reads,
> writes, or range reads based on config (CQL or JMX).
>
> [Tickets that need attention]
> Needs Reviewer:
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=CASSANDRA-16547=2259
>
> I've tidied up / created a new quick filter that's tickets that are in
> progress, blocked, or patch available but lacking a reviewer. This is
> slightly opinionated of me in that it implies we should have reviewers for
> things as we work on them rather than once they're further along being
> written; I have a bias towards early inclusion of a 2nd pair of eyes and a
> sounding board. If you see anything on this list that you're qualified to
> review, or you know the area of the code-base and have a few cycles, please
> take a look and help out.
>
> Workload-wise, 14 tickets on 4.0.2 need reviewers and 34 on 4.1.0 by this
> definition.
>
> I'm going to refrain from linking to stalled tickets (30d inactive) for
> now; the load of that is high (80 on 4.0.2, 422 on 4.1.0) so we probably
> 

Re: Welcome Sumanth Pasupuleti as Apache Cassandra Committer

2021-11-05 Thread Patrick McFadin
Great to see this. Congrats Sumanth!

On Fri, Nov 5, 2021 at 11:34 AM Brandon Williams  wrote:

> Congratulations Sumanth!
>
> On Fri, Nov 5, 2021 at 1:17 PM Oleksandr Petrov
>  wrote:
> >
> > The PMC members are pleased to announce that Sumanth Pasupuleti has
> > recently accepted the invitation to become a committer.
> >
> > Sumanth, thank you for all your contributions to the project over the
> years.
> >
> > Congratulations and welcome!
> >
> > The Apache Cassandra PMC members
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

