[DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-19 Thread Patrick McFadin
Hello Cassandra Community!

We are at a pivotal moment for the Cassandra community, with the first
Cassandra Summit in 7 years coming up on March 13th, and a major release
coming later this year with Cassandra 5.0. It is important that we come
together to set the publicity strategy and direction for these important
moments, and that we work together to define how Cassandra shows up across
the technology industry.

To achieve this, we are proposing the formation of a Publicity & Marketing
Working Group, and we are requesting your participation.

What is the Publicity & Marketing Working Group?

This is a working group open to community members who have the insight and
skills to help define Cassandra’s public narrative and participate in our
marketing strategy and execution. The group will meet once a month for an
hour to discuss important marketing topics. You can find us on
#cassandra-events. We also propose adding a mailing list,
marketing@cassandra.a.o, to handle day-to-day marketing needs and async
communication. Our publicity and marketing partners from Constantia, Molly
Monroy and Melissa Logan, will work with us to build this working group.

What will this group be responsible for?

Our initial vision for this group is to accelerate how we do marketing &
publicity for Cassandra. We will refine and advance Cassandra’s public
perception in the tech industry, to show how Cassandra has grown, innovated,
and revitalized itself as a community. We will do this through:

 - Participating in marketing strategy for major moments (in particular,
   C* Summit in March and Cassandra 5.0 release later this year)
 - Expanding our local meetup and events presence
 - Sourcing end-user case studies for marketing and PR collateral
 - Making sure the Cassandra community shows up at third-party events
 - Contributing content - from blogs to documentation - to ensure we have a
   robust stream of content for our end users

Our first two orders of business will be:

 1. Jointly determine operating model and governance, and get input and
    alignment on the above goals/responsibilities.
 2. Discuss marketing for Cassandra Summit, primarily defining the news we
    will share at the event from the project directly and from our sponsors.
    This is coming up quickly and we will need community assistance to
    achieve our publicity goals.

As this is a community-driven group, please share ideas and feedback on the
purpose of this group and what we need to achieve.

When is the meeting?

We are proposing the meetings take place on the 4th Wednesday of each month.
We will alternate times of the day to try to accommodate. We can adjust based
on member attendance.

 - Jan, March, May, July, Sept, Nov: 4th Wed of the month, 8a PT
 - Feb, April, June, August, October, Dec: 4th Wed of the month, 4p PT

We will create a centralized document to share and document information
about the working group, including meeting minutes, monthly tasks, and
priorities. Decisions will be discussed and finalized using the project
mailing list.

Patrick
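
For anyone who wants to drop the proposed cadence into a calendar script, here is a minimal sketch of the schedule described above: the 4th Wednesday of each month, at 8a PT in odd-numbered months (Jan, March, ...) and 4p PT in even-numbered months (Feb, April, ...). The function names are hypothetical, and time zone handling is left out on purpose; the times are simply wall-clock Pacific Time.

```python
import calendar
from datetime import date, time


def fourth_wednesday(year: int, month: int) -> date:
    """Return the date of the 4th Wednesday in the given month."""
    # monthcalendar() yields one list per week; days outside the month are 0.
    wednesdays = [
        week[calendar.WEDNESDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.WEDNESDAY] != 0
    ]
    return date(year, month, wednesdays[3])


def meeting_slot(year: int, month: int) -> tuple[date, time]:
    """Return (meeting date, start time in PT) under the proposed schedule."""
    # Assumption taken from the proposal: odd months meet at 8a PT,
    # even months meet at 4p PT.
    start = time(8, 0) if month % 2 == 1 else time(16, 0)
    return fourth_wednesday(year, month), start


if __name__ == "__main__":
    for month in range(1, 13):
        day, start = meeting_slot(2023, month)
        print(day.isoformat(), start.strftime("%H:%M"), "PT")
```

If the group adjusts the times based on attendance, only `meeting_slot` would need to change.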


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-20 Thread Patrick McFadin
I would be happy to be one of the moderators. Not sure if that's singular
or plural. :D Just need to know how to do it.

Patrick

On Fri, Jan 20, 2023 at 1:44 AM Mick Semb Wever  wrote:

>> *To achieve this, we are proposing the formation of a Publicity &
>> Marketing Working Group, and we are requesting your participation.*
>>
>
>
> +1 to the proposal and everything you write Patrick!
>
> I've submitted the request for the ML (can take 24 hours). Who would like
> to be a moderator for the list?
>
> Otherwise let's give this a few days for any concerns, questions,
> objections to be raised.
>
>


Cassandra Summit update for 2023-01-24

2023-01-24 Thread Patrick McFadin
Hello Cassandra Community!

Quick take:

 - Register before 1/28 to get discount pricing:
   https://events.linuxfoundation.org/cassandra-summit/register/
 - Use code CS23DS20 to get 20% off
 - Make sure to sign up for the training day on March 12
 - Tell everyone you’re going on social media and use #CassandraSummit in
   your posts

Longer version:

If you have been watching what’s happening with the Cassandra Summit and
thinking about going, I’m here to convince you that now is the time to
register. The early registration discount ends this Saturday, January 28th.

It might be helpful to clarify some misconceptions I keep hearing. Every
other Cassandra Summit (except Cassandra Summit Tokyo) has been an event
planned and run by DataStax. To create a more neutral ground that reflects
our community better, Linux Foundation Events has taken on the considerable
task of running Cassandra Summit in 2023. We are very grateful they took a
chance on our community, and we will be better for it.

When DataStax ran the event, we could deeply discount tickets because we
treated it as a marketing expense. I’ve been DMed and Slacked quite a few
times for free passes. Since this is a Linux Foundation event,
unfortunately, there are no complimentary passes, as this is a key part of
recouping their costs. You can get a 20% discount by using this code:
CS23DS20

Why is this important to mention? Our community needs an independent
Cassandra Summit, and right now, it needs your support in attending the
event. Let’s show the Linux Foundation that Cassandra Summit is something we
value as a community. I know budgets are tight, and it’s hard to get
approval. If you are able, make the case and register today. Next year when
there are thousands of attendees at Cassandra Summit, you can tell everyone
what they missed in 2023. If making the trip isn’t something you can do, a
virtual pass is only $30 with the discount code and is also a great way to
show support.

The other important thing you can help with? Getting out the word about
Cassandra Summit. Tell your colleagues and co-workers that this is a hot tip
and you are hooking them up. If you are going, tell everyone you’ve
registered and use the hashtag #CassandraSummit. Point out sessions you are
interested in and share the love. If you can convince a couple of people to
go, you’ve made a difference.

If you need a little more motivation, just look at this schedule!
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Thanks, and I hope to see you there!

Patrick


Re: [DISCUSS] Formation of Apache Cassandra Publicity & Marketing Group

2023-01-26 Thread Patrick McFadin
Thanks for the positive reception on email and Slack.

We are going to have our first gathering next Wednesday at 8 AM PT.

Link to calendar event:
https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=MDVoY3VucnMwaWViaXA1amFmdXAzcnN0dTYga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw&tmsrc=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com



On Tue, Jan 24, 2023 at 3:35 AM Mick Semb Wever  wrote:

> The market...@cassandra.apache.org list is created.
>
> To subscribe send an email to marketing-subscr...@cassandra.apache.org
> from
> the email address you want to subscribe from.
>
> If you are a committer you can alternately use Whimsy:
> https://whimsy.apache.org/committers/subscribe
>
> regards,
> Mick
>
>


Re: [ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-30 Thread Patrick McFadin
This is really game-changing and an important change for the Cassandra
community. I would like to think that creating a governance structure like
this will help get more ecosystem projects under the umbrella of Apache
Cassandra.

Thank you PMC, for spending the time to create this very needed framework.

Patrick

On Mon, Jan 30, 2023 at 11:02 AM Jeff Jirsa  wrote:

> Usually requires an offer to donate from the current owner, an acceptance
> of that offer (PMC vote), and then the work to ensure that contributions
> are acceptable from a legal standpoint (e.g. like the incubator -
> https://incubator.apache.org/guides/transitioning_asf.html - "For
> contributions composed of patches from individual contributors, it is safe
> to import the code once the major contributors (by volume) have completed
> ICLAs or SGAs.").
>
>
>
> On Mon, Jan 30, 2023 at 10:53 AM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
>> Great news indeed. I am wondering what it would take to include projects
>> everyone is using like medusa, reaper, cassandra-ldap, etc. as a subproject.
>>
>> Thanks,
>> German
>> --
>> *From:* Francisco Guerrero 
>> *Sent:* Friday, January 27, 2023 9:46 AM
>> *To:* dev@cassandra.apache.org 
>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] Evolving governance in the
>> Cassandra Ecosystem
>>
>> Great news! I'm very happy to see these changes coming soon.
>>
>> Thanks to everyone involved in this work.
>>
>> On 2023/01/26 21:21:01 Josh McKenzie wrote:
>> > The Cassandra PMC is pleased to announce that we're evolving our
>> governance procedures to better foster subprojects under the Cassandra
>> Ecosystem's umbrella. Astute observers among you may have noticed that the
>> Cassandra Sidecar is already a subproject of Apache Cassandra as of CEP-1 (
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224)
>> and Cassandra-14395 (
>> https://issues.apache.org/jira/browse/CASSANDRASC-24),
>> however up until now we haven't had any structure to accommodate raising
>> committers on specific subprojects or clarity on the addition or governance
>> of future subprojects.
>> >
>> > Further, with the CEP for the driver donation in motion (
>> https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
>> the need for a structured and sustainable way to expand the Cassandra
>> Ecosystem is pressing.
>> >
>> > We'll document these changes in the confluence wiki as well as the
>> sidecar as our first formal subproject after any discussion on this email
>> thread. The new governance process is as follows:
>> > -
>> >
>> > Subproject Governance
>> > 1. The Apache Cassandra PMC is responsible for governing the broad
>> Cassandra Ecosystem.
>> > 2. The PMC will vote on inclusion of new interested subprojects using
>> the existing procedural change vote process documented in the confluence
>> wiki (Super majority voting: 66% of votes must be in favor to pass.
>> Requires 50% participation of roll call).
>> > 3. New committers for these subprojects will be nominated and raised,
>> both at inclusion as a subproject and over time. Nominations can be brought
>> to priv...@cassandra.apache.org. Typically we're looking for a mix of
>> commitment and contribution to the community and project, be it through
>> code, documentation, presentations, or other significant engagement with
>> the project.
>> > 4. While the commit-bit is ecosystem wide, code modification rights and
>> voting rights (technical contribution, binding -1, CEP's) are granted per
>> subproject
>> >  4a. Individuals are trusted to exercise prudence and only commit
>> or claim binding votes on approved subprojects. Repeated violations of this
>> social contract will result in los
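
The subproject inclusion rule in item 2 above (super majority voting: 66% of votes in favor, 50% participation of roll call) can be sketched as a small check. This is only an illustration under stated assumptions, not the PMC's tooling: the `Tally` type and function name are hypothetical, abstentions are assumed not to count as votes, and both thresholds are treated as inclusive.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    roll_call: int  # number of PMC members on the roll call
    in_favor: int   # +1 votes cast
    against: int    # -1 votes cast


def passes_supermajority(tally: Tally) -> bool:
    """Check the two conditions quoted in the announcement.

    Assumptions (not stated in the thread): abstentions are not votes,
    and exactly 66% in favor or exactly 50% participation is enough.
    """
    votes_cast = tally.in_favor + tally.against
    if votes_cast == 0:
        return False
    # Requires 50% participation of roll call.
    participation_ok = votes_cast * 2 >= tally.roll_call
    # 66% of votes must be in favor (integer math avoids float rounding).
    supermajority_ok = tally.in_favor * 100 >= 66 * votes_cast
    return participation_ok and supermajority_ok


if __name__ == "__main__":
    print(passes_supermajority(Tally(roll_call=20, in_favor=8, against=2)))
    print(passes_supermajority(Tally(roll_call=20, in_favor=6, against=3)))
```

The first tally passes on both conditions; the second fails because only 9 of 20 roll-call members voted.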

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Patrick McFadin
API changes are near and dear to my world. The scope of changes could be
minor or major, so I think B is the right way forward.

Not to throw off the momentum, but could this even warrant a separate CEP
in some cases? For example, CEP-15 is a huge change, but the CQL syntax
will continuously evolve with more use. Being judicious in those changes is
good for end users. It's also a good reference to point back to after the
fact.

Patrick

On Thu, Feb 2, 2023 at 6:01 AM Ekaterina Dimitrova 
wrote:

> “ Only that it locks out of the conversation anyone without a Jira login”
> Very valid point I forgot about - since recently people need invitation in
> order to create account…
> Then I would say C until we clarify the scope. Thanks
>
> On Thu, 2 Feb 2023 at 8:54, Benedict  wrote:
>
>> I think lazy consensus is fine for all of these things. If a DISCUSS
>> thread is crickets, or just positive responses, then definitely it can
>> proceed without further ceremony.
>>
>> I think “with heads-up to the mailing list” is very close to B? Only that
>> it locks out of the conversation anyone without a Jira login.
>>
>> On 2 Feb 2023, at 13:46, Ekaterina Dimitrova 
>> wrote:
>>
>> 
>>
>> While I do agree with you, I am thinking that if we include many things
>> that we would expect lazy consensus on I would probably have different
>> preference.
>>
>> I definitely don’t mean to stall this though so in that case:
>> I’d say combination of A+C (jira with heads up on the ML if someone is
>> interested into the jira) and regular log on API changes separate from
>> CHANGES.txt or we can just add labels to entries in CHANGES.txt as some
>> other projects. (I guess this is a detail we can agree on later on, how to
>> implement it, if we decide to move into that direction)
>>
>> On Thu, 2 Feb 2023 at 8:12, Benedict  wrote:
>>
>>> I think it’s fine to separate the systems from the policy? We are
>>> agreeing a policy for systems we want to make guarantees about to our users
>>> (regarding maintenance and compatibility)
>>>
>>> For me, this is (at minimum) CQL and virtual tables. But I don’t think
>>> the policy differs based on the contents of the list, and given how long
>>> this topic stalled for. Given the primary point of contention seems to be
>>> the *policy* and not the list, I think it’s time to express our opinions
>>> numerically so we can move the conversation forwards.
>>>
>>> This isn’t binding, it just reifies the community sentiment.
>>>
>>> On 2 Feb 2023, at 13:02, Ekaterina Dimitrova 
>>> wrote:
>>>
>>> 
>>>
>>> “ So we can close out this discussion, let’s assume we’re only
>>> discussing any interfaces we want to make promises for. We can have a
>>> separate discussion about which those are if there is any disagreement.”
>>> May I suggest we first clear this topic and then move to voting? I would
>>> say I see confusion, not that much of a disagreement. Should we raise a
>>> discussion for every feature flag for example? In another thread virtual
>>> tables were brought in. I saw also other examples where people expressed
>>> uncertainty. I personally feel I’ll be able to take a more informed
>>> decision and vote if I first see this clarified.
>>>
>>> I will be happy to put down a document and bring it for discussion if
>>> people agree with that
>>>
>>>
>>>
>>> On Thu, 2 Feb 2023 at 7:33, Aleksey Yeshchenko 
>>> wrote:
>>>
>>>> Bringing light to new proposed APIs is no less important - if not more,
>>>> for reasons already mentioned in this thread. For it’s not easy to change
>>>> them later.
>>>>
>>>> Voting B.
>>>>
>>>>
>>>> On 2 Feb 2023, at 10:15, Andrés de la Peña
>>>> wrote:
>>>>
>>>> If it's a breaking change, like removing a method or property, I think
>>>> we would need a DISCUSS API thread prior to making changes. However, if the
>>>> change is an addition, like adding a new yaml property or a JMX method, I
>>>> think JIRA suffices.





Important news about Cassandra Summit

2023-02-03 Thread Patrick McFadin
Hello Cassandra Community,

We all see what’s happening in tech right now. Cuts are being made, and
budgets are frozen. For Cassandra Summit, this has translated to low
sponsorship and registrations. The program committee has been discussing
options with the Linux Foundation events team, and the decision was made to
move Cassandra Summit to December 12-13. You’ll see something official from
the Linux Foundation soon.

This isn’t what anyone wanted. It’s a challenging time for our community to
gather, and that’s entirely the point of a Cassandra Summit. Hopefully, this
provides enough space to have the Summit we want and need.

Between now and December, the DataStax community team is ramping up a plan B
to keep up the project momentum during this downturn and facilitate
community information sharing. Cassandra 5.0 is coming, and it’s going to be
game-changing. No way we are waiting until December to talk about it! The
plan is to have a virtual event (online) on March 14 and a series of
city-specific Cassandra Days in the coming months. It’s hard for our
community to get out, so we’ll come to you. More information will follow in
the next few days.

I want to reassure you this isn’t specific to our community. I’ve been
hearing from many that you were trying anything to get to San Jose in March,
but budgets wouldn’t allow for any non-essential travel. When I started
hearing the same thing from speakers, then sponsors, I knew this was a
large-scale problem.

We all know people impacted by layoffs, and I’m sure many are personally
affected. Let’s come together as a community and help each other. If you
have open positions, call them out in this email thread or #cassandra in the
ASF Slack.

I want to thank the Linux Foundation Events team personally. They are
exceptional professionals and worked quickly to get us back on track. There
was a rush of events trying to postpone to later in the year, but they were
able to get us a new date. They are as protective of conference uptime as
you are of database uptime.

More info to follow.

Thanks,
Patrick


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-05 Thread Patrick McFadin
Thank you everyone for all the well wishes here and in other parts of the
interwebs. It's always a privilege to work with the people in our community.

Patrick

On Fri, Feb 3, 2023 at 11:24 AM C. Scott Andreas 
wrote:

> Congratulations, Patrick!
>
> On Feb 2, 2023, at 9:46 PM, Berenguer Blasi 
> wrote:
>
>
> Welcome!
> On 3/2/23 4:09, Vinay Chella wrote:
>
> Well deserved one, Congratulations, Patrick.
>
> On Fri, Feb 3, 2023 at 4:01 AM Josh McKenzie  wrote:
>
>> Congrats Patrick! Well deserved.
>>
>> On Thu, Feb 2, 2023, at 5:25 PM, Molly Monroy wrote:
>>
>> Congrats, Patrick... much deserved!
>>
>> On Thu, Feb 2, 2023 at 1:59 PM Derek Chen-Becker 
>> wrote:
>>
>> Congrats!
>>
>> On Thu, Feb 2, 2023 at 10:58 AM Benjamin Lerer  wrote:
>>
>> The PMC members are pleased to announce that Patrick McFadin has accepted
>> the invitation to become committer today.
>>
>> Thanks a lot, Patrick, for everything you have done for this project and
>> its community through the years.
>>
>> Congratulations and welcome!
>>
>> The Apache Cassandra PMC members
>>
>>
>>
>> --
>> +---+
>> | Derek Chen-Becker |
>> | GPG Key available at https://keybase.io/dchenbecker and   |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---+
>>
>>
>> --
>
>
> Thanks,
> Vinay Chella
>
>


Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Patrick McFadin
No more nodetool createepochunsafe! +1

This is going to be another big merge. Just bookmarking the discussions
last week on CEP-15.

On Mon, Feb 6, 2023 at 9:57 AM Jeff Jirsa  wrote:

> +1
>
>
> On Mon, Feb 6, 2023 at 8:16 AM Sam Tunnicliffe  wrote:
>
>> Hi everyone,
>>
>> I would like to start a vote on this CEP.
>>
>> Proposal:
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
>>
>> Discussion:
>> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
>>
>> The vote will be open for 72 hours.
>> A vote passes if there are at least three binding +1s and no binding
>> vetoes.
>>
>> Thanks,
>> Sam
>>
>
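
The pass condition quoted in the vote email above ("at least three binding +1s and no binding vetoes") can be written down as a short check. This is a minimal sketch, not project tooling: the `(value, binding)` tuple shape and the function name are hypothetical, and the 72-hour voting window is not modeled.

```python
from typing import Iterable, Tuple

# Each vote is (value, binding): value is +1, 0, or -1; binding is True for
# votes that count as binding under the project's rules.
Vote = Tuple[int, bool]


def cep_vote_passes(votes: Iterable[Vote]) -> bool:
    """Return True if there are at least three binding +1s and no binding -1."""
    votes = list(votes)
    binding_plus_ones = sum(1 for value, binding in votes if binding and value == 1)
    has_binding_veto = any(binding and value == -1 for value, binding in votes)
    return binding_plus_ones >= 3 and not has_binding_veto


if __name__ == "__main__":
    # Three binding +1s and one non-binding +1: passes.
    print(cep_vote_passes([(1, True), (1, True), (1, True), (1, False)]))
    # A single binding -1 blocks the vote regardless of the +1 count.
    print(cep_vote_passes([(1, True), (1, True), (1, True), (-1, True)]))
```

Non-binding votes (marked "nb" in replies) are ignored by the check in either direction.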


Re: Welcome our next PMC Chair Josh McKenzie

2023-03-24 Thread Patrick McFadin
Congrats Josh. This is an excellent acknowledgment of your awesome
contributions to the Cassandra projects.

Mick you left some big shoes to fill. Thank you for your service and for
being an endless advocate for the project.

Patrick

On Fri, Mar 24, 2023 at 8:03 AM Paulo Motta 
wrote:

> Thanks Mick and congratulations Josh!! :)
>
> On Thu, Mar 23, 2023 at 5:33 PM Erick Ramirez 
> wrote:
>
>> Thanks Mick for everything you've done and continue to do for the project!
>> Congratulations Josh and thanks for stepping up! The community is in good
>> shape! 🍻
>>
>


Re: Google Season of Docs

2023-04-03 Thread Patrick McFadin
It hardly feels like a loss looking at the fantastic projects that were
selected. Thanks for leading this charge Lorina!

Patrick

On Mon, Apr 3, 2023 at 11:39 AM lorinapoland  wrote:

> Sadly, I am informing the community that our grant application to GSoD was
> unsuccessful.
>
> If you would like to see the list of winning projects, check out
> https://developers.google.com/season-of-docs/docs/participants.
>
> Lorina
>


Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-06 Thread Patrick McFadin
+1

Thanks to Lorina for getting people excited about it at Cassandra Forward!

On Thu, Apr 6, 2023 at 10:37 AM Mick Semb Wever  wrote:

> +1
>
> On Thu, 6 Apr 2023 at 19:32, Francisco Guerrero 
> wrote:
>
>> +1 (nb)
>>
>> On 2023/04/06 17:30:37 Josh McKenzie wrote:
>> > +1
>> >
>> > On Thu, Apr 6, 2023, at 12:18 PM, Joseph Lynch wrote:
>> > > +1
>> > >
>> > > This proposal looks really exciting!
>> > >
>> > > -Joey
>> > >
>> > > On Wed, Apr 5, 2023 at 2:13 AM Aleksey Yeshchenko 
>> wrote:
>> > > >
>> > > > +1
>> > > >
>> > > > On 4 Apr 2023, at 16:56, Ekaterina Dimitrova 
>> wrote:
>> > > >
>> > > > +1
>> > > >
>> > > > On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer 
>> wrote:
>> > > >>
>> > > >> +1
>> > > >>
>> > > >> Le mar. 4 avr. 2023 à 17:17, Andrés de la Peña <
>> adelap...@apache.org> a écrit :
>> > > >>>
>> > > >>> +1
>> > > >>>
>> > > >>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna <
>> jeremy.hanna1...@gmail.com> wrote:
>> > > 
>> > >  +1 nb, will be great to have this in the codebase - it will make
>> nearly every table's compaction work more efficiently.  The only possible
>> exception is tables that are well suited for TWCS.
>> > > 
>> > >  On Apr 4, 2023, at 8:00 AM, Berenguer Blasi <
>> berenguerbl...@gmail.com> wrote:
>> > > 
>> > >  +1
>> > > 
>> > >  On 4/4/23 14:36, J. D. Jordan wrote:
>> > > 
>> > >  +1
>> > > 
>> > >  On Apr 4, 2023, at 7:29 AM, Brandon Williams 
>> wrote:
>> > > 
>> > >  
>> > >  +1
>> > > 
>> > >  On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov 
>> wrote:
>> > > >
>> > > > Hi everyone,
>> > > >
>> > > > I would like to put CEP-26 to a vote.
>> > > >
>> > > > Proposal:
>> > > >
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>> > > >
>> > > > JIRA and draft implementation:
>> > > > https://issues.apache.org/jira/browse/CASSANDRA-18397
>> > > >
>> > > > Up-to-date documentation:
>> > > >
>> https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
>> > > >
>> > > > Discussion:
>> > > >
>> https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
>> > > >
>> > > > The vote will be open for 72 hours.
>> > > > A vote passes if there are at least three binding +1s and no
>> binding vetoes.
>> > > >
>> > > > Thanks,
>> > > > Branimir
>> > > 
>> > > 
>> > > >
>> > >
>>
>


Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-06 Thread Patrick McFadin
I love that this is finally coming to Cassandra. Absolutely hate that, once
again, we'll be endorsing the use of ALLOW FILTERING. This is an
anti-pattern that keeps getting legitimized.

Hot take: Should we just not do Milestones 1 and 2 and wait for an
index-only Milestone 3?

Patrick

On Thu, Apr 6, 2023 at 10:04 AM David Capwell  wrote:

> Overall I welcome this feature, was trying to use this around 1-2 months
> back and found we didn’t support, so glad to see it coming!
>
> From a testing point of view, I think we would want to have good fuzz
> testing covering complex types (frozen/non-frozen collections, tuples, udt,
> etc.), and reverse ordering; both sections tend to cause the most problem
> for new features (and existing ones)
>
> We also will want a way to disable this feature, and optionally disable at
> different sections (such as m2’s NOT IN for partition keys).
>
> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski 
> wrote:
> >
> > Hi everyone!
> >
> > I created a new CEP for adding NOT support to the query language and
> > want to start discussion around it:
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> >
> > Happy to get your feedback.
> > --
> > Piotr
>
>


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-09 Thread Patrick McFadin
I love the debate that surfaces occasionally, but I have to agree that
KEYSPACE and SCHEMA are doing the job. There is a learning curve with
Cassandra data modeling, and keywords are a minor problem.

Issues that hit every user:
1. Creating the correct primary key
2. Avoiding the urge to index all-the-things (see item 1)
3. Migrating schema because of 1 and 2

4th bonus issue. Grokking consistency level. "EACH_QUORUM sounds perfect
for me."

I was trying to remember when SCHEMA got added to the CQL parser. With a
quick 'git blame' I was taken back to this beast:
https://issues.apache.org/jira/browse/CASSANDRA-14825

One huge area that was never addressed in the Jira: any documentation that
the official CQL parser now supported SCHEMA. So if anything, we should use
this opportunity to update some docs.

Patrick


On Thu, Apr 6, 2023 at 5:28 PM Dinesh Joshi  wrote:

> I’m strongly in favor of leaving terminology as-is.
>
> On Apr 6, 2023, at 7:20 AM, Bowen Song via dev 
> wrote:
>
> 
>
> *> I'm quite happy to leave things as they are if that is the consensus.*
>
> +1 to the above
>
>
> On 06/04/2023 14:54, Mike Adamson wrote:
>
> My apologies. I started this discussion off the back of a usability
> discussion around new user accessibility to Cassandra and the premise that
> there is an initial steep learning curve for new users. Including new users
> who have worked for a long time in the traditional DBMS field.
>
> On the basis of the reason for the discussion,  TABLEGROUP doesn't sit
> well because of user types / functions / indexes etc. which are not
> strictly tables and is also yet another Cassandra only term.
>
> NAMESPACE could work but it's different usage in other systems could be
> just as confusing to new users.
>
> And, I certainly don't think having multiple names for the same thing just
> to satisfy different parties is a good idea at all.
>
> I'm quite happy to leave things as they are if that is the consensus.
>
> On Thu, 6 Apr 2023 at 14:16, Josh McKenzie  wrote:
>
>> KEYSPACE is fine. If we want to introduce a standard nomenclature like
>> DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no
>> benefit.
>>
>> I'm with Benedict in principle, with Aleksey in practice; I think
>> KEYSPACE and SCHEMA are actually fine enough.
>>
>> If and when we get to any kind of multi-tenancy, having a more
>> metaphorical abstraction that users are familiar with like these becomes
>> more valuable; it's pretty clear that things in different keyspaces,
>> different databases, or even different schemas could have different access
>> rules, resourcing, etc from one another.
>>
>> While the off-the-cuff logical TABLEGROUP thing is a *literal* statement
>> about what the thing is, it'd be another unique term to us;  we have enough
>> things in our system where we've charted our own path. My personal .02 is
>> we don't need to go adding more. :)
>>
>> On Thu, Apr 6, 2023, at 8:54 AM, Mick Semb Wever wrote:
>>
>>
>> … but that should be a different discussion about how we evolve config.
>>
>>
>>
>> I disagree. Nomenclature being difficult can benefit from holistic and
>> forward thinking.
>> Sure you can label this off-topic if you like, but I value our discuss
>> threads being collaborative in an open-mode. Sometimes the best idea is on
>> the tail end of a sequence of bad and/or unpopular ideas.
>>
>>
>>
>>
>>
>>
>
> --
> *Mike Adamson*
> Engineering
>
> +1 650 389 6000 | datastax.com
>
>


Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-25 Thread Patrick McFadin
Not sure if this is what you are saying, Josh, but I believe this needs to
be its own CEP. It's a change in CQL syntax and changes how clusters
operate. The change needs to be documented and voted on. Jonathan, you know
how to find me if you want me to help write it. :)

As a side comment to all of this, last ApacheCon in New Orleans,
Jordan West, Alex Petrov, and I were sitting in the hall track and were
having a discussion about just what we have in Cassandra. There is no other
system like Cassandra, and for scale and distributed data, it stands alone.
What do we do with a robust baseline like this?

This! This is a great example!

Patrick


On Tue, Apr 25, 2023 at 5:03 PM Josh McKenzie  wrote:

> To be fair Dinesh kind of primed that:
>
> Do you intend to make this part of CEP-7 or as an incremental update to
> SAI once it is committed?
>
> ;)
>
> I think this body of work more than stands on its own. Great work
> Jonathan, Mike, and Zhao; having native support for more ML-oriented
> workloads in C* would be a big win for a bunch of our users and plays into
> our architectural strengths in a lot of ways too.
>
> On Tue, Apr 25, 2023, at 7:35 PM, Henrik Ingo wrote:
>
> Jonathan what a great proposal/code. An enjoyable read. And at least for
> me educational! (Which is notable, as you're on my turf, I'm a Data Science
> major.)
>
> Sorry for splitting hairs but CEP-7 (as a spec, and wiki page) is approved
> and voted on and I assume there's no proposal to change that. That said,
> work of course continues beyond CEP-7 and this is not the only SAI feature
> that adds on top of the CEP-7 foundation.
>
> I just wanted to clarify so there's no confusion later.
>
> henrik
>
> On Sat, Apr 22, 2023 at 10:41 PM Jonathan Ellis  wrote:
>
> My guess is that I will be able to get this ready to upstream before the
> rest of CEP-7 goes in, so it would make sense to me to roll it into that.
>
> On Fri, Apr 21, 2023 at 5:34 PM Dinesh Joshi  wrote:
>
> Interesting proposal Jonathan. Will grok it over the weekend and play
> around with the branch.
>
> Do you intend to make this part of CEP-7 or as an incremental update to
> SAI once it is committed?
>
> On Apr 21, 2023, at 2:19 PM, Jonathan Ellis  wrote:
>
> Happy Friday, everyone!
>
> Rich text formatting ahead, I've attached a PDF for those who prefer that.
>
>
> I propose adding approximate nearest neighbor (ANN) vector search
> capability to Apache Cassandra via storage-attached indexes (SAI). This is
> a medium-sized effort that will significantly enhance Cassandra’s
> functionality, particularly for AI use cases. This addition will not only
> provide a new and important feature for existing Cassandra users, but also
> attract new users to the community from the AI space, further expanding
> Cassandra’s reach and relevance.
> Introduction
> Vector search is a powerful document search technique that enables
> developers to quickly find relevant content within an extensive collection
> of documents, which is useful as a standalone technique, but it is
> particularly hot now because it significantly enhances the performance of
> LLMs.
>
> Vector search uses ML models to match the semantics of a question rather
> than just the words it contains, avoiding the classic false positives and
> false negatives associated with term-based search.  Alessandro Benedetti
> gives some good examples in his *excellent talk*.
>
> You can search across any set of vectors, which are just ordered sets of
> numbers.  In the context of natural language queries and document search,
> we are specifically concerned with a type of vector called an *embedding*.
>
> An embedding is a high-dimensional vector that captures the underlying
> semantic relationships and contextual information of words or phrases.
> Embeddings are generated by ML models trained for this purpose; OpenAI
> provides an API to do this, but open-source and self-hostable models like
> BERT are also popular. Creating more accurate and smaller embeddings is an
> active research area in ML.
>
> Large language models (LLMs) can be described as a mile wide and an inch
> deep. They are not experts on any narrow domain (although they will
> hallucinate that they are, sometimes convincingly).  You can remedy this by
> giving the LLM additional context for your query, but the context window is
> small (4k tokens for GPT-3.5, up to 32k for GPT-4), so you want to be very
> selective about giving the LLM the most relevant possible information.
>
> Vector search is red-hot now because it allows us to easily answer the
> question “what are the most relevant documents to provide as context” by
> performing a similarity search between the embeddings vector of the query,
> and those of your document universe.  Doing exact search is prohibitively
> expensive, s

Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-26 Thread Patrick McFadin
I guess this is an excellent example to explore the minima of what
constitutes a CEP. So far, CEPs have been some large changes, so where does
something like this fit? (Wait. Did I beat Benedict to a Bike Shed? I think
I did.)

This is a list of everything needed for a CEP:

Status
Scope
Goals
Approach
Timeline
Mailing list / Slack channels
Related JIRA tickets
Motivation
Audience
Proposed Changes
New or Changed Public Interfaces
Compatibility, Deprecation, and Migration Plan
Test Plan
Rejected Alternatives

This is a big enough change to provide information for each element. Going
back to the spirit of why we started CEPs, we wanted to avoid a mega-commit
without some shaping and agreement before code goes into trunk. I don't
have a clear indication of where that line lies. From our own wiki: "It is
highly recommended to pursue a CEP for significant user-facing or changes
that cut across multiple subsystems." That seems to fit here. Part of my
motivation is being clear with potential new contributors by example and
encouraging more awesomeness.

The changes for operators:
 - New drivers
 - New guardrails?
 - Indexing == storage requirements

Patrick

On Tue, Apr 25, 2023 at 10:53 PM Mick Semb Wever  wrote:

> I was soo happy when I saw this, I know many users are going to be
> thrilled about it.
>
>
> On Wed, 26 Apr 2023 at 05:15, Patrick McFadin  wrote:
>
>> Not sure if this is what you are saying, Josh, but I believe this needs
>> to be its own CEP. It's a change in CQL syntax and changes how clusters
>> operate. The change needs to be documented and voted on. Jonathan, you know
>> how to find me if you want me to help write it. :)
>>
>
> I'd be fine with just a DISCUSS thread to agree to the CQL change, since
> `DENSE FLOAT32` appears to be a minimal change and the overall patch
> builds on SAI. As Henrik mentioned there's other SAI extensions being
> added too without CEPs.  Can you elaborate on how you see this changing how
> the cluster operates?
>
> This will be easier to decide once we have a patch to look at, but that
> depends on a CEP-7 base (e.g. no feature branch exists). If we do want a
> CEP we need to allow a few weeks to get it through, but that can happen in
> parallel and maybe drafting up something now will be valuable anyway for an
> eventual CEP that proposes the more complete features (e.g.
> cosine_similarity(…)).
>
>
>
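
For readers skimming the archive: the similarity search described in the quoted proposal, and the `cosine_similarity(…)` function mentioned above, can be illustrated with a minimal sketch. This is plain Python, not Cassandra or SAI code. The function names `cosine_similarity` and `top_k` and the toy three-dimensional vectors are assumptions made up for illustration. The sketch performs the exact (brute-force) scan that the proposal calls prohibitively expensive at scale; an approximate index exists to avoid scoring every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, documents, k):
    """Exact nearest-neighbour search: score every document, keep the best k."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values; real ones have hundreds of dimensions).
docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], docs, k=2))
```

The cost of `top_k` grows linearly with the number of stored vectors, which is the motivation for an approximate nearest neighbor index.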


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Patrick McFadin
>
> So is the goal here to provide something specific and idiomatic for the ML
> community or is the goal to make a primitive that's C*-centric that then
> another layer can write to? I personally argue for the former; I don't see
> this specific data type going away any time soon.


+1 on this concept. We could invite an entirely new class of users into
Cassandra by using familiar syntax. I was surprised that DENSE got nuked so
quickly since it is used in the ML world. [1][2][3]

Patrick

1.
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.linalg.DenseVector.html
2. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
3. https://www.pinecone.io/learn/dense-vector-embeddings-nlp/

On Thu, Apr 27, 2023 at 5:49 PM Josh McKenzie  wrote:

> From a machine learning perspective, vectors are a well-known concept that
> are effectively immutable fixed-length n-dimensional values that are then
> later used either as part of a model or in conjunction with a model after
> the fact.
>
> While we could have this be non-frozen and not call it a vector, I'd be
> inclined to still make the argument for a layer of syntactic sugar on top
> that met ML users where they were with concepts they understood rather than
> forcing them through the cognitive lift of figuring out the Cassandra
> specific contortions to replicate something that's ubiquitous in their
> space. We did the same "Cassandra-first" approach with our JSON support and
> that didn't do us any favors in terms of adoption and usage as far as I
> know.
>
> So is the goal here to provide something specific and idiomatic for the ML
> community or is the goal to make a primitive that's C*-centric that then
> another layer can write to? I personally argue for the former; I don't see
> this specific data type going away any time soon.
>
> On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>
> but as you point out it has the problem of allowing nulls.
>
>
> If nulls are not allowed for the elements, then either we need  a) a new
> type, or b) add some way to say elements may not be null…. As much as I do
> like b, I am leaning towards new type for this use case.
>
> So, to flesh out the type requirements I have seen so far
>
> 1) represents a fixed size array of element type
> * on write path we will need to validate this
> 2) element may not be null
> * on write path we will need to validate this
> 3) “frozen” (is this really a requirement for the type or is this
> just simpler for the ANN work?  I feel that this shouldn’t be a requirement)
> 4) works for all types (my requirement; original proposal is float only,
> but could logically expand to primitive types)
>
> Anything else?
>
> The key thing about a vector is that unlike lists or tuples you really
> don't care about individual elements, you care about doing vector and
> matrix multiplications with the thing as a unit.
>
>
> That may be true for this use case, but “should” this be true for the type
> itself?  I feel like no… if a user wants the Nth element of a vector why
> would we block them?  I am not saying the first patch, or even 5.0 adds
> support for index access, I am just trying to push back saying that the
> type should not block this.
>
> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT
> VECTOR[N].)
>
>
> Now that nulls are not allowed, I have mixed feelings about FLOAT[N], I
> prefer this syntax but that limitation may not be desired for all use
> cases… we could always add LIST and ARRAY later
> to address that case.
>
> In terms of syntax I have seen, here is my ordered preference:
>
> 1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
> 2) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this
> semantic…. Could even be NON NULL TYPE[size]
>
> On Apr 27, 2023, at 9:00 AM, Benedict  wrote:
>
>
> That’s a bounded ring buffer, not a fixed length array.
>
> This definitely isn’t a tuple because the types are all the same, which is
> pretty crucial for matrix operations. Matrix libraries generally work on
> arrays of known dimensionality, or sparse representations.
>
> Whether we draw any semantic link between the frozen list and whatever we
> do here, it is fundamentally a frozen list with a restriction on its size.
> What we’re defining here are “statically” sized arrays, whereas a frozen
> list is essentially a dynamically sized array.
>
> I do not think vector is a good name because vector is used in some other
> popular languages to mean a (dynamic) list, which is confusing when we also
> have a list concept.
>
> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link
> with list. Though it is a bit strange that this particular type declaration
> looks so different to other collection types.
>
> On 27 Apr 2023, at 16:48, Jeff Jirsa  wrote:
>
> 
>
>
> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis  wrote:
>
> It's been a while, so I may be missing something, but do we already have
> 

Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
A > B > C on both polls.

Having talked to several users in the community that are highly excited
about this change, this gets to what developers want to do at Cassandra
scale: store embeddings and retrieve them.

On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
wrote:

> A > B > C
>
> I don't think that ML is such a niche application that it can't have its
> own CQL data type. Also, vectors are mathematical elements that have more
> applications than ML.
>
> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>
>>
>>
>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>
>>> Should we add a vector type to Cassandra designed to meet the needs of
>>> machine learning use cases, specifically feature and embedding vectors for
>>> training, inference, and vector search?
>>>
>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>> types, with no nulls allowed, and with no need for random access. The ML
>>> industry overwhelmingly uses float32 vectors, to the point that the
>>> industry-leading special-purpose vector database ONLY supports that data
>>> type.
>>>
>>> This poll is to gauge consensus subsequent to the recent discussion
>>> thread at
>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>
>>> Please rank the discussed options from most preferred option to least,
>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>> = A (C is my preference, followed by B or A approximately equally.)
>>>
>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>> we need to tie it to any particular implementation details.
>>>
>>> (B) I am okay with adding a vector type but I believe we must add array
>>> types that compose with all Cassandra types first, and make vectors a
>>> special case of arrays-without-null-elements.
>>>
>>> (C) I am not in favor of adding a built-in vector type.
>>>
>>
>>
>>
>> A  > B > C
>>
>> B is stated as "must add array types…".  I think this is a bit loaded.
>> If B was the (A + the implementation needs to be a non-null frozen float32
>> array, serialisation forward compatible with other frozen arrays later
>> implemented) I would put this before (A).  Especially because it's been
>> shown already this is easy to implement.
>>
>>
>>
>
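
The poll's definition of an ML vector (fixed dimension, numeric elements, no nulls) implies a check on the write path. A minimal sketch in Python follows. The `validate_vector` helper is hypothetical and is not Cassandra's implementation; rounding each element through a 4-byte float is an assumption used here to mimic float32 storage.

```python
import struct


def validate_vector(values, dimension):
    """Return values as a tuple of float32-rounded floats, or raise ValueError.

    Checks the properties named in the poll: fixed length and no nulls.
    """
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    checked = []
    for i, v in enumerate(values):
        if v is None:
            raise ValueError(f"element {i} is null")
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise ValueError(f"element {i} is not numeric: {v!r}")
        # Round-trip through a 4-byte float to mimic float32 storage.
        checked.append(struct.unpack(">f", struct.pack(">f", v))[0])
    return tuple(checked)


print(validate_vector([1, 2.5, -3.0], 3))
```

Because the length and null checks happen before anything is stored, a reader of the column could rely on every row having exactly `dimension` usable elements.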


Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
I'll speak up on that one. If you look at my ranked voting, that is where
my head is. I get accused of scope creep (a lot) and looking at the initial
proposal Jonathan put on the ML it was mostly "Developers are adopting
vector search at a furious pace and I think I have a simple way of adding
support to keep Cassandra relevant for these use cases" Instead of just
focusing on this use case, I feel the arguments have bike shedded into
scope creep which means it will take forever to get into the project.

My preference is to see one thing validated with an MVP and get it into the
hands of developers sooner so we can continue to iterate based on actual
usage.

It doesn't say your points are wrong or your opinions are broken, I'm
voting for what I think will be awesome for users sooner.

Patrick

On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:

> Could folk voting against a general purpose type (that could well be
> called a vector) briefly explain their reasoning?
>
> We established in the other thread that it’s technically trivial, meaning
> folk must think it is strictly superior to only support float rather than
> eg all numeric types (note: for the type, not the ANN).
>
> I am surprised, and the blurbs accompanying votes so far don’t seem to
> touch on this, mostly just endorsing the idea of a vector.
>
>
> On 2 May 2023, at 20:20, Patrick McFadin  wrote:
>
> 
> A > B > C on both polls.
>
> Having talked to several users in the community that are highly excited
> about this change, this gets to what developers want to do at Cassandra
> scale: store embeddings and retrieve them.
>
> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
> wrote:
>
>> A > B > C
>>
>> I don't think that ML is such a niche application that it can't have its
>> own CQL data type. Also, vectors are mathematical elements that have more
>> applications than ML.
>>
>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>>
>>>
>>>
>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>>
>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>> machine learning use cases, specifically feature and embedding vectors for
>>>> training, inference, and vector search?
>>>>
>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>> industry-leading special-purpose vector database ONLY supports that data
>>>> type.
>>>>
>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>> thread at
>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>
>>>> Please rank the discussed options from most preferred option to least,
>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>
>>>> (A) I am in favor of adding a vector type for floats; I do not believe
>>>> we need to tie it to any particular implementation details.
>>>>
>>>> (B) I am okay with adding a vector type but I believe we must add array
>>>> types that compose with all Cassandra types first, and make vectors a
>>>> special case of arrays-without-null-elements.
>>>>
>>>> (C) I am not in favor of adding a built-in vector type.
>>>>
>>>
>>>
>>>
>>> A  > B > C
>>>
>>> B is stated as "must add array types…".  I think this is a bit loaded.
>>> If B was the (A + the implementation needs to be a non-null frozen float32
>>> array, serialisation forward compatible with other frozen arrays later
>>> implemented) I would put this before (A).  Especially because it's been
>>> shown already this is easy to implement.
>>>
>>>
>>>
>>


Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
Yeah, it's a bit of a mess but mailing list yo. People reading this would
have no idea we are friends. ;) (Which we are, for anyone reading this
later!)

I must have missed the point of this already being done. How about it,
David? Did you already make this?

"FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
support types beyond float. Not that we should start with float"
That is not my interpretation and I can definitely see how that may be
frustrating. If B is pretty much done then we are good. My concern, as
noted earlier, is the scope creep component that will delay this happening
for much longer.

David. End this argument. SHOW THE CODE!

Patrick


On Tue, May 2, 2023 at 1:04 PM Benedict  wrote:

> But it’s so trivial it was already implemented by David in the span of ten
> minutes? If anything, we’re slowing progress down by refusing to do the
> extra types, as we’re busy arguing about it rather than delivering a
> feature?
>
> FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
> support types beyond float. Not that we should start with float.
>
> So, this whole debate is a mess, I think. But hey ho.
>
> On 2 May 2023, at 20:57, Patrick McFadin  wrote:
>
> 
> I'll speak up on that one. If you look at my ranked voting, that is where
> my head is. I get accused of scope creep (a lot) and looking at the initial
> proposal Jonathan put on the ML it was mostly "Developers are adopting
> vector search at a furious pace and I think I have a simple way of adding
> support to keep Cassandra relevant for these use cases" Instead of just
> focusing on this use case, I feel the arguments have bike shedded into
> scope creep which means it will take forever to get into the project.
>
> My preference is to see one thing validated with an MVP and get it into
> the hands of developers sooner so we can continue to iterate based on
> actual usage.
>
> It doesn't say your points are wrong or your opinions are broken, I'm
> voting for what I think will be awesome for users sooner.
>
> Patrick
>
> On Tue, May 2, 2023 at 12:29 PM Benedict  wrote:
>
>> Could folk voting against a general purpose type (that could well be
>> called a vector) briefly explain their reasoning?
>>
>> We established in the other thread that it’s technically trivial, meaning
>> folk must think it is strictly superior to only support float rather than
>> eg all numeric types (note: for the type, not the ANN).
>>
>> I am surprised, and the blurbs accompanying votes so far don’t seem to
>> touch on this, mostly just endorsing the idea of a vector.
>>
>>
>> On 2 May 2023, at 20:20, Patrick McFadin  wrote:
>>
>> 
>> A > B > C on both polls.
>>
>> Having talked to several users in the community that are highly excited
>> about this change, this gets to what developers want to do at Cassandra
>> scale: store embeddings and retrieve them.
>>
>> On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña 
>> wrote:
>>
>>> A > B > C
>>>
>>> I don't think that ML is such a niche application that it can't have its
>>> own CQL data type. Also, vectors are mathematical elements that have more
>>> applications than ML.
>>>
>>> On Tue, 2 May 2023 at 19:15, Mick Semb Wever  wrote:
>>>
>>>>
>>>>
>>>> On Tue, 2 May 2023 at 17:14, Jonathan Ellis  wrote:
>>>>
>>>>> Should we add a vector type to Cassandra designed to meet the needs of
>>>>> machine learning use cases, specifically feature and embedding vectors for
>>>>> training, inference, and vector search?
>>>>>
>>>>> ML vectors are fixed-dimension (fixed-length) sequences of numeric
>>>>> types, with no nulls allowed, and with no need for random access. The ML
>>>>> industry overwhelmingly uses float32 vectors, to the point that the
>>>>> industry-leading special-purpose vector database ONLY supports that data
>>>>> type.
>>>>>
>>>>> This poll is to gauge consensus subsequent to the recent discussion
>>>>> thread at
>>>>> https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>>>>>
>>>>> Please rank the discussed options from most preferred option to least,
>>>>> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
>>>>> = A (C is my preference, followed by B or A approximately equally.)
>>>>>
>>>>> (A) I am in favor

Re: [POLL] Vector type for ML

2023-05-02 Thread Patrick McFadin
\o/

Bring it in team. Group hug.

Now if you'll excuse me, I'm going to go build my preso on how Cassandra is
the only distributed database you can do vector search in an ACID
transaction.

Patrick

On Tue, May 2, 2023 at 3:27 PM Jonathan Ellis  wrote:

> I had a call with David.  We agreed that we want a "vector" data type with
> these properties
>
> - Fixed length
> - No nulls
> - Random access not supported
>
> Where we disagreed was on my proposal to restrict vectors to only numeric
> data.  David's points were that
>
> (1) He has a use case today for a data type with the other vector
> properties,
> (2) It doesn't seem reasonable to create two data types with the same
> properties, one of which is restricted to numerics, and
> (3) The restrictions that I want for numeric vectors make more sense at
> the index and function level, than at the type level.
>
> I'm ready to concede that David has the better case here and move forward
> with a vector implementation without that restriction.
>
> On Tue, May 2, 2023 at 4:03 PM David Capwell  wrote:
>
>>  How about it, David? Did you already make this?
>>
>>
>> I checked out the patch, fixed serialize/deserialize, added the
>> constraints, then added a composeForFloat(ByteBuffer), with this the impact
>> to the POC patch was the following
>>
>> 1) move away from VectorType.instance.serializer().deserialize(bb) to
>> type.composeForFloat(bb), both return float[]
>> 2) change the index validate logic to move away from checking VectorType
>> and instead check for that plus the element type == FloatType.  I didn’t
>> bother to do this as it's trivial
>>
>> David. End this argument. SHOW THE CODE!
>>
>>
>> If this argument ends and people are cool with vector supporting abstract
>> type, more than glad to help get this in.
>>
>> On May 2, 2023, at 1:53 PM, Jeremy Hanna 
>> wrote:
>>
>> I'm all for bringing more functionality to the masses sooner, but the
>> original idea has a very very specific use case.  Do we have use cases for
>> a general purpose Vector/Array data structure?  If so, awesome.  I just
>> wondered if generalizing provides value, beyond being straightforward to
>> implement.  I'm just trying to be sensitive to the database code
>> maintenance and driver support for general types versus a single type for a
>> specific, well defined purpose.
>>
>> If it could easily be a plugin, that's great - but the full picture
>> involves drivers that need to support it or you end up getting binary blobs
>> you have to decode client side and then do stuff with.  So ideally if you
>> have a well defined use case that you can build into the database, having
>> it just be part of the database and associated drivers - that makes the
>> experience much much better.
>>
>> I'm not trying to say B couldn't be valuable or that a plugin couldn't be
>> feasible.  I'm just trying to enlarge the picture a bit to see what that
>> means for this use case and for the supporting drivers/clients.
>>
>> On May 2, 2023, at 3:04 PM, Benedict  wrote:
>>
>> But it’s so trivial it was already implemented by David in the span of
>> ten minutes? If anything, we’re slowing progress down by refusing to do the
>> extra types, as we’re busy arguing about it rather than delivering a
>> feature?
>>
>> FWIW, my interpretation of the votes today is that we SHOULD NOT (ever)
>> support types beyond float. Not that we should start with float.
>>
>> So, this whole debate is a mess, I think. But hey ho.
>>
>> On 2 May 2023, at 20:57, Patrick McFadin  wrote:
>>
>> 
>> I'll speak up on that one. If you look at my ranked voting, that is where
>> my head is. I get accused of scope creep (a lot) and looking at the initial
>> proposal Jonathan put on the ML it was mostly "Developers are adopting
>> vector search at a furious pace and I think I have a simple way of adding
>> support to keep Cassandra relevant for these use cases" Instead of just
>> focusing on this use case, I feel the arguments have bike shedded into
>> scope creep which means it will take forever to get into the project.
>>
>> My preference is to see one thing validated with an MVP and get it into
>> the hands of developers sooner so we can continue to iterate based on
>> actual usage.
>>
>> It doesn't say your points are wrong or your opinions are broken, I'm
>> voting for what I think will be awesome for users sooner.
>>
>> Patrick
>>
>
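
The outcome described above (a general fixed-length, no-null vector type, with the float-only restriction enforced at the index level rather than the type level) can be sketched as follows. This is a hypothetical Python model, not the patch David describes. The `VectorType` class, its `compose_for_float` method (named after the `composeForFloat(ByteBuffer)` he mentions), the `validate_index_target` function, the string type names, and the big-endian 4-byte encoding are all assumptions for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    element_type: str   # e.g. "float" or "int" (hypothetical type names)
    dimension: int

    def serialize_floats(self, values):
        """Pack exactly `dimension` non-null floats into 4 bytes each."""
        if len(values) != self.dimension:
            raise ValueError("wrong number of elements")
        if any(v is None for v in values):
            raise ValueError("null elements are not allowed")
        return struct.pack(f">{self.dimension}f", *values)

    def compose_for_float(self, buf):
        """Decode a buffer written by serialize_floats back into a list."""
        if len(buf) != 4 * self.dimension:
            raise ValueError("buffer length does not match dimension")
        return list(struct.unpack(f">{self.dimension}f", buf))


def validate_index_target(column_type):
    """The ANN index, not the type, insists on float elements."""
    if not isinstance(column_type, VectorType):
        raise TypeError("ANN index requires a vector column")
    if column_type.element_type != "float":
        raise TypeError("ANN index requires a vector of floats")


embedding_type = VectorType("float", 3)
encoded = embedding_type.serialize_floats([1.0, 0.5, -2.0])
print(len(encoded), embedding_type.compose_for_float(encoded))
validate_index_target(embedding_type)  # passes; VectorType("int", 3) would not
```

Keeping the element-type check in `validate_index_target` mirrors point (3) in the thread: the type stays general, and only the index refuses non-float vectors.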

Re: [POLL] Vector type for ML

2023-05-04 Thread Patrick McFadin
I agree with David's reasoning and the use of DENSE (and maybe eventually
SPARSE). This is terminology well established in the data world, and it
would lead to much easier adoption from users. VECTOR is close, but I can
see having to create a lot of content around "How to use it and not get in
trouble." (I have a lot of that content already)

 - We don't have to explain what it is. A lot of prior art out there
already [1][2][3]
 - We're matching an established term with what users would expect. No
surprises.
 - Shorter ramp-up time for users. Cassandra is being modernized.

The implementation is flexible, but the interface should empower our users
to be awesome.

Patrick

1 -
https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
2 -
https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
3 - https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/

On Thu, May 4, 2023 at 10:25 AM David Capwell  wrote:

> My views have changed over time on syntax and I feel type[dimension] may
> not be the best, so it has gone lower in my own personal ranking… this is
> my current preference
>
> 1) DENSE [dimension] | NON NULL [dimension]
> 2) VECTOR
> 3) type[dimension]
>
> My reasoning for this order
>
> * type[dimension] looks like syntax sugar for array, so
> users may assume list/array semantics, but we limit to non-null elements in
> a frozen array
> * feel VECTOR as a prefix feels out of place, but VECTOR as a direct type
> makes more sense… this also leads to a possible future of VECTOR
> which is the non-fixed length version of this type.  What makes VECTOR
> different from list/array?  non-null elements and is frozen.  I don’t feel
> that VECTOR really tells users to expect non-null or frozen semantics, as
> there exists different VECTOR types for those reasons (sparse vs dense)…
> * DENSE may be confusing for people coming from languages where this just
> means “sequential layout”, which is what our frozen array/list already are…
> but since the target user is coming from a ML background, this shouldn’t
> offer much confusion.  DENSE just means FROZEN in Cassandra, with NON NULL
> elements (SPARSE allows for NULL and isn’t frozen)… So DENSE just acts as
> syntax sugar for frozen
>
>
> On May 4, 2023, at 4:13 AM, Brandon Williams  wrote:
>
> 1. VECTOR
> 2. VECTOR FLOAT[n]
> 3. FLOAT[N]   (Non null by default)
>
> Redundant or not, I think having the VECTOR keyword helps signify what
> the app is generally about and helps get buy-in from ML stakeholders.
>
> On Thu, May 4, 2023 at 3:45 AM Benedict  wrote:
>
>
> Hurrah for initial agreement.
>
> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
> think VECTOR should be used to simply imply non-null, as this would be very
> unintuitive. More logical would be NONNULL, if this is the only condition
> being applied. Alternatively for arrays we could default to NONNULL and
> later introduce NULLABLE if we want to permit nulls.
>
> If the word vector is to be used it makes more sense to make it look like
> a list, so VECTOR as here the word VECTOR is clearly not
> redundant.
>
> So, I vote:
>
> 1) (NON NULL) FLOAT[N]
> 2) FLOAT[N]   (Non null by default)
> 3) VECTOR
>
>
>
> On 4 May 2023, at 08:52, Mick Semb Wever  wrote:
>
> 
>
>
> Did we agree on a CQL syntax?
>
> I don’t believe there has been a poll on CQL syntax… my understanding
> reading all the threads is that there are ~4-5 options and none are -1ed, so
> I believe we are waiting for majority rule on this?
>
>
>
>
> Re-reading that thread, IIUC the valid choices remaining are…
>
> 1. VECTOR FLOAT[n]
> 2. FLOAT VECTOR[n]
> 3. VECTOR
> 4. VECTOR[n]
> 5. ARRAY
> 6. NON-NULL FROZEN
>
>
> Yes I'm putting my preference (1) first ;) because (banging on) if the
> future of CQL will have FLOAT[n] and FROZEN, where the VECTOR
> keyword is: for general cql users; just meaning "non-null and frozen",
> these gel best together.
>
> Options (5) and (6) are for those that feel we can and should provide this
> type without introducing the vector keyword.
>
>
>
>


Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Patrick McFadin
As somebody who gave this talk: https://youtu.be/9xf_IXNylhM I love the
evolution of this topic. Excited to see this! ++1 nb

Patrick



On Thu, May 4, 2023 at 11:35 AM C. Scott Andreas 
wrote:

> +1nb.
>
> As someone familiar with this work, it's pretty hard to overstate the
> impact it has on completing Cassandra's HTAP story. Eliminating the
> overhead of bulk reads and writes on production OLTP clusters is
> transformative.
>
> – Scott
>
> On May 4, 2023, at 9:47 AM, Doug Rohrer  wrote:
>
>
> Hello all,
>
> I’d like to put CEP-28 to a vote.
>
> Proposal:
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>
> Jira:
> https://issues.apache.org/jira/browse/CASSANDRA-16222
>
> Draft implementation:
>
> - Apache Cassandra Spark Analytics source code:
> https://github.com/frankgh/cassandra-analytics
> - Changes required for Sidecar:
> https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
>
> Discussion:
> https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3
>
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding
> vetoes.
>
>
> Thanks,
>
> Doug Rohrer
>
>
>
>
>


Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I think we are still discussing implementation here when I'm talking about
developer experience. I want developers to adopt this quickly, easily and
be successful. Vector search is already a thing. People use it every day. A
successful outcome, in my view, is developers picking up this feature
without reading a manual. (Because they don't anyway and get in trouble) I
did some more extensive research about what other DBs are using for syntax.
The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'

Pinecone[1] - dense_vector, sparse_vector
Elastic[2]: dense_vector
Milvus[3]: float_vector, binary_vector
pgvector[4]: vector
Weaviate[5]: Different approach. All typed arrays can be indexed

Based on that I'm advocating a similar syntax:

- DENSE VECTOR
or
- VECTOR

[1] https://docs.pinecone.io/docs/hybrid-search
[2]
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
[3] https://milvus.io/docs/create_collection.md
[4] https://github.com/pgvector/pgvector
[5] https://weaviate.io/developers/weaviate/config-refs/datatypes
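To make the semantics under discussion concrete: whichever keyword wins, the thread keeps returning to two properties, a fixed dimension and no null elements. Below is a minimal Python sketch of that check. The function name and error messages are hypothetical and are not part of any Cassandra API.

```python
from typing import List, Optional, Sequence


def validate_dense_vector(values: Sequence[Optional[float]], dimension: int) -> List[float]:
    """Return the values as floats, or raise ValueError.

    Hypothetical helper: enforces the two properties discussed in the thread,
    a fixed dimension and no null (None) elements.
    """
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    if any(v is None for v in values):
        raise ValueError("dense vectors may not contain null elements")
    return [float(v) for v in values]
```

Note that zeros pass this check; only missing elements are rejected.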

On Fri, May 5, 2023 at 6:07 AM Mike Adamson  wrote:

>> Then we can have the indexing apparatus only accept *frozen* for
>> the HNSW case.
>>
> I'm inclined to agree with Benedict that the index will need to be
> specifically selected by option rather than inferred based on type. As such
> there is no real reason for the *frozen* requirement on the type. The
> hnsw index can be built just as easily from a non-frozen array.
>
> I am in favour of enforcing non-null on the elements of an array by
> default. I would prefer that allowing nulls in the array would be a later
> addition if and when a use case arose for it.
>
> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe 
> wrote:
>
>> Even in the ML case, sparse can just mean zeros rather than nulls, and
>> they should compress similarly anyway.
>>
>> If we really want null values, I'd rather leave that in collections space.
>>
>> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe 
>> wrote:
>>
>>> I actually still prefer *type[dimension]*, because I think I
>>> intuitively read this as a primitive (meaning no null elements) array. Then
>>> we can have the indexing apparatus only accept *frozen* for
>>> the HNSW case.
>>>
>>> If that isn't intuitive to anyone else, I don't really have a strong
>>> opinion...but...conflating "frozen" and "dense" seems like a bad idea. One
>>> should indicate single vs. multi-cell, and the other the presence or
>>> absence of nulls/zeros/whatever.
>>>
>>> On Thu, May 4, 2023 at 12:51 PM Patrick McFadin 
>>> wrote:
>>>
>>>> I agree with David's reasoning and the use of DENSE (and maybe
>>>> eventually SPARSE). This is terminology well established in the data world,
>>>> and it would lead to much easier adoption from users. VECTOR is close, but
>>>> I can see having to create a lot of content around "How to use it and not
>>>> get in trouble." (I have a lot of that content already)
>>>>
>>>>  - We don't have to explain what it is. A lot of prior art out there
>>>> already [1][2][3]
>>>>  - We're matching an established term with what users would expect. No
>>>> surprises.
>>>>  - Shorter ramp-up time for users. Cassandra is being modernized.
>>>>
>>>> The implementation is flexible, but the interface should empower our
>>>> users to be awesome.
>>>>
>>>> Patrick
>>>>
>>>> 1 -
>>>> https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
>>>> 2 -
>>>> https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
>>>> 3 -
>>>> https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I hope we are willing to consider developers that use our system because if
I had to teach people to use "NON-NULL FROZEN" I'm pretty sure the
response would be:

Did you tell me to go write a distributed map-reduce job in Erlang? I
believe I did, Bob.

On Fri, May 5, 2023 at 8:05 AM Josh McKenzie  wrote:

> Idiomatically, to my mind, there's a question of "what space are we
> thinking about this datatype in"?
>
> - In the context of mathematics, nullability in a vector would be 0
> - In the context of Cassandra, nullability tends to mean a tombstone (or
> nothing)
> - In the context of programming languages, it's all over the place
>
> Given many models are exploring quantizing to int8 and other data types,
> there's definitely the "support other data types easily in the future"
> piece to me we need to keep in mind.
>
> So with the above and the "meet the user where they are and don't make
> them understand more of Cassandra than absolutely critical to use it", I
> lean:
>
> 1. DENSE_VECTOR
> 2. VECTOR
> 3. type[dimension]
>
> This leaves the path open for us to expand on it in the future with sparse
> support and allows us to introduce some semantics that indicate idioms
> around nullability for the users coming from a different space.
>
> "NON-NULL FROZEN" is strictly correct, however it requires
> understanding idioms of how Cassandra thinks about data (nulls mean
> different things to us, we have differences between frozen and non-frozen
> due to constraints in our storage engine and materialization of data, etc)
> that get in the way of users doing things in the pattern they're familiar
> with without learning more about the DB than they're probably looking to
> learn. Historically this has been a challenge for us in adoption; the
> classic "Why can't I just write and delete and write as much as I want? Why
> are deletes filling up my disk?" problem comes to mind.
>
> I'd also be happy with us supporting:
> * NON-NULL FROZEN
> * DENSE_VECTOR as syntactic sugar for the above
>
> If getting into the "built-in syntactic sugar mapping for communities and
> specific use-cases" is something we're willing to consider.
>
> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>
> I think we are still discussing implementation here when I'm talking about
> developer experience. I want developers to adopt this quickly, easily and
> be successful. Vector search is already a thing. People use it every day. A
> successful outcome, in my view, is developers picking up this feature
> without reading a manual. (Because they don't anyway and get in trouble) I
> did some more extensive research about what other DBs are using for syntax.
> The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'
>
> Pinecone[1] - dense_vector, sparse_vector
> Elastic[2]: dense_vector
> Milvus[3]: float_vector, binary_vector
> pgvector[4]: vector
> Weaviate[5]: Different approach. All typed arrays can be indexed
>
> Based on that I'm advocating a similar syntax:
>
> - DENSE VECTOR
> or
> - VECTOR
>
> [1] https://docs.pinecone.io/docs/hybrid-search
> [2]
> https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
> [3] https://milvus.io/docs/create_collection.md
> [4] https://github.com/pgvector/pgvector
> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>
> On Fri, May 5, 2023 at 6:07 AM Mike Adamson  wrote:
>
> Then we can have the indexing apparatus only accept *frozen* for
> the HNSW case.
>
> I'm inclined to agree with Benedict that the index will need to be
> specifically selected by option rather than inferred based on type. As such
> there is no real reason for the *frozen* requirement on the type. The
> hnsw index can be built just as easily from a non-frozen array.
>
> I am in favour of enforcing non-null on the elements of an array by
> default. I would prefer that allowing nulls in the array would be a later
> addition if and when a use case arose for it.
>
> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe 
> wrote:
>
> Even in the ML case, sparse can just mean zeros rather than nulls, and
> they should compress similarly anyway.
>
> If we really want null values, I'd rather leave that in collections space.
>
> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe 
> wrote:
>
> I actually still prefer *type[dimension]*, because I think I intuitively
> read this as a primitive (meaning no null elements) array. Then we can have
> the indexing apparatus only accept *frozen* for the HNSW case.
>
> If that isn't intuitiv

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
My vote is:
1. DENSE VECTOR
2. VECTOR
3. ARRAY


On Fri, May 5, 2023 at 9:43 AM David Capwell  wrote:

> Went through and created a spreadsheet of current votes… For Patrick and
> Mike, I don’t see a clear vote, so I put a ? where I “think” your
> preference is… for Mick, I only put one vote as the list looked like a
> summary, but you mentioned the first was your preference
>
> Syntax               | Jonathan Ellis | David Capwell | Josh McKenzie | Caleb Rackliffe | Patrick McFadin | Brandon Williams | Mike Adamson | Benedict | Mick Semb Wever
> ---------------------+----------------+---------------+---------------+-----------------+-----------------+------------------+--------------+----------+----------------
> VECTOR               | 1              | 2             | 2             |                 |                 | 1                | ?            | 3        |
> DENSE VECTOR         | 2              | 1             |               |                 | ?               |                  | ?            |          |
> type[dimension]      | 3              | 3             | 3             | 1               |                 | 3                |              | 2        |
> DENSE_VECTOR         |                |               | 1             |                 |                 |                  |              |          |
> NON NULL [dimension] |                | 1             |               |                 |                 |                  |              | 1        |
> VECTOR type[n]       |                |               |               |                 |                 | 2                |              |          | 1
> ARRAY                |                |               |               |                 |                 |                  |              |          |
> NON-NULL FROZEN      |                |               |               |                 |                 |                  |              |          |
>
> 1 = top pick
> 2 = second pick
> 3 = third pick
>
> Let me know if I am missing anyone, or if I have bad data
>
> On May 5, 2023, at 9:23 AM, Jonathan Ellis  wrote:
>
> +10 for not inflicting unwieldy keywords on ML users.
>
> Re Josh's summary, mostly agreed, my only objection to adding the DENSE
> keyword is that I don't see a foreseeable future where we also support
> sparse vectors, so it would end up being unnecessary extra verbosity.  So
> my preference would be
>
> 1. VECTOR
> 2. DENSE VECTOR (space instead of underscore, SQL isn't
> afraid of spaces)
> 3. type[dimension]
>
> On Fri, May 5, 2023 at 10:54 AM Patrick McFadin 
> wrote:
>
>> I hope we are willing to consider developers that use our system because
>> if I had to teach people to use "NON-NULL FROZEN" I'm pretty sure
>> the response would be:
>>
>> Did you tell me to go write a distributed map-reduce job in Erlang? I
>> believe I did, Bob.
>>
>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie 
>> wrote:
>>
>>> Idiomatically, to my mind, there's a question of "what space are we
>>> thinking about this datatype in"?
>>>
>>> - In the context of mathematics, nullability in a vector would be 0
>>> - In the context of Cassandra, nullability tends to mean a tombstone (or
>>> nothing)
>>> - In the context of programming languages, it's all over the place
>>>
>>> Given many models are exploring quantizing to int8 and other data types,
>>> there's definitely the "support other data types easily in the future"
>>> piece to me we need to keep in mind.
>>>
>>> So with the above and the "meet the user where they are and don't make
>>> them understand more of Cassandra than absolutely critical to use it", I
>>> lean:
>>>
>>> 1. DENSE_VECTOR
>>> 2. VECTOR
>>> 3. type[dimension]
>>>
>>> This leaves the path open for us to expand on it in the future with
>>> sparse support and allows us to introduce some semantics that indicate
>>> idioms around nullability for the users coming from a different space.
>>>
>>> "NON-NULL FROZEN" is strictly correct, however it requires
>>> understanding idioms of how Cassandra thinks about data (nulls mean
>>> different things to us, we have differences between frozen and non-frozen
>>> due to constraints in our storage engine and materialization of data, etc)
>>> that get in the way of users doing things in the pattern they're familiar
>>> with without learning more about the DB than they're probably looking to
>>> learn. Historically this has been a challenge for us in adoption; the
>>> classic "Why can't I just write and delete and write as much as I want? Why
>>> are deletes filling up my disk?" problem comes to mind.
>>>
>>> I'd also be happy with us supporting:
>>> * NON-NULL FROZEN
>>> * DENSE_VECTOR as syntactic sugar for the above
>>>
>>> If getting into the "built-in syntactic sugar mapping for communities
>>> and specific use-cases" is something we're willing to consider.
>>>
>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
Derek, despite your preference, I would hang out with you at a party.

On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker 
wrote:

> Speaking as someone who likes Erlang, maybe that's why I also like NONNULL
> FROZEN>. It's unambiguous what Cassandra is going to do with that
> type. DENSE VECTOR means I need to go read docs (and then probably
> double-check in the source) to be sure exactly what is going on.
>
> Cheers,
>
> Derek
>
> On Fri, May 5, 2023 at 9:54 AM Patrick McFadin  wrote:
>
>> I hope we are willing to consider developers that use our system because
>> if I had to teach people to use "NON-NULL FROZEN" I'm pretty sure
>> the response would be:
>>
>> Did you tell me to go write a distributed map-reduce job in Erlang? I
>> believe I did, Bob.
>>
>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie 
>> wrote:
>>
>>> Idiomatically, to my mind, there's a question of "what space are we
>>> thinking about this datatype in"?
>>>
>>> - In the context of mathematics, nullability in a vector would be 0
>>> - In the context of Cassandra, nullability tends to mean a tombstone (or
>>> nothing)
>>> - In the context of programming languages, it's all over the place
>>>
>>> Given many models are exploring quantizing to int8 and other data types,
>>> there's definitely the "support other data types easily in the future"
>>> piece to me we need to keep in mind.
>>>
>>> So with the above and the "meet the user where they are and don't make
>>> them understand more of Cassandra than absolutely critical to use it", I
>>> lean:
>>>
>>> 1. DENSE_VECTOR
>>> 2. VECTOR
>>> 3. type[dimension]
>>>
>>> This leaves the path open for us to expand on it in the future with
>>> sparse support and allows us to introduce some semantics that indicate
>>> idioms around nullability for the users coming from a different space.
>>>
>>> "NON-NULL FROZEN" is strictly correct, however it requires
>>> understanding idioms of how Cassandra thinks about data (nulls mean
>>> different things to us, we have differences between frozen and non-frozen
>>> due to constraints in our storage engine and materialization of data, etc)
>>> that get in the way of users doing things in the pattern they're familiar
>>> with without learning more about the DB than they're probably looking to
>>> learn. Historically this has been a challenge for us in adoption; the
>>> classic "Why can't I just write and delete and write as much as I want? Why
>>> are deletes filling up my disk?" problem comes to mind.
>>>
>>> I'd also be happy with us supporting:
>>> * NON-NULL FROZEN
>>> * DENSE_VECTOR as syntactic sugar for the above
>>>
>>> If getting into the "built-in syntactic sugar mapping for communities
>>> and specific use-cases" is something we're willing to consider.
>>>
>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>
>>> I think we are still discussing implementation here when I'm talking
>>> about developer experience. I want developers to adopt this quickly, easily
>>> and be successful. Vector search is already a thing. People use it every
>>> day. A successful outcome, in my view, is developers picking up this
>>> feature without reading a manual. (Because they don't anyway and get in
>>> trouble) I did some more extensive research about what other DBs are using
>>> for syntax. The consensus is some variety of 'VECTOR', 'DENSE' and 'SPARSE'
>>>
>>> Pinecone[1] - dense_vector, sparse_vector
>>> Elastic[2]: dense_vector
>>> Milvus[3]: float_vector, binary_vector
>>> pgvector[4]: vector
>>> Weaviate[5]: Different approach. All typed arrays can be indexed
>>>
>>> Based on that I'm advocating a similar syntax:
>>>
>>> - DENSE VECTOR
>>> or
>>> - VECTOR
>>>
>>> [1] https://docs.pinecone.io/docs/hybrid-search
>>> [2]
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
>>> [3] https://milvus.io/docs/create_collection.md
>>> [4] https://github.com/pgvector/pgvector
>>> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>>>
>>> On Fri, May 5, 2023 at 6:07 AM Mike Adamson 
>>> wrote:
>>>

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Patrick McFadin
Under the goals section, there is this line:


   1. Scatter/gather across replicas, combining topK from each to get
   global topK.


But what I'm hearing is, exactly how will that happen? Maybe this is an SAI
question too. How is that verified in SAI?
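
For illustration only, here is one plausible shape for the coordinator-side step of "combining topK from each to get global topK". This is a sketch under assumptions, not the CEP-30 design: the (score, row key) tuple layout, the "higher score is closer" convention, and the function name are all hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit layout: (similarity score, row key); higher score = closer match.
Hit = Tuple[float, str]


def merge_top_k(per_replica_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine each replica's local top-K into a single global top-K."""
    best: Dict[str, float] = {}
    # The same row can come back from several replicas; keep its best score once.
    for hits in per_replica_hits:
        for score, key in hits:
            if key not in best or score > best[key]:
                best[key] = score
    # nlargest returns the k highest-scoring rows, sorted best first.
    return heapq.nlargest(k, ((score, key) for key, score in best.items()))
```

The sketch assumes every replica answered. What the coordinator should do when one of the fanned-out queries fails is exactly the open question raised in this thread.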

On Tue, May 9, 2023 at 11:07 AM David Capwell  wrote:

> Approach section doesn’t go over how this will handle cross replica
> search, this would be good to flesh out… given results have a real ranking,
> the current 2i logic may yield incorrect results… so would think we need
> num_ranges / rf queries in the best case, with some new capability to sort
> the results?  If my assumption is correct, then how errors are handled
> should also be fleshed out… Example: 1k cluster without vnode and RF=3, so
> 333 queries fanned out to match, then coordinator needs to sort… if 1 of
> the queries fails and can’t fall back to peers… does the query fail (I
> assume so)?
>
> On May 8, 2023, at 7:20 PM, Jonathan Ellis  wrote:
>
> Hi all,
>
> Following the recent discussion threads, I would like to propose CEP-30 to
> add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached
> Indexes (SAI) to Apache Cassandra.
>
> The primary goal of this proposal is to implement ANN vector search
> capabilities, making Cassandra more useful to AI developers and
> organizations managing large datasets that can benefit from fast similarity
> search.
>
> The implementation will leverage Lucene's Hierarchical Navigable Small
> World (HNSW) library and introduce a new CQL data type for vector
> embeddings, a new SAI index for ANN search functionality, and a new CQL
> operator for performing ANN search queries.
>
> We are targeting the 5.0 release for this feature, in conjunction with the
> release of SAI. The proposed changes will maintain compatibility with
> existing Cassandra functionality and compose well with the already-approved
> SAI features.
>
> Please find the full CEP document here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>


Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Patrick McFadin
+1

On Tue, May 9, 2023 at 10:58 AM Caleb Rackliffe 
wrote:

> +1
>
> On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski 
> wrote:
>
>> Let's vote.
>>
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>>
>> Piotr Kołaczkowski
>> e. pkola...@datastax.com
>> w. www.datastax.com
>>
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
Having pulled a lot of developers out of the 2i fire, I would love it if
defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
seems like the right move for most developers that don't read docs and
assume behavior.

As much as I hate that 2i would be the configured default, I get it. New
feature and this is the right thing for users. Would there be any way to
switch 2i to SAI for the same index declaration? That would make for a nice
upgrade for users moving to 5 without having to re-create indexes.

Patrick
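
To sketch the behavior being discussed: an explicit USING clause wins, otherwise a configured default applies, and with no default the statement is rejected. The Python below is a hypothetical model of that decision only. The function name, the string values, and the error text are assumptions, not Cassandra code or configuration keys.

```python
from typing import Optional


def resolve_index_type(using: Optional[str], configured_default: Optional[str]) -> str:
    """Pick the index implementation for a CREATE INDEX statement.

    Hypothetical model: `using` stands for the USING clause (None if omitted),
    `configured_default` for an operator-set default (None if unset).
    """
    if using is not None:
        # An explicit USING clause always overrides the default.
        return using.lower()
    if configured_default is None:
        # Guardrail-style behavior: no default means the user must choose.
        raise ValueError("no default index implementation configured; add a USING clause")
    return configured_default.lower()
```

With this shape, changing the default later only affects statements that omit USING; scripts that name an implementation keep their behavior.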

On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:

> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
> prefer allowing USING...WITH... for CREATE INDEX
>
>
> I have 0 issues with a new syntax to make this more clear
>
> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
> more or less what my original proposal was above (modulo the configurable
> default).
>
>
> I have 0 issues deprecating and producing a ClientWarning recommending the
> new syntax, but I would be against removing this syntax later on… it should
> be low effort to keep, so breaking a user would not be desirable for me.
>
> change only the fact that CREATE INDEX retains a configurable default
>
>
> This option allows users to control this behavior, and allows us to change
> the default over time.  For 5.0 I am strongly against SAI being the default
> (new features disabled by default), but I wouldn’t have issues in later
> versions changing the default once it's been out for a while.
>
> I’m not convinced by the changing defaults argument here. The
> characteristics of the two index types are very different, and users with
> scripts that make indexes today shouldn’t have their behaviour change.
>
>
> In my mind this is no different from defaulting to BTI in a follow up
> release, but if this concern is that the legacy index leaked details such
> as index tables, so changing the default would have side effects in the
> public domain that users might not expect, then I get it… are there other
> concerns?
>
> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
> wrote:
>
> tl;dr If you take my original proposal and change only the fact that CREATE
> INDEX retains a configurable default, I think we get to the same place?
>
> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>
> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe 
> wrote:
>
>> I see a broad desire here to have a configurable (YAML) default
>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>> the concept of a default index implementation is pretty standard for most
>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>> still need to either revert to CREATE CUSTOM INDEX or add the
>> USING...WITH... extensions to CREATE INDEX to override the default or
>> specify parameters, which will be in play once SAI supports basic text
>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>> that's more or less what my original proposal was above (modulo the
>> configurable default).
>>
>> Thoughts?
>>
>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>
>>> I’m not convinced by the changing defaults argument here. The
>>> characteristics of the two index types are very different, and users with
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>> We could introduce new syntax that properly appreciates there’s no
>>> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
>>> these indexes involve a partition key or scatter gather
>>>
>>> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>>>
>>>
>>> +1, as we must improve the image of our own default indexing ability.
>>>
>>> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
>>> disable the ability to create SAI through *CREATE CUSTOM INDEX* in
>>> some version after 5.0?
>>>
>>> As far as I know, there may be users using this as a plugin-index
>>> interface, like https://github.com/Stratio/cassandra-lucene-index (these
>>> projects may be inactive, but if someone wants to do something similar in
>>> the future, we don't have to stop them).
>>>
>>>
>>>
>>> On Wed, May 10, 2023 at 10:01, Jonathan Ellis wrote:
>>>
>>>> +1 for this, especially in the long term.  CREATE INDEX should do the
>>>> right thing for most people without requiring extra ceremony.
>>>>
>>>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>>>> jeremiah.jor...@gmail.com> wrote:
>>>>
>>>>> If the consensus is that SAI is the right default index, then we
>>>>> should just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM
>>>>> INDEX.
>>>>>
>>>>>
>>>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
>>>>> wrote:
>>>>>
>>>>> Earlier today, Mick started a thread on the future of our index
>>>>> creation DDL on Slack:
>>>>>
>>>>> https://

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
There will be a LOT of content around using SAI in 5.0.

CCing marketing ML

On Wed, May 10, 2023 at 8:38 PM Jeff Jirsa  wrote:

> Changes like this always scare me, but the benefits probably outweigh the
> risks. Probably obvious to whoever implements this, but if this happens,
> please make sure it is super visible in NEWS and that it simultaneously
> updates the to-string / to-cql representation of the schema in cqlsh /
> drivers / snapshots
>
> On Wed, May 10, 2023 at 8:27 PM Patrick McFadin 
> wrote:
>
>> Having pulled a lot of developers out of the 2i fire, I would love it if
>> defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
>> seems like the right move for most developers that don't read docs and
>> assume behavior.
>>
>> As much as I hate that 2i would be the configured default, I get it. New
>> feature and this is the right thing for users. Would there be any way to
>> switch 2i to SAI for the same index declaration? That would make for a nice
>> upgrade for users moving to 5 without having to re-create indexes.
>>
>> Patrick
>>
>> On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:
>>
>>> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
>>> prefer allowing USING...WITH... for CREATE INDEX
>>>
>>>
>>> I have 0 issues with a new syntax to make this more clear
>>>
>>> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
>>> more or less what my original proposal was above (modulo the configurable
>>> default).
>>>
>>>
>>> I have 0 issues deprecating and producing a ClientWarning recommending
>>> the new syntax, but I would be against removing this syntax later on… it
>>> should be low effort to keep, so breaking a user would not be desirable for
>>> me.
>>>
>>> change only the fact that CREATE INDEX retains a configurable default
>>>
>>>
>>> This option allows users to control this behavior, and allows us to
>>> change the default over time.  For 5.0 I am strongly against SAI being the
>>> default (new features disabled by default), but I wouldn’t have issues in
>>> later versions changing the default once it's been out for a while.
>>>
>>> I’m not convinced by the changing defaults argument here. The
>>> characteristics of the two index types are very different, and users with
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>>
>>> In my mind this is no different from defaulting to BTI in a follow up
>>> release, but if this concern is that the legacy index leaked details such
>>> as index tables, so changing the default would have side effects in the
>>> public domain that users might not expect, then I get it… are there other
>>> concerns?
>>>
>>> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
>>> wrote:
>>>
>>> tl;dr If you take my original proposal and change only the fact that CREATE
>>> INDEX retains a configurable default, I think we get to the same place?
>>>
>>> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>>>
>>> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe <
>>> calebrackli...@gmail.com> wrote:
>>>
>>>> I see a broad desire here to have a configurable (YAML) default
>>>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>>>> the concept of a default index implementation is pretty standard for most
>>>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>>>> still need to either revert to CREATE CUSTOM INDEX or add the
>>>> USING...WITH... extensions to CREATE INDEX to override the default or
>>>> specify parameters, which will be in play once SAI supports basic text
>>>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>>>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>>>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>>>> that's more or less what my original proposal was above (modulo the
>>>> configurable default).
>>>>
>>>> Thoughts?
>>>>
>>>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>>>
>>>>> I’m not convinced by the changing defaults argument here. The
>>>>> characteristics of the two index types are very different, and users with
>>>>> scripts that make indexes today

Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Patrick McFadin
1
Yes
4



On Mon, May 15, 2023 at 3:00 AM Benedict  wrote:

> 3: CREATE  INDEX (Otherwise 2)
> No
> If configurable, should be a distributed configuration. This is very
> different to other local configurations, as the 2i selected has semantic
> implications, not just performance (and the perf implications are also much
> greater)
>
> On 15 May 2023, at 10:45, Mike Adamson  wrote:
>
>
>
>> [POLL] Centralize existing syntax or create new syntax?
>>
>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>
> [POLL] Should there be a default? (YES/NO)
>>
>
> Yes
>
> [POLL] What to do with the default?
>>
>> 1.) Allow a default, and switch it to SAI (no configurables)
>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>> 3.) YAML config to override default index (legacy 2i remains the default)
>> 4.) YAML config/guardrail to require index type selection (not required
>> by default)
>>
>
> 3.) YAML config to override default index (legacy 2i remains the default)
>
>
>
> On Mon, 15 May 2023 at 08:54, Mick Semb Wever  wrote:
>
>>
>>
>> [POLL] Centralize existing syntax or create new syntax?
>>>
>>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>>
>>
>>
>> (1) CREATE INDEX …
>>
>>
>>
>>> [POLL] Should there be a default? (YES/NO)
>>>
>>
>>
>> Yes (but see below).
>>
>>
>>
>>> [POLL] What to do with the default?
>>>
>>> 1.) Allow a default, and switch it to SAI (no configurables)
>>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>>> 3.) YAML config to override default index (legacy 2i remains the default)
>>> 4.) YAML config/guardrail to require index type selection (not required
>>> by default)
>>>
>>
>>
>> (4) YAML config. Commented out default of 2i.
>>
>> I agree that the default cannot change in 5.0, but our existing default
>> of 2i can be commented out.
>>
>> For the user this gives them the same feedback, and puts the same
>> requirement to edit one line of yaml, as when we disabled MVs and SASI in
>> 4.0.
>> No one has complained about either of these, which is a clear signal folk
>> understood how to get their existing DDLs to work from 3.x to 4.x
>>
>
>
> --
> *Mike Adamson*
> Engineering
>
> +1 650 389 6000 | datastax.com
>
>


Re: Vector search demo, and query syntax

2023-05-23 Thread Patrick McFadin
| I first stumbled a bit with "there's no where clause and no filtering
allowed…"
| But I doubt that reaction from any experienced cql user will last more
than a moment.

I was also wondering about that, but this syntax looks good. More
importantly, it will be easy to explain to end users.

Patrick
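
To make the syntax under discussion concrete, here is a minimal sketch that assembles the proposed query as a string. The column names and the `%s` placeholders come from the proposal quoted below. The helper name `ann_query` and the sample keyspace and table names are assumptions made for illustration, and nothing here talks to a cluster.

```python
def ann_query(keyspace: str, table: str) -> str:
    """Build the proposed ORDER BY ... ANN OF query for a given table.

    The two %s placeholders stand for the query vector and the row limit,
    to be bound by whatever client executes the statement.
    """
    return (
        f"SELECT id, start, end, text FROM {keyspace}.{table} "
        "ORDER BY embedding ANN OF %s LIMIT %s"
    )


if __name__ == "__main__":
    # "demo" and "chunks" are hypothetical names.
    print(ann_query("demo", "chunks"))
```

Note there is no WHERE clause at all: the ordering expression on the indexed column is what drives the search, which is the part that takes a moment to get used to.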

On Tue, May 23, 2023 at 1:28 PM Jonathan Ellis  wrote:

> Yes, that's totally reasonable syntactically, but I'd prefer not to open
> the can of worms of ordering by some functions but not others (and I
> definitely don't want to try to tackle ordering by all functions).  "You
> can order by expressions involving SAI columns" is a pretty easy rule to
> explain.
>
> On Tue, May 23, 2023 at 12:53 PM David Capwell  wrote:
>
>> I am ok with the syntax, but wondering if a function maybe better than a
>> CQL change?
>>
>> SELECT id, start, end, text
>> FROM {self.keyspace}.{self.table}
>> ORDER BY ANN(embedding, ?)
>> LIMIT ?
>>
>> Not really a common syntax, but could be useful down the line
>>
>> On May 23, 2023, at 12:37 AM, Mick Semb Wever  wrote:
>>
>>
>>> *I propose that we adopt `ORDER BY` syntax, supporting it for vector
>>> indexes first and eventually for all SAI indexes.  So this query would
>>> become*
>>>
>>> *SELECT id, start, end, text FROM {self.keyspace}.{self.table}
>>> ORDER BY embedding ANN OF %s LIMIT %s*
>>>
>>
>>
>> LGTM.
>>
>> I first stumbled a bit with "there's no where clause and no filtering
>> allowed…"
>>
>> But I doubt that reaction from any experienced cql user will last more
>> than a moment.
>>
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Patrick McFadin
+1
Love the buzz this is creating with new users. Thanks for the work on this
Jonathan.

On Thu, May 25, 2023 at 8:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-06-14 Thread Patrick McFadin
Andy,

Good to see you on the ML again! CEP-30 is slated for release with 5.0
later in the year. Until then, you'll need to do a local build or try it
out in a preview in Astra. A few of us have been talking about creating a
preview docker image since there is some interest in having it run in
k8ssandra. In any case, this is very alpha code and should be treated as
such. Reporting errors or unusual results would be greatly appreciated!

Patrick



On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <
a.e.cob...@dundee.ac.uk> wrote:

> Hi All,
>
>
>
> Great news that this has gone through. I'm wondering if we have a timescale for
> this making it to Beta or release?  I’m asking because we have a project
> that would benefit from this approach.
>
>
>
> Andy
>
>
>
>
>
> *From: *Jonathan Ellis 
> *Date: *Tuesday, 30 May 2023 at 14:44
> *To: *dev 
> *Subject: *Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
>
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
>
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-14 Thread Patrick McFadin
+1

On Wed, Jun 14, 2023 at 2:39 PM Adam Holmberg 
wrote:

> +1
>
> (long time coming!)
>
> On Wed, Jun 14, 2023 at 3:51 AM Jorge Bay Gondra 
> wrote:
>
>> +1 nb
>>
>> On Wed, Jun 14, 2023 at 9:13 AM Sam Tunnicliffe  wrote:
>>
>>> +1
>>>
>>> On 13 Jun 2023, at 15:14, Jeremy Hanna 
>>> wrote:
>>>
>>> Calling for a vote on CEP-8 [1].
>>>
>>> To clarify the intent, as Benjamin said in the discussion thread [2],
>>> the goal of this vote is simply to ensure that the community is in
>>> favor of the donation. Nothing more.
>>> The plan is to introduce the drivers, one by one. Each driver donation
>>> will need to be accepted first by the PMC members, as it is the case for
>>> any donation. Therefore the PMC should have full control on the pace at
>>> which new drivers are accepted.
>>>
>>> If this vote passes, we can start this process for the Java driver under
>>> the direction of the PMC.
>>>
>>> Jeremy
>>>
>>> 1.
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>>>
>>>
>>>


Re: [VOTE] CEP 33 - CIDR filtering authorizer

2023-06-28 Thread Patrick McFadin
+1

On Wed, Jun 28, 2023 at 3:42 AM Brandon Williams  wrote:

> +1
>
> Kind Regards,
> Brandon
>
>
> On Tue, Jun 27, 2023 at 12:17 PM Shailaja Koppu  wrote:
> >
> > Hi Team,
> >
> > (Starting a new thread for VOTE instead of reusing the DISCUSS thread,
> to follow usual procedure).
> >
> > Please vote on CEP 33 - CIDR filtering authorizer
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-33%3A+CIDR+filtering+authorizer
> .
> >
> > Thanks,
> > Shailaja
>


Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-10 Thread Patrick McFadin
I would say it helps a lot of people. 45k downloads in just the last month:
https://pypistats.org/packages/cqlsh

I feel like a CEP would be in order, along the lines of CEP-8:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

Unless anyone objects, I can help you get the CEP together and we can get a
vote, then a JIRA in place for any changes in trunk.

Patrick
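
For readers who have not used the package, the shared-virtualenv workflow described in the quoted messages below boils down to two commands. This is a minimal sketch that only builds the command lines and does not run them. The helper name `cqlsh_venv_commands` and the example path are assumptions, and the `bin/pip` layout is the POSIX one (Windows puts it under `Scripts`).

```python
import sys
from pathlib import PurePosixPath
from typing import List


def cqlsh_venv_commands(venv_dir: str) -> List[List[str]]:
    """Return the commands that create a virtualenv and install cqlsh into it."""
    venv = PurePosixPath(venv_dir)
    pip = venv / "bin" / "pip"  # POSIX layout; an assumption about the target host
    return [
        [sys.executable, "-m", "venv", str(venv)],  # create the environment
        [str(pip), "install", "cqlsh"],             # install cqlsh from PyPI
    ]


if __name__ == "__main__":
    # "/shared/python-dev-tools" is a hypothetical NFS-mounted location.
    for cmd in cqlsh_venv_commands("/shared/python-dev-tools"):
        print(" ".join(cmd))
```

Each inner list can be handed to `subprocess.run(cmd, check=True)` if you want to execute it rather than print it.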

On Mon, Jul 10, 2023 at 4:58 PM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Same - really appreciate those efforts and also welcome the upstreaming
> and release automation...
>
> German
> --
> *From:* Jeff Widman 
> *Sent:* Sunday, July 9, 2023 1:44 PM
> *To:* Max C. 
> *Cc:* dev@cassandra.apache.org ; Brad Schoening
> 
> *Subject:* [EXTERNAL] Re: CASSANDRA-18654 - start publishing CQLSH to
> PyPI as part of the release process
>
> Thanks Max, always encouraging to hear that the time I spend on open
> source is helping others.
>
> Your use case is very similar to what drove my original desire to get
> involved with the project. Being able to `pip install cqlsh` from a dev
> machine was so much lighter weight than the alternatives.
>
> Anyone else care to weigh in on this?
>
> What are the next steps to move to a decision?
>
> Cheers,
> Jeff
>
> On Sat, Jul 8, 2023, 7:23 PM Max C.  wrote:
>
> As a user, I really appreciate your efforts Jeff & Brad.  I would *love*
> for the C* project to officially support this.
>
> In our environment we have a lot of client machines that all share common
> NFS mounted directories.  It's much easier for us to create a Python
> virtual environment on a file server with the cqlsh PyPI package installed
> than it is to install the Cassandra RPMs on every single machine.  Before I
> discovered your PyPI package, our developers would need to log in to a
> Cassandra node in order to run cqlsh.  The cqlsh PyPI package, however, is
> in our standard "python dev tools" virtual environment -- along with
> Ansible, black, isort and various other Python packages; which means it's
> accessible to everyone, everywhere.
>
> I agree that this should not *replace* packaging cqlsh in the Cassandra
> RPM, so much as provide an additional *option* for installing cqlsh without
> the baggage of installing the full Cassandra package.
>
> Thanks again for your work Jeff & Brad.
>
> - Max
> On 7/6/2023 5:55 PM, Jeff Widman wrote:
>
> Brad Schoening and I currently maintain
> https://pypi.org/project/cqlsh/ which repackages CQLSH that ships with
> every Cassandra release.
>
> This way:
>
>    - anyone who wants a lightweight client to talk to a remote cassandra
>      can simply `pip install cqlsh` without having to download the full
>      cassandra source, unzip it, etc.
>    - it's very easy for folks to use it as scaffolding in their python
>      scripts/tooling since they can simply include it in the list of their
>      required dependencies.
>
> We currently handle the packaging by waiting for a release, then manually
> copy/pasting the code out of the cassandra source tree into
> https://github.com/jeffwidman/cqlsh which has some additional
> build/python package configuration files, then using standard
> python tooling to publish to PyPI.
>
> Given that our project is simply a build/packaging project, I wanted to
> start a conversation about upstreaming this into core Cassandra. I realize
> that Cassandra has no interest in maintaining lots of build targets... but
> given that cqlsh is written in Python and publishing to PyPI enables DBA's
> to share more complicated tooling built on top of it this seems like a
> natural fit for core cassandra rather than a standalone project.
>
> Goal:
> When a Cassandra release happens, the build/release process automatically
> publishes cqlsh to https://pypi.org/project/cqlsh/.
>
> Non-Goal: This is _not_ about having cassandra itself rely on PyPI. There
> was some initial chatter about that in
> https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a
> lot of complexity, and I'm honestly not sure it's a great idea. Even if
> folks later want to go that route, the first hurdle is publishing to PyPI,
> so for now let's keep the scope of the discussion limited to treating PyPI
> purely as a release target, and not as an ingredient to a release.
>
> From an implementation perspective, this should be very straightforward.
> We don't have any differences from the CQLSH source that's in cassandra,
> instead we point folks to make changes to cqlsh in the Cassandra source. In
> fact we've made multiple contributions back to `cqlsh` ourselves and have
> drastically cleaned up the code:
> https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening&type=pullrequests.
> So the only real change is adding the package config files and the b

[DISCUSS] Conducting a User Survey

2023-07-10 Thread Patrick McFadin
For quite a few years, I have done Twitter polls to gather helpful
information about how people use Apache Cassandra. Twitter is no longer the
best place to conduct this kind of activity since it has become a ghost
town.

We should ask more comprehensive questions to get the pulse of our user
community. I want to do a simple Google Form survey that we can promote on
every channel for a few weeks. I'll anonymize the results and post them on
cassandra.apache.org.

Here are the proposed questions I have compiled. A pretty basic set of
questions, but it would be fun to know the answer to several of these:
https://docs.google.com/document/d/18627E1UV-BjLyuNFgV0cgPwPmtjUHy7Th9Mk15ll1IA/edit?usp=sharing

Comments are open to all. Please let me know what you think.

Patrick


Apache Cassandra User Survey

2023-07-15 Thread Patrick McFadin
It’s been a long time since I’ve asked the community for feedback in a poll
or otherwise. A lot is changing in the data world, and we have an exciting
Cassandra release coming up with v5!
I would like to ask for five or ten minutes of your time to answer some
questions about how you use Cassandra and how we are doing as a community.
There are only 2 questions required, and the rest are all optional, so
answer whatever you can. It’s all helpful information.

https://forms.gle/KVNd7UmUfcBuoNvF7

The survey will run until July 29, 2023. Once completed, the results will
be anonymized and posted on http://cassandra.apache.org

Help spread the word by posting this invitation on social media, slack
channels, or emailing colleagues. The bigger the N, the better the survey!
Here’s a sample to get you started:

I recently took the Apache Cassandra® 2023 survey, and I think you should
too! By sharing your answers, you can help shape the future of the
Cassandra project and contribute to the community. Your opinion matters!
https://forms.gle/KVNd7UmUfcBuoNvF7

Patrick


Who wants a free Cassandra t-shirt?

2023-07-21 Thread Patrick McFadin
We have about another week left on the user survey I posted last week. The
response has been slow, so it's time to get things in gear.

I found a box of Cassandra t-shirts that will make an excellent thank you
for anyone filling out the survey. Once the survey window closes, I'll pick
a random group of emails to receive a shirt. Given the tepid response so
far, your chances are decent to receive a shirt!

5-10 minutes. That's all it takes. Promote to your networks and let's get
some opinions known!

https://forms.gle/KVNd7UmUfcBuoNvF7

Thanks again,

Patrick


Raw results from User Survey

2023-08-01 Thread Patrick McFadin
Thanks to everyone who participated in this survey. We had enough
responses to give this legitimacy: 220 responses!

I wanted to get the raw results out first so everyone can participate with
the full picture. I'll work on a blog post to post on the Apache web site
after this is done.

Graphs (easy read)
https://docs.google.com/document/d/1Rbg-VP4Xdvgp8EKNczkqfhFYeKwfc_ZmMW0c5Gol9pk/edit?usp=sharing

Anonymized spreadsheet of responses (make your own graphs)
https://docs.google.com/spreadsheets/d/1pjhpjID5sEW4Vcff8tq0Atbcq8Cds18pXorM4CQcStk/edit?usp=sharing

I'll be giving a bit more discussion in the Cassandra marketing meeting
tomorrow if you want to come hear my thoughts.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883297

Now, what surprised you in the results?

Patrick


Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-08-10 Thread Patrick McFadin
Dinesh raises some good points.

> If we do adopt this, there will be non-zero overhead of the release
> process. This is fine but we need volunteers to run this process. My
> understanding is that they need to be ideally PMC or at least Committers
> on the project to go through all the steps to successfully release a new
> artifact for our users.

Which was addressed in the proposed changes part of the CEP:

- A document detailing procedures for releasing to PyPI.org. This document
should include details on:

   1. How release to PyPI can be integrated into the build process. Can
   this be done with automation?
   2. How will credentials, permissions and ownership of packages on PyPI
   be managed?

-
My first thought was automation and integration into the build release.

Can you briefly outline the steps that need to be followed for a PyPI
release, Brad?

Patrick
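
As a starting point for that outline, a generic PyPI release with the standard Python packaging tools (`build` and `twine`) usually has three steps: build, check, upload. The sketch below only assembles those command lines. It is not the process the project has agreed on, the helper name `pypi_release_commands` is hypothetical, and the `cqlsh-<version>*` file pattern assumes the usual sdist/wheel naming. Credentials and ownership, the second item in the CEP list above, are deliberately left out.

```python
from typing import List


def pypi_release_commands(version: str, dist_dir: str = "dist") -> List[List[str]]:
    """Return the build/check/upload commands for one cqlsh release."""
    # Glob pattern for the built artifacts; expand it (for example via a
    # shell) before handing the command to a process runner.
    artifacts = f"{dist_dir}/cqlsh-{version}*"
    return [
        ["python", "-m", "build", "--outdir", dist_dir],  # build sdist and wheel
        ["twine", "check", artifacts],                    # validate package metadata
        ["twine", "upload", artifacts],                   # publish to PyPI
    ]


if __name__ == "__main__":
    # "5.0.0" is a placeholder version, not a statement about any release.
    for cmd in pypi_release_commands("5.0.0"):
        print(" ".join(cmd))
```

Keeping the steps as plain command lists makes it easy to drop them into whatever release automation ends up owning this, or to run them by hand for a dry run.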


On Wed, Aug 9, 2023 at 2:54 PM Abe Ratnofsky  wrote:

> I think it would be good for the project to have an official PyPI
> distribution, and the signal from users (40K downloads per month) is a
> clear indication that this is useful. Timely releases would help us get
> future improvements to cqlsh out faster, and moving this to an official
> distribution would protect users against any changes in this volunteer
> effort in case something happens in the future.
>
> +1 (nb)
>
> --
> Abe
>
> On Aug 9, 2023, at 1:33 PM, Brad  wrote:
>
> Hi Dinesh,
>
> You are correct that the scope of this CEP is practical, narrow and
> limited to having an official distribution of CQLSH on the official Python
> package repository. Cassandra end-users, who use the CQLSH command line,
> would benefit in several direct ways:
>
>    - A timely distribution of new CQLSH versions on the official Python
>      package repository aligned with Apache Cassandra releases
>    - A trusted distribution overseen by Apache Cassandra instead of third
>      party maintainers. Today, there is only trust-based faith that the PyPI
>      distribution of CQLSH matches the Apache Open Source one.
>    - A lightweight distribution of CQLSH clocking in at 110KB vs
>      downloading a 50MB tarball.
>
> Perhaps those are modest goals, but I would suggest they are big wins for
> the Cassandra user community. If you haven't tried it yet, please run '*pip
> install cqlsh*' on your desktop and see how nicely it works. Indeed, the
> return-on-investment of effort here should be really high, as the work is
> mostly already done, it's just run from a private repo at
> https://github.com/jeffwidman/cqlsh and has been maintained continually
> since 2013.
>
> Other initiatives such as subdividing the project(s) or re-writing the
> REPL in another language would be out-of-scope. It would be entirely
> appropriate to have a separate discussion on those two topics, if you wish
> to start that discussion.
>
> The process and degree of overhead required to publish to PyPI will
> require some discovery and discussion. Ideally, it would be possible to
> automate it. That is definitely a topic we need further input from the
> engineers involved in the build-release process.
>
> A pre-CEP discussion of this proposal was started by Jeff on the mailing
> list back in early July, see
> https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d.
>
> Regards,
>
> Brad
>
> On Wed, Aug 9, 2023 at 3:31 PM Dinesh Joshi  wrote:
>
>> Brad,
>>
>> Thanks for starting this discussion. My understanding is that we're
>> simply adding pip support for cqlsh and Apache Cassandra project will
>> officially publish a cqlsh pip package. This is a good goal but other
>> than having an official pip package, what is it that we're gaining?
>> Please don't interpret this as push back on your proposal but I am
>> unclear on what we're trying to solve by making this official
>> distribution. There are several distribution channels and it is
>> untenable to officially support all of them.
>>
>> If we do adopt this, there will be non-zero overhead of the release
>> process. This is fine but we need volunteers to run this process. My
>> understanding is that they need to be ideally PMC or at least Committers
>> on the project to go through all the steps to successfully release a new
>> artifact for our users.
>>
>> I would have liked this CEP to go a bit further than just packaging
>> cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
>> doesn't need to live in the cassandra repo. Extracting cqlsh into its own
>> separate repo would allow us to truly decouple cqlsh from the server.
>> This is already true for the most part as we rely on the Python driver
>> which is compatible with several cassandra releases. As it stands today
>> it is not possible for us to update cqlsh without making a Cassandra
>> release.
>>
>> If you truly want to go a bit further, we should consider rewriting
>> cqlsh in Java so we can easily share code from the server. We can then
>> potentially use Java Native Image[1] to produce a truly platform
>> inde

Cassandra Summit Update!

2023-10-15 Thread Patrick McFadin
Hello Cassandra Community,

Below you'll find the updated announcement being posted on the Cassandra
website.

The short version for short attention spans:
 - Cassandra Summit will be co-located with the AI.Dev conference. One
ticket, two conferences.
 - CFP for an AI track for the Cassandra Summit is open for one more week (
https://events.linuxfoundation.org/cassandra-summit/program/cfp/#suggested-topics
)
 - Register now! Earlybird ends in a few weeks. (
https://events.linuxfoundation.org/cassandra-summit/register/)

Cassandra Summit 2023 Gains AI.dev as Co-located Event; NEW AI + Cassandra
Track

We are excited to announce that the new AI.dev: Open Source GenAI & ML
Summit 2023 conference will be co-located with Cassandra Summit this year!
This means that Cassandra Summit will welcome an expanded audience that
includes developers who are delving into the realm of open source
generative AI and machine learning.

And with the addition of AI.dev, a NEW AI + Cassandra track will be
featured at the event. The Call for Proposals is open until 9:00 AM PDT on
Monday, October 23.

Here’s what you need to know:

WHEN + WHERE IS THIS HAPPENING?: Cassandra Summit + AI.dev will take place
December 12-13, 2023 at the San Jose McEnery Convention Center in San Jose,
California.

WHO SHOULD ATTEND?: data practitioners, developers, engineers and
enthusiasts + developers who are interested in open source generative AI
and machine learning.

WHAT ARE THE CFP DETAILS? The CFP for the new AI + Cassandra track is now
open. This track will include lightning talks, conference sessions, panel
sessions and technical workshops that delve into distributed AI using
Cassandra and case studies that cover AI-powered applications using Apache
Cassandra. Submit a talk today!


HOW DO I REGISTER? Cassandra Summit and AI.dev will run
simultaneously, and attendees will have access to both events with one
single registration. So whether you’ve already registered or are planning
to register, you’ll gain access to both of these events for one price. To
learn more or to register, visit
https://events.linuxfoundation.org/cassandra-summit/register/

Cassandra Summit is where the community can connect to share best practices
and use cases, celebrate makers and users, forge critical relationships,
and learn about advancements in Apache Cassandra. With the addition of
AI.dev, we are excited to expand the community’s flagship event and include
talks that showcase how AI and Cassandra synergize, unlocking new
possibilities and enhancing data-driven solutions.

We hope to see you soon!


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Patrick McFadin
I'm really surprised to see this email. The last I heard everything was on
track for getting into 5.0 and, TBH, Accord is what a majority of users
are expecting in 5.0. And how could this be a .1 release?

What is it going to take to get it into 5.0? What is off track and how did
we get here?

On Mon, Oct 23, 2023 at 6:51 AM Sam Tunnicliffe  wrote:

> +1 from me too.
>
> Regarding Benedict's point, backwards incompatibility should be minimal;
> we modified snitch behaviour slightly, so that local snitch config only
> relates to the local node, all peer info is fetched from cluster metadata.
> There is also a minor change to the way failed bootstraps are handled, as
> with TCM they require an explicit cancellation step (running a nodetool
> command).
>
> Whether consensus decrees that this constitutes a major bump or not, I
> think decoupling these major projects from 5.0 is the right move.
>
>
> On 23 Oct 2023, at 12:57, Benedict  wrote:
>
> I’m cool with this.
>
> We may have to think about numbering as I think TCM will break some
> backwards compatibility and we might technically expect the follow-up
> release to be 6.0
>
> Maybe it’s not so bad to have such rapid releases either way.
>
> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>
> 
>
> The TCM work (CEP-21) is in its review stage but being well past our
> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
> like to propose the following.
>
> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
> an immediate 5.1-alpha1 release.
>
> I see this as a win-win scenario for us, considering our current
> situation.  (Though it is unfortunate that Accord is included in this
> scenario because we agreed it to be based upon TCM.)
>
> This will mean…
>  - We get to focus on getting 5.0 to beta and GA, which already has a ton
> of features users want.
>  - We get an alpha release with TCM and Accord into users' hands quickly
> for broader testing and feedback.
>  - We isolate GA efforts on TCM and Accord – giving oss and downstream
> engineers time and patience reviewing and testing.  TCM will be the biggest
> patch ever to land in C*.
>  - Give users a choice for a more incremental upgrade approach, given just
> how many new features we're putting on them in one year.
>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
> 4.x versions, just as if it had landed in 5.0.
>
>
> The risks/costs this introduces are
>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
> and at some point decide to undo this work, while we can throw away the
> cassandra-5.1 branch we would need to do a bit of work reverting the
> changes in trunk.  This is a _very_ edge case, as confidence levels on the
> design and implementation of both are already tested and high.
>  - We will have to maintain an additional branch.  I propose that we treat
> the 5.1 branch in the same maintenance window as 5.0 (like we have with 3.0
> and 3.11).  This also adds the merge path overhead.
>  - Reviewing of TCM and Accord will continue to happen post-merge.  This
> is not our normal practice, but this work will have already received its
> two +1s from committers, and such ongoing review effort is akin to GA
> stabilisation work on release branches.
>
>
> I see no other ok solution in front of us that gets us at least both the
> 5.0 beta and TCM+Accord alpha releases this year.  Keeping in mind users
> demand to start experimenting with these features, and our Cassandra Summit
> in December.
>
>
> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>
>
>
>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Patrick McFadin
I’m going to be clearer in my statement.

This has to be in 5.0, even if it’s alpha and ships after December, or this
is going to be a disaster that will take us much longer to unravel.

On Mon, Oct 23, 2023 at 7:49 AM Jeremiah Jordan 
wrote:

> +1 from me assuming we have tickets and two committer +1’s on them for
> everything being committed to trunk, and CI is working/passing before it
> merges.  The usual things, but I want to make sure we do not compromise on
> any of them as we try to “move fast” here.
>
> -Jeremiah Jordan
>
> On Oct 23, 2023 at 8:50:46 AM, Sam Tunnicliffe  wrote:
>
>> +1 from me too.
>>
>> Regarding Benedict's point, backwards incompatibility should be minimal;
>> we modified snitch behaviour slightly, so that local snitch config only
>> relates to the local node, all peer info is fetched from cluster metadata.
>> There is also a minor change to the way failed bootstraps are handled, as
>> with TCM they require an explicit cancellation step (running a nodetool
>> command).
>>
>> Whether consensus decrees that this constitutes a major bump or not, I
>> think decoupling these major projects from 5.0 is the right move.
>>
>>
>> On 23 Oct 2023, at 12:57, Benedict  wrote:
>>
>> I’m cool with this.
>>
>> We may have to think about numbering as I think TCM will break some
>> backwards compatibility and we might technically expect the follow-up
>> release to be 6.0
>>
>> Maybe it’s not so bad to have such rapid releases either way.
>>
>> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>>
>> 
>>
>> The TCM work (CEP-21) is in its review stage but being well past our
>> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
>> like to propose the following.
>>
>> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
>> an immediate 5.1-alpha1 release.
>>
>> I see this as a win-win scenario for us, considering our current
>> situation.  (Though it is unfortunate that Accord is included in this
>> scenario because we agreed it to be based upon TCM.)
>>
>> This will mean…
>>  - We get to focus on getting 5.0 to beta and GA, which already has a ton
>> of features users want.
>>  - We get an alpha release with TCM and Accord into users' hands quickly
>> for broader testing and feedback.
>>  - We isolate GA efforts on TCM and Accord – giving oss and downstream
>> engineers time and patience reviewing and testing.  TCM will be the biggest
>> patch ever to land in C*.
>>  - Give users a choice for a more incremental upgrade approach, given
>> just how many new features we're putting on them in one year.
>>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
>> 4.x versions, just as if it had landed in 5.0.
>>
>>
>> The risks/costs this introduces are
>>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
>> and at some point decide to undo this work, while we can throw away the
>> cassandra-5.1 branch we would need to do a bit of work reverting the
>> changes in trunk.  This is a _very_ edge case, as confidence levels on the
>> design and implementation of both are already tested and high.
>>  - We will have to maintain an additional branch.  I propose that we
>> treat the 5.1 branch in the same maintenance window as 5.0 (like we have
>> with 3.0 and 3.11).  This also adds the merge path overhead.
>>  - Reviewing of TCM and Accord will continue to happen post-merge.  This
>> is not our normal practice, but this work will have already received its
>> two +1s from committers, and such ongoing review effort is akin to GA
>> stabilisation work on release branches.
>>
>>
>> I see no other ok solution in front of us that gets us at least both the
>> 5.0 beta and TCM+Accord alpha releases this year.  Keeping in mind users
>> demand to start experimenting with these features, and our Cassandra Summit
>> in December.
>>
>>
>> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>>
>>
>>
>>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
+1 to what you are saying, Josh. Based on the last survey, yes, everyone
was excited about Accord, but SAI and UCS were pretty high on the list.

Benedict and I had a good conversation last night, and now I understand
more essential details for this conversation. TCM is taking far more work
than initially scoped, and Accord depends on a stable TCM. TCM is months
behind and that's a critical fact, and one I personally just learned of. I
thought things were wrapping up this month, and we were in the testing
phase. I get why that's a topic we are dancing around. Nobody wants to say
ship dates are slipping because that's part of our culture. It's
disappointing and, if new information, an unwelcome surprise, but none of
us should be angry or in a blamey mood because I guarantee every one of us
has shipped code late. My reaction yesterday was based on an incorrect
assumption. Now that I have a better picture, my point of view is changing.

Josh's point about what's best for users is crucial. Users deserve stable
code with a regular cadence of features that make their lives easier. If we
put 5.0 on hold for TCM + Accord, users will get neither for a very long
time. And I mentioned a disaster yesterday. A bigger disaster would be
shipping Accord with a major bug that causes data loss, eroding community
trust. Accord has to be the most bulletproof of all bulletproof features.
The pressure to ship is only going to increase and that's fertile ground
for that sort of bug.

So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
plan mainly because I don't think 5.1 is (or should be) a fast follow.

For the user community, the communication should be straightforward. TCM +
Accord are turning out to be much more complicated than was originally
scoped, and for good reasons. Our first principle is to provide a stable
and reliable system, so as a result, we'll be de-coupling TCM + Accord from
5.0 into a 5.1 branch, which is available in parallel to 5.0 while
additional hardening and testing is done. We can communicate this in a blog
post.

To make this much more palatable to our user community, if we can get a
build and docker image available ASAP with Accord, it will allow developers
to start playing with the syntax. Up to this point, that hasn't been widely
available unless you compile the code yourself. Developers need to
understand how this will work in an application, and up to this point, the
syntax is text they see in my slides. We need to get some hands-on time, and
that will get our user community engaged on Accord this calendar year. The
feedback may even uncover some critical changes we'll need to make. Lack of
developer access to Accord is a critical problem we can fix soon; there will
be plenty of excitement, and people can start building use cases before the
final code ships.

I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
but maybe one for my birthday?

Patrick



On Tue, Oct 24, 2023 at 7:23 AM Josh McKenzie wrote:

> Maybe it won't be a glamorous release but shipping
> 5.0 mitigates our worst case scenario.
>
> I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
> memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
> accurate, all combine to make 5.0 a pretty glamorous release IMO
> independent of TCM and Accord. Accord is a true paradigm-shift game-changer
> so it's easy to think of 5.0 as uneventful in comparison, and TCM helps
> resolve one of the biggest pain-points in our system for over a decade, but
> I think 5.0 is a very meaty release in its own right today.
>
> Anyway - I agree with you Brandon re: timelines. If things take longer
> than we'd hope (which, if I think back, they do roughly 100% of the time on
> this project), blocking on these features could both lead to a significant
> delay in 5.0 going out as well as increasing pressure and risk of burnout
> on the folks working on it. While I believe we all need some balanced
> urgency to do our best work, being under the gun for something with a hard
> deadline or having an entire project drag along blocked on you is not where
> I want any of us to be.
>
> Part of why we talked about going to primarily annual calendar-based
> releases was to avoid precisely this situation, where something that
> *feels* right at the cusp of merging leads us to delay a release
> repeatedly. We discussed this a couple times this year:
> 1: https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3,
> where Mick proposed a "soft-freeze" for everything w/out exception and 1st
> week October "hard-freeze", and there was assumed to be lazy consensus
> 2: https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw,
> where we kept along with what we discussed in 1 but added in CEP-30 to be
> waivered in as well.
>
> So. We're at a crossroads here where we need to either follow through with
> what we all agreed to earlier this year, or acknowledge that our best
> intenti

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
I would like to have something for developers to use ASAP to try the Accord
syntax. Very few people have seen it, and I think there's a learning curve
we can start earlier.

It's my understanding that ASF policy is that it needs to be a project
release to create a docker image.

On Tue, Oct 24, 2023 at 11:54 AM Jeremiah Jordan wrote:

> If we decide to go the route of not merging TCM to the 5.0 branch, do we
> actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
> things while it is in trunk and cut the 5.1 branch when we actually think
> we are near releasing?  I don’t see any reason we cannot cut “preview”
> artifacts from trunk.
>
> -Jeremiah
>
> On Oct 24, 2023 at 11:54:25 AM, Jon Haddad wrote:
>
>> I guess at the end of the day, shipping a release with a bunch of awesome
>> features is better than holding it back.  If there's 2 big releases in 6
>> months the community isn't any worse off.
>>
>> We either ship something, or nothing, and something is probably better.
>>
>> Jon
>>

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Patrick McFadin
Let me make that really easy. Hell yes.

Not everybody runs CCM; I've tried, but I've met resistance.

Compiling your own version usually involves me saying the words "Yes, ant
realclean exists. I'm not trolling you."

docker pull works on every OS and curates a single-node experience.



On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie wrote:

> In order for the project to advertise the release outside the dev@ list
> it needs to be a formal release.
>
> That's my reading as well:
> https://www.apache.org/legal/release-policy.html#release-definition
>
> I wonder if there'd be value in us having a cronned job that'd do nightly
> docker container builds on trunk + feature branches, archived for N days,
> and we make that generally known to the dev@ list here so folks that want
> to poke at the current state of trunk or other branches could do so with
> very low friction. We'd probably see more engagement on feature branches if
> it was turn-key easy for other C* devs to spin them up and check them out.
>
> For what you're talking about here Patrick (a docker image for folks
> outside the dev@ audience and more user-facing), we'd want to vote on it
> and go through the formal process.
>
> On Tue, Oct 24, 2023, at 3:10 PM, Jeremiah Jordan wrote:
>
> In order for the project to advertise the release outside the dev@ list
> it needs to be a formal release.  That just means that there was a release
> vote and at least 3 PMC members +1’ed it, and there are more +1 than there
> are -1, and we follow all the normal release rules.  The ASF release
> process doesn’t care what branch you cut the artifacts from or what version
> you call it.
>
> So the project can cut artifacts for and release a 5.1-alpha1,
> 5.1-dev-preview1, whatever we want to version this thing, from trunk or
> any other branch name we want.
>
> -Jeremiah
>
> On Oct 24, 2023 at 2:03:41 PM, Patrick McFadin  wrote:
>
> I would like to have something for developers to use ASAP to try the
> Accord syntax. Very few people have seen it, and I think there's a learning
> curve we can start earlier.
>
> It's my understanding that ASF policy is that it needs to be a project
> release to create a docker image.
>
> On Tue, Oct 24, 2023 at 11:54 AM Jeremiah Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
> If we decide to go the route of not merging TCM to the 5.0 branch, do we
> actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
> things while it is in trunk and cut the 5.1 branch when we actually think
> we are near releasing?  I don’t see any reason we cannot cut “preview”
> artifacts from trunk.
>
> -Jeremiah
>
> On Oct 24, 2023 at 11:54:25 AM, Jon Haddad 
> wrote:
>
> I guess at the end of the day, shipping a release with a bunch of awesome
> features is better than holding it back.  If there's 2 big releases in 6
> months the community isn't any worse off.
>
> We either ship something, or nothing, and something is probably better.
>
> Jon
>

Re: Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Patrick McFadin
Sent you an invite Sam. Welcome to the community!

On Fri, Oct 27, 2023 at 10:31 AM Sam wrote:

> Please can I have an invite to the Slack workspace on this email. I'd like
> to take a look through some of the items for first time contributors :-)
>
> Thanks!
>
> On Fri, 27 Oct 2023 at 18:10, Josh McKenzie  wrote:
>
>> In case you're keeping score on how frequently these are coming out: *please
>> stop*. ;)
>>
>> Silver lining - looks like we have a lot to discuss this round! Last
>> update was late July and we've been churning through the 5.0 freeze and
>> stabilization phase.
>>
>>
>>
>> *[New Contributors Getting Started]*
>> Check out https://the-asf.slack.com, channel #cassandra-dev. Reply
>> directly to me on this email if you need an invite for your account, and
>> reach out to the @cassandra_mentors alias in the channel if you need to get
>> oriented.
>>
>> We have a list of curated "getting started" tickets you can find here,
>> filtered to "ToDo" (i.e. not yet worked):
>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2160&quickFilter=2162&quickFilter=2652
>> .
>>
>> *Helpful links:*
>> - Getting Started with Development on C*:
>> https://cassandra.apache.org/_/development/gettingstarted.html
>> - Building and IDE integration (worktrees are your friend; msg me on
>> slack if you need pointers):
>> https://cassandra.apache.org/_/development/ide.html
>> - Code Style: https://cassandra.apache.org/_/development/code_style.html
>>
>>
>>
>> *[Dev mailing list]*
>>
>> https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-7-20%7Cdto=2023-10-27
>> :
>>
>> My last email of shame was 35 threads. Drumroll for this one...
>> 91. *Yeesh*. Let me stick to highlights.
>>
>> Ekaterina pushed through dropping JDK8 support and adding JDK17
>> support... back in July. If you didn't know about it by now, consider
>> yourself doubly notified. :) .
>> https://lists.apache.org/thread/9pwz3vtpf88fly27psc7yxvcv0lwbz8k I think
>> I can speak on behalf of all of us when I say: *Thank You Ekaterina.*
>>
>> This came up recently on another thread about when to branch 5.1, but we
>> discussed our freeze plans and exception rules for TCM and Accord here:
>> https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw. Mick
>> was essentially looking for a similar waiver for Vector search since it was
>> well abstracted, depended on SAI and external libs, and in general
>> shouldn't be too big of a disruption to get into 5.0. General consensus at
>> the time was "sure", and the work has since been completed. But here's the
>> reminder and link for posterity (and in case you missed it).
>>
>> Jaydeep reached out about a potential short-term solution to detecting
>> token-ownership mismatch while we don't yet have TCM; this seems more
>> pressing now as we're looking at a 5.0 without yet having TCM in it. The
>> dev ML thread is here:
>> https://lists.apache.org/thread/4p0orhom42g36osnknqj3fqmqhvqml1g, and he
>> created https://issues.apache.org/jira/browse/CASSANDRA-18758 dealing
>> with the topic. There's a relatively modest (7 files, just over 300 lines)
>> PR available here: https://github.com/apache/cassandra/pull/2595/files;
>> I haven't looked into it, but it might be worth considering getting this
>> into 5.0 since it looks like we're moving to cutting w/out TCM. Any
>> thoughts?
>>
>> We had a pretty good discussion about automated repair scheduling,
>> discussing whether it should live in the DB proper vs. in the sidecar, pros
>> and cons, pressures, etc. Not sure if things moved beyond that; I know
>> there's at least a few implementations out there that haven't yet made
>> their way back to the ASF project proper. Thread:
>> https://lists.apache.org/thread/glvmkwknf91rxc5l6w4d4m1kcvlr6mrv. My
>> hope is we can avoid the gridlock we hit for a long time with the sidecar
>> where there are multiple implementations with different tradeoffs and
>> everyone's disincentivized from accepting a solution different from their
>> own in-house one since it'd theoretically require re-tooling. Tough problem
>> with no easy solutions, but would love to see this become a first class
>> citizen in the ecosystem.
>>
>> Paulo brought up a discussion about moving to disk_access_mode =
>> mmap_index_only on 5.0. Seemed to be a consensus there but I'm not sure we
>> actually changed that in the 5.0 branch? Thread:
>> https://lists.apache.org/thread/nhp6vftc4kc3dxskngxy5rpo1lp19drw. Just
>> pulled on cassandra-5.0 and it looks like auto + hasLargeAddressSpace() ==
>> .mmap rather than .mmap_index_only.
>>
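The observation above about how `auto` resolves can be modelled in a few lines. This is a simplified Python model of what the message reports, not Cassandra's actual Java code: the `resolve_access_modes` function, its return shape, and the `auto_prefers_index_only` switch are hypothetical names introduced for illustration, `has_large_address_space` stands in for the `hasLargeAddressSpace()` check mentioned in the email, and the fallback to `standard` on small address spaces is an assumption.

```python
def resolve_access_modes(configured, has_large_address_space,
                         auto_prefers_index_only=False):
    """Return (data_file_mode, index_file_mode) for a disk_access_mode value.

    Hypothetical model only. With auto_prefers_index_only=False it mirrors
    the behaviour reported in the email (auto resolves to mmap); with True
    it models the proposed change (auto resolves to mmap_index_only).
    """
    if configured == "auto":
        if not has_large_address_space:
            # Assumption: without a large address space, use buffered I/O.
            return ("standard", "standard")
        configured = "mmap_index_only" if auto_prefers_index_only else "mmap"

    if configured == "mmap_index_only":
        # Only index files are memory-mapped; data files use buffered I/O.
        return ("standard", "mmap")
    if configured in ("mmap", "standard"):
        return (configured, configured)
    raise ValueError(f"unknown disk_access_mode: {configured!r}")


if __name__ == "__main__":
    # Behaviour as reported for the cassandra-5.0 branch at the time.
    print(resolve_access_modes("auto", has_large_address_space=True))
    # Behaviour if the proposed default were applied.
    print(resolve_access_modes("auto", True, auto_prefers_index_only=True))
```

The point of the sketch is only to make the difference concrete: under the reported behaviour data files are memory-mapped as well, while the proposal would limit memory-mapping to index files.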
>> David Capwell worked on adding some retries to repair messages when
>> they're failing to make the process more robust:
>> https://lists.apache.org/thread/wxv6k6slljqcw73xcmpxj4kn5lz95jd1.
>> Reception was positive enough that he went so far as to back-port it and
>> also work on some for IR. Looks like he could use a reviewer here:
>> https://issues.apache.org/jira/browse/CASSANDRA-18962 -

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-30 Thread Patrick McFadin
ases that folk can use through the trunk dev cycle.
> >>>
> >>> Personally, with my understanding of timelines in front of us to fully
> review and stabilise tcm, it makes sense to branch it as soon as it's
> merged.  It's easiest to stabilise it on a branch, and there's definitely
> the desire and demand to do so, so it won't be getting forgotten or
> down-prioritised.
> >>>
> >>>
> >>>
> >>> On Wed, 25 Oct 2023 at 18:07, Jeremiah Jordan <
> jeremiah.jor...@gmail.com> wrote:
> >>>>>
> >>>>> If we do a 5.1 release why not take it as an opportunity to release
> more things. I am not saying that we will. Just that we should let that
> door open.
> >>>>
> >>>>
> >>>> Agreed.  This is the reason I brought up the possibility of not
> branching off 5.1 immediately.
> >>>>
> >>>>
> >>>> On Oct 25, 2023 at 3:17:13 AM, Benjamin Lerer <b.le...@gmail.com> wrote:
> >>>>>
> >>>>> The proposal includes 3 things:
> >>>>> 1. Do not include TCM and Accord in 5.0 to avoid delaying 5.0
> >>>>> 2. The next release will be 5.1 and will include only Accord and TCM
> >>>>> 3. Merge TCM and Accord right now in 5.1 (making an initial release)
> >>>>>
> >>>>> I am fine with question 1 and do not have a strong opinion on which
> way to go.
> >>>>> 2. Means that every new feature will have to wait for post 5.1 even
> if it is ready before 5.1 is stabilized and shipped. If we do a 5.1 release
> why not take it as an opportunity to release more things. I am not saying
> that we will. Just that we should let that door open.
> >>>>> 3. There is a need to merge TCM and Accord as maintaining those
> separate branches is costly in terms of time and energy. I fully understand
> that. On the other hand merging TCM and Accord will make the TCM review
> harder and I do believe that this second round of review is valuable as it
> already uncovered a valid issue. Nevertheless, I am fine with merging TCM
> as soon as it passes CI and continuing the review after the merge. If we
> cannot meet at least that quality level (Green CI) we should not merge just
> for creating a 5.1 alpha release for the summit.
> >>>>>
> >>>>> Now, I am totally fine with a preview release without numbering and
> with big warnings that will only serve as a preview for the summit.
> >>>>>
> >>>>> On Wed, 25 Oct 2023 at 06:33, Berenguer Blasi <
> berenguerbl...@gmail.com> wrote:
> >>>>>>
> >>>>>> I also think there are many good new features in 5.0 already; they'd make
> >>>>>> a good release even on their own. My 2 cts.
> >>>>>>
> >>>>>> On 24/10/23 23:20, Brandon Williams wrote:
> >>>>>> > The catch here is that we don't publish docker images currently.
> The
> >>>>>> > C* docker images available are not made by us.
> >>>>>> >
> >>>>>> > Kind Regards,
> >>>>>> > Brandon
> >>>>>> >
> >>>>>> > On Tue, Oct 24, 2023 at 3:31 PM Patrick McFadin <
> pmcfa...@gmail.com> wrote:
> >>>>>> >> Let me make that really easy. Hell yes
> >>>>>> >>
> >>>>>> >> Not everybody runs CCM, I've tried but I've met resistance.
> >>>>>> >>
> >>>>>> >> Compiling your own version usually involves me saying the words
> "Yes, ant realclean exists. I'm not trolling you"
> >>>>>> >>
> >>>>>> >> docker pull  works on every OS and curates a single node
> experience.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie <
> jmcken...@apache.org>

Time to register for the Cassandra Summit 2023!

2023-11-09 Thread Patrick McFadin
Hi everyone!

I'm going to keep this short, but it's time to gather the Cassandra
community. December 12-13 in San Jose. Early-bird registration pricing ends
November 21, so don't delay.

Registration page:
https://events.linuxfoundation.org/cassandra-summit/register/
Use my discount code for 20% off: 23CS20

Need some motivation? Check out this schedule:
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

If you are planning on sending a group (Yes!), the Linux Foundation is
offering a group discount. Email me and I can put you in touch with the
right person.

Let's get out and support our community!

Patrick


Re: Time to register for the Cassandra Summit 2023!

2023-11-09 Thread Patrick McFadin
One other important point. In talking to our friends at the Linux
Foundation, they reminded me about scholarships for attending Cassandra
Summit.
If you would like to apply for travel or ticket assistance, follow this
link and apply:
https://events.linuxfoundation.org/cassandra-summit/attend/travel-funding/

I hope many of you will take advantage of this program and join us in San
Jose!

Patrick


On Thu, Nov 9, 2023 at 7:15 AM Patrick McFadin  wrote:

> Hi everyone!
>
> I'm going to keep this short, but it's time to gather the Cassandra
> community. December 12-13 in San Jose. Early-bird registration pricing ends
> November 21, so don't delay.
>
> Registration page:
> https://events.linuxfoundation.org/cassandra-summit/register/
> Use my discount code for 20% off: 23CS20
>
> Need some motivation? Check out this schedule:
> https://events.linuxfoundation.org/cassandra-summit/program/schedule/
>
> If you are planning on sending a group (Yes!), the Linux Foundation is
> offering a group discount. Email me and I can put you in touch with the
> right person.
>
> Let's get out and support our community!
>
> Patrick
>


Cassandra Summit: Early registration discount ends tomorrow

2023-11-20 Thread Patrick McFadin
Hi everyone,

If you've registered for Cassandra Summit, then ignore this email.

If not, time to get moving! The early registration discount ends tomorrow.

Link to register:
https://events.linuxfoundation.org/cassandra-summit/register/

Discount code: 23CS20 (Yes you can use it with the early registration price)

If you need motivation, look at this schedule!
https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Let's get everyone gathered! This is our time!

Patrick


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Patrick McFadin
I'm a +1 on a beta now vs maybe later. Beta doesn't imply perfect,
especially if there are declared known issues. We need people outside of
this tight group using it and finding issues. I know how this rolls. Very
few people touch an alpha release. Beta is when the engine starts, and we
need to get it started ASAP. Otherwise we are telling ourselves we have the
perfect testing apparatus and don't need more users testing. I don't think
that is the case.

Scott, Ekaterina, and I are going to be on stage in 2 weeks talking about
Cassandra 5 in the keynotes. At that time, our call to action is going to
be to test the beta.

Patrick

On Tue, Nov 28, 2023 at 9:41 AM Mick Semb Wever  wrote:

> The vote will be open for 72 hours (longer if needed). Everyone who has
>> tested the build is invited to vote. Votes by PMC members are considered
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>
>
> +1
>
> Checked
> - signing correct
> - checksums are correct
> - source artefact builds (JDK 11+17)
> - binary artefact runs (JDK 11+17)
> - debian package runs (JDK 11+17)
> - debian repo runs (JDK 11+17)
> - redhat* package runs (JDK 11+17)
> - redhat* repo runs (JDK 11+17)
>
>
> With the disclaimer:  There's a few known bugs in SAI, e.g. 19011, with
> fixes to be available soon in 5.0-beta2.
>
>
>
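The passing rule quoted in the vote email (at least three binding +1s and no -1s, with only PMC votes binding) can be sketched as a small tally function. This is an illustrative sketch only: the `Vote` type, the `vote_passes` name, and the sample voter labels are hypothetical, and it models the rule as worded in this thread rather than the ASF's formal release policy. Treating any -1, binding or not, as blocking is an assumption drawn from that wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical voter label
    value: int     # +1, 0, or -1
    binding: bool  # True when cast by a PMC member


def vote_passes(votes, min_binding_plus_ones=3):
    """Apply the rule as worded in the thread: >= 3 binding +1s and no -1s."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    # Assumption: any -1 blocks, whether or not it is binding.
    any_minus_one = any(v.value == -1 for v in votes)
    return binding_plus_ones >= min_binding_plus_ones and not any_minus_one


if __name__ == "__main__":
    sample = [
        Vote("pmc-a", 1, True),
        Vote("pmc-b", 1, True),
        Vote("pmc-c", 1, True),
        Vote("contributor-d", 1, False),
    ]
    print(vote_passes(sample))
    print(vote_passes(sample + [Vote("pmc-e", -1, True)]))
```

Non-binding +1s are counted by nobody here; they signal that someone tested the build but do not move the tally.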


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Patrick McFadin
JD, that wasn't my point. It feels like we are treating a beta like an RC,
which it isn't. Ship Beta 1 now and Beta 2 later. We need people looking
today because they will find new bugs and the signal is lost on alpha. It's
too yolo for most people.

On Tue, Nov 28, 2023 at 10:36 AM Benjamin Lerer  wrote:

> -1 based on the problems raised by Caleb.
>
> I would be fine with releasing that version as an alpha as Jeremiah
> proposed.
>
> As of this time, I'm also not aware of a user of the project operating a
>> build from the 5.0 branch at substantial scale to suss out the operational
>> side of what can be expected. If someone is running a build supporting
>> non-perf-test traffic derived from the 5.0 branch and has an experience
>> report to share it would be great to read.
>
>
> Some people at Datastax are working on such testing. It will take a bit of
> time before we get the final results though.
>
> On Tue, 28 Nov 2023 at 19:27, J. D. Jordan wrote:
>
>> That said, this is clearly better than the alpha and has many fixes since
>> it. Would people be more comfortable if this cut was released as another
>> alpha and we do beta1 once the known fixes land?
>>
>> On Nov 28, 2023, at 12:21 PM, J. D. Jordan wrote:
>>
>> -0 (NB) on this cut. Given the concerns expressed so far in the thread I
>> would think we should re-cut beta1 at the end of the week.
>>
>> On Nov 28, 2023, at 12:06 PM, Patrick McFadin  wrote:
>>
>> I'm a +1 on a beta now vs maybe later. Beta doesn't imply perfect,
>> especially if there are declared known issues. We need people outside of
>> this tight group using it and finding issues. I know how this rolls. Very
>> few people touch an alpha release. Beta is when the engine starts, and we
>> need to get it started ASAP. Otherwise we are telling ourselves we have the
>> perfect testing apparatus and don't need more users testing. I don't think
>> that is the case.
>>
>> Scott, Ekaterina, and I are going to be on stage in 2 weeks talking about
>> Cassandra 5 in the keynotes. At that time, our call to action is going to
>> be to test the beta.
>>
>> Patrick
>>
>> On Tue, Nov 28, 2023 at 9:41 AM Mick Semb Wever  wrote:
>>
>>> The vote will be open for 72 hours (longer if needed). Everyone who has
>>>> tested the build is invited to vote. Votes by PMC members are considered
>>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>>>
>>>
>>>
>>> +1
>>>
>>> Checked
>>> - signing correct
>>> - checksums are correct
>>> - source artefact builds (JDK 11+17)
>>> - binary artefact runs (JDK 11+17)
>>> - debian package runs (JDK 11+17)
>>> - debian repo runs (JDK 11+17)
>>> - redhat* package runs (JDK 11+17)
>>> - redhat* repo runs (JDK 11+17)
>>>
>>>
>>> With the disclaimer:  There's a few known bugs in SAI, e.g. 19011, with
>>> fixes to be available soon in 5.0-beta2.
>>>
>>>
>>>
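Of the checks in the quoted list, "checksums are correct" is the most mechanical one. A minimal sketch of that step using Python's standard `hashlib` follows. The function names and any file paths passed to them are hypothetical, and it assumes the published digest file begins with the hex digest (optionally followed by a file name). Signature verification is a separate step and is not covered here.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact, digest_file, algorithm="sha256"):
    """Compare an artifact's digest with the first token of a digest file."""
    tokens = Path(digest_file).read_text().split()
    if not tokens:
        return False  # empty digest file: nothing to compare against
    return file_digest(artifact, algorithm) == tokens[0].lower()
```

A caller would pass the downloaded artifact and its accompanying digest file, choosing `algorithm` to match how the digest was produced (for example `"sha512"`).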


Cassandra Summit: Engage those networks!

2023-11-29 Thread Patrick McFadin
Hi everyone,

We are a couple of weeks away from Cassandra Summit. People get busy and
forget to register or miss that there is even a summit happening. Let's
make sure everyone who wants to go gets a chance!

 - If you are going, get on the social media of your choice and let
everyone know you'll be there. Use the hashtag #cassandrasummit
 - If you aren't going, you can still remind other folks that it's
happening and point them to the talks you think they should check out.

Either way, here is the basic info to include in your post:

*Schedule:* https://events.linuxfoundation.org/cassandra-summit/program/schedule/
*Register:* https://events.linuxfoundation.org/cassandra-summit/register/#register-now
*Discount code:* 23CS20

*One more thing! If you are going and reading this, reply to this email
with a "Going!" or "See you there!" I would love to see who will be there
in two weeks.*


*Patrick*


Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-30 Thread Patrick McFadin
Congratulations and welcome, Francisco!

On Thu, Nov 30, 2023 at 2:45 AM Maxim Muzafarov  wrote:

> My congratulations, Francisco! :-)
>
> On Wed, 29 Nov 2023 at 13:30, Andrés de la Peña 
> wrote:
> >
> > Congrats Francisco!
> >
> > On Wed, 29 Nov 2023 at 11:37, Benjamin Lerer  wrote:
> >>
> >> Congratulations!!! Well deserved!
> >>
> >> On Wed, 29 Nov 2023 at 07:31, Berenguer Blasi wrote:
> >>>
> >>> Welcome!
> >>>
> >>> On 29/11/23 2:24, guo Maxwell wrote:
> >>>
> >>> Congrats!
> >>>
> >>> On Wed, 29 Nov 2023 at 06:16, Jacek Lewandowski wrote:
>
> Congrats!!!
>
> On Tue, 28 Nov 2023 at 23:08, Abe Ratnofsky wrote:
> >
> > Congrats Francisco!
> >
> > > On Nov 28, 2023, at 1:56 PM, C. Scott Andreas <
> sc...@paradoxica.net> wrote:
> > >
> > > Congratulations, Francisco!
> > >
> > > - Scott
> > >
> > >> On Nov 28, 2023, at 10:53 AM, Dinesh Joshi 
> wrote:
> > >>
> > >> The PMC members are pleased to announce that Francisco Guerrero
> Hernandez has accepted
> > >> the invitation to become committer today.
> > >>
> > >> Congratulations and welcome!
> > >>
> > >> The Apache Cassandra PMC members
> >
>


Re: Introducing the Cassandra Catalyst program!

2023-12-01 Thread Patrick McFadin
So excited for this program! It's been a long time coming but wow, what a
great way to recognize individuals advocating for Cassandra in their own
communities.

Let's get out there and start nominating!

Patrick

On Fri, Dec 1, 2023 at 9:51 AM Melissa Logan  wrote:

> The Cassandra community is excited to introduce the Cassandra Catalyst
> program, a new initiative that aims to recognize individuals who invest in
> the growth of the community by enthusiastically sharing their expertise,
> encouraging participation, and creating a welcoming environment.
>
> This is the first PMC-led community program of its kind within the Apache
> Software Foundation ecosystem and we’re honored to be the pioneer!
>
> What does it mean to be a Cassandra Catalyst?
>
> Catalysts are trustworthy, expert contributors with a passion for
> connecting and empowering others with Cassandra knowledge. Individuals
> must be able to demonstrate strong knowledge of Cassandra through
> production deployments, educational material, conference talks, or other
> means. In broad terms, Catalysts can participate through Contribution and
> Promotion.
>
> Who can become a Cassandra Catalyst?
>
> Anyone can nominate an individual to become a Catalyst or apply themselves.
> This program applies to existing contributors who have been involved in
> Cassandra for years or those who are newcomers to the community.
>
> The program committee includes PMC members who will be reviewing Catalyst
> applications on a rolling basis. We’ll be recognizing the first group of
> Catalysts on the keynote stage at Cassandra Summit on Dec. 12-13 so apply
> early and be recognized for your contributions!
>
> Learn more and nominate someone/apply:
>
>
> https://cassandra.apache.org/_/blog/Introducing-the-Apache-Cassandra-Catalyst-Program.html
>
>
> If you have questions, feel free to ask here or on the #cassandra-comdev
> channel.
>
> Melissa
>


Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Patrick McFadin
Yay! Congratulations Mike. Well deserved!

On Fri, Dec 8, 2023 at 7:00 AM Andrés de la Peña 
wrote:

> Congrats Mike!
>
> On Fri, 8 Dec 2023 at 14:53, Jeremiah Jordan 
> wrote:
>
>> Congrats Mike!  Thanks for all your work on SAI and Vector index.  Well
>> deserved!
>>
>> On Dec 8, 2023 at 8:52:07 AM, Brandon Williams  wrote:
>>
>>> Congratulations Mike!
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>> On Fri, Dec 8, 2023 at 8:41 AM Benjamin Lerer  wrote:
>>>
>>>
>>> The PMC members are pleased to announce that Mike Adamson has accepted
>>>
>>> the invitation to become committer.
>>>
>>>
>>> Thanks a lot, Mike, for everything you have done for the project.
>>>
>>>
>>> Congratulations and welcome
>>>
>>>
>>> The Apache Cassandra PMC members
>>>
>>>


Can't make it to Cassandra Summit but want to see the talks?

2023-12-11 Thread Patrick McFadin
Hi everyone,

The Linux Foundation will be streaming all of the talks from the Cassandra
Summit. Finding the streams is very easy. Go to the conference schedule:

https://events.linuxfoundation.org/cassandra-summit/program/schedule/

Each talk has a YouTube link associated with it. The Keynotes and each room
have their own stream. Find the time and the room, and show up!

If you miss the live stream, the talks will all be available on YouTube
afterward. Join us in the #cassandra-summit channel in the ASF Slack and
start a thread on any talk you have questions about. We'll try to get the
speakers to join in.

Patrick


Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2023-12-22 Thread Patrick McFadin
It was great having some more extended discussions about Harry in person
last week. Anything we can do to make it easier for anyone to test
Cassandra thoroughly is an easy +1 from me!

Thanks for all your efforts so far, Alex.

Patrick

On Fri, Dec 22, 2023 at 8:03 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Obviously +1
>
> Thank you Alex
>
> On Fri, 22 Dec 2023 at 16:45, Sumanth Pasupuleti <
> sumanth.pasupuleti...@gmail.com> wrote:
>
>> +1, thank you for your efforts in bringing Harry in-tree. Anything that
>> improves the testing ecosystem for Cassandra, particularly around complex
>> scenarios / edge cases  goes a long way in improving reliability, and with
>> having a powerful tool like Harry in-tree, it is a lot more accessible to
>> the developers than it has been. Also, thank you for keeping in mind the
>> onboarding experience of developers.
>>
>> - Sumanth
>>
>> On Fri, Dec 22, 2023 at 1:11 AM Alex Petrov  wrote:
>>
>>> Some follow-up tickets to establish the project direction:
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19229
>>>
>>> Two other things that we will work on in Tree are:
>>> https://issues.apache.org/jira/browse/CASSANDRA-18275 (model and in-JVM
>>> test for partition-restricted 2i queries)
>>> https://issues.apache.org/jira/browse/CASSANDRA-18667 (multi-threaded
>>> SAI read and write fuzz test)
>>>
>>> If you would like to get your recently added feature tested with Harry
>>> model, please let me know!
>>>
>>> On Fri, Dec 22, 2023, at 12:41 AM, Joseph Lynch wrote:
>>>
>>> +1
>>>
>>> Sounds like a great change that will help us unify around a common
>>> testing paradigm, and even pave the path to in-tree load testing plus
>>> integrated correctness checking which would be extremely valuable!
>>>
>>> -Joey
>>>
>>> On Thu, Dec 21, 2023 at 1:35 PM Caleb Rackliffe <
>>> calebrackli...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Agree w/ all the justifications mentioned above.
>>>
>>> As a reviewer on CASSANDRA-19210, my goals were
>>> to a.) look at the directory, naming, and package structure of the ported
>>> code, b.) make sure IDE integration was working, and c.) make sure any
>>> modifications to existing code (rather than direct code movements from
>>> cassandra-harry) were straightforward.
>>>
>>> On Thu, Dec 21, 2023 at 3:23 PM Alex Petrov  wrote:
>>>
>>>
>>> Hey folks,
>>>
>>> I am mostly done with a patch that brings Harry in-tree [1]. I will
>>> trigger one more CI run overnight, and my intention was to merge it some
>>> time soon, but I wanted to give a fair warning here, since this is a
>>> relatively large patch.
>>>
>>> Good news for everyone that it:
>>>   a) touches no production code whatsoever. Only test (in-jvm dtest
>>> namely) code that was using Harry already.
>>>   b) the only tests that are changed are ones that used a duplicate
>>> version of placement simulator we had both for testing TCM, and in Harry
>>>   c) in addition, I have converted 3 existing TCM tests to a new API to
>>> have some base for examples/usage.
>>>
>>> Since we were effectively relying on this code for a while now, and the
>>> intention now is to converge to:
>>>   a) fewer different generators, and have a shareable version of
>>> generators for everyone to use across the base
>>>   b) a testing tool that can be useful for both trivial cases, and
>>> complex scenarios
>>> myself and many other Cassandra contributors have expressed an opinion
>>> that bringing Harry in-tree will be highly beneficial.
>>>
>>> I strongly believe that bringing Harry in-tree will help to lower the
>>> barrier for fuzz test and simplify co-development of Cassandra and Harry.
>>> Previously, it has been rather difficult to debug edge cases because I had
>>> to either re-compile an in-jvm dtest jar and bring it to Harry, or
>>> re-compile a Harry jar and bring it to Cassandra, which is both tedious and
>>> time consuming. Moreover, I believe we have missed at very least one RT
>>> regression [2] because Harry was not in-tree, as its tests would've caught
>>> the issue even with the model that existed.
>>>
>>> For other recently found issues, I think having Harry in-tree would have
>>> substantially lowered the turnaround time, and allowed me to share repros
>>> with developers of the corresponding features much more quickly.
>>>
>>> I do expect a slight learning curve for Harry, but my intention is to
>>> build a web of simple tests (worked on some of them yesterday after
>>> conversation with David already), which can follow the in-jvm-dtest pattern
>>> of find-similar-test / copy / modify. There's already copious
>>> documentation, so I do not believe not having docs for Harry was ever an
>>> issue, since there have been plenty.
>>>
>>> You all are aware of my dedication to testing and quality of Apache
>>> Cassandra, and I hope you also see the benefits of having a model checker
>>> in-tree.
>>>
>>> Thank you and happy upcoming ho

Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Patrick McFadin
Congratulations Maxim! Thank you for all you've done in the project and
everything to come!

Patrick

On Mon, Jan 8, 2024 at 12:32 PM Miklosovic, Stefan via dev <
dev@cassandra.apache.org> wrote:

> Great news! Congratulations.
>
> 
> From: Josh McKenzie 
> Sent: Monday, January 8, 2024 19:19
> To: dev
> Subject: Welcome Maxim Muzafarov as Cassandra Committer
>
> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has
> accepted
> the invitation to become a committer.
>
> Thanks for all the hard work and collaboration on the project thus far,
> and we're all looking forward to working more with you in the future.
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>
>
>


Re: Welcome Brad Schoening as Cassandra Committer

2024-02-21 Thread Patrick McFadin
Yay! Congrats Brad!

On Wed, Feb 21, 2024 at 2:06 PM Jeremy Hanna 
wrote:

> Congratulations Brad!
>
> On Feb 21, 2024, at 3:59 PM, Leo Toff  wrote:
>
> Congratulations Brad! Thank you for helping me onboard 🙏
>
> On Wed, Feb 21, 2024 at 1:56 PM Jeremiah Jordan 
> wrote:
>
>> Congrats!
>>
>> On Feb 21, 2024 at 2:46:14 PM, Josh McKenzie 
>> wrote:
>>
>>> The Apache Cassandra PMC is pleased to announce that Brad Schoening has
>>> accepted
>>> the invitation to become a committer.
>>>
>>> Your work on the integrated python driver, launch script environment,
>>> and tests
>>> has been a big help to many. Congratulations and welcome!
>>>
>>> The Apache Cassandra PMC members
>>>
>>
>


Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Patrick McFadin
Congratulations, everyone. I am loving this new direction for the project!

On Wed, Apr 17, 2024 at 11:16 AM Yifan Cai  wrote:

> Congrats all
> --
> *From:* Josh McKenzie 
> *Sent:* Wednesday, April 17, 2024 11:05:29 AM
> *To:* dev 
> *Subject:* Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire,
> Olivier Michallat as Cassandra Committers
>
> Congrats everyone and thanks for all the hard work to get things to this
> point!
>
> On Wed, Apr 17, 2024, at 1:18 PM, Ekaterina Dimitrova wrote:
>
> Congrats and thank you for all your work on the drivers!
>
> On Wed, 17 Apr 2024 at 13:17, Francisco Guerrero 
> wrote:
>
> Congratulations everyone!
>
> On 2024/04/17 17:14:34 Abe Ratnofsky wrote:
> > Congrats everyone!
> >
> > > On Apr 17, 2024, at 1:10 PM, Benjamin Lerer  wrote:
> > >
> > > The Apache Cassandra PMC is pleased to announce that Alexandre Dutra,
> Andrew Tolbert, Bret McGuire and Olivier Michallat have accepted the
> invitation to become committers on the java driver sub-project.
> > >
> > > Thanks for your contributions to the Java driver during all those
> years!
> > > Congratulations and welcome!
> > >
> > > The Apache Cassandra PMC members
> >
> >
>
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-23 Thread Patrick McFadin
I finally got a chance to digest this CEP and am happy to see it raised.
This feature has been left to the end user for far too long.

It might get roasted for scope creep, but here goes. Related and something
that I've heard for years is the ability to migrate a single keyspace away
from a set of hardware... online. Similar problem but a lot more
coordination.
 - Create a Keyspace in Cluster B mimicking keyspace in Cluster A
 - Establish replication between keyspaces and sync schema
 - Move data from Cluster A to B
 - Decommission keyspace in Cluster A

In many cases, the presence of multiple tenants puts the cluster under too
much pressure.
The best solution in that case is to migrate the largest keyspace to a
dedicated cluster.

Live migration but a bit more complicated. No chance of doing this manually
without some serious brain surgery on c* and downtime.

Patrick


On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> Thank you all for the inputs and apologies for the late reply. I see good
> points raised in this discussion. *Please allow me to reply to each point
> individually.*
>
> To start with, let me focus on the point raised by Scott & Jon about file
> content verification at the destination with the source in this reply.
> Agree that just verifying the file name + size is not fool proof. The
> reason why I called out binary level verification out of initial scope is
> because of these two reasons: 1) Calculating digest for each file may
> increase CPU utilisation and 2) Disk would also be under pressure as
> complete disk content will also be read to calculate digest. As called out
> in the discussion, I think we can't compromise on binary level check for
> these two reasons. Let me update the CEP to include binary level
> verification. During implementation, it can probably be made optional so
> that it can be skipped if someone doesn't want it.
>
> Thanks!
> Hari
>
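
The optional binary-level verification described in the reply above could be
sketched roughly as follows. This is only an illustration, not the CEP-40
design: the function names, the `verify_content` flag, and the choice of
SHA-256 are all assumptions. It checks file name and size first, and reads the
full file contents to compare digests only when asked to, which is where the
extra CPU and disk cost mentioned above comes from.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(source, destination, verify_content=True):
    """Compare a source file with its copy at the destination.

    Name and size are always compared. The digest comparison is the
    expensive part and can be skipped (hypothetical verify_content flag).
    """
    src, dst = Path(source), Path(destination)
    if src.name != dst.name:
        return False
    if src.stat().st_size != dst.stat().st_size:
        return False
    if not verify_content:
        return True
    return file_digest(src) == file_digest(dst)
```

Two files with the same name and size but different bytes pass the cheap check
and fail the digest check, which is the gap that name + size verification
leaves open.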
> On Mon, Apr 22, 2024 at 4:40 AM Slater, Ben via dev <
> dev@cassandra.apache.org> wrote:
>
>> We use backup/restore for our implementation of this concept. It has the
>> added benefit that the backup / restore path gets exercised much more
>> regularly than it would in normal operations, finding edge case bugs at a
>> time when you still have other ways of recovering rather than in a full
>> disaster scenario.
>>
>>
>> Cheers
>>
>> Ben
>>
>> *From: *Jordan West 
>> *Date: *Sunday, 21 April 2024 at 05:38
>> *To: *dev@cassandra.apache.org 
>> *Subject: *Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar
>> for Live Migrating Instances
>>
>> I do really like the framing of replacing a node is restoring a node and
>> then kicking off a replace. That is effectively what we do internally.
>>
>>
>>
>> I also agree we should be able to do data movement well both internal to
>> Cassandra and externally for a variety of reasons.
>>
>>
>>
>> We’ve seen great performance with “ZCS+TLS” even though it’s not full
>> zero copy — nodes that previously took *days* to replace now take a few
>> hours. But we have seen it put pressure on nodes and drive up latencies
>> which is the main reason we still rely on an external data movement system
>> by default — falling back to ZCS+TLS as needed.
>>
>>
>>
>> Jordan
>>
>>
>>
>> On Fri, Apr 19, 2024 at 19:15 Jon Haddad  wrote:
>>
>> Jeff, this is probably the best explanation and justification of the idea
>> that I've heard so far.
>>
>>
>>
>> I like it because
>>
>>
>>
>> 1) we really should have something official for backups
>>
>> 2) backups / object store would be great for analytics
>>
>> 3) it solves a much bigger problem than the single goal of moving
>> instances.
>>
>>
>>
>> I'm a huge +1 in favor of this perspective, with live migration being one
>> use case for backup / restore.
>>
>>
>>
>> Jon
>>
>>
>>
>>
>>
>> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa  wrote:
>>
>> I think Jordan and German had an interesting insight, or at least their
>> comment made me think about this slightly differently, and I’m going to
>> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>>
>>
>>
>> The CEP treats this as “move a live instance from one machine to
>> another”. I know why the author wants to do this.
>>
>>
>>
>> If you think of it instead as “change backup/restore mechanism to be able
>> to safely restore from a running instance”, you may end up with a cleaner
>> abstraction that’s easier to think about (and may also be easier to
>> generalize in clouds where you have other tools available ).
>>
>>
>>
>> I’m not familiar enough with the sidecar to know the state of
>> orchestration for backup/restore, but “ensure the original source node
>> isn’t running” , “migrate the config”, “choose and copy a snapshot” , maybe
>> “forcibly exclude the original instance from the cluster” are all things
>> the restore code is going to need to

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread Patrick McFadin
This is a great feature addition to CQL! I get asked about it from time to
time but then people figure out a workaround. It will be great to just have
it available.
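
For anyone wondering what the usual workaround looks like, here is a small
sketch. It assumes BETWEEN is inclusive on both bounds, as in SQL; the patch in
CASSANDRA-19604 is the authority on the actual CQL semantics. The table and
column names in the comments are made up for illustration.

```python
def between(value, low, high):
    """Inclusive range check, mirroring SQL-style BETWEEN (assumed semantics)."""
    return low <= value <= high


def range_workaround(value, low, high):
    """The two-inequality form people typically write when BETWEEN is missing."""
    return value >= low and value <= high


# Hypothetical query with the new operator:
#   SELECT * FROM readings WHERE sensor = ? AND ts BETWEEN 10 AND 20;
# Hypothetical equivalent workaround:
#   SELECT * FROM readings WHERE sensor = ? AND ts >= 10 AND ts <= 20;
timestamps = [5, 10, 15, 20, 25]
with_between = [ts for ts in timestamps if between(ts, 10, 20)]
with_workaround = [ts for ts in timestamps if range_workaround(ts, 10, 20)]
```

Under that assumption both lists contain the same values, including the two
boundary values.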

And right on Simon! I think the only project I had as a high school senior
was figuring out how many parties I could go to and still maintain a
passing grade. Thanks for your work here.

Patrick

On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:

> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
> We plan to add support for it in conditions in a separate patch.
>
> The patch is available.
>
> As a side note, Simon chose to do his high school senior project
> contributing to Apache Cassandra. This patch is his first contribution for
> his senior project (his second feature contribution to Apache Cassandra).
>
>
>


Re: NGCC 2018?

2018-07-24 Thread Patrick McFadin
Ben,

Lynn Bender had offered a space the day before Distributed Data Summit in
September (http://distributeddatasummit.com/) since we are both platinum
sponsors. I thought he and Nate had talked about that being a good place
for NGCC since many of us will be in town already.

Nate, now that I've spoken for you, you can clarify, :D

Patrick


On Mon, Jul 23, 2018 at 2:25 PM Ben Bromhead  wrote:

> The year has gotten away from us a little bit, but now is as good a time as
> any to put out a general call for interest in an NGCC this year.
>
> Last year Gary and Eric did an awesome job organizing it in San Antonio.
> This year it might be a good idea to do it in another city?
>
> We at Instaclustr are happy to sponsor/organize/run it, but ultimately this
> is a community event and we only want to do it if there is a strong desire
> to attend from the community and it meets the wider needs.
>
> Here are a few thoughts we have had in no particular order:
>
>- I was thinking it might be worth doing it in SF/Bay Area around the
>dates of distributed data day (14th of September) as I know a number of
>folks will be in town for it.
>- Typically NGCC has focused on being a single day, single track
>conference with scheduled sessions and an unconference set of ad-hoc
> talks
>at the end. It may make sense to change this up given the pending freeze
>(maybe make this more like a commit/review fest)? Or keep it in the same
>format but focus on the 4.0 work at hand.
>- Any community members who want to get involved again in the more
>organizational side of it (Gary, Eric)?
>- Any other sponsors (doesn't have to be monetary, can be space,
>resource etc) who want to get involved?
>
> If folks are generally happy with the end approach we'll post details as
> soon as possible (given it's July right now)!
>
> Ben
>
>
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>


Re: Apache Cassandra Virtual meetings

2019-08-12 Thread Patrick McFadin
If it works for everyone, DataStax has some resources we could put to this
effort. We do large scale conferences like this all the time and have the
tools to pull it off. It would be a small group of people with full duplex
audio and video with the ability for 100s of people to watch in streaming.
As Scott mentioned, a recording could be made available after the fact for
anyone who wants to review it and for historical purposes.

If we go this route, I would suggest using the ASF Cassandra slack for
people posting questions. 1) It gets more people on slack 2) It's more or
less a permanent, searchable record.

Patrick

On Sun, Aug 11, 2019 at 10:18 AM Rahul Xavier Singh <
rahul.xavier.si...@gmail.com> wrote:

> I think these meetings would be great.. if there is a specific structure.
> We use a simple format that could help e.g.
>
> 1. Review long term vision/ roadmap.
> 2. Review next release / features that are in progress.
> 3. Discuss issues in general and make a game plan for the next quarter.
>
> Nothing too complicated, but at least some structure so that we are
> timeboxed on what we are doing.
>
>
>
> On Wed, Aug 7, 2019 at 7:42 AM Joshua McKenzie 
> wrote:
>
> > The one thing we need to keep in mind is the "If it didn't happen on a
> > mailing list, it didn't happen "
> > philosophy of apache projects. Shouldn't constrain us too much as the
> > nuance is:
> >
> > *"Discussions and plan proposals often happen at events, in chats (Slack,
> > IRC, IM, etc.) or other synchronous places. But all final decisions about
> > executing on the plan, checking in the new code, or launching the website
> > must be made by the community asynchronously on the mailing list."*
> >
> > So long as we keep that in mind (and maybe push it back to 8am PST since
> > 9am can get pretty ugly for some of the more eastern european / asian
> > countries), makes sense to me.
> >
> > On Tue, Aug 6, 2019 at 6:07 PM Dinesh Joshi  wrote:
> >
> > > Thanks for initiating this conversation Sankalp. On the ASF front, I
> > think
> > > we need to ensure that non-Pacific time participants can also
> participate
> > > in the discussions. So posting the notes and opening up discussions
> after
> > > the meet up to dev@ would be a great way of making sure everyone can
> > > participate and gets visibility. Additionally, we should consider
> > > scheduling this meetup in different timezones as far as logistics allow
> > it.
> > >
> > > Dinesh
> > >
> > > > On Aug 6, 2019, at 2:58 PM, sankalp kohli 
> > > wrote:
> > > >
> > > > Hi All,
> > > > There are projects (like k8s[1]) which do regular meetings
> > using
> > > > video conferencing tools. We want to propose such a meeting for
> Apache
> > > > Cassandra once a quarter. Here are some of the initial details.
> > > >
> > > > 1. A two hour meeting once a quarter starting at 9am Pacific. We can
> > > later
> > > > move this to other times to make it easier for other timezones.
> > > > 2. Agenda of the meeting will be due 2 days prior to the meeting. A
> > > sample
> > > > agenda for next one could cover updates on 4.0 testing, any major
> bugs
> > > > found and/or fixed, next steps for 4.0, etc.
> > > > 3. Each agenda item will have a time duration and list of people to
> > drive
> > > > that item.
> > > > 4. We will have a moderator for each meeting which will rotate around
> > the
> > > > community members.
> > > > 5. We need to figure out which video conferencing tool to use for
> this.
> > > > Suggestions and donation of tools are welcome.
> > > > 6. We will have meeting notes for each item discussed in the meeting.
> > > >
> > > > Motivation for such a meeting
> > > > 1. We currently have Slack, JIRA and emails however an agenda driven
> > > video
> > > > meeting can help facilitate alignment within the community.
> > > > 2. This will give an opportunity to the community to summarize past
> > > > progress and talk about future tasks.
> > > > 3. Agenda notes can serve as newsletters for the community.
> > > >
> > > > Notes:
> > > > 1. Does this violate any Apache rules? I could not find any rules but
> > > > someone can double check
> > > > 2. Are there any other Apache projects which do something similar?
> > > >
> > > > This is a proposal at this time and your feedback is greatly
> > appreciated.
> > > > If anyone thinks this will not help then please provide a reason.
> > > >
> > > > Thanks,
> > > > Sankalp
> > > > [1] https://github.com/kubernetes/community/tree/master/sig-storage
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
>


Re: Offering some project management services

2020-01-10 Thread Patrick McFadin
Scott and I had a talk this week and we are starting the contributor
meetings on 1/22 as we talked about at NGCC. (Yeah that was back in
September) Stay tuned for the details and agenda in the project confluence
page.

Patrick

On Fri, Jan 10, 2020 at 3:21 PM Jeff Jirsa  wrote:

> On Fri, Jan 10, 2020 at 3:19 PM Jeff Jirsa  wrote:
>
> >
> >
> > On Fri, Jan 10, 2020 at 2:35 PM Benedict Elliott Smith <
> > bened...@apache.org> wrote:
> >
> >>
> >> Yes, I also miss those fortnightly (or monthly) summaries that Jeff
> >> used to do. They were very useful "glue" in the community. I imagine
> they'd
> >> also make writing the board report easier.
> >>
> >> +1, those were great
> >>
> >>
> >>
> > I'll try to either do more of these, or nudge someone else into doing
> them
> > from time to time.
> >
> >
> (I meant ^ if Josh doesnt volunteer. Would love to have Josh do them if
> he's got time).
>


Apache Cassandra Contributor Meeting

2020-01-13 Thread Patrick McFadin
Hi everyone,

In order to catch up on what's happening here, here's the establishing
thread:
https://lists.apache.org/thread.html/aa54420a43671c00392978f2b0920bc6926ca9ba1e61a486ad39fb21%40%3Cdev.cassandra.apache.org%3E

Key points that Scott Andreas proposed in the initial email were:

Motivation for such a meeting
1. We currently have Slack, JIRA and emails however an agenda driven video
meeting can help facilitate alignment within the community.
2. This will give an opportunity to the community to summarize past
progress and talk about future tasks.
3. Agenda notes can serve as newsletters for the community.

To that, I humbly offer my services as a community organizer to help with
the logistics and setup. I'm happy to say this is finally happening and I
apologize this has taken so long. I saw some of the examples mentioned in
the original thread for other open source projects and I "borrowed" heavily
from them.

I created a page in the Cassandra Confluence space to hopefully centralize
both logistics and records of each call. You can find it here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting

The meetings are on Zoom and set to be wide open. Anyone can join via
computer or phone. I'm using a tier that allows for 100 participants. If we
need more, I can change the type of meeting but it's more of a pain for
logistics. We can try this and see how it goes. Once the meeting starts
I'll hit record, I'll post the video on YouTube and add the link to the
notes. All meeting notes for each agenda item can live in the doc above
and remain as a permanent record. After the meeting, I'll send the notes
link to the dev list as a reminder that it happened to anyone subscribed.

If you have agenda items, please edit the Confluence page and add your name
and what you would like discussed.

My contribution here is as an organizer. Please feel free to email or Slack
if you need anything. Most important, a video meet is an alpha product and
we'll learn a lot from the first time trying. I'll try to keep note of
things to improve in the doc.

See you there,

Patrick


Re: Apache Cassandra Contributor Meeting

2020-01-13 Thread Patrick McFadin
And I sent this without saying when. Let me save you a click on the
confluence link.

January 21, 1PM PST

On Mon, Jan 13, 2020 at 5:28 PM Patrick McFadin  wrote:

> Hi everyone,
>
> In order to catch up on what's happening here, here's the establishing
> thread:
> https://lists.apache.org/thread.html/aa54420a43671c00392978f2b0920bc6926ca9ba1e61a486ad39fb21%40%3Cdev.cassandra.apache.org%3E
>
> Key points that Scott Andreas proposed in the initial email were:
>
> Motivation for such a meeting
> 1. We currently have Slack, JIRA and emails however an agenda driven video
> meeting can help facilitate alignment within the community.
> 2. This will give an opportunity to the community to summarize past
> progress and talk about future tasks.
> 3. Agenda notes can serve as newsletters for the community.
>
> To that, I humbly offer my services as a community organizer to help with
> the logistics and setup. I'm happy to say this is finally happening and I
> apologize this has taken so long. I saw some of the examples mentioned in
> the original thread for other open source projects and I "borrowed" heavily
> from them.
>
> I created a page in the Cassandra Confluence space to hopefully centralize
> both logistics and records of each call. You can find it here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting
>
> The meetings are on Zoom and set to be wide open. Anyone can join via
> computer or phone. I'm using a tier that allows for 100 participants. If we
> need more, I can change the type of meeting but it's more of a pain for
> logistics. We can try this and see how it goes. Once the meeting starts
> I'll hit record, I'll post the video on YouTube and add the link to the
> notes. All meeting notes for each agenda item can live in the doc above
> and remain as a permanent record. After the meeting, I'll send the notes
> link to the dev list as a reminder that it happened to anyone subscribed.
>
> If you have agenda items, please edit the Confluence page and add your
> name and what you would like discussed.
>
> My contribution here is as an organizer. Please feel free to email or
> Slack if you need anything. Most important, a video meet is an alpha
> product and we'll learn a lot from the first time trying. I'll try to keep
> note of things to improve in the doc.
>
> See you there,
>
> Patrick
>


Contributor Meeting summary for 2020-01-21

2020-01-22 Thread Patrick McFadin
Hi everyone,

For a first time, things went amazingly smoothly with Zoom, with a good
exchange and participation.

Scott already provided the link but I'll just offer up a quick summary from
the meeting notes and an easy reminder for those that volunteered for
something.

All notes and link to recorded video can be found here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting

*Meeting Notes:*

The community and contributors remain focussed on the 4.0 release. There
was broad consensus that we want to keep working towards the 4.0 QA Test
Plan
<https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans>
and a call for contributors to sign up for major components. We also
touched on a number of other process discussions and agreed to take a few
discussions to the mailing list.

Action Items:

   - *All contributors:* please sign up as either shepherds or contributors
   on the test plan. If we want 4.0 to happen we must get at least minimal
   coverage of these areas.
   - Ahead of next call, review open tickets with “patch available” for
   Alpha and seek reviewers; for unassigned tickets, seek assignees or open
   discussion on scope.
   - Discuss on dev list:
      - Thread regarding when to branch for a post-4.0 release.
      - Cadence of meeting: Consensus on call was “once a month, at an
      earlier time to enable more from European time zones to join - likely
      rotating.”
      - Windows question: potentially a question to the user@ list if there
      are contributors willing to test / contribute patches for Windows.
      - Jeremy Hanna: Discuss changes to defaults prior to 4.0 release.
   - Patrick McFadin: Start discussion on dev list to settle time for next
   meetings and frequency.
   - Anthony raising need for Reaper compatibility testing to other Reaper
   contributors; will note compatibility issues identified (in case they
   warrant a change in C* itself).
   - Josh preparing JIRA epics that capture current headings in the
   Confluence “4.0 open issues” report.

I will start a separate thread about the time and frequency. Any general
feedback/suggestions about the logistics of the meeting can be done in this
thread.

Patrick


Re: Cassandra CI Status

2020-01-27 Thread Patrick McFadin
I would love to get involved promoting those if and when a list is produced.
Could this live on a Cassandra Confluence page? I would be happy to
volunteer some time keeping that up-to-date.

Patrick

On Fri, Jan 24, 2020 at 6:49 AM Joshua McKenzie 
wrote:

> >
> > an entry in the progress report?
>
> That'd be slick. I've had some people pinging me on slack asking about the
> easiest way to get involved with the project and ramp up, and I think
> refactoring and cleaning up a dtest or two would be another vector for
> people to get their feet wet. I like it!
>
>
> On Fri, Jan 24, 2020 at 12:38 AM Mick Semb Wever  wrote:
>
> >
> > > >  - parallelise dtests (because 12 hours is wild)
> > >
> > > That's one word for it. :)
> > >
> > > >  We used to ad hoc take a crack at sorting the individual test times by
> > > > longest and taking top-N and seeing if there was LHF to shave off that.
> > > > Being on a flight atm, not having that data handy right now, and that not
> > > > being in the linked logs from that pipeline run here (awesome work btw!),
> > > > do we think that might be something worth doing periodically on the
> > > > project?
> >
> >
> > Yes I think so! Maybe even the longest dtest(s) can be an entry in the
> > progress report? Especially now we can rewrite dtests into either quick
> > "unit" tests using jvm-dtests or event diagnostics.
> >
> > Along with the focus on dtest execution time, it would be nice to shore up
> > the flaky unit tests (there's only a handful), so that there are more steps
> > in the pipeline that hard fail (and fail-fast), giving faster feedback to
> > the contributor/reviewer.
> >
> >
> >
>
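
The idea quoted above, sorting individual test times by longest and taking
the top N, could be sketched roughly as follows. This is only an
illustration, not existing project tooling: the test names and durations are
made-up placeholders, and in practice the timings would have to be parsed
from the CI logs.

```python
import heapq


def slowest_tests(durations, n=10):
    """Return the n (test name, seconds) pairs with the longest run times,
    slowest first."""
    return heapq.nlargest(n, durations.items(), key=lambda item: item[1])


# Hypothetical timings in seconds; real values would come from a CI run.
sample_durations = {
    "example_test_a": 840.0,
    "example_test_b": 1200.5,
    "example_test_c": 95.2,
    "example_test_d": 610.0,
}

for name, seconds in slowest_tests(sample_durations, n=2):
    print(f"{name}: {seconds:.1f}s")
```

The resulting short list is the place to look for low-hanging fruit, or for
candidates to rewrite as faster tests.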


Feedback from the last Apache Cassandra Contributor Meeting

2020-02-03 Thread Patrick McFadin
Hi everyone,

One action item I took from our first contributor meeting was to gather
feedback for the next meetings. I've created a short survey if you would
like to offer feedback. I'll let it run for the week and report back on the
results.

https://www.surveymonkey.com/r/C95B7ZP

Thanks,

Patrick


Re: Feedback from the last Apache Cassandra Contributor Meeting

2020-02-10 Thread Patrick McFadin
Just a Monday reminder on the survey link I sent. I got a few responses but
could use a few more to give us a decent N. If you have fewer than 5 minutes
today, I would appreciate your feedback. I'll keep it open until tomorrow
and then send results.

Patrick



Re: Ideas for Cassandra 2020 - Remote Meetups / Mastermind

2020-02-10 Thread Patrick McFadin
Rahul,

Don't sell yourself short. I love what you've done with Awesome Cassandra
and organizing meetups. Those are really valuable contributions.

For everyone else on this ML who doesn't know about Awesome Cassandra:
Rahul picked it up and has been keeping it up to date.
https://cassandra.link/awesome/


Patrick

On Mon, Feb 10, 2020 at 8:52 AM Rahul Singh 
wrote:

> Thanks, Michael. I sent that before I read up on the notes.
>
> @nate , @jon , @dinesh I can help with the Documentation.
>
> If you have any specific doc issues you want reviewed, edited, etc,
> please let me know.
>
> If there's a specific JQL on the JIRA board, I can start there.
>
> rahul.xavier.si...@gmail.com
>
> http://cassandra.link
>
>
>
> On Mon, Feb 10, 2020 at 8:36 AM Michael Shuler 
> wrote:
>
> > This is great, thanks, the project appreciates the effort. 2019 is over,
> > don't worry about the past. Moving forward in little or large steps is
> > the goal. :)
> >
> > If you didn't get a chance to attend the first Contributor Meeting,
> > there will be more. Patrick sent out a survey last week for feedback, so
> > I imagine these will continue at a regular interval with continued
> > interest and attendance. Perhaps you could schedule some in-person
> > meetup thing to coincide, as an idea, or just get the word out to others
> > that might be interested in listening in?
> >
> > (Next one has not been scheduled yet, but will show up here on dev@ list.)
> >
> > https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting
> >
> > Michael
> >
> > On 2/8/20 10:06 PM, Rahul Singh wrote:
> > > Folks, (Initially meant for User, but I realized after I wrote it that
> > > it’s more sausage-making talk, which end users probably don’t care about)
> > >
> > > I took on a bunch of work and am finally starting to get my head out of
> > > the sand, and I realized I failed to deliver on some promises I made last
> > > year to myself and others to contribute to this community. I wanted to
> > > resurface a few thoughts on which I would like to contribute.
> > >
> > > We had a conversation on here a while ago about trying a virtual
> > > conference, which I think is a bit too ambitious. I also spoke to Dinesh
> > > briefly last year about doing periodic development meetings focused on
> > > development planning and execution.
> > >
> > > I’d like to help this project but I don’t know where to start. I tried
> > > getting some Jr. members internally at Anant who had time to make fixes
> > > to content and docs, but their work didn’t get looked at or reviewed, so
> > > they lost interest. There’s only so much they would want to do based on
> > > my requests. The failure to deliver on better documentation organization
> > > was mainly mine because I didn’t commit enough time to it.
> > >
> > > I don’t think our community does a good enough job communicating the
> > > Cassandra value proposition to the enterprise community, whether they are
> > > developers, architects, or directors. I’ve been meeting with many folks
> > > that haven’t touched their clusters since installing 2.1 (because it’s
> > > pretty damn good for most people!). When I ask them why, it’s a
> > > combination of team member churn and the knowledge not being as
> > > accessible.
> > >
> > > This year as January closes I am recommitting myself to some ideas and
> > > would LOVE your feedback. If some things like this are already in
> > > progress, I will help.
> > >
> > >
> > > 1. Cassandra Lunch - I’ve been seeing a colleague getting together with
> > > his fellow practitioners for a weekly “Sitecore Lunch” and I found it a
> > > very easy way to get people talking who normally wouldn’t be interacting
> > > with each other in real time.
> > > 2. Coordinated Remote Meetup - I think this would be way easier to
> > > organize and get cross-promoted as a quarterly event with the help of
> > > local organizers. I’m currently organizing DC / Chicago and have been
> > > cross-promoting virtual talks to both and have gotten a good showing of
> > > people curious about Cassandra.
> > > 3. Documentation - I know I said I’d help last year. I underestimated my
> > > free time and overestimated my capacity to focus. That being said, this
> > > is one of my passions and I help a lot of orgs get their [blank] together
> > > on how to manage their people, process, info and systems, and the first
> > > thing is always knowledge management. If there’s someone I can shadow and
> > > apprentice under to help with Cassandra.Apache.org, I really want to help
> > > revitalize our site.
> > >
> > >
> > > These may still be overestimating my capacity but I’m willing to fail
> > > and try again. :)
> > >
> > >
> > > rahul.xavier.si...@gmail.com
> > >
> > > http://cassandra.link
> > > The Apache Cassandra Knowledge Base.
> > >
> >

Re: Feedback from the last Apache Cassandra Contributor Meeting

2020-02-11 Thread Patrick McFadin
The survey is closed. Thank you to everyone who took the time to give feedback.
Here are the results: https://www.surveymonkey.com/results/SM-7YTMZYLT7/

Based on the feedback, these meetings are useful and this seems to be the
ideal format:

   - 1 hour scheduled
   - Monthly
   - Rotate through 10AM-1PM PST to try to cover as many time zones as
   possible.

One excellent suggestion was to pre-determine a scribe for the meeting notes.
I'll add that to the meeting setup in cwiki.

I'll get the meetings calendared for the next 6 months based on those
criteria and update the cwiki as needed.

Thanks again!

Patrick



Re: Feedback from the last Apache Cassandra Contributor Meeting

2020-02-12 Thread Patrick McFadin
I’m looking at the calendar and it looks like the next date will be
February 18th which is next Tuesday. We can start our rotation with a 10am
starting time. I’ll get the wiki updated today so agenda items can get
posted.

Patrick

On Tue, Feb 11, 2020 at 9:22 PM Scott Andreas  wrote:

> Thank you, Patrick!


Apache Cassandra Contributor Meeting 2020-02-18

2020-02-12 Thread Patrick McFadin
Hi everyone,

A page has been set up for the Apache Cassandra Contributor Meeting on
February 18. The time in the rotation will be 10AM PST.

You can add your agenda items here.
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-02-18+Apache+Cassandra+Contributor+Meeting

The Zoom link for the meeting can be found here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting

You may notice I have reorganized the pages in cwiki to accommodate more
meetings in the future. I have blocked out the dates and times for the next
6 months for your planning.

Patrick


Sidecar meeting notes from 2020-03-10

2020-03-13 Thread Patrick McFadin
Hi everyone,

This week, a small group of us met on Zoom to discuss how best to contribute
to the sidecar project (CEP-1). DataStax is open sourcing several components,
one of which is a management sidecar. This was an open conversation about
what's being released and how best to participate in CEP-1 in the future.
Thanks to Vinay Chella for organizing such a diverse group!

As a matter of tracking any discussions, notes were added to CEP-1 in the
Apache cwiki.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1+Online+meeting+2020-03-10

Thanks,

Patrick


Re: Sidecar meeting notes from 2020-03-10

2020-03-13 Thread Patrick McFadin
I think Vinay just pinged everyone working on the project, but to Josh's
point, is this even traffic for the Cassandra dev ML? Or does a new one
need to be created?

On Fri, Mar 13, 2020 at 1:29 PM Nate McCall  wrote:

> Where was this announced? I didnt hear anything about it (it's possible I
> missed an email but don't see anything).


Re: Sidecar meeting notes from 2020-03-10

2020-03-15 Thread Patrick McFadin
Since there seems to be more energy building on the sidecar project, would
it be helpful to do a monthly Zoom like we do with contributors? Or maybe
just add that in?

On Sun, Mar 15, 2020 at 1:13 AM Mick Semb Wever  wrote:

> > Where was this announced? I didnt hear anything about it (it's possible I
> > missed an email but don't see anything).
> >
>
>
> Hey Patrick, (Jason, Vinay, Dinesh, Joey),
>  can we give any stakeholders or observers to CEPs the chance to know of
> such meetings in advance?
> It doesn't have to be on the dev list, eg could be just an update on the
> CEP-1 page, but some trace of the announcement would be a win.
>
> If the sidecar process is now including input from other tools, then Reaper
> is a stakeholder in that context and no-one there was invited…
> And re-iterating Nate's point, maybe there was an announcement and we just
> missed it?
>
> thanks for writing up the meeting notes, that's fantastic! Exciting to see
> Datastax contributing lots back into the community again.
> cheers.
>


2020-03-24 Apache Cassandra Contributor Meeting

2020-03-23 Thread Patrick McFadin
Hi everyone,

It's easy to lose track of things given how quickly things have changed in
the past couple of weeks. I missed sending a reminder last week, but we have
a contributor meeting tomorrow at 11AM PST.

Please post any agenda items to the cwiki page below. If you don't have
edit rights, just send it to me and I'll post it.

https://cwiki.apache.org/confluence/display/CASSANDRA/2020-03-24+Apache+Cassandra+Contributor+Meeting

Thanks and see you tomorrow!

Patrick


Re: Kubernetes operator unification

2020-03-31 Thread Patrick McFadin
Thanks for starting this thread, Ben! I definitely agree that having a single
project-owned Kubernetes operator for Cassandra is preferred over a
fragmented ecosystem. I'll echo the same sentiment: based on conversations,
it appears the community is eager to share experiences and implementations
in this space.

Speaking for both myself and some other contributors I've been working
with, we're super excited to collaborate with the community on this
unified/standardized project-based operator. It seems that nobody is tied
to their own code and everyone is open to one solution that blends the best
of all of these operators.

Given the current virtual event way of life we are experiencing, would
everyone be ok with me organizing a Zoom call for anyone who is interested
so we can kick this off properly? If we all engage our social networks, I
think this could be a great way of growing the community and inviting new
contributors to the project. When done, I can post the notes and video in
the cwiki.

Patrick

On Tue, Mar 31, 2020 at 8:09 AM Jake Luciani  wrote:

> Hi Ben!
>
> Totally agree.  We should collaborate on a unified operator and I think as
> deployment on k8s becomes more and more prevalent we need to have
> distributed testing in k8s.
>
> To that end we are working on OSS releasing our distributed testing service
> we've developed over the years to make this easier and reproducible. Need a
> few more days before that's ready
> but it may give us a leg up.  I know Alex Petrov has been working a lot on
> the new jvm dtest harness and may have some ideas.
>
> Jake
>
> On Tue, Mar 31, 2020 at 12:11 AM Ben Bromhead  wrote:
>
> > Hi All
> >
> > With the announcement of a C* Sidecar and K8s operator from Datastax
> > (congrats btw), Jake and Stefan discussed moving to a more
> > standardised/unified implementation of an Apache Cassandra operator for
> > Kubernetes. Based on discussions with other folks either using our
> > operator, building/running their own or just getting started, there
> > appears to be some broader enthusiasm for a more unified approach outside
> > of just that thread.
> >
> > The current state of play for folks looking to run Apache Cassandra,
> > particularly on Kubernetes, is fairly fragmented. There are multiple
> > projects all doing similar things, from large companies operating C* at
> > scale on Kubernetes, individual contributors and commercialising entities.
> > Each one of these projects also has similar but diverse implementations
> > and capabilities. From an end user perspective, it makes it very hard to
> > figure out what path to take, and as someone who supports these end users,
> > I'd much rather support one implementation than 3 even if it's not the one
> > we wrote :)
> >
> > To that end, I'd like to indicate that we (Instaclustr) are open to working
> > towards a project-owned standardized K8s operators/sidecar/etc. How that
> > looks and how it gets implemented will surely be the subject of debate,
> > especially amongst those with existing implementations.
> >
> > Before engaging in the CEP process
> > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201)
> > it might be useful to informally discuss an approach to unifying
> > implementations.
> >
> > To that end I'd like to circulate the following ideas to kick off the
> > discussion of an approach that might be useful:
> >
> > We should look to start off with a new implementation in a separate repo
> > (as the sidecar project has done), leveraging the experience and
> > contributions from existing operator implementations and a framework like
> > the operator-framework, with the initial scope of just supporting our
> > distributed testing in our CI pipeline.
> >
> > Targeting our own distributed test cases (e.g. dtests) brings a number of
> > benefits:
> >
> >    - Defines a common environment and goals that minimize each
> >    organisation's unique Kubernetes challenges.
> >    - Follows the spirit of the 4.0 release to be more dba/operator aligned,
> >    more production ready and easier to get right in a production setting
> >    OOB.
> >    - Our test environment over time will look more and more like how users
> >    run Cassandra in production. This will be the biggest win IMHO.
> >    - The distributed tests will also serve as functional tests for the
> >    operator itself.
> >
> > The main drawback I can see with this approach is it will potentially be a
> > longer path to getting a usable project-based operator out the door. It
> > will also involve a ton of reworking dtests, which for some is going to be
> > a hard blocker. From there we can start to expand and support more and
> > more real life use cases. Hopefully this is not a huge leap as our testing
> > should be covering most of those cases!
> >
> > This is largely my personal gut feel on the approach and I'm looking
> > forward to other folks' suggestions!
> >
> > Cheers
> >
> > --
> >
> > Ben Bromhead
> >
> > Instaclustr 

Re: Kubernetes operator unification

2020-03-31 Thread Patrick McFadin
Sure. Let me figure out some timing and propose some times.

On Tue, Mar 31, 2020 at 5:15 PM Nate McCall  wrote:

> Given the large portion of work that's been done in EU by the Orange folks
> vs. that of PST and APAC, I think this might be one for which we do two
> versions: PST morning and evening.

Kubernetes Operator SIG Zoom

2020-04-06 Thread Patrick McFadin
Hi,

I have sorted out the time zones and got the initial Kubernetes Operator
Zoom call on the calendar. All of it is documented here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Kubernetes+Operator+SIG+Meeting

Meeting 1 (APAC/Western US Friendly)

   - San Francisco: April 8, 5:00PM
   - Singapore: April 9, 8:00AM
   - Tokyo: April 9, 9:00AM
   - Sydney: April 9, 10:00AM
   - New Zealand: April 9, 12:00PM

iCal link

Meeting 2 (CEST/Eastern US Friendly)

   - San Francisco: April 9, 7:00AM
   - New York: April 9, 10:00AM
   - London: April 9, 3:00PM
   - Paris: April 9, 4:00PM
   - Berlin: April 9, 4:00PM

iCal link
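
For anyone who wants to double-check the local times listed above, or work
out their own, here is a minimal sketch. It assumes Python 3.9+ with the
standard zoneinfo module and an available time zone database. The IANA zone
names mapped to each city are my assumption and not part of the schedule
itself.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed mapping from the cities in the schedule to IANA time zone names.
CITY_ZONES = {
    "San Francisco": "America/Los_Angeles",
    "New York": "America/New_York",
    "London": "Europe/London",
    "Paris": "Europe/Paris",
    "Berlin": "Europe/Berlin",
    "Singapore": "Asia/Singapore",
    "Tokyo": "Asia/Tokyo",
    "Sydney": "Australia/Sydney",
    "New Zealand": "Pacific/Auckland",
}


def local_times(start, cities):
    """Map each city to the meeting start rendered in that city's local time."""
    return {
        city: start.astimezone(ZoneInfo(CITY_ZONES[city])).strftime("%Y-%m-%d %H:%M")
        for city in cities
    }


SF = ZoneInfo(CITY_ZONES["San Francisco"])
meeting_1 = datetime(2020, 4, 8, 17, 0, tzinfo=SF)  # April 8, 5:00PM San Francisco
meeting_2 = datetime(2020, 4, 9, 7, 0, tzinfo=SF)   # April 9, 7:00AM San Francisco

for city, when in local_times(meeting_1, ["Singapore", "Tokyo", "Sydney", "New Zealand"]).items():
    print(f"Meeting 1, {city}: {when}")
for city, when in local_times(meeting_2, ["New York", "London", "Paris", "Berlin"]).items():
    print(f"Meeting 2, {city}: {when}")
```

Anchoring both meetings to San Francisco time and converting from there
keeps daylight-saving differences between regions out of the manual
arithmetic.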

I will also be sharing this link on social media to help get the word out
to the larger community. I feel like we have a great opportunity to invite
a lot of new people.

Any discussion topics can be sent directly to me; I'll gather and document
them before each meeting.

Thanks!

Patrick


Re: Kubernetes Operator SIG Zoom

2020-04-14 Thread Patrick McFadin
Hi everyone,

A little late, but I wanted to post an update after the meetings. Here are
the notes and videos from our meetings last week:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148646938

The first action item from this meeting is creating the CEP to gather
feedback. Ben Bromhead and I will start that process, which will live in
the Cassandra cwiki for everyone to contribute to. There was also consensus
on having bi-weekly meetings at the same times, which I will schedule.

Thanks to everyone who participated!

Patrick

On Mon, Apr 6, 2020 at 2:25 PM Patrick McFadin  wrote:

> Hi,
>
> I have sorted out the time zones and got the initial Kubernetes Operator
> zoom call on the calendar. All of it is documented here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Kubernetes+Operator+SIG+Meeting
>
> Meeting 1(APAC/Western US Friendly)
>
> San Francisco: April 8, 5:00PM
>
> Singapore: April 9, 8:00AM
>
> Tokyo: April 9, 9:00AM
>
> Sydney: April 9, 10:00AM
>
> New Zealand: April 9, 12:00PM
>
> iCAL link
> <https://calendar.google.com/event?action=TEMPLATE&tmeid=N2M2Z2xnZ3Nha3ZyMHIwbWxrazRqcG1jcXYga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw&tmsrc=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com>
> Meeting 2(CEST/Eastern US Friendly)
>
> San Francisco: April 9, 7:00 AM
>
> New York: April 9, 10:00AM
>
> London: April 9, 3:00PM
>
> Paris: April 9, 4:00PM
>
> Berlin: April 9, 4:00PM
>
> iCal Link
> <https://calendar.google.com/event?action=TEMPLATE&tmeid=NjdhampnbGc4ZHFocnRtZ2diaGEyanZhNGsga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw&tmsrc=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com>
> I will also be sharing this link on social media to help get the word out
> to the larger community. I feel like we have a great opportunity to invite
> a lot of new people.
>
> Any discussion topics can be sent directly to me, I'll gather and document
> before each meeting.
>
> Thanks!
>
> Patrick
>


Meeting notes and recording from today's meeting

2020-04-21 Thread Patrick McFadin
Hi everyone,

Here are the notes and video recording from today's meeting:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-04-21+Apache+Cassandra+Contributor+Meeting

I updated the meeting details with an ICS to import into your own calendar
to make it easy to schedule.
https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting

Thanks!

Patrick


Re: DataStax Driver Donation to Apache Cassandra Project

2020-04-22 Thread Patrick McFadin
It would probably be a good idea to get some outside guidance on what other
projects have seen because, as Nate said, this isn't the first time.

https://felix.apache.org/documentation/subprojects.html
https://cocoon.apache.org/subprojects/
Commons has components: http://commons.apache.org/components.html
Hadoop, as mentioned, has modules.

Patrick

On Wed, Apr 22, 2020 at 1:25 PM Nate McCall  wrote:

> On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> bened...@apache.org>
> wrote:
>
> > I welcome the donation, and hope we are able to accept all of the
> > drivers.  This is really great news IMO.
> >
> >  I do however wonder if the project may be accumulating too many
> > sub-projects?  I wonder if it's time to think about splitting, and
> perhaps
> > incubating a project for the drivers?
> >
>
> This is a legit concern and good question, but I think this is more a
> natural evolution of growing a project. There is precedent for this in
> Spark, Beam, Hadoop and others who have a number of different repositories
> under the general project umbrella.
>
> What I would like to avoid is a situation like with Apache Curator and
> Apache Zookeeper. The former being a zookeeper client donation from Netflix
> that came in as a top level project. From the peanut gallery, it seems like
> that has been less than ideal a couple of times in the past coordinating
> releases, trademarks and such with separate project management.
>


Re: Kubernetes Operator SIG Zoom

2020-04-22 Thread Patrick McFadin
Just a reminder for everyone: we have two Kubernetes meetings over the next
12 hours. Same format as before.

Zoom link and calendar invites with times can be found here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

Franck has brought up an excellent point about combining meetings. That
will be a discussion item for these meetings: whether we can combine them
and still be inclusive of many time zones.

Ben and I will discuss the CEP draft we have been working on and let
everyone jump in and start working this out together.

Patrick

On Mon, Apr 20, 2020 at 2:48 AM  wrote:

> Hi everyone,
>
> Thanks Patrick for the difficult job of running these meetings.
> I have watched the first meeting and I am now quite sure we should have
> only one meeting instead of 2. I was pleased to hear things about CassKop
> and we could have replied on the spot.
>
> Personally I had no trouble watching good old Apple keynotes when they
> were held at 10 am SF time; maybe we could have our C* operator “keynote”
> at the same time or close?
>
> There didn’t seem to be that many people from Asia in the first call but I
> may be mistaken.
>
> Just my 2 cents :)
>
> Franck
>
> PS: I have not found the CEP where I would like to express our position.
> Spoiler: casskop code is available for anyone who wants it (support
> included) and it has nothing Orange specific :)
>
>
> franck.de...@orange.com
> Casskop Product Owner
> https://github.com/Orange-OpenSource/casskop
>
>
>
> > On 15 Apr 2020, at 02:17, Patrick McFadin  wrote:
> >
> > Hi everyone,
> >
> > A little late, but wanted to update post meeting. Here are the notes and
> > videos from our meetings last week:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148646938
> >
> > The first action item from this meeting is creating the CEP to gather
> > feedback. Ben Bromhead and I will start that process, which will live
> > in the Cassandra cwiki for everyone to contribute. There was also
> > consensus on having bi-weekly meetings at the same times, which I will
> > schedule.
> >
> > Thanks to everyone who participated!
> >
> > Patrick
> >
> > On Mon, Apr 6, 2020 at 2:25 PM Patrick McFadin wrote:
> >
> >> Hi,
> >>
> >> I have sorted out the time zones and got the initial Kubernetes Operator
> >> zoom call on the calendar. All of it is documented here:
> >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Kubernetes+Operator+SIG+Meeting
> >>
> >> Meeting 1(APAC/Western US Friendly)
> >>
> >> San Francisco: April 8, 5:00PM
> >>
> >> Singapore: April 9, 8:00AM
> >>
> >> Tokyo: April 9, 9:00AM
> >>
> >> Sydney: April 9, 10:00AM
> >>
> >> New Zealand: April 9, 12:00PM
> >>
> >> iCAL link
> >> <
> https://calendar.google.com/event?action=TEMPLATE&tmeid=N2M2Z2xnZ3Nha3ZyMHIwbWxrazRqcG1jcXYga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw&tmsrc=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com
> >
> >> Meeting 2(CEST/Eastern US Friendly)
> >>
> >> San Francisco: April 9, 7:00 AM
> >>
> >> New York: April 9, 10:00AM
> >>
> >> London: April 9, 3:00PM
> >>
> >> Paris: April 9, 4:00PM
> >>
> >> Berlin: April 9, 4:00PM
> >>
> >> iCal Link
> >> <
> https://calendar.google.com/event?action=TEMPLATE&tmeid=NjdhampnbGc4ZHFocnRtZ2diaGEyanZhNGsga2w5cHVoZ2s3cXRkdXFhdHRlOHRmZDVtcHNAZw&tmsrc=kl9puhgk7qtduqatte8tfd5mps%40group.calendar.google.com
> >
> >> I will also be sharing this link on social media to help get the word
> >> out to the larger community. I feel like we have a great opportunity to
> >> invite a lot of new people.
> >>
> >> Any discussion topics can be sent directly to me, I'll gather and
> >> document before each meeting.
> >>
> >> Thanks!
> >>
> >> Patrick
> >>
>
>
>
>


4-26-2020 update on Kubernetes Operator

2020-04-26 Thread Patrick McFadin
Hi everyone,

Over the past two weeks, we have had 4 public meetings with a lot of great
discussions. You can find the recordings and notes here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

There were some important next steps after this week. First is some
housekeeping. Having two meetings allowed for better time zone spread, but
the discussions were disconnected and tended to be somewhat redundant. It
was suggested to move to a single meeting that can span the most time
zones. I took that feedback and have rebuilt the SIG meeting schedule in
the same type of rotation being used for the Contributor Meetings. We’ll
see how that goes for everyone affected. I’ve also switched away from Zoom
to Jitsi (jitsi.org). Switching to a more open video conferencing software
seemed like a natural move and the feature list is comparable to Zoom.

All the meeting details and schedule are posted here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

This includes a calendar file and shared calendar link.

The next important thing is the beginning of the CEP for the Kubernetes
Operator. Ben Bromhead and I took a first pass at a skeleton for CEP-2
with all the basics. At this point, we need everyone participating in the
project to take some time and help build out some of the critical details.
Because everyone loves Confluence so much, I have created a Google doc we
can use as a working area before moving over to a more formal Cassandra
Wiki.
https://docs.google.com/document/d/18Ow4R3tB9GIvdcFO7WmUvjb0a-sT6h0zSCEnfHsPz58/edit?usp=sharing

Everyone has edit rights. Please use the comment functionality if you have
questions about a particular section.

The main portion that really needs the most thoughtful work is Operator
Capability Level: what does each level mean in Cassandra terms? There was
already some good debate about configuration and common tasks like repair.
Let’s get that captured in the doc if we can. If you are one of the groups
that already have an operator, your experience here is invaluable. Please
take some time if you can.

Thanks and looking forward to the collaboration.

Patrick


2020-05-07 Cassandra Kubernetes Operator SIG reminder

2020-05-07 Thread Patrick McFadin
Hi everyone,

Cassandra Kubernetes Operator SIG today at 10AM PST. Just a reminder, I
switched the conference link to Jitsi from Zoom. Link in the wiki:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

Today we will be discussing CEP-2 so bring your opinions.
https://docs.google.com/document/d/18Ow4R3tB9GIvdcFO7WmUvjb0a-sT6h0zSCEnfHsPz58/edit#heading=h.haeraryxhhvn

Specifically nailing down Level 1, 2 and 3

See you then

Patrick


2020-05-07 Cassandra Kubernetes SIG meeting follow-up

2020-05-11 Thread Patrick McFadin
Hi everyone,

A little late getting the page and recording up. You can find the recording
here:

https://cwiki.apache.org/confluence/display/CASSANDRA/2020-05-07+Cassandra+Kubernetes+Operator+SIG

The notes for this week's meeting were taken in the working CEP gdoc
found here:
https://docs.google.com/document/d/18Ow4R3tB9GIvdcFO7WmUvjb0a-sT6h0zSCEnfHsPz58/edit?usp=sharing

Some outstanding questions, but will address those in a [discussion] thread

Patrick


[discussion]Completing CEP-2

2020-05-11 Thread Patrick McFadin
Hi everyone,

Last week in our Cassandra Kubernetes SIG it was clear that we are coming
up on the completion of the specifications for CEP-2. The path we are on
looks something like this:

 - Agree to the overall specifications for a Cassandra Kubernetes Operator
with as much detail as possible on the required features.
 - One or more groups donating code as an initial commit.
 - Jira and commit activity on common Cassandra Kubernetes operator.

The two main questions that have come up as a result:

1. What constitutes a completed CEP? Is this something the PMC votes on, or
how does it get approved as a part of the project?
2. What are the procedures for code donation at this scale? It's likely
that more than one group will be participating with a large amount of code.

Any help or opinions on those two questions would be great.

Patrick


Reminder - 2020-05-19 Apache Cassandra Contributor Meeting

2020-05-18 Thread Patrick McFadin
Hi everyone,

Reminder that tomorrow at 1PM PST we'll be having a contributor meeting. I
gave Jitsi a try for the Kubernetes SIG but ran into a lot of trouble with
browser compatibility and recording. I'll just stick with using Zoom to
keep it working consistently.

https://datastax.zoom.us/j/390839037

https://cwiki.apache.org/confluence/display/CASSANDRA/2020-05-19+Apache+Cassandra+Contributor+Meeting

Add any agenda items here or email me directly and I can put them in.

Thanks,

Patrick


Re: Reminder - 2020-05-19 Apache Cassandra Contributor Meeting

2020-05-19 Thread Patrick McFadin
Thanks Mick!

On Mon, May 18, 2020 at 11:51 PM Mick Semb Wever  wrote:

> I'll be there and have added "Cassandra CI Run-through, next steps,
> help needed, and Q&A" to the agenda.
> If you have questions on CI turn up and ask them.
>
> Mick
>
>
> On Tue, 19 May 2020 at 02:53, Patrick McFadin  wrote:
> >
> > Hi everyone,
> >
> > Reminder that tomorrow at 1PM PST we'll be having a contributor meeting.
> > I gave Jitsi a try for the Kubernetes SIG but ran into a lot of trouble
> > with browser compatibility and recording. I'll just stick with using
> > Zoom to keep it working consistently.
> >
> > https://datastax.zoom.us/j/390839037
> >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-05-19+Apache+Cassandra+Contributor+Meeting
> >
> > Add any agenda items here or email me direct and I can put them in.
> >
> > Thanks,
> >
> > Patrick
>


2020-05-19 Contributor Meeting notes and recording

2020-05-20 Thread Patrick McFadin
Hi everyone,

Meeting notes and recording up here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-05-19+Apache+Cassandra+Contributor+Meeting

Patrick


Cassandra Kubernetes SIG today

2020-05-21 Thread Patrick McFadin
Hi everyone,

Quick reminder, Cassandra Kubernetes Operator SIG at the top of the hour.

I've switched back to using Zoom to avoid the issues we had with Jitsi. The
link to the meeting is here:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

Calendar object was also updated if you have subscribed.

Patrick


Cassandra Kubernetes SIG 2020-05-21 note and recording

2020-05-24 Thread Patrick McFadin
Hi everyone,

Last week's meeting is posted here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-05-19+Cassandra+Kubernetes+Operator+SIG

Highlights:

Getting multiple operators in the wild down to one that lives closer to the
project is a big task. The CEP is at a place where we need to get further
into prior art that may be donated. In our meeting, it was clear that
since this is code that runs in Kubernetes, we need to approach things from
that ecosystem. A Custom Resource Definition (CRD) is a way to extend the
Kubernetes API, and operators use these custom APIs to control a resource
such as a Cassandra node. Getting to a common place on what a CRD for
Cassandra requires is the next step. Here is an example of a CRD:
https://github.com/datastax/cass-operator/blob/master/operator/deploy/crds/cassandra.datastax.com_cassandradatacenters_crd.yaml
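
For anyone who has not worked with CRDs before, here is a much smaller
sketch of the idea, written as Python dicts that can be dumped to JSON. It
is only an illustration: the API group (cassandra.example.org), the version
name, and the spec fields (size, serverVersion) are all made up for this
example and are not taken from any existing operator or from the CEP. The
outer layout follows the apiextensions.k8s.io/v1 CustomResourceDefinition
shape.

```python
import json

# Hypothetical API group, chosen only for illustration.
GROUP = "cassandra.example.org"
VERSION = "v1alpha1"  # hypothetical version name


def build_crd():
    """Return a minimal CRD manifest that registers a new resource kind."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD's name must be "<plural>.<group>".
        "metadata": {"name": f"cassandradatacenters.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {
                "kind": "CassandraDatacenter",
                "plural": "cassandradatacenters",
                "singular": "cassandradatacenter",
            },
            "versions": [{
                "name": VERSION,
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        # Hypothetical fields; the real list is what the
                        # SIG still has to agree on.
                        "required": ["size", "serverVersion"],
                        "properties": {
                            "size": {"type": "integer", "minimum": 1},
                            "serverVersion": {"type": "string"},
                        },
                    }},
                }},
            }],
        },
    }


def build_datacenter(name, size, server_version):
    """Return a custom resource instance that an operator would watch."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {"size": size, "serverVersion": server_version},
    }


def missing_required(crd, resource):
    """List required spec fields (per the CRD schema) absent from resource."""
    schema = crd["spec"]["versions"][0]["schema"]["openAPIV3Schema"]
    required = schema["properties"]["spec"].get("required", [])
    return [f for f in required if f not in resource.get("spec", {})]


if __name__ == "__main__":
    print(json.dumps(build_crd(), indent=2))
    print(json.dumps(build_datacenter("dc1", 3, "3.11.6"), indent=2))
```

The first dict is what gets registered with the API server once; the second
is what a user would submit per cluster, and the operator reacts to it. The
debate we need to have is essentially about what goes under "properties".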

Yes, that is a lot of YAML to cut through but John Sanda and I will try to
organize a way for all the participating groups to contribute and converge
on a final version.

Patrick

