Re: Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Patrick McFadin
Sent you an invite, Sam. Welcome to the community!

On Fri, Oct 27, 2023 at 10:31 AM Sam  wrote:

> Please can I have an invite to the Slack workspace on this email. I'd like
> to take a look through some of the items for first time contributors :-)
>
> Thanks!

Re: Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Sam
Please can I have an invite to the Slack workspace on this email. I'd like
to take a look through some of the items for first time contributors :-)

Thanks!

On Fri, 27 Oct 2023 at 18:10, Josh McKenzie  wrote:

Project Status Update: 90-day catch-up edition [2023-10-27]

2023-10-27 Thread Josh McKenzie
In case you're keeping score on how frequently these are coming out: *please 
stop*. ;)

Silver lining - looks like we have a lot to discuss this round! Last update was 
late July and we've been churning through the 5.0 freeze and stabilization 
phase.


*[New Contributors Getting Started]*
Check out https://the-asf.slack.com, channel #cassandra-dev. Reply directly to 
me on this email if you need an invite for your account, and reach out to the 
@cassandra_mentors alias in the channel if you need to get oriented.

We have a list of curated "getting started" tickets you can find here, filtered 
to "ToDo" (i.e. not yet worked): 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484=2160=2162=2652.

*Helpful links:*
- Getting Started with Development on C*: 
https://cassandra.apache.org/_/development/gettingstarted.html
- Building and IDE integration (worktrees are your friend; msg me on slack if 
you need pointers): https://cassandra.apache.org/_/development/ide.html
- Code Style: https://cassandra.apache.org/_/development/code_style.html


*[Dev mailing list]*
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-7-20%7Cdto=2023-10-27:

My last email of shame was 35 threads. Drumroll for this one...
91. *Yeesh*. Let me stick to highlights.

Ekaterina pushed through dropping JDK8 support and adding JDK17 support... back 
in July. If you didn't know about it by now, consider yourself doubly 
notified. :) https://lists.apache.org/thread/9pwz3vtpf88fly27psc7yxvcv0lwbz8k 
I think I can speak on behalf of all of us when I say: *Thank You Ekaterina.*

This came up recently on another thread about when to branch 5.1, but we 
discussed our freeze plans and exception rules for TCM and Accord here: 
https://lists.apache.org/thread/mzj3dq8b7mzf60k6mkby88b9n9ywmsgw. Mick was 
essentially looking for a similar waiver for Vector search since it was well 
abstracted, depended on SAI and external libs, and in general shouldn't be too 
big of a disruption to get into 5.0. General consensus at the time was "sure", 
and the work has since been completed. But here's the reminder and link for 
posterity (and in case you missed it).

Jaydeep reached out about a potential short-term solution to detecting 
token-ownership mismatch while we don't yet have TCM; this seems more pressing 
now as we're looking at a 5.0 without yet having TCM in it. The dev ML thread 
is here: https://lists.apache.org/thread/4p0orhom42g36osnknqj3fqmqhvqml1g, and 
he created https://issues.apache.org/jira/browse/CASSANDRA-18758 dealing with 
the topic. There's a relatively modest (7 files, just over 300 lines) PR 
available here: https://github.com/apache/cassandra/pull/2595/files; I haven't 
looked into it, but it might be worth considering getting this into 5.0 since 
it looks like we're moving to cutting w/out TCM. Any thoughts?
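For anyone who hasn't looked at the ticket yet, here's a rough sketch of the general idea behind a token-ownership check, in Java since that's what we write: a node consults its own view of the ring and flags a request for a token it doesn't think it replicates. This is a hypothetical, simplified model (SimpleStrategy-style placement, made-up names like OwnershipCheck, replicasFor and ownsToken), not what the PR does; read the PR for the real approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of a token-ownership check. Each node owns the
 * range ending at its token, and replicas are the next RF distinct nodes walking
 * the ring clockwise (SimpleStrategy-style). Not the CASSANDRA-18758 patch.
 */
public final class OwnershipCheck {
    // This node's view of the ring: token -> node owning the range ending at that token.
    private final NavigableMap<Long, String> ring = new TreeMap<>();
    private final int replicationFactor;

    public OwnershipCheck(Map<Long, String> tokens, int replicationFactor) {
        if (tokens.isEmpty()) throw new IllegalArgumentException("ring must not be empty");
        this.ring.putAll(tokens);
        this.replicationFactor = replicationFactor;
    }

    /** Replicas for a token according to this node's (possibly stale) ring view. */
    public List<String> replicasFor(long token) {
        // First node whose token is >= the key's token owns it; wrap to the start if none.
        Long start = ring.ceilingKey(token);
        if (start == null) start = ring.firstKey();
        List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
        walk.addAll(ring.headMap(start, false).values());
        List<String> replicas = new ArrayList<>();
        for (String node : walk) {
            if (!replicas.contains(node)) replicas.add(node);
            if (replicas.size() == replicationFactor) break;
        }
        return replicas;
    }

    /** False means a mismatch: we were asked to handle a token we don't think we replicate. */
    public boolean ownsToken(String localNode, long token) {
        return replicasFor(token).contains(localNode);
    }

    public static void main(String[] args) {
        OwnershipCheck check = new OwnershipCheck(Map.of(100L, "A", 200L, "B", 300L, "C"), 2);
        System.out.println(check.replicasFor(150L));    // [B, C]
        System.out.println(check.ownsToken("A", 150L)); // false: A would flag the mismatch
    }
}
```

The interesting part in practice is what a node does on a mismatch (log, reject, or metric), which is exactly the kind of thing worth weighing in on in the thread.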

We had a pretty good discussion about automated repair scheduling, covering 
whether it should live in the DB proper vs. in the sidecar, pros and cons, 
pressures, etc. Not sure if things moved beyond that; I know there are at least 
a few implementations out there that haven't yet made their way back to the ASF 
project proper. Thread: 
https://lists.apache.org/thread/glvmkwknf91rxc5l6w4d4m1kcvlr6mrv. My hope is we 
can avoid the gridlock we hit for a long time with the sidecar, where there were 
multiple implementations with different tradeoffs and everyone was 
disincentivized from accepting a solution different from their own in-house one 
since it'd theoretically require re-tooling. Tough problem with no easy 
solutions, but I would love to see this become a first-class citizen in the 
ecosystem.

Paulo brought up a discussion about moving to disk_access_mode = 
mmap_index_only on 5.0. There seemed to be consensus there, but did we 
actually change that in the 5.0 branch? Thread: 
https://lists.apache.org/thread/nhp6vftc4kc3dxskngxy5rpo1lp19drw. I just pulled 
cassandra-5.0, and it looks like auto + hasLargeAddressSpace() still resolves to 
mmap rather than mmap_index_only.
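To make the difference concrete, here's a simplified, hypothetical model of how disk_access_mode might resolve into separate modes for data and index files. The class, the Resolved record and the preferIndexOnly flag are made up for illustration, and this isn't the actual 5.0 code; it just encodes the behavior described above (auto plus a large address space giving mmap today) next to the proposed mmap_index_only default.

```java
/**
 * Simplified, hypothetical model of resolving disk_access_mode into effective
 * modes for data files and index files. Not Cassandra's actual code.
 */
public final class DiskAccessModes {
    public enum Mode { auto, mmap, mmap_index_only, standard }

    /** Effective access modes for data and index files. */
    public record Resolved(Mode data, Mode index) {}

    /**
     * preferIndexOnly=false models what was observed on cassandra-5.0
     * (auto + large address space -> mmap); true models the proposed default.
     */
    public static Resolved resolve(Mode configured, boolean largeAddressSpace, boolean preferIndexOnly) {
        Mode mode = configured;
        if (mode == Mode.auto) {
            if (!largeAddressSpace) mode = Mode.standard;
            else mode = preferIndexOnly ? Mode.mmap_index_only : Mode.mmap;
        }
        return switch (mode) {
            case mmap -> new Resolved(Mode.mmap, Mode.mmap);                 // mmap data and index
            case mmap_index_only -> new Resolved(Mode.standard, Mode.mmap); // buffered data, mmap index
            case standard -> new Resolved(Mode.standard, Mode.standard);    // no mmap at all
            case auto -> throw new IllegalStateException("auto is resolved above");
        };
    }

    public static void main(String[] args) {
        System.out.println(resolve(Mode.auto, true, false)); // Resolved[data=mmap, index=mmap]
        System.out.println(resolve(Mode.auto, true, true));  // Resolved[data=standard, index=mmap]
    }
}
```

Until the default changes, setting disk_access_mode: mmap_index_only explicitly in cassandra.yaml should get you the index-only behavior regardless of what auto resolves to.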

David Capwell worked on adding retries to repair messages when they fail, to 
make the process more robust: 
https://lists.apache.org/thread/wxv6k6slljqcw73xcmpxj4kn5lz95jd1. Reception was 
positive enough that he went so far as to backport it and also work on some for 
IR (incremental repair). Looks like he could use a reviewer here: 
https://issues.apache.org/jira/browse/CASSANDRA-18962 - it's Patch Available.
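For reviewers less familiar with the pattern, here's a generic retry-with-exponential-backoff sketch of the kind of thing being added. It's illustrative only: RetryingSender, withRetries and the attempt/delay parameters are hypothetical names, and I'm not claiming this is how the patch in CASSANDRA-18962 is structured.

```java
import java.util.concurrent.Callable;

/**
 * Generic retry-with-exponential-backoff sketch for a message send that can fail
 * transiently. Illustrative only; not the repair-message retry patch.
 */
public final class RetryingSender {
    /**
     * Calls send up to maxAttempts times, sleeping baseDelayMillis * 2^(attempt-1)
     * between failures, and rethrows the last failure if every attempt fails.
     */
    public static <T> T withRetries(Callable<T> send, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return send.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Hypothetical send that fails twice (e.g. a timeout) and then succeeds.
        String reply = withRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("timeout");
            return "ack";
        }, 5, 10);
        System.out.println(reply + " after " + calls[0] + " attempts"); // ack after 3 attempts
    }
}
```

One design point worth keeping in mind when reviewing anything like this: retrying is only safe if the receiving side tolerates duplicate delivery of the same message, since a send that looked like it failed may actually have arrived.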

Mike Adamson reached out about adding / taking a dependency on jvector: 
https://lists.apache.org/thread/zkqg7mk9hp35zn0cf1tvywc2m3l63jrn. The general 
gist of it was "looks good, written by committer(s) / PMC members, permissively 
licensed. Go for it". Some discussion about copyright holders and whether that 
matters from an ASF perspective, and we've further had some good discussion 
about the application of generative AI tooling to not just code contributed to 
the ASF, but also in dependencies we bring into the