Re: Tokenization and SAI query syntax

2023-08-07 Thread Benedict
Yep, this sounds like the potentially least bad approach for now. Sorry
Caleb, I jumped in without properly reading the thread and assumed we were
proposing changes to CQL.

If it’s clear we’re dropping into a sub-language and providing a sub-query
to it that’s SAI-specific, that gives us pretty broad leeway IMO.

On 7 Aug 2023, at 22:27, Josh McKenzie wrote:
> Been chatting a bit w/Caleb about this offline and poking around to better
> educate myself.
>
>> using functions (ignoring the implementation complexity) at least removes
>> ambiguity.
>
> This, plus using functions lets us kick the can down the road a bit in
> terms of landing on an integrated grammar we agree on. It seems to me
> there's a tension between:
>  1. "SQL-like" (i.e. postgres-like)
>  2. "Indexing and Search domain-specific-like" (i.e. lucene syntax which,
>     as Benedict points out, doesn't really jell w/what we have in CQL at
>     this point), and
>  3. ??? Some other YOLO CQL / C* specific thing where we go our own road
>
> I don't think we're really going to know what our feature-set in terms of
> indexing is going to look like or the shape it's going to take for awhile,
> so backing ourselves into any of the 3 corners above right now feels very
> premature to me.
>
> So I'm coming around to the expr / method call approach to preserve that
> flexibility. It's maximally explicit and preserves optionality at the
> expense of being clunky. For now.
>
> On Mon, Aug 7, 2023, at 4:00 PM, Caleb Rackliffe wrote:
>>> I do not think we should start using lucene syntax for it, it will make
>>> people think they can do everything else lucene allows.
>>
>> I'm sure we won't be supporting everything Lucene allows, but this is
>> going to evolve. Right off the bat, if you introduce support for
>> tokenization and filtering, someone is, for example, going to ask for
>> phrase queries. ("John Smith landed in Virginia" is tokenized, but someone
>> wants to match exactly on "John Smith".) The whole point of the Vector
>> project is to do relevance, right? Are we going to do term boosting? Do we
>> need queries like "field: quick brown +fox -news" where fox must be
>> present, news cannot be present, and quick and brown increase relevance?
>>
>> SASI uses "=" and "LIKE" in a way that assumes the user understands the
>> tokenization scheme in use on the target field. I understand that's a bit
>> ambiguous.
>>
>> If we object to allowing expr embedding of a subset of the Lucene syntax,
>> I can't imagine we're okay w/ then jamming a subset of that syntax into
>> the main CQL grammar.
>>
>> If we want to do this in non-expr CQL space, I think using functions
>> (ignoring the implementation complexity) at least removes ambiguity.
>> "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be
>> pretty clear, although there may be other problems. For instance, what
>> happens when I try to use "token_match" on an indexed field whose analyzer
>> does not tokenize? We obviously can't use the index, so we'd be reduced to
>> requiring a filtering query, but maybe that's fine. My point is that, if
>> we're going to make write and read analyzers symmetrical, there's really
>> no way to make the semantics of our queries totally independent of
>> analysis. (ex. "field : foo bar" behaves differently w/ read tokenization
>> than it does without. It could even be an OR or AND query w/ tokenization,
>> depending on our defaults.)
>>
>> On Mon, Aug 7, 2023 at 12:55 PM Atri Sharma wrote:
>>> Why not start with SQLish operators supported by many databases (LIKE
>>> and CONTAINS)?
>>>
>>> On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan wrote:
>>>> I am also -1 on directly exposing lucene like syntax here. Besides
>>>> being ugly, SAI is not lucene, I do not think we should start using
>>>> lucene syntax for it, it will make people think they can do everything
>>>> else lucene allows.
>>>>
>>>> On Aug 7, 2023, at 5:13 AM, Benedict wrote:
>>>>> I’m strongly opposed to :
>>>>>
>>>>> It is very dissimilar to our current operators. CQL is already not the
>>>>> prettiest language, but let’s not make it a total mish mash.
>>>>>
>>>>> On 7 Aug 2023, at 10:59, Mike Adamson wrote:
>>>>>> I am also in agreement with 'column : token' in that 'I don't hate
>>>>>> it' but I'd like to offer an alternative to this in 'column HAS
>>>>>> token'. HAS is currently not a keyword that we use so wouldn't cause
>>>>>> any brain conflicts.
>>>>>>
>>>>>> While I don't hate ':' I have a particular dislike of the lucene
>>>>>> search syntax because of its terseness and lack of easy readability.
>>>>>>
>>>>>> Saying that, I'm happy to go with ':' if that is the decision.
>>>>>>
>>>>>> On Fri, 4 Aug 2023 at 00:23, Jon Haddad wrote:
>>>>>>> Assuming SAI is a superset of SASI, and we were to set up something
>>>>>>> so that SASI indexes auto convert to SAI, this gives even more
>>>>>>> weight to my point regarding how differing behavior for the same
>>>>>>> syntax can lead to issues.  Imo the best case scenario results in
>>>>>>> the user not even noticing their indexes have changed.
>>>>>>>
>>>>>>> A (maybe better?) alternative is to add a flag to the index
>>>>>>> configuration for "compatibility mode", which might address the
>>>>>>> concerns around using an equality operator when it

Re: Tokenization and SAI query syntax

2023-08-07 Thread Josh McKenzie
Been chatting a bit w/Caleb about this offline and poking around to better 
educate myself.

> using functions (ignoring the implementation complexity) at least removes 
> ambiguity. 
This, plus using functions lets us kick the can down the road a bit in terms of 
landing on an integrated grammar we agree on. It seems to me there's a tension 
between:
 1. "SQL-like" (i.e. postgres-like)
 2. "Indexing and Search domain-specific-like" (i.e. lucene syntax which, as 
Benedict points out, doesn't really jell w/what we have in CQL at this point), 
and
 3. ??? Some other YOLO CQL / C* specific thing where we go our own road
I don't think we're really going to know what our feature-set in terms of 
indexing is going to look like or the shape it's going to take for awhile, so 
backing ourselves into any of the 3 corners above right now feels very 
premature to me.

So I'm coming around to the expr / method call approach to preserve that 
flexibility. It's maximally explicit and preserves optionality at the expense 
of being clunky. For now.

On Mon, Aug 7, 2023, at 4:00 PM, Caleb Rackliffe wrote:
> > I do not think we should start using lucene syntax for it, it will make 
> > people think they can do everything else lucene allows.
> 
> I'm sure we won't be supporting everything Lucene allows, but this is going 
> to evolve. Right off the bat, if you introduce support for tokenization and 
> filtering, someone is, for example, going to ask for phrase queries. ("John 
> Smith landed in Virginia" is tokenized, but someone wants to match exactly on 
> "John Smith".) The whole point of the Vector project is to do relevance, 
> right? Are we going to do term boosting? Do we need queries like "field: 
> quick brown +fox -news" where fox must be present, news cannot be present, 
> and quick and brown increase relevance?
> 
> SASI uses "=" and "LIKE" in a way that assumes the user understands the 
> tokenization scheme in use on the target field. I understand that's a bit 
> ambiguous.
> 
> If we object to allowing expr embedding of a subset of the Lucene syntax, I 
> can't imagine we're okay w/ then jamming a subset of that syntax into the 
> main CQL grammar.
> 
> If we want to do this in non-expr CQL space, I think using functions 
> (ignoring the implementation complexity) at least removes ambiguity. 
> "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be 
> pretty clear, although there may be other problems. For instance, what 
> happens when I try to use "token_match" on an indexed field whose analyzer 
> does not tokenize? We obviously can't use the index, so we'd be reduced to 
> requiring a filtering query, but maybe that's fine. My point is that, if 
> we're going to make write and read analyzers symmetrical, there's really no 
> way to make the semantics of our queries totally independent of analysis. 
> (ex. "field : foo bar" behaves differently w/ read tokenization than it does 
> without. It could even be an OR or AND query w/ tokenization, depending on 
> our defaults.)
> 
> On Mon, Aug 7, 2023 at 12:55 PM Atri Sharma  wrote:
>> Why not start with SQLish operators supported by many databases (LIKE and 
>> CONTAINS)?
>> 
>> On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan  
>> wrote:
>>> 
>>> I am also -1 on directly exposing lucene like syntax here. Besides being 
>>> ugly, SAI is not lucene, I do not think we should start using lucene syntax 
>>> for it, it will make people think they can do everything else lucene allows.
>>> 
>>>> On Aug 7, 2023, at 5:13 AM, Benedict wrote:
>>>>
>>>> I’m strongly opposed to :
>>>>
>>>> It is very dissimilar to our current operators. CQL is already not the
>>>> prettiest language, but let’s not make it a total mish mash.
>>>>
>>>>> On 7 Aug 2023, at 10:59, Mike Adamson wrote:
>>>>>
>>>>> I am also in agreement with 'column : token' in that 'I don't hate it'
>>>>> but I'd like to offer an alternative to this in 'column HAS token'. HAS
>>>>> is currently not a keyword that we use so wouldn't cause any brain
>>>>> conflicts.
>>>>>
>>>>> While I don't hate ':' I have a particular dislike of the lucene search
>>>>> syntax because of its terseness and lack of easy readability.
>>>>>
>>>>> Saying that, I'm happy to go with ':' if that is the decision.
>>>>>
>>>>> On Fri, 4 Aug 2023 at 00:23, Jon Haddad wrote:
>>>>>> Assuming SAI is a superset of SASI, and we were to set up something so
>>>>>> that SASI indexes auto convert to SAI, this gives even more weight to my
>>>>>> point regarding how differing behavior for the same syntax can lead to
>>>>>> issues.  Imo the best case scenario results in the user not even
>>>>>> noticing their indexes have changed.
>>>>>>
>>>>>> A (maybe better?) alternative is to add a flag to the index
>>>>>> configuration for "compatibility mode", which might address the concerns
>>>>>> around using an equality operator when it actually is a partial match.
>>>>>>
>>>>>> For what

Re: Tokenization and SAI query syntax

2023-08-07 Thread Caleb Rackliffe
> I do not think we should start using lucene syntax for it, it will make
people think they can do everything else lucene allows.

I'm sure we won't be supporting everything Lucene allows, but this is going
to evolve. Right off the bat, if you introduce support for tokenization and
filtering, someone is, for example, going to ask for phrase queries. ("John
Smith landed in Virginia" is tokenized, but someone wants to match exactly
on "John Smith".) The whole point of the Vector project is to do relevance,
right? Are we going to do term boosting? Do we need queries like "field:
quick brown +fox -news" where fox must be present, news cannot be present,
and quick and brown increase relevance?
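
To make the distinction above concrete, here is a rough Python sketch of
token matching, phrase matching, and the must / must-not / should semantics
of a query like "quick brown +fox -news". This is an illustration only, not
SAI or Lucene code: the analyzer (lowercase, split on non-alphanumerics) and
the helper names tokenize, token_match, phrase_match, and score are
assumptions made up for this example, and the "relevance" is just a count of
matching optional terms.

```python
import re


def tokenize(text):
    """Stand-in analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def token_match(text, term):
    """True if the single term is one of the tokens produced from text."""
    return term.lower() in tokenize(text)


def phrase_match(text, phrase):
    """True if the tokens of phrase occur consecutively and in order in text."""
    tokens, wanted = tokenize(text), tokenize(phrase)
    if not wanted:
        return False
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))


def score(text, should=(), must=(), must_not=()):
    """Return None if the row is excluded, else the number of 'should' hits.

    Terms are expected to be lowercase already. Counting hits is a toy
    stand-in for relevance, not how a real search engine scores.
    """
    tokens = set(tokenize(text))
    if any(t not in tokens for t in must):
        return None  # a required term is missing
    if any(t in tokens for t in must_not):
        return None  # a prohibited term is present
    return sum(1 for t in should if t in tokens)


doc = "John Smith landed in Virginia"
print(token_match(doc, "smith"))        # True
print(phrase_match(doc, "John Smith"))  # True
print(phrase_match(doc, "Smith John"))  # False: tokens present, order wrong

# "quick brown +fox -news"
query = dict(should=("quick", "brown"), must=("fox",), must_not=("news",))
print(score("The quick brown fox jumps", **query))  # 2
print(score("quick fox in the news", **query))      # None
```

The point of the sketch is that token matching and phrase matching give
different answers on the same tokenized value, so whatever syntax we pick
has to let the user say which one they mean.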

SASI uses "=" and "LIKE" in a way that assumes the user understands the
tokenization scheme in use on the target field. I understand that's a bit
ambiguous.

If we object to allowing expr embedding of a subset of the Lucene syntax, I
can't imagine we're okay w/ then jamming a subset of that syntax into the
main CQL grammar.

If we want to do this in non-expr CQL space, I think using functions
(ignoring the implementation complexity) at least removes ambiguity.
"token_match", "phrase_match", "token_like", "=", and "LIKE" would all be
pretty clear, although there may be other problems. For instance, what
happens when I try to use "token_match" on an indexed field whose analyzer
does not tokenize? We obviously can't use the index, so we'd be reduced to
requiring a filtering query, but maybe that's fine. My point is that, if
we're going to make write and read analyzers symmetrical, there's really no
way to make the semantics of our queries totally independent of analysis.
(ex. "field : foo bar" behaves differently w/ read tokenization than it
does without. It could even be an OR or AND query w/ tokenization,
depending on our defaults.)
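
As a rough illustration of that last point, the Python sketch below shows
how the same query string "foo bar" gives different answers depending on
whether the query side is tokenized and, if it is, whether the terms are
combined with AND or OR. Everything here is hypothetical: the tokenize
analyzer, the matches helper, and its analyze_query and operator parameters
are invented for the example and are not part of SAI, SASI, or CQL.

```python
import re


def tokenize(text):
    """Stand-in analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches(value, query, analyze_query=True, operator="AND"):
    """Check a query against a value that was tokenized at write time.

    analyze_query=False models a read path that does not tokenize, so the
    whole query string is looked up as one term.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    indexed = set(tokenize(value))
    if not analyze_query:
        return query.lower() in indexed
    terms = tokenize(query)
    if not terms:
        return False
    combine = all if operator == "AND" else any
    return combine(t in indexed for t in terms)


value = "foo baz"
print(matches(value, "foo bar", operator="AND"))        # False
print(matches(value, "foo bar", operator="OR"))         # True
print(matches(value, "foo bar", analyze_query=False))   # False
print(matches("foo bar qux", "foo bar"))                # True
print(matches("foo bar qux", "foo bar", analyze_query=False))  # False
```

Note that with analyze_query=False the multi-word query can never match a
tokenized value in this sketch, which is one way the query semantics end up
depending on the analysis configuration.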

On Mon, Aug 7, 2023 at 12:55 PM Atri Sharma  wrote:

> Why not start with SQLish operators supported by many databases (LIKE and
> CONTAINS)?
>
> On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan 
> wrote:
>
>> I am also -1 on directly exposing lucene like syntax here. Besides being
>> ugly, SAI is not lucene, I do not think we should start using lucene syntax
>> for it, it will make people think they can do everything else lucene allows.
>>
>> On Aug 7, 2023, at 5:13 AM, Benedict  wrote:
>>
>> 
>> I’m strongly opposed to :
>>
>> It is very dissimilar to our current operators. CQL is already not the
>> prettiest language, but let’s not make it a total mish mash.
>>
>>
>>
>> On 7 Aug 2023, at 10:59, Mike Adamson  wrote:
>>
>> 
>> I am also in agreement with 'column : token' in that 'I don't hate it'
>> but I'd like to offer an alternative to this in 'column HAS token'. HAS is
>> currently not a keyword that we use so wouldn't cause any brain conflicts.
>>
>> While I don't hate ':' I have a particular dislike of the lucene search
>> syntax because of its terseness and lack of easy readability.
>>
>> Saying that, I'm happy to go with ':' if that is the decision.
>>
>> On Fri, 4 Aug 2023 at 00:23, Jon Haddad 
>> wrote:
>>
>>> Assuming SAI is a superset of SASI, and we were to set up something so
>>> that SASI indexes auto convert to SAI, this gives even more weight to my
>>> point regarding how differing behavior for the same syntax can lead to
>>> issues.  Imo the best case scenario results in the user not even noticing
>>> their indexes have changed.
>>>
>>> A (maybe better?) alternative is to add a flag to the index
>>> configuration for "compatibility mode", which might address the concerns
>>> around using an equality operator when it actually is a partial match.
>>>
>>> For what it's worth, I'm in agreement that = should mean full equality
>>> and not token match.
>>>
>>> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
>>> > For what it's worth, I'd very much like to completely remove SASI from
>>> the
>>> > codebase for 6.0. The only remaining functionality gaps at the moment
>>> are
>>> > LIKE (prefix/suffix) queries and its limited tokenization
>>> > capabilities, both of which already have SAI Phase 2 Jiras.
>>> >
>>> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
>>> > wrote:
>>> >
>>> > > SASI just uses “=“ for the tokenized equality matching, which is the
>>> exact
>>> > > thing this discussion is about changing/not liking.
>>> > >
>>> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan <
>>> jeremiah.jor...@gmail.com>
>>> > > wrote:
>>> > > >
>>> > > > I do not think LIKE actually applies here. LIKE is used for
>>> prefix,
>>> > > contains, or suffix searches in SASI depending on the index type.
>>> > > >
>>> > > > This is about exact matching of tokens.
>>> > > >
>>> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad <
>>> rustyrazorbl...@apache.org>
>>> > > wrote:
>>> > > >>
>>> > > >> Certain bits of functionality also already exist on the SASI
>>> side of
>>> > > things, but I'm not sure how much overlap there is.  Currently,
>>> there's a
>>> > > LIKE keyword that handles token 

Re: Tokenization and SAI query syntax

2023-08-07 Thread Atri Sharma
Why not start with SQLish operators supported by many databases (LIKE and
CONTAINS)?

On Mon, Aug 7, 2023 at 10:01 PM J. D. Jordan 
wrote:

> I am also -1 on directly exposing lucene like syntax here. Besides being
> ugly, SAI is not lucene, I do not think we should start using lucene syntax
> for it, it will make people think they can do everything else lucene allows.
>
> On Aug 7, 2023, at 5:13 AM, Benedict  wrote:
>
> 
> I’m strongly opposed to :
>
> It is very dissimilar to our current operators. CQL is already not the
> prettiest language, but let’s not make it a total mish mash.
>
>
>
> On 7 Aug 2023, at 10:59, Mike Adamson  wrote:
>
> 
> I am also in agreement with 'column : token' in that 'I don't hate it' but
> I'd like to offer an alternative to this in 'column HAS token'. HAS is
> currently not a keyword that we use so wouldn't cause any brain conflicts.
>
> While I don't hate ':' I have a particular dislike of the lucene search
> syntax because of its terseness and lack of easy readability.
>
> Saying that, I'm happy to go with ':' if that is the decision.
>
> On Fri, 4 Aug 2023 at 00:23, Jon Haddad 
> wrote:
>
>> Assuming SAI is a superset of SASI, and we were to set up something so
>> that SASI indexes auto convert to SAI, this gives even more weight to my
>> point regarding how differing behavior for the same syntax can lead to
>> issues.  Imo the best case scenario results in the user not even noticing
>> their indexes have changed.
>>
>> A (maybe better?) alternative is to add a flag to the index
>> configuration for "compatibility mode", which might address the concerns
>> around using an equality operator when it actually is a partial match.
>>
>> For what it's worth, I'm in agreement that = should mean full equality
>> and not token match.
>>
>> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
>> > For what it's worth, I'd very much like to completely remove SASI from
>> the
>> > codebase for 6.0. The only remaining functionality gaps at the moment
>> are
>> > LIKE (prefix/suffix) queries and its limited tokenization
>> > capabilities, both of which already have SAI Phase 2 Jiras.
>> >
>> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
>> > wrote:
>> >
>> > > SASI just uses “=“ for the tokenized equality matching, which is the
>> exact
>> > > thing this discussion is about changing/not liking.
>> > >
>> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan > >
>> > > wrote:
>> > > >
>> > > > I do not think LIKE actually applies here. LIKE is used for prefix,
>> > > contains, or suffix searches in SASI depending on the index type.
>> > > >
>> > > > This is about exact matching of tokens.
>> > > >
>> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad > >
>> > > wrote:
>> > > >>
>> > > >> Certain bits of functionality also already exist on the SASI side
>> of
>> > > things, but I'm not sure how much overlap there is.  Currently,
>> there's a
>> > > LIKE keyword that handles token matching, although it seems to have
>> some
>> > > differences from the feature set in SAI.
>> > > >>
>> > > >> That said, there seems to be enough of an overlap that it would
>> make
>> > > sense to consider using LIKE in the same manner, doesn't it?  I think
>> it
>> > > would be a little odd if we have different syntax for different
>> indexes.
>> > > >>
>> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
>> > > >>
>> > > >> I think one complication here is that there seems to be a desire,
>> that
>> > > I very much agree with, to expose as much of the underlying
>> flexibility of
>> > > Lucene as much as possible.  If it means we use Caleb's suggestion,
>> I'd ask
>> > > that the queries that SASI and SAI both support use the same syntax,
>> even
>> > > if it means there's two ways of writing the same query.  To use
>> Caleb's
>> > > example, this would mean supporting both LIKE and the `expr` column.
>> > > >>
>> > > >> Jon
>> > > >>
>> > >  On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
>> > > >>> Here are some additional bits of prior art, if anyone finds them
>> > > useful:
>> > > >>>
>> > > >>>
>> > > >>> The Stratio Lucene Index -
>> > > >>> https://github.com/Stratio/cassandra-lucene-index#examples
>> > > >>>
>> > > >>> Stratio was the reason C* added the "expr" functionality. They
>> embedded
>> > > >>> something similar to ElasticSearch JSON, which probably isn't my
>> > > favorite
>> > > >>> choice, but it's there.
>> > > >>>
>> > > >>>
>> > > >>> The ElasticSearch match query syntax -
>> > > >>>
>> > >
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
>> > > >>>
>> > > >>> Again, not my favorite. It's verbose, and probably too powerful
>> for us.
>> > > >>>
>> > > >>>
>> > > >>> ElasticSearch's documentation for the basic Lucene query syntax -
>> > > >>>
>> > >
>> 

Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-08-07 Thread Josh McKenzie
Merge path for bugs on 3.0 is pretty brutal at this point. Good thing 2 will 
drop off when we GA 5.0.

Updated wiki w/new branches plus some examples: link 


On Mon, Aug 7, 2023, at 11:18 AM, Mick Semb Wever wrote:
> 
> Forward merging cassandra-4.1 … cassandra-5.0 … trunk is now required ! 
> 
> trunk still has 5.0 in the build.xml, but that's only temporary until 
> 18705 lands, and of no harm i believe… (i could easily be wrong, but not 
> AFAIK)
> 
> 
> On Mon, 7 Aug 2023 at 13:38, Brandon Williams  wrote:
>> Is this intended to be used now and change the merge order?  I ask
>> because 18705 mentions bumping build.xml and CHANGES.txt amongst
>> others that haven't been done which is leading to confusion.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Sat, Aug 5, 2023 at 4:46 PM Mick Semb Wever  wrote:
>> >
>> >
>> > With no objections, and everything folk mentioned above in, the 
>> > cassandra-5.0 branch is cut.
>> >
>> > Next steps are bumping trunk to 5.1 and then cutting a 5.0-alpha1
>> >
>> > The bumping to 5.1 has a few steps involved in it, but the initial in-tree 
>> > PRs are ready for review, with CI being run, see CASSANDRA-18705
>> >
>> >
>> >
>> > On Sat, 29 Jul 2023 at 00:00, Brandon Williams  wrote:
>> >>
>> >> +1 to everything stated here.
>> >>
>> >> Kind Regards,
>> >> Brandon
>> >>
>> >> On Wed, Jul 26, 2023 at 5:28 PM Mick Semb Wever  wrote:
>> >> >
>> >> >
>> >> > The previous thread¹ on when to freeze 5.0 landed on freezing the first 
>> >> > week of August, with a waiver in place for TCM and Accord to land later 
>> >> > (but before October).
>> >> >
>> >> > With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 
>> >> > work that hasn't landed is Vector search (CEP-30).
>> >> >
>> >> > Are there any objections to a waiver on Vector search?  All the 
>> >> > groundwork: SAI and the vector type; has been merged, with all 
>> >> > remaining work expected to land in August.
>> >> >
>> >> > I'm keen to freeze and see us shift gears – there's already SO MUCH in 
>> >> > 5.0 and a long list of flakies.  It takes time and patience to triage 
>> >> > and identify the bugs that hit us before GA.  The freeze is about being 
>> >> > "mostly feature complete",  so we have room for things before our first 
>> >> > beta (precedence is to ask).   If we hope for a GA by December, account 
>> >> > for the 6 weeks turnaround time for cutting and voting on one alpha, 
>> >> > one beta, and one rc release, and the quiet period that August is, we 
>> >> > really only have September and October left.
>> >> >
>> >> > I already feel this is asking a bit of a miracle from us given how 4.1 
>> >> > went (and I'm hoping I will be proven wrong).
>> >> >
>> >> > In addition, are there any objections to cutting a 5.0-alpha1 release 
>> >> > as soon as we freeze?
>> >> >
>> >> > This is on the understanding vector, tcm and accord will become 
>> >> > available in later alphas.  Originally the discussion¹ was waiting for 
>> >> > Accord for alpha1, but a number of folk off-list have requested earlier 
>> >> > alphas to help with testing.
>> >> >
>> >> >
>> >> > ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: Tokenization and SAI query syntax

2023-08-07 Thread J. D. Jordan
I am also -1 on directly exposing lucene like syntax here. Besides being ugly, SAI is not lucene, I do not think we should start using lucene syntax for it, it will make people think they can do everything else lucene allows.

On Aug 7, 2023, at 5:13 AM, Benedict wrote:

I’m strongly opposed to :

It is very dissimilar to our current operators. CQL is already not the prettiest language, but let’s not make it a total mish mash.

On 7 Aug 2023, at 10:59, Mike Adamson wrote:

I am also in agreement with 'column : token' in that 'I don't hate it' but I'd like to offer an alternative to this in 'column HAS token'. HAS is currently not a keyword that we use so wouldn't cause any brain conflicts.

While I don't hate ':' I have a particular dislike of the lucene search syntax because of its terseness and lack of easy readability.

Saying that, I'm happy to go with ':' if that is the decision.

On Fri, 4 Aug 2023 at 00:23, Jon Haddad wrote:

Assuming SAI is a superset of SASI, and we were to set up something so that SASI indexes auto convert to SAI, this gives even more weight to my point regarding how differing behavior for the same syntax can lead to issues.  Imo the best case scenario results in the user not even noticing their indexes have changed.

A (maybe better?) alternative is to add a flag to the index configuration for "compatibility mode", which might address the concerns around using an equality operator when it actually is a partial match.

For what it's worth, I'm in agreement that = should mean full equality and not token match.

On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
> For what it's worth, I'd very much like to completely remove SASI from the
> codebase for 6.0. The only remaining functionality gaps at the moment are
> LIKE (prefix/suffix) queries and its limited tokenization
> capabilities, both of which already have SAI Phase 2 Jiras.
> 
> On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
> wrote:
> 
> > SASI just uses “=“ for the tokenized equality matching, which is the exact
> > thing this discussion is about changing/not liking.
> >
> > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
> > wrote:
> > >
> > > I do not think LIKE actually applies here. LIKE is used for prefix,
> > contains, or suffix searches in SASI depending on the index type.
> > >
> > > This is about exact matching of tokens.
> > >
> > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
> > wrote:
> > >>
> > >> Certain bits of functionality also already exist on the SASI side of
> > things, but I'm not sure how much overlap there is.  Currently, there's a
> > LIKE keyword that handles token matching, although it seems to have some
> > differences from the feature set in SAI.
> > >>
> > >> That said, there seems to be enough of an overlap that it would make
> > sense to consider using LIKE in the same manner, doesn't it?  I think it
> > would be a little odd if we have different syntax for different indexes.
> > >>
> > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
> > >>
> > >> I think one complication here is that there seems to be a desire, that
> > I very much agree with, to expose as much of the underlying flexibility of
> > Lucene as much as possible.  If it means we use Caleb's suggestion, I'd ask
> > that the queries that SASI and SAI both support use the same syntax, even
> > if it means there's two ways of writing the same query.  To use Caleb's
> > example, this would mean supporting both LIKE and the `expr` column.
> > >>
> > >> Jon
> > >>
> >  On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> > >>> Here are some additional bits of prior art, if anyone finds them
> > useful:
> > >>>
> > >>>
> > >>> The Stratio Lucene Index -
> > >>> https://github.com/Stratio/cassandra-lucene-index#examples
> > >>>
> > >>> Stratio was the reason C* added the "expr" functionality. They embedded
> > >>> something similar to ElasticSearch JSON, which probably isn't my
> > favorite
> > >>> choice, but it's there.
> > >>>
> > >>>
> > >>> The ElasticSearch match query syntax -
> > >>>
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > >>>
> > >>> Again, not my favorite. It's verbose, and probably too powerful for us.
> > >>>
> > >>>
> > >>> ElasticSearch's documentation for the basic Lucene query syntax -
> > >>>
> > https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> > >>>
> > >>> One idea is to take the basic Lucene index, which it seems we already
> > have
> > >>> some support for, and feed it to "expr". This is nice for two reasons:
> > >>>
> > 

Re: Tokenization and SAI query syntax

2023-08-07 Thread Caleb Rackliffe
@Benedict I'm not particularly keen to try to graft the Lucene syntax into
CQL itself, to be clear. What I'm proposing is more along the lines of
allowing that syntax via "expr" and leaving what Lucene systems would call
"filters" in predicates currently expressible by CQL.

On Mon, Aug 7, 2023 at 5:12 AM Benedict  wrote:

> I’m strongly opposed to :
>
> It is very dissimilar to our current operators. CQL is already not the
> prettiest language, but let’s not make it a total mish mash.
>
>
>
> On 7 Aug 2023, at 10:59, Mike Adamson  wrote:
>
> 
> I am also in agreement with 'column : token' in that 'I don't hate it' but
> I'd like to offer an alternative to this in 'column HAS token'. HAS is
> currently not a keyword that we use so wouldn't cause any brain conflicts.
>
> While I don't hate ':' I have a particular dislike of the lucene search
> syntax because of its terseness and lack of easy readability.
>
> Saying that, I'm happy to go with ':' if that is the decision.
>
> On Fri, 4 Aug 2023 at 00:23, Jon Haddad 
> wrote:
>
>> Assuming SAI is a superset of SASI, and we were to set up something so
>> that SASI indexes auto convert to SAI, this gives even more weight to my
>> point regarding how differing behavior for the same syntax can lead to
>> issues.  Imo the best case scenario results in the user not even noticing
>> their indexes have changed.
>>
>> A (maybe better?) alternative is to add a flag to the index
>> configuration for "compatibility mode", which might address the concerns
>> around using an equality operator when it actually is a partial match.
>>
>> For what it's worth, I'm in agreement that = should mean full equality
>> and not token match.
>>
>> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
>> > For what it's worth, I'd very much like to completely remove SASI from
>> the
>> > codebase for 6.0. The only remaining functionality gaps at the moment
>> are
>> > LIKE (prefix/suffix) queries and its limited tokenization
>> > capabilities, both of which already have SAI Phase 2 Jiras.
>> >
>> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
>> > wrote:
>> >
>> > > SASI just uses “=“ for the tokenized equality matching, which is the
>> exact
>> > > thing this discussion is about changing/not liking.
>> > >
>> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan > >
>> > > wrote:
>> > > >
>> > > > I do not think LIKE actually applies here. LIKE is used for prefix,
>> > > contains, or suffix searches in SASI depending on the index type.
>> > > >
>> > > > This is about exact matching of tokens.
>> > > >
>> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad > >
>> > > wrote:
>> > > >>
>> > > >> Certain bits of functionality also already exist on the SASI side
>> of
>> > > things, but I'm not sure how much overlap there is.  Currently,
>> there's a
>> > > LIKE keyword that handles token matching, although it seems to have
>> some
>> > > differences from the feature set in SAI.
>> > > >>
>> > > >> That said, there seems to be enough of an overlap that it would
>> make
>> > > sense to consider using LIKE in the same manner, doesn't it?  I think
>> it
>> > > would be a little odd if we have different syntax for different
>> indexes.
>> > > >>
>> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
>> > > >>
>> > > >> I think one complication here is that there seems to be a desire,
>> that
>> > > I very much agree with, to expose as much of the underlying
>> flexibility of
>> > > Lucene as much as possible.  If it means we use Caleb's suggestion,
>> I'd ask
>> > > that the queries that SASI and SAI both support use the same syntax,
>> even
>> > > if it means there's two ways of writing the same query.  To use
>> Caleb's
>> > > example, this would mean supporting both LIKE and the `expr` column.
>> > > >>
>> > > >> Jon
>> > > >>
>> > >  On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
>> > > >>> Here are some additional bits of prior art, if anyone finds them
>> > > useful:
>> > > >>>
>> > > >>>
>> > > >>> The Stratio Lucene Index -
>> > > >>> https://github.com/Stratio/cassandra-lucene-index#examples
>> > > >>>
>> > > >>> Stratio was the reason C* added the "expr" functionality. They
>> embedded
>> > > >>> something similar to ElasticSearch JSON, which probably isn't my
>> > > favorite
>> > > >>> choice, but it's there.
>> > > >>>
>> > > >>>
>> > > >>> The ElasticSearch match query syntax -
>> > > >>>
>> > >
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
>> > > >>>
>> > > >>> Again, not my favorite. It's verbose, and probably too powerful
>> for us.
>> > > >>>
>> > > >>>
>> > > >>> ElasticSearch's documentation for the basic Lucene query syntax -
>> > > >>>
>> > >
>> 

Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-08-07 Thread Mick Semb Wever
Forward merging cassandra-4.1 … cassandra-5.0 … trunk is now required!

trunk has still got 5.0 in the build.xml, but that's only temporary until
18705 lands, and of no harm I believe… (I could easily be wrong, but not
AFAIK)


On Mon, 7 Aug 2023 at 13:38, Brandon Williams  wrote:

> Is this intended to be used now and change the merge order?  I ask
> because 18705 mentions bumping build.xml and CHANGES.txt amongst
> others that haven't been done which is leading to confusion.
>
> Kind Regards,
> Brandon
>
> On Sat, Aug 5, 2023 at 4:46 PM Mick Semb Wever  wrote:
> >
> >
> > With no objections, and everything folk mentioned above in, the
> cassandra-5.0 branch is cut.
> >
> > Next steps are bumping trunk to 5.1 and then cutting a 5.0-alpha1
> >
> > The bumping to 5.1 has a few steps involved in it, but the initial
> in-tree PRs are ready for review, with CI being run, see CASSANDRA-18705
> >
> >
> >
> > On Sat, 29 Jul 2023 at 00:00, Brandon Williams  wrote:
> >>
> >> +1 to everything stated here.
> >>
> >> Kind Regards,
> >> Brandon
> >>
> >> On Wed, Jul 26, 2023 at 5:28 PM Mick Semb Wever  wrote:
> >> >
> >> >
> >> > The previous thread¹ on when to freeze 5.0 landed on freezing the
> first week of August, with a waiver in place for TCM and Accord to land
> later (but before October).
> >> >
> >> > With JDK8 now dropped and SAI and UCS merged, the only expected 5.0
> work that hasn't landed is Vector search (CEP-30).
> >> >
> >> > Are there any objections to a waiver on Vector search?  All the
> groundwork: SAI and the vector type; has been merged, with all remaining
> work expected to land in August.
> >> >
> >> > I'm keen to freeze and see us shift gears – there's already SO MUCH
> in 5.0 and a long list of flakies.  It takes time and patience to triage
> and identify the bugs that hit us before GA.  The freeze is about being
> "mostly feature complete",  so we have room for things before our first
> beta (precedent is to ask).   If we hope for a GA by December, account for
> the 6 weeks turnaround time for cutting and voting on one alpha, one beta,
> and one rc release, and the quiet period that August is, we really only
> have September and October left.
> >> >
> >> > I already feel this is asking a bit of a miracle from us given how
> 4.1 went (and I'm hoping I will be proven wrong).
> >> >
> >> > In addition, are there any objections to cutting a 5.0-alpha1
> release as soon as we freeze?
> >> >
> >> > This is on the understanding vector, tcm and accord will become
> available in later alphas.  Originally the discussion¹ was waiting for
> Accord for alpha1, but a number of folk off-list have requested earlier
> alphas to help with testing.
> >> >
> >> >
> >> > ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-08-07 Thread Brandon Williams
Is this intended to be used now and change the merge order?  I ask
because 18705 mentions bumping build.xml and CHANGES.txt amongst
others that haven't been done which is leading to confusion.

Kind Regards,
Brandon

On Sat, Aug 5, 2023 at 4:46 PM Mick Semb Wever  wrote:
>
>
> With no objections, and everything folk mentioned above in, the cassandra-5.0 
> branch is cut.
>
> Next steps are bumping trunk to 5.1 and then cutting a 5.0-alpha1
>
> The bumping to 5.1 has a few steps involved in it, but the initial in-tree 
> PRs are ready for review, with CI being run, see CASSANDRA-18705
>
>
>
> On Sat, 29 Jul 2023 at 00:00, Brandon Williams  wrote:
>>
>> +1 to everything stated here.
>>
>> Kind Regards,
>> Brandon
>>
>> On Wed, Jul 26, 2023 at 5:28 PM Mick Semb Wever  wrote:
>> >
>> >
>> > The previous thread¹ on when to freeze 5.0 landed on freezing the first 
>> > week of August, with a waiver in place for TCM and Accord to land later 
>> > (but before October).
>> >
>> > With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work 
>> > that hasn't landed is Vector search (CEP-30).
>> >
>> > Are there any objections to a waiver on Vector search?  All the 
>> > groundwork: SAI and the vector type; has been merged, with all remaining 
>> > work expected to land in August.
>> >
>> > I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0 
>> > and a long list of flakies.  It takes time and patience to triage and 
>> > identify the bugs that hit us before GA.  The freeze is about being 
>> > "mostly feature complete",  so we have room for things before our first 
>> > beta (precedent is to ask).   If we hope for a GA by December, account 
>> > for the 6 weeks turnaround time for cutting and voting on one alpha, one 
>> > beta, and one rc release, and the quiet period that August is, we really 
>> > only have September and October left.
>> >
>> > I already feel this is asking a bit of a miracle from us given how 4.1 
>> > went (and I'm hoping I will be proven wrong).
>> >
>> > In addition, are there any objections to cutting a 5.0-alpha1 release as 
>> > soon as we freeze?
>> >
>> > This is on the understanding vector, tcm and accord will become available 
>> > in later alphas.  Originally the discussion¹ was waiting for Accord for 
>> > alpha1, but a number of folk off-list have requested earlier alphas to 
>> > help with testing.
>> >
>> >
>> > ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3


Re: Tokenization and SAI query syntax

2023-08-07 Thread Benedict
I’m strongly opposed to ':'. It is very dissimilar to our current operators. CQL is already not the prettiest language, but let’s not make it a total mishmash.

On 7 Aug 2023, at 10:59, Mike Adamson wrote:

I am also in agreement with 'column : token' in that 'I don't hate it' but I'd like to offer an alternative to this in 'column HAS token'. HAS is currently not a keyword that we use so wouldn't cause any brain conflicts.

While I don't hate ':' I have a particular dislike of the lucene search syntax because of its terseness and lack of easy readability.

Saying that, I'm happy to go with ':' if that is the decision.

On Fri, 4 Aug 2023 at 00:23, Jon Haddad wrote:

Assuming SAI is a superset of SASI, and we were to set up something so that SASI indexes auto convert to SAI, this gives even more weight to my point regarding how differing behavior for the same syntax can lead to issues.  Imo the best case scenario results in the user not even noticing their indexes have changed.

A (maybe better?) alternative is to add a flag to the index configuration for "compatibility mode", which might address the concerns around using an equality operator when it actually is a partial match.

For what it's worth, I'm in agreement that = should mean full equality and not token match.

On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
> For what it's worth, I'd very much like to completely remove SASI from the
> codebase for 6.0. The only remaining functionality gaps at the moment are
> LIKE (prefix/suffix) queries and its limited tokenization
> capabilities, both of which already have SAI Phase 2 Jiras.
> 
> On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
> wrote:
> 
> > SASI just uses “=“ for the tokenized equality matching, which is the exact
> > thing this discussion is about changing/not liking.
> >
> > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
> > wrote:
> > >
> > > I do not think LIKE actually applies here. LIKE is used for prefix,
> > contains, or suffix searches in SASI depending on the index type.
> > >
> > > This is about exact matching of tokens.
> > >
> > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
> > wrote:
> > >>
> > >> Certain bits of functionality also already exist on the SASI side of
> > things, but I'm not sure how much overlap there is.  Currently, there's a
> > LIKE keyword that handles token matching, although it seems to have some
> > differences from the feature set in SAI.
> > >>
> > >> That said, there seems to be enough of an overlap that it would make
> > sense to consider using LIKE in the same manner, doesn't it?  I think it
> > would be a little odd if we have different syntax for different indexes.
> > >>
> > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
> > >>
> > >> I think one complication here is that there seems to be a desire, that
> > I very much agree with, to expose as much of the underlying flexibility of
> > Lucene as much as possible.  If it means we use Caleb's suggestion, I'd ask
> > that the queries that SASI and SAI both support use the same syntax, even
> > if it means there's two ways of writing the same query.  To use Caleb's
> > example, this would mean supporting both LIKE and the `expr` column.
> > >>
> > >> Jon
> > >>
> >  On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> > >>> Here are some additional bits of prior art, if anyone finds them
> > useful:
> > >>>
> > >>>
> > >>> The Stratio Lucene Index -
> > >>> https://github.com/Stratio/cassandra-lucene-index#examples
> > >>>
> > >>> Stratio was the reason C* added the "expr" functionality. They embedded
> > >>> something similar to ElasticSearch JSON, which probably isn't my
> > favorite
> > >>> choice, but it's there.
> > >>>
> > >>>
> > >>> The ElasticSearch match query syntax -
> > >>>
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > >>>
> > >>> Again, not my favorite. It's verbose, and probably too powerful for us.
> > >>>
> > >>>
> > >>> ElasticSearch's documentation for the basic Lucene query syntax -
> > >>>
> > https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> > >>>
> > >>> One idea is to take the basic Lucene index, which it seems we already
> > have
> > >>> some support for, and feed it to "expr". This is nice for two reasons:
> > >>>
> > >>> 1.) People can just write Lucene queries if they already know how.
> > >>> 2.) No changes to the grammar.
> > >>>
> > >>> Lucene has distinct concepts of filtering and querying, and this is
> > kind of
> > >>> the latter. I'm not sure how, for example, we would want 

Re: Tokenization and SAI query syntax

2023-08-07 Thread Mike Adamson
I am also in agreement with 'column : token' in that 'I don't hate it' but
I'd like to offer an alternative to this in 'column HAS token'. HAS is
currently not a keyword that we use so wouldn't cause any brain conflicts.

While I don't hate ':' I have a particular dislike of the lucene search
syntax because of its terseness and lack of easy readability.

Saying that, I'm happy to go with ':' if that is the decision.
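
To make the distinction under discussion concrete, here is a minimal sketch of full equality versus a single-token match, whichever spelling ('=', ':' or HAS) ends up being chosen. The analyzer (lowercase, split on whitespace) and the function names are assumptions made for illustration; they are not SAI's or SASI's actual implementation.

```python
def tokenize(text):
    """Assumed analyzer for this sketch: lowercase, then split on whitespace."""
    return text.lower().split()


def full_equality(column_value, term):
    """What '=' would mean if it is reserved for exact, whole-value equality."""
    return column_value == term


def token_match(column_value, term):
    """What 'column : term' or 'column HAS term' would mean in this sketch:
    the term equals one of the tokens produced by the analyzer."""
    return term.lower() in tokenize(column_value)


value = "John Smith landed in Virginia"

print(full_equality(value, "Virginia"))   # whole value differs from the term
print(token_match(value, "Virginia"))     # 'virginia' is one of the tokens
print(token_match(value, "John Smith"))   # a two-word phrase is not a single token
```

Under these assumptions the same term can fail equality and still match as a token, which is why overloading '=' for both is ambiguous.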

On Fri, 4 Aug 2023 at 00:23, Jon Haddad  wrote:

> Assuming SAI is a superset of SASI, and we were to set up something so
> that SASI indexes auto convert to SAI, this gives even more weight to my
> point regarding how differing behavior for the same syntax can lead to
> issues.  Imo the best case scenario results in the user not even noticing
> their indexes have changed.
>
> A (maybe better?) alternative is to add a flag to the index configuration
> for "compatibility mode", which might address the concerns around using an
> equality operator when it actually is a partial match.
>
> For what it's worth, I'm in agreement that = should mean full equality and
> not token match.
>
> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
> > For what it's worth, I'd very much like to completely remove SASI from
> the
> > codebase for 6.0. The only remaining functionality gaps at the moment are
> > LIKE (prefix/suffix) queries and its limited tokenization
> > capabilities, both of which already have SAI Phase 2 Jiras.
> >
> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
> > wrote:
> >
> > > SASI just uses “=“ for the tokenized equality matching, which is the
> exact
> > > thing this discussion is about changing/not liking.
> > >
> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
> > > wrote:
> > > >
> > > > I do not think LIKE actually applies here. LIKE is used for prefix,
> > > contains, or suffix searches in SASI depending on the index type.
> > > >
> > > > This is about exact matching of tokens.
> > > >
> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
> > > wrote:
> > > >>
> > > >> Certain bits of functionality also already exist on the SASI side
> of
> > > things, but I'm not sure how much overlap there is.  Currently,
> there's a
> > > LIKE keyword that handles token matching, although it seems to have
> some
> > > differences from the feature set in SAI.
> > > >>
> > > >> That said, there seems to be enough of an overlap that it would make
> > > sense to consider using LIKE in the same manner, doesn't it?  I think
> it
> > > would be a little odd if we have different syntax for different
> indexes.
> > > >>
> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
> > > >>
> > > >> I think one complication here is that there seems to be a desire,
> that
> > > I very much agree with, to expose as much of the underlying
> flexibility of
> > > Lucene as much as possible.  If it means we use Caleb's suggestion,
> I'd ask
> > > that the queries that SASI and SAI both support use the same syntax,
> even
> > > if it means there's two ways of writing the same query.  To use Caleb's
> > > example, this would mean supporting both LIKE and the `expr` column.
> > > >>
> > > >> Jon
> > > >>
> > >  On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> > > >>> Here are some additional bits of prior art, if anyone finds them
> > > useful:
> > > >>>
> > > >>>
> > > >>> The Stratio Lucene Index -
> > > >>> https://github.com/Stratio/cassandra-lucene-index#examples
> > > >>>
> > > >>> Stratio was the reason C* added the "expr" functionality. They
> embedded
> > > >>> something similar to ElasticSearch JSON, which probably isn't my
> > > favorite
> > > >>> choice, but it's there.
> > > >>>
> > > >>>
> > > >>> The ElasticSearch match query syntax -
> > > >>>
> > >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > > >>>
> > > >>> Again, not my favorite. It's verbose, and probably too powerful
> for us.
> > > >>>
> > > >>>
> > > >>> ElasticSearch's documentation for the basic Lucene query syntax -
> > > >>>
> > >
> https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> > > >>>
> > > >>> One idea is to take the basic Lucene index, which it seems we
> already
> > > have
> > > >>> some support for, and feed it to "expr". This is nice for two
> reasons:
> > > >>>
> > > >>> 1.) People can just write Lucene queries if they already know how.
> > > >>> 2.) No changes to the grammar.
> > > >>>
> > > >>> Lucene has distinct concepts of filtering and querying, and this is
> > > kind of
> > > >>> the latter. I'm not sure how, for example, we would want "expr" to
> > > interact
> > > >>> w/ filters on other column indexes in vanilla CQL space...

Re: Removal of commitlog_sync_batch_window_in_ms in 5.0

2023-08-07 Thread Miklosovic, Stefan
Since there is no response / nobody seems to see this as an issue, I am going 
to remove it (will be removed in 5.0).


From: Miklosovic, Stefan
Sent: Wednesday, August 2, 2023 21:57
To: dev@cassandra.apache.org
Subject: Removal of commitlog_sync_batch_window_in_ms in 5.0

Hello list,

I want to double check this one (1) on ML.

It is a relatively innocent low-hanger; however, the caveat is that it might 
potentially break the upgrade to 5.0. The deprecation happened in (2) (in 4.0).

I think it is just eligible for deletion now. This property was commented out 
and it is effectively not used. There is even a comment about this (3).

The other option is to leave it deprecated. While this might work, I think this 
would set quite a precedent, wouldn't it? Are there any other configuration 
parameters we will live with forever even if they are not used? It seems strange 
to me that we would just keep this one deprecated for good. Do we apply this 
rule to all other properties from now on then? I am afraid the config would get 
a little bloated after some time ... I think that waiting one major and 
removing it is a good compromise.

(1) https://issues.apache.org/jira/browse/CASSANDRA-17161
(2) https://issues.apache.org/jira/browse/CASSANDRA-13530
(3) https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L545-L546
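
For operators worried about the upgrade caveat, a pre-upgrade check along these lines could flag a yaml that still sets the property. This is only a sketch: the helper name and the REMOVED_PROPERTIES set are hypothetical, it scans top-level keys with a regular expression instead of parsing YAML properly, and it assumes the risk is a node being unhappy about a property it no longer knows.

```python
import re

# Hypothetical list for this sketch: properties removed in the target version.
REMOVED_PROPERTIES = {"commitlog_sync_batch_window_in_ms"}


def find_removed_properties(yaml_text, removed=REMOVED_PROPERTIES):
    """Return removed top-level keys that are still set (not commented out)."""
    found = []
    for line in yaml_text.splitlines():
        # Only uncommented, unindented "key:" lines are considered.
        match = re.match(r"^([A-Za-z0-9_]+)\s*:", line)
        if match and match.group(1) in removed:
            found.append(match.group(1))
    return found


sample = """\
commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 2
commitlog_sync_batch_window_in_ms: 2
"""

print(find_removed_properties(sample))
```

The commented-out line is ignored, so a yaml that only carries the default commented entry would pass the check.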