Re: [DISCUSS] The future of CREATE INDEX

2023-06-20 Thread Caleb Rackliffe
For everyone previously following this, I just created
https://issues.apache.org/jira/browse/CASSANDRA-18615 :)


Re: [DISCUSS] The future of CREATE INDEX

2023-05-19 Thread Caleb Rackliffe
Posted on ASF Slack to see if we can get more responses, but so far the
leaders seem to be...

[POLL] Centralize existing syntax or create new syntax?

1.) CREATE INDEX ... USING ... WITH OPTIONS...

(i.e. centralize)

[POLL] Should there be a default? (YES/NO)

Yes

[POLL] What to do with the default?

3.) and 4.) i.e. YAML options to control default and requirement to specify
a default

(i.e. w/o changing default in 5.0)
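For comparison with the two existing statements, here is a minimal Python sketch that renders the centralized "CREATE INDEX ... USING ... WITH OPTIONS" shape leading the first poll. It is illustrative only: `build_create_index` is a hypothetical helper, the `'sai'` value is just an example input, and the function assembles text without validating it against Cassandra's real CQL grammar.

```python
from typing import Mapping, Optional


def build_create_index(
    table: str,
    column: str,
    name: Optional[str] = None,
    using: Optional[str] = None,
    options: Optional[Mapping[str, str]] = None,
    if_not_exists: bool = False,
) -> str:
    """Render a CREATE INDEX ... USING ... WITH OPTIONS statement (hypothetical helper).

    Only assembles text; it does not check the result against the CQL grammar.
    """
    parts = ["CREATE INDEX"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    if name:
        parts.append(name)
    parts.append(f"ON {table}({column})")
    if using:
        # The implementation is named explicitly; leaving it out would defer
        # to whatever default (if any) the cluster is configured with.
        parts.append(f"USING '{using}'")
    if options:
        # Sort keys so the rendered options map is deterministic.
        rendered = ", ".join(f"'{k}': '{v}'" for k, v in sorted(options.items()))
        parts.append(f"WITH OPTIONS = {{{rendered}}}")
    return " ".join(parts) + ";"


print(build_create_index("ks.tbl", "my_text_col", name="my_index", using="sai"))
```

The single statement covers both of today's forms: omit `using` and it looks like the current CREATE INDEX; supply it and it carries the information CREATE CUSTOM INDEX carries now, without the CUSTOM keyword.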


Re: [DISCUSS] The future of CREATE INDEX

2023-05-18 Thread Miklosovic, Stefan
I don't want to hijack this thread; I just want to say that point 4) seems 
to be recurring. I second Caleb in saying that transactional metadata would 
probably fix this. Because we cannot be sure that all config is the same 
cluster-wide, I basically dropped the effort on CEP-24, since different 
local configurations might compromise security.



Re: [DISCUSS] The future of CREATE INDEX

2023-05-17 Thread Caleb Rackliffe
> 1. What's up with naming anything "legacy"? Calling the current index
> type "2i" seems perfectly fine with me. From what I've heard it can work
> great for many users?

We can give the existing default secondary index any public-facing name we
like, but "2i" is too broad. It just stands for "secondary index", which is
obviously broad enough to cover anything. The use of "legacy" is
conversational, and it reflects the assertion that SAI should, when at
feature parity, be superior to the existing default 2i implementation for
any workload w/ partition-restricted queries. It will surely be possible to
construct a scenario where SAI's SSTable-attached design, combined with
global scatter/gather queries and a huge number of local/per-node SSTables,
causes it to perform worse than the existing default 2i, which is just an
inverted index implemented as a hidden table w/ search terms as partition
keys.
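As a rough mental model of that last point (not Cassandra's actual storage code), the existing default index behaves like a second, hidden table keyed by the indexed value: each distinct search term acts as a partition key, and the rows in that partition are the base-table primary keys holding the term. The Python sketch below stands in a dict for that hidden table; the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class HiddenTableIndex:
    """Toy model of an inverted index stored as a table keyed by search term."""

    def __init__(self) -> None:
        # term -> set of base-table primary keys (the "rows" of that term's partition)
        self._partitions: Dict[Hashable, Set[Hashable]] = defaultdict(set)

    def insert(self, primary_key: Hashable, value: Hashable) -> None:
        self._partitions[value].add(primary_key)

    def delete(self, primary_key: Hashable, value: Hashable) -> None:
        rows = self._partitions.get(value)
        if rows is not None:
            rows.discard(primary_key)
            if not rows:
                # Drop empty partitions so the model does not grow unboundedly.
                del self._partitions[value]

    def lookup(self, value: Hashable) -> Set[Hashable]:
        # A query on the indexed column reads one "partition" of the hidden table.
        return set(self._partitions.get(value, ()))


idx = HiddenTableIndex()
idx.insert("user1", "red")
idx.insert("user2", "red")
idx.insert("user3", "blue")
print(sorted(idx.lookup("red")))  # ['user1', 'user2']
```

In the real system each node keeps such a structure only for its own local data, which is why queries without a partition restriction still have to fan out across the cluster.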

> 2. It should be possible to always specify the index type explicitly. In
> other words, it should be possible to CREATE CUSTOM INDEX ... USING "2i"
> (if it isn't already)

Yes. It should be possible to specify the type no matter what syntax we
use. However, if we started this project from scratch, I don't think we
would build CREATE CUSTOM INDEX in the first place.

> 2b) It should be possible to just say "SAI" or "SASIIndex", not the full
> Java path.
> 3. It's a fair point that the "CUSTOM" word may make this sound a bit too
> special... The simplest change IMO is to just make the CUSTOM word optional.

Agreed on both, and 2b (aliasing) is already supported for CREATE CUSTOM
INDEX. (It may be that we should move toward something like a
ServiceLoader-enabled set of named 2i's.)
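To make the aliasing idea concrete, here is a minimal Python sketch of a case-insensitive lookup from short names to fully-qualified class names, where any unrecognized name is passed through unchanged so a full class name remains valid. The alias table is an assumption for illustration (both the keys and the class names listed); it is not Cassandra's own alias-handling code.

```python
from typing import Dict

# Hypothetical alias table, compared case-insensitively.
# The fully-qualified names are assumed for illustration.
INDEX_ALIASES: Dict[str, str] = {
    "sai": "org.apache.cassandra.index.sai.StorageAttachedIndex",
    "storageattachedindex": "org.apache.cassandra.index.sai.StorageAttachedIndex",
    "sasi": "org.apache.cassandra.index.sasi.SASIIndex",
    "sasiindex": "org.apache.cassandra.index.sasi.SASIIndex",
}


def resolve_index_class(name: str) -> str:
    """Map a short alias to a class name; anything else is treated as a full class name."""
    key = name.strip()
    return INDEX_ALIASES.get(key.lower(), key)


print(resolve_index_class("StorageAttachedIndex"))
print(resolve_index_class("org.example.MyIndex"))
```

A ServiceLoader-based registry, as suggested above, would let each implementation contribute its own names instead of relying on one fixed table like this.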

> 4. Benedict's point that a YAML option is per node is a good one... For
> example, you wouldn't want some nodes to create a 2i index and other nodes
> a SAI index for the same index. That said, how many other YAML options
> can you think of that would create total chaos if different nodes actually
> had different values for them? For example what if a guardrail allowed some
> action on some nodes but not others?  Maybe what we need is a jira ticket
> to enforce that certain sections of the config must not differ?

At some point, my guess is that TCM will give us the ability to have
consistent, cluster-wide metadata/configuration. Right now, we have quite a
few YAML options that control cluster-wide behavior including our
prohibition on creating experimental SASI indexes and our option to disable
2i creation. None of the options we've discussed should make it possible
for a single secondary index on a column of a table to have differing local
implementations.
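Until cluster-wide metadata makes it unnecessary, the kind of check Henrik describes (certain config sections must not differ between nodes) could look like the Python sketch below. It assumes the relevant settings from each node have already been collected into a dict; how they are gathered is out of scope, and the setting names in the usage example are hypothetical.

```python
from typing import Any, Dict, Iterable


def divergent_settings(
    node_configs: Dict[str, Dict[str, Any]], must_match: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """For each key that must match, report the per-node values when they disagree.

    A node that lacks the key is reported with the value None.
    """
    report: Dict[str, Dict[str, Any]] = {}
    for key in must_match:
        values = {node: cfg.get(key) for node, cfg in node_configs.items()}
        # Compare via repr so unhashable values such as lists still work.
        if len({repr(v) for v in values.values()}) > 1:
            report[key] = values
    return report


configs = {
    "node1": {"default_index": "legacy", "sasi_enabled": False},
    "node2": {"default_index": "sai", "sasi_enabled": False},
}
print(divergent_settings(configs, ["default_index", "sasi_enabled"]))
# {'default_index': {'node1': 'legacy', 'node2': 'sai'}}
```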

> 6. MySQL allows the DBA to determine the default engine. This seems to
> work well. If the user doesn't care, they don't care, if they do, they use
> the explicit syntax.

Sounds like option #3 on the 3rd POLL.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-17 Thread Henrik Ingo
I have read the thread but chose to reply to the top message...

I'm coming to this with the background of having worked with MySQL, where
both the storage engine and index implementation had many options, and
often of course some index types were only available in some engines.

I would humbly suggest:

1. What's up with naming anything "legacy"? Calling the current index type
"2i" seems perfectly fine with me. From what I've heard it can work great
for many users?

2. It should be possible to always specify the index type explicitly. In
other words, it should be possible to CREATE CUSTOM INDEX ... USING "2i"
(if it isn't already)

2b) It should be possible to just say "SAI" or "SASIIndex", not the full
Java path.

3. It's a fair point that the "CUSTOM" word may make this sound a bit too
special... The simplest change IMO is to just make the CUSTOM word optional.

4. Benedict's point that a YAML option is per node is a good one... For
example, you wouldn't want some nodes to create a 2i index and other nodes
a SAI index for the same index. That said, how many other YAML options
can you think of that would create total chaos if different nodes actually
had different values for them? For example what if a guardrail allowed some
action on some nodes but not others?  Maybe what we need is a jira ticket
to enforce that certain sections of the config must not differ?

5. That said, the default index type could also be a property of the
keyspace

6. MySQL allows the DBA to determine the default engine. This seems to work
well. If the user doesn't care, they don't care, if they do, they use the
explicit syntax.

henrik


On Wed, May 10, 2023 at 12:45 AM Caleb Rackliffe 
wrote:

> Earlier today, Mick started a thread on the future of our index creation
> DDL on Slack:
>
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
> 
>
> At the moment, there are two ways to create a secondary index.
>
> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table>(<column>)*
>
> This creates an optionally named legacy 2i on the provided table and
> column.
>
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>
> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table>(<column>)
> USING <index_class> [WITH OPTIONS = <options>]*
>
> This creates a secondary index on the provided table and column using the
> specified 2i implementation class and (optional) parameters.
>
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
> 'StorageAttachedIndex'
>
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
> shorthand for the fully-qualified class name, which is also valid.)
>
> So what is there to discuss?
>
> The concern Mick raised is...
>
> "...just folk continuing to use CREATE INDEX  because they think CREATE
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
> doing 2i (when they think they are, and/or we definitely want them to be,
> using SAI)"
>
> To paraphrase, we want people to use SAI once it's available where
> possible, and the default behavior of CREATE INDEX could be at odds w/
> that.
>
> The proposal we seem to have landed on is something like the following:
>
> For 5.0:
>
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>
> (Note: How this would interact w/ the existing secondary_indexes_enabled
> YAML options isn't clear yet.)
>
> Post-5.0:
>
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity
> w/ it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
> would both be flexible enough to accommodate index implementation selection
> and prescriptive enough to force the user to make a decision (and wouldn't
> change the legacy behavior of the existing CREATE INDEX). In this world,
> creating a legacy 2i might look something like CREATE INDEX...USING
> `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>
> Eventually we would have a single enabled DDL statement for index creation
> that would be minimal but also explicit/able to handle some evolution.
>
> What does everyone think?
>


-- 

Henrik Ingo



Re: [DISCUSS] The future of CREATE INDEX

2023-05-16 Thread Caleb Rackliffe
I might as well weigh in...

[POLL] Centralize existing syntax or create new syntax?

1.) CREATE INDEX ... USING ... WITH OPTIONS...

(I think the more important protection for users WRT local indexes should
come in the form of a guardrail prohibiting scatter/gather queries against
them.)

[POLL] Should there be a default? (YES/NO)

Yes

[POLL] What to do with the default?

3.) and 4.) Allow both configuring a default and disabling the default
concept entirely. (Both of these are creation-time items, just like the
current guardrails we have around 2i creation and SASI disabling.)

(No defaults should change, but administrators can lock this down/change
the default index without much effort.)
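Taken together, options 3 and 4 amount to a small decision made at index-creation time. Here is a minimal Python sketch of that decision; the setting names `default_index` and `require_explicit_index_type` are hypothetical placeholders for whatever YAML keys might be chosen, and `legacy` is just the conversational name used in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexCreationSettings:
    # Hypothetical settings; the names are placeholders, not real cassandra.yaml keys.
    default_index: str = "legacy"              # option 3: configurable default, unchanged out of the box
    require_explicit_index_type: bool = False  # option 4: guardrail, off by default


class IndexTypeRequired(Exception):
    """Raised when the guardrail demands an explicit type but the statement omits it."""


def resolve_index_type(using: Optional[str], settings: IndexCreationSettings) -> str:
    if using is not None:
        # An explicit USING clause always wins.
        return using
    if settings.require_explicit_index_type:
        raise IndexTypeRequired("CREATE INDEX must specify USING <index type>")
    return settings.default_index


cluster_a = IndexCreationSettings()
cluster_b = IndexCreationSettings(default_index="sai")
print(resolve_index_type(None, cluster_a), resolve_index_type(None, cluster_b))  # legacy sai
```

The last line also illustrates Dinesh's objection later in the thread: the same statement without USING yields different index types on two differently configured clusters.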



Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread guo Maxwell
>
> [POLL] Centralize existing syntax or create new syntax?


1.) CREATE INDEX ... USING ... WITH OPTIONS...

and I think we should keep CREATE CUSTOM INDEX

[POLL] Should there be a default? (YES/NO)


Of course, YES

[POLL] What to do with the default?


4.) YAML config/guardrail to require index type selection (not required by
default)





Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Jonathan Ellis
On Fri, May 12, 2023 at 1:39 PM Caleb Rackliffe 
wrote:

> [POLL] Centralize existing syntax or create new syntax?
>

1 (Existing)

> [POLL] Should there be a default? (YES/NO)
>

YES


> [POLL] What to do with the default?
>

1 (Default SAI)


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Dinesh Joshi
> On May 12, 2023, at 11:36 AM, Caleb Rackliffe  
> wrote:
> 
> [POLL] Centralize existing syntax or create new syntax?
> 
> 1.) CREATE INDEX ... USING ... WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but adds 
> LOCAL keyword for clarity and separation from future GLOBAL indexes)
> 
> (In both cases, we deprecate w/ client warnings CREATE CUSTOM INDEX)

2.

> 
> 
> [POLL] Should there be a default? (YES/NO)

Yes.


> [POLL] What to do with the default?
> 
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by 
> default)

1 or 2.

3 and 4 are bad options IMHO.

As a user I expect defaults to remain consistent across installations with the 
same major version. Allowing configurable defaults will change CQL behavior 
based on Cassandra's configuration. This makes things very unpredictable and at 
that point it is better to force the user to explicitly select their index 
implementation.

Imagine a user's surprise when they run the same DDL script to set up a schema 
on two clusters and end up with a _different_ index because the clusters 
had different defaults. This is not the user experience we should be aiming for.

> 
> On Fri, May 12, 2023 at 12:39 PM Mick Semb Wever wrote:
>>> 
>>> Given it seems most DBs have a default index (see Postgres, etc.), I tend 
>>> to lean toward having one, but that's me...
>> 
>>  
>> I'm for it too.  It would be nice to enforce that the setting is globally uniform to 
>> avoid the per-node problem. Or add a keyspace option. 
>> 
>> For users replaying <5 DDLs this would just require they set the default 
>> index to 2i.
>> This is not a headache, it's a one-off action that can be clearly expressed 
>> in NEWS.
>> It acts as a deprecation warning too.
>> This prevents new uneducated users from creating the unintended index, it 
>> supports existing users, and it does not present SAI as the battle-tested 
>> default.
>> 
>> Agree with the poll, there's a number of different PoVs here already.  I'm 
>> not fond of the LOCAL addition,  I appreciate what it informs, but it's just 
>> not important enough IMHO (folk should be reading up on the index type).



Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread David Capwell
> [POLL] Centralize existing syntax or create new syntax?


1.) CREATE INDEX ... USING ... WITH OPTIONS...

> [POLL] Should there be a default? (YES/NO)

Yes

> [POLL] What to do with the default?

3.) YAML config to override default index (legacy 2i remains the default)
4.) YAML config/guardrail to require index type selection (not required by 
default)

For me, 3 AND 4. When no type is given, allow a config for the default, and add 
a guardrail to limit which index types are allowed. If I misunderstood 4, I 
still prefer my option: we should have an allow list of types an operator is 
willing to support.
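The allow-list variant can be sketched as a check applied after the index type has been resolved. This is a minimal Python illustration under stated assumptions: the function and exception names are hypothetical, `None` is taken to mean "guardrail not configured", and names are compared case-insensitively.

```python
from typing import AbstractSet, Optional


class IndexTypeNotAllowed(Exception):
    """Raised when the resolved index type is not on the operator's allow list."""


def check_index_type_allowed(index_type: str, allowed: Optional[AbstractSet[str]]) -> None:
    # None models "guardrail not configured": every type is accepted.
    if allowed is None:
        return
    # Case-insensitive comparison so 'SAI' and 'sai' are treated alike (an assumption).
    if index_type.lower() not in {t.lower() for t in allowed}:
        raise IndexTypeNotAllowed(
            f"index type {index_type!r} is not in the allowed set {sorted(allowed)}"
        )


check_index_type_allowed("SAI", {"sai", "legacy"})  # passes silently
```

Combined with a configurable default, this separates two operator concerns: what happens when the user names no type, and which types the user is permitted to name.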



Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Patrick McFadin
1
Yes
4



On Mon, May 15, 2023 at 3:00 AM Benedict  wrote:

> 3: CREATE  INDEX (Otherwise 2)
> No
> If configurable, it should be a distributed configuration. This is very
> different to other local configurations, as the 2i selected has semantic
> implications, not just performance (and the perf implications are also much
> greater)
>
> On 15 May 2023, at 10:45, Mike Adamson  wrote:
>
>
>
>> [POLL] Centralize existing syntax or create new syntax?
>>
>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>
> [POLL] Should there be a default? (YES/NO)
>>
>
> Yes
>
> [POLL] What to do with the default?
>>
>> 1.) Allow a default, and switch it to SAI (no configurables)
>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>> 3.) YAML config to override default index (legacy 2i remains the default)
>> 4.) YAML config/guardrail to require index type selection (not required
>> by default)
>>
>
> 3.) YAML config to override default index (legacy 2i remains the default)
>
>
>
> On Mon, 15 May 2023 at 08:54, Mick Semb Wever  wrote:
>
>>
>>
>> [POLL] Centralize existing syntax or create new syntax?
>>>
>>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>>
>>
>>
>> (1) CREATE INDEX …
>>
>>
>>
>>> [POLL] Should there be a default? (YES/NO)
>>>
>>
>>
>> Yes (but see below).
>>
>>
>>
>>> [POLL] What to do with the default?
>>>
>>> 1.) Allow a default, and switch it to SAI (no configurables)
>>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>>> 3.) YAML config to override default index (legacy 2i remains the default)
>>> 4.) YAML config/guardrail to require index type selection (not required
>>> by default)
>>>
>>
>>
>> (4) YAML config. Commented out default of 2i.
>>
>> I agree that the default cannot change in 5.0, but our existing default
>> of 2i can be commented out.
>>
>> For the user this gives them the same feedback, and puts the same
>> requirement to edit one line of yaml, as when we disabled MVs and SASI in
>> 4.0
>> No one has complained about either of these, which is a clear signal folk
>> understood how to get their existing DDLs to work from 3.x to 4.x
>>
>
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Benedict
3: CREATE  INDEX (Otherwise 2)
No
If configurable, should be a distributed configuration. This is very
different to other local configurations, as the 2i selected has semantic
implications, not just performance (and the perf implications are also much
greater).
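A minimal Python sketch of the concern above, assuming a hypothetical per-node YAML setting for the default index: when two nodes' defaults disagree, the same CREATE INDEX statement without USING produces a different implementation depending on which node coordinates it. The node names, the labels legacy_2i and sai, and both helper functions are illustrative only, not Cassandra code.

```python
from typing import Dict, Optional

# Hypothetical per-node YAML defaults for a plain CREATE INDEX (illustrative only).
node_defaults: Dict[str, str] = {
    "node1": "legacy_2i",
    "node2": "sai",  # this node's YAML was changed, the other's was not
}


def index_type_for(coordinator: str, using: Optional[str] = None) -> str:
    """Implementation chosen when `coordinator` handles a CREATE INDEX."""
    # An explicit USING clause ignores the node-local default entirely.
    return using if using is not None else node_defaults[coordinator]


def defaults_agree(defaults: Dict[str, str]) -> bool:
    """True only if every node would fall back to the same implementation."""
    return len(set(defaults.values())) <= 1


# The same DDL creates a different index type depending on the coordinator.
print(index_type_for("node1"))        # legacy_2i
print(index_type_for("node2"))        # sai
print(defaults_agree(node_defaults))  # False
```

An explicit USING clause sidesteps the divergence; a check like defaults_agree is the kind of thing a truly distributed setting would make unnecessary.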


Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Mike Adamson
>
> [POLL] Centralize existing syntax or create new syntax?
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>

1.) CREATE INDEX ... USING  WITH OPTIONS...

[POLL] Should there be a default? (YES/NO)
>

Yes

[POLL] What to do with the default?
>
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by
> default)
>

3.) YAML config to override default index (legacy 2i remains the default)



On Mon, 15 May 2023 at 08:54, Mick Semb Wever  wrote:

>
>
> [POLL] Centralize existing syntax or create new syntax?
>>
>> 1.) CREATE INDEX ... USING  WITH OPTIONS...
>> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
>> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>>
>
>
> (1) CREATE INDEX …
>
>
>
>> [POLL] Should there be a default? (YES/NO)
>>
>
>
> Yes (but see below).
>
>
>
>> [POLL] What to do with the default?
>>
>> 1.) Allow a default, and switch it to SAI (no configurables)
>> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
>> 3.) YAML config to override default index (legacy 2i remains the default)
>> 4.) YAML config/guardrail to require index type selection (not required
>> by default)
>>
>
>
> (4) YAML config. Commented out default of 2i.
>
> I agree that the default cannot change in 5.0, but our existing default of
> 2i can be commented out.
>
> For the user this gives them the same feedback, and puts the same
> requirement to edit one line of yaml, as when we disabled MVs and SASI in
> 4.0
> No one has complained about either of these, which is a clear signal folk
> understood how to get their existing DDLs to work from 3.x to 4.x
>


-- 
Mike Adamson
Engineering



Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Mick Semb Wever
[POLL] Centralize existing syntax or create new syntax?
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>


(1) CREATE INDEX …



> [POLL] Should there be a default? (YES/NO)
>


Yes (but see below).



> [POLL] What to do with the default?
>
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by
> default)
>


(4) YAML config. Commented out default of 2i.

I agree that the default cannot change in 5.0, but our existing default of
2i can be commented out.

For the user this gives them the same feedback, and puts the same
requirement to edit one line of yaml, as when we disabled MVs and SASI in
4.0
No one has complained about either of these, which is a clear signal folk
understood how to get their existing DDLs to work from 3.x to 4.x
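A minimal Python sketch of option (4) as described here, assuming the parsed YAML is a plain mapping and the default lives under a hypothetical key named default_index: when that line is commented out (so the key is absent), a CREATE INDEX without USING is rejected with an error naming the one line to restore. The key name, the labels and the error text are illustrative, not actual cassandra.yaml settings.

```python
from typing import Mapping, Optional


def index_type_from_yaml(using: Optional[str], yaml: Mapping[str, str]) -> str:
    """Option (4) variant: no uncommented default means USING is required."""
    if using is not None:
        # Statements that name their implementation never depend on the YAML.
        return using
    default = yaml.get("default_index")  # hypothetical key; commented out = absent
    if default is None:
        raise ValueError(
            "No default index configured: add USING to CREATE INDEX, "
            "or uncomment default_index in cassandra.yaml")
    return default
```

Replaying pre-5.0 DDL would then fail until the operator restores a single line, which is the same kind of one-off action described for MVs and SASI in 4.0.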


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
I don’t think there’s going to be any real support for doing it in 5.0 anyway
at this point.

On May 12, 2023, at 1:48 PM, Benedict  wrote:

> Given we have no data in front of us to make a decision regarding switching
> defaults, I don’t think it is suitable to include that option in this poll.
> In fact, until we have sufficient data to discuss that I’m going to put a
> hard veto on that on technical grounds.




Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
Given we have no data in front of us to make a decision regarding switching
defaults, I don’t think it is suitable to include that option in this poll.
In fact, until we have sufficient data to discuss that I’m going to put a
hard veto on that on technical grounds.

On 12 May 2023, at 19:41, Caleb Rackliffe  wrote:

> ...and to clarify, answers should be what you'd like to see for 5.0
> specifically




Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Jeremiah D Jordan
> [POLL] Centralize existing syntax or create new syntax?

1.) CREATE INDEX ... USING  WITH OPTIONS...


> [POLL] Should there be a default? (YES/NO)

YES


> [POLL] What to do with the default?

3.) YAML config to override default index (legacy 2i remains the default)

DESCRIBE should always show the full CREATE INDEX … statement with the index 
specified, such that replaying the output of DESCRIBE will not depend on the 
default settings.  This is what we do right now for CREATE TABLE OPTIONS.  
Things you don’t specify get a default, that default may change between 
releases, DESCRIBE shows the full CREATE TABLE with all OPTIONS listed so 
replaying DESCRIBE does not get any defaults.
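A rough Python sketch of the DESCRIBE behaviour described above, assuming the node records the resolved implementation in index metadata at creation time: the rendered statement always carries USING (and any options), so replaying it never consults a default. IndexMetadata and describe are hypothetical names, the option name in the usage example is only illustrative, and the statement shape follows poll option 1 rather than any confirmed grammar.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IndexMetadata:
    name: str
    keyspace: str
    table: str
    column: str
    # Stored even when the user relied on a default at creation time.
    implementation: str
    options: Dict[str, str] = field(default_factory=dict)


def describe(idx: IndexMetadata) -> str:
    """Render a CREATE INDEX statement that names the implementation explicitly."""
    stmt = (f"CREATE INDEX {idx.name} ON {idx.keyspace}.{idx.table} ({idx.column}) "
            f"USING '{idx.implementation}'")
    if idx.options:
        # Sorted so the output is stable across runs.
        opts = ", ".join(f"'{k}': '{v}'" for k, v in sorted(idx.options.items()))
        stmt += f" WITH OPTIONS = {{{opts}}}"
    return stmt + ";"


print(describe(IndexMetadata("v_idx", "ks", "tbl", "v", "sai")))
# CREATE INDEX v_idx ON ks.tbl (v) USING 'sai';
```

This mirrors what DESCRIBE already does for table options: defaults may move between releases, but the replayed statement does not depend on them.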

I don’t agree with the sentiment that a yaml option overriding CQL is bad.  We 
have tons of local node yaml options that change how a given CQL query can act. 
 All of the guardrails, all of the auth settings, tons of other things that 
should truly be in global configuration, but since we don’t have global 
configuration are in the C* yaml file.  “Make sure you set these options the 
same on every node” is the only thing we have right now.  We shouldn’t be 
limiting what we want to allow configuration of because we don’t have global 
config yet.

-Jeremiah

> On May 12, 2023, at 1:36 PM, Caleb Rackliffe  wrote:
> 
> [POLL] Centralize existing syntax or create new syntax?
> 
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but adds 
> LOCAL keyword for clarity and separation from future GLOBAL indexes)
> 
> (In both cases, we deprecate w/ client warnings CREATE CUSTOM INDEX)
> 
> 
> [POLL] Should there be a default? (YES/NO)
> 
> 
> [POLL] What to do with the default?
> 
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by 
> default)
> 
> On Fri, May 12, 2023 at 12:39 PM Mick Semb Wever  wrote:
>>> 
>>> Given it seems most DBs have a default index (see Postgres, etc.), I tend 
>>> to lean toward having one, but that's me...
>> 
>>  
>> I'm for it too.  Would be nice to enforce the setting is globally uniform to 
>> avoid the per-node problem. Or add a keyspace option. 
>> 
>> For users replaying <5 DDLs this would just require they set the default 
>> index to 2i.
>> This is not a headache, it's a one-off action that can be clearly expressed 
>> in NEWS.
>> It acts as a deprecation warning too.
>> This prevents new uneducated users from creating the unintended index, it 
>> supports existing users, and it does not present SAI as the battle-tested 
>> default.
>> 
>> Agree with the poll, there's a number of different PoVs here already.  I'm 
>> not fond of the LOCAL addition,  I appreciate what it informs, but it's just 
>> not important enough IMHO (folk should be reading up on the index type).



Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
...and to clarify, answers should be what you'd like to see for 5.0
specifically

On Fri, May 12, 2023 at 1:36 PM Caleb Rackliffe 
wrote:

> [POLL] Centralize existing syntax or create new syntax?
>
> 1.) CREATE INDEX ... USING  WITH OPTIONS...
> 2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but
> adds LOCAL keyword for clarity and separation from future GLOBAL indexes)
>
> (In both cases, we deprecate w/ client warnings CREATE CUSTOM INDEX)
>
>
> [POLL] Should there be a default? (YES/NO)
>
>
> [POLL] What to do with the default?
>
> 1.) Allow a default, and switch it to SAI (no configurables)
> 2.) Allow a default, and stay w/ the legacy 2i (no configurables)
> 3.) YAML config to override default index (legacy 2i remains the default)
> 4.) YAML config/guardrail to require index type selection (not required by
> default)
>
> On Fri, May 12, 2023 at 12:39 PM Mick Semb Wever  wrote:
>
>>
>>> Given it seems most DBs have a default index (see Postgres, etc.), I
>>> tend to lean toward having one, but that's me...
>>>
>>
>>
>> I'm for it too.  Would be nice to enforce the setting is
>> globally uniform to avoid the per-node problem. Or add a keyspace option.
>>
>> For users replaying <5 DDLs this would just require they set the default
>> index to 2i.
>> This is not a headache, it's a one-off action that can be clearly
>> expressed in NEWS.
>> It acts as a deprecation warning too.
>> This prevents new uneducated users from creating the unintended index,
>> it supports existing users, and it does not present SAI as the
>> battle-tested default.
>>
>> Agree with the poll, there's a number of different PoVs here already.
>> I'm not fond of the LOCAL addition,  I appreciate what it informs, but it's
>> just not important enough IMHO (folk should be reading up on the index
>> type).
>>
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
[POLL] Centralize existing syntax or create new syntax?

1.) CREATE INDEX ... USING  WITH OPTIONS...
2.) CREATE LOCAL INDEX ... USING ... WITH OPTIONS...  (same as 1, but adds
LOCAL keyword for clarity and separation from future GLOBAL indexes)

(In both cases, we deprecate w/ client warnings CREATE CUSTOM INDEX)


[POLL] Should there be a default? (YES/NO)


[POLL] What to do with the default?

1.) Allow a default, and switch it to SAI (no configurables)
2.) Allow a default, and stay w/ the legacy 2i (no configurables)
3.) YAML config to override default index (legacy 2i remains the default)
4.) YAML config/guardrail to require index type selection (not required by
default)
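To make options 3 and 4 concrete, here is a minimal Python sketch of how a node might choose an implementation for CREATE INDEX under each. NodeConfig, its fields default_index and require_index_type, and the labels legacy_2i and sai are hypothetical placeholders, not real cassandra.yaml keys or index class names.

```python
from dataclasses import dataclass
from typing import Optional

LEGACY_2I = "legacy_2i"  # hypothetical label for the existing secondary index
SAI = "sai"              # hypothetical label for Storage-Attached Indexing


@dataclass
class NodeConfig:
    # Option 3: YAML override of the default (legacy 2i remains the default).
    default_index: str = LEGACY_2I
    # Option 4: guardrail requiring an explicit implementation (off by default).
    require_index_type: bool = False


def resolve_index_type(using: Optional[str], cfg: NodeConfig) -> str:
    """Return the implementation a CREATE INDEX statement would get on this node."""
    if using is not None:
        # An explicit USING clause always takes precedence over node configuration.
        return using
    if cfg.require_index_type:
        raise ValueError("CREATE INDEX must name an implementation with USING")
    return cfg.default_index


print(resolve_index_type(None, NodeConfig()))                   # legacy_2i
print(resolve_index_type(None, NodeConfig(default_index=SAI)))  # sai
```

Option 3 corresponds to changing default_index, option 4 to turning on require_index_type; both leave an explicit USING clause untouched.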

On Fri, May 12, 2023 at 12:39 PM Mick Semb Wever  wrote:

>
>> Given it seems most DBs have a default index (see Postgres, etc.), I tend
>> to lean toward having one, but that's me...
>>
>
>
> I'm for it too.  Would be nice to enforce the setting is globally uniform
> to avoid the per-node problem. Or add a keyspace option.
>
> For users replaying <5 DDLs this would just require they set the default
> index to 2i.
> This is not a headache, it's a one-off action that can be clearly
> expressed in NEWS.
> It acts as a deprecation warning too.
> This prevents new uneducated users from creating the unintended index, it
> supports existing users, and it does not present SAI as the battle-tested
>  default.
>
> Agree with the poll, there's a number of different PoVs here already.  I'm
> not fond of the LOCAL addition,  I appreciate what it informs, but it's
> just not important enough IMHO (folk should be reading up on the index
> type).
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
But then we have to reconsider the existing syntax, or do we want LOCAL to be
the default?

We should be planning our language evolution along with our feature evolution.

On 12 May 2023, at 19:28, Caleb Rackliffe  wrote:

> If at some point in the glorious future we have global indexes, I'm sure we
> can add GLOBAL to the syntax...sry, working on an ugly poll...



Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
If at some point in the glorious future we have global indexes, I'm sure we
can add GLOBAL to the syntax...sry, working on an ugly poll...

On Fri, May 12, 2023 at 1:24 PM Benedict  wrote:

> If folk should be reading up on the index type, doesn’t that conflict with
> your support of a default?
>
> Should there be different global and local defaults, once we have global
> indexes, or should we always default to a local index? Or a global one?
>
> On 12 May 2023, at 18:39, Mick Semb Wever  wrote:
>
>
>
>>
>> Given it seems most DBs have a default index (see Postgres, etc.), I tend
>> to lean toward having one, but that's me...
>>
>
>
> I'm for it too.  Would be nice to enforce the setting is globally uniform
> to avoid the per-node problem. Or add a keyspace option.
>
> For users replaying <5 DDLs this would just require they set the default
> index to 2i.
> This is not a headache, it's a one-off action that can be clearly
> expressed in NEWS.
> It acts as a deprecation warning too.
> This prevents new uneducated users from creating the unintended index, it
> supports existing users, and it does not present SAI as the battle-tested
>  default.
>
> Agree with the poll, there's a number of different PoVs here already.  I'm
> not fond of the LOCAL addition,  I appreciate what it informs, but it's
> just not important enough IMHO (folk should be reading up on the index
> type).
>
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
If folk should be reading up on the index type, doesn’t that conflict with your 
support of a default?

Should there be different global and local defaults, once we have global 
indexes, or should we always default to a local index? Or a global one?

> On 12 May 2023, at 18:39, Mick Semb Wever  wrote:
> 
> 
>> 
>> Given it seems most DBs have a default index (see Postgres, etc.), I tend to 
>> lean toward having one, but that's me...
> 
>  
> I'm for it too.  Would be nice to enforce the setting is globally uniform to 
> avoid the per-node problem. Or add a keyspace option. 
> 
> For users replaying <5 DDLs this would just require they set the default 
> index to 2i.
> This is not a headache, it's a one-off action that can be clearly expressed 
> in NEWS.
> It acts as a deprecation warning too.
> This prevents new uneducated users from creating the unintended index, it 
> supports existing users, and it does not present SAI as the battle-tested 
> default.
> 
> Agree with the poll, there's a number of different PoVs here already.  I'm 
> not fond of the LOCAL addition,  I appreciate what it informs, but it's just 
> not important enough IMHO (folk should be reading up on the index type).


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Mick Semb Wever
>
>
> Given it seems most DBs have a default index (see Postgres, etc.), I tend
> to lean toward having one, but that's me...
>


I'm for it too.  Would be nice to enforce the setting is globally uniform
to avoid the per-node problem. Or add a keyspace option.

For users replaying <5 DDLs this would just require they set the default
index to 2i.
This is not a headache, it's a one-off action that can be clearly expressed
in NEWS.
It acts as a deprecation warning too.
This prevents new uneducated users from creating the unintended index, it
supports existing users, and it does not present SAI as the battle-tested
 default.

Agree with the poll, there's a number of different PoVs here already.  I'm
not fond of the LOCAL addition,  I appreciate what it informs, but it's
just not important enough IMHO (folk should be reading up on the index
type).
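The keyspace-option alternative mentioned above could be sketched as a precedence rule in Python, assuming a hypothetical keyspace-level option called default_index: an explicit USING wins, then the keyspace option (which would travel with the schema, so every coordinator sees the same value), and only then the node-local YAML default. All names here are illustrative.

```python
from typing import Mapping, Optional


def effective_index_type(using: Optional[str],
                         keyspace_options: Mapping[str, str],
                         yaml_default: str) -> str:
    """Choose an implementation: explicit USING, then keyspace option, then node YAML."""
    if using is not None:
        return using
    # Hypothetical keyspace-level option; as part of the schema it is the same on every node.
    ks_default = keyspace_options.get("default_index")
    if ks_default is not None:
        return ks_default
    # Last resort: the node-local YAML default, which can differ between nodes.
    return yaml_default
```

Putting the default in schema rather than YAML is one way to get the "globally uniform" property without waiting for cluster-wide configuration.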


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
There remains the question of what the new syntax is - whether it augments
CREATE INDEX to replace CREATE CUSTOM INDEX or if we introduce new syntax
because we think it’s clearer.

I can accept settling for modifying CREATE INDEX … USING, but I maintain that
CREATE LOCAL  INDEX is better.

On 12 May 2023, at 18:31, Caleb Rackliffe  wrote:

> Even if we don't want to allow a default, we can keep the same CREATE INDEX
> syntax in place, and have a guardrail forcing (or not) the selection of an
> implementation, right? This would be no worse than the YAML option we
> already have for enabling 2i creation as a whole.

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
Even if we don't want to allow a default, we can keep the same CREATE INDEX
syntax in place, and have a guardrail forcing (or not) the selection of an
implementation, right? This would be no worse than the YAML option we
already have for enabling 2i creation as a whole.

On Fri, May 12, 2023 at 12:28 PM Benedict  wrote:

> I’m not convinced a default index makes any sense, no. The trade-offs in a
> distributed setting are much more pronounced.
>
> Indexes in a local-only RDBMS are much simpler affairs; the trade offs are
> much more subtle than here.
>
> On 12 May 2023, at 18:24, Caleb Rackliffe 
> wrote:
>
>
> > Now, given this thread, there is pushback for a config to allow
> default impl to change… but there is 0 pushback for new syntax to make this
> explicit…. So maybe we should [POLL] for what syntax people want?
>
> I think the essential question is whether we want the concept of a default
> index. If we do, we need to figure that out now. If we don't then a new
> syntax that forces it becomes interesting.
>
> Given it seems most DBs have a default index (see Postgres, etc.), I tend
> to lean toward having one, but that's me...
>
> On Fri, May 12, 2023 at 12:20 PM David Capwell  wrote:
>
>> I really dislike the idea of the same CQL doing different things based upon
>> a per-node configuration.
>>
>>
>> I agree with Brandon that changing CQL behaviour like this based on node
>> config is really not ideal.
>>
>>
>> I am cool adding such a config, and also cool keeping CREATE INDEX
>> disabled by default…. But would like to point out that we have many configs
>> that impact CQL and they are almost always local configs…
>>
>> Is CREATE INDEX even allowed?  This is a per node config. Right now you
>> can block globally, enable on a single instance, create the index for your
>> users, then revert the config change on the instance….
>>
>> All guardrails that define what we can do are per node configs…
>>
>> Now, given this thread, there is pushback for a config to allow default
>> impl to change… but there is 0 pushback for new syntax to make this
>> explicit…. So maybe we should [POLL] for what syntax people want?
>>
>> if we decide before the 5.0 release that we have enough information to
>> change the default (#1), we can change it in a matter of minutes.
>>
>>
>> I am strongly against this… SAI is new for 5.0 so should be disabled by
>> default; else we disrespect the idea that new features are disabled by
>> default. I am cool with our docs recommending it if we do find it's better in
>> most cases, but we should not change the default in the same release it
>> lands in.
>>
>> On May 12, 2023, at 10:10 AM, Caleb Rackliffe 
>> wrote:
>>
>> I don't want to cut over for 5.0 either way. I was more contrasting a
>> configurable cutover in 5.0 vs. a hard cutover later.
>>
>> On Fri, May 12, 2023 at 12:09 PM Benedict  wrote:
>>
>>> If the performance characteristics are as clear cut as you think, then
>>> maybe it will be an easy decision once the evidence is available for
>>> everyone to consider?
>>>
>>> If not, then we probably can’t do the hard cutover and so the answer is
>>> still pretty simple?
>>>
>>> On 12 May 2023, at 18:04, Caleb Rackliffe 
>>> wrote:
>>>
>>>
>>> I don't particularly like the YAML solution either, but absent that,
>>> we're back to fighting about whether we introduce entirely new syntax or
>>> hard cut over to SAI at some point.
>>>
>>> We already have per-node configuration in the YAML that determines
>>> whether or not we can create a 2i at all, right?
>>>
>>> What if we just do #2 and #3 and punt on everything else?
>>>
>>> On Fri, May 12, 2023 at 11:56 AM Benedict  wrote:
>>>
>>>> A table is not a local concept at all, it has a global primary index -
>>>> that’s the core idea of Cassandra.
>>>>
>>>> I agree with Brandon that changing CQL behaviour like this based on
>>>> node config is really not ideal. New syntax is by far the simplest and
>>>> safest solution to this IMO. It doesn’t have to use the word LOCAL, but I
>>>> think that’s anyway an improvement, personally.
>>>>
>>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better
>>>> to reify the distinction in the syntax.
>>>>
>>>> On 12 May 2023, at 17:29, Caleb Rackliffe wrote:
>>>>
>>>>
>>>> We don't need to know everything about SAI's performance profile to
>>>> plan and execute some small, reasonable things now for 5.0. I'm going to
>>>> try to summarize the least controversial package of ideas from the
>>>> discussion above. I've left out creating any new syntax. For example, I
>>>> think CREATE LOCAL INDEX, while explicit, is just not necessary. We
>>>> don't use CREATE LOCAL TABLE, although it has the same locality as our
>>>> indexes.
>>>>
>>>> Okay, so the proposal for 5.0...
>>>>
>>>> 1.) Add a YAML option that specifies a default implementation for CREATE
>>>> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
>>>> don't have to commit to the absolute superiority

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
I’m not convinced a default index makes any sense, no. The trade-offs in a distributed setting are much more pronounced.

Indexes in a local-only RDBMS are much simpler affairs; the trade-offs are much more subtle than here.

On 12 May 2023, at 18:24, Caleb Rackliffe wrote:

>> Now, given this thread, there is pushback for a config to allow default impl to change… but there is 0 pushback for new syntax to make this explicit…. So maybe we should [POLL] for what syntax people want?
>
> I think the essential question is whether we want the concept of a default index. If we do, we need to figure that out now. If we don't then a new syntax that forces it becomes interesting.
>
> Given it seems most DBs have a default index (see Postgres, etc.), I tend to lean toward having one, but that's me...
>
> On Fri, May 12, 2023 at 12:20 PM David Capwell wrote:
>
>>> I really dislike the idea of the same CQL doing different things based upon a per-node configuration.
>>
>>> I agree with Brandon that changing CQL behaviour like this based on node config is really not ideal.
>>
>> I am cool adding such a config, and also cool keeping CREATE INDEX disabled by default…. But would like to point out that we have many configs that impact CQL and they are almost always local configs…
>>
>> Is CREATE INDEX even allowed? This is a per node config. Right now you can block globally, enable on a single instance, create the index for your users, then revert the config change on the instance….
>>
>> All guardrails that define what we can do are per node configs…
>>
>> Now, given this thread, there is pushback for a config to allow default impl to change… but there is 0 pushback for new syntax to make this explicit…. So maybe we should [POLL] for what syntax people want?
>>
>>> if we decide before the 5.0 release that we have enough information to change the default (#1), we can change it in a matter of minutes.
>>
>> I am strongly against this… SAI is new for 5.0 so should be disabled by default; else we disrespect the idea that new features are disabled by default. I am cool with our docs recommending if we do find it's better in most cases, but we should not change the default in the same release it lands in.
>>
>> On May 12, 2023, at 10:10 AM, Caleb Rackliffe wrote:
>>
>>> I don't want to cut over for 5.0 either way. I was more contrasting a configurable cutover in 5.0 vs. a hard cutover later.
>>>
>>> On Fri, May 12, 2023 at 12:09 PM Benedict wrote:
>>>
>>>> If the performance characteristics are as clear cut as you think, then maybe it will be an easy decision once the evidence is available for everyone to consider?
>>>>
>>>> If not, then we probably can’t do the hard cutover and so the answer is still pretty simple?
>>>>
>>>> On 12 May 2023, at 18:04, Caleb Rackliffe wrote:
>>>>
>>>>> I don't particularly like the YAML solution either, but absent that, we're back to fighting about whether we introduce entirely new syntax or hard cut over to SAI at some point.
>>>>>
>>>>> We already have per-node configuration in the YAML that determines whether or not we can create a 2i at all, right?
>>>>>
>>>>> What if we just do #2 and #3 and punt on everything else?
>>>>>
>>>>> On Fri, May 12, 2023 at 11:56 AM Benedict wrote:
>>>>>
>>>>>> A table is not a local concept at all, it has a global primary index - that’s the core idea of Cassandra.
>>>>>>
>>>>>> I agree with Brandon that changing CQL behaviour like this based on node config is really not ideal. New syntax is by far the simplest and safest solution to this IMO. It doesn’t have to use the word LOCAL, but I think that’s anyway an improvement, personally.
>>>>>>
>>>>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better to reify the distinction in the syntax.
>>>>>>
>>>>>> On 12 May 2023, at 17:29, Caleb Rackliffe wrote:
>>>>>>
>>>>>>> We don't need to know everything about SAI's performance profile to plan and execute some small, reasonable things now for 5.0. I'm going to try to summarize the least controversial package of ideas from the discussion above. I've left out creating any new syntax. For example, I think CREATE LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE LOCAL TABLE, although it has the same locality as our indexes.
>>>>>>>
>>>>>>> Okay, so the proposal for 5.0...
>>>>>>>
>>>>>>> 1.) Add a YAML option that specifies a default implementation for CREATE INDEX, and make this the legacy 2i for now. No existing DDL breaks. We don't have to commit to the absolute superiority of SAI.
>>>>>>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to market using CREATE CUSTOM INDEX, which feels...not so polished. (The backend for this already exists w/ CREATE CUSTOM INDEX.)
>>>>>>> 3.) Leave in place but deprecate (client warnings could work?) CREATE CUSTOM INDEX. Support the syntax for the foreseeable future.
>>>>>>>
>>>>>>> Can we live w/ this?
>>>>>>>
>>>>>>> I don't think any information about SAI we could possibly acquire before a 5.0 release would affect the reasonableness of this much.
>>>>>>>
>>>>>>> On Fri, May 12, 2023 at 10:54 AM Benedict wrote:
>>>>>>>
>>>>>>>>> if we didn't have copious amounts of (not all

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
> Now, given this thread, there is pushback for a config to allow default
> impl to change… but there is 0 pushback for new syntax to make this
> explicit…. So maybe we should [POLL] for what syntax people want?

I think the essential question is whether we want the concept of a default
index. If we do, we need to figure that out now. If we don't then a new
syntax that forces it becomes interesting.

Given it seems most DBs have a default index (see Postgres, etc.), I tend
to lean toward having one, but that's me...
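The two options contrasted here (a configured default versus syntax that forces an explicit choice) can be sketched as a small resolution function. This is a hypothetical model, not Cassandra code: `resolve_index_type`, `InvalidRequest` and the implementation names `legacy_2i` and `sai` are illustrative assumptions. With a default configured, a bare CREATE INDEX falls back to it; with no default, the statement must name an implementation through USING.

```python
from typing import Optional


class InvalidRequest(Exception):
    """Illustrative stand-in for a rejected DDL statement (hypothetical)."""


def resolve_index_type(using: Optional[str], default_impl: Optional[str]) -> str:
    """Choose the index implementation for a CREATE INDEX statement.

    using:        value of an explicit USING clause, or None if it was omitted.
    default_impl: configured default implementation, or None if there is no default.
    """
    if using is not None:
        # An explicit USING clause always wins, in both models.
        return using
    if default_impl is not None:
        # "Default index" model: a bare CREATE INDEX gets the configured default.
        return default_impl
    # "No default" model: the statement must say which implementation it wants.
    raise InvalidRequest("CREATE INDEX requires USING when no default index is configured")


if __name__ == "__main__":
    print(resolve_index_type(None, "legacy_2i"))   # prints: legacy_2i
    print(resolve_index_type("sai", "legacy_2i"))  # prints: sai
    try:
        resolve_index_type(None, None)
    except InvalidRequest as err:
        print("rejected:", err)
```

Both models accept USING when it is given; they differ only in what a bare CREATE INDEX means, which is the question being put to the list.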

On Fri, May 12, 2023 at 12:20 PM David Capwell  wrote:

> I really dislike the idea of the same CQL doing different things based upon
> a per-node configuration.
>
>
> I agree with Brandon that changing CQL behaviour like this based on node
> config is really not ideal.
>
>
> I am cool adding such a config, and also cool keeping CREATE INDEX
> disabled by default…. But would like to point out that we have many configs
> that impact CQL and they are almost always local configs…
>
> Is CREATE INDEX even allowed?  This is a per node config. Right now you
> can block globally, enable on a single instance, create the index for your
> users, then revert the config change on the instance….
>
> All guardrails that define what we can do are per node configs…
>
> Now, given this thread, there is pushback for a config to allow default
> impl to change… but there is 0 pushback for new syntax to make this
> explicit…. So maybe we should [POLL] for what syntax people want?
>
> if we decide before the 5.0 release that we have enough information to
> change the default (#1), we can change it in a matter of minutes.
>
>
> I am strongly against this… SAI is new for 5.0 so should be disabled by
> default; else we disrespect the idea that new features are disabled by
> default.  I am cool with our docs recommending if we do find it's better in
> most cases, but we should not change the default in the same release it
> lands in.
>
> On May 12, 2023, at 10:10 AM, Caleb Rackliffe 
> wrote:
>
> I don't want to cut over for 5.0 either way. I was more contrasting a
> configurable cutover in 5.0 vs. a hard cutover later.
>
> On Fri, May 12, 2023 at 12:09 PM Benedict  wrote:
>
>> If the performance characteristics are as clear cut as you think, then
>> maybe it will be an easy decision once the evidence is available for
>> everyone to consider?
>>
>> If not, then we probably can’t do the hard cutover and so the answer is
>> still pretty simple?
>>
>> On 12 May 2023, at 18:04, Caleb Rackliffe 
>> wrote:
>>
>>
>> I don't particularly like the YAML solution either, but absent that,
>> we're back to fighting about whether we introduce entirely new syntax or
>> hard cut over to SAI at some point.
>>
>> We already have per-node configuration in the YAML that determines
>> whether or not we can create a 2i at all, right?
>>
>> What if we just do #2 and #3 and punt on everything else?
>>
>> On Fri, May 12, 2023 at 11:56 AM Benedict  wrote:
>>
>>> A table is not a local concept at all, it has a global primary index -
>>> that’s the core idea of Cassandra.
>>>
>>> I agree with Brandon that changing CQL behaviour like this based on node
>>> config is really not ideal. New syntax is by far the simplest and safest
>>> solution to this IMO. It doesn’t have to use the word LOCAL, but I think
>>> that’s anyway an improvement, personally.
>>>
>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better
>>> to reify the distinction in the syntax.
>>>
>>> On 12 May 2023, at 17:29, Caleb Rackliffe 
>>> wrote:
>>>
>>>
>>> We don't need to know everything about SAI's performance profile to plan
>>> and execute some small, reasonable things now for 5.0. I'm going to try to
>>> summarize the least controversial package of ideas from the discussion
>>> above. I've left out creating any new syntax. For example, I think CREATE
>>> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
>>> LOCAL TABLE, although it has the same locality as our indexes.
>>>
>>> Okay, so the proposal for 5.0...
>>>
>>> 1.) Add a YAML option that specifies a default implementation for CREATE
>>> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
>>> don't have to commit to the absolute superiority of SAI.
>>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go
>>> to market using CREATE CUSTOM INDEX, which feels...not so polished.
>>> (The backend for this already exists w/ CREATE CUSTOM INDEX.)
>>> 3.) Leave in place but deprecate (client warnings could work?) CREATE
>>> CUSTOM INDEX. Support the syntax for the foreseeable future.
>>>
>>> Can we live w/ this?
>>>
>>> I don't think any information about SAI we could possibly acquire before
>>> a 5.0 release would affect the reasonableness of this much.
>>>
>>>
>>> On Fri, May 12, 2023 at 10:54 AM Benedict  wrote:
>>>
>>>> if we didn't have copious amounts of (not all public, I know, working
>>>> on it) evidence
>>>>
>>>>
>>>> If that’s the

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
I still prefer introducing CREATE LOCAL INDEX, to help users understand the semantics of the index they’re creating.

I think it will in future potentially be quite confusing to be able to create global and local indexes using the same DDL statement.

But, depending on appetite, that could plausibly be done in future instead.

(I don’t endorse the assumption of a future switch of default)

On 12 May 2023, at 18:18, Caleb Rackliffe wrote:

> So the weakest version of the plan that actually accomplishes something useful for 5.0:
>
> 1.) Just leave the CREATE INDEX default alone for now. Hard switch the default after 5.0.
> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to market using CREATE CUSTOM INDEX, which feels...not so polished. (The backend for this already exists w/ CREATE CUSTOM INDEX.)
> 3.) Leave in place but deprecate (client warnings could work?) CREATE CUSTOM INDEX. Support the syntax for the foreseeable future.
>
> Any objections to that?
>
> On Fri, May 12, 2023 at 12:10 PM Caleb Rackliffe wrote:
>
>> I don't want to cut over for 5.0 either way. I was more contrasting a configurable cutover in 5.0 vs. a hard cutover later.
>>
>> On Fri, May 12, 2023 at 12:09 PM Benedict wrote:
>>
>>> If the performance characteristics are as clear cut as you think, then maybe it will be an easy decision once the evidence is available for everyone to consider?
>>>
>>> If not, then we probably can’t do the hard cutover and so the answer is still pretty simple?
>>>
>>> On 12 May 2023, at 18:04, Caleb Rackliffe wrote:
>>>
>>>> I don't particularly like the YAML solution either, but absent that, we're back to fighting about whether we introduce entirely new syntax or hard cut over to SAI at some point.
>>>>
>>>> We already have per-node configuration in the YAML that determines whether or not we can create a 2i at all, right?
>>>>
>>>> What if we just do #2 and #3 and punt on everything else?
>>>>
>>>> On Fri, May 12, 2023 at 11:56 AM Benedict wrote:
>>>>
>>>>> A table is not a local concept at all, it has a global primary index - that’s the core idea of Cassandra.
>>>>>
>>>>> I agree with Brandon that changing CQL behaviour like this based on node config is really not ideal. New syntax is by far the simplest and safest solution to this IMO. It doesn’t have to use the word LOCAL, but I think that’s anyway an improvement, personally.
>>>>>
>>>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better to reify the distinction in the syntax.
>>>>>
>>>>> On 12 May 2023, at 17:29, Caleb Rackliffe wrote:
>>>>>
>>>>>> We don't need to know everything about SAI's performance profile to plan and execute some small, reasonable things now for 5.0. I'm going to try to summarize the least controversial package of ideas from the discussion above. I've left out creating any new syntax. For example, I think CREATE LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE LOCAL TABLE, although it has the same locality as our indexes.
>>>>>>
>>>>>> Okay, so the proposal for 5.0...
>>>>>>
>>>>>> 1.) Add a YAML option that specifies a default implementation for CREATE INDEX, and make this the legacy 2i for now. No existing DDL breaks. We don't have to commit to the absolute superiority of SAI.
>>>>>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to market using CREATE CUSTOM INDEX, which feels...not so polished. (The backend for this already exists w/ CREATE CUSTOM INDEX.)
>>>>>> 3.) Leave in place but deprecate (client warnings could work?) CREATE CUSTOM INDEX. Support the syntax for the foreseeable future.
>>>>>>
>>>>>> Can we live w/ this?
>>>>>>
>>>>>> I don't think any information about SAI we could possibly acquire before a 5.0 release would affect the reasonableness of this much.
>>>>>>
>>>>>> On Fri, May 12, 2023 at 10:54 AM Benedict wrote:
>>>>>>
>>>>>>>> if we didn't have copious amounts of (not all public, I know, working on it) evidence
>>>>>>>
>>>>>>> If that’s the assumption on which this proposal is based, let’s discuss the evidence base first, as given the fundamentally different way they work (almost diametrically opposite), I would want to see a very high quality of evidence to support the claim.
>>>>>>>
>>>>>>> I don’t think we can resolve this conversation effectively until this question is settled.
>>>>>>>
>>>>>>> On 12 May 2023, at 16:19, Caleb Rackliffe wrote:
>>>>>>>
>>>>>>>>> This creates huge headaches for everyone successfully using 2i today though, and SAI *is not* guaranteed to perform as well or better - it has a very different performance profile.
>>>>>>>>
>>>>>>>> We wouldn't have even advanced it to this point if we didn't have copious amounts of (not all public, I know, working on it) evidence it did for the vast majority of workloads. Having said that, I don't strongly agree that we should make it the default in 5.0, because performance isn't the only concern. (correctness, DDL back-compat, which we've sort of touched w/ the YAML default option, etc.)
>>>>>>>>
>>>>>>>> This conversation is now going in like 3 different directions, or at least 3 different "packages" of ideas, so there isn't even a single thing to vote on. Let me

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread David Capwell
> I really dislike the idea of the same CQL doing different things based upon a 
> per-node configuration.

> I agree with Brandon that changing CQL behaviour like this based on node 
> config is really not ideal. 

I am cool adding such a config, and also cool keeping CREATE INDEX disabled by 
default…. But would like to point out that we have many configs that impact CQL 
and they are almost always local configs…

Is CREATE INDEX even allowed?  This is a per node config. Right now you can 
block globally, enable on a single instance, create the index for your users, 
then revert the config change on the instance…. 

All guardrails that define what we can do are per node configs…
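The objection quoted above (the same CQL behaving differently depending on node-local config) can be illustrated with a toy model. It assumes, hypothetically, that the node receiving the DDL resolves a missing USING clause from its own YAML; `NodeConfig`, `default_index_impl` and the implementation names are made up for illustration and are not Cassandra's config keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeConfig:
    """Hypothetical per-node setting, standing in for a value read from that node's YAML."""
    name: str
    default_index_impl: str


def implementation_for(using: Optional[str], node: NodeConfig) -> str:
    """Implementation chosen when CREATE INDEX is sent to `node` (toy model)."""
    return using if using is not None else node.default_index_impl


def outcomes(using: Optional[str], nodes: List[NodeConfig]) -> Dict[str, str]:
    """Map each node name to the implementation that node would pick."""
    return {n.name: implementation_for(using, n) for n in nodes}


# Two nodes whose local configs have drifted apart.
cluster = [NodeConfig("node1", "legacy_2i"), NodeConfig("node2", "sai")]

print(outcomes(None, cluster))   # {'node1': 'legacy_2i', 'node2': 'sai'}
print(outcomes("sai", cluster))  # {'node1': 'sai', 'node2': 'sai'}
```

In this model a bare statement's meaning depends on which node happens to receive it, while an explicit USING clause gives the same answer everywhere; that asymmetry is the crux of the per-node config objection.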

Now, given this thread, there is pushback for a config to allow default impl 
to change… but there is 0 pushback for new syntax to make this explicit…. So 
maybe we should [POLL] for what syntax people want?

> if we decide before the 5.0 release that we have enough information to change 
> the default (#1), we can change it in a matter of minutes.

I am strongly against this… SAI is new for 5.0 so should be disabled by 
default; else we disrespect the idea that new features are disabled by default.
I am cool with our docs recommending if we do find it's better in most cases,
but we should not change the default in the same release it lands in.

> On May 12, 2023, at 10:10 AM, Caleb Rackliffe  
> wrote:
> 
> I don't want to cut over for 5.0 either way. I was more contrasting a 
> configurable cutover in 5.0 vs. a hard cutover later.
> 
> On Fri, May 12, 2023 at 12:09 PM Benedict wrote:
>> If the performance characteristics are as clear cut as you think, then maybe 
>> it will be an easy decision once the evidence is available for everyone to 
>> consider?
>> 
>> If not, then we probably can’t do the hard cutover and so the answer is 
>> still pretty simple? 
>> 
>>> On 12 May 2023, at 18:04, Caleb Rackliffe wrote:
>>>
>>> I don't particularly like the YAML solution either, but absent that, we're 
>>> back to fighting about whether we introduce entirely new syntax or hard cut 
>>> over to SAI at some point.
>>> 
>>> We already have per-node configuration in the YAML that determines whether 
>>> or not we can create a 2i at all, right?
>>> 
>>> What if we just do #2 and #3 and punt on everything else?
>>> 
>>> On Fri, May 12, 2023 at 11:56 AM Benedict wrote:
>>>> A table is not a local concept at all, it has a global primary index -
>>>> that’s the core idea of Cassandra.
>>>>
>>>> I agree with Brandon that changing CQL behaviour like this based on node
>>>> config is really not ideal. New syntax is by far the simplest and safest
>>>> solution to this IMO. It doesn’t have to use the word LOCAL, but I think
>>>> that’s anyway an improvement, personally.
>>>>
>>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better to
>>>> reify the distinction in the syntax.
>>>>
>>>>> On 12 May 2023, at 17:29, Caleb Rackliffe wrote:
>>>>>
>>>>>
>>>>> We don't need to know everything about SAI's performance profile to plan
>>>>> and execute some small, reasonable things now for 5.0. I'm going to try
>>>>> to summarize the least controversial package of ideas from the discussion
>>>>> above. I've left out creating any new syntax. For example, I think CREATE
>>>>> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
>>>>> LOCAL TABLE, although it has the same locality as our indexes.
>>>>>
>>>>> Okay, so the proposal for 5.0...
>>>>>
>>>>> 1.) Add a YAML option that specifies a default implementation for CREATE
>>>>> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
>>>>> don't have to commit to the absolute superiority of SAI.
>>>>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go
>>>>> to market using CREATE CUSTOM INDEX, which feels...not so polished. (The
>>>>> backend for this already exists w/ CREATE CUSTOM INDEX.)
>>>>> 3.) Leave in place but deprecate (client warnings could work?) CREATE
>>>>> CUSTOM INDEX. Support the syntax for the foreseeable future.
>>>>>
>>>>> Can we live w/ this?
>>>>>
>>>>> I don't think any information about SAI we could possibly acquire before
>>>>> a 5.0 release would affect the reasonableness of this much.
>>>>>
>>>>>
>>>>> On Fri, May 12, 2023 at 10:54 AM Benedict wrote:
>>>>>>> if we didn't have copious amounts of (not all public, I know, working
>>>>>>> on it) evidence
>>>>>>
>>>>>> If that’s the assumption on which this proposal is based, let’s discuss
>>>>>> the evidence base first, as given the fundamentally different way they
>>>>>> work (almost diametrically opposite), I would want to see a very high
>>>>>> quality of evidence to support the claim.
>>>>>>
>>>>>> I don’t think we can resolve this conversation

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
So the weakest version of the plan that actually accomplishes something
useful for 5.0:

1.) Just leave the CREATE INDEX default alone for now. Hard switch the
default after 5.0.
2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to
market using CREATE CUSTOM INDEX, which feels...not so polished. (The
backend for this already exists w/ CREATE CUSTOM INDEX.)
3.) Leave in place but deprecate (client warnings could work?) CREATE
CUSTOM INDEX. Support the syntax for the foreseeable future.

Any objections to that?
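A rough sketch of how the three statement forms would be treated under this plan. Everything here is an assumption made for illustration: `plan_create_index`, the `legacy_2i` name, the `com.example.CustomIndex` class and the warning text are hypothetical, WITH OPTIONS is not modelled, and the regular expressions only recognise simple single statements rather than the real CQL grammar.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical name for today's (non-SAI) index implementation.
LEGACY_DEFAULT = "legacy_2i"

_CREATE = re.compile(r"\s*CREATE\s+(CUSTOM\s+)?INDEX\b", re.IGNORECASE)
_USING = re.compile(r"\bUSING\s+'([^']*)'", re.IGNORECASE)


def plan_create_index(stmt: str) -> Tuple[str, List[str]]:
    """Return (implementation, client_warnings) for one CREATE [CUSTOM] INDEX statement."""
    m = _CREATE.match(stmt)
    if m is None:
        raise ValueError("not a CREATE INDEX statement")
    is_custom = m.group(1) is not None
    u = _USING.search(stmt)
    using: Optional[str] = u.group(1) if u else None
    warnings: List[str] = []

    if is_custom:
        # Point 3: CREATE CUSTOM INDEX keeps working, but the client is warned.
        if using is None:
            raise ValueError("CREATE CUSTOM INDEX needs a USING clause")
        warnings.append("CREATE CUSTOM INDEX is deprecated; use CREATE INDEX ... USING")
        return using, warnings
    if using is not None:
        # Point 2: plain CREATE INDEX now accepts USING directly.
        return using, warnings
    # Point 1: no USING means the default is left alone for 5.0.
    return LEGACY_DEFAULT, warnings


print(plan_create_index("CREATE INDEX ON ks.t (v);"))
# ('legacy_2i', [])
print(plan_create_index("CREATE INDEX ON ks.t (v) USING 'sai';"))
# ('sai', [])
print(plan_create_index("CREATE CUSTOM INDEX ON ks.t (v) USING 'com.example.CustomIndex';"))
# ('com.example.CustomIndex', ['CREATE CUSTOM INDEX is deprecated; use CREATE INDEX ... USING'])
```

No existing statement changes meaning here: the only behavioural difference for current users is the extra client warning on CREATE CUSTOM INDEX.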

On Fri, May 12, 2023 at 12:10 PM Caleb Rackliffe 
wrote:

> I don't want to cut over for 5.0 either way. I was more contrasting a
> configurable cutover in 5.0 vs. a hard cutover later.
>
> On Fri, May 12, 2023 at 12:09 PM Benedict  wrote:
>
>> If the performance characteristics are as clear cut as you think, then
>> maybe it will be an easy decision once the evidence is available for
>> everyone to consider?
>>
>> If not, then we probably can’t do the hard cutover and so the answer is
>> still pretty simple?
>>
>> On 12 May 2023, at 18:04, Caleb Rackliffe 
>> wrote:
>>
>>
>> I don't particularly like the YAML solution either, but absent that,
>> we're back to fighting about whether we introduce entirely new syntax or
>> hard cut over to SAI at some point.
>>
>> We already have per-node configuration in the YAML that determines
>> whether or not we can create a 2i at all, right?
>>
>> What if we just do #2 and #3 and punt on everything else?
>>
>> On Fri, May 12, 2023 at 11:56 AM Benedict  wrote:
>>
>>> A table is not a local concept at all, it has a global primary index -
>>> that’s the core idea of Cassandra.
>>>
>>> I agree with Brandon that changing CQL behaviour like this based on node
>>> config is really not ideal. New syntax is by far the simplest and safest
>>> solution to this IMO. It doesn’t have to use the word LOCAL, but I think
>>> that’s anyway an improvement, personally.
>>>
>>> In future we will hopefully offer GLOBAL indexes, and IMO it is better
>>> to reify the distinction in the syntax.
>>>
>>> On 12 May 2023, at 17:29, Caleb Rackliffe 
>>> wrote:
>>>
>>>
>>> We don't need to know everything about SAI's performance profile to plan
>>> and execute some small, reasonable things now for 5.0. I'm going to try to
>>> summarize the least controversial package of ideas from the discussion
>>> above. I've left out creating any new syntax. For example, I think CREATE
>>> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
>>> LOCAL TABLE, although it has the same locality as our indexes.
>>>
>>> Okay, so the proposal for 5.0...
>>>
>>> 1.) Add a YAML option that specifies a default implementation for CREATE
>>> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
>>> don't have to commit to the absolute superiority of SAI.
>>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go
>>> to market using CREATE CUSTOM INDEX, which feels...not so polished.
>>> (The backend for this already exists w/ CREATE CUSTOM INDEX.)
>>> 3.) Leave in place but deprecate (client warnings could work?) CREATE
>>> CUSTOM INDEX. Support the syntax for the foreseeable future.
>>>
>>> Can we live w/ this?
>>>
>>> I don't think any information about SAI we could possibly acquire before
>>> a 5.0 release would affect the reasonableness of this much.
>>>
>>>
>>> On Fri, May 12, 2023 at 10:54 AM Benedict  wrote:
>>>
>>>> if we didn't have copious amounts of (not all public, I know, working
>>>> on it) evidence
>>>>
>>>>
>>>> If that’s the assumption on which this proposal is based, let’s discuss
>>>> the evidence base first, as given the fundamentally different way they work
>>>> (almost diametrically opposite), I would want to see a very high quality of
>>>> evidence to support the claim.
>>>>
>>>> I don’t think we can resolve this conversation effectively until this
>>>> question is settled.
>>>>
>>>> On 12 May 2023, at 16:19, Caleb Rackliffe wrote:
>>>>
>>>>
>>>> > This creates huge headaches for everyone successfully using 2i today
>>>> > though, and SAI *is not* guaranteed to perform as well or better - it has a
>>>> > very different performance profile.
>>>>
>>>> We wouldn't have even advanced it to this point if we didn't have
>>>> copious amounts of (not all public, I know, working on it) evidence it did
>>>> for the vast majority of workloads. Having said that, I don't strongly
>>>> agree that we should make it the default in 5.0, because performance isn't
>>>> the only concern. (correctness, DDL back-compat, which we've sort of
>>>> touched w/ the YAML default option, etc.)
>>>>
>>>> This conversation is now going in like 3 different directions, or at
>>>> least 3 different "packages" of ideas, so there isn't even a single thing
>>>> to vote on. Let me read through again and try to distill into something
>>>> that we might be able to do so with...
>>>>
>>>> On Fri, May 12, 2023 at 7:56 AM Aleksey Yeshchenko wrote:

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
I don't want to cut over for 5.0 either way. I was more contrasting a
configurable cutover in 5.0 vs. a hard cutover later.

On Fri, May 12, 2023 at 12:09 PM Benedict  wrote:

> If the performance characteristics are as clear cut as you think, then
> maybe it will be an easy decision once the evidence is available for
> everyone to consider?
>
> If not, then we probably can’t do the hard cutover and so the answer is
> still pretty simple?
>
> On 12 May 2023, at 18:04, Caleb Rackliffe 
> wrote:
>
>
> I don't particularly like the YAML solution either, but absent that, we're
> back to fighting about whether we introduce entirely new syntax or hard cut
> over to SAI at some point.
>
> We already have per-node configuration in the YAML that determines whether
> or not we can create a 2i at all, right?
>
> What if we just do #2 and #3 and punt on everything else?
>
> On Fri, May 12, 2023 at 11:56 AM Benedict  wrote:
>
>> A table is not a local concept at all, it has a global primary index -
>> that’s the core idea of Cassandra.
>>
>> I agree with Brandon that changing CQL behaviour like this based on node
>> config is really not ideal. New syntax is by far the simplest and safest
>> solution to this IMO. It doesn’t have to use the word LOCAL, but I think
>> that’s anyway an improvement, personally.
>>
>> In future we will hopefully offer GLOBAL indexes, and IMO it is better to
>> reify the distinction in the syntax.
>>
>> On 12 May 2023, at 17:29, Caleb Rackliffe 
>> wrote:
>>
>>
>> We don't need to know everything about SAI's performance profile to plan
>> and execute some small, reasonable things now for 5.0. I'm going to try to
>> summarize the least controversial package of ideas from the discussion
>> above. I've left out creating any new syntax. For example, I think CREATE
>> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
>> LOCAL TABLE, although it has the same locality as our indexes.
>>
>> Okay, so the proposal for 5.0...
>>
>> 1.) Add a YAML option that specifies a default implementation for CREATE
>> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
>> don't have to commit to the absolute superiority of SAI.
>> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go
>> to market using CREATE CUSTOM INDEX, which feels...not so polished. (The
>> backend for this already exists w/ CREATE CUSTOM INDEX.)
>> 3.) Leave in place but deprecate (client warnings could work?) CREATE
>> CUSTOM INDEX. Support the syntax for the foreseeable future.
>>
>> Can we live w/ this?
>>
>> I don't think any information about SAI we could possibly acquire before
>> a 5.0 release would affect the reasonableness of this much.
>>
>>
>> On Fri, May 12, 2023 at 10:54 AM Benedict  wrote:
>>
>>> if we didn't have copious amounts of (not all public, I know, working on
>>> it) evidence
>>>
>>>
>>> If that’s the assumption on which this proposal is based, let’s discuss
>>> the evidence base first, as given the fundamentally different way they work
>>> (almost diametrically opposite), I would want to see a very high quality of
>>> evidence to support the claim.
>>>
>>> I don’t think we can resolve this conversation effectively until this
>>> question is settled.
>>>
>>> On 12 May 2023, at 16:19, Caleb Rackliffe 
>>> wrote:
>>>
>>>
>>> > This creates huge headaches for everyone successfully using 2i today
>>> > though, and SAI *is not* guaranteed to perform as well or better - it has a
>>> > very different performance profile.
>>>
>>> We wouldn't have even advanced it to this point if we didn't have
>>> copious amounts of (not all public, I know, working on it) evidence it did
>>> for the vast majority of workloads. Having said that, I don't strongly
>>> agree that we should make it the default in 5.0, because performance isn't
>>> the only concern. (correctness, DDL back-compat, which we've sort of
>>> touched w/ the YAML default option, etc.)
>>>
>>> This conversation is now going in like 3 different directions, or at
>>> least 3 different "packages" of ideas, so there isn't even a single thing
>>> to vote on. Let me read through again and try to distill into something
>>> that we might be able to do so with...
>>>
>>> On Fri, May 12, 2023 at 7:56 AM Aleksey Yeshchenko 
>>> wrote:
>>>
>>>> This.
>>>>
>>>> I would also consider adding CREATE LEGACY INDEX syntax as an alias for
>>>> today’s CREATE INDEX, the latter to be deprecated and (in very distant
>>>> future) removed.
>>>>
>>>> On 12 May 2023, at 13:14, Benedict wrote:
>>>>
>>>> This creates huge headaches for everyone successfully using 2i today
>>>> though, and SAI *is not* guaranteed to perform as well or better - it has a
>>>> very different performance profile.
>>>>
>>>> I think we should deprecate CREATE INDEX, and introduce new syntax
>>>> CREATE LOCAL INDEX to make clear that this is not a global index, and that
>>>> this should require the USING syntax to avoid this problem in future.
>>>>
>>>> We should

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
If the performance characteristics are as clear cut as you think, then maybe
it will be an easy decision once the evidence is available for everyone to
consider?

If not, then we probably can’t do the hard cutover, and so the answer is
still pretty simple?

On 12 May 2023, at 18:04, Caleb Rackliffe  wrote:

> I don't particularly like the YAML solution either, but absent that, we're
> back to fighting about whether we introduce entirely new syntax or hard cut
> over to SAI at some point.
>
> We already have per-node configuration in the YAML that determines whether
> or not we can create a 2i at all, right?
>
> What if we just do #2 and #3 and punt on everything else?




Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
I don't particularly like the YAML solution either, but absent that, we're
back to fighting about whether we introduce entirely new syntax or hard cut
over to SAI at some point.

We already have per-node configuration in the YAML that determines whether
or not we can create a 2i at all, right?

What if we just do #2 and #3 and punt on everything else?

On Fri, May 12, 2023 at 11:56 AM Benedict  wrote:

> A table is not a local concept at all, it has a global primary index -
> that’s the core idea of Cassandra.
>
> I agree with Brandon that changing CQL behaviour like this based on node
> config is really not ideal. New syntax is by far the simplest and safest
> solution to this IMO. It doesn’t have to use the word LOCAL, but I think
> that’s anyway an improvement, personally.
>
> In future we will hopefully offer GLOBAL indexes, and IMO it is better to
> reify the distinction in the syntax.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
A table is not a local concept at all, it has a global primary index -
that’s the core idea of Cassandra.

I agree with Brandon that changing CQL behaviour like this based on node
config is really not ideal. New syntax is by far the simplest and safest
solution to this IMO. It doesn’t have to use the word LOCAL, but I think
that’s anyway an improvement, personally.

In future we will hopefully offer GLOBAL indexes, and IMO it is better to
reify the distinction in the syntax.

On 12 May 2023, at 17:29, Caleb Rackliffe  wrote:

> We don't need to know everything about SAI's performance profile to plan
> and execute some small, reasonable things now for 5.0. I'm going to try to
> summarize the least controversial package of ideas from the discussion
> above. I've left out creating any new syntax. For example, I think CREATE
> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
> LOCAL TABLE, although it has the same locality as our indexes.
>
> Okay, so the proposal for 5.0...
>
> 1.) Add a YAML option that specifies a default implementation for CREATE
> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
> don't have to commit to the absolute superiority of SAI.
> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to
> market using CREATE CUSTOM INDEX, which feels...not so polished. (The
> backend for this already exists w/ CREATE CUSTOM INDEX.)
> 3.) Leave in place but deprecate (client warnings could work?) CREATE
> CUSTOM INDEX. Support the syntax for the foreseeable future.
>
> Can we live w/ this?
>
> I don't think any information about SAI we could possibly acquire before a
> 5.0 release would affect the reasonableness of this much.



Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Brandon Williams
On Fri, May 12, 2023 at 11:29 AM Caleb Rackliffe
 wrote:
>
> Okay, so the proposal for 5.0...
>
> 1.) Add a YAML option that specifies a default implementation for CREATE 
> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We don't 
> have to commit to the absolute superiority of SAI.

I really dislike the idea of the same CQL doing different things based
upon a per-node configuration.
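A minimal CQL sketch of this concern, assuming the per-node YAML default from item 1 of the proposal exists. The thread does not name the option, so the per-node setup is described only in comments; the keyspace `ks`, table `users`, and index name are hypothetical.

```cql
-- Hypothetical setup: node A's cassandra.yaml makes CREATE INDEX default to
-- the legacy 2i, while node B's makes it default to SAI.

-- Option 1: no USING clause. The implementation comes from the YAML of
-- whichever node handles the DDL, so this exact statement can produce a
-- legacy 2i or an SAI index depending on where the client sends it.
CREATE INDEX users_email_idx ON ks.users (email);

-- Option 2 (run instead of Option 1, not after it): naming the
-- implementation removes the dependency on node configuration.
-- 'sai' is assumed here as the USING name for SAI, per item 2.
CREATE INDEX users_email_idx ON ks.users (email) USING 'sai';
```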


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
...and if we decide before the 5.0 release that we have enough information
to change the default (#1), we can change it in a matter of minutes.

On Fri, May 12, 2023 at 11:28 AM Caleb Rackliffe 
wrote:

> We don't need to know everything about SAI's performance profile to plan
> and execute some small, reasonable things now for 5.0. I'm going to try to
> summarize the least controversial package of ideas from the discussion
> above. I've left out creating any new syntax. For example, I think CREATE
> LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
> LOCAL TABLE, although it has the same locality as our indexes.
>
> Okay, so the proposal for 5.0...
>
> 1.) Add a YAML option that specifies a default implementation for CREATE
> INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
> don't have to commit to the absolute superiority of SAI.
> 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go
> to market using CREATE CUSTOM INDEX, which feels...not so polished. (The
> backend for this already exists w/ CREATE CUSTOM INDEX.)
> 3.) Leave in place but deprecate (client warnings could work?) CREATE
> CUSTOM INDEX. Support the syntax for the foreseeable future.
>
> Can we live w/ this?
>
> I don't think any information about SAI we could possibly acquire before a
> 5.0 release would affect the reasonableness of this much.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
We don't need to know everything about SAI's performance profile to plan
and execute some small, reasonable things now for 5.0. I'm going to try to
summarize the least controversial package of ideas from the discussion
above. I've left out creating any new syntax. For example, I think CREATE
LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE
LOCAL TABLE, although it has the same locality as our indexes.

Okay, so the proposal for 5.0...

1.) Add a YAML option that specifies a default implementation for CREATE
INDEX, and make this the legacy 2i for now. No existing DDL breaks. We
don't have to commit to the absolute superiority of SAI.
2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to
market using CREATE CUSTOM INDEX, which feels...not so polished. (The
backend for this already exists w/ CREATE CUSTOM INDEX.)
3.) Leave in place but deprecate (client warnings could work?) CREATE
CUSTOM INDEX. Support the syntax for the foreseeable future.

Can we live w/ this?

I don't think any information about SAI we could possibly acquire before a
5.0 release would affect the reasonableness of this much.
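For comparison, a sketch of the two forms item 2 contrasts. `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex'` is the custom-index path for declaring SAI. The `CREATE INDEX ... USING 'sai' WITH OPTIONS` form follows the proposal and appears to match what later landed via CASSANDRA-18615, but check the CQL reference for your version. All names are hypothetical, and `case_sensitive` is an SAI text option used purely for illustration.

```cql
-- Today: SAI is reached through the custom-index path.
CREATE CUSTOM INDEX users_email_sai ON ks.users (email)
  USING 'StorageAttachedIndex'
  WITH OPTIONS = {'case_sensitive': 'false'};

-- Proposed (item 2): the same index through CREATE INDEX ... USING ... WITH.
-- Shown as an alternative to the statement above, not to be run after it.
CREATE INDEX users_email_sai ON ks.users (email)
  USING 'sai'
  WITH OPTIONS = {'case_sensitive': 'false'};

-- Unchanged (item 1): no USING clause, so the YAML default applies,
-- which the proposal sets to the legacy 2i for now.
CREATE INDEX users_country_idx ON ks.users (country);
```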


On Fri, May 12, 2023 at 10:54 AM Benedict  wrote:

> if we didn't have copious amounts of (not all public, I know, working on
> it) evidence
>
>
> If that’s the assumption on which this proposal is based, let’s discuss
> the evidence base first, as given the fundamentally different way they work
> (almost diametrically opposite), I would want to see a very high quality of
> evidence to support the claim.
>
> I don’t think we can resolve this conversation effectively until this
> question is settled.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
> if we didn't have copious amounts of (not all public, I know, working on
> it) evidence

If that’s the assumption on which this proposal is based, let’s discuss
the evidence base first, as given the fundamentally different way they work
(almost diametrically opposite), I would want to see a very high quality of
evidence to support the claim.

I don’t think we can resolve this conversation effectively until this
question is settled.

On 12 May 2023, at 16:19, Caleb Rackliffe  wrote:

> > This creates huge headaches for everyone successfully using 2i today
> > though, and SAI *is not* guaranteed to perform as well or better - it has a
> > very different performance profile.
>
> We wouldn't have even advanced it to this point if we didn't have copious
> amounts of (not all public, I know, working on it) evidence it did for the
> vast majority of workloads. Having said that, I don't strongly agree that
> we should make it the default in 5.0, because performance isn't the only
> concern. (correctness, DDL back-compat, which we've sort of touched w/ the
> YAML default option, etc.)
>
> This conversation is now going in like 3 different directions, or at least
> 3 different "packages" of ideas, so there isn't even a single thing to vote
> on. Let me read through again and try to distill into something that we
> might be able to do so with...


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Caleb Rackliffe
> This creates huge headaches for everyone successfully using 2i today
> though, and SAI *is not* guaranteed to perform as well or better - it has a
> very different performance profile.

We wouldn't have even advanced it to this point if we didn't have copious
amounts of (not all public, I know, working on it) evidence it did for the
vast majority of workloads. Having said that, I don't strongly agree that
we should make it the default in 5.0, because performance isn't the only
concern. (correctness, DDL back-compat, which we've sort of touched w/ the
YAML default option, etc.)

This conversation is now going in like 3 different directions, or at least
3 different "packages" of ideas, so there isn't even a single thing to vote
on. Let me read through again and try to distill into something that we
might be able to do so with...

On Fri, May 12, 2023 at 7:56 AM Aleksey Yeshchenko 
wrote:

> This.
>
> I would also consider adding CREATE LEGACY INDEX syntax as an alias for
> today’s CREATE INDEX, the latter to be deprecated and (in very distant
> future) removed.


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Aleksey Yeshchenko
This.

I would also consider adding CREATE LEGACY INDEX syntax as an alias for today’s 
CREATE INDEX, the latter to be deprecated and (in very distant future) removed.

> On 12 May 2023, at 13:14, Benedict  wrote:
> 
> This creates huge headaches for everyone successfully using 2i today though, 
> and SAI *is not* guaranteed to perform as well or better - it has a very 
> different performance profile.
> 
> I think we should deprecate CREATE INDEX, and introduce new syntax CREATE 
> LOCAL INDEX to make clear that this is not a global index, and that this 
> should require the USING syntax to avoid this problem in future. 
> 
> We should report warnings to the client when CREATE INDEX is used, indicating 
> it is deprecated.



Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
This creates huge headaches for everyone successfully using 2i today
though, and SAI *is not* guaranteed to perform as well or better - it has a
very different performance profile.

I think we should deprecate CREATE INDEX, and introduce new syntax CREATE
LOCAL INDEX to make clear that this is not a global index, and that this
should require the USING syntax to avoid this problem in future.

We should report warnings to the client when CREATE INDEX is used,
indicating it is deprecated.

On 12 May 2023, at 13:10, Mick Semb Wever  wrote:

> On Thu, 11 May 2023 at 05:27, Patrick McFadin  wrote:
>
>> Having pulled a lot of developers out of the 2i fire,
>
> Yes.  I'm keen not to leave 2i as the default once SAI lands. Otherwise I
> agree with the deprecated first principle, but 2i is just too problematic.
> Just having no default in 5.0, forcing the user to evaluate which index to
> use would be an improvement.
>
> For example, if the default index in cassandra.yaml option exists but is
> commented out, that would prevent `CREATE INDEX` from working without
> specifying a `USING`. Then the yaml documentation would be clear about
> choices.  I'd be ok with that for 5.0, and then make sai the default in the
> following release.
>
> Note, having the option in cassandra.yaml is problematic, as this is not a
> per-node setting (AFAIK).


Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Mick Semb Wever
On Thu, 11 May 2023 at 05:27, Patrick McFadin  wrote:

> Having pulled a lot of developers out of the 2i fire,
>


Yes.  I'm keen not to leave 2i as the default once SAI lands. Otherwise I
agree with the deprecated first principle, but 2i is just too problematic.
Just having no default in 5.0, forcing the user to evaluate which index to
use would be an improvement.

For example, if the default index in cassandra.yaml option exists but is
commented out, that would prevent `CREATE INDEX` from working without
specifying a `USING`. Then the yaml documentation would be clear about
choices.  I'd be ok with that for 5.0, and then make sai the default in the
following release.

Note, having the option in cassandra.yaml is problematic, as this is not a
per-node setting (AFAIK).
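A sketch of the behaviour described above, assuming the default-index option in cassandra.yaml is present but commented out, so that no default exists. The thread does not give the error text, so the rejection is only noted in a comment. All names are hypothetical, and `'legacy_local_table'` as the USING name for the legacy 2i is an assumption.

```cql
-- No default configured: a bare CREATE INDEX would be rejected,
-- forcing the user to choose an implementation explicitly.
CREATE INDEX users_email_idx ON ks.users (email);  -- rejected in this scheme

-- Either of these would be accepted instead, since each names its
-- implementation (run one, not both).
CREATE INDEX users_email_idx ON ks.users (email) USING 'sai';
CREATE INDEX users_email_idx ON ks.users (email) USING 'legacy_local_table';
```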


Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
There will be a LOT of content around using SAI in 5.0.

CCing marketing ML

On Wed, May 10, 2023 at 8:38 PM Jeff Jirsa  wrote:

> Changes like this always scare me, but the benefits probably outweigh the
> risks. This is probably obvious to whoever implements it, but please make
> sure that if this happens it is super visible in NEWS, and that it
> simultaneously updates the to-string / to-cql representation of the schema
> in cqlsh / drivers / snapshots.

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Jeff Jirsa
Changes like this always scare me, but the benefits probably outweigh the
risks. This is probably obvious to whoever implements it, but please make
sure that if this happens it is super visible in NEWS, and that it
simultaneously updates the to-string / to-cql representation of the schema
in cqlsh / drivers / snapshots.

On Wed, May 10, 2023 at 8:27 PM Patrick McFadin  wrote:

> Having pulled a lot of developers out of the 2i fire, I would love it if
> defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
> seems like the right move for most developers that don't read docs and
> assume behavior.
>
> As much as I hate that 2i would be the configured default, I get it. New
> feature and this is the right thing for users. Would there be any way to
> switch 2i to SAI for the same index declaration? That would make for a nice
> upgrade for users moving to 5 without having to re-create indexes.
>
> Patrick
>
> On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:
>
>> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
>> prefer allowing USING...WITH... for CREATE INDEX
>>
>>
>> I have 0 issues with a new syntax to make this more clear
>>
>> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
>> more or less what my original proposal was above (modulo the configurable
>> default).
>>
>>
>> I have 0 issues deprecating and producing a ClientWarning recommending
>> the new syntax, but I would be against removing this syntax later on… it
>> should be low effort to keep, so breaking a user would not be desirable for
>> me.
>>
>> change only the fact that CREATE INDEX retains a configurable default
>>
>>
>> This option allows users to control this behavior, and allows us to
>> change the default over time.  For 5.0 I am strongly against SAI being the
>> default (new features disabled by default), but I wouldn’t have issues in
>> later versions changing the default once it's been out for a while.
>>
>> I’m not convinced by the changing defaults argument here. The
>> characteristics of the two index types are very different, and users with
>> scripts that make indexes today shouldn’t have their behaviour change.
>>
>>
>> In my mind this is no different from defaulting to BTI in a follow up
>> release, but if this concern is that the legacy index leaked details such
>> as index tables, so changing the default would have side effects in the
>> public domain that users might not expect, then I get it… are there other
>> concerns?
>>
>> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
>> wrote:
>>
>> tl;dr If you take my original proposal and change only the fact that CREATE
>> INDEX retains a configurable default, I think we get to the same place?
>>
>> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>>
>> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> I see a broad desire here to have a configurable (YAML) default
>>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>>> the concept of a default index implementation is pretty standard for most
>>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>>> still need to either revert to CREATE CUSTOM INDEX or add the
>>> USING...WITH... extensions to CREATE INDEX to override the default or
>>> specify parameters, which will be in play once SAI supports basic text
>>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>>> that's more or less what my original proposal was above (modulo the
>>> configurable default).
>>>
>>> Thoughts?
>>>
>>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>>
 I’m not convinced by the changing defaults argument here. The
 characteristics of the two index types are very different, and users with
 scripts that make indexes today shouldn’t have their behaviour change.

 We could introduce new syntax that properly appreciates there’s no
 default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
 these indexes involve a partition key or scatter gather

 On 10 May 2023, at 06:26, guo Maxwell  wrote:

 
 +1, as we must improve the image of our own default indexing ability.

 As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
 disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
 version after 5.0?

 As far as I know, there may be users using this as a plugin-index
 interface, like https://github.com/Stratio/cassandra-lucene-index (these
 projects may be inactive, but if someone wants to do something similar in
 the future, we shouldn't block them).



 Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:

> +1 for this, especially in the long term.  CREATE INDEX should do the
> right thing for most people without requiring extra ceremony.

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Patrick McFadin
Having pulled a lot of developers out of the 2i fire, I would love it if
defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
seems like the right move for most developers that don't read docs and
assume behavior.

As much as I hate that 2i would be the configured default, I get it. New
feature and this is the right thing for users. Would there be any way to
switch 2i to SAI for the same index declaration? That would make for a nice
upgrade for users moving to 5 without having to re-create indexes.

Patrick

On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:

> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
> prefer allowing USING...WITH... for CREATE INDEX
>
>
> I have 0 issues with a new syntax to make this more clear
>
> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
> more or less what my original proposal was above (modulo the configurable
> default).
>
>
> I have 0 issues deprecating and producing a ClientWarning recommending the
> new syntax, but I would be against removing this syntax later on… it should
> be low effort to keep, so breaking a user would not be desirable for me.
>
> change only the fact that CREATE INDEX retains a configurable default
>
>
> This option allows users to control this behavior, and allows us to change
> the default over time.  For 5.0 I am strongly against SAI being the default
> (new features disabled by default), but I wouldn’t have issues in later
> versions changing the default once it's been out for a while.
>
> I’m not convinced by the changing defaults argument here. The
> characteristics of the two index types are very different, and users with
> scripts that make indexes today shouldn’t have their behaviour change.
>
>
> In my mind this is no different from defaulting to BTI in a follow up
> release, but if this concern is that the legacy index leaked details such
> as index tables, so changing the default would have side effects in the
> public domain that users might not expect, then I get it… are there other
> concerns?
>
> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
> wrote:
>
> tl;dr If you take my original proposal and change only the fact that CREATE
> INDEX retains a configurable default, I think we get to the same place?
>
> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>
> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe 
> wrote:
>
>> I see a broad desire here to have a configurable (YAML) default
>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>> the concept of a default index implementation is pretty standard for most
>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>> still need to either revert to CREATE CUSTOM INDEX or add the
>> USING...WITH... extensions to CREATE INDEX to override the default or
>> specify parameters, which will be in play once SAI supports basic text
>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>> that's more or less what my original proposal was above (modulo the
>> configurable default).
>>
>> Thoughts?
>>
>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>
>>> I’m not convinced by the changing defaults argument here. The
>>> characteristics of the two index types are very different, and users with
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>> We could introduce new syntax that properly appreciates there’s no
>>> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
>>> these indexes involve a partition key or scatter gather
>>>
>>> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>>>
>>> 
>>> +1, as we must improve the image of our own default indexing ability.
>>>
>>> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
>>> disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
>>> version after 5.0?
>>>
>>> As far as I know, there may be users using this as a plugin-index
>>> interface, like https://github.com/Stratio/cassandra-lucene-index (these
>>> projects may be inactive, but if someone wants to do something similar
>>> in the future, we shouldn't block them).
>>>
>>>
>>>
>>> Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:
>>>
 +1 for this, especially in the long term.  CREATE INDEX should do the
 right thing for most people without requiring extra ceremony.

 On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
 jeremiah.jor...@gmail.com> wrote:

> If the consensus is that SAI is the right default index, then we
> should just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM
> INDEX.
>
>
> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
> wrote:
>
> Earlier today, Mick started a thread on the future of our index
> creation DDL on Slack:
>
> 

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread David Capwell
> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd prefer 
> allowing USING...WITH... for CREATE INDEX

I have 0 issues with a new syntax to make this more clear

> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's more or 
> less what my original proposal was above (modulo the configurable default).

I have 0 issues deprecating and producing a ClientWarning recommending the new 
syntax, but I would be against removing this syntax later on… it should be low 
effort to keep, so breaking a user would not be desirable for me.

> change only the fact that CREATE INDEX retains a configurable default


This option allows users to control this behavior, and allows us to change the 
default over time.  For 5.0 I am strongly against SAI being the default (new 
features disabled by default), but I wouldn’t have issues in later versions 
changing the default once it's been out for a while.

> I’m not convinced by the changing defaults argument here. The characteristics 
> of the two index types are very different, and users with scripts that make 
> indexes today shouldn’t have their behaviour change.

In my mind this is no different from defaulting to BTI in a follow up release, 
but if this concern is that the legacy index leaked details such as index 
tables, so changing the default would have side effects in the public domain 
that users might not expect, then I get it… are there other concerns?
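A minimal sketch of the deprecation behaviour described above: the old syntax keeps executing, and only a warning recommending the new syntax is attached. The function and warning text are hypothetical; in Cassandra the warning would travel back to the driver as a client warning rather than as a Python return value.

```python
from typing import List

DEPRECATION_WARNING = ("CREATE CUSTOM INDEX is deprecated; prefer "
                       "CREATE INDEX ... USING ... WITH OPTIONS")


def client_warnings(statement: str) -> List[str]:
    """Warnings to return alongside a successfully executed index DDL.

    The old syntax is still executed and never rejected, so existing
    scripts keep working.
    """
    leading = [word.upper() for word in statement.split()[:3]]
    if leading == ["CREATE", "CUSTOM", "INDEX"]:
        return [DEPRECATION_WARNING]
    return []


print(client_warnings("CREATE INDEX my_index ON ks.tbl(my_text_col)"))  # []
```

Warning instead of erroring is the design choice that matters here: it nudges new users without breaking anyone's existing DDL.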

> On May 10, 2023, at 9:03 AM, Caleb Rackliffe  wrote:
> 
> tl;dr If you take my original proposal and change only the fact that CREATE 
> INDEX retains a configurable default, I think we get to the same place?
> 
> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
> 
> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe wrote:
>> I see a broad desire here to have a configurable (YAML) default 
>> implementation for CREATE INDEX. I'm not strongly opposed to that, as the 
>> concept of a default index implementation is pretty standard for most DBMS 
>> (see Postgres, etc.). However, keep in mind that if we do that, we still 
>> need to either revert to CREATE CUSTOM INDEX or add the USING...WITH... 
>> extensions to CREATE INDEX to override the default or specify parameters, 
>> which will be in play once SAI supports basic text tokenization/filtering. 
>> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd prefer 
>> allowing USING...WITH... for CREATE INDEX and just deprecating CREATE CUSTOM 
>> INDEX (at least after 5.0), but that's more or less what my original 
>> proposal was above (modulo the configurable default).
>> 
>> Thoughts?
>> 
>> On Wed, May 10, 2023 at 2:59 AM Benedict wrote:
>>> I’m not convinced by the changing defaults argument here. The 
>>> characteristics of the two index types are very different, and users with 
>>> scripts that make indexes today shouldn’t have their behaviour change.
>>> 
>>> We could introduce new syntax that properly appreciates there’s no default 
>>> index, perhaps CREATE LOCAL [type] INDEX? To also make clear that these 
>>> indexes involve a partition key or scatter gather
>>> 
 On 10 May 2023, at 06:26, guo Maxwell wrote:
 
 
 +1, as we must improve the image of our own default indexing ability.
 
 As for CREATE CUSTOM INDEX, should we just leave it as it is, and disable
 the ability to create SAI through CREATE CUSTOM INDEX in some version
 after 5.0?
 
 As far as I know, there may be users using this as a plugin-index
 interface, like https://github.com/Stratio/cassandra-lucene-index (these
 projects may be inactive, but if someone wants to do something similar in
 the future, we shouldn't block them).
 
 
 
 Jonathan Ellis <jbel...@gmail.com> wrote on Wed, May 10, 2023 at 10:01:
> +1 for this, especially in the long term.  CREATE INDEX should do the 
> right thing for most people without requiring extra ceremony.
> 
> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan
> <jeremiah.jor...@gmail.com> wrote:
>> If the consensus is that SAI is the right default index, then we should 
>> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>> 
>> 
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe wrote:
>>> 
>>> Earlier today, Mick started a thread on the future of our index 
>>> creation DDL on Slack:
>>> 
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>> 
>>> At the moment, there are two ways to create a secondary index.
>>> 
>>> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> 
>>> This creates an optionally named legacy 2i on the provided table and 
>>> column.
>>> 
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>> 
>>> 

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Caleb Rackliffe
tl;dr If you take my original proposal and change only the fact that CREATE
INDEX retains a configurable default, I think we get to the same place?

(Then it's just a matter of what we do in 5.0 vs. after 5.0...)

On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe 
wrote:

> I see a broad desire here to have a configurable (YAML) default
> implementation for CREATE INDEX. I'm not strongly opposed to that, as the
> concept of a default index implementation is pretty standard for most DBMS
> (see Postgres, etc.). However, keep in mind that if we do that, we still
> need to either revert to CREATE CUSTOM INDEX or add the USING...WITH...
> extensions to CREATE INDEX to override the default or specify parameters,
> which will be in play once SAI supports basic text tokenization/filtering.
> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
> prefer allowing USING...WITH... for CREATE INDEX and just deprecating CREATE
> CUSTOM INDEX (at least after 5.0), but that's more or less what my
> original proposal was above (modulo the configurable default).
>
> Thoughts?
>
> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>
>> I’m not convinced by the changing defaults argument here. The
>> characteristics of the two index types are very different, and users with
>> scripts that make indexes today shouldn’t have their behaviour change.
>>
>> We could introduce new syntax that properly appreciates there’s no
>> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
>> these indexes involve a partition key or scatter gather
>>
>> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>>
>> 
>> +1, as we must improve the image of our own default indexing ability.
>>
>> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
>> disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
>> version after 5.0?
>>
>> As far as I know, there may be users using this as a plugin-index
>> interface, like https://github.com/Stratio/cassandra-lucene-index (these
>> projects may be inactive, but if someone wants to do something similar in
>> the future, we shouldn't block them).
>>
>>
>>
>> Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:
>>
>>> +1 for this, especially in the long term.  CREATE INDEX should do the
>>> right thing for most people without requiring extra ceremony.
>>>
>>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>>> jeremiah.jor...@gmail.com> wrote:
>>>
 If the consensus is that SAI is the right default index, then we should
 just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.


 On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
 wrote:

 Earlier today, Mick started a thread on the future of our index
 creation DDL on Slack:

 https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019

 At the moment, there are two ways to create a secondary index.

 *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*

 This creates an optionally named legacy 2i on the provided table and
 column.

 ex. CREATE INDEX my_index ON ks.tbl(my_text_col)

 *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
 USING <class> [WITH OPTIONS = <options>]*

 This creates a secondary index on the provided table and column using
 the specified 2i implementation class and (optional) parameters.

 ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
 'StorageAttachedIndex'

 (Note that the work on SAI added aliasing, so `StorageAttachedIndex`
 is shorthand for the fully-qualified class name, which is also valid.)

 So what is there to discuss?

 The concern Mick raised is...

 "...just folk continuing to use CREATE INDEX  because they think CREATE
 CUSTOM INDEX is advanced (or just don't know of it), and we leave
 users doing 2i (when they think they are, and/or we definitely want them to
 be, using SAI)"

 To paraphrase, we want people to use SAI once it's available where
 possible, and the default behavior of CREATE INDEX could be at odds w/
 that.

 The proposal we seem to have landed on is something like the following:

 For 5.0:

 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
 2.) Leave CREATE CUSTOM INDEX...USING... available by default.

 (Note: How this would interact w/ the existing
 secondary_indexes_enabled YAML options isn't clear yet.)

 Post-5.0:

 1.) Deprecate and eventually remove SASI when SAI hits full feature
 parity w/ it.
 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of
 a hybrid between the two. For example, CREATE INDEX...USING...WITH.
 This would both be flexible enough to accommodate index implementation
 selection and prescriptive enough to force the user to make a decision (and
 wouldn't change the legacy behavior of the existing CREATE 

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Caleb Rackliffe
> We could introduce new syntax that properly appreciates there’s no
> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
> these indexes involve a partition key or scatter gather

I think this is something we should handle in guardrails space on the query
side for all indexes. Specifically, we should have the ability to disallow
scatter/gather queries against indexes (and *all* indexes are local right
now). Mentioning this at the DDL level probably isn't necessary.
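A minimal sketch of the query-side guardrail described above. It models a query only by the set of columns restricted with equality, which is a simplification of real CQL restrictions; `allow_scatter_gather` stands in for a hypothetical guardrail setting, not an existing cassandra.yaml option.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class IndexQuery:
    # Simplified model: columns restricted by equality, and whether the
    # query is served through a secondary index.
    restricted_columns: FrozenSet[str]
    uses_index: bool


def is_scatter_gather(query: IndexQuery, partition_key: Tuple[str, ...]) -> bool:
    """True if an index query does not pin down a single partition.

    Indexes are local to each node, so such a query must be sent to
    replicas covering every token range and the results merged.
    """
    return query.uses_index and not set(partition_key) <= query.restricted_columns


def check_index_query(query: IndexQuery, partition_key: Tuple[str, ...],
                      allow_scatter_gather: bool) -> None:
    # allow_scatter_gather stands in for a hypothetical guardrail setting.
    if not allow_scatter_gather and is_scatter_gather(query, partition_key):
        raise PermissionError("guardrail: scatter/gather index queries are disallowed")


pk = ("id",)
single = IndexQuery(frozenset({"id", "my_text_col"}), uses_index=True)
fan_out = IndexQuery(frozenset({"my_text_col"}), uses_index=True)
print(is_scatter_gather(single, pk), is_scatter_gather(fan_out, pk))  # False True
```

Because the check only looks at the query, it applies equally to legacy 2i, SASI and SAI, which is why it fits guardrails better than the DDL.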

On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:

> I’m not convinced by the changing defaults argument here. The
> characteristics of the two index types are very different, and users with
> scripts that make indexes today shouldn’t have their behaviour change.
>
> We could introduce new syntax that properly appreciates there’s no default
> index, perhaps CREATE LOCAL [type] INDEX? To also make clear that these
> indexes involve a partition key or scatter gather
>
> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>
> 
> +1, as we must improve the image of our own default indexing ability.
>
> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
> disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
> version after 5.0?
>
> As far as I know, there may be users using this as a plugin-index
> interface, like https://github.com/Stratio/cassandra-lucene-index (these
> projects may be inactive, but if someone wants to do something similar in
> the future, we shouldn't block them).
>
>
>
> Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:
>
>> +1 for this, especially in the long term.  CREATE INDEX should do the
>> right thing for most people without requiring extra ceremony.
>>
>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>> jeremiah.jor...@gmail.com> wrote:
>>
>>> If the consensus is that SAI is the right default index, then we should
>>> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>>>
>>>
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
>>> wrote:
>>>
>>> Earlier today, Mick started a thread on the future of our index creation
>>> DDL on Slack:
>>>
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>>
>>> At the moment, there are two ways to create a secondary index.
>>>
>>> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>>>
>>> This creates an optionally named legacy 2i on the provided table and
>>> column.
>>>
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>>
>>> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> USING <class> [WITH OPTIONS = <options>]*
>>>
>>> This creates a secondary index on the provided table and column using
>>> the specified 2i implementation class and (optional) parameters.
>>>
>>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
>>> 'StorageAttachedIndex'
>>>
>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
>>> shorthand for the fully-qualified class name, which is also valid.)
>>>
>>> So what is there to discuss?
>>>
>>> The concern Mick raised is...
>>>
>>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
>>> doing 2i (when they think they are, and/or we definitely want them to be,
>>> using SAI)"
>>>
>>> To paraphrase, we want people to use SAI once it's available where
>>> possible, and the default behavior of CREATE INDEX could be at odds w/
>>> that.
>>>
>>> The proposal we seem to have landed on is something like the following:
>>>
>>> For 5.0:
>>>
>>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>>
>>> (Note: How this would interact w/ the existing secondary_indexes_enabled
>>> YAML options isn't clear yet.)
>>>
>>> Post-5.0:
>>>
>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>>> parity w/ it.
>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
>>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
>>> would both be flexible enough to accommodate index implementation selection
>>> and prescriptive enough to force the user to make a decision (and wouldn't
>>> change the legacy behavior of the existing CREATE INDEX). In this
>>> world, creating a legacy 2i might look something like CREATE
>>> INDEX...USING `legacy`.
>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>>
>>> Eventually we would have a single enabled DDL statement for index
>>> creation that would be minimal but also explicit/able to handle some
>>> evolution.
>>>
>>> What does everyone think?
>>>
>>>
>>>
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>
>
> --
> you are the apple of my eye !
>
>


Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Caleb Rackliffe
I see a broad desire here to have a configurable (YAML) default
implementation for CREATE INDEX. I'm not strongly opposed to that, as the
concept of a default index implementation is pretty standard for most DBMS
(see Postgres, etc.). However, keep in mind that if we do that, we still
need to either revert to CREATE CUSTOM INDEX or add the USING...WITH...
extensions to CREATE INDEX to override the default or specify parameters,
which will be in play once SAI supports basic text tokenization/filtering.
Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd prefer
allowing USING...WITH... for CREATE INDEX and just deprecating CREATE
CUSTOM INDEX (at least after 5.0), but that's more or less what my original
proposal was above (modulo the configurable default).

Thoughts?
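A minimal sketch of how a configurable default could combine with an optional USING clause on CREATE INDEX. The implementation names (`legacy`, `sai`) and the two settings modelled as function parameters are assumptions for illustration, not actual cassandra.yaml keys.

```python
from typing import Optional

# Hypothetical implementation names; not taken from Cassandra itself.
KNOWN_IMPLEMENTATIONS = {"legacy", "sai"}


def resolve_index_implementation(using: Optional[str] = None,
                                 yaml_default: str = "legacy",
                                 default_enabled: bool = True) -> str:
    """Choose the implementation for CREATE INDEX ... [USING '<impl>'].

    An explicit USING clause always wins. Without one, the operator's
    configured default applies, unless the operator has turned the default
    off to force users to choose an implementation explicitly.
    """
    if using is not None:
        impl = using.lower()
        if impl not in KNOWN_IMPLEMENTATIONS:
            raise ValueError(f"unknown index implementation: {using!r}")
        return impl
    if not default_enabled:
        raise ValueError("no default index implementation; add USING '<impl>'")
    return yaml_default


# Keeping 'legacy' as the default in 5.0 leaves existing scripts unchanged.
print(resolve_index_implementation())                    # legacy
print(resolve_index_implementation("SAI"))               # sai
print(resolve_index_implementation(yaml_default="sai"))  # sai
```

Setting `default_enabled=False` models the option of requiring every CREATE INDEX to name its implementation, while flipping `yaml_default` later changes the default without touching the grammar.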

On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:

> I’m not convinced by the changing defaults argument here. The
> characteristics of the two index types are very different, and users with
> scripts that make indexes today shouldn’t have their behaviour change.
>
> We could introduce new syntax that properly appreciates there’s no default
> index, perhaps CREATE LOCAL [type] INDEX? To also make clear that these
> indexes involve a partition key or scatter gather
>
> On 10 May 2023, at 06:26, guo Maxwell  wrote:
>
> 
> +1, as we must improve the image of our own default indexing ability.
>
> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
> disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
> version after 5.0?
>
> As far as I know, there may be users using this as a plugin-index
> interface, like https://github.com/Stratio/cassandra-lucene-index (these
> projects may be inactive, but if someone wants to do something similar in
> the future, we shouldn't block them).
>
>
>
> Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:
>
>> +1 for this, especially in the long term.  CREATE INDEX should do the
>> right thing for most people without requiring extra ceremony.
>>
>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>> jeremiah.jor...@gmail.com> wrote:
>>
>>> If the consensus is that SAI is the right default index, then we should
>>> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>>>
>>>
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
>>> wrote:
>>>
>>> Earlier today, Mick started a thread on the future of our index creation
>>> DDL on Slack:
>>>
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>>
>>> At the moment, there are two ways to create a secondary index.
>>>
>>> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>>>
>>> This creates an optionally named legacy 2i on the provided table and
>>> column.
>>>
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>>
>>> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> USING <class> [WITH OPTIONS = <options>]*
>>>
>>> This creates a secondary index on the provided table and column using
>>> the specified 2i implementation class and (optional) parameters.
>>>
>>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
>>> 'StorageAttachedIndex'
>>>
>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
>>> shorthand for the fully-qualified class name, which is also valid.)
>>>
>>> So what is there to discuss?
>>>
>>> The concern Mick raised is...
>>>
>>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
>>> doing 2i (when they think they are, and/or we definitely want them to be,
>>> using SAI)"
>>>
>>> To paraphrase, we want people to use SAI once it's available where
>>> possible, and the default behavior of CREATE INDEX could be at odds w/
>>> that.
>>>
>>> The proposal we seem to have landed on is something like the following:
>>>
>>> For 5.0:
>>>
>>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>>
>>> (Note: How this would interact w/ the existing secondary_indexes_enabled
>>> YAML options isn't clear yet.)
>>>
>>> Post-5.0:
>>>
>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>>> parity w/ it.
>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
>>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
>>> would both be flexible enough to accommodate index implementation selection
>>> and prescriptive enough to force the user to make a decision (and wouldn't
>>> change the legacy behavior of the existing CREATE INDEX). In this
>>> world, creating a legacy 2i might look something like CREATE
>>> INDEX...USING `legacy`.
>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>>
>>> Eventually we would have a single enabled DDL statement for index
>>> creation that would be minimal but also explicit/able to handle some
>>> evolution.
>>>
>>> What does everyone think?
>>>
>>>
>>>
>>
>> --
>> Jonathan Ellis

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Benedict
I’m not convinced by the changing defaults argument here. The
characteristics of the two index types are very different, and users with
scripts that make indexes today shouldn’t have their behaviour change.

We could introduce new syntax that properly appreciates there’s no default
index, perhaps CREATE LOCAL [type] INDEX? To also make clear that these
indexes involve a partition key or scatter gather

> On 10 May 2023, at 06:26, guo Maxwell wrote:
>
> +1, as we must improve the image of our own default indexing ability.
>
> As for CREATE CUSTOM INDEX, should we just leave it as it is, and disable
> the ability to create SAI through CREATE CUSTOM INDEX in some version
> after 5.0?
>
> As far as I know, there may be users using this as a plugin-index
> interface, like https://github.com/Stratio/cassandra-lucene-index (these
> projects may be inactive, but if someone wants to do something similar in
> the future, we shouldn't block them).
>
> Jonathan Ellis wrote on Wed, May 10, 2023 at 10:01:
>
>> +1 for this, especially in the long term.  CREATE INDEX should do the
>> right thing for most people without requiring extra ceremony.
>>
>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan wrote:
>>
>>> If the consensus is that SAI is the right default index, then we should
>>> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>>>
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe wrote:
>>>
>>> Earlier today, Mick started a thread on the future of our index creation
>>> DDL on Slack:
>>>
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>>
>>> At the moment, there are two ways to create a secondary index.
>>>
>>> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>>
>>> This creates an optionally named legacy 2i on the provided table and
>>> column.
>>>
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>>
>>> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> USING <class> [WITH OPTIONS = <options>]
>>>
>>> This creates a secondary index on the provided table and column using
>>> the specified 2i implementation class and (optional) parameters.
>>>
>>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
>>> 'StorageAttachedIndex'
>>>
>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
>>> shorthand for the fully-qualified class name, which is also valid.)
>>>
>>> So what is there to discuss?
>>>
>>> The concern Mick raised is...
>>>
>>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
>>> doing 2i (when they think they are, and/or we definitely want them to be,
>>> using SAI)"
>>>
>>> To paraphrase, we want people to use SAI once it's available where
>>> possible, and the default behavior of CREATE INDEX could be at odds w/
>>> that.
>>>
>>> The proposal we seem to have landed on is something like the following:
>>>
>>> For 5.0:
>>>
>>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>>
>>> (Note: How this would interact w/ the existing secondary_indexes_enabled
>>> YAML options isn't clear yet.)
>>>
>>> Post-5.0:
>>>
>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>>> parity w/ it.
>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
>>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
>>> would both be flexible enough to accommodate index implementation
>>> selection and prescriptive enough to force the user to make a decision
>>> (and wouldn't change the legacy behavior of the existing CREATE INDEX).
>>> In this world, creating a legacy 2i might look something like CREATE
>>> INDEX...USING `legacy`.
>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>>
>>> Eventually we would have a single enabled DDL statement for index
>>> creation that would be minimal but also explicit/able to handle some
>>> evolution.
>>>
>>> What does everyone think?
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>
> --
> you are the apple of my eye !


Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread guo Maxwell
+1 , as we must Improve the image of your own default indexing ability.

As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and then
disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
version after 5.0?

As far as I know, some users may be relying on this as a plugin-index
interface, like https://github.com/Stratio/cassandra-lucene-index (these
projects may be inactive now, but if someone wants to do something similar
in the future, we shouldn't stop them).



On Wed, May 10, 2023 at 10:01, Jonathan Ellis wrote:

> +1 for this, especially in the long term.  CREATE INDEX should do the
> right thing for most people without requiring extra ceremony.
>
> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
>> If the consensus is that SAI is the right default index, then we should
>> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>>
>>
>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
>> wrote:
>>
>> Earlier today, Mick started a thread on the future of our index creation
>> DDL on Slack:
>>
>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>
>> At the moment, there are two ways to create a secondary index.
>>
>> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>>
>> This creates an optionally named legacy 2i on the provided table and
>> column.
>>
>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>
>> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>> USING <class> [WITH OPTIONS = <options>]*
>>
>> This creates a secondary index on the provided table and column using the
>> specified 2i implementation class and (optional) parameters.
>>
>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
>> 'StorageAttachedIndex'
>>
>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
>> shorthand for the fully-qualified class name, which is also valid.)
>>
>> So what is there to discuss?
>>
>> The concern Mick raised is...
>>
>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
>> doing 2i (when they think they are, and/or we definitely want them to be,
>> using SAI)"
>>
>> To paraphrase, we want people to use SAI once it's available where
>> possible, and the default behavior of CREATE INDEX could be at odds w/
>> that.
>>
>> The proposal we seem to have landed on is something like the following:
>>
>> For 5.0:
>>
>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>
>> (Note: How this would interact w/ the existing secondary_indexes_enabled
>> YAML options isn't clear yet.)
>>
>> Post-5.0:
>>
>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>> parity w/ it.
>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
>> would both be flexible enough to accommodate index implementation selection
>> and prescriptive enough to force the user to make a decision (and wouldn't
>> change the legacy behavior of the existing CREATE INDEX). In this world,
>> creating a legacy 2i might look something like CREATE INDEX...USING
>> `legacy`.
>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>
>> Eventually we would have a single enabled DDL statement for index
>> creation that would be minimal but also explicit/able to handle some
>> evolution.
>>
>> What does everyone think?
>>
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 
you are the apple of my eye !


Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Dinesh Joshi
I agree. 5.0 is a major release and provides an opportunity to switch defaults.

> On May 9, 2023, at 7:00 PM, Jonathan Ellis  wrote:
> 
> +1 for this, especially in the long term.  CREATE INDEX should do the right 
> thing for most people without requiring extra ceremony.
> 
> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan wrote:
>> If the consensus is that SAI is the right default index, then we should just 
>> change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>> 
>> 
>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe wrote:
>>> 
>>> Earlier today, Mick started a thread on the future of our index creation 
>>> DDL on Slack:
>>> 
>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>> 
>>> At the moment, there are two ways to create a secondary index.
>>> 
>>> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>>> 
>>> This creates an optionally named legacy 2i on the provided table and column.
>>> 
>>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>> 
>>> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
>>> <class> [WITH OPTIONS = <options>]
>>> 
>>> This creates a secondary index on the provided table and column using the 
>>> specified 2i implementation class and (optional) parameters.
>>> 
>>> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
>>> 'StorageAttachedIndex'
>>> 
>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
>>> shorthand for the fully-qualified class name, which is also valid.)
>>> 
>>> So what is there to discuss?
>>> 
>>> The concern Mick raised is...
>>> 
>>> "...just folk continuing to use CREATE INDEX  because they think CREATE 
>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users 
>>> doing 2i (when they think they are, and/or we definitely want them to be, 
>>> using SAI)"
>>> 
>>> To paraphrase, we want people to use SAI once it's available where 
>>> possible, and the default behavior of CREATE INDEX could be at odds w/ that.
>>> 
>>> The proposal we seem to have landed on is something like the following:
>>> 
>>> For 5.0:
>>> 
>>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>> 
>>> (Note: How this would interact w/ the existing secondary_indexes_enabled 
>>> YAML options isn't clear yet.)
>>> 
>>> Post-5.0:
>>> 
>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity 
>>> w/ it.
>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
>>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This 
>>> would both be flexible enough to accommodate index implementation selection 
>>> and prescriptive enough to force the user to make a decision (and wouldn't 
>>> change the legacy behavior of the existing CREATE INDEX). In this world, 
>>> creating a legacy 2i might look something like CREATE INDEX...USING 
>>> `legacy`.
>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>> 
>>> Eventually we would have a single enabled DDL statement for index creation 
>>> that would be minimal but also explicit/able to handle some evolution.
>>> 
>>> What does everyone think?
>> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com 
> @spyced



Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jonathan Ellis
+1 for this, especially in the long term.  CREATE INDEX should do the right
thing for most people without requiring extra ceremony.

On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan 
wrote:

> If the consensus is that SAI is the right default index, then we should
> just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.
>
>
> On May 9, 2023, at 4:44 PM, Caleb Rackliffe 
> wrote:
>
> Earlier today, Mick started a thread on the future of our index creation
> DDL on Slack:
>
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>
> At the moment, there are two ways to create a secondary index.
>
> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>
> This creates an optionally named legacy 2i on the provided table and
> column.
>
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>
> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> USING <class> [WITH OPTIONS = <options>]*
>
> This creates a secondary index on the provided table and column using the
> specified 2i implementation class and (optional) parameters.
>
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
> 'StorageAttachedIndex'
>
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
> shorthand for the fully-qualified class name, which is also valid.)
>
> So what is there to discuss?
>
> The concern Mick raised is...
>
> "...just folk continuing to use CREATE INDEX  because they think CREATE
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
> doing 2i (when they think they are, and/or we definitely want them to be,
> using SAI)"
>
> To paraphrase, we want people to use SAI once it's available where
> possible, and the default behavior of CREATE INDEX could be at odds w/
> that.
>
> The proposal we seem to have landed on is something like the following:
>
> For 5.0:
>
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>
> (Note: How this would interact w/ the existing secondary_indexes_enabled
> YAML options isn't clear yet.)
>
> Post-5.0:
>
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity
> w/ it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
> would both be flexible enough to accommodate index implementation selection
> and prescriptive enough to force the user to make a decision (and wouldn't
> change the legacy behavior of the existing CREATE INDEX). In this world,
> creating a legacy 2i might look something like CREATE INDEX...USING
> `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>
> Eventually we would have a single enabled DDL statement for index creation
> that would be minimal but also explicit/able to handle some evolution.
>
> What does everyone think?
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jeremiah D Jordan
> If we assume SAI is what we should use by default for the cluster, would it 
> make sense to allow
> 
> CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> 
> But use a new yaml config that switches from legacy to SAI?
> 
> default_2i_impl: sai
> 
> For 5.0 we can default to “legacy” (new features disabled by default), but 
> allow operators to change this to SAI if they desire?

We have server-side DESCRIBE now, so if we have DESCRIBE always show every 
index as a CUSTOM INDEX (or some new syntax that specifies your index type) 
then we could definitely go with this “pick the default in the yaml”.  DESCRIBE 
would always be explicit about which index was in use for which column, so 
backup/restore would work no matter what the default was.

I like this idea.
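Jeremiah's backup/restore argument can be checked with a small Python sketch: if DESCRIBE always names the implementation, replaying its output on a cluster with a different default reproduces the same index. `IndexDef`, `create_index`, `describe`, and `restore` are hypothetical names, the rendered DDL uses the "new syntax that specifies your index type" rather than any shipped syntax, and `'sai'`/`'legacy'` are assumed implementation names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexDef:
    name: str
    table: str
    column: str
    impl: str  # the implementation actually in use, never a stand-in for "default"


def create_index(name: str, table: str, column: str,
                 impl: Optional[str], default_impl: str) -> IndexDef:
    """A plain CREATE INDEX (impl=None) picks up this node's configured default."""
    return IndexDef(name, table, column, impl if impl is not None else default_impl)


def describe(index: IndexDef) -> str:
    """Always render the implementation explicitly, whatever the default is."""
    return (f"CREATE INDEX {index.name} ON {index.table} ({index.column}) "
            f"USING '{index.impl}';")


# Hypothetical pattern matching exactly what describe() emits.
_DESCRIBED = re.compile(r"CREATE INDEX (\w+) ON ([\w.]+) \((\w+)\) USING '([^']+)';")


def restore(ddl: str, default_impl: str) -> IndexDef:
    """Replay DESCRIBE output on a cluster that may have a different default."""
    m = _DESCRIBED.fullmatch(ddl)
    if m is None:
        raise ValueError(f"unrecognised DDL: {ddl!r}")
    name, table, column, impl = m.groups()
    return create_index(name, table, column, impl, default_impl)
```

Because the restored statement carries an explicit `USING`, the target cluster's default never comes into play.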

> On May 9, 2023, at 5:11 PM, David Capwell  wrote:
> 
> If we assume SAI is what we should use by default for the cluster, would it 
> make sense to allow
> 
> CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> 
> But use a new yaml config that switches from legacy to SAI?
> 
> default_2i_impl: sai
> 
> For 5.0 we can default to “legacy” (new features disabled by default), but 
> allow operators to change this to SAI if they desire?
> 
>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
> 
> For 5.0, I would argue all indexes should be disabled by default and require 
> operators to allow… I am totally cool with a new allow list to allow some 
> impl..
> 
> secondary_indexes_enabled: false
> secondary_indexes_impl_allowed: [] # default
> 
> The allow list would default to empty, but operators could set ['sai'] if
> they wish to allow SAI… this does have weird semantics, as it causes _enabled
> to be ignored… the list could also replace _enabled, but then what is allowed
> in the true case isn’t 100% clear?  Maybe you need _enabled=true and this
> allow list limits what is actually allowed (probably way more clear)?
> 
> 
>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
>> both be flexible enough to accommodate index implementation selection and 
>> prescriptive enough to force the user to make a decision (and wouldn't 
>> change the legacy behavior of the existing CREATE INDEX). In this world, 
>> creating a legacy 2i might look something like CREATE INDEX...USING `legacy`.
> 
> I do not mind a new syntax that tries to be more clear, but the “replace” is 
> what I would push back against… we should keep the 2 existing syntaxes and not 
> force users to migrate… we can logically merge the 3 syntaxes, but we should 
> not remove the 2 others.
> 
> CREATE INDEX - gets rewritten to CREATE INDEX… USING config.default_2i_impl
> CREATE CUSTOM INDEX - gets rewritten to the new USING syntax
> 
>> 3.) Eventually deprecate CREATE CUSTOM INDEX…USING.
> 
> I don’t mind producing a warning telling users it’s best to use the new 
> syntax, but if it’s low effort for us to maintain, we should keep it… and 
> since this can be rewritten to the new format in the parser, it should be 
> low effort to support, so we should?
> 
>> On May 9, 2023, at 2:44 PM, Caleb Rackliffe  wrote:
>> 
>> Earlier today, Mick started a thread on the future of our index creation DDL 
>> on Slack:
>> 
>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>> 
>> At the moment, there are two ways to create a secondary index.
>> 
>> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>> 
>> This creates an optionally named legacy 2i on the provided table and column.
>> 
>> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>> 
>> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
>> <class> [WITH OPTIONS = <options>]
>> 
>> This creates a secondary index on the provided table and column using the 
>> specified 2i implementation class and (optional) parameters.
>> 
>>ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
>> 'StorageAttachedIndex'
>> 
>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
>> shorthand for the fully-qualified class name, which is also valid.)
>> 
>> So what is there to discuss?
>> 
>> The concern Mick raised is...
>> 
>> "...just folk continuing to use CREATE INDEX  because they think CREATE 
>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users 
>> doing 2i (when they think they are, and/or we definitely want them to be, 
>> using SAI)"
>> 
>> To paraphrase, we want people to use SAI once it's available where possible, 
>> and the default behavior of CREATE INDEX could be at odds w/ that.
>> 
>> The proposal we seem to have landed on is something like the following:
>> 
>> For 5.0:
>> 
>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>> 
>> (Note: How this would 

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jeremiah D Jordan
If the consensus is that SAI is the right default index, then we should just 
change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX.


> On May 9, 2023, at 4:44 PM, Caleb Rackliffe  wrote:
> 
> Earlier today, Mick started a thread on the future of our index creation DDL 
> on Slack:
> 
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
> 
> At the moment, there are two ways to create a secondary index.
> 
> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> 
> This creates an optionally named legacy 2i on the provided table and column.
> 
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
> 
> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
> <class> [WITH OPTIONS = <options>]
> 
> This creates a secondary index on the provided table and column using the 
> specified 2i implementation class and (optional) parameters.
> 
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
> 'StorageAttachedIndex'
> 
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
> shorthand for the fully-qualified class name, which is also valid.)
> 
> So what is there to discuss?
> 
> The concern Mick raised is...
> 
> "...just folk continuing to use CREATE INDEX  because they think CREATE 
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users doing 
> 2i (when they think they are, and/or we definitely want them to be, using 
> SAI)"
> 
> To paraphrase, we want people to use SAI once it's available where possible, 
> and the default behavior of CREATE INDEX could be at odds w/ that.
> 
> The proposal we seem to have landed on is something like the following:
> 
> For 5.0:
> 
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
> 
> (Note: How this would interact w/ the existing secondary_indexes_enabled YAML 
> options isn't clear yet.)
> 
> Post-5.0:
> 
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity w/ 
> it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
> both be flexible enough to accommodate index implementation selection and 
> prescriptive enough to force the user to make a decision (and wouldn't change 
> the legacy behavior of the existing CREATE INDEX). In this world, creating a 
> legacy 2i might look something like CREATE INDEX...USING `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
> 
> Eventually we would have a single enabled DDL statement for index creation 
> that would be minimal but also explicit/able to handle some evolution.
> 
> What does everyone think?



Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread David Capwell
If we assume SAI is what we should use by default for the cluster, would it 
make sense to allow

CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)

But use a new yaml config that switches from legacy to SAI?

default_2i_impl: sai

For 5.0 we can default to “legacy” (new features disabled by default), but 
allow operators to change this to SAI if they desire?

> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.

For 5.0, I would argue all indexes should be disabled by default and require 
operators to allow… I am totally cool with a new allow list to allow some impl..

secondary_indexes_enabled: false
secondary_indexes_impl_allowed: [] # default

The allow list would default to empty, but operators could set ['sai'] if they
wish to allow SAI… this does have weird semantics, as it causes _enabled to be
ignored… the list could also replace _enabled, but then what is allowed in the
true case isn’t 100% clear?  Maybe you need _enabled=true and this allow list
limits what is actually allowed (probably way more clear)?
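The "you need _enabled=true and the allow list limits what is allowed" variant can be sketched in a few lines of Python. `secondary_indexes_enabled` mirrors the existing guardrail name, but `secondary_indexes_impl_allowed` is only the option floated above; treating an empty list as "nothing allowed" is an assumption of this sketch, not settled behaviour.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SecondaryIndexConfig:
    # Defaults follow the proposal for 5.0: everything off until an operator opts in.
    secondary_indexes_enabled: bool = False
    # Hypothetical allow list; an empty list is assumed to permit nothing.
    secondary_indexes_impl_allowed: List[str] = field(default_factory=list)

    def is_allowed(self, impl: str) -> bool:
        """An implementation may be created only if indexes are enabled AND it is allow-listed."""
        return self.secondary_indexes_enabled and impl in self.secondary_indexes_impl_allowed
```

Requiring both conditions avoids the "weird semantics" above: the allow list never silently overrides `_enabled`.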


> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
> both be flexible enough to accommodate index implementation selection and 
> prescriptive enough to force the user to make a decision (and wouldn't change 
> the legacy behavior of the existing CREATE INDEX). In this world, creating a 
> legacy 2i might look something like CREATE INDEX...USING `legacy`.

I do not mind a new syntax that tries to be more clear, but the “replace” is 
what I would push back against… we should keep the 2 existing syntaxes and not 
force users to migrate… we can logically merge the 3 syntaxes, but we should 
not remove the 2 others.

CREATE INDEX - gets rewritten to CREATE INDEX… USING config.default_2i_impl
CREATE CUSTOM INDEX - gets rewritten to the new USING syntax

> 3.) Eventually deprecate CREATE CUSTOM INDEX…USING.

I don’t mind producing a warning telling users it’s best to use the new syntax, 
but if it’s low effort for us to maintain, we should keep it… and since this 
can be rewritten to the new format in the parser, it should be low effort to 
support, so we should?
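The parser-level rewrite described above could look roughly like this Python sketch, assuming all three statement forms are parsed into one internal representation. `UnifiedCreateIndex`, `rewrite`, and the warning text are hypothetical, and `default_2i_impl` is the yaml option proposed in this message, not an existing setting.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UnifiedCreateIndex:
    table: str
    column: str
    using: str  # always filled in; the unified form never leaves it implicit


def rewrite(table: str, column: str, custom: bool, using: Optional[str],
            default_2i_impl: str) -> Tuple[UnifiedCreateIndex, List[str]]:
    """Normalise all three statement forms into CREATE INDEX ... USING.

    Returns the rewritten statement plus any client warnings to emit.
    """
    warnings: List[str] = []
    if custom:
        if using is None:
            raise ValueError("CREATE CUSTOM INDEX requires USING")
        # The old CUSTOM syntax keeps working; users are only nudged toward the new form.
        warnings.append("CREATE CUSTOM INDEX is supported, but CREATE INDEX ... USING is preferred")
        return UnifiedCreateIndex(table, column, using), warnings
    if using is None:
        # Plain CREATE INDEX: fill in the operator-configured default.
        return UnifiedCreateIndex(table, column, default_2i_impl), warnings
    # New syntax: the user already named an implementation.
    return UnifiedCreateIndex(table, column, using), warnings
```

Since every form ends up as the same internal statement, keeping the two existing syntaxes costs little beyond the parser rules themselves.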

> On May 9, 2023, at 2:44 PM, Caleb Rackliffe  wrote:
> 
> Earlier today, Mick started a thread on the future of our index creation DDL 
> on Slack:
> 
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
> 
> At the moment, there are two ways to create a secondary index.
> 
> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> 
> This creates an optionally named legacy 2i on the provided table and column.
> 
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
> 
> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
> <class> [WITH OPTIONS = <options>]
> 
> This creates a secondary index on the provided table and column using the 
> specified 2i implementation class and (optional) parameters.
> 
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
> 'StorageAttachedIndex'
> 
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
> shorthand for the fully-qualified class name, which is also valid.)
> 
> So what is there to discuss?
> 
> The concern Mick raised is...
> 
> "...just folk continuing to use CREATE INDEX  because they think CREATE 
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users doing 
> 2i (when they think they are, and/or we definitely want them to be, using 
> SAI)"
> 
> To paraphrase, we want people to use SAI once it's available where possible, 
> and the default behavior of CREATE INDEX could be at odds w/ that.
> 
> The proposal we seem to have landed on is something like the following:
> 
> For 5.0:
> 
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
> 
> (Note: How this would interact w/ the existing secondary_indexes_enabled YAML 
> options isn't clear yet.)
> 
> Post-5.0:
> 
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity w/ 
> it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
> both be flexible enough to accommodate index implementation selection and 
> prescriptive enough to force the user to make a decision (and wouldn't change 
> the legacy behavior of the existing CREATE INDEX). In this world, creating a 
> legacy 2i might look something like CREATE INDEX...USING `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
> 
> Eventually we would have a single enabled DDL statement for index creation 
> that would be minimal but also explicit/able to handle some evolution.
> 
> What does everyone think?



[DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Caleb Rackliffe
Earlier today, Mick started a thread on the future of our index creation
DDL on Slack:

https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019

At the moment, there are two ways to create a secondary index.

*1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*

This creates an optionally named legacy 2i on the provided table and column.

ex. CREATE INDEX my_index ON ks.tbl(my_text_col)

*2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING
<class> [WITH OPTIONS = <options>]*

This creates a secondary index on the provided table and column using the
specified 2i implementation class and (optional) parameters.

ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
'StorageAttachedIndex'

(Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
shorthand for the fully-qualified class name, which is also valid.)
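A minimal Python sketch of how the two existing forms end up choosing an implementation. The `ALIASES` table and `resolve_index_class` function are hypothetical illustrations, not Cassandra's real alias handling; the fully-qualified names are assumed to be the SAI, SASI, and built-in legacy 2i classes from the Cassandra source tree.

```python
from typing import Optional

# Hypothetical alias table: short names expand to fully-qualified class names.
ALIASES = {
    "StorageAttachedIndex": "org.apache.cassandra.index.sai.StorageAttachedIndex",
    "SASIIndex": "org.apache.cassandra.index.sasi.SASIIndex",
}

# Assumed class behind a plain CREATE INDEX (the legacy 2i).
LEGACY_2I = "org.apache.cassandra.index.internal.CassandraIndex"


def resolve_index_class(custom: bool, using: Optional[str] = None) -> str:
    """Return the implementation class a CREATE [CUSTOM] INDEX statement would use."""
    if not custom:
        # Plain CREATE INDEX has no USING clause, so it can only build a legacy 2i.
        if using is not None:
            raise ValueError("CREATE INDEX does not accept USING")
        return LEGACY_2I
    if using is None:
        raise ValueError("CREATE CUSTOM INDEX requires USING '<class>'")
    # A known alias expands to its fully-qualified name; anything else is
    # treated as an already fully-qualified class name.
    return ALIASES.get(using, using)
```

The sketch makes Mick's concern visible: the plain form has no way to name an implementation, so it silently yields a legacy 2i.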

So what is there to discuss?

The concern Mick raised is...

"...just folk continuing to use CREATE INDEX  because they think CREATE
CUSTOM INDEX is advanced (or just don't know of it), and we leave users
doing 2i (when they think they are, and/or we definitely want them to be,
using SAI)"

To paraphrase, we want people to use SAI once it's available where
possible, and the default behavior of CREATE INDEX could be at odds w/ that.

The proposal we seem to have landed on is something like the following:

For 5.0:

1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
2.) Leave CREATE CUSTOM INDEX...USING... available by default.

(Note: How this would interact w/ the existing secondary_indexes_enabled
YAML options isn't clear yet.)
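The two 5.0 items amount to a guardrail on the statement form. Here is a rough Python sketch; the flag names are hypothetical (not existing cassandra.yaml options), and because the interaction with `secondary_indexes_enabled` is still open, the sketch leaves it out.

```python
from dataclasses import dataclass


@dataclass
class IndexCreationGuardrails:
    # Hypothetical flags, not existing cassandra.yaml options.
    # Item 1: plain CREATE INDEX (which builds a legacy 2i) is off by default.
    legacy_create_index_enabled: bool = False
    # Item 2: CREATE CUSTOM INDEX ... USING stays available by default.
    custom_create_index_enabled: bool = True


def check_create_index(custom: bool, guardrails: IndexCreationGuardrails) -> None:
    """Raise PermissionError if this statement form is disabled."""
    if custom:
        if not guardrails.custom_create_index_enabled:
            raise PermissionError("CREATE CUSTOM INDEX is disabled")
    elif not guardrails.legacy_create_index_enabled:
        raise PermissionError(
            "CREATE INDEX would build a legacy 2i, which is disabled by default; "
            "use CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex' instead"
        )
```

The error message doubles as documentation, pointing users who reach for plain CREATE INDEX toward SAI.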

Post-5.0:

1.) Deprecate and eventually remove SASI when SAI hits full feature parity
w/ it.
2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
hybrid between the two. For example, CREATE INDEX...USING...WITH. This
would both be flexible enough to accommodate index implementation selection
and prescriptive enough to force the user to make a decision (and wouldn't
change the legacy behavior of the existing CREATE INDEX). In this world,
creating a legacy 2i might look something like CREATE INDEX...USING `legacy`.
3.) Eventually deprecate CREATE CUSTOM INDEX...USING.

Eventually we would have a single enabled DDL statement for index creation
that would be minimal but also explicit/able to handle some evolution.
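The hybrid statement from post-5.0 item 2 could be parsed along these lines. This is a hypothetical regex-based Python sketch, not Cassandra's ANTLR grammar. Its point is that `USING` becomes mandatory, so a statement that omits it is rejected instead of silently producing a legacy 2i. The `'legacy'` name follows the example above and is not a settled keyword; the `WITH` clause is captured as raw text and not interpreted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the proposed hybrid statement:
#   CREATE INDEX [IF NOT EXISTS] [name] ON <table>(<column>) USING '<impl>' [WITH ...]
# USING is mandatory, which is what forces the user to pick an implementation.
_CREATE_INDEX = re.compile(
    r"""
    CREATE\s+INDEX\s+
    (?P<ine>IF\s+NOT\s+EXISTS\s+)?
    (?:(?P<name>\w+)\s+)?
    ON\s+(?P<table>[\w.]+)\s*\(\s*(?P<column>\w+)\s*\)\s+
    USING\s+'(?P<impl>[\w.]+)'
    (?:\s+WITH\s+(?P<with_clause>.+?))?
    \s*;?\s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


@dataclass
class CreateIndex:
    name: Optional[str]
    table: str
    column: str
    impl: str
    if_not_exists: bool
    with_clause: Optional[str]


def parse_create_index(cql: str) -> CreateIndex:
    """Parse the unified form; reject any statement that omits USING."""
    m = _CREATE_INDEX.match(cql.strip())
    if m is None:
        raise ValueError("expected CREATE INDEX ... ON <table>(<column>) USING '<impl>'")
    return CreateIndex(
        name=m.group("name"),
        table=m.group("table"),
        column=m.group("column"),
        impl=m.group("impl"),
        if_not_exists=m.group("ine") is not None,
        with_clause=m.group("with_clause"),
    )
```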

What does everyone think?