Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-19 Thread Štefan Miklošovič
Hi Jyothsna,

I find the current state of the CEP in line with our discussion, although it
does not seem to include everything.

What is missing:

1) the fact that you want to back this by TCM
2) that there will be a virtual table in which a user can see the comments
3) a mention of USE KEYSPACE and autocompletion
4) the part about "WITH COMMENTS" for CREATE TABLE LIKE (I see it is
mentioned among the "goals", but I would like it to be more explicit
further down the document, with examples).

While the CEP does not need to be overly specific about implementation
details, as they may vary a bit as the implementation progresses and one
cannot know 100% how it will look upon delivery, I think it is still possible
to specify the most important parts. I consider 1) and 2) quite important
aspects which shape the whole solution, and they should be mentioned in the
CEP explicitly.

Once we craft the CEP like this, it will be formally voted on, and not
everybody voting on it is necessarily following this discussion, where these
(quite important) technical details were decided. As they are going to read
the CEP as a whole before casting their vote, I think it is reasonable to
inform them about the intricacies of the overall solution without them having
to go through this thread themselves.

Would it make sense for you to add all this information to the CEP?

There is also the section "Dropping and renaming annotations", which I think
is no longer relevant / is redundant and can be removed?

There are also remnants of "annotations" in the non-goals section. Can these
go away too? I think we have come to prefer COMMENT ON over @PII (which is
what I understand "annotation" means in this context).

Regards and thank you


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-18 Thread Patrick McFadin
Great update. +1 from me!

Patrick


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-18 Thread Jyothsna Konisa
Hi Everyone,

Thank you for all the great feedback! I’ve updated the proposed grammar in
the CEP to align with our discussion and adopt PostgreSQL-style CQL
statements. Below are a few clarifications on specific points:

*Providers*
For SECURITY LABEL, we will accept CQL statements such as:

SECURITY LABEL [FOR provider] ON object IS 'label';
However, the FOR provider clause will not be implemented in this CEP, as the
scope here is limited to enriching schema elements. When a provider is
supplied, its value will be ignored and the server will log a warning about
the ignored field. Support for providers can be addressed in a future CEP,
which would further strengthen Cassandra's security posture.
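
To make the provider handling above concrete, here is a minimal Python sketch. The helper `apply_security_label` and the plain dict standing in for schema metadata are hypothetical, not Cassandra code: the optional provider is accepted, its value is discarded, and a warning is logged, while the label itself is still recorded.

```python
import logging

logger = logging.getLogger("security_label_sketch")


def apply_security_label(labels, target, label, provider=None):
    """Record `label` for `target`; a supplied provider is accepted but ignored."""
    if provider is not None:
        # Providers are out of scope: warn instead of rejecting the statement.
        logger.warning("SECURITY LABEL provider '%s' is not supported; ignoring it", provider)
    labels[target] = label
    return labels


labels = {}
apply_security_label(labels, "ks.tb.val", "PII")
apply_security_label(labels, "ks.tb.id", "internal", provider="my_provider")
print(labels)  # {'ks.tb.val': 'PII', 'ks.tb.id': 'internal'}
```

Warning rather than failing keeps statements written for a future provider-aware version accepted today.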

*USE KEYSPACE and Autocomplete*
We will support omitting the keyspace when USE KEYSPACE is active, enabling
the expected autocomplete behavior.

*CREATE TABLE LIKE*
I agree with the suggestion to add a WITH COMMENTS option. By default,
comments will not be copied. For example:
CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND COMMENTS AND SECURITY
LABEL;
Thanks again for the thoughtful discussion and valuable input!

Best,
Jyothsna


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-13 Thread Štefan Miklošovič
Thank you very much, Jyothsna, for being so receptive to community
suggestions. Really appreciate it.

Regarding your last example of comment creation:

COMMENT ON COLUMN ks.tb.val IS 'credit card number'
SECURITY LABEL ON COLUMN ks.tb.val IS 'PII'

Since Cassandra also has the concept of keyspaces, when I compare it with
PG, which has this

COMMENT ON COLUMN my_table.my_column IS 'Employee ID number';

and we would have this

COMMENT ON COLUMN ks.tb.val IS 'credit card number'

The construct of "ks.tb.val" is rather unusual, but I think we could
definitely live with it.

One more caveat to all these examples is that if we have

USE ks;

then this should "autocomplete" ks:

COMMENT ON COLUMN tb.val IS 'credit card number'

Similarly, it would be nice if it worked like that for all other elements
which logically reside in a keyspace.
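
A minimal Python sketch of this keyspace "autocompletion" for column references. `resolve_column` is a hypothetical helper, and it assumes unquoted identifiers without embedded dots (case folding and quoted names are ignored): three-part names are taken as-is, and two-part names borrow the keyspace selected with `USE`.

```python
def resolve_column(ref, current_keyspace=None):
    """Split a column reference into (keyspace, table, column)."""
    parts = ref.split(".")
    if len(parts) == 3:          # ks.tb.col: fully qualified, used as-is
        return tuple(parts)
    if len(parts) == 2:          # tb.col: keyspace comes from USE
        if current_keyspace is None:
            raise ValueError(f"'{ref}' has no keyspace and no USE <keyspace> is in effect")
        return (current_keyspace, parts[0], parts[1])
    raise ValueError(f"invalid column reference '{ref}'")


print(resolve_column("ks.tb.val"))                      # ('ks', 'tb', 'val')
print(resolve_column("tb.val", current_keyspace="ks"))  # ('ks', 'tb', 'val')
```

Failing loudly when neither form supplies a keyspace avoids silently attaching a comment to the wrong table.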

There is also "CREATE TABLE LIKE", introduced recently (1, 2, 3), and if we
copy a table like that into another one, it is questionable whether we
should automatically copy all its comments too. We could follow how it is
done for indexes:

CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES;

so here it would be

CREATE TABLE ks.tb_copy LIKE ks.tb WITH COMMENTS;

and in case of both specified:

CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND COMMENTS;

and by default comments would _not_ be copied over.
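
The opt-in copy semantics proposed above can be sketched in a few lines of Python. The dict layout and `create_table_like` are hypothetical; the point is only that indexes, comments and (per the later grammar update) security labels stay behind unless explicitly requested.

```python
import copy


def create_table_like(source, new_name, with_indexes=False,
                      with_comments=False, with_security_labels=False):
    """Copy a table definition; indexes, comments and labels only on request."""
    return {
        "name": new_name,
        "columns": copy.deepcopy(source["columns"]),
        "indexes": copy.deepcopy(source["indexes"]) if with_indexes else [],
        "comments": copy.deepcopy(source["comments"]) if with_comments else {},
        "security_labels": (copy.deepcopy(source["security_labels"])
                            if with_security_labels else {}),
    }


tb = {
    "name": "ks.tb",
    "columns": {"id": "int", "val": "text"},
    "indexes": ["val"],  # indexed columns; naming of copied indexes is glossed over
    "comments": {"val": "credit card number"},
    "security_labels": {"val": "PII"},
}

plain = create_table_like(tb, "ks.tb_copy")  # CREATE TABLE ks.tb_copy LIKE ks.tb;
full = create_table_like(tb, "ks.tb_copy", with_indexes=True,
                         with_comments=True, with_security_labels=True)
print(plain["comments"], full["comments"])  # {} {'val': 'credit card number'}
```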

Regards and thank you

(1) https://issues.apache.org/jira/browse/CASSANDRA-7662
(2) https://issues.apache.org/jira/browse/CASSANDRA-19964
(3) https://issues.apache.org/jira/browse/CASSANDRA-19965


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-12 Thread Jyothsna Konisa
Hi Stefan, Patrick, and everyone,

Thank you all for your valuable feedback and suggestions. I've consolidated
the key points and wanted to share our thinking on a path forward.


*Regarding the PostgreSQL-style Syntax (COMMENT ON & SECURITY LABEL)*

We agree with the consensus that adopting PostgreSQL-style syntax is the
most promising approach for the following reasons, which were
well-articulated in the thread:

- Avoids introducing new syntax

- Keeps CQL closer to mainstream SQL

- More SQL than CQL in LLM training data

*Storing Annotations*
We propose to store these comments as part of the schema element's
metadata, which will be persisted to TCM.

Regarding the discussion about a separate table for annotations: we would
like to propose, as an alternative, exposing annotations/comments through a
virtual table. We can address this during implementation or as a follow-up
to this CEP.
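
A toy Python model of this storage proposal, assuming comments are plain attributes of column metadata. `replay` is only a stand-in for TCM (it is not TCM's API): nodes that apply the same committed transformations in epoch order end up with identical metadata. It also shows two consequences raised in this thread: dropping a column takes its comment with it, and a virtual table could derive its rows from the metadata instead of storing them.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnMetadata:
    name: str
    cql_type: str
    comment: Optional[str] = None  # the comment lives on the column itself


@dataclass(frozen=True)
class TableMetadata:
    keyspace: str
    name: str
    columns: Tuple[ColumnMetadata, ...]


def comment_on_column(table, column, text):
    """COMMENT ON COLUMN: return new, immutable metadata with the comment set."""
    cols = tuple(replace(c, comment=text) if c.name == column else c
                 for c in table.columns)
    return replace(table, columns=cols)


def drop_column(table, column):
    """Dropping a column drops its comment with it; nothing else to clean up."""
    return replace(table, columns=tuple(c for c in table.columns if c.name != column))


def replay(log, initial):
    """Toy stand-in for TCM: apply committed transformations in epoch order."""
    state = initial
    for _epoch, transform in sorted(log, key=lambda entry: entry[0]):
        state = transform(state)
    return state


def comment_rows(tables):
    """Rows a comments virtual table could expose, derived from metadata on read."""
    return [(t.keyspace, t.name, c.name, c.comment)
            for t in tables for c in t.columns if c.comment is not None]


initial = TableMetadata("ks", "tb", (ColumnMetadata("id", "int"), ColumnMetadata("val", "text")))
log = [(1, lambda t: comment_on_column(t, "val", "credit card number"))]

node_a = replay(log, initial)  # every node replays the same log...
node_b = replay(log, initial)  # ...and ends up with identical metadata
print(comment_rows([node_a]))  # [('ks', 'tb', 'val', 'credit card number')]

log.append((2, lambda t: drop_column(t, "val")))
print(comment_rows([replay(log, initial)]))  # []
```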

*Impact on DESCRIBE Statements*

Adopting the COMMENT ON syntax will require some changes to how the schema
is displayed.

To maintain consistency and ensure the schema can be fully reproduced, the
COMMENT ON statements must be included in the output of DESCRIBE TABLE. We
propose that the output for DESCRIBE TABLE would look something like this:


// Comment creation & DESC table output
CREATE TABLE ks.tb (
    id int PRIMARY KEY,
    val text
);

COMMENT ON COLUMN ks.tb.val IS 'credit card number';
SECURITY LABEL ON COLUMN ks.tb.val IS 'PII';


Including the comment information within the CREATE TABLE statement itself
might be redundant, and displaying it as separate COMMENT ON statements
might be better.
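
A minimal Python sketch of DESCRIBE output with separate COMMENT ON and SECURITY LABEL statements. `describe_table` and its `with_comments` flag are hypothetical; the flag defaults to off here to mirror the opt-in `DESCRIBE ... WITH COMMENTS` suggested elsewhere in this thread, which is an assumption rather than a decision. Single quotes inside comment text are doubled, which is how CQL escapes them in string literals, so the output stays replayable. Only a single-column primary key is handled.

```python
def cql_literal(text):
    """Quote text as a CQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def describe_table(keyspace, table, columns, primary_key,
                   comments=None, labels=None, with_comments=False):
    """Render CREATE TABLE, optionally followed by COMMENT ON and
    SECURITY LABEL statements, as separate, replayable statements."""
    comments = comments or {}
    labels = labels or {}
    fq_table = f"{keyspace}.{table}"
    column_defs = [
        f"    {name} {cql_type}" + (" PRIMARY KEY" if name == primary_key else "")
        for name, cql_type in columns
    ]
    statements = [f"CREATE TABLE {fq_table} (\n" + ",\n".join(column_defs) + "\n);"]
    if with_comments:
        extra = []
        for name, _ in columns:
            if name in comments:
                extra.append(f"COMMENT ON COLUMN {fq_table}.{name} IS {cql_literal(comments[name])};")
            if name in labels:
                extra.append(f"SECURITY LABEL ON COLUMN {fq_table}.{name} IS {cql_literal(labels[name])};")
        if extra:
            statements.append("\n".join(extra))
    return "\n\n".join(statements)


print(describe_table("ks", "tb", [("id", "int"), ("val", "text")], "id",
                     comments={"val": "credit card number"},
                     labels={"val": "PII"},
                     with_comments=True))
```

Keeping the extra statements outside CREATE TABLE means the CREATE TABLE text itself is unchanged whether or not comments are requested.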

Thanks
Jyothsna


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-12 Thread Štefan Miklošovič
One more point I would like to add. If we enrich the output with comments,
I think showing comments should be the default only if I can take what
DESCRIBE prints, copy it as-is and create tables from it. Very often,
DESCRIBE is used as "I will copy this schema here so I can reconstruct it
later", so I would expect that, by default, what DESCRIBE gives is
"reconstructable". I think there are already a lot of tests which verify
that what DESCRIBE prints can be reconstructed, and this would need to be
preserved.

We might still do "DESCRIBE ks.tb" without comments / annotations and then
"DESCRIBE ks.tb WITH COMMENTS / ANNOTATIONS" to print them.

If we put comments in like this, it is "reconstructable by copy-pasting" as well:

create table ks.tb
(
-- my primary key column
id int primary key,
-- this is my value
val text
);

however this is not

create table ks.tb
(
/**
 my primary key column
*/
id int primary key,
val text
)

you get the idea ...

Also, if we start to automatically enrich DESCRIBE output, it would be very
nice if it were digestible by previous versions. If I copy DESCRIBE output
from 5.1 containing @PII, I cannot just apply it to 5.0, where that concept
is not known yet. Plain comments, however, work in previous versions as well.

For this reason, I would not make annotations visible by default; I would
make them opt-in via WITH COMMENTS / WITH ANNOTATIONS only and keep the
current output as is.
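
The "reconstructable" property can be checked with a round trip. In this Python sketch, a toy regex parser stands in for actually executing the statements against a cluster, and `render_comments` / `parse_comments` are hypothetical helpers: rendering comments as COMMENT ON statements and parsing them back must yield the original mapping, including comments that contain quotes.

```python
import re

# Matches: COMMENT ON COLUMN ks.tb.col IS '...'; where '' inside the literal is an escaped quote.
COMMENT_RE = re.compile(r"COMMENT ON COLUMN (\w+)\.(\w+)\.(\w+) IS '((?:[^']|'')*)';")


def render_comments(comments):
    """Render {(keyspace, table, column): text} as COMMENT ON statements."""
    lines = []
    for (ks, tb, col), text in sorted(comments.items()):
        escaped = text.replace("'", "''")
        lines.append(f"COMMENT ON COLUMN {ks}.{tb}.{col} IS '{escaped}';")
    return "\n".join(lines)


def parse_comments(cql):
    """Recover the mapping from COMMENT ON statements (toy parser, not CQL's grammar)."""
    return {(ks, tb, col): text.replace("''", "'")
            for ks, tb, col, text in COMMENT_RE.findall(cql)}


original = {
    ("ks", "tb", "id"): "it's the primary key",
    ("ks", "tb", "val"): "credit card number",
}
rendered = render_comments(original)
assert parse_comments(rendered) == original  # round trip preserves every comment
```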



Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-12 Thread Mick
a point of order and a reminder: aside from suggestions that the CEP author is
free to adopt or not, anything that presumes to steer what the CEP should be
should be accompanied by the willingness to commit to helping make it
happen. we want to work as a meritocracy: those who lead the work have the
say, and blocking their chosen approach against their wishes should only
happen for clear technical reasons. API designs (CQL additions) always need
to be chosen and evolved carefully, and every proposed CEP should be open to
that being a natural part of its pre-vote discussion.

following the PG approach does make a lot of sense.
what are your thoughts on it, Jyothsna & Yifan?



> On 12 Aug 2025, at 09:14, Štefan Miklošovič  wrote:
> 
> I like the idea of COMMENT ON and the like from PG! Yes, great stuff, as we
> do not invent anything custom and stay as close as possible to the industry
> standard.
> 
> So, if I understand this correctly, on COMMENT ON, we would save each comment 
> to a dedicated table. Then on DESCRIBE, we would "enrich" the CQL element we 
> are describing with commentary, if any, from that comment table, correct?
> 
> I, in general, support this idea, but as usual the devil is in the details. I 
> am just genuinely curious how this would work in practice.
> 
> 
> If we go with COMMENT ON, is this going to be stored in TCM or not?
> 
> 
> If the answer is yes, then it is much simpler, because the commentary would
> be distributed by means of TCM and each node would apply this transformation
> locally to system_schema.annotations.
> 
> If the answer is no and if there is a cluster and we do COMMENT ON, then this 
> comment has to be saved to a table. If we rule out TCM as a vehicle for the 
> dispersion of these comments, that comment table has to be distributed / 
> replicated, correct? I do not think that we can create that table under 
> system_schema then, as that is on LocalStrategy and all modifications to that 
> are, as I understand it, done via TCM?
> 
> Hence, I guess the better place for that is under system_distributed? That 
> means that if somebody changes that keyspace to NTS or nodes are not 
> available, we will not be able to create any commentary.
> 
> Also, if we remove / alter anything, like dropping a keyspace, table, index, 
> removing column etc ... all these changes would need to also remove 
> respective comments from that table etc etc.
> 
> For these reasons, I think that having dedicated system_schema.annotations 
> table while interacting with it via COMMENT ON to be "PG-compatible" so 
> people can query that table directly, and backing COMMENT ON by TCM by having 
> it as another transformation (as COMMENT ON is inherently part of the schema) 
> is the best way to do this. 
> 
> On Mon, Aug 11, 2025 at 10:55 PM Patrick McFadin  wrote:
> One (of many) reasons I'm advocating we migrate away from CQL. It served a 
> purpose at the time, but this project is evolving and this to me seems like 
> the logical next iteration. The Cassandra project has built its reputation 
> on what it can do, not clever syntax design. ;) 
> 
> Patrick
> 
> On Mon, Aug 11, 2025 at 1:51 PM Yifan Cai  wrote:
> The reasonings on operator and LLM familiarity are spot on. 
> 
> I have experimented with LLM generated queries. It typically does a 
> noticeably better job on SQL than CQL. 
> 
> - Yifan
> 
> On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin  wrote:
> I really love this CEP.  +1 on the goal. 
> 
> As you've already seen, I've been advocating to improve our syntax ergonomics 
> towards more mainstream SQL and avoiding new/custom syntax.  I would suggest 
> the following changes towards that goal:
>  - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing table 
> comments to that). For structured tags, mirror SECURITY LABEL[2]:
> SECURITY LABEL FOR <provider> ON <object> IS '<label>'; 
> 
> - Allow multiple providers per object. Store the value as text in v1 (JSON or 
> key/val later if we want), which avoids inventing new inline @ syntax.
> 
>  - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas 
> readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL 
> right after DDL, like PG users do today.
> 
>  - Names & built-ins. Case-insensitive provider names with canonical 
> lowercase. No separate @Description type. COMMENT ON already covers that use 
> case cleanly.
> 
>  - Introspection by query and by DESC. Keep annotations visible in DESCRIBE, 
> but also expose a single system_schema.annotations view (provider, 
> object_type, object_name, sub_name, value) so folks can get all annotations 
> for a table. Example: “find all columns labeled PII,” etc.
> 
> Why PG-like? Besides operator familiarity, there’s far more training data and 
> tooling around COMMENT ON/SECURITY LABEL than around bespoke @annotation 
> syntax. Sticking to that shape reduces LLM/tool friction and avoids teaching 
> the world a new grammar. This has been a

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-12 Thread Štefan Miklošovič
I like the idea of COMMENT ON and alike from PG! Yes, great stuff, as we do
not invent anything custom and we will be as close as possible to industry
standard.

So, if I understand this correctly, on COMMENT ON, we would save each
comment to a dedicated table. Then on DESCRIBE, we would "enrich" the CQL
element we are describing with commentary, if any, from that comment table,
correct?

I, in general, support this idea, but as usual the devil is in the details.
I am just genuinely curious how this would work in practice.


If we go with COMMENT ON, is this going to be stored to TCM or not?


If the answer is yes, then it is much simpler, because then this
commentary would be dispersed by the means of TCM and each node would apply
this transformation locally to system_schema.annotations.

If the answer is no and if there is a cluster and we do COMMENT ON, then
this comment has to be saved to a table. If we rule out TCM as a vehicle
for the dispersion of these comments, that comment table has to be
distributed / replicated, correct? I do not think that we can create that
table under system_schema then, as that is on LocalStrategy and all
modifications to that are, as I understand it, done via TCM?

Hence, I guess the better place for that is under system_distributed? That
means that if somebody changes that keyspace to NTS or nodes are not
available, we will not be able to create any commentary.

Also, whenever we remove or alter anything (dropping a keyspace, table or
index, removing a column, etc.), each of those changes would also need to
remove the respective comments from that table.

For these reasons, I think the best way to do this is a dedicated
system_schema.annotations table that people can query directly, written via
COMMENT ON so that we stay "PG-compatible", with COMMENT ON backed by TCM as
just another transformation (COMMENT ON is inherently part of the schema).
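The argument for TCM above is that every node applies the same ordered sequence of schema changes locally, so a comment can never outlive the object it describes. A minimal Python sketch of that idea follows, under loud assumptions: `CreateTable`, `CommentOn`, `DropTable`, `NodeSchema` and `replay` are hypothetical names invented for illustration, the in-memory dict only stands in for `system_schema.annotations`, and none of this is Cassandra's actual TCM code.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical transformation types, loosely modelled on the idea of TCM
# (an ordered, cluster-wide log of metadata changes). Not Cassandra's API.
@dataclass(frozen=True)
class CreateTable:
    keyspace: str
    table: str


@dataclass(frozen=True)
class CommentOn:
    keyspace: str
    table: str
    column: Optional[str]  # None: the comment is on the table itself
    text: str


@dataclass(frozen=True)
class DropTable:
    keyspace: str
    table: str


class NodeSchema:
    """One node's local schema plus a toy stand-in for the annotations table."""

    def __init__(self):
        self.tables = set()
        self.annotations = {}  # (keyspace, table, column) -> comment text

    def apply(self, t):
        key = (t.keyspace, t.table)
        if isinstance(t, CreateTable):
            self.tables.add(key)
        elif isinstance(t, CommentOn):
            if key not in self.tables:
                raise ValueError(f"no such table {t.keyspace}.{t.table}")
            self.annotations[key + (t.column,)] = t.text
        elif isinstance(t, DropTable):
            self.tables.discard(key)
            # The drop and the comment cleanup happen in one step, so no node
            # is left holding comments for a table that no longer exists.
            self.annotations = {k: v for k, v in self.annotations.items()
                                if k[:2] != key}


def replay(log):
    """Every node applies the same ordered log and ends in the same state."""
    node = NodeSchema()
    for transformation in log:
        node.apply(transformation)
    return node


log = [
    CreateTable("ks", "users"),
    CommentOn("ks", "users", None, "registered users"),
    CommentOn("ks", "users", "email", "contact address; PII"),
    CreateTable("ks", "orders"),
    CommentOn("ks", "orders", "total", "order total in cents"),
    DropTable("ks", "users"),
]

node_a, node_b = replay(log), replay(log)
assert node_a.annotations == node_b.annotations
print(node_a.annotations)  # {('ks', 'orders', 'total'): 'order total in cents'}
```

Because the cleanup lives inside the same transformation as the drop, there is no separate delete against a replicated system_distributed table that could fail while replicas are down, which is the failure mode described for the non-TCM option.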

On Mon, Aug 11, 2025 at 10:55 PM Patrick McFadin  wrote:

> One (of many) reasons I'm advocating we migrate away from CQL. It served a
> purpose at the time, but this project is evolving and this to me seems like
> the logical next iteration. The Cassandra project has built its
> reputation on what it can do, not clever syntax design. ;)
>
> Patrick
>
> On Mon, Aug 11, 2025 at 1:51 PM Yifan Cai  wrote:
>
>> The reasonings on operator and LLM familiarity are spot on.
>>
>> I have experimented with LLM generated queries. It typically does a
>> noticeably better job on SQL than CQL.
>>
>> - Yifan
>>
>> On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin 
>> wrote:
>>
>>> I really love this CEP.  +1 on the goal.
>>>
>>> As you've already seen, I've been advocating to improve our syntax
>>> ergonomics towards more mainstream SQL and avoiding new/custom syntax.  I
>>> would suggest the following changes towards that goal:
>>>  - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing
>>> table comments to that). For structured tags, mirror SECURITY LABEL[2]:
>>> SECURITY LABEL FOR <provider> ON <object> IS '<label>';
>>>
>>> - Allow multiple providers per object. Store the value as text in v1
>>> (JSON or key/val later if we want), which avoids inventing new inline @
>>> syntax.
>>>
>>>  - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas
>>> readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL
>>> right after DDL, like PG users do today.
>>>
>>>  - Names & built-ins. Case-insensitive provider names with canonical
>>> lowercase. No separate @Description type. COMMENT ON already covers that
>>> use case cleanly.
>>>
>>>  - Introspection by query and by DESC. Keep annotations visible in
>>> DESCRIBE, but also expose a single system_schema.annotations view
>>> (provider, object_type, object_name, sub_name, value) so folks can get all
>>> annotations for a table. Example: “find all columns labeled PII,” etc.
>>>
>>> Why PG-like? Besides operator familiarity, there’s far more training
>>> data and tooling around COMMENT ON/SECURITY LABEL than around bespoke
>>> @annotation syntax. Sticking to that shape reduces LLM/tool friction and
>>> avoids teaching the world a new grammar. This has been a huge challenge for
>>> Cassandra work with LLMs as models tend to drift towards PG SQL in CQL
>>> often. (No Claude, JOIN is not a keyword in Cassandra)
>>>
>>> If this direction sounds good, happy to help update the CEP text and
>>> examples.
>>>
>>> Patrick
>>>
>>> 1: COMMENT ON docs
>>> https://www.postgresql.org/docs/current/sql-comment.html
>>> 2: SECURITY LABEL docs
>>> https://www.postgresql.org/docs/current/sql-security-label.html
>>>
>>>
>>> On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai  wrote:
>>>
>>>> IMO, the full schema or table schema output already makes it
>>>> possible to filter the fields (not limited to columns) that are using
>>>> certain annotations, relatively easily. Grepping or parsing, whichever is
>>>> more suitable for the scenarios; consumers make the call.
>>>> There i

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Patrick McFadin
One (of many) reasons I'm advocating we migrate away from CQL. It served a
purpose at the time, but this project is evolving and this to me seems like
the logical next iteration. The Cassandra project has built its
reputation on what it can do, not clever syntax design. ;)

Patrick

On Mon, Aug 11, 2025 at 1:51 PM Yifan Cai  wrote:

> The reasonings on operator and LLM familiarity are spot on.
>
> I have experimented with LLM generated queries. It typically does a
> noticeably better job on SQL than CQL.
>
> - Yifan
>
> On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin 
> wrote:
>
>> I really love this CEP.  +1 on the goal.
>>
>> As you've already seen, I've been advocating to improve our syntax
>> ergonomics towards more mainstream SQL and avoiding new/custom syntax.  I
>> would suggest the following changes towards that goal:
>>  - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing
>> table comments to that). For structured tags, mirror SECURITY LABEL[2]:
>> SECURITY LABEL FOR <provider> ON <object> IS '<label>';
>>
>> - Allow multiple providers per object. Store the value as text in v1
>> (JSON or key/val later if we want), which avoids inventing new inline @
>> syntax.
>>
>>  - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas
>> readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL
>> right after DDL, like PG users do today.
>>
>>  - Names & built-ins. Case-insensitive provider names with canonical
>> lowercase. No separate @Description type. COMMENT ON already covers that
>> use case cleanly.
>>
>>  - Introspection by query and by DESC. Keep annotations visible in
>> DESCRIBE, but also expose a single system_schema.annotations view
>> (provider, object_type, object_name, sub_name, value) so folks can get all
>> annotations for a table. Example: “find all columns labeled PII,” etc.
>>
>> Why PG-like? Besides operator familiarity, there’s far more training data
>> and tooling around COMMENT ON/SECURITY LABEL than around bespoke
>> @annotation syntax. Sticking to that shape reduces LLM/tool friction and
>> avoids teaching the world a new grammar. This has been a huge challenge for
>> Cassandra work with LLMs as models tend to drift towards PG SQL in CQL
>> often. (No Claude, JOIN is not a keyword in Cassandra)
>>
>> If this direction sounds good, happy to help update the CEP text and
>> examples.
>>
>> Patrick
>>
>> 1: COMMENT ON docs
>> https://www.postgresql.org/docs/current/sql-comment.html
>> 2: SECURITY LABEL docs
>> https://www.postgresql.org/docs/current/sql-security-label.html
>>
>>
>> On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai  wrote:
>>
>>> IMO, the full schema or table schema output already makes it possible to
>>> filter the fields (not limited to columns) that are using certain
>>> annotations, relatively easily. Grepping or parsing, whichever is more
>>> suitable for the scenarios; consumers make the call.
>>> There is not much added value by providing such a dedicated query,
>>> however, adding quite a lot of complexity in the design of this CEP. Please
>>> correct me if I have the wrong understanding of the queries.
>>>
>>> Another reason for preferring the existing "DESCRIBE" statements is the
>>> gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM
>>> the full (table) schema.
>>>
>>> The primary goal is to enrich the schema with annotations. Through the
>>> discussion thread, we will find out whether there is enough motivation to
>>> support such queries to filter by annotation. I appreciate that you brought
>>> up the idea.
>>>
>>> Although we are not at the stage of talking about the implementation,
>>> just sharing my thoughts a bit, I am thinking of the approach (1) that
>>> Stefan mentioned.
>>>
>>> - Yifan
>>>
>>> On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero 
>>> wrote:
>>>
>>>> Another interesting query would be to retrieve all the fields annotated with PII
>>>> for example.
>>>>
>>>> On 2025/08/11 01:01:21 Yifan Cai wrote:
>>>> > >
>>>> > > Will there be an option to do a SELECT query to read all the annotations
>>>> > > of a table?
>>>> >
>>>> >
>>>> > It is an interesting question! Would you mind sharing an example of the
>>>> > output you'd expect from a query like *"SELECT * FROM
>>>> > system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
>>>> > curious how that might differ from what we get when running "DESC TABLE".
>>>> >
>>>> > - Yifan
>>>> >
>>>> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia
>>>> > wrote:
>>>> >
>>>> > > >we could explore enriching the syntax with DESCRIBE
>>>> > >
>>>> > > Will there be an option to do a SELECT query to read all the annotations
>>>> > > of a table? Something like *"SELECT * FROM system_schema.annotations
>>>> > > where keyspace_name=<> and table_name=<>"*
>>>> > > It would be helpful to have a structured CQL query on top of printing the
>>>> > > annotations through DESC so that t

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Yifan Cai
The points about operator and LLM familiarity are spot on.

I have experimented with LLM-generated queries. The models typically do a
noticeably better job with SQL than with CQL.

- Yifan

On Mon, Aug 11, 2025 at 1:44 PM Patrick McFadin  wrote:

> I really love this CEP.  +1 on the goal.
>
> As you've already seen, I've been advocating to improve our syntax
> ergonomics towards more mainstream SQL and avoiding new/custom syntax.  I
> would suggest the following changes towards that goal:
>  - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing
> table comments to that). For structured tags, mirror SECURITY LABEL[2]:
> SECURITY LABEL FOR <provider> ON <object> IS '<label>';
>
> - Allow multiple providers per object. Store the value as text in v1 (JSON
> or key/val later if we want), which avoids inventing new inline @ syntax.
>
>  - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas
> readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL
> right after DDL, like PG users do today.
>
>  - Names & built-ins. Case-insensitive provider names with canonical
> lowercase. No separate @Description type. COMMENT ON already covers that
> use case cleanly.
>
>  - Introspection by query and by DESC. Keep annotations visible in
> DESCRIBE, but also expose a single system_schema.annotations view
> (provider, object_type, object_name, sub_name, value) so folks can get all
> annotations for a table. Example: “find all columns labeled PII,” etc.
>
> Why PG-like? Besides operator familiarity, there’s far more training data
> and tooling around COMMENT ON/SECURITY LABEL than around bespoke
> @annotation syntax. Sticking to that shape reduces LLM/tool friction and
> avoids teaching the world a new grammar. This has been a huge challenge for
> Cassandra work with LLMs as models tend to drift towards PG SQL in CQL
> often. (No Claude, JOIN is not a keyword in Cassandra)
>
> If this direction sounds good, happy to help update the CEP text and
> examples.
>
> Patrick
>
> 1: COMMENT ON docs
> https://www.postgresql.org/docs/current/sql-comment.html
> 2: SECURITY LABEL docs
> https://www.postgresql.org/docs/current/sql-security-label.html
>
>
> On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai  wrote:
>
>> IMO, the full schema or table schema output already makes it possible to
>> filter the fields (not limited to columns) that are using certain
>> annotations, relatively easily. Grepping or parsing, whichever is more
>> suitable for the scenarios; consumers make the call.
>> There is not much added value by providing such a dedicated query,
>> however, adding quite a lot of complexity in the design of this CEP. Please
>> correct me if I have the wrong understanding of the queries.
>>
>> Another reason for preferring the existing "DESCRIBE" statements is the
>> gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM
>> the full (table) schema.
>>
>> The primary goal is to enrich the schema with annotations. Through the
>> discussion thread, we will find out whether there is enough motivation to
>> support such queries to filter by annotation. I appreciate that you brought
>> up the idea.
>>
>> Although we are not at the stage of talking about the implementation,
>> just sharing my thoughts a bit, I am thinking of the approach (1) that
>> Stefan mentioned.
>>
>> - Yifan
>>
>> On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero 
>> wrote:
>>
>>> Another interesting query would be to retrieve all the fields annotated
>>> with PII
>>> for example.
>>>
>>> On 2025/08/11 01:01:21 Yifan Cai wrote:
>>> > >
>>> > > Will there be an option to do a SELECT query to read all the annotations
>>> > > of a table?
>>> >
>>> >
>>> > It is an interesting question! Would you mind sharing an example of the
>>> > output you'd expect from a query like *"SELECT * FROM
>>> > system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
>>> > curious how that might differ from what we get when running "DESC TABLE".
>>> >
>>> > - Yifan
>>> >
>>> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia
>>> > wrote:
>>> >
>>> > > >we could explore enriching the syntax with DESCRIBE
>>> > >
>>> > > Will there be an option to do a SELECT query to read all the annotations
>>> > > of a table? Something like *"SELECT * FROM system_schema.annotations
>>> > > where keyspace_name=<> and table_name=<>"*
>>> > > It would be helpful to have a structured CQL query on top of printing the
>>> > > annotations through DESC so that the information can be consumed easily.
>>> > >
>>> > > Jaydeep
>>> > >
>>> > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa
>>> > > wrote:
>>> > >
>>> > >> Thanks, Joel, for the positive response.
>>> > >>
>>> > >> 1. User-defined vs. pre-defined annotation types
>>> > >>
>>> > >> We'd like to have one predefined annotation, Description, but also give
>>> > >> users the flexibility to create new ones. If a user feels th

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Patrick McFadin
I really love this CEP.  +1 on the goal.

As you've already seen, I've been advocating to improve our syntax
ergonomics towards more mainstream SQL and avoiding new/custom syntax.  I
would suggest the following changes towards that goal:
 - Reuse PG-shaped DDL. Keep human text in COMMENT ON[1] (map existing
table comments to that). For structured tags, mirror SECURITY LABEL[2]:
SECURITY LABEL FOR <provider> ON <object> IS '<label>';

- Allow multiple providers per object. Store the value as text in v1 (JSON
or key/val later if we want), which avoids inventing new inline @ syntax.

 - Avoid new grammar in CREATE/ALTER. Skipping inline @PII keeps schemas
readable and the grammar simple. Tools can issue COMMENT ON/SECURITY LABEL
right after DDL, like PG users do today.

 - Names & built-ins. Case-insensitive provider names with canonical
lowercase. No separate @Description type. COMMENT ON already covers that
use case cleanly.

 - Introspection by query and by DESC. Keep annotations visible in
DESCRIBE, but also expose a single system_schema.annotations view
(provider, object_type, object_name, sub_name, value) so folks can get all
annotations for a table. Example: “find all columns labeled PII,” etc.
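To make the proposed view concrete, here is a small Python sketch of the data model these bullets describe: one row per (provider, object), several providers allowed per object, provider names folded to lowercase, and a direct query for "all columns labeled PII". It assumes the column list given above (provider, object_type, object_name, sub_name, value); `AnnotationStore`, `security_label` and `columns_labeled` are hypothetical helpers for illustration only, and in Cassandra the view would be read with a CQL SELECT, not Python.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    # Column names follow the proposed system_schema.annotations view;
    # the types and example values are assumptions for this sketch.
    provider: str     # canonical lowercase provider name
    object_type: str  # e.g. "table" or "column"
    object_name: str  # e.g. "ks.users"
    sub_name: str     # column name, or "" for table-level labels
    value: str        # free text in v1 (JSON or key/val could come later)


class AnnotationStore:
    """Hypothetical in-memory stand-in for the proposed view."""

    def __init__(self):
        # Keyed by provider *and* object, so several providers can label the
        # same object; re-labelling with the same provider overwrites.
        self._rows = {}

    def security_label(self, provider, object_type, object_name, sub_name, value):
        provider = provider.lower()  # case-insensitive, stored lowercase
        key = (provider, object_type, object_name, sub_name)
        self._rows[key] = Annotation(provider, object_type, object_name,
                                     sub_name, value)

    def rows(self):
        return sorted(self._rows.values())

    def columns_labeled(self, value, provider: Optional[str] = None):
        """Answer "find all columns labeled <value>" style questions."""
        return [(a.object_name, a.sub_name) for a in self.rows()
                if a.object_type == "column"
                and a.value == value
                and (provider is None or a.provider == provider.lower())]


store = AnnotationStore()
store.security_label("Privacy", "column", "ks.users", "email", "PII")
store.security_label("privacy", "column", "ks.users", "phone", "PII")
store.security_label("retention", "column", "ks.users", "email", "90 days")
store.security_label("PRIVACY", "column", "ks.orders", "total", "public")
store.security_label("privacy", "column", "ks.orders", "total", "internal")  # overwrites "public"

print(store.columns_labeled("PII"))  # [('ks.users', 'email'), ('ks.users', 'phone')]
print(len(store.rows()))             # 4
```

Note that `ks.users.email` carries labels from two providers at once, while "Privacy", "privacy" and "PRIVACY" all collapse onto the same canonical provider.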

Why PG-like? Besides operator familiarity, there’s far more training data
and tooling around COMMENT ON/SECURITY LABEL than around bespoke
@annotation syntax. Sticking to that shape reduces LLM/tool friction and
avoids teaching the world a new grammar. This has been a huge challenge for
Cassandra work with LLMs, as models often drift towards PG SQL when writing
CQL. (No, Claude, JOIN is not a keyword in Cassandra.)

If this direction sounds good, happy to help update the CEP text and
examples.

Patrick

1: COMMENT ON docs https://www.postgresql.org/docs/current/sql-comment.html
2: SECURITY LABEL docs
https://www.postgresql.org/docs/current/sql-security-label.html


On Mon, Aug 11, 2025 at 10:18 AM Yifan Cai  wrote:

> IMO, the full schema or table schema output already makes it possible to
> filter the fields (not limited to columns) that are using certain
> annotations, relatively easily. Grepping or parsing, whichever is more
> suitable for the scenarios; consumers make the call.
> There is not much added value by providing such a dedicated query,
> however, adding quite a lot of complexity in the design of this CEP. Please
> correct me if I have the wrong understanding of the queries.
>
> Another reason for preferring the existing "DESCRIBE" statements is the
> gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM
> the full (table) schema.
>
> The primary goal is to enrich the schema with annotations. Through the
> discussion thread, we will find out whether there is enough motivation to
> support such queries to filter by annotation. I appreciate that you brought
> up the idea.
>
> Although we are not at the stage of talking about the implementation, just
> sharing my thoughts a bit, I am thinking of the approach (1) that Stefan
> mentioned.
>
> - Yifan
>
> On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero 
> wrote:
>
>> Another interesting query would be to retrieve all the fields annotated
>> with PII
>> for example.
>>
>> On 2025/08/11 01:01:21 Yifan Cai wrote:
>> > >
>> > > Will there be an option to do a SELECT query to read all the annotations
>> > > of a table?
>> >
>> >
>> > It is an interesting question! Would you mind sharing an example of the
>> > output you'd expect from a query like *"SELECT * FROM
>> > system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
>> > curious how that might differ from what we get when running "DESC TABLE".
>> >
>> > - Yifan
>> >
>> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia
>> > wrote:
>> >
>> > > >we could explore enriching the syntax with DESCRIBE
>> > >
>> > > Will there be an option to do a SELECT query to read all the annotations
>> > > of a table? Something like *"SELECT * FROM system_schema.annotations
>> > > where keyspace_name=<> and table_name=<>"*
>> > > It would be helpful to have a structured CQL query on top of printing the
>> > > annotations through DESC so that the information can be consumed easily.
>> > >
>> > > Jaydeep
>> > >
>> > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa
>> > > wrote:
>> > >
>> > >> Thanks, Joel, for the positive response.
>> > >>
>> > >> 1. User-defined vs. pre-defined annotation types
>> > >>
>> > >> We'd like to have one predefined annotation, Description, but also give
>> > >> users the flexibility to create new ones. If a user feels that a custom
>> > >> annotation like @Desc suits their use case, they should be allowed to use
>> > >> it, as these elements are purely descriptive and have no actions associated
>> > >> with them.
>> > >>
>> > >> 2. Syntactically, is it worth considering other alternatives?
>> > >>
>> > >> You're concerned that having several annotations on multiple columns
>> > >> could make

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Yifan Cai
IMO, the full schema or table schema output already makes it possible to
filter the fields (not limited to columns) that are using certain
annotations, relatively easily. Grepping or parsing, whichever is more
suitable for the scenarios; consumers make the call.
Such a dedicated query adds little value, while it adds quite a lot of
complexity to the design of this CEP. Please correct me if I have
misunderstood the queries.
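The filtering described above can be done with plain text processing on DESCRIBE output. The deliberately naive Python sketch below assumes DESCRIBE would print annotations inline in the `@NAME('arg')` form from Jyothsna's Aug 8 example quoted in this thread; the sample output, `annotated_columns` and both regular expressions are illustrative only, and no released Cassandra version prints this format.

```python
import re

# Hypothetical DESCRIBE TABLE output, written in the inline "@" syntax proposed
# earlier in this thread. No released Cassandra version prints this.
DESCRIBE_OUTPUT = """\
CREATE TABLE test_ks.test_table (
    id int PRIMARY KEY,
    col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2') MASKED WITH default(),
    col3 text @DESCRIPTION('free-form notes')
);
"""

# A column line: leading whitespace, a column name, a type, then the rest.
COLUMN_LINE = re.compile(r"^\s+(\w+)\s+\w+(.*)$")
# An annotation: "@" followed by a name, with an optional ('...') argument.
ANNOTATION = re.compile(r"@(\w+)(?:\('([^']*)'\))?")


def annotated_columns(describe_text):
    """Map each column name to a {annotation name: argument} dict."""
    result = {}
    for line in describe_text.splitlines():
        m = COLUMN_LINE.match(line)
        if not m:
            continue  # skips the CREATE TABLE header and the closing ");"
        name, rest = m.groups()
        result[name] = {ann: arg for ann, arg in ANNOTATION.findall(rest)}
    return result


cols = annotated_columns(DESCRIBE_OUTPUT)
pii = [name for name, anns in cols.items() if "PII" in anns]
print(pii)  # ['col2']
```

A table-level `PRIMARY KEY (...)` line, quoted identifiers or a `'` inside an annotation argument would all defeat these regexes. That is the practical trade-off against the structured system_schema.annotations query suggested elsewhere in the thread: this needs no new server feature, but every consumer ends up depending on the exact DESCRIBE text.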

Another reason for preferring the existing "DESCRIBE" statements is the
gen-AI enrichment mentioned in the CEP. We most likely want to feed the LLM
the full (table) schema.

The primary goal is to enrich the schema with annotations. Through the
discussion thread, we will find out whether there is enough motivation to
support such queries to filter by annotation. I appreciate that you brought
up the idea.

Although we are not yet at the stage of discussing the implementation, to
share my thoughts briefly: I am leaning toward approach (1) that Stefan
mentioned.

- Yifan

On Mon, Aug 11, 2025 at 6:31 AM Francisco Guerrero 
wrote:

> Another interesting query would be to retrieve all the fields annotated
> with PII
> for example.
>
> On 2025/08/11 01:01:21 Yifan Cai wrote:
> > >
> > > Will there be an option to do a SELECT query to read all the annotations
> > > of a table?
> >
> >
> > It is an interesting question! Would you mind sharing an example of the
> > output you'd expect from a query like *"SELECT * FROM
> > system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
> > curious how that might differ from what we get when running "DESC TABLE".
> >
> > - Yifan
> >
> > On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia
> > wrote:
> >
> > > >we could explore enriching the syntax with DESCRIBE
> > >
> > > Will there be an option to do a SELECT query to read all the annotations
> > > of a table? Something like *"SELECT * FROM system_schema.annotations
> > > where keyspace_name=<> and table_name=<>"*
> > > It would be helpful to have a structured CQL query on top of printing the
> > > annotations through DESC so that the information can be consumed easily.
> > >
> > > Jaydeep
> > >
> > > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa
> > > wrote:
> > >
> > >> Thanks, Joel, for the positive response.
> > >>
> > >> 1. User-defined vs. pre-defined annotation types
> > >>
> > >> We'd like to have one predefined annotation, Description, but also give
> > >> users the flexibility to create new ones. If a user feels that a custom
> > >> annotation like @Desc suits their use case, they should be allowed to use
> > >> it, as these elements are purely descriptive and have no actions associated
> > >> with them.
> > >>
> > >> 2. Syntactically, is it worth considering other alternatives?
> > >>
> > >> You're concerned that having several annotations on multiple columns
> > >> could make schemas difficult to read. For now, we can have annotations
> > >> printed as part of DESCRIBE statements. If there's a strong need to
> > >> suppress annotations for readability, we could explore enriching the syntax
> > >> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing
> > >> DESCRIBE [FULL] SCHEMA.
> > >>
> > >> Thanks,
> > >> Jyothsna
> > >>
> > >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa
> > >> wrote:
> > >>
> > >>> Thanks, Stefan, for your feedback!
> > >>>
> > >>> To answer your questions,
> > >>>
> > >>> 1. I agree; annotations can optionally take arguments, and if an
> > >>> annotation doesn't have an argument, we can skip the arguments in the
> > >>> "DESCRIBE" statement's output.
> > >>>
> > >>> 2. Good point. We originally considered using "ANNOTATED WITH" but found
> > >>> it too verbose. As an alternative, we proposed using "@" preceding the
> > >>> annotation to signal it to the parser. We are open to using an explicit
> > >>> phrase like "ANNOTATED WITH" if you think it would make the code more
> > >>> readable.
> > >>>
> > >>> A full example of annotations along with constraints and masking could
> > >>> be:
> > >>>
> > >>>
> > >>> CREATE TABLE test_ks.test_table (
> > >>>     id int PRIMARY KEY,
> > >>>     col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2')
> > >>>         MASKED WITH default()
> > >>> );
> > >>>
> > >>> OR
> > >>>
> > >>> CREATE TABLE test_ks.test_table (
> > >>>     id int PRIMARY KEY,
> > >>>     col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
> > >>>         MASKED WITH default()
> > >>> );
> > >>>
> > >>>
> > >>>
> > >>> 3. We do not have a prototype yet, but I think we will have to introduce
> > >>> new parsing branch for annotations at the table level
> > >>>
> > >>> I hope I answered all your questions!
> > >>>
> > >>> - Jyothsna
> > >>>
> > >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd
> > >>> wrote:
> > >>>
> > >>>> I like

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Francisco Guerrero
Another interesting query would be to retrieve all the fields annotated with
PII, for example.

On 2025/08/11 01:01:21 Yifan Cai wrote:
> >
> > Will there be an option to do a SELECT query to read all the annotations
> > of a table?
> 
> 
> It is an interesting question! Would you mind sharing an example of the
> output you'd expect from a query like *"SELECT * FROM
> system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
> curious how that might differ from what we get when running "DESC TABLE".
> 
> - Yifan
> 
> On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia 
> wrote:
> 
> > >we could explore enriching the syntax with DESCRIBE
> >
> > Will there be an option to do a SELECT query to read all the annotations
> > of a table? Something like *"SELECT * FROM system_schema.annotations
> > where keyspace_name=<> and table_name=<>"*
> > It would be helpful to have a structured CQL query on top of printing the
> > annotations through DESC so that the information can be consumed easily.
> >
> > Jaydeep
> >
> > On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa 
> > wrote:
> >
> >> Thanks, Joel, for the positive response.
> >>
> >> 1. User-defined vs. pre-defined annotation types
> >>
> >> We'd like to have one predefined annotation, Description, but also give
> >> users the flexibility to create new ones. If a user feels that a custom
> >> annotation like @Desc suits their use case, they should be allowed to use
> >> it, as these elements are purely descriptive and have no actions associated
> >> with them.
> >>
> >> 2. Syntactically, is it worth considering other alternatives?
> >>
> >> You're concerned that having several annotations on multiple columns
> >> could make schemas difficult to read. For now, we can have annotations
> >> printed as part of DESCRIBE statements. If there's a strong need to
> >> suppress annotations for readability, we could explore enriching the syntax
> >> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing
> >> DESCRIBE [FULL] SCHEMA.
> >>
> >> Thanks,
> >> Jyothsna
> >>
> >> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa 
> >> wrote:
> >>
> >>> Thanks, Stefan, for your feedback!
> >>>
> >>> To answer your questions,
> >>>
> >>> 1. I agree; annotations can optionally take arguments, and if an
> >>> annotation doesn't have an argument, we can skip the arguments in the
> >>> "DESCRIBE" statement's output.
> >>>
> >>> 2. Good point. We originally considered using "ANNOTATED WITH" but found
> >>> it too verbose. As an alternative, we proposed using "@" preceding the
> >>> annotation to signal it to the parser. We are open to using an explicit
> >>> phrase like "ANNOTATED WITH" if you think it would make the code more
> >>> readable.
> >>>
> >>> A full example of annotations along with constraints and masking could
> >>> be:
> >>>
> >>>
> >>> CREATE TABLE test_ks.test_table (
> >>>     id int PRIMARY KEY,
> >>>     col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2')
> >>>         MASKED WITH default()
> >>> );
> >>>
> >>> OR
> >>>
> >>> CREATE TABLE test_ks.test_table (
> >>>     id int PRIMARY KEY,
> >>>     col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
> >>>         MASKED WITH default()
> >>> );
> >>>
> >>>
> >>>
> >>> 3. We do not have a prototype yet, but I think we will have to introduce
> >>> new parsing branch for annotations at the table level
> >>>
> >>> I hope I answered all your questions!
> >>>
> >>> - Jyothsna
> >>>
> >>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd 
> >>> wrote:
> >>>
>  I like the aim of the CEP. Completely onboard with the idea that GenAI
>  tooling works better when you can provide it useful context about the 
>  data
>  it is working with. An organization I worked with in the past had a lot 
>  of
>  good results with marking up API models (not DB schemas, but similar 
>  idea)
>  with authorization-related annotations and using those to drive policy
>  linters and end-user interfaces. So, sold on the value of the capability.
> 
>  Two things I'm less sure of:
> 
>  1) User-defined vs pre-defined annotation types: I appreciate the
>  flexibility that user-defined annotations appears to give, but it adds
>  extra room for error. E.g. if annotation names are case-sensitive, do I
>  (the user) have to actively prevent creation of @description? Or, police
>  the accidental creation of alternative names like @Desc? If the community
>  settled on a small, fixed set of supported annotations, so Cassandra 
>  itself
>  was authoritative for valid annotation names, would make the feature a 
>  lot
>  less valuable, or prevent offering user-defined annotations in the 
>  future?
> 
>  2) Syntactically, is it worth considering other alternatives? I was
>  trying to imagine a CREATE TABLE statement marked up with two or three
>  types of column-level annotations, and my sense is that

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-11 Thread Štefan Miklošovič
I think there are, in theory, at least two ways you could model this
(maybe more?):

1) Serialize these annotations and store them in TCM as part of
ColumnMetadata.

and additionally

2) Have a dedicated table for them, like masks (look into
addTableToSchemaMutation and addColumnToSchemaMutation). What this does is
check whether there are any masks on a column and, if there are, add a new
mutation which records this fact in system_schema.column_masks. A similar
concept is used for e.g. triggers, dropped columns and so on, so there is a
dedicated table for each of these. Hence you could have
system_schema.annotations, which would be populated accordingly.

For 1), when it comes to annotations on columns, you might just follow how
it was done for CEP-42 / constraints. For annotations on other elements
(keyspaces, tables themselves, etc.), I am not completely sure, as it is
way more involved if you go to cover this in such depth, and it would need
further investigation.

For 2), there would need to be some common table for every element which
can be annotated; otherwise we would have to have an "annotation table per
CQL element", and I do not think that is necessary. On the other hand, I do
not know what the schema of such a table would look like, because some
entries would have a keyspace and a table, some only a keyspace (the
keyspace itself), and functions and aggregates do not have a table assigned
to them, etc.
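To make the shared-table question concrete, here is a minimal Python sketch of what one row per annotation could look like if a single table covered every annotatable element. All field names (`element_kind`, `element_name`, `column_name`, `argument`) and the helper `annotations_for_table` are hypothetical, not a proposed schema; the point is only that keyspace-level and function-level rows leave the table/column fields empty, and that a lookup then has to filter on the element kind.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical row of a single, shared annotations table (illustrative only).
@dataclass(frozen=True)
class AnnotationRow:
    keyspace_name: str
    element_kind: str               # 'keyspace', 'table', 'column', 'function', ...
    element_name: Optional[str]     # table or function name; None for a keyspace
    column_name: Optional[str]      # set only when element_kind == 'column'
    annotation: str                 # e.g. 'PII', 'Description'
    argument: Optional[str] = None  # e.g. 'this is column col2'; None if absent


ROWS = [
    AnnotationRow("test_ks", "keyspace", None, None, "Description", "test keyspace"),
    AnnotationRow("test_ks", "table", "test_table", None, "Description", "a table"),
    AnnotationRow("test_ks", "column", "test_table", "col2", "PII"),
    AnnotationRow("test_ks", "column", "test_table", "col2", "Description", "this is column col2"),
    AnnotationRow("test_ks", "function", "test_table", None, "Description", "same name, not a table"),
]


def annotations_for_table(rows: List[AnnotationRow], keyspace: str, table: str) -> List[AnnotationRow]:
    """Rough analogue of '... WHERE keyspace_name = ? AND table_name = ?'.

    Filtering on element_kind keeps a function that happens to share the
    table's name out of the result.
    """
    return [
        r for r in rows
        if r.keyspace_name == keyspace
        and r.element_kind in ("table", "column")
        and r.element_name == table
    ]
```

Here `annotations_for_table(ROWS, "test_ks", "test_table")` returns the table-level row and the two `col2` rows, but not the keyspace row or the same-named function.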

Annotations might be a column of map type. If you wanted them to be a UDT
instead, you could not put them into a virtual table, as that is not
supported yet (1).

What Jaydeep is suggesting makes sense. For example, Jon's MCP server
currently points the model at system_views, so it "learns" what is going on
inside Cassandra based on the content of those tables.

If we had a table with annotations, he could just point it there and be
done with it; it would suddenly know about all the annotations.

If annotations were visible only through DESCRIBE, how would that look,
and what would be parsed? Would you need to run "DESCRIBE KEYSPACE" for
every keyspace there is and then somehow learn how to parse the
annotations?

It would also be cool to just scan one table programmatically and know
what annotations there are. If they were only in DESCRIBE, how would you
quickly find where all your PII fields are (or whether there are any such
annotations at all)?
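The difference can be sketched in Python under loud assumptions: the structured rows stand in for a hypothetical system_schema.annotations table, and the DESCRIBE text uses the proposed `@Name` syntax, which is not final. Finding PII columns from rows is a single filter; the text route needs one DESCRIBE per keyspace plus a regex that assumes one column per line and misreports an `@PII` that appears inside a description string.

```python
import re
from typing import Dict, List, Optional, Tuple

# (a) Structured route: hypothetical rows (keyspace, table, column, annotation).
ROWS: List[Tuple[str, str, Optional[str], str]] = [
    ("ks1", "users", "email", "PII"),
    ("ks1", "users", "email", "Description"),
    ("ks2", "orders", "card_no", "PII"),
    ("ks2", "orders", None, "Description"),  # table-level annotation
]


def pii_from_rows(rows) -> List[Tuple[str, str, str]]:
    # One pass over one table finds every PII column in the cluster.
    return sorted((ks, t, c) for ks, t, c, a in rows if a == "PII" and c is not None)


# (b) Text route: DESCRIBE output per keyspace, using the *proposed* syntax.
DESCRIBE: Dict[str, str] = {
    "ks1": "CREATE TABLE ks1.users (\n"
           "    id int PRIMARY KEY,\n"
           "    email text @PII AND @Description('contact address')\n"
           ");",
    "ks2": "CREATE TABLE ks2.orders (\n"
           "    id int PRIMARY KEY,\n"
           "    card_no text @PII\n"
           ");",
}

TABLE_LINE = re.compile(r"CREATE TABLE \w+\.(\w+)")
# Fragile by design: assumes each column sits on one line and that '@PII'
# never appears inside a quoted description.
PII_COLUMN_LINE = re.compile(r"\s*(\w+)\s+\w+.*@PII\b")


def pii_from_describe(outputs: Dict[str, str]) -> List[Tuple[str, str, str]]:
    found = []
    for ks, text in outputs.items():
        table = None
        for line in text.splitlines():
            header = TABLE_LINE.match(line)
            if header:
                table = header.group(1)
                continue
            column = PII_COLUMN_LINE.match(line)
            if column and table:
                found.append((ks, table, column.group(1)))
    return sorted(found)
```

Both routes agree on this tiny sample, but only the row scan stays correct as the DESCRIBE format or the annotation contents change.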

(1) https://issues.apache.org/jira/browse/CASSANDRA-19560

On Mon, Aug 11, 2025 at 3:03 AM Yifan Cai  wrote:

> Will there be an option to do a SELECT query to read all the annotations
>> of a table?
>
>
> It is an interesting question! Would you mind sharing an example of the
> output you'd expect from a query like *"SELECT * FROM
> system_schema.annotations where keyspace_name=<> and table_name=<>"*? I
> am curious how that might differ from what we get when running "DESC TABLE".
>
> - Yifan
>
> On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia <
> [email protected]> wrote:
>
>> >we could explore enriching the syntax with DESCRIBE
>>
>> Will there be an option to do a SELECT query to read all the annotations
>> of a table? Something like *"SELECT * FROM system_schema.annotations
>> where keyspace_name=<> and table_name=<>"*
>> It would be helpful to have a structured CQL query on top of printing the
>> annotations through DESC so that the information can be consumed easily.
>>
>> Jaydeep
>>
>> On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa 
>> wrote:
>>
>>> Thanks, Joel, for the positive response.
>>>
>>> 1. User-defined vs. pre-defined annotation types
>>>
>>> We'd like to have one predefined annotation, Description, but also give
>>> users the flexibility to create new ones. If a user feels that a custom
>>> annotation like @Desc suits their use case, they should be allowed to use
>>> it, as these elements are purely descriptive and have no actions associated
>>> with them.
>>>
>>> 2. Syntactically, is it worth considering other alternatives?
>>>
>>> You're concerned that having several annotations on multiple columns
>>> could make schemas difficult to read. For now, we can have annotations
>>> printed as part of DESCRIBE statements. If there's a strong need to
>>> suppress annotations for readability, we could explore enriching the syntax
>>> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing
>>> DESCRIBE [FULL] SCHEMA.
>>>
>>> Thanks,
>>> Jyothsna
>>>
>>> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa 
>>> wrote:
>>>
 Thanks, Stefan, for your feedback!

 To answer your questions,

 1. I agree; annotations can optionally take arguments, and if an
 annotation doesn't have an argument, we can skip the arguments in the
 "DESCRIBE" statement's output.

 2. Good point. We originally considered using "ANNOTATED WITH" but
 found it too verbose. As an alternative, we proposed using "@" preceding
 the annotation to signal it to the parser. We are open to

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-10 Thread Yifan Cai
>
> Will there be an option to do a SELECT query to read all the annotations
> of a table?


It is an interesting question! Would you mind sharing an example of the
output you'd expect from a query like *"SELECT * FROM
system_schema.annotations where keyspace_name=<> and table_name=<>"*? I am
curious how that might differ from what we get when running "DESC TABLE".

- Yifan

On Sat, Aug 9, 2025 at 9:43 AM Jaydeep Chovatia 
wrote:

> >we could explore enriching the syntax with DESCRIBE
>
> Will there be an option to do a SELECT query to read all the annotations
> of a table? Something like *"SELECT * FROM system_schema.annotations
> where keyspace_name=<> and table_name=<>"*
> It would be helpful to have a structured CQL query on top of printing the
> annotations through DESC so that the information can be consumed easily.
>
> Jaydeep
>
> On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa 
> wrote:
>
>> Thanks, Joel, for the positive response.
>>
>> 1. User-defined vs. pre-defined annotation types
>>
>> We'd like to have one predefined annotation, Description, but also give
>> users the flexibility to create new ones. If a user feels that a custom
>> annotation like @Desc suits their use case, they should be allowed to use
>> it, as these elements are purely descriptive and have no actions associated
>> with them.
>>
>> 2. Syntactically, is it worth considering other alternatives?
>>
>> You're concerned that having several annotations on multiple columns
>> could make schemas difficult to read. For now, we can have annotations
>> printed as part of DESCRIBE statements. If there's a strong need to
>> suppress annotations for readability, we could explore enriching the syntax
>> with DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing
>> DESCRIBE [FULL] SCHEMA.
>>
>> Thanks,
>> Jyothsna
>>
>> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa 
>> wrote:
>>
>>> Thanks, Stefan, for your feedback!
>>>
>>> To answer your questions,
>>>
>>> 1. I agree; annotations can optionally take arguments, and if an
>>> annotation doesn't have an argument, we can skip the arguments in the
>>> "DESCRIBE" statement's output.
>>>
>>> 2. Good point. We originally considered using "ANNOTATED WITH" but found
>>> it too verbose. As an alternative, we proposed using "@" preceding the
>>> annotation to signal it to the parser. We are open to using an explicit
>>> phrase like "ANNOTATED WITH" if you think it would make the code more
>>> readable.
>>>
>>> A full example of annotations along with constraints and masking could
>>> be:
>>>
>>>
>>> CREATE TABLE test_ks.test_table (
>>> id int PRIMARY KEY,
>>> col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this
>>> is column col2') MASKED WITH default()
>>> );
>>>
>>> OR
>>>
>>> CREATE TABLE test_ks.test_table (
>>> id int PRIMARY KEY,
>>> col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
>>> MASKED WITH default()
>>> );
>>>
>>>
>>>
>>> 3. We do not have a prototype yet, but I think we will have to introduce
>>> new parsing branch for annotations at the table level
>>>
>>> I hope I answered all your questions!
>>>
>>> - Jyothsna
>>>
>>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd 
>>> wrote:
>>>
 I like the aim of the CEP. Completely onboard with the idea that GenAI
 tooling works better when you can provide it useful context about the data
 it is working with. An organization I worked with in the past had a lot of
 good results with marking up API models (not DB schemas, but similar idea)
 with authorization-related annotations and using those to drive policy
 linters and end-user interfaces. So, sold on the value of the capability.

 Two things I'm less sure of:

 1) User-defined vs pre-defined annotation types: I appreciate the
 flexibility that user-defined annotations appears to give, but it adds
 extra room for error. E.g. if annotation names are case-sensitive, do I
 (the user) have to actively prevent creation of @description? Or, police
 the accidental creation of alternative names like @Desc? If the community
 settled on a small, fixed set of supported annotations, so Cassandra itself
 was authoritative for valid annotation names, would make the feature a lot
 less valuable, or prevent offering user-defined annotations in the future?

 2) Syntactically, is it worth considering other alternatives? I was
 trying to imagine a CREATE TABLE statement marked up with two or three
 types of column-level annotations, and my sense is that it could get hard
 to read quickly. Is it worth considering Javadoc-style annotations in
 schema comments instead? I think in today's world that means that they
 would not be accessible via CQL/Cassandra (CQL comments are not persisted
 as part of the schema, correct?) but they could be accessible to other
 schema-processing tools and IMO be a more readable syntax. It'd be good to
 work thro

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-09 Thread Jaydeep Chovatia
>we could explore enriching the syntax with DESCRIBE

Will there be an option to do a SELECT query to read all the annotations of
a table? Something like *"SELECT * FROM system_schema.annotations where
keyspace_name=<> and table_name=<>"*
It would be helpful to have a structured CQL query in addition to printing
the annotations through DESC, so that the information can be consumed easily.

Jaydeep

On Fri, Aug 8, 2025 at 11:03 AM Jyothsna Konisa 
wrote:

> Thanks, Joel, for the positive response.
>
> 1. User-defined vs. pre-defined annotation types
>
> We'd like to have one predefined annotation, Description, but also give
> users the flexibility to create new ones. If a user feels that a custom
> annotation like @Desc suits their use case, they should be allowed to use
> it, as these elements are purely descriptive and have no actions associated
> with them.
>
> 2. Syntactically, is it worth considering other alternatives?
>
> You're concerned that having several annotations on multiple columns could
> make schemas difficult to read. For now, we can have annotations printed as
> part of DESCRIBE statements. If there's a strong need to suppress
> annotations for readability, we could explore enriching the syntax with
> DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing DESCRIBE
> [FULL] SCHEMA.
>
> Thanks,
> Jyothsna
>
> On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa 
> wrote:
>
>> Thanks, Stefan, for your feedback!
>>
>> To answer your questions,
>>
>> 1. I agree; annotations can optionally take arguments, and if an
>> annotation doesn't have an argument, we can skip the arguments in the
>> "DESCRIBE" statement's output.
>>
>> 2. Good point. We originally considered using "ANNOTATED WITH" but found
>> it too verbose. As an alternative, we proposed using "@" preceding the
>> annotation to signal it to the parser. We are open to using an explicit
>> phrase like "ANNOTATED WITH" if you think it would make the code more
>> readable.
>>
>> A full example of annotations along with constraints and masking could be:
>>
>>
>> CREATE TABLE test_ks.test_table (
>> id int PRIMARY KEY,
>> col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is
>> column col2') MASKED WITH default()
>> );
>>
>> OR
>>
>> CREATE TABLE test_ks.test_table (
>> id int PRIMARY KEY,
>> col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
>> MASKED WITH default()
>> );
>>
>>
>>
>> 3. We do not have a prototype yet, but I think we will have to introduce
>> new parsing branch for annotations at the table level
>>
>> I hope I answered all your questions!
>>
>> - Jyothsna
>>
>> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd 
>> wrote:
>>
>>> I like the aim of the CEP. Completely onboard with the idea that GenAI
>>> tooling works better when you can provide it useful context about the data
>>> it is working with. An organization I worked with in the past had a lot of
>>> good results with marking up API models (not DB schemas, but similar idea)
>>> with authorization-related annotations and using those to drive policy
>>> linters and end-user interfaces. So, sold on the value of the capability.
>>>
>>> Two things I'm less sure of:
>>>
>>> 1) User-defined vs pre-defined annotation types: I appreciate the
>>> flexibility that user-defined annotations appears to give, but it adds
>>> extra room for error. E.g. if annotation names are case-sensitive, do I
>>> (the user) have to actively prevent creation of @description? Or, police
>>> the accidental creation of alternative names like @Desc? If the community
>>> settled on a small, fixed set of supported annotations, so Cassandra itself
>>> was authoritative for valid annotation names, would make the feature a lot
>>> less valuable, or prevent offering user-defined annotations in the future?
>>>
>>> 2) Syntactically, is it worth considering other alternatives? I was
>>> trying to imagine a CREATE TABLE statement marked up with two or three
>>> types of column-level annotations, and my sense is that it could get hard
>>> to read quickly. Is it worth considering Javadoc-style annotations in
>>> schema comments instead? I think in today's world that means that they
>>> would not be accessible via CQL/Cassandra (CQL comments are not persisted
>>> as part of the schema, correct?) but they could be accessible to other
>>> schema-processing tools and IMO be a more readable syntax. It'd be good to
>>> work through a couple use-cases for actually using the data provided by the
>>> annotations and get a sense of whether making them first-class entities in
>>> CQL is necessary for getting most of the value from them.
>>>
>>> Thanks -- Joel.
>>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
>>>
>>> Sorry for the incorrect editable link, here is the updated link to the CEP
>>> 52: Schema Annotations for ApacheCassandra
>>> 
>>>
>>> On Wed, Aug 6, 2025 at 4:26 P

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-08 Thread Jyothsna Konisa
Thanks, Joel, for the positive response.

1. User-defined vs. pre-defined annotation types

We'd like to have one predefined annotation, Description, but also give
users the flexibility to create new ones. If a user feels that a custom
annotation like @Desc suits their use case, they should be allowed to use
it, as these elements are purely descriptive and have no actions associated
with them.

2. Syntactically, is it worth considering other alternatives?

You're concerned that having several annotations on multiple columns could
make schemas difficult to read. For now, we can have annotations printed as
part of DESCRIBE statements. If there's a strong need to suppress
annotations for readability, we could explore enriching the syntax with
DESCRIBE [FULL] SCHEMA [WITH ANNOTATIONS], similar to the existing DESCRIBE
[FULL] SCHEMA.

Thanks,
Jyothsna

On Fri, Aug 8, 2025 at 10:56 AM Jyothsna Konisa 
wrote:

> Thanks, Stefan, for your feedback!
>
> To answer your questions,
>
> 1. I agree; annotations can optionally take arguments, and if an
> annotation doesn't have an argument, we can skip the arguments in the
> "DESCRIBE" statement's output.
>
> 2. Good point. We originally considered using "ANNOTATED WITH" but found
> it too verbose. As an alternative, we proposed using "@" preceding the
> annotation to signal it to the parser. We are open to using an explicit
> phrase like "ANNOTATED WITH" if you think it would make the code more
> readable.
>
> A full example of annotations along with constraints and masking could be:
>
>
> CREATE TABLE test_ks.test_table (
> id int PRIMARY KEY,
> col2 int CHECK col2 > 0 ANNOTATED WITH @PII AND @DESCRIPTION('this is
> column col2') MASKED WITH default()
> );
>
> OR
>
> CREATE TABLE test_ks.test_table (
> id int PRIMARY KEY,
> col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
> MASKED WITH default()
> );
>
>
>
> 3. We do not have a prototype yet, but I think we will have to introduce
> new parsing branch for annotations at the table level
>
> I hope I answered all your questions!
>
> - Jyothsna
>
> On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd  wrote:
>
>> I like the aim of the CEP. Completely onboard with the idea that GenAI
>> tooling works better when you can provide it useful context about the data
>> it is working with. An organization I worked with in the past had a lot of
>> good results with marking up API models (not DB schemas, but similar idea)
>> with authorization-related annotations and using those to drive policy
>> linters and end-user interfaces. So, sold on the value of the capability.
>>
>> Two things I'm less sure of:
>>
>> 1) User-defined vs pre-defined annotation types: I appreciate the
>> flexibility that user-defined annotations appears to give, but it adds
>> extra room for error. E.g. if annotation names are case-sensitive, do I
>> (the user) have to actively prevent creation of @description? Or, police
>> the accidental creation of alternative names like @Desc? If the community
>> settled on a small, fixed set of supported annotations, so Cassandra itself
>> was authoritative for valid annotation names, would make the feature a lot
>> less valuable, or prevent offering user-defined annotations in the future?
>>
>> 2) Syntactically, is it worth considering other alternatives? I was
>> trying to imagine a CREATE TABLE statement marked up with two or three
>> types of column-level annotations, and my sense is that it could get hard
>> to read quickly. Is it worth considering Javadoc-style annotations in
>> schema comments instead? I think in today's world that means that they
>> would not be accessible via CQL/Cassandra (CQL comments are not persisted
>> as part of the schema, correct?) but they could be accessible to other
>> schema-processing tools and IMO be a more readable syntax. It'd be good to
>> work through a couple use-cases for actually using the data provided by the
>> annotations and get a sense of whether making them first-class entities in
>> CQL is necessary for getting most of the value from them.
>>
>> Thanks -- Joel.
>> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
>>
>> Sorry for the incorrect editable link, here is the updated link to the CEP
>> 52: Schema Annotations for ApacheCassandra
>> 
>>
>> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa 
>> wrote:
>>
>>> Hello Everyone!
>>>
>>> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
>>> 
>>>
>>> This CEP outlines a plan to introduce *Schema Annotations* as a way to
>>> add better context to schema elements. We're also proposing a set of new
>>> DDL statements to manage these annotations.
>>>
>>> We believe these annotations will be highly beneficial for several key
>>> areas:
>>>
>>>-
>>>

Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-08 Thread Jyothsna Konisa
Thanks, Stefan, for your feedback!

To answer your questions,

1. I agree; annotations can optionally take arguments, and if an annotation
doesn't have an argument, we can skip the arguments in the "DESCRIBE"
statement's output.

2. Good point. We originally considered using "ANNOTATED WITH" but found it
too verbose. As an alternative, we proposed using "@" preceding the
annotation to signal it to the parser. We are open to using an explicit
phrase like "ANNOTATED WITH" if you think it would make the code more
readable.

A full example of annotations along with constraints and masking could be:


CREATE TABLE test_ks.test_table (
    id int PRIMARY KEY,
    col2 int CHECK col2 > 0
        ANNOTATED WITH @PII AND @DESCRIPTION('this is column col2')
        MASKED WITH default()
);

OR

CREATE TABLE test_ks.test_table (
    id int PRIMARY KEY,
    col2 int CHECK col2 > 0 @PII AND @DESCRIPTION('this is column col2')
        MASKED WITH default()
);
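For the "@"-prefixed form, the annotation list after a column definition could be tokenized roughly as follows. This is a minimal Python sketch, not the ANTLR grammar change the CEP would actually need; `parse_annotation_list` is a hypothetical helper, and it assumes arguments are single-quoted CQL string literals (with '' as the escaped quote) joined by AND.

```python
import re
from typing import List, Optional, Tuple

# '@' + identifier, optionally followed by ('<CQL string literal>').
# CQL escapes a single quote inside a string literal by doubling it: ''.
ANNOTATION = re.compile(r"@(\w+)(?:\('((?:[^']|'')*)'\))?")
SEPARATOR = re.compile(r"\s+AND\s+", re.IGNORECASE)


def parse_annotation_list(clause: str) -> List[Tuple[str, Optional[str]]]:
    """Parse "@PII AND @DESCRIPTION('x')" into [('PII', None), ('DESCRIPTION', 'x')]."""
    clause = clause.strip()
    result: List[Tuple[str, Optional[str]]] = []
    pos = 0
    while True:
        m = ANNOTATION.match(clause, pos)
        if m is None:
            raise ValueError(f"expected '@annotation' at offset {pos}: {clause[pos:]!r}")
        raw_arg = m.group(2)
        arg = None if raw_arg is None else raw_arg.replace("''", "'")
        result.append((m.group(1), arg))
        pos = m.end()
        if pos == len(clause):
            return result
        sep = SEPARATOR.match(clause, pos)
        if sep is None:
            raise ValueError(f"expected AND at offset {pos}: {clause[pos:]!r}")
        pos = sep.end()
```

The explicit ANNOTATED WITH variant would only change which keyword is consumed before the list; the list itself parses the same way.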



3. We do not have a prototype yet, but I think we will have to introduce
a new parsing branch for annotations at the table level.

I hope I answered all your questions!

- Jyothsna

On Thu, Aug 7, 2025 at 11:36 AM Joel Shepherd  wrote:

> I like the aim of the CEP. Completely onboard with the idea that GenAI
> tooling works better when you can provide it useful context about the data
> it is working with. An organization I worked with in the past had a lot of
> good results with marking up API models (not DB schemas, but similar idea)
> with authorization-related annotations and using those to drive policy
> linters and end-user interfaces. So, sold on the value of the capability.
>
> Two things I'm less sure of:
>
> 1) User-defined vs pre-defined annotation types: I appreciate the
> flexibility that user-defined annotations appears to give, but it adds
> extra room for error. E.g. if annotation names are case-sensitive, do I
> (the user) have to actively prevent creation of @description? Or, police
> the accidental creation of alternative names like @Desc? If the community
> settled on a small, fixed set of supported annotations, so Cassandra itself
> was authoritative for valid annotation names, would make the feature a lot
> less valuable, or prevent offering user-defined annotations in the future?
>
> 2) Syntactically, is it worth considering other alternatives? I was trying
> to imagine a CREATE TABLE statement marked up with two or three types of
> column-level annotations, and my sense is that it could get hard to read
> quickly. Is it worth considering Javadoc-style annotations in schema
> comments instead? I think in today's world that means that they would not
> be accessible via CQL/Cassandra (CQL comments are not persisted as part of
> the schema, correct?) but they could be accessible to other
> schema-processing tools and IMO be a more readable syntax. It'd be good to
> work through a couple use-cases for actually using the data provided by the
> annotations and get a sense of whether making them first-class entities in
> CQL is necessary for getting most of the value from them.
>
> Thanks -- Joel.
> On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
>
> Sorry for the incorrect editable link, here is the updated link to the CEP
> 52: Schema Annotations for ApacheCassandra
> 
>
> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa 
> wrote:
>
>> Hello Everyone!
>>
>> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
>> 
>>
>> This CEP outlines a plan to introduce *Schema Annotations* as a way to
>> add better context to schema elements. We're also proposing a set of new
>> DDL statements to manage these annotations.
>>
>> We believe these annotations will be highly beneficial for several key
>> areas:
>>
>>  - GenAI Applications: Providing more context to LLMs could
>>    significantly improve the accuracy and relevance of generated content.
>>  - Data Governance: Annotations can help in enforcing policies using
>>    annotations
>>  - Compliance: They can be used to track and manage compliance
>>    requirements directly within the schema.
>>
>> We're eager to hear your thoughts and feedback on this proposal. Please
>> keep the discussion within this mailing thread.
>>
>> Thanks for your time and feedback in advance.
>>
>> Best regards,
>>
>> Jyothsna & Yifan
>>
>>
>>
>>


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-07 Thread Joel Shepherd
I like the aim of the CEP. Completely on board with the idea that GenAI
tooling works better when you can provide it useful context about the
data it is working with. An organization I worked with in the past had a
lot of good results with marking up API models (not DB schemas, but a
similar idea) with authorization-related annotations and using those to
drive policy linters and end-user interfaces. So, sold on the value of
the capability.


Two things I'm less sure of:

1) User-defined vs pre-defined annotation types: I appreciate the
flexibility that user-defined annotations appear to give, but it adds
extra room for error. E.g. if annotation names are case-sensitive, do I
(the user) have to actively prevent creation of @description? Or police
the accidental creation of alternative names like @Desc? If the
community settled on a small, fixed set of supported annotations, so that
Cassandra itself was authoritative for valid annotation names, would that
make the feature a lot less valuable, or prevent offering user-defined
annotations in the future?
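One way to blunt the @description vs. @Description vs. @Desc problem is to compare names case-insensitively and flag names that are prefixes (or extensions) of a known annotation. A minimal Python sketch, assuming case-insensitive names and a hypothetical registry holding only the predefined Description annotation; the prefix rule is a simple heuristic of this sketch, not something the CEP specifies.

```python
from typing import Iterable, Tuple

# Hypothetical registry, stored in lowercase canonical form.
PREDEFINED = frozenset({"description"})


def classify_annotation_name(name: str, known: Iterable[str] = PREDEFINED) -> Tuple[str, str]:
    """Return (verdict, canonical_name) for a proposed annotation name.

    'known'        -> matches a predefined annotation, ignoring case
    'suspicious'   -> looks like an abbreviation/extension of one (e.g. Desc)
    'user-defined' -> anything else
    """
    canonical = name.lower()
    known_sorted = sorted(known)  # deterministic order for the prefix check
    if canonical in known_sorted:
        return ("known", canonical)
    for k in known_sorted:
        if k.startswith(canonical) or canonical.startswith(k):
            return ("suspicious", k)
    return ("user-defined", canonical)
```

With this rule, "Description" and "description" resolve to the same annotation, "Desc" is flagged as a likely alias of it, and "PII" is accepted as user-defined.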


2) Syntactically, is it worth considering other alternatives? I was 
trying to imagine a CREATE TABLE statement marked up with two or three 
types of column-level annotations, and my sense is that it could get 
hard to read quickly. Is it worth considering Javadoc-style annotations 
in schema comments instead? I think in today's world that means that 
they would not be accessible via CQL/Cassandra (CQL comments are not 
persisted as part of the schema, correct?) but they could be accessible 
to other schema-processing tools and IMO be a more readable syntax. It'd 
be good to work through a couple use-cases for actually using the data 
provided by the annotations and get a sense of whether making them 
first-class entities in CQL is necessary for getting most of the value 
from them.
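As one such use case, here is a minimal Python sketch of a policy linter of the kind mentioned above: it reports columns annotated as PII that have no data mask. The `ColumnInfo` shape and the "PII" naming convention are assumptions for illustration; in practice the input would come from wherever annotations end up being exposed (a schema table or DESCRIBE output).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ColumnInfo:
    table: str
    name: str
    annotations: FrozenSet[str] = frozenset()
    mask: Optional[str] = None  # masking function, or None when unmasked


def unmasked_pii(columns: List[ColumnInfo]) -> List[str]:
    """Return 'table.column' for every PII-annotated column without a mask."""
    return [
        f"{c.table}.{c.name}"
        for c in columns
        if "pii" in {a.lower() for a in c.annotations} and c.mask is None
    ]


COLUMNS = [
    ColumnInfo("users", "id"),
    ColumnInfo("users", "email", frozenset({"PII"})),
    ColumnInfo("users", "ssn", frozenset({"PII"}), mask="default()"),
]
```

Here `unmasked_pii(COLUMNS)` returns `['users.email']`: `ssn` is annotated but masked, and `id` carries no annotation.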


Thanks -- Joel.

On 8/6/2025 6:59 PM, Jyothsna Konisa wrote:
Sorry for the incorrect editable link, here is the updated link to the 
CEP 52: Schema Annotations for ApacheCassandra 



On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa 
 wrote:


Hello Everyone!

We would like to propose CEP 52: Schema Annotations for
ApacheCassandra



This CEP outlines a plan to introduce *Schema Annotations* as a way
to add better context to schema elements. We're also proposing a
set of new DDL statements to manage these annotations.

We believe these annotations will be highly beneficial for several
key areas:

 * GenAI Applications: Providing more context to LLMs could
   significantly improve the accuracy and relevance of generated
   content.

 * Data Governance: Annotations can help in enforcing policies
   using annotations

 * Compliance: They can be used to track and manage compliance
   requirements directly within the schema.

We're eager to hear your thoughts and feedback on this proposal.
Please keep the discussion within this mailing thread.

Thanks for your time and feedback in advance.

Best regards,

Jyothsna & Yifan





Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-07 Thread Štefan Miklošovič
Hi Jyothsna,

Thank you for this proposal.

While reading it, various questions / points came to my mind:

1) In your examples, Description can take an argument: a comment
describing an element of a schema. However, I can imagine that not all
annotations need an argument like that. E.g. @PII can be without any. It
would be nice if your "framework" supported creating annotations which do
not take any arguments. How do you differentiate, on creation, whether an
annotation is meant to take an argument or not? All annotations might
take a comment by default, but if a user does not specify any, this would
also be reflected in "DESCRIBE", which would display just the short
version: "@PII" instead of "@PII('')".
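The rendering rule described above fits in a few lines. A minimal Python sketch, assuming an annotation is a name plus an optional string argument and that an empty or missing argument prints in the short form; `render_annotation` is a hypothetical helper, and the quoting follows CQL's rule of doubling a single quote inside a string literal.

```python
from typing import Optional


def cql_string(value: str) -> str:
    # CQL string literals are single-quoted; an embedded quote is doubled.
    return "'" + value.replace("'", "''") + "'"


def render_annotation(name: str, argument: Optional[str] = None) -> str:
    """Render an annotation for DESCRIBE: '@PII', never "@PII('')"."""
    if not argument:  # None or '' -> short form
        return f"@{name}"
    return f"@{name}({cql_string(argument)})"
```

For example, `render_annotation("PII")` and `render_annotation("PII", "")` both give `@PII`, while `render_annotation("Description", "it's")` gives `@Description('it''s')`.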

2) It seems to me that after CEP-42, we opened the door to "additions" to
columns. If we have this:

CREATE TABLE ks.tb2 (id int CHECK id > 0 PRIMARY KEY);

or

CREATE TABLE ks.tb3 (id int CHECK id > 0 PRIMARY KEY, col2 text NOT NULL);

I think it would be worth thinking about how to specify annotations
alongside these. There are already some minor caveats when checks are
mixed with data masks, and I can see how incorporating annotations into
this might be a source of problems which would need to be addressed as
well.

Do you propose something like this? I am open-minded about other approaches
here.

CREATE TABLE ks.tb3 (
id int PRIMARY KEY,
col2 text NOT NULL WITH @PII
);

CREATE TABLE ks.tb3 (
id int PRIMARY KEY,
col2 int CHECK col2 > 0 WITH @PII
);

All I am saying is that this integration needs to be thought through to
play together with masks, constraints, etc., and I think it will not be
completely trivial to do this at the syntax level in ANTLR.

To make your life easier, I guess that something like this would be more
appropriate at the grammar level:

CREATE TABLE ks.tb3 (
id int PRIMARY KEY,
col2 int CHECK col2 > 0 ANNOTATIONS @PII AND @Description('123')
);

Also, I am not sure if you already have some prototype or not, but I am
curious how you achieved this:

// Viewing annotations of a table
DESC table .

-- prints

CREATE TABLE . (
column1 text, .)
WITH @Description('New annotation on table')

because, for now, everything in "WITH" is a completely separate entry in
TableParams.

I guess it might be expanded to something like this to be more illustrative:

CREATE TABLE . (
column1 text, .)
) WITH additional_write_policy = '99p'
AND allow_auto_snapshot = true
AND bloom_filter_fp_chance = 0.01
  ...
AND @Description('New annotation on table')
AND @PII

Again, I think that having a dedicated ANNOTATIONS entry

CREATE TABLE . (
column1 text, .)
) WITH additional_write_policy = '99p'
AND allow_auto_snapshot = true
AND bloom_filter_fp_chance = 0.01
  ...
AND ANNOTATIONS @PII AND @Description('New annotation on table');

would be better, because ANNOTATIONS would then serve as a "container" for
all of it, which would eventually be stored in TCM, etc.

On Thu, Aug 7, 2025 at 3:59 AM Jyothsna Konisa 
wrote:

> Sorry for the incorrect editable link; here is the updated link to the CEP
> 52: Schema Annotations for ApacheCassandra
> 
>
> On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa 
> wrote:
>
>> Hello Everyone!
>>
>> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
>> 
>>
>> This CEP outlines a plan to introduce *Schema Annotations* as a way to
>> add better context to schema elements. We're also proposing a set of new
>> DDL statements to manage these annotations.
>>
>> We believe these annotations will be highly beneficial for several key
>> areas:
>>
>> - GenAI Applications: Providing more context to LLMs could
>>   significantly improve the accuracy and relevance of generated content.
>> - Data Governance: Annotations can help enforce policies.
>> - Compliance: They can be used to track and manage compliance
>>   requirements directly within the schema.
>>
>> We're eager to hear your thoughts and feedback on this proposal. Please
>> keep the discussion within this mailing thread.
>>
>> Thanks for your time and feedback in advance.
>>
>> Best regards,
>>
>> Jyothsna & Yifan
>>
>>
>>
>>


Re: [DISCUSS] CEP 52: Schema Annotations for ApacheCassandra

2025-08-06 Thread Jyothsna Konisa
Sorry for the incorrect editable link; here is the updated link to the CEP
52: Schema Annotations for ApacheCassandra


On Wed, Aug 6, 2025 at 4:26 PM Jyothsna Konisa 
wrote:

> Hello Everyone!
>
> We would like to propose CEP 52: Schema Annotations for ApacheCassandra
> 
>
> This CEP outlines a plan to introduce *Schema Annotations* as a way to
> add better context to schema elements. We're also proposing a set of new
> DDL statements to manage these annotations.
>
> We believe these annotations will be highly beneficial for several key
> areas:
>
> - GenAI Applications: Providing more context to LLMs could significantly
>   improve the accuracy and relevance of generated content.
> - Data Governance: Annotations can help enforce policies.
> - Compliance: They can be used to track and manage compliance
>   requirements directly within the schema.
>
> We're eager to hear your thoughts and feedback on this proposal. Please
> keep the discussion within this mailing thread.
>
> Thanks for your time and feedback in advance.
>
> Best regards,
>
> Jyothsna & Yifan
>
>
>
>