Re: [DISCUSS] CEP-42: Constraints Framework

2024-07-01 Thread Bernardo Botella
Thanks everyone for all the feedback that came in after the call for votes.

To Yifan's point, yes you are right, and I updated the CEP with the expressions.

There’s been a really good discussion around adding or supporting constraints 
at read time. I think the point Doug made illustrates that such constraints may
come with rough edges that have other implications that need to be taken care
of. Because of that, I’d like to follow Dinesh’s suggestion of deferring
read-time checks and restart the call for votes for the proposal.

I will resurface the call for votes thread.

Thanks everyone!
Bernardo

> On Jun 29, 2024, at 1:26 PM, Dinesh Joshi  wrote:
> 
> The read time constraint application is going to be expensive and possibly 
> complicated to implement with low RoI. Therefore my suggestion is to defer 
> it. If there are situations where it appears to be helpful, we can always 
> reconsider it.
> 
> On Tue, Jun 25, 2024 at 3:34 PM Yifan Cai wrote:
>>> - Alter and Drop constraints are as follows
>>> ALTER CONSTRAINT [name] CHECK new_condition
>>> DROP CONSTRAINT [name]
>> 
>> I think you mean the following syntax to modify existing constraints, since 
>> constraints are part of the table definition. 
>> ALTER TABLE [keyspace_name.]table_name ALTER CONSTRAINT [constraint_name] 
>> CHECK check_expression
>> 
>> Dinesh's proposal to check on read is a good addition. I think it is 
>> optional and should be enabled/disabled w/ configuration. The extra check 
>> may not be desirable in some circumstances, e.g. the use cases do not ever 
>> change the constraints and do not have other write data other than CQL. 
>> Since the original CEP defines that the constraints are applied at the write 
>> time, we need to update the CEP if we decide to include the check on read.
>> 
>> - Yifan
>> 
>> 
>> On Tue, Jun 25, 2024 at 1:13 PM Štefan Miklošovič wrote:
>>> I wonder how often it is that users will apply the constraints on tables 
>>> with data while they know their data is probably not compliant with the 
>>> constraint configuration. I humbly think that people are aware of this in 
>>> advance and what usually happens is that there is some kind of a job which 
>>> consolidates the data (or migrates them to a new table) before admins put a 
>>> "lid" on that so moving forward nobody puts there anything which would 
>>> violate it.
>>> 
>>> I probably have not kept myself up to date with the discussion but I was 
>>> thinking that constraints are effectively there just on the write path. 
>>> Whatever is read is not a job of a constraint to refuse to return.
>>> 
>>> On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi wrote:
>>>> Abe, that's a good point. We need to call out distinct use-cases here.
>>>> When a fresh cluster is set up with constraints we don't have any issues
>>>> because the data written and read back is going to be compliant to the
>>>> constraint(s). For existing data in a cluster where new constraints are
>>>> applied or existing constraints changed in such a way that may render
>>>> existing data unreadable, we need a good user experience. This is what I
>>>> propose –
>>>>
>>>> 1. When a constraint is added or changed in such a way that existing data
>>>> could be rendered unreadable, we should warn the user.
>>>>
>>>> 2. Give the user a choice of whether it is ok for the data to be rendered
>>>> unreadable and an error is issued or a warning should be issued when the
>>>> read violates the constraint but data is still readable. New data going in
>>>> will meet the constraint but old data would need to be rewritten for the
>>>> application to make it compliant.
>>>>
>>>> With this approach the application developer can decide what is right for
>>>> their particular use-case. In many cases the application developer may
>>>> decide to rewrite the data when they see a warning.
>>>>
>>>>
>>>> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky wrote:
>>>>> If we're going to introduce a feature that looks like SQL constraints, we
>>>>> should make sure it's "reasonably" compliant. In particular, we should
>>>>> avoid situations where a user creates a constraint, writes some data,
>>>>> then reads data that violates that constraint, unless they've expressed
>>>>> that violations on read would be acceptable.
>>>>>
>>>>> For Postgres, when adding a new constraint you can specify NOT VALID to
>>>>> avoid scanning all existing relevant data[1]. If we want to avoid
>>>>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>>>>
>>>>> As we've already discussed, constraints must deal with operations that
>>>>> appear within limits on the write path, but once reconciled on read or
>>>>> during compaction can lead to a violation. Adding to non-frozen
>>>>> collections is one example. Expecting users to understand the write path
>>>>> for collections feels unrealistic t

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-29 Thread Dinesh Joshi
The read time constraint application is going to be expensive and possibly
complicated to implement with low RoI. Therefore my suggestion is to defer
it. If there are situations where it appears to be helpful, we can always
reconsider it.

On Tue, Jun 25, 2024 at 3:34 PM Yifan Cai  wrote:

>> - Alter and Drop constraints are as follows
>> ALTER CONSTRAINT [name] CHECK new_condition
>> DROP CONSTRAINT [name]
>
> I think you mean the following syntax to modify existing constraints,
> since constraints are part of the table definition.
> ALTER TABLE [keyspace_name.]table_name ALTER CONSTRAINT [constraint_name]
> CHECK check_expression
>
> Dinesh's proposal to check on read is a good addition. I think it is
> *optional* and should be enabled/disabled w/ configuration. The extra
> check may not be desirable in some circumstances, e.g. the use cases do not
> ever change the constraints and do not have other write data other than
> CQL.
> Since the original CEP defines that the constraints are applied at the
> write time, we need to update the CEP if we decide to include the check on
> read.
>
> - Yifan
>
>
> On Tue, Jun 25, 2024 at 1:13 PM Štefan Miklošovič 
> wrote:
>
>> I wonder how often it is that users will apply the constraints on tables
>> with data while they know their data is probably not compliant with the
>> constraint configuration. I humbly think that people are aware of this in
>> advance and what usually happens is that there is some kind of a job which
>> consolidates the data (or migrates them to a new table) before admins put a
>> "lid" on that so moving forward nobody puts there anything which would
>> violate it.
>>
>> I probably have not kept myself up to date with the discussion but I was
>> thinking that constraints are effectively there just on the write path.
>> Whatever is read is not a job of a constraint to refuse to return.
>>
>> On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi  wrote:
>>
>>> Abe, that's a good point. We need to call out distinct use-cases here.
>>> When a fresh cluster is set up with constraints we don't have any issues
>>> because the data written and read back is going to be compliant to the
>>> constraint(s). For existing data in a cluster where new constraints are
>>> applied or existing constraints changed in such a way that may render
>>> existing data unreadable, we need a good user experience. This is what I
>>> propose –
>>>
>>> 1. When a constraint is added or changed in such a way that existing
>>> data could be rendered unreadable, we should warn the user.
>>>
>>> 2. Give the user a choice of whether it is ok for the data to be
>>> rendered unreadable and an error is issued or a warning should be issued
>>> when the read violates the constraint but data is still readable. New data
>>> going in will meet the constraint but old data would need to be rewritten
>>> for the application to make it compliant.
>>>
>>> With this approach the application developer can decide what is right
>>> for their particular use-case. In many cases the application developer may
>>> decide to rewrite the data when they see a warning.
>>>
>>>
>>> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:
>>>
>>>> If we're going to introduce a feature that looks like SQL constraints,
>>>> we should make sure it's "reasonably" compliant. In particular, we should
>>>> avoid situations where a user creates a constraint, writes some data, then
>>>> reads data that violates that constraint, unless they've expressed that
>>>> violations on read would be acceptable.
>>>>
>>>> For Postgres, when adding a new constraint you can specify NOT VALID to
>>>> avoid scanning all existing relevant data[1]. If we want to avoid
>>>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>>>
>>>> As we've already discussed, constraints must deal with operations that
>>>> appear within limits on the write path, but once reconciled on read or
>>>> during compaction can lead to a violation. Adding to non-frozen collections
>>>> is one example. Expecting users to understand the write path for
>>>> collections feels unrealistic to me; I wonder if we should express in the
>>>> constraint itself that it only applies during write.
>>>>
>>>> Anything that uses "nodetool import" (including cassandra-analytics)
>>>> could theoretically push constraint-violating mutations to a table. We
>>>> could update import to scan table contents first, or add a flag to trust
>>>> the data in imported SSTables and make cassandra-analytics executors aware
>>>> of table-level constraints.
>>>>
>>>> Some client implementations read the system_schema tables to build
>>>> their object mappers, I'd like to confirm that nothing will require clients
>>>> to be aware of these new schema constructs.
>>>>
>>>> Overall, I'm supportive of the distinctions discussed between
>>>> constraints and guardrails and like the direction this is heading; I'd just
>>>> like to make sure the more detailed semantics aren't confusing or
>>>> misleadin

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Yifan Cai
>
> - Alter and Drop constraints are as follows
> ALTER CONSTRAINT [name] CHECK new_condition
> DROP CONSTRAINT [name]
>

I think you mean the following syntax to modify existing constraints, since
constraints are part of the table definition.
ALTER TABLE [keyspace_name.]table_name ALTER CONSTRAINT [constraint_name]
CHECK check_expression

Dinesh's proposal to check on read is a good addition. I think it is
*optional* and should be enabled/disabled w/ configuration. The extra check
may not be desirable in some circumstances, e.g. the use cases do not ever
change the constraints and do not have other write data other than CQL.
Since the original CEP defines that the constraints are applied at the
write time, we need to update the CEP if we decide to include the check on
read.

- Yifan


On Tue, Jun 25, 2024 at 1:13 PM Štefan Miklošovič 
wrote:

> I wonder how often it is that users will apply the constraints on tables
> with data while they know their data is probably not compliant with the
> constraint configuration. I humbly think that people are aware of this in
> advance and what usually happens is that there is some kind of a job which
> consolidates the data (or migrates them to a new table) before admins put a
> "lid" on that so moving forward nobody puts there anything which would
> violate it.
>
> I probably have not kept myself up to date with the discussion but I was
> thinking that constraints are effectively there just on the write path.
> Whatever is read is not a job of a constraint to refuse to return.
>
> On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi  wrote:
>
>> Abe, that's a good point. We need to call out distinct use-cases here.
>> When a fresh cluster is set up with constraints we don't have any issues
>> because the data written and read back is going to be compliant to the
>> constraint(s). For existing data in a cluster where new constraints are
>> applied or existing constraints changed in such a way that may render
>> existing data unreadable, we need a good user experience. This is what I
>> propose –
>>
>> 1. When a constraint is added or changed in such a way that existing data
>> could be rendered unreadable, we should warn the user.
>>
>> 2. Give the user a choice of whether it is ok for the data to be rendered
>> unreadable and an error is issued or a warning should be issued when the
>> read violates the constraint but data is still readable. New data going in
>> will meet the constraint but old data would need to be rewritten for
>> the application to make it compliant.
>>
>> With this approach the application developer can decide what is right for
>> their particular use-case. In many cases the application developer may
>> decide to rewrite the data when they see a warning.
>>
>>
>> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:
>>
>>> If we're going to introduce a feature that looks like SQL constraints,
>>> we should make sure it's "reasonably" compliant. In particular, we should
>>> avoid situations where a user creates a constraint, writes some data, then
>>> reads data that violates that constraint, unless they've expressed that
>>> violations on read would be acceptable.
>>>
>>> For Postgres, when adding a new constraint you can specify NOT VALID to
>>> avoid scanning all existing relevant data[1]. If we want to avoid
>>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>>
>>> As we've already discussed, constraints must deal with operations that
>>> appear within limits on the write path, but once reconciled on read or
>>> during compaction can lead to a violation. Adding to non-frozen collections
>>> is one example. Expecting users to understand the write path for
>>> collections feels unrealistic to me; I wonder if we should express in the
>>> constraint itself that it only applies during write.
>>>
>>> Anything that uses "nodetool import" (including cassandra-analytics)
>>> could theoretically push constraint-violating mutations to a table. We
>>> could update import to scan table contents first, or add a flag to trust
>>> the data in imported SSTables and make cassandra-analytics executors aware
>>> of table-level constraints.
>>>
>>> Some client implementations read the system_schema tables to build their
>>> object mappers, I'd like to confirm that nothing will require clients to be
>>> aware of these new schema constructs.
>>>
>>> Overall, I'm supportive of the distinctions discussed between
>>> constraints and guardrails and like the direction this is heading; I'd just
>>> like to make sure the more detailed semantics aren't confusing or
>>> misleading for our users, and semantics are much harder to change in the
>>> future.
>>>
>>> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
>>>
>>>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Štefan Miklošovič
I wonder how often it is that users will apply the constraints on tables
with data while they know their data is probably not compliant with the
constraint configuration. I humbly think that people are aware of this in
advance and what usually happens is that there is some kind of a job which
consolidates the data (or migrates them to a new table) before admins put a
"lid" on that so moving forward nobody puts there anything which would
violate it.

I probably have not kept myself up to date with the discussion but I was
thinking that constraints are effectively there just on the write path.
Whatever is read is not a job of a constraint to refuse to return.

On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi  wrote:

> Abe, that's a good point. We need to call out distinct use-cases here.
> When a fresh cluster is set up with constraints we don't have any issues
> because the data written and read back is going to be compliant to the
> constraint(s). For existing data in a cluster where new constraints are
> applied or existing constraints changed in such a way that may render
> existing data unreadable, we need a good user experience. This is what I
> propose –
>
> 1. When a constraint is added or changed in such a way that existing data
> could be rendered unreadable, we should warn the user.
>
> 2. Give the user a choice of whether it is ok for the data to be rendered
> unreadable and an error is issued or a warning should be issued when the
> read violates the constraint but data is still readable. New data going in
> will meet the constraint but old data would need to be rewritten for
> the application to make it compliant.
>
> With this approach the application developer can decide what is right for
> their particular use-case. In many cases the application developer may
> decide to rewrite the data when they see a warning.
>
>
> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:
>
>> If we're going to introduce a feature that looks like SQL constraints, we
>> should make sure it's "reasonably" compliant. In particular, we should
>> avoid situations where a user creates a constraint, writes some data, then
>> reads data that violates that constraint, unless they've expressed that
>> violations on read would be acceptable.
>>
>> For Postgres, when adding a new constraint you can specify NOT VALID to
>> avoid scanning all existing relevant data[1]. If we want to avoid
>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>
>> As we've already discussed, constraints must deal with operations that
>> appear within limits on the write path, but once reconciled on read or
>> during compaction can lead to a violation. Adding to non-frozen collections
>> is one example. Expecting users to understand the write path for
>> collections feels unrealistic to me; I wonder if we should express in the
>> constraint itself that it only applies during write.
>>
>> Anything that uses "nodetool import" (including cassandra-analytics)
>> could theoretically push constraint-violating mutations to a table. We
>> could update import to scan table contents first, or add a flag to trust
>> the data in imported SSTables and make cassandra-analytics executors aware
>> of table-level constraints.
>>
>> Some client implementations read the system_schema tables to build their
>> object mappers, I'd like to confirm that nothing will require clients to be
>> aware of these new schema constructs.
>>
>> Overall, I'm supportive of the distinctions discussed between constraints
>> and guardrails and like the direction this is heading; I'd just like to
>> make sure the more detailed semantics aren't confusing or misleading for
>> our users, and semantics are much harder to change in the future.
>>
>> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
>>
>>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
Abe, that's a good point. We need to call out distinct use-cases here. When
a fresh cluster is set up with constraints we don't have any issues because
the data written and read back is going to be compliant to the
constraint(s). For existing data in a cluster where new constraints are
applied or existing constraints changed in such a way that may render
existing data unreadable, we need a good user experience. This is what I
propose –

1. When a constraint is added or changed in such a way that existing data
could be rendered unreadable, we should warn the user.

2. Give the user a choice of whether it is ok for the data to be rendered
unreadable and an error is issued or a warning should be issued when the
read violates the constraint but data is still readable. New data going in
will meet the constraint but old data would need to be rewritten for
the application to make it compliant.

With this approach the application developer can decide what is right for
their particular use-case. In many cases the application developer may
decide to rewrite the data when they see a warning.
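
To make option 2 concrete, here is a minimal Python sketch of a per-table read policy. It is only an illustration of the idea, not anything from the CEP: the names ReadPolicy, ConstraintViolation and check_on_read are hypothetical, and constraints are modelled as plain predicates over a row dict.

```python
import warnings
from enum import Enum


class ReadPolicy(Enum):
    """Hypothetical per-table choice for rows that violate a constraint on read."""
    FAIL = "fail"  # the row becomes unreadable and an error is raised
    WARN = "warn"  # the row is still returned, with a warning


class ConstraintViolation(Exception):
    """Raised when a row violates a constraint under the FAIL policy."""


def check_on_read(row, constraints, policy):
    """Return the row, raising or warning if it violates any named constraint."""
    violated = [name for name, predicate in constraints.items() if not predicate(row)]
    if not violated:
        return row
    message = f"row {row!r} violates constraint(s): {', '.join(violated)}"
    if policy is ReadPolicy.FAIL:
        raise ConstraintViolation(message)
    warnings.warn(message)
    return row
```

Under WARN the application keeps reading old data and can rewrite it at its own pace; under FAIL the violation surfaces immediately as an error.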


On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky  wrote:

> If we're going to introduce a feature that looks like SQL constraints, we
> should make sure it's "reasonably" compliant. In particular, we should
> avoid situations where a user creates a constraint, writes some data, then
> reads data that violates that constraint, unless they've expressed that
> violations on read would be acceptable.
>
> For Postgres, when adding a new constraint you can specify NOT VALID to
> avoid scanning all existing relevant data[1]. If we want to avoid
> scan-on-DDL, this tradeoff needs to be made clear to a user.
>
> As we've already discussed, constraints must deal with operations that
> appear within limits on the write path, but once reconciled on read or
> during compaction can lead to a violation. Adding to non-frozen collections
> is one example. Expecting users to understand the write path for
> collections feels unrealistic to me; I wonder if we should express in the
> constraint itself that it only applies during write.
>
> Anything that uses "nodetool import" (including cassandra-analytics) could
> theoretically push constraint-violating mutations to a table. We could
> update import to scan table contents first, or add a flag to trust the data
> in imported SSTables and make cassandra-analytics executors aware of
> table-level constraints.
>
> Some client implementations read the system_schema tables to build their
> object mappers, I'd like to confirm that nothing will require clients to be
> aware of these new schema constructs.
>
> Overall, I'm supportive of the distinctions discussed between constraints
> and guardrails and like the direction this is heading; I'd just like to
> make sure the more detailed semantics aren't confusing or misleading for
> our users, and semantics are much harder to change in the future.
>
> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Doug Rohrer
On the Analytics side, as long as the CQLSSTableWriter understands and enforces 
the constraints (which it should be able to, given we provide the table
schema) we should be good to go. We should try hard to avoid scanning the data 
on import, as the Analytics library does a bunch of things to push that kind of 
logic and CPU + I/O work off to the Spark executors that write the sstables, 
and reading the whole SSTable on import can drastically slow down that process.
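
As a rough sketch of what "enforce at write time, not at import" could look like, the Python below validates each row before it is buffered. ValidatingWriter is a hypothetical stand-in, not the CQLSSTableWriter API, and constraints are assumed to be simple predicates derived from the table schema.

```python
class ConstraintViolation(Exception):
    """Raised when a row fails a table-level constraint."""


class ValidatingWriter:
    """Hypothetical stand-in for an SSTable writer; rows are kept in a list."""

    def __init__(self, constraints):
        # constraints: mapping of constraint name -> predicate over a row dict
        self.constraints = constraints
        self.rows = []

    def add_row(self, row):
        """Reject the row up front so nothing needs to be scanned on import."""
        for name, predicate in self.constraints.items():
            if not predicate(row):
                raise ConstraintViolation(f"{name} rejected {row!r}")
        self.rows.append(row)
```

The check cost is paid by whoever produces the data (the Spark executors, in the Analytics case), so the import step can trust the files it receives.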

I agree that it is important to warn users in the docs that we don’t scan
existing data for constraint violations when the table wasn’t created with the
constraints, but I don’t think a scan on DDL change would be feasible.

Could we only support collection-level constraints on frozen lists/sets/maps, 
as that way the end user would have to be aware of the current size of the 
collection?

Doug

> On Jun 25, 2024, at 2:27 PM, Abe Ratnofsky  wrote:
> 
> If we're going to introduce a feature that looks like SQL constraints, we 
> should make sure it's "reasonably" compliant. In particular, we should avoid 
> situations where a user creates a constraint, writes some data, then reads 
> data that violates that constraint, unless they've expressed that violations 
> on read would be acceptable.
> 
> For Postgres, when adding a new constraint you can specify NOT VALID to avoid 
> scanning all existing relevant data[1]. If we want to avoid scan-on-DDL, this 
> tradeoff needs to be made clear to a user.
> 
> As we've already discussed, constraints must deal with operations that appear 
> within limits on the write path, but once reconciled on read or during 
> compaction can lead to a violation. Adding to non-frozen collections is one 
> example. Expecting users to understand the write path for collections feels 
> unrealistic to me; I wonder if we should express in the constraint itself 
> that it only applies during write.
> 
> Anything that uses "nodetool import" (including cassandra-analytics) could 
> theoretically push constraint-violating mutations to a table. We could update 
> import to scan table contents first, or add a flag to trust the data in 
> imported SSTables and make cassandra-analytics executors aware of table-level 
> constraints.
> 
> Some client implementations read the system_schema tables to build their 
> object mappers, I'd like to confirm that nothing will require clients to be 
> aware of these new schema constructs.
> 
> Overall, I'm supportive of the distinctions discussed between constraints and 
> guardrails and like the direction this is heading; I'd just like to make sure 
> the more detailed semantics aren't confusing or misleading for our users, and 
> semantics are much harder to change in the future.
> 
> [1]: https://www.postgresql.org/docs/current/sql-altertable.html
> 



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Abe Ratnofsky
If we're going to introduce a feature that looks like SQL constraints, we 
should make sure it's "reasonably" compliant. In particular, we should avoid 
situations where a user creates a constraint, writes some data, then reads data 
that violates that constraint, unless they've expressed that violations on read 
would be acceptable.

For Postgres, when adding a new constraint you can specify NOT VALID to avoid 
scanning all existing relevant data[1]. If we want to avoid scan-on-DDL, this 
tradeoff needs to be made clear to a user.

As we've already discussed, constraints must deal with operations that appear 
within limits on the write path, but once reconciled on read or during 
compaction can lead to a violation. Adding to non-frozen collections is one 
example. Expecting users to understand the write path for collections feels 
unrealistic to me; I wonder if we should express in the constraint itself that 
it only applies during write.
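
A deliberately simplified Python model of the collection case, for illustration only: it assumes a size limit on a non-frozen set and ignores timestamps and tombstones. MAX_SIZE, write_time_check and reconcile are hypothetical names.

```python
MAX_SIZE = 3  # hypothetical constraint: the set may hold at most 3 elements


def write_time_check(added_elements):
    """The write path sees only the elements in this mutation, not the stored set."""
    return len(added_elements) <= MAX_SIZE


def reconcile(mutations):
    """Merge per-mutation additions, roughly as a read or compaction would."""
    merged = set()
    for added in mutations:
        merged |= added
    return merged


# Two appends, each within the limit on its own.
mutations = [{"a", "b"}, {"c", "d"}]
all_accepted = all(write_time_check(m) for m in mutations)
merged = reconcile(mutations)
violates_after_merge = len(merged) > MAX_SIZE
```

Both mutations pass the write-time check, yet the reconciled set has four elements and violates the limit.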

Anything that uses "nodetool import" (including cassandra-analytics) could 
theoretically push constraint-violating mutations to a table. We could update 
import to scan table contents first, or add a flag to trust the data in 
imported SSTables and make cassandra-analytics executors aware of table-level 
constraints.

Some client implementations read the system_schema tables to build their object 
mappers, I'd like to confirm that nothing will require clients to be aware of 
these new schema constructs.

Overall, I'm supportive of the distinctions discussed between constraints and 
guardrails and like the direction this is heading; I'd just like to make sure 
the more detailed semantics aren't confusing or misleading for our users, and 
semantics are much harder to change in the future.

[1]: https://www.postgresql.org/docs/current/sql-altertable.html



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
On Tue, Jun 25, 2024 at 10:59 AM Josh McKenzie  wrote:

>
> My intuition is the vote got called a *smidge* early but that things are
> very much moving in the right direction and are very close.
>

Agreed and the vote thread got us more feedback which is valuable :)


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Josh McKenzie
> I was referring to the name guardrail, using the same infra as guardrails
Curious if there's a subtle distinction implicit in this (or just in my 
brain...). A guardrail is something one person puts in place for someone else - 
in our case operators to users. Constraints are something inherent to the 
use-case or the abstractions, and something users are putting on themselves. 
Maybe... either that or we just named guardrails in a NIH way. /sad

That aside, if other DB's are using "constraint" I think it's the most 
user-friendly approach to use terminology they're already familiar with from 
other contexts. And we need more user-friendly. ;)

As for the infra - if we're talking plumbing inside the guts I agree. I think 
the UX definitely should be via DDL rather than the .yaml file; I don't think 
that's what you were alluding to but just figured it's worth pointing it out.

My intuition is the vote got called a *smidge* early but that things are very 
much moving in the right direction and are very close.

On Tue, Jun 25, 2024, at 1:41 PM, Bernardo Botella wrote:
> Hi Ariel,
> 
> Your suggestions make sense, and I’ll be updating the CEP with the details. 
> Basically:
> - We have an optional name for the constraints. If the name is not provided, 
> a random name is generated for a constraint:
> CREATE TABLE keyspace.table (
>   p1 int,
>   p2 int,
>   ...,
>   CONSTRAINT [name] CHECK p1 != p2
> );
> 
> - Alter and Drop constraints are as follows
> ALTER CONSTRAINT [name] CHECK new_condition
> DROP CONSTRAINT [name]
> 
> - Describe table returns the list of constraints for a table.
> - The condition of the CONSTRAINT (after the CHECK keyword) can be surrounded 
> by optional parentheses to keep consistency with other databases syntax.
> 
> I will update the CEP with those details.
> 
> To Dinesh’s point, I agree that a NOT NULL constraint will be really useful. 
> I can add it to the list on the CEP
> 
> Regards,
> Bernardo
> 
> 
>> On Jun 25, 2024, at 9:22 AM, Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> I am also +1 on Doug's distinction between things that can be managed by 
>> operators and things that can be managed by applications.
>> 
>> Some things to note about the syntax is that there are parens around the 
>> condition in SQL. In your example there are multiple anonymous constraints 
>> on the same column, how are anonymous constraints handled? Does the database 
>> automatically generate a named constraint for them so they can be referenced 
>> later? Do we allow multiple constraints on the same column and AND them 
>> together?
>> 
>> Ariel
>> 
>> 
>> 
>> On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
>>> Hi Ariel and Jon,
>>> 
>>> Let me address your question first. Yes, AND is supported in the proposal. 
>>> Below you can find some examples of different constraints applied to the 
>>> same column.
>>> 
>>> As per the LENGTH name instead of sizeOf as in the proposal, I am also not 
>>> opposed to it if it is more consistent with terminology in the databases 
>>> universe.
>>> 
>>> So, to recap, there seems to be general agreement on the usefulness of the 
>>> Constraints Framework.
>>> Now, from the feedback that has arrived after the voting has been called, I 
>>> see there are three different proposals for syntax:
>>> 
>>> 1.-
>>> The syntax currently described in the CEP. Example:
>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>   ip_adress inet,
>>>   subnet_mask int,
>>>   CONSTRAINT subnet_mask > 0,
>>>   CONSTRAINT subnet_mask < 32
>>> )
>>> 
>>> 2.-
>>> As Jon suggested, leaving these definitions to more specific Guardrails at
>>> table level. Example, something like:
>>> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
>>> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>>> 
>>> 3.-
>>> As Ariel suggested, having the CHECK keyword added to align consistency 
>>> with SQL. Example:
>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>   ip_adress inet,
>>>   subnet_mask int,
>>>   CONSTRAINT CHECK subnet_mask > 0,
>>>   CONSTRAINT CHECK subnet_mask < 32
>>> )
>>> 
>>> For the guardrails vs cql syntax, I think that keeping the conceptual
>>> separation that has been explored in this thread, and perfectly recapped by
>>> Doug, is closer to what we are trying to achieve with this framework. In my
>>> opinion, having them in the CQL schema definition provides those
>>> application level constraints that Doug mentions in a more accessible way
>>> than having to configure such specific guardrails.
>>> 
>>> For the addition of the CHECK keyword, I'm definitely not opposed to it if 
>>> it helps Cassandra users coming from other databases understand concepts 
>>> that were already familiar to them.
>>> 
>>> I hope this helps move the conversation forward,
>>> Bernardo
>>> 
>>> 
>>> 
>>>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg wrote:
>>>>
>>>> Hi,
>>>>
>>>> I see a vote for this has been called. I should have provided more prompt
>>>> feedback

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Bernardo Botella
Hi Ariel,

Your suggestions make sense, and I’ll be updating the CEP with the details. 
Basically:
- We have an optional name for the constraints. If the name is not provided, a 
random name is generated for a constraint:
CREATE TABLE keyspace.table (
  p1 int, 
  p2 int,
  ...,
  CONSTRAINT [name] CHECK p1 != p2
);

- Alter and Drop constraints are as follows:
ALTER CONSTRAINT [name] CHECK new_condition
DROP CONSTRAINT [name]

- Describe table returns the list of constraints for a table.
- The condition of the CONSTRAINT (after the CHECK keyword) can be surrounded 
by optional parentheses to keep consistency with other databases' syntax.

I will update the CEP with those details.
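
As a rough illustration of the bullets above (not part of the CEP), the
following Python sketch models optional constraint names, alter/drop by name,
and a describe-style listing. The class, its method names, and the
counter-based generated names are assumptions made for this example; the
thread only says that a name is generated when none is given.

```python
import itertools


class TableConstraints:
    """Toy model of named and anonymous constraints on one table."""

    def __init__(self):
        self._checks = {}  # constraint name -> predicate over a row dict
        self._counter = itertools.count(1)

    def add(self, check, name=None):
        # Hypothetical naming scheme for anonymous constraints.
        if name is None:
            name = f"constraint_{next(self._counter)}"
        if name in self._checks:
            raise ValueError(f"constraint {name!r} already exists")
        self._checks[name] = check
        return name

    def alter(self, name, new_check):
        if name not in self._checks:
            raise KeyError(name)
        self._checks[name] = new_check

    def drop(self, name):
        del self._checks[name]

    def describe(self):
        # Sorted list of constraint names, similar in spirit to DESCRIBE TABLE.
        return sorted(self._checks)

    def validate(self, row):
        # Names of every constraint the row violates, in name order.
        return [n for n in sorted(self._checks) if not self._checks[n](row)]


constraints = TableConstraints()
named = constraints.add(lambda r: r["p1"] != r["p2"], name="p1_ne_p2")
anon = constraints.add(lambda r: r["p1"] > 0)
```

Because every constraint ends up with a name, the anonymous one can still be
altered or dropped later, which is the point of generating a name for it.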

To Dinesh’s point, I agree that a NOT NULL constraint will be really useful. I 
can add it to the list on the CEP.

Regards,
Bernardo


> On Jun 25, 2024, at 9:22 AM, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I am also +1 on Doug's distinction between things that can be managed by 
> operators and things that can be managed by applications.
> 
> One thing to note about the syntax is that there are parens around the 
> condition in SQL. In your example there are multiple anonymous constraints on 
> the same column; how are anonymous constraints handled? Does the database 
> automatically generate a named constraint for them so they can be referenced 
> later? Do we allow multiple constraints on the same column and AND them 
> together?
> 
> Ariel
> 
> 
> 
> On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
>> Hi Ariel and Jon,
>> 
>> Let me address your question first. Yes, AND is supported in the proposal. 
>> Below you can find some examples of different constraints applied to the 
>> same column.
>> 
>> As per the LENGTH name instead of sizeOf as in the proposal, I am also not 
>> opposed to it if it is more consistent with terminology in the databases 
>> universe.
>> 
>> So, to recap, there seems to be general agreement on the usefulness of the 
>> Constraints Framework.
>> Now, from the feedback that has arrived after the voting has been called, I 
>> see there are three different proposals for syntax:
>> 
>> 1.-
>> The syntax currently described in the CEP. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> 2.-
>> As Jon suggested, leaving these definitions to more specific Guardrails at 
>> table level. Example, something like:
>> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
>> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>> 
>> 3.-
>> As Ariel suggested, having the CHECK keyword added to align consistency with 
>> SQL. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT CHECK subnet_mask > 0,
>>   CONSTRAINT CHECK subnet_mask < 32
>> )
>> 
>> For the guardrails vs cql syntax, I think that keeping the conceptual 
>> separation that has been explored in this thread, and perfectly recapped by 
>> Doug, is closer to what we are trying to achieve with this framework. In my 
>> opinion, having them in the CQL schema definition provides those application 
>> level constraints that Doug mentions in a more accessible way than having to 
>> configure such specific guardrails.
>> 
>> For the addition of the CHECK keyword, I'm definitely not opposed to it if 
>> it helps Cassandra users coming from other databases understand concepts 
>> that were already familiar to them.
>> 
>> I hope this helps move the conversation forward,
>> Bernardo
>> 
>> 
>> 
>>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> I see a vote for this has been called. I should have provided more prompt 
>>> feedback sooner.
>>> 
>>> I am a strong +1 on adding column level constraints being a good thing to 
>>> add. I'm not too concerned about row/partition/table level constraints, but 
>>> I would like to change the syntax before I would be +1 on this CEP.
>>> 
>>> It would be good to align the syntax as closely as possible to our existing 
>>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>>> don't have a string length function so maybe add `LENGTH` (consistent with 
>>> MySQL/Postgres) to also use with column level constraints.
>>> 
>>> It looks like there are generally two forms of constraint syntax, one is 
>>> expressed as part of the column definition, and the other is a named or 
>>> anonymous constraint on the table. 
>>> https://www.w3schools.com/sql/sql_check.asp
>>> 
>>> Can we align with having these column level ones as `CHECK` constraints 
>>> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if 
>>> creating a named or multi-column constraint?
>>> 
>>> Will column level check constraints support `AND` so that you can specify 
>>> multiple constraints on the column? I am not sure if that is supported in 
>>> other databases, but it would be

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Dinesh Joshi
+1 on Doug's suggestion. The operator sets a limit that application
developers should not be allowed to violate. This is precisely the type of
safety that we should strive for.

To Jordan's point, I also agree that read-before-write constraints should
be avoided, but if there is a very good case for them, we can discuss it.

We should also consider adding a NOT NULL constraint on columns. This will
allow applications to model columns that are mandatory for INSERTs and
UPDATEs.
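
A minimal Python sketch of what such a NOT NULL check could look like is below.
It assumes every write must supply all NOT NULL columns; how this would
interact with UPDATEs that touch only some columns is not settled in this
thread, and the function and variable names are hypothetical.

```python
def missing_required(row, not_null_columns):
    """Return the NOT NULL columns that a write leaves unset or null."""
    return [col for col in not_null_columns if row.get(col) is None]


# Hypothetical table where both p1 and p2 are declared NOT NULL.
not_null_columns = ["p1", "p2"]
```

A write would be rejected whenever the returned list is non-empty.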




On Tue, Jun 25, 2024 at 9:24 AM Ariel Weisberg  wrote:

> Hi,
>
> I am also +1 on Doug's distinction between things that can be managed by
> operators and things that can be managed by applications.
>
> One thing to note about the syntax is that there are parens around the
> condition in SQL. In your example there are multiple anonymous constraints
> on the same column; how are anonymous constraints handled? Does the
> database automatically generate a named constraint for them so they can be
> referenced later? Do we allow multiple constraints on the same column and
> AND them together?
>
> Ariel
>
>
>
> On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
>
> Hi Ariel and Jon,
>
> Let me address your question first. Yes, AND is supported in the proposal.
> Below you can find some examples of different constraints applied to the
> same column.
>
> As per the LENGTH name instead of sizeOf as in the proposal, I am also not
> opposed to it if it is more consistent with terminology in the databases
> universe.
>
> So, to recap, there seems to be general agreement on the usefulness of the
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called,
> I see there are three different proposals for syntax:
>
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> 2.-
> As Jon suggested, leaving these definitions to more specific Guardrails at
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>
> 3.-
> As Ariel suggested, having the CHECK keyword added to align consistency
> with SQL. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
>
> For the guardrails vs cql syntax, I think that keeping the conceptual
> separation that has been explored in this thread, and perfectly recapped by
> Doug, is closer to what we are trying to achieve with this framework. In my
> opinion, having them in the CQL schema definition provides those
> application level constraints that Doug mentions in a more accessible way
> than having to configure such specific guardrails.
>
> For the addition of the CHECK keyword, I'm definitely not opposed to it if
> it helps Cassandra users coming from other databases understand concepts
> that were already familiar to them.
>
> I hope this helps move the conversation forward,
> Bernardo
>
>
>
> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>
> Hi,
>
> I see a vote for this has been called. I should have provided more prompt
> feedback sooner.
>
> I am a strong +1 on adding column level constraints being a good thing to
> add. I'm not too concerned about row/partition/table level constraints, but
> I would like to change the syntax before I would be +1 on this CEP.
>
> It would be good to align the syntax as closely as possible to our
> existing syntax, and if not that then MySQL/Postgres. For example it looks
> like we don't have a string length function so maybe add `LENGTH`
> (consistent with MySQL/Postgres) to also use with column level constraints.
>
> It looks like there are generally two forms of constraint syntax, one is
> expressed as part of the column definition, and the other is a named or
> anonymous constraint on the table.
> https://www.w3schools.com/sql/sql_check.asp
>
> Can we align with having these column level ones as `CHECK` constraints
> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if
> creating a named or multi-column constraint?
>
> Will column level check constraints support `AND` so that you can specify
> multiple constraints on the column? I am not sure if that is supported in
> other databases, but it would be good to align on that as well.
>
> RE some implementation things to keep in mind:
>
> If TCM is in use and the constraints are defined in the schema data
> structure this should work fine with Accord because all coordinators
> (regular, recovery) will deterministically agree on the constraints being
> enforced BUT... this also has to map to how/when constraints are enforced.
>
> Both Accord and Paxos work best when the constraints are enforced when the
> final mutation to be applied is created 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Ariel Weisberg
Hi,

I am also +1 on Doug's distinction between things that can be managed by 
operators and things that can be managed by applications.

One thing to note about the syntax is that there are parens around the 
condition in SQL. In your example there are multiple anonymous constraints on 
the same column; how are anonymous constraints handled? Does the database 
automatically generate a named constraint for them so they can be referenced 
later? Do we allow multiple constraints on the same column and AND them 
together?

Ariel



On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
> Hi Ariel and Jon,
> 
> Let me address your question first. Yes, AND is supported in the proposal. 
> Below you can find some examples of different constraints applied to the same 
> column.
> 
> As per the LENGTH name instead of sizeOf as in the proposal, I am also not 
> opposed to it if it is more consistent with terminology in the databases 
> universe.
> 
> So, to recap, there seems to be general agreement on the usefulness of the 
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called, I 
> see there are three different proposals for syntax:
> 
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
> 
> 2.-
> As Jon suggested, leaving these definitions to more specific Guardrails at 
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
> 
> 3.-
> As Ariel suggested, having the CHECK keyword added to align consistency with 
> SQL. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
> 
> For the guardrails vs cql syntax, I think that keeping the conceptual 
> separation that has been explored in this thread, and perfectly recapped by 
> Doug, is closer to what we are trying to achieve with this framework. In my 
> opinion, having them in the CQL schema definition provides those application 
> level constraints that Doug mentions in a more accessible way than having to 
> configure such specific guardrails.
> 
> For the addition of the CHECK keyword, I'm definitely not opposed to it if it 
> helps Cassandra users coming from other databases understand concepts that 
> were already familiar to them.
> 
> I hope this helps move the conversation forward,
> Bernardo
> 
> 
> 
>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> I see a vote for this has been called. I should have provided more prompt 
>> feedback sooner.
>> 
>> I am a strong +1 on adding column level constraints being a good thing to 
>> add. I'm not too concerned about row/partition/table level constraints, but 
>> I would like to change the syntax before I would be +1 on this CEP.
>> 
>> It would be good to align the syntax as closely as possible to our existing 
>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>> don't have a string length function so maybe add `LENGTH` (consistent with 
>> MySQL/Postgres) to also use with column level constraints.
>> 
>> It looks like there are generally two forms of constraint syntax, one is 
>> expressed as part of the column definition, and the other is a named or 
>> anonymous constraint on the table. 
>> https://www.w3schools.com/sql/sql_check.asp
>> 
>> Can we align with having these column level ones as `CHECK` constraints like 
>> in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a 
>> named or multi-column constraint?
>> 
>> Will column level check constraints support `AND` so that you can specify 
>> multiple constraints on the column? I am not sure if that is supported in 
>> other databases, but it would be good to align on that as well.
>> 
>> RE some implementation things to keep in mind:
>> 
>> If TCM is in use and the constraints are defined in the schema data 
>> structure this should work fine with Accord because all coordinators 
>> (regular, recovery) will deterministically agree on the constraints being 
>> enforced BUT... this also has to map to how/when constraints are enforced.
>> 
>> Both Accord and Paxos work best when the constraints are enforced when the 
>> final mutation to be applied is created and not later when it is being 
>> applied to the CFS. This also reduces duplication of enforcement checking 
>> work to just the coordinator for the write.
>> 
>> Ariel
>> 
>> On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
>>> Hello everyone,
>>> 
>>> I am proposing this CEP:
>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
>>> 
>>> cwiki.apache.org 
>>>

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Bernardo Botella
Got it. Thanks for the clarification, Jon. Then, in terms of syntax, I think we 
can discard option 2.

As for the GUARDRAIL vs CONSTRAINT question you bring up, I guess we have pros 
and cons on both sides. It is true that there is an existing concept of 
GUARDRAIL in Cassandra, and that reusing it comes with benefits. But, in my 
opinion, there are two main advantages to using the CONSTRAINT name for the 
feature:
- It keeps consistency with concepts from other databases (this may be minor, 
but I really think there is a benefit for those coming from other databases, and 
it may help them understand what this actually is).
- Presenting it as a different concept helps illustrate how those two 
features are different. Following the example provided by Doug, we can have a 
clear separation between those two levels of restrictions on a write.
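
To make the two levels concrete, here is a small Python sketch using the
numbers from Doug's example elsewhere in this thread (a 256k cluster-level
guardrail and a 64k table-level constraint). The function name, the return
strings, and the order of the checks are assumptions for illustration only.

```python
KIB = 1024


def check_write(value_size, guardrail_max=256 * KIB, constraint_max=64 * KIB):
    """Classify a write by the first limit its value size exceeds."""
    # Operator-owned limit: applies to the whole cluster.
    if value_size > guardrail_max:
        return "rejected by guardrail"
    # Application-owned limit: defined on the table schema.
    if value_size > constraint_max:
        return "rejected by constraint"
    return "accepted"
```

A 128k value passes the operator's guardrail but is still rejected by the
stricter, application-specific constraint.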



> On Jun 24, 2024, at 9:46 PM, Jon Haddad  wrote:
> 
> I think my suggestion was unclear. I was referring to the name guardrail, 
> using the same infra as guardrails, rather than a separate concept. Not 
> applying it like we do table options. 
> 
> 
> 
> On Tue, Jun 25, 2024 at 12:44 AM Bernardo Botella wrote:
>> Hi Ariel and Jon,
>> 
>> Let me address your question first. Yes, AND is supported in the proposal. 
>> Below you can find some examples of different constraints applied to the 
>> same column.
>> 
>> As per the LENGTH name instead of sizeOf as in the proposal, I am also not 
>> opposed to it if it is more consistent with terminology in the databases 
>> universe.
>> 
>> So, to recap, there seems to be general agreement on the usefulness of the 
>> Constraints Framework.
>> Now, from the feedback that has arrived after the voting has been called, I 
>> see there are three different proposals for syntax:
>> 
>> 1.-
>> The syntax currently described in the CEP. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> 2.-
>> As Jon suggested, leaving these definitions to more specific Guardrails at 
>> table level. Example, something like:
>> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
>> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>> 
>> 3.-
>> As Ariel suggested, having the CHECK keyword added to align consistency with 
>> SQL. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT CHECK subnet_mask > 0,
>>   CONSTRAINT CHECK subnet_mask < 32
>> )
>> 
>> For the guardrails vs cql syntax, I think that keeping the conceptual 
>> separation that has been explored in this thread, and perfectly recapped by 
>> Doug, is closer to what we are trying to achieve with this framework. In my 
>> opinion, having them in the CQL schema definition provides those application 
>> level constraints that Doug mentions in a more accessible way than having to 
>> configure such specific guardrails.
>> 
>> For the addition of the CHECK keyword, I'm definitely not opposed to it if 
>> it helps Cassandra users coming from other databases understand concepts 
>> that were already familiar to them.
>> 
>> I hope this helps move the conversation forward,
>> Bernardo
>> 
>> 
>> 
>>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg wrote:
>>> 
>>> Hi,
>>> 
>>> I see a vote for this has been called. I should have provided more prompt 
>>> feedback sooner.
>>> 
>>> I am a strong +1 on adding column level constraints being a good thing to 
>>> add. I'm not too concerned about row/partition/table level constraints, but 
>>> I would like to change the syntax before I would be +1 on this CEP.
>>> 
>>> It would be good to align the syntax as closely as possible to our existing 
>>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>>> don't have a string length function so maybe add `LENGTH` (consistent with 
>>> MySQL/Postgres) to also use with column level constraints.
>>> 
>>> It looks like there are generally two forms of constraint syntax, one is 
>>> expressed as part of the column definition, and the other is a named or 
>>> anonymous constraint on the table. 
>>> https://www.w3schools.com/sql/sql_check.asp
>>> 
>>> Can we align with having these column level ones as `CHECK` constraints 
>>> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if 
>>> creating a named or multi-column constraint?
>>> 
>>> Will column level check constraints support `AND` so that you can specify 
>>> multiple constraints on the column? I am not sure if that is supported in 
>>> other databases, but it would be good to align on that as well.
>>> 
>>> RE some implementation things to keep in mind:
>>> 
>>> If TCM is in use and the constraints are defined in the schema data 
>>> structure this should work fine with Accord because all coordinators 
>>> (regular, recovery) will

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Jon Haddad
I think my suggestion was unclear. I was referring to the name guardrail,
using the same infra as guardrails, rather than a separate concept. Not
applying it like we do table options.



On Tue, Jun 25, 2024 at 12:44 AM Bernardo Botella wrote:

> Hi Ariel and Jon,
>
> Let me address your question first. Yes, AND is supported in the proposal.
> Below you can find some examples of different constraints applied to the
> same column.
>
> As per the LENGTH name instead of sizeOf as in the proposal, I am also not
> opposed to it if it is more consistent with terminology in the databases
> universe.
>
> So, to recap, there seems to be general agreement on the usefulness of the
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called,
> I see there are three different proposals for syntax:
>
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> 2.-
> As Jon suggested, leaving these definitions to more specific Guardrails at
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>
> 3.-
> As Ariel suggested, having the CHECK keyword added to align consistency
> with SQL. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
>
> For the guardrails vs cql syntax, I think that keeping the conceptual
> separation that has been explored in this thread, and perfectly recapped by
> Doug, is closer to what we are trying to achieve with this framework. In my
> opinion, having them in the CQL schema definition provides those
> application level constraints that Doug mentions in a more accessible way
> than having to configure such specific guardrails.
>
> For the addition of the CHECK keyword, I'm definitely not opposed to it if
> it helps Cassandra users coming from other databases understand concepts
> that were already familiar to them.
>
> I hope this helps move the conversation forward,
> Bernardo
>
>
>
> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>
> Hi,
>
> I see a vote for this has been called. I should have provided more prompt
> feedback sooner.
>
> I am a strong +1 on adding column level constraints being a good thing to
> add. I'm not too concerned about row/partition/table level constraints, but
> I would like to change the syntax before I would be +1 on this CEP.
>
> It would be good to align the syntax as closely as possible to our
> existing syntax, and if not that then MySQL/Postgres. For example it looks
> like we don't have a string length function so maybe add `LENGTH`
> (consistent with MySQL/Postgres) to also use with column level constraints.
>
> It looks like there are generally two forms of constraint syntax, one is
> expressed as part of the column definition, and the other is a named or
> anonymous constraint on the table.
> https://www.w3schools.com/sql/sql_check.asp
>
> Can we align with having these column level ones as `CHECK` constraints
> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if
> creating a named or multi-column constraint?
>
> Will column level check constraints support `AND` so that you can specify
> multiple constraints on the column? I am not sure if that is supported in
> other databases, but it would be good to align on that as well.
>
> RE some implementation things to keep in mind:
>
> If TCM is in use and the constraints are defined in the schema data
> structure this should work fine with Accord because all coordinators
> (regular, recovery) will deterministically agree on the constraints being
> enforced BUT... this also has to map to how/when constraints are enforced.
>
> Both Accord and Paxos work best when the constraints are enforced when the
> final mutation to be applied is created and not later when it is being
> applied to the CFS. This also reduces duplication of enforcement checking
> work to just the coordinator for the write.
>
> Ariel
>
> On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
>
> Hello everyone,
>
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
> 
> cwiki.apache.org
> 
> 
> 
>
>
> And I’m looking for feedback from the community.
>
> Thanks a lot!
> Bernardo
>
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Bernardo Botella
Hi Ariel and Jon,

Let me address your question first. Yes, AND is supported in the proposal. 
Below you can find some examples of different constraints applied to the same 
column.

As per the LENGTH name instead of sizeOf as in the proposal, I am also not 
opposed to it if it is more consistent with terminology in the databases 
universe.

So, to recap, there seems to be general agreement on the usefulness of the 
Constraints Framework.
Now, from the feedback that has arrived after the voting has been called, I see 
there are three different proposals for syntax:

1.-
The syntax currently described in the CEP. Example:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
)

2.-
As Jon suggested, leaving these definitions to more specific Guardrails at table 
level. Example, something like:
column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32

3.-
As Ariel suggested, having the CHECK keyword added to align consistency with 
SQL. Example:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT CHECK subnet_mask > 0,
  CONSTRAINT CHECK subnet_mask < 32
)
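
Whichever syntax is chosen, the examples above put two constraints on the same
column, and both must hold (AND). A small Python sketch of that semantics
follows; the (operator, bound) representation and the function name are
assumptions for illustration, not how the CEP models constraints.

```python
import operator

# Comparison operators a constraint condition might use in this toy model.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
}


def satisfies(value, constraints):
    """True only if the value passes every (operator, bound) pair."""
    return all(OPS[op](value, bound) for op, bound in constraints)


# Mirrors: CONSTRAINT subnet_mask > 0, CONSTRAINT subnet_mask < 32
subnet_mask_constraints = [(">", 0), ("<", 32)]
```

With these two constraints a value of 24 is accepted, while 0 and 32 are both
rejected because one of the two conditions fails.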

For the guardrails vs cql syntax, I think that keeping the conceptual 
separation that has been explored in this thread, and perfectly recapped by 
Doug, is closer to what we are trying to achieve with this framework. In my 
opinion, having them in the CQL schema definition provides those application 
level constraints that Doug mentions in a more accessible way than having to 
configure such specific guardrails.

For the addition of the CHECK keyword, I'm definitely not opposed to it if it 
helps Cassandra users coming from other databases understand concepts that were 
already familiar to them.

I hope this helps move the conversation forward,
Bernardo



> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I see a vote for this has been called. I should have provided more prompt 
> feedback sooner.
> 
> I am a strong +1 on adding column level constraints being a good thing to 
> add. I'm not too concerned about row/partition/table level constraints, but I 
> would like to change the syntax before I would be +1 on this CEP.
> 
> It would be good to align the syntax as closely as possible to our existing 
> syntax, and if not that then MySQL/Postgres. For example it looks like we 
> don't have a string length function so maybe add `LENGTH` (consistent with 
> MySQL/Postgres) to also use with column level constraints.
> 
> It looks like there are generally two forms of constraint syntax, one is 
> expressed as part of the column definition, and the other is a named or 
> anonymous constraint on the table. https://www.w3schools.com/sql/sql_check.asp
> 
> Can we align with having these column level ones as `CHECK` constraints like 
> in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a 
> named or multi-column constraint?
> 
> Will column level check constraints support `AND` so that you can specify 
> multiple constraints on the column? I am not sure if that is supported in 
> other databases, but it would be good to align on that as well.
> 
> RE some implementation things to keep in mind:
> 
> If TCM is in use and the constraints are defined in the schema data structure 
> this should work fine with Accord because all coordinators (regular, 
> recovery) will deterministically agree on the constraints being enforced 
> BUT... this also has to map to how/when constraints are enforced.
> 
> Both Accord and Paxos work best when the constraints are enforced when the 
> final mutation to be applied is created and not later when it is being 
> applied to the CFS. This also reduces duplication of enforcement checking 
> work to just the coordinator for the write.
> 
> Ariel
> 
> On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
>> Hello everyone,
>> 
>> I am proposing this CEP:
>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
>> 
>> cwiki.apache.org 
>> 
>>  
>> 
>> 
>> And I’m looking for feedback from the community.
>> 
>> Thanks a lot!
>> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Ariel Weisberg
Hi,

I see a vote for this has been called. I should have provided more prompt 
feedback sooner.

I am a strong +1 on adding column level constraints being a good thing to add. 
I'm not too concerned about row/partition/table level constraints, but I would 
like to change the syntax before I would be +1 on this CEP.

It would be good to align the syntax as closely as possible to our existing 
syntax, and if not that then MySQL/Postgres. For example it looks like we don't 
have a string length function so maybe add `LENGTH` (consistent with 
MySQL/Postgres) to also use with column level constraints.

It looks like there are generally two forms of constraint syntax, one is 
expressed as part of the column definition, and the other is a named or 
anonymous constraint on the table. https://www.w3schools.com/sql/sql_check.asp

Can we align with having these column level ones as `CHECK` constraints like in 
SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a named 
or multi-column constraint?

Will column level check constraints support `AND` so that you can specify 
multiple constraints on the column? I am not sure if that is supported in other 
databases, but it would be good to align on that as well.

RE some implementation things to keep in mind:

If TCM is in use and the constraints are defined in the schema data structure 
this should work fine with Accord because all coordinators (regular, recovery) 
will deterministically agree on the constraints being enforced BUT... this also 
has to map to how/when constraints are enforced.

Both Accord and Paxos work best when the constraints are enforced when the 
final mutation to be applied is created and not later when it is being applied 
to the CFS. This also reduces duplication of enforcement checking work to just 
the coordinator for the write.
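
A toy Python model of that enforcement point is sketched below: the check runs
once, when the coordinator builds the final mutation, and replicas apply the
mutation without checking again. The class names and structure are
hypothetical and do not reflect Cassandra's actual write path.

```python
class Replica:
    """Stores mutations as-is; it does not re-run constraint checks."""

    def __init__(self):
        self.rows = []

    def apply(self, mutation):
        self.rows.append(mutation)


class Coordinator:
    """Hypothetical coordinator that validates once, when the mutation is built."""

    def __init__(self, replicas, check):
        self.replicas = replicas
        self.check = check
        self.checks_run = 0  # counts how many times the constraint was evaluated

    def write(self, row):
        self.checks_run += 1
        if not self.check(row):
            raise ValueError("constraint violated")
        mutation = dict(row)  # the "final mutation" in this toy model
        for replica in self.replicas:
            replica.apply(mutation)


replicas = [Replica() for _ in range(3)]
coordinator = Coordinator(replicas, check=lambda row: 0 < row["subnet_mask"] < 32)
coordinator.write({"subnet_mask": 24})
```

With three replicas the constraint is evaluated once per write instead of
three times, and a rejected write never reaches any replica.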

Ariel

On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
> Hello everyone,
> 
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
> 
> cwiki.apache.org 
> 
> 
> And I’m looking for feedback from the community.
> 
> Thanks a lot!
> Bernardo


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Jon Haddad
I love where this is going. I have one question, however. I think it would
be more consistent if these were table level guardrails.  Is there anything
that prevents us from utilizing the same underlying system and terminology
for both the node level guardrails and the table ones?

If we can avoid duplicate concepts we should.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Mon, Jun 24, 2024 at 4:19 PM Doug Rohrer  wrote:

> To your point about Guardrails vs. Constraints, I do think the distinct
> roles of “cluster operator” and “application developer” help show how these
> two frameworks are both valuable. I don’t think I’d expect a cluster
> operator to be involved in every table design decision, but being able to
> set warning and error-level guardrails allows an operator to set absolute
> limits on what the database itself accepts. Table-level constraints allow
> application developers (hopefully in concert with operators, where they are
> two distinct people/groups) to add *additional*, application-layer
> constraints that are likely to be app specific. To restate what I think you
> were getting at, your example of a production issue caused by the
> development team missing a key verbal agreement probably helps illustrate
> why both table-level constraints *and* guardrails are valuable.
>
> Imagine that, as an operator, you are *generally* comfortable with
> individual values in rows being, say, 256k, but because of the way in which
> this *particular* use case works, 64k chunks needed to be enforced. Your
> cluster-level *guardrails* could be set at 256k, but the table-level
> *constraints* could have enforced this 64k chunk size rule.
>
> Doug
>
> On Jun 23, 2024, at 5:38 PM, Jordan West  wrote:
>
> I am generally for this CEP, particularly the sizeOf guardrail. For
> example, we recently had an incident caused by a client who wrote outside
> of the contract we had verbally established. The constraint would have let
> us encode that contract into the database. In this case, clients are
> writing large blobs at the application layer and internally the client
> performs chunking.  We had established a chunk size of 64k, for example.
> However, the application team wanted to use a different programming
> language than the ones we provide clients for so they wrote their own. The
> new client had a bug that did not honor the agreed upon chunk size and
> wrote chunks that were MBs in size. This eventually led to a production
> incident and the issue was discovered as a result of a bunch of analysis
> (dumping sstables, etc). Had we had the sizeOf guardrail it would have
> turned a production incident with hours of investigation into a bug found
> immediately during development. Could this be done with a node-level
> guardrail? Likely. But config has the issues described above and it's
> possible to have two tables with different constraints around similar
> fields (for example, two different chunk size configs due to data shape).
> Could it be done at the client layer? Yes that's what we are doing now, but
> this incident highlights the weakness with that approach (having to
> implement the contract everywhere and having disjoint features across
> clients).
>
> I also think there is benefit to application owners. Encoding constraints
> in the database ensures continuity as ownership and contributors change and
> reduces the need for comments or documentation as the means to enforce or
> share this knowledge.
>
> I think enforcing them at write time makes sense. Thinking about it in the
> scope of compaction for example reminds me of a data loss incident where
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch
> of 4 byte ints were thrown away because the field expected an 8 byte long.
>
> My primary concern would be ensuring that we don't implement constraints
> that require a read before write (not inList comes to mind as an example of
> one that could imply reading before writing and could confuse a user if it
> doesn't).
>
> Regarding the conflict with existing guardrails, I do think that is
> tougher. On one hand I find this feature to be more evolved than those
> guardrails and would be fine to see them be replaced by it. On the other,
> the guardrails provide sole control to the operator which is nice but adds
> some complexity that has been rightly called out.  But I don't see that as
> a reason not to go forward with this feature. We should pick a path and
> accept the tradeoffs.
>
> Jordan
>
>
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella <
> [email protected]> wrote:
>
>> Thanks a lot for your comments Abe!
>>
>> I do agree that the Constraint clause should be as simple as possible. I
>> will add a note on the CEP along with some specifics about the proposed
>> constraints (removing the ones that are contentious, and adding them to a
>> possible future additions section). And yeah, I also think that these
>> constraints will help differe

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Doug Rohrer
To your point about Guardrails vs. Constraints, I do think the distinct roles 
of “cluster operator” and “application developer” help show how these two 
frameworks are both valuable. I don’t think I’d expect a cluster operator to be 
involved in every table design decision, but being able to set warning and 
error-level guardrails allows an operator to set absolute limits on what the 
database itself accepts. Table-level constraints allow application developers 
(hopefully in concert with operators, where they are two distinct 
people/groups) to add additional, application-layer constraints that are likely 
to be app specific. To restate what I think you were getting at, your example 
of a production issue caused by the development team missing a key verbal 
agreement probably helps illustrate why both table-level constraints and 
guardrails are valuable. 

Imagine that, as an operator, you are generally comfortable with individual 
values in rows being, say, 256k, but because of the way in which this 
particular use case works, 64k chunks needed to be enforced. Your cluster-level 
guardrails could be set at 256k, but the table-level constraints could have 
enforced this 64k chunk size rule.
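
A minimal sketch of that layering, assuming hypothetical names (`SizeLimit`, `check_write`) that are not part of Cassandra or the CEP. It models the operator's 256k guardrail and the application's 64k table constraint as two independent upper bounds on the same value.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class SizeLimit:
    """Upper bound, in bytes, on a single column value (hypothetical model)."""
    name: str
    max_bytes: int

    def violated_by(self, value: bytes) -> bool:
        return len(value) > self.max_bytes


def check_write(value: bytes, guardrail: SizeLimit, constraint: SizeLimit) -> str:
    """Return "ok", or the name of the first limit the value violates."""
    # The table constraint is checked first, then the cluster guardrail.
    for limit in (constraint, guardrail):
        if limit.violated_by(value):
            return limit.name
    return "ok"


cluster_guardrail = SizeLimit("cluster guardrail (256k)", 256 * KIB)
table_constraint = SizeLimit("table constraint (64k)", 64 * KIB)
```

With these values, a 100k chunk passes the operator's guardrail but is rejected by the table constraint, which is the case from the incident described in the thread.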

Doug

> On Jun 23, 2024, at 5:38 PM, Jordan West  wrote:
> 
> I am generally for this CEP, particularly the sizeOf guardrail. For example, 
> we recently had an incident caused by a client who wrote outside of the 
> contract we had verbally established. The constraint would have let us encode 
> that contract into the database. In this case, clients are writing large 
> blobs at the application layer and internally the client performs chunking.  
> We had established a chunk size of 64k, for example. However, the application 
> team wanted to use a different programming language than the ones we provide 
> clients for so they wrote their own. The new client had a bug that did not 
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This 
> eventually led to a production incident and the issue was discovered as a 
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf 
> guardrail it would have turned a production incident with hours of 
> investigation into a bug found immediately during development. Could this be 
> done with a node-level guardrail? Likely. But config has the issues described 
> above and it's possible to have two tables with different constraints around 
> similar fields (for example, two different chunk size configs due to data 
> shape). Could it be done at the client layer? Yes that's what we are doing 
> now, but this incident highlights the weakness with that approach (having to 
> implement the contract everywhere and having disjoint features across 
> clients).
>  
> I also think there is benefit to application owners. Encoding constraints in 
> the database ensures continuity as ownership and contributors change and 
> reduces the need for comments or documentation as the means to enforce or 
> share this knowledge. 
> 
> I think enforcing them at write time makes sense. Thinking about it in the 
> scope of compaction for example reminds me of a data loss incident where 
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of 
> 4 byte ints were thrown away because the field expected an 8 byte long. 
> 
> My primary concern would be ensuring that we don't implement constraints that 
> require a read before write (not inList comes to mind as an example of one 
> that could imply reading before writing and could confuse a user if it 
> doesn't). 
> 
> Regarding the conflict with existing guardrails, I do think that is tougher. 
> On one hand I find this feature to be more evolved than those guardrails and 
> would be fine to see them be replaced by it. On the other, the guardrails 
> provide sole control to the operator which is nice but adds some complexity 
> that has been rightly called out.  But I don't see that as a reason not to go 
> forward with this feature. We should pick a path and accept the tradeoffs. 
>   
> Jordan
> 
> 
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella wrote:
>> Thanks a lot for your comments Abe!
>> 
>> I do agree that the Constraint clause should be as simple as possible. I 
>> will add a note on the CEP along with some specifics about the proposed 
>> constraints (removing the ones that are contentious, and adding them to a 
>> possible future additions section). And yeah, I also think that these 
>> constraints will help different Cassandra operating paradigms (multi-tenant 
>> clusters and diverse workflows).
>> 
>> Besides that, I hope that I’ve addressed all the potential concerns and 
>> feedback on the thread. Let’s allow a bit more time for others to chime in 
>> (any further feedback will be more than welcome), but I’d like to move 
>> forward with a vote soon if no other concerns are pointed out.
>> 
>> All in all, thanks a lot to everyo

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Bernardo Botella
Thanks for the comments Jordan.

Completely agreed that we will need to be careful not to accept constraints 
that require a read before a write. It is called out in the CEP itself, and 
will have to be enforced in the future.

After all the feedback and discussion, I think we are ready to move to a voting 
thread for CEP-42. I will be posting the thread today.

Thanks everyone who participated in the discussion!
Bernardo

> On Jun 23, 2024, at 2:38 PM, Jordan West  wrote:
> 
> I am generally for this CEP, particularly the sizeOf guardrail. For example, 
> we recently had an incident caused by a client who wrote outside of the 
> contract we had verbally established. The constraint would have let us encode 
> that contract into the database. In this case, clients are writing large 
> blobs at the application layer and internally the client performs chunking.  
> We had established a chunk size of 64k, for example. However, the application 
> team wanted to use a different programming language than the ones we provide 
> clients for so they wrote their own. The new client had a bug that did not 
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This 
> eventually led to a production incident and the issue was discovered as a 
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf 
> guardrail it would have turned a production incident with hours of 
> investigation into a bug found immediately during development. Could this be 
> done with a node-level guardrail? Likely. But config has the issues described 
> above and it's possible to have two tables with different constraints around 
> similar fields (for example, two different chunk size configs due to data 
> shape). Could it be done at the client layer? Yes that's what we are doing 
> now, but this incident highlights the weakness with that approach (having to 
> implement the contract everywhere and having disjoint features across 
> clients).
>  
> I also think there is benefit to application owners. Encoding constraints in 
> the database ensures continuity as ownership and contributors change and 
> reduces the need for comments or documentation as the means to enforce or 
> share this knowledge. 
> 
> I think enforcing them at write time makes sense. Thinking about it in the 
> scope of compaction for example reminds me of a data loss incident where 
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of 
> 4 byte ints were thrown away because the field expected an 8 byte long. 
> 
> My primary concern would be ensuring that we don't implement constraints that 
> require a read before write (not inList comes to mind as an example of one 
> that could imply reading before writing and could confuse a user if it 
> doesn't). 
> 
> Regarding the conflict with existing guardrails, I do think that is tougher. 
> On one hand I find this feature to be more evolved than those guardrails and 
> would be fine to see them be replaced by it. On the other, the guardrails 
> provide sole control to the operator which is nice but adds some complexity 
> that has been rightly called out.  But I don't see that as a reason not to go 
> forward with this feature. We should pick a path and accept the tradeoffs. 
>   
> Jordan
> 
> 
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella wrote:
>> Thanks a lot for your comments Abe!
>> 
>> I do agree that the Constraint clause should be as simple as possible. I 
>> will add a note on the CEP along with some specifics about the proposed 
>> constraints (removing the ones that are contentious, and adding them to a 
>> possible future additions section). And yeah, I also think that these 
>> constraints will help different Cassandra operating paradigms (multi-tenant 
>> clusters and diverse workflows).
>> 
>> Besides that, I hope that I’ve addressed all the potential concerns and 
>> feedback on the thread. Let’s allow a bit more time for others to chime in 
>> (any further feedback will be more than welcome), but I’d like to move 
>> forward with a vote soon if no other concerns are pointed out.
>> 
>> All in all, thanks a lot to everyone that participated in the thread and 
>> added to the discussion!
>> Bernardo
>> 
>> 
>> 
>> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky wrote:
>> > 
>> > I've thought about this some more. It would be useful for Cassandra to 
>> > support user-defined "guardrails" (or constraints, whatever you want to 
>> > call them), that could be applied per keyspace or table. Whether a user or 
>> > an operator is considered the owner of a table depends on the organization 
>> > deploying Cassandra, so allowing both parties to protect their tables 
>> > against mis-use seems good to me, especially for large multi-tenant 
>> > clusters with diverse workloads.
>> > 
>> > For example, it would be really useful if a user could set the 
>> > Guardrails.{read,write}C

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-23 Thread Jordan West
I am generally for this CEP, particularly the sizeOf guardrail. For
example, we recently had an incident caused by a client who wrote outside
of the contract we had verbally established. The constraint would have let
us encode that contract into the database. In this case, clients are
writing large blobs at the application layer and internally the client
performs chunking.  We had established a chunk size of 64k, for example.
However, the application team wanted to use a different programming
language than the ones we provide clients for so they wrote their own. The
new client had a bug that did not honor the agreed upon chunk size and
wrote chunks that were MBs in size. This eventually led to a production
incident and the issue was discovered as a result of a bunch of analysis
(dumping sstables, etc). Had we had the sizeOf guardrail it would have
turned a production incident with hours of investigation into a bug found
immediately during development. Could this be done with a node-level
guardrail? Likely. But config has the issues described above and it's
possible to have two tables with different constraints around similar
fields (for example, two different chunk size configs due to data shape).
Could it be done at the client layer? Yes that's what we are doing now, but
this incident highlights the weakness with that approach (having to
implement the contract everywhere and having disjoint features across
clients).

I also think there is benefit to application owners. Encoding constraints
in the database ensures continuity as ownership and contributors change and
reduces the need for comments or documentation as the means to enforce or
share this knowledge.

I think enforcing them at write time makes sense. Thinking about it in the
scope of compaction for example reminds me of a data loss incident where
someone ran a validation in an older version (like 2.0 or 2.1) and a bunch
of 4 byte ints were thrown away because the field expected an 8 byte long.

My primary concern would be ensuring that we don't implement constraints
that require a read before write (not inList comes to mind as an example of
one that could imply reading before writing and could confuse a user if it
doesn't).

Regarding the conflict with existing guardrails, I do think that is
tougher. On one hand I find this feature to be more evolved than those
guardrails and would be fine to see them be replaced by it. On the other,
the guardrails provide sole control to the operator which is nice but adds
some complexity that has been rightly called out.  But I don't see that as
a reason not to go forward with this feature. We should pick a path and
accept the tradeoffs.

Jordan


On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella <
[email protected]> wrote:

> Thanks a lot for your comments Abe!
>
> I do agree that the Constraint clause should be as simple as possible. I
> will add a note on the CEP along with some specifics about the proposed
> constraints (removing the ones that are contentious, and adding them to a
> possible future additions section). And yeah, I also think that these
> constraints will help different Cassandra operating paradigms (multi-tenant
> clusters and diverse workflows).
>
> Besides that, I hope that I’ve addressed all the potential concerns and
> feedback on the thread. Let’s allow a bit more time for others to chime in
> (any further feedback will be more than welcome), but I’d like to move
> forward with a vote soon if no other concerns are pointed out.
>
> All in all, thanks a lot to everyone that participated in the thread and
> added to the discussion!
> Bernardo
>
>
>
> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky  wrote:
> >
> > I've thought about this some more. It would be useful for Cassandra to
> support user-defined "guardrails" (or constraints, whatever you want to
> call them), that could be applied per keyspace or table. Whether a user or
> an operator is considered the owner of a table depends on the organization
> deploying Cassandra, so allowing both parties to protect their tables
> against mis-use seems good to me, especially for large multi-tenant
> clusters with diverse workloads.
> >
> > For example, it would be really useful if a user could set the
> Guardrails.{read,write}ConsistencyLevels for their tables, or declare
> whether all operations should be over LWTs to avoid mixing regular and LWT
> workloads.
> >
> > I'm hesitant about adding lots of expression syntax to the CONSTRAINT
> clause. I think I'd prefer a function calling syntax that represents:
> > 1. Whether the constraint is system / keyspace / table scoped
> > 2. Where in query processing the constraint is checked
> > 3. What is executed by the check
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-13 Thread Bernardo Botella
Thanks a lot for your comments Abe!

I do agree that the Constraint clause should be as simple as possible. I will 
add a note on the CEP along with some specifics about the proposed constraints 
(removing the ones that are contentious, and adding them to a possible future 
additions section). And yeah, I also think that these constraints will help 
different Cassandra operating paradigms (multi-tenant clusters and diverse 
workflows).

Besides that, I hope that I’ve addressed all the potential concerns and 
feedback on the thread. Let’s allow a bit more time for others to chime in (any 
further feedback will be more than welcome), but I’d like to move forward with 
a vote soon if no other concerns are pointed out.

All in all, thanks a lot to everyone that participated in the thread and added 
to the discussion!
Bernardo



> On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky  wrote:
> 
> I've thought about this some more. It would be useful for Cassandra to 
> support user-defined "guardrails" (or constraints, whatever you want to call 
> them), that could be applied per keyspace or table. Whether a user or an 
> operator is considered the owner of a table depends on the organization 
> deploying Cassandra, so allowing both parties to protect their tables against 
> mis-use seems good to me, especially for large multi-tenant clusters with 
> diverse workloads.
> 
> For example, it would be really useful if a user could set the 
> Guardrails.{read,write}ConsistencyLevels for their tables, or declare whether 
> all operations should be over LWTs to avoid mixing regular and LWT workloads.
> 
> I'm hesitant about adding lots of expression syntax to the CONSTRAINT clause. 
> I think I'd prefer a function calling syntax that represents:
> 1. Whether the constraint is system / keyspace / table scoped
> 2. Where in query processing the constraint is checked
> 3. What is executed by the check



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
I've thought about this some more. It would be useful for Cassandra to support 
user-defined "guardrails" (or constraints, whatever you want to call them), 
that could be applied per keyspace or table. Whether a user or an operator is 
considered the owner of a table depends on the organization deploying 
Cassandra, so allowing both parties to protect their tables against mis-use 
seems good to me, especially for large multi-tenant clusters with diverse 
workloads.

For example, it would be really useful if a user could set the 
Guardrails.{read,write}ConsistencyLevels for their tables, or declare whether 
all operations should be over LWTs to avoid mixing regular and LWT workloads.

I'm hesitant about adding lots of expression syntax to the CONSTRAINT clause. I 
think I'd prefer a function calling syntax that represents:
1. Whether the constraint is system / keyspace / table scoped
2. Where in query processing the constraint is checked
3. What is executed by the check
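
A minimal sketch of a constraint descriptor carrying those three properties. Every name here (`Scope`, `Phase`, `Constraint`, `violations`) is hypothetical and only illustrates the shape of such a declaration; none of it comes from the CEP or from Cassandra.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Mapping


class Scope(Enum):
    """1. Whether the constraint is system / keyspace / table scoped."""
    SYSTEM = "system"
    KEYSPACE = "keyspace"
    TABLE = "table"


class Phase(Enum):
    """2. Where in query processing the constraint is checked (assumed phases)."""
    WRITE_COORDINATION = "write_coordination"
    READ = "read"


@dataclass(frozen=True)
class Constraint:
    name: str
    scope: Scope
    phase: Phase
    # 3. What is executed by the check: a predicate over the row being processed.
    check: Callable[[Mapping[str, Any]], bool]


def violations(constraints: List[Constraint], phase: Phase, row: Mapping[str, Any]) -> List[str]:
    """Names of the constraints registered for this phase that the row fails."""
    return [c.name for c in constraints if c.phase is phase and not c.check(row)]


chunk_max_64k = Constraint(
    name="chunk_max_64k",
    scope=Scope.TABLE,
    phase=Phase.WRITE_COORDINATION,
    check=lambda row: len(row["chunk"]) <= 64 * 1024,
)
```

Keeping scope and phase as data, separate from the check itself, is one way to avoid growing the expression syntax of the CONSTRAINT clause.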

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Jon Haddad
I think having JSON validation on existing text fields is a pretty
reasonable idea, regardless if we have a JSON type or not.  I could see
folks wanting to add a JSON constraint to an existing text field, for
example.

I like the idea of a postgres-style JSONB type, but I don't want to derail
this convo into a JSON one.  I'd be happy to see a JSONB added to Cassandra
along with all the functionality that is included in postgres, especially
searching / indexes on JSON fields, I think it should be its own CEP though.

DB Constraints vs Client side logic, I see both aspects here.  I've gone
back and forth over the years on what belongs in the DB vs not, and there's
good arguments to be made for both.  For example, supporting a regex
constraint on a field can be done, but from a cost and
scalability perspective it's way better to do it in the application logic.
However, putting a constraint in like this could make sense in some cases:

```
CREATE TABLE circles (
  id int primary key,
  radius double,
  diameter double,
  CONSTRAINT diameter = 2 * radius
)
```

which is also a (maybe contrived) example of an equality constraint.
There's a good argument to be made in this case that the constraint isn't
what we really need here - it's default values (`diameter double
default radius * 2`), and that's a whole read-before-write can of worms we
probably don't need to get into on this thread.

Jon




On Wed, Jun 12, 2024 at 8:46 AM Abe Ratnofsky  wrote:

> Hey Bernardo,
>
> Thanks for the proposal and putting together your summary of the
> discussion. A few thoughts:
>
> I'm not completely convinced of the value of CONSTRAINTS for a database
> like Cassandra, which doesn't support any referential integrity checks,
> doesn't do read-before-write for all queries, and doesn't have a wide
> library of built-in functions.
>
> I'd be a supporter of more BIFs, and that's a solvable problem. String
> size, collection size, timestamp conversions, etc. could all be useful,
> even though there's not much gained over doing them in the client.
>
> With constraints only being applied during write coordination, there's not
> much of an advantage over implementing the equivalent constraints in
> clients. Writes that don't include all columns could violate multi-column
> constraints, like your (a > b) example, for the same reason as
> CASSANDRA-19007.
> Constraints could be limited to only apply to frozen columns, where it's
> known that the entire value will be updated at once.
>
> I don't think we should include any constraints where valid user action
> would lead to a violated constraint, like permitting multi-column
> constraints on regular columns or non-frozen types, since they would be too
> prone to mis-use.
>
> Regarding 19007, it could be useful to have a constraint that indicates
> that a subset of columns will always be updated together, since that would
> actually allow Cassandra to know which read queries are safe, and permit a
> fix for 19007 that minimizes the additional data replicas need to send to
> coordinators on ALLOW FILTERING queries. That's a very specific situation
> and shouldn't justify a new framework / API, but might be a useful
> consequence of it.
>
> > - isJson (is the text a json?)
>
> Wouldn't it be more compelling to have a new type, analogous to the
> Postgres JSONB type?
> https://www.postgresql.org/docs/current/datatype-json.html
>
> If we're going to parse the entire JSON blob for validation, we might as
> well store it in an optimized format, support better access patterns, etc.
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
Hey Bernardo,

Thanks for the proposal and putting together your summary of the discussion. A 
few thoughts:

I'm not completely convinced of the value of CONSTRAINTS for a database like 
Cassandra, which doesn't support any referential integrity checks, doesn't do 
read-before-write for all queries, and doesn't have a wide library of built-in 
functions.

I'd be a supporter of more BIFs, and that's a solvable problem. String size, 
collection size, timestamp conversions, etc. could all be useful, even though 
there's not much gained over doing them in the client.

With constraints only being applied during write coordination, there's not much 
of an advantage over implementing the equivalent constraints in clients. Writes 
that don't include all columns could violate multi-column constraints, like 
your (a > b) example, for the same reason as CASSANDRA-19007. Constraints could 
be limited to only apply to frozen columns, where it's known that the entire 
value will be updated at once.
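
A minimal sketch of the partial-write problem, assuming a hypothetical helper `check_a_gt_b` that sees only the columns carried by the write, as a coordinator would without a read-before-write. It is not Cassandra code.

```python
def check_a_gt_b(write: dict) -> str:
    """Classify a write against the constraint a > b using only its own columns."""
    if "a" in write and "b" in write:
        return "ok" if write["a"] > write["b"] else "rejected"
    if "a" in write or "b" in write:
        # Only one side is present: the other value lives on disk and is not read.
        return "undecidable"
    return "ok"  # the write touches neither column


stored = {"a": 10, "b": 5}      # row already satisfies a > b
update = {"b": 20}              # valid-looking write that sets only b
merged = {**stored, **update}   # what a later read would observe
```

Here the update cannot be validated from its own contents, and if it is accepted the merged row no longer satisfies `a > b`, even though each individual write looked acceptable.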

I don't think we should include any constraints where valid user action would 
lead to a violated constraint, like permitting multi-column constraints on 
regular columns or non-frozen types, since they would be too prone to mis-use.

Regarding 19007, it could be useful to have a constraint that indicates that a 
subset of columns will always be updated together, since that would actually 
allow Cassandra to know which read queries are safe, and permit a fix for 19007 
that minimizes the additional data replicas need to send to coordinators on 
ALLOW FILTERING queries. That's a very specific situation and shouldn't justify 
a new framework / API, but might be a useful consequence of it.

> - isJson (is the text a json?)

Wouldn't it be more compelling to have a new type, analogous to the Postgres 
JSONB type? https://www.postgresql.org/docs/current/datatype-json.html

If we're going to parse the entire JSON blob for validation, we might as well 
store it in an optimized format, support better access patterns, etc.

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Bernardo Botella
Hi again,

I completely agree that anything beyond simple poses a problem. My point is 
that the definition of simple may vary, and each of those constraints I 
mentioned deserves a conversation on its own. As I previously mentioned on the 
dev thread:
https://lists.apache.org/thread/qln8cbkhlw9j9563p0kl12wrm5w62nq0

I am trying to propose here the two constraints that will add a lot of value to 
the framework (size and value), and to illustrate how the framework is to be 
extended.

The final list I proposed can either be expanded (I’m more than happy to hear 
more proposals :-) ) or reduced (you and Claude present very valid points), 
but I think using this thread to discuss them one by one may derail the 
conversation and make it hard to follow. Having said that, we can leave the 
inList type of constraints out of the CEP and defer them to a future 
conversation if the constraints framework CEP is approved. Once we have the 
basic ones in place, we can have a deeper discussion on this one.

What do you think?


> On Jun 12, 2024, at 3:39 AM, Štefan Miklošovič  
> wrote:
> 
> My gut feeling is that anything beyond simple comparisons is just too 
> problematic / complex. I think that this should be part of the application 
> logic rather than putting that to the database. Is there any major database 
> out there which has constraints modelled like that? (belongsToEnum, 
> isNotBlocked, inList ...). It just opens a lot of questions, like how would 
> we treat nulls? How would this be supported in the driver? Etc ... 
>  
> 
> 
> On Wed, Jun 12, 2024 at 12:34 PM Claude Warren, Jr via dev wrote:
>>> 2)
>>> "Is part of an enum" somehow makes up for the lack of enum types. Constraint 
>>> could be something like CONSTRAINT belongsToEnum([list of valid values], 
>>> field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>> 
>> Are these not just "CONSTRAINT inList([List of valid values], field);"  and 
>> "CONSTRAINT not inList([List of valid values], field);"?
>> At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not 
>> inList([p1], p2);"?
>> 
>> Can "[List of values]" point to a variable containing a list?  Or does it 
>> require hard coding in the constraint itself?
>> 
>> 
>> 
>> On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella wrote:
>>> Hi Štefan
>>> 
>>> I'll address the different points:
>>> 1)
>>> An example (possibly a stretch) of use case for != constraint would be:
>>> Let's say you have a table in which you want to record a movement, from 
>>> position p1 to position p2. You may want to check that those two are 
>>> different to make sure there is actual movement.
>>> 
>>> CREATE TABLE keyspace.table (
>>>   p1 int, 
>>>   p2 int,
>>>   ...,
>>>   CONSTRAINT p1 != p2
>>> );
>>> 
>>> For the case of ==, I agree that it is harder to come up with a valid use 
>>> case, and I added it for completeness.
>>> 
>>> 2)
>>> "Is part of an enum" somehow makes up for the lack of enum types. Constraint 
>>> could be something like CONSTRAINT belongsToEnum([list of valid values], 
>>> field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>>> 
>>> Please let me know if this helps,
>>> Bernardo
>>> 
>>> 
>>> 
>>>> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič wrote:
>>>> 
>>>> Hi Bernardo,
>>>> 
>>>> 1) Could you elaborate on these two constraints?
>>>> 
>>>> == and != ?
>>>> 
>>>> What is the use case? Why would I want to have data in a database stored 
>>>> in some column which would need to be _same as my constraint_ and which 
>>>> _could not_ be same as my constraint? Can you give me at least one example 
>>>> of each? It looks like I am going to put a constant into a database in 
>>>> case of ==, wouldn't a static column be better?
>>>> 
>>>> 2) For examples of text based types you mentioned: "is part of an enum" - 
>>>> how would you enforce this in Cassandra? What enum do we have in CQL?
>>>> 3) What does "is it block listed" mean?
>>>> 
>>>> In the meanwhile, I made changes to CEP-24 to move transactionality into 
>>>> optional features.
>>>> 
>>>> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> After the feedback, I'd like to make a recap of what we have discussed in 
>>>>> this 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Štefan Miklošovič
My gut feeling is that anything beyond simple comparisons is just too
problematic / complex. I think that this should be part of the application
logic rather than being put into the database. Is there any major database
out there which has constraints modelled like that? (belongsToEnum,
isNotBlocked, inList ...). It just opens a lot of questions, like how would
we treat nulls? How would this be supported in the driver? Etc ...
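
To make the null question concrete, here is a minimal Python sketch of one possible answer. It is an assumption, not something CEP-42 specifies: it follows the SQL CHECK convention, where a condition that evaluates to "unknown" because of a null does not reject the write. The names check_less_than and accepts_write are hypothetical.

```python
from typing import Optional


def check_less_than(value: Optional[int], limit: int) -> Optional[bool]:
    """Three-valued comparison: None stands for 'unknown' when the value is null."""
    if value is None:
        return None
    return value < limit


def accepts_write(result: Optional[bool]) -> bool:
    """SQL-style CHECK semantics (assumed here): reject only a definite False."""
    return result is not False


if __name__ == "__main__":
    for candidate in (10, 5000, None):
        outcome = accepts_write(check_less_than(candidate, 1000))
        print(candidate, "accepted" if outcome else "rejected")
```

The alternative design, rejecting nulls outright, would only change accepts_write to require `result is True`; which of the two a constraints framework should pick is exactly the open question.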



On Wed, Jun 12, 2024 at 12:34 PM Claude Warren, Jr via dev wrote:

> 2)
>> "Is part of an enum" is somehow supplying the lack of enum types. Constraint
>> could be something like CONSTRAINT belongsToEnum([list of valid values],
>> field):
>> CREATE TABLE keyspace.table (
>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>>   ...
>> );
>> 3)
>> Similarly, we can check and reject if a term is part of a list of blocked
>> terms:
>> CREATE TABLE keyspace.table (
>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
>> field),
>>   ...
>> );
>
>
> Are these not just "CONSTRAINT inList([List of valid values], field);"
> and "CONSTRAINT not inList([List of valid values], field);"?
> At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not
> inList([p1], p2);"?
>
> Can "[List of values]" point to a variable containing a list?  Or does it
> require hard coding in the constraint itself?
>
>
>
> On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella wrote:
>
>> Hi Štefan
>>
>> I'll address the different points:
>> 1)
>> An example (possibly a stretch) of use case for != constraint would be:
>> Let's say you have a table in which you want to record a movement, from
>> position p1 to position p2. You may want to check that those two are
>> different to make sure there is actual movement.
>>
>> CREATE TABLE keyspace.table (
>>   p1 int,
>>   p2 int,
>>   ...,
>>   CONSTRAINT p1 != p2
>> );
>>
>> For the case of ==, I agree that it is harder to come up with a valid use
>> case, and I added it for completion.
>>
>> 2)
>> "Is part of an enum" is somehow supplying the lack of enum types. Constraint
>> could be something like CONSTRAINT belongsToEnum([list of valid values],
>> field):
>> CREATE TABLE keyspace.table (
>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>>   ...
>> );
>>
>> 3)
>> Similarly, we can check and reject if a term is part of a list of blocked
>> terms:
>> CREATE TABLE keyspace.table (
>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
>> field),
>>   ...
>> );
>>
>> Please let me know if this helps,
>> Bernardo
>>
>>
>>
>> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič wrote:
>>
>> Hi Bernardo,
>>
>> 1) Could you elaborate on these two constraints?
>>
>> == and != ?
>>
>> What is the use case? Why would I want to have data in a database stored
>> in some column which would need to be _same as my constraint_ and which
>> _could not_ be same as my constraint? Can you give me at least one example
>> of each? It looks like I am going to put a constant into a database in case
>> of ==, wouldn't a static column be better?
>>
>> 2) For examples of text based types you mentioned: "is part of an enum" -
>> how would you enforce this in Cassandra? What enum do we have in CQL?
>> 3) What does "is it block listed" mean?
>>
>> In the meanwhile, I made changes to CEP-24 to move transactionality into
>> optional features.
>>
>> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella wrote:
>>
>>> Hi everyone,
>>>
>>> After the feedback, I'd like to make a recap of what we have discussed
>>> in this thread and try to move forward with the conversation.
>>>
>>> I made some clarifications:
>>> - Constraints are only applied at write time.
>>> - Guardrail configurations should maintain preference over what's being
>>> defined as a constraint.
>>>
>>> *Specify constraints:*
>>> There is a general feedback around adding more concrete examples than
>>> the ones that can be found on the CEP document.
>>> Basically, the initial constraints I am proposing are:
>>> - SizeOf Constraint for String types, as in
>>> name text CONSTRAINT sizeOf(name) < 256
>>>
>>> - Value Constraint for numeric types
>>> number_of_items int CONSTRAINT number_of_items < 1000
>>>
>>> Those two alone and combined provide a lot of flexibility, and allow
>>> complex validations that enable "new types" such as:
>>>
>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>   ip_address inet,
>>>   subnet_mask int,
>>>   CONSTRAINT subnet_mask > 0,
>>>   CONSTRAINT subnet_mask < 32
>>> )
>>>
>>> CREATE TYPE keyspace.color (
>>>   r int,
>>>   g int,
>>>   b int,
>>>   CONSTRAINT r >= 0,
>>>   CONSTRAINT r < 255,
>>>   CONSTRAINT g >= 0,
>>>   CONSTRAINT g < 255,
>>>   CONSTRAINT b >= 0,
>>>   CONSTRAINT b < 255,
>>> )
>>>
>>>
>>> Those two initial Constraints are the fundamental constraints that would
>>> give value to the feature. The framewor

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Claude Warren, Jr via dev
>
> 2)
> "Is part of an enum" is somehow supplying the lack of enum types. Constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );


Are these not just "CONSTRAINT inList([List of valid values], field);"  and
"CONSTRAINT not inList([List of valid values], field);"?
At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not
inList([p1], p2);"?
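
The reduction suggested above can be sketched in a few lines of Python. This is only an illustration of the equivalence, not proposed CEP-42 syntax or implementation; the function names mirror the constraint names used in this thread and are otherwise hypothetical.

```python
from typing import Any, Sequence


def in_list(values: Sequence[Any], field: Any) -> bool:
    """The single primitive: is the field one of the listed values?"""
    return field in values


def belongs_to_enum(valid_values: Sequence[Any], field: Any) -> bool:
    """belongsToEnum is inList over the valid values."""
    return in_list(valid_values, field)


def is_not_blocked(blocked_values: Sequence[Any], field: Any) -> bool:
    """isNotBlocked is 'not inList' over the blocked values."""
    return not in_list(blocked_values, field)


def not_equal(p1: Any, p2: Any) -> bool:
    """p1 != p2 expressed as 'not inList([p1], p2)'."""
    return not in_list([p1], p2)


if __name__ == "__main__":
    print(belongs_to_enum(["foo", "foo2"], "foo"))
    print(is_not_blocked(["blocked_foo", "blocked_foo2"], "blocked_foo"))
    print(not_equal(1, 2))
```

All three constraints share one membership test and differ only in negation, which is the sense in which they "devolve" to inList.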

Can "[List of values]" point to a variable containing a list?  Or does it
require hard coding in the constraint itself?



On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella wrote:

> Hi Štefan
>
> I'll address the different points:
> 1)
> An example (possibly a stretch) of use case for != constraint would be:
> Let's say you have a table in which you want to record a movement, from
> position p1 to position p2. You may want to check that those two are
> different to make sure there is actual movement.
>
> CREATE TABLE keyspace.table (
>   p1 int,
>   p2 int,
>   ...,
>   CONSTRAINT p1 != p2
> );
>
> For the case of ==, I agree that it is harder to come up with a valid use
> case, and I added it for completion.
>
> 2)
> "Is part of an enum" is somehow supplying the lack of enum types. Constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
>
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );
>
> Please let me know if this helps,
> Bernardo
>
>
>
> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič wrote:
>
> Hi Bernardo,
>
> 1) Could you elaborate on these two constraints?
>
> == and != ?
>
> What is the use case? Why would I want to have data in a database stored
> in some column which would need to be _same as my constraint_ and which
> _could not_ be same as my constraint? Can you give me at least one example
> of each? It looks like I am going to put a constant into a database in case
> of ==, wouldn't a static column be better?
>
> 2) For examples of text based types you mentioned: "is part of an enum" -
> how would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
>
> In the meanwhile, I made changes to CEP-24 to move transactionality into
> optional features.
>
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella wrote:
>
>> Hi everyone,
>>
>> After the feedback, I'd like to make a recap of what we have discussed in
>> this thread and try to move forward with the conversation.
>>
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should maintain preference over what's being
>> defined as a constraint.
>>
>> *Specify constraints:*
>> There is a general feedback around adding more concrete examples than the
>> ones that can be found on the CEP document.
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>>
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>>
>> Those two alone and combined provide a lot of flexibility, and allow
>> complex validations that enable "new types" such as:
>>
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>>
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255,
>> )
>>
>>
>> Those two initial Constraints are the fundamental constraints that would
>> give value to the feature. The framework can (and will) be extended with
>> other Constraints, leaving us with the following:
>>
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>>
>> For date types:
>> - Before (<)
>> - After (>)
>>
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>>
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>>
>> I have updated the CEP with this information.
>>
>> *Potential 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-11 Thread Bernardo Botella
Hi Štefan

I'll address the different points:
1)
An example (possibly a stretch) of use case for != constraint would be:
Let's say you have a table in which you want to record a movement, from 
position p1 to position p2. You may want to check that those two are different 
to make sure there is actual movement.

CREATE TABLE keyspace.table (
  p1 int, 
  p2 int,
  ...,
  CONSTRAINT p1 != p2
);

For the case of ==, I agree that it is harder to come up with a valid use case, 
and I added it for completeness.

2)
"Is part of an enum" would make up for the lack of enum types in CQL. The constraint could 
be something like CONSTRAINT belongsToEnum([list of valid values], field):
CREATE TABLE keyspace.table (
  field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
  ...
);

3)
Similarly, we can check and reject if a term is part of a list of blocked terms:
CREATE TABLE keyspace.table (
  field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], field), 
  ...
);

Please let me know if this helps,
Bernardo



> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič  
> wrote:
> 
> Hi Bernardo,
> 
> 1) Could you elaborate on these two constraints?
> 
> == and != ?
> 
> What is the use case? Why would I want to have data in a database stored in 
> some column which would need to be _same as my constraint_ and which _could 
> not_ be same as my constraint? Can you give me at least one example of each? 
> It looks like I am going to put a constant into a database in case of ==, 
> wouldn't a static column be better?
> 
> 2) For examples of text based types you mentioned: "is part of an enum" - how 
> would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
> 
> In the meanwhile, I made changes to CEP-24 to move transactionality into 
> optional features.
> 
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella wrote:
>> Hi everyone,
>> 
>> After the feedback, I'd like to make a recap of what we have discussed in 
>> this thread and try to move forward with the conversation.
>> 
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should maintain preference over what's being 
>> defined as a constraint.
>> 
>> Specify constraints:
>> There is a general feedback around adding more concrete examples than the 
>> ones that can be found on the CEP document. 
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>> 
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>> 
>> Those two alone and combined provide a lot of flexibility, and allow complex 
>> validations that enable "new types" such as:
>> 
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255,
>> ) 
>> 
>> 
>> Those two initial Constraints are the fundamental constraints that would give 
>> value to the feature. The framework can (and will) be extended with other 
>> Constraints, leaving us with the following:
>> 
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>> 
>> For date types:
>> - Before (<)
>> - After (>)
>> 
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>> 
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>> 
>> I have updated the CEP with this information.
>> 
>> Potential dependency on CEP-24:
>> Given that the Constraints Framework provides a set of checks to be 
>> performed alongside those that can be made using the Guardrails framework, 
>> there may be some relation with CEP-24, which mentions transactional 
>> Guardrails to prevent situations in which the limit configurations are 
>> different across the cluster.
>> 
>> This CEP-42 is not proposing modifying the Guardrails framework, and 
>> therefore should not be affected by CEP-24. It is true that the improvements 
>> provided by CEP-24 would benefit this Constraints framework, but it is not 
>> dependent on them.
>> 
>> 
>> I hope I included all the points and addressed them on the CEP, otherwise, 
>> please call it out and I’ll be more than happy to include it.
>> 
>> Thanks everyone for all the inputs!
>> Bernardo
>> 
>>> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič wrote:
>>> 
>>> How I see it is that in 5.1 there will be TCM for the very f

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-11 Thread Štefan Miklošovič
Hi Bernardo,

1) Could you elaborate on these two constraints?

== and != ?

What is the use case? Why would I want to have data in a database stored in
some column which would need to be _same as my constraint_ and which _could
not_ be same as my constraint? Can you give me at least one example of
each? It looks like I am going to put a constant into a database in case of
==, wouldn't a static column be better?

2) For examples of text based types you mentioned: "is part of an enum" -
how would you enforce this in Cassandra? What enum do we have in CQL?
3) What does "is it block listed" mean?

In the meanwhile, I made changes to CEP-24 to move transactionality into
optional features.

On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella wrote:

> Hi everyone,
>
> After the feedback, I'd like to make a recap of what we have discussed in
> this thread and try to move forward with the conversation.
>
> I made some clarifications:
> - Constraints are only applied at write time.
> - Guardrail configurations should maintain preference over what's being
> defined as a constraint.
>
> *Specify constraints:*
> There is a general feedback around adding more concrete examples than the
> ones that can be found on the CEP document.
> Basically, the initial constraints I am proposing are:
> - SizeOf Constraint for String types, as in
> name text CONSTRAINT sizeOf(name) < 256
>
> - Value Constraint for numeric types
> number_of_items int CONSTRAINT number_of_items < 1000
>
> Those two alone and combined provide a lot of flexibility, and allow
> complex validations that enable "new types" such as:
>
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_address inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> CREATE TYPE keyspace.color (
>   r int,
>   g int,
>   b int,
>   CONSTRAINT r >= 0,
>   CONSTRAINT r < 255,
>   CONSTRAINT g >= 0,
>   CONSTRAINT g < 255,
>   CONSTRAINT b >= 0,
>   CONSTRAINT b < 255,
> )
>
>
> Those two initial Constraints are the fundamental constraints that would
> give value to the feature. The framework can (and will) be extended with
> other Constraints, leaving us with the following:
>
> For numeric types:
> - Max (<)
> - Min (>)
> - Equality (==)
> - Difference (!=)
>
> For date types:
> - Before (<)
> - After (>)
>
> For text based types:
> - Size (sizeOf)
> - isJson (is the text a json?)
> - complies with a given pattern
> - Is it block listed?
> - Is it part of an enum?
>
> General table constraints (including more than one column):
> - Compare between numeric types (a < b, a > b, a != b, …)
> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>
> I have updated the CEP with this information.
>
> *Potential dependency on CEP-24:*
> Given that the Constraints Framework provides a set of checks to be
> performed alongside those that can be made using the Guardrails framework,
> there may be some relation with CEP-24, which mentions transactional
> Guardrails to prevent situations in which the limit configurations are
> different across the cluster.
>
> This CEP-42 is not proposing modifying the Guardrails framework, and
> therefore should not be affected by CEP-24. It is true that the
> improvements provided by CEP-24 would benefit this Constraints framework,
> but it is not dependent on them.
>
>
> I hope I included all the points and addressed them on the CEP, otherwise,
> please call it out and I’ll be more than happy to include it.
>
> Thanks everyone for all the inputs!
> Bernardo
>
> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič wrote:
>
> How I see it is that in 5.1 there will be TCM for the very first time and
> I do not think that config in TCM would make it into 5.1 based on what Sam
> talks about (need for some stability etc), that makes total sense to me.
> TCM is quite a big feature to deliver on its own and putting even way more
> stuff into that might be detrimental to the quality if we rush it.
>
> Then sometime after 5.1 we might take a serious look at config in TCM
> itself.
>
> My plan, ideally, is to still ship CEP-24 without config in TCM, then
> after 5.1 when config in TCM lands, CEP-24 might integrate with that on a
> deeper level.
>
> If CEP-42 (this one) makes it into 5.1 as well, I think the similar case
> might be done about that as well (integration with guardrails).
>
> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe  wrote:
>
>> We've been working on a draft CEP for migrating config from yaml to
>> cluster metadata but have been a bit short of time recently, I'll try to
>> get something out for discussion as soon as possible.
>> A little delay isn't such a bad thing IMO, as we're still ironing out the
>> kinks in the TCM implementation itself. It'd be good to get a bit more road
>> testing done with that before we start adding more to it, which I'm sure
>> will start to ramp up once 5.0 is out.
>>
>> Thanks,
>> Sam
>>
>> On 7

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-10 Thread Bernardo Botella
Hi everyone,

After the feedback, I'd like to make a recap of what we have discussed in this 
thread and try to move forward with the conversation.

I made some clarifications:
- Constraints are only applied at write time.
- Guardrail configurations should take precedence over what's being defined 
as a constraint.

Specify constraints:
There was general feedback asking for more concrete examples than the ones 
that can be found in the CEP document. 
Basically, the initial constraints I am proposing are:
- SizeOf Constraint for String types, as in
name text CONSTRAINT sizeOf(name) < 256

- Value Constraint for numeric types
number_of_items int CONSTRAINT number_of_items < 1000

Those two alone and combined provide a lot of flexibility, and allow complex 
validations that enable "new types" such as:

CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_address inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
)

CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r < 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g < 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b < 255,
) 
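
As an illustration of how such per-field range constraints might be evaluated at write time, here is a minimal Python sketch. It is not the CEP's implementation; COLOR_CONSTRAINTS and violations are hypothetical names, and the bounds are copied exactly as written in the color type above.

```python
from typing import Callable, Dict, List, Tuple

# (description, predicate) pairs, one list per field of the color type.
Constraint = Tuple[str, Callable[[int], bool]]

COLOR_CONSTRAINTS: Dict[str, List[Constraint]] = {
    field: [
        (f"{field} >= 0", lambda value: value >= 0),
        (f"{field} < 255", lambda value: value < 255),
    ]
    for field in ("r", "g", "b")
}


def violations(row: Dict[str, int]) -> List[str]:
    """Return the description of every constraint the row fails."""
    failed = []
    for field, constraints in COLOR_CONSTRAINTS.items():
        for description, predicate in constraints:
            if not predicate(row[field]):
                failed.append(description)
    return failed


if __name__ == "__main__":
    print(violations({"r": 10, "g": 20, "b": 30}))
    print(violations({"r": 10, "g": 300, "b": -1}))
```

Because the upper bound is strict, a value of exactly 255 fails the check in this sketch; an inclusive range would need `<=` instead.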


Those two initial Constraints are the fundamental constraints that would give 
value to the feature. The framework can (and will) be extended with other 
Constraints, leaving us with the following:

For numeric types:
- Max (<)
- Min (>)
- Equality (==)
- Difference (!=)

For date types:
- Before (<)
- After (>)

For text based types:
- Size (sizeOf)
- isJson (is the text a json?)
- complies with a given pattern
- Is it block listed?
- Is it part of an enum?

General table constraints (including more than one column):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
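
A table-level constraint differs from a column constraint in that its predicate sees the whole row. The following Python sketch shows that idea under stated assumptions: the column names a, b, date1 and date2 come from the list above, while TABLE_CONSTRAINTS and check_row are hypothetical names, not part of the CEP.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

# Each table-level constraint is a (description, predicate-over-row) pair.
TableConstraint = Tuple[str, Callable[[Dict[str, Any]], bool]]

TABLE_CONSTRAINTS: List[TableConstraint] = [
    ("a < b", lambda row: row["a"] < row["b"]),
    ("date1 < date2", lambda row: row["date1"] < row["date2"]),
]


def check_row(row: Dict[str, Any]) -> List[str]:
    """Return the descriptions of the constraints violated by this row."""
    return [name for name, predicate in TABLE_CONSTRAINTS if not predicate(row)]


if __name__ == "__main__":
    good = {"a": 1, "b": 2, "date1": date(2024, 1, 1), "date2": date(2024, 6, 1)}
    bad = {"a": 5, "b": 2, "date1": date(2024, 6, 1), "date2": date(2024, 1, 1)}
    print(check_row(good))
    print(check_row(bad))
```

A write that updates only one of the compared columns would need the other column's current value to evaluate such a predicate, which this sketch sidesteps by always receiving the full row.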

I have updated the CEP with this information.

Potential dependency on CEP-24:
Given that the Constraints Framework provides a set of checks to be performed 
alongside those that can be made using the Guardrails framework, there may be 
some relation with CEP-24, which mentions transactional Guardrails to prevent 
situations in which the limit configurations are different across the cluster.

This CEP-42 is not proposing modifying the Guardrails framework, and therefore 
should not be affected by CEP-24. It is true that the improvements provided by 
CEP-24 would benefit this Constraints framework, but it is not dependent on 
them.


I hope I included all the points and addressed them on the CEP, otherwise, 
please call it out and I’ll be more than happy to include it.

Thanks everyone for all the inputs!
Bernardo

> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič  
> wrote:
> 
> How I see it is that in 5.1 there will be TCM for the very first time and I 
> do not think that config in TCM would make it into 5.1 based on what Sam 
> talks about (need for some stability etc), that makes total sense to me. TCM 
> is quite a big feature to deliver on its own and putting even way more stuff 
> into that might be detrimental to the quality if we rush it.
> 
> Then sometime after 5.1 we might take a serious look at config in TCM 
> itself.
> 
> My plan, ideally, is to still ship CEP-24 without config in TCM, then after 
> 5.1 when config in TCM lands, CEP-24 might integrate with that on a deeper 
> level.
> 
> If CEP-42 (this one) makes it into 5.1 as well, I think the similar case 
> might be done about that as well (integration with guardrails).
> 
> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe wrote:
>> We've been working on a draft CEP for migrating config from yaml to cluster 
>> metadata but have been a bit short of time recently, I'll try to get 
>> something out for discussion as soon as possible. 
>> A little delay isn't such a bad thing IMO, as we're still ironing out the 
>> kinks in the TCM implementation itself. It'd be good to get a bit more road 
>> testing done with that before we start adding more to it, which I'm sure 
>> will start to ramp up once 5.0 is out.  
>> 
>> Thanks,
>> Sam
>> 
>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič wrote:
>>> 
>>> Yes, all configuration should be transactional (configuration which makes 
>>> sense to require to be the same cluster-wide). Guardrails in TCM are just a 
>>> subset of this problem. When I started to do CEP-24 I started with 
>>> guardrails in TCM but then I realized it leads to more general "all config 
>>> in TCM" and I found myself rabbit-hole-ing endlessly.
>>> 
>>> BTW I do not think that once CEP-24 is in place without guardrails in TCM 
>>> then implementing it would blow up things a lot. It is really just about a 
>>> couple mutable virtual tables and a couple transformations for various 
>>> guardrail types we have but I expect that its integration into more general 
>>> config in TCM should be rather straightforward.
>>> 
>>> Config in TCM definitely deserves its own CEP, it is too much to hand

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Štefan Miklošovič
How I see it is that in 5.1 there will be TCM for the very first time and I
do not think that config in TCM would make it into 5.1 based on what Sam
talks about (need for some stability etc), that makes total sense to me.
TCM is quite a big feature to deliver on its own and putting even way more
stuff into that might be detrimental to the quality if we rush it.

Then sometime after 5.1 we might take a serious look at config in TCM
itself.

My plan, ideally, is to still ship CEP-24 without config in TCM, then after
5.1 when config in TCM lands, CEP-24 might integrate with that on a deeper
level.

If CEP-42 (this one) makes it into 5.1 as well, I think the similar case
might be done about that as well (integration with guardrails).

On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe  wrote:

> We've been working on a draft CEP for migrating config from yaml to
> cluster metadata but have been a bit short of time recently, I'll try to
> get something out for discussion as soon as possible.
> A little delay isn't such a bad thing IMO, as we're still ironing out the
> kinks in the TCM implementation itself. It'd be good to get a bit more road
> testing done with that before we start adding more to it, which I'm sure
> will start to ramp up once 5.0 is out.
>
> Thanks,
> Sam
>
> On 7 Jun 2024, at 19:19, Štefan Miklošovič 
> wrote:
>
> Yes, all configuration should be transactional (configuration which makes
> sense to require to be the same cluster-wide). Guardrails in TCM are just a
> subset of this problem. When I started to do CEP-24 I started with
> guardrails in TCM but then I realized it leads to more general "all config
> in TCM" and I found myself rabbit-hole-ing endlessly.
>
> BTW I do not think that once CEP-24 is in place without guardrails in TCM
> then implementing it would blow up things a lot. It is really just about a
> couple mutable virtual tables and a couple transformations for various
> guardrail types we have but I expect that its integration into more general
> config in TCM should be rather straightforward.
>
> Config in TCM definitely deserves its own CEP, it is too much to handle
> under CEP-24 and CEP-24 can go without it already. It just takes a little bit
> more configuration acumen to nail it down correctly.
>
> Regards
>
> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer  wrote:
>
>> There’s a difference between the two though. Constraints are part of the
>> table schema, and (independent of the interaction with Guardrails), have no
>> dependency on yaml files being perfectly in sync across the cluster.
>> Therefore, the feature (Constraints) on its own doesn’t depend on
>> configuration files to be correct in its own right. The only place where
>> this isn’t true is its interaction with Guardrails, which happen to be
>> yaml-file based and cause issues.
>>
>> CEP-24’s password length requirements, however, are intended to be
>> implemented *by adding a new guardrail*, which is totally dependent on
>> YAML files today (and thus the concerns around a single misconfigured
>> server allowing someone to use an insecure password). If CEP-24 fixes
>> guardrails’ dependence on yaml files, it would *also* fix the
>> problematic interaction between guardrails and constraints.
>>
>> I agree that it would be incredibly valuable to find a solution to the
>> “yaml files need to be correct everywhere or something breaks” problem, and
>> I think CEP-24, being security-focused, is more likely to be problematic
>> without a solution to this issue. That said, I think Dinesh is right in
>> that, at the end of the day, CEP-24 could be implemented without fixing the
>> yaml config issue.
>>
>> I do wonder if the “Guardrails should be transactional” should really be
>> “configuration should be transactional”, or at least as much config as
>> possible should be, but that would blow up CEP-24 fairly dramatically
>> (maybe?). Maybe “cluster-wide configuration should be read from a
>> distributed source on startup/joining the cluster” or something would make
>> sense, so the yaml file works as the source of truth on startup, but as
>> soon as possible it’s read from a TCM-backed data source, and anything the
>> node can get from other nodes it would… but now I’m designing a different
>> CEP in a discuss thread, which is probably a bad idea...
>>
>> Regardless, I hope that I’m explaining why I see a difference between
>> constraints and guardrails, and why I think it makes sense that constraints
>> can move forward without a solution to the misconfiguration problem where I
>> also think you were right in calling it out in CEP-24 (even if we
>> eventually move forward on CEP-24 without the solution in place).
>>
>> Doug
>>
>>
>>
>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi  wrote:
>>
>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič wrote:
>>
>>> It is interesting to see this feedback. When I look at CEP-24 where I am
>>> obsessing about a user being able to misconfigure the password val

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Sam Tunnicliffe
We've been working on a draft CEP for migrating config from yaml to cluster 
metadata but have been a bit short of time recently, I'll try to get something 
out for discussion as soon as possible. 
A little delay isn't such a bad thing IMO, as we're still ironing out the kinks 
in the TCM implementation itself. It'd be good to get a bit more road testing 
done with that before we start adding more to it, which I'm sure will start to 
ramp up once 5.0 is out.  

Thanks,
Sam

> On 7 Jun 2024, at 19:19, Štefan Miklošovič  
> wrote:
> 
> Yes, all configuration should be transactional (configuration which makes 
> sense to require to be the same cluster-wide). Guardrails in TCM are just a 
> subset of this problem. When I started to do CEP-24 I started with guardrails 
> in TCM but then I realized it leads to more general "all config in TCM" and I 
> found myself rabbit-hole-ing endlessly.
> 
> BTW I do not think that once CEP-24 is in place without guardrails in TCM 
> then implementing it would blow up things a lot. It is really just about a 
> couple mutable virtual tables and a couple transformations for various 
> guardrail types we have but I expect that its integration into more general 
> config in TCM should be rather straightforward.
> 
> Config in TCM definitely deserves its own CEP, it is too much to handle under 
> CEP-24 and CEP-24 can go without it already. It just put a little bit more 
> configuration acumen to nail it down correctly. 
> 
> Regards
> 
> On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer wrote:
>> There’s a difference between the two though. Constraints are part of the 
>> table schema, and (independent of the interaction with Guardrails), have no 
>> dependency on yaml files being perfectly in sync across the cluster. 
>> Therefore, the feature (Constraints) on its own doesn’t depend on 
>> configuration files to be correct in its own right. The only place where 
>> this isn’t true is it’s interaction with Guardrails, which happen to be 
>> yaml-file based and cause issues. 
>> 
>> CEP-24’s password length requirements, however, is intended to be 
>> implemented by adding a new guardrail, which is totally dependent on YAML 
>> files today (and thus the concerns around a single misconfigured server 
>> allowing someone to use an insecure password). If CEP-24 fixes guardrails’ 
>> dependence on yaml files, it would also fix the problematic interaction 
>> between guardrails and constraints.
>> 
>> I agree that it would be incredibly valuable to find a solution to the “yaml 
>> files need to be correct everywhere or something breaks” problem, and I 
>> think CEP-24, being security-focused, is more likely to be problematic 
>> without a solution to this issue. That said, I think Dinesh is right in 
>> that, at the end of the day, CEP-24 could be implemented without fixing the 
>> yaml config issue.
>> 
>> I do wonder if the “Guardrails should be transactional” should really be 
>> “configuration should be transactional”, or at least as much config as 
>> possible should be, but that would blow up CEP-24 fairly dramatically 
>> (maybe?). Maybe “cluster-wide configuration should be read from a 
>> distributed source on startup/joining the cluster” or something would make 
>> sense, so the yaml file works as the source of truth on startup, but as soon 
>> as possible it’s read from a TCM-backed data source, and anything the node 
>> can get from other nodes it would… but now I’m designing a different CEP in 
>> a discuss thread, which is probably a bad idea...
>> 
>> Regardless, I hope that I’m explaining why I see a difference between 
>> constraints and guardrails, and why I think it makes sense that constraints 
>> can move forward without a solution the misconfiguration problem where I 
>> also think you were right in calling it out in CEP-24 (even if we eventually 
>> move forward on CEP-24 without the solution in place).
>> 
>> Doug
>> 
>> 
>> 
>>> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi wrote:
>>> 
>>> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič 
>>> <[email protected]> wrote:
>>>> It is interesting to see this feedback. When I look at CEP-24 where I am 
>>>> obsessing about a user being able to misconfigure the password validation 
>>>> strength so if a user hits a "weak" node then she would be able to bypass 
>>>> it, and I see what is our approach here, then I am not sure what I was 
>>>> waiting so long for and I should probably be just more aggressive with the 
>>>> CEP and all the "caveats" could be just overlooked and deferred to 
>>>> "sometimes later".
>>> 
>>> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread. 
>>> Had I paid attention I would have suggested waiting on TCM doesn't make the 
>>> feature any different. The feature is less likely to be misconfigured in a 
>>> cluster. CEP-24 is valuable and password compliance with policies is a 
>>> super useful feature whic

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Štefan Miklošovič
Yes, all configuration should be transactional (configuration which makes
sense to require to be the same cluster-wide). Guardrails in TCM are just a
subset of this problem. When I started to do CEP-24 I started with
guardrails in TCM but then I realized it leads to more general "all config
in TCM" and I found myself rabbit-hole-ing endlessly.

BTW, I do not think that implementing guardrails in TCM once CEP-24 is in
place without them would blow things up a lot. It is really just about a
couple of mutable virtual tables and a couple of transformations for the
various guardrail types we have, and I expect that its integration into a
more general config in TCM should be rather straightforward.

Config in TCM definitely deserves its own CEP; it is too much to handle
under CEP-24, and CEP-24 can go ahead without it. It just takes a little bit
more configuration acumen to nail it down correctly.

Regards

On Fri, Jun 7, 2024 at 8:12 PM Doug Rohrer  wrote:

> There’s a difference between the two though. Constraints are part of the
> table schema, and (independent of the interaction with Guardrails), have no
> dependency on yaml files being perfectly in sync across the cluster.
> Therefore, the feature (Constraints) on its own doesn’t depend on
> configuration files to be correct in its own right. The only place where
> this isn’t true is it’s interaction with Guardrails, which happen to be
> yaml-file based and cause issues.
>
> CEP-24’s password length requirements, however, is intended to be
> implemented *by adding a new guardrail*, which is totally dependent on
> YAML files today (and thus the concerns around a single misconfigured
> server allowing someone to use an insecure password). If CEP-24 fixes
> guardrails’ dependence on yaml files, it would *also* fix the problematic
> interaction between guardrails and constraints.
>
> I agree that it would be incredibly valuable to find a solution to the
> “yaml files need to be correct everywhere or something breaks” problem, and
> I think CEP-24, being security-focused, is more likely to be problematic
> without a solution to this issue. That said, I think Dinesh is right in
> that, at the end of the day, CEP-24 could be implemented without fixing the
> yaml config issue.
>
> I do wonder if the “Guardrails should be transactional” should really be
> “configuration should be transactional”, or at least as much config as
> possible should be, but that would blow up CEP-24 fairly dramatically
> (maybe?). Maybe “cluster-wide configuration should be read from a
> distributed source on startup/joining the cluster” or something would make
> sense, so the yaml file works as the source of truth on startup, but as
> soon as possible it’s read from a TCM-backed data source, and anything the
> node can get from other nodes it would… but now I’m designing a different
> CEP in a discuss thread, which is probably a bad idea...
>
> Regardless, I hope that I’m explaining why I see a difference between
> constraints and guardrails, and why I think it makes sense that constraints
> can move forward without a solution the misconfiguration problem where I
> also think you were right in calling it out in CEP-24 (even if we
> eventually move forward on CEP-24 without the solution in place).
>
> Doug
>
>
>
> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi  wrote:
>
> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
> [email protected]> wrote:
>
>> It is interesting to see this feedback. When I look at CEP-24 where I am
>> obsessing about a user being able to misconfigure the password validation
>> strength so if a user hits a "weak" node then she would be able to bypass
>> it, and I see what is our approach here, then I am not sure what I was
>> waiting so long for and I should probably be just more aggressive with the
>> CEP and all the "caveats" could be just overlooked and deferred to
>> "sometimes later".
>>
>
> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread.
> Had I paid attention I would have suggested waiting on TCM doesn't make
> the feature any different. The feature is less likely to be misconfigured
> in a cluster. CEP-24 is valuable and password compliance with policies is a
> super useful feature which IMO shouldn't have been held back due to lack of
> TCM.
>
>
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Doug Rohrer
There’s a difference between the two though. Constraints are part of the table 
schema, and (independent of the interaction with Guardrails), have no 
dependency on yaml files being perfectly in sync across the cluster. Therefore, 
the feature (Constraints) on its own doesn’t depend on configuration files to 
be correct in its own right. The only place where this isn’t true is its 
interaction with Guardrails, which happen to be yaml-file based and cause 
issues. 

CEP-24’s password length requirements, however, are intended to be implemented 
by adding a new guardrail, which is totally dependent on YAML files today (and 
thus the concerns around a single misconfigured server allowing someone to use 
an insecure password). If CEP-24 fixes guardrails’ dependence on yaml files, it 
would also fix the problematic interaction between guardrails and constraints.

I agree that it would be incredibly valuable to find a solution to the “yaml 
files need to be correct everywhere or something breaks” problem, and I think 
CEP-24, being security-focused, is more likely to be problematic without a 
solution to this issue. That said, I think Dinesh is right in that, at the end 
of the day, CEP-24 could be implemented without fixing the yaml config issue.

I do wonder if the “Guardrails should be transactional” should really be 
“configuration should be transactional”, or at least as much config as possible 
should be, but that would blow up CEP-24 fairly dramatically (maybe?). Maybe 
“cluster-wide configuration should be read from a distributed source on 
startup/joining the cluster” or something would make sense, so the yaml file 
works as the source of truth on startup, but as soon as possible it’s read from 
a TCM-backed data source, and anything the node can get from other nodes it 
would… but now I’m designing a different CEP in a discuss thread, which is 
probably a bad idea...
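
A rough sketch of that precedence, purely for illustration: the names
`resolve_setting`, `yaml_config`, `cluster_config` and the
`password_min_length` key are hypothetical and are not taken from Cassandra,
TCM, or either CEP. The local yaml value is used until a cluster-wide value is
available, after which the cluster-wide value wins.

```python
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    yaml_config: Mapping[str, Any],
    cluster_config: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Return the effective value for `key` on one node.

    `yaml_config` stands in for the node-local file read at startup.
    `cluster_config` stands in for whatever the node has fetched from a
    cluster-wide (e.g. TCM-backed) source; it is None until that happens.
    """
    # Once a cluster-wide value is known, it overrides the local file.
    if cluster_config is not None and key in cluster_config:
        return cluster_config[key]
    # Otherwise the yaml file is the source of truth.
    return yaml_config[key]


# Hypothetical setting, used only to show the precedence.
local = {"password_min_length": 8}
shared = {"password_min_length": 12}

at_startup = resolve_setting("password_min_length", local)
after_join = resolve_setting("password_min_length", local, shared)
```

With this ordering, a single node with a stale yaml file only disagrees with
the rest of the cluster until it has read the shared value.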

Regardless, I hope that I’m explaining why I see a difference between 
constraints and guardrails, and why I think it makes sense that constraints can 
move forward without a solution to the misconfiguration problem, though I also 
think you were right in calling it out in CEP-24 (even if we eventually move 
forward on CEP-24 without the solution in place).

Doug



> On Jun 7, 2024, at 1:51 AM, Dinesh Joshi  wrote:
> 
> On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič wrote:
>> It is interesting to see this feedback. When I look at CEP-24 where I am 
>> obsessing about a user being able to misconfigure the password validation 
>> strength so if a user hits a "weak" node then she would be able to bypass 
>> it, and I see what is our approach here, then I am not sure what I was 
>> waiting so long for and I should probably be just more aggressive with the 
>> CEP and all the "caveats" could be just overlooked and deferred to 
>> "sometimes later".
> 
> Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread. Had 
> I paid attention I would have suggested waiting on TCM doesn't make the 
> feature any different. The feature is less likely to be misconfigured in a 
> cluster. CEP-24 is valuable and password compliance with policies is a super 
> useful feature which IMO shouldn't have been held back due to lack of TCM.
>  



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Bernardo Botella
My concern about mentioning other potential constraints to be implemented in 
the future in the CEP is that it may derail the conversation from the initial 
set I want to propose, which are size and value constraints. There are 
definitely many other potential constraints that we could discuss in future 
updates. For example:

For numeric types:
- Max, Min, equality, difference (included)

For date types:
- Range (as you mentioned)

For text based types:
- Size (included)
- isJson
- complies with a pattern (as you mentioned)
- is block listed
- complies with an enum

General table constraints (including one or more columns):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1 > date2, date1 != date2, …)

Do you think this CEP should also contain those?

And, about your question, the answer is yes. Take a look at the Color example 
that I mentioned above:
CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r < 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g < 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b < 255,
) 

Here, you have more than one constraint per column to form a composite object. 
Similar things should be supported at table level.
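
As a minimal sketch of what "more than one constraint per column" means at 
validation time (in Python, with hypothetical names `Constraint`, 
`COLOR_CONSTRAINTS` and `violations`; this is not the CEP's implementation, 
and the bounds simply mirror the Color example above):

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Comparison operators that appear in the example constraints.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
    "=": operator.eq,
}


@dataclass(frozen=True)
class Constraint:
    """One `CONSTRAINT field op bound` clause (hypothetical model)."""
    field: str
    op: str
    bound: int

    def holds(self, value: Dict[str, int]) -> bool:
        return OPS[self.op](value[self.field], self.bound)

    def __str__(self) -> str:
        return f"{self.field} {self.op} {self.bound}"


# Two constraints per field, in the same spirit as the Color type.
COLOR_CONSTRAINTS: List[Constraint] = (
    [Constraint(c, ">=", 0) for c in "rgb"]
    + [Constraint(c, "<", 255) for c in "rgb"]
)


def violations(value: Dict[str, int], constraints: List[Constraint]) -> List[str]:
    """Return the text of every constraint that `value` breaks."""
    return [str(c) for c in constraints if not c.holds(value)]


ok = violations({"r": 0, "g": 128, "b": 254}, COLOR_CONSTRAINTS)
bad = violations({"r": -1, "g": 10, "b": 300}, COLOR_CONSTRAINTS)
```

Every constraint is evaluated, so a write that breaks several of them can 
report all of them at once rather than only the first.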

I hope this helps,
Bernardo



> On Jun 6, 2024, at 11:08 PM, Dinesh Joshi  wrote:
> 
> On Thu, Jun 6, 2024 at 1:50 PM Bernardo Botella wrote:
>> I will update the CEP being specific with the two specific Constraint types 
>> I will be adding, which are size and value (the ones shown in the example). 
> 
> Could you identify constraints for the most common data types? It would be 
> nice to ship a good set of default constraints. For example, it would be nice 
> to constrain numeric & date data types within a range, text could comply with 
> a pattern, etc.
> 
> One question that I'm not sure if it came up, is whether a column could have 
> multiple constraints?
> 
> Dinesh
> 
> 



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Dinesh Joshi
On Thu, Jun 6, 2024 at 1:50 PM Bernardo Botella <
[email protected]> wrote:

> I will update the CEP being specific with the two specific Constraint
> types I will be adding, which are size and value (the ones shown in the
> example).
>

Could you identify constraints for the most common data types? It would be
nice to ship a good set of default constraints. For example, it would be
nice to constrain numeric & date data types within a range, text could
comply with a pattern, etc.

One question that I'm not sure if it came up, is whether a column could
have multiple constraints?

Dinesh


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Dinesh Joshi
On Thu, Jun 6, 2024 at 1:03 PM Štefan Miklošovič <
[email protected]> wrote:

> It is interesting to see this feedback. When I look at CEP-24 where I am
> obsessing about a user being able to misconfigure the password validation
> strength so if a user hits a "weak" node then she would be able to bypass
> it, and I see what is our approach here, then I am not sure what I was
> waiting so long for and I should probably be just more aggressive with the
> CEP and all the "caveats" could be just overlooked and deferred to
> "sometimes later".
>

Stefan, unfortunately I didn't participate in the CEP-24 DISCUSS thread.
Had I paid attention, I would have suggested that waiting on TCM doesn't make
the feature any different; it only makes the feature less likely to be
misconfigured in a cluster. CEP-24 is valuable, and password compliance with
policies is a super useful feature which IMO shouldn't have been held back due
to lack of TCM.


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Bernardo Botella
Thanks for the clarification Jon.

I will update the CEP to be specific about the two constraint types I will be 
adding, which are size and value (the ones shown in the example). 

And, just to clarify, the mention of extensibility just aims to state that the 
feature should be built in a way that allows more constraints to be added. 
 


> On Jun 5, 2024, at 9:24 PM, Jon Haddad  wrote:
> 
> I think there's some promising ideas here, but the CEP needs to be developed 
> a bit more.
> 
> > Other types of constraints and functions can be added in the future to 
> > provide even more flexibility, but are out of the scope of this CEP.
> 
> > For the third point, I didn’t want to be prescriptive on what those 
> > validations should be, but the fact that the proposal is extensible to 
> > those potential use cases is something concrete that, in my opinion, comes 
> > as a benefit of the actual proposal. I’d be happy to develop a bit more the 
> > main example used of sizeOf if it helps alleviate your concerns on this 
> > point.
> 
> I disagree, quite strongly, with this.  While I appreciate extensibility, I 
> think having a variety of actual constraints that ship with the feature means 
> it needs to be built to satisfy real world use cases.  Without going through 
> this process, it feels a bit too much like triggers, UDAs and UDFs  - 
> incomplete, and too much left to the end user.  
> 
> To me, punting on thinking through constraints kicks the most important can 
> down the road.  
> 
> Jon
> 
> 
> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <[email protected]> wrote:
>> In the CEP document there is another example (although not explicitly 
>> mentioned) adding a constraint to the max value of an int -> 
>> `number_of_items int CONSTRAINT number_of_items < 1000`
>> 
>> This basic example can also be used to expand on how to extend this 
>> functionality with these two initial constraints (size and value), by 
>> composing them to create new data types with proper validation. 
>> 
>> For example, this could create an ipv4 with built in validation:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> ) 
>> 
>> Or a color type:
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255,
>> ) 
>> 
>> 
>> Other types of constraints and functions can be added in the future to 
>> provide even more flexibility, but are out of the scope of this CEP.
>> 
>> Bernardo
>> 
>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <[email protected]> wrote:
>>> 
>>> The idea is interesting.  I think it would help to have more concrete 
>>> examples.  It's a bit sparse at the moment, and I have a hard time getting 
>>> on board with new features where the main selling point is Extensibility 
>>> over the value they provide on their own.  
>>> 
>>> I think it would help a lot if we knew what types of constraints, besides 
>>> the size check, you were thinking of adding.
>>> 
>>> Jon
>>> 
>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella 
>>> <[email protected]> wrote:
>>>> Yes, that is correct. This particular behavior will need CEP-24 in order 
>>>> to work reliably. But, if my understanding is correct, that statement 
>>>> holds true for the entirety of Guardrails, and not only for this 
>>>> particular feature.
>>>> 
>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> That would work reliably in case there is no way how to misconfigure 
>>>>> guardrails in the cluster. What if you set a guardrail on one node but 
>>>>> you don’t set it (or set it differently) on the other? If it is 
>>>>> configured differently and you want to check the guardrails if 
>>>>> constraints do not violate them, then your query might fail or not based 
>>>>> on what node is hit. 
>>>>>  
>>>>> I guess that guardrails would need to start to be transactional to be 
>>>>> sure this is avoided and guardrails 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Jon Haddad
>>>>>> validations should be, but the fact that the proposal is extensible to
>>>>>> those potential use cases is something concrete that, in my opinion, 
>>>>>> comes
>>>>>> as a benefit of the actual proposal. I’d be happy to develop a bit more 
>>>>>> the
>>>>>> main example used of sizeOf if it helps alleviate your concerns on this
>>>>>> point.
>>>>>>
>>>>>> I disagree, quite strongly, with this.  While I appreciate
>>>>>> extensibility, I think having a variety of actual constraints that ship
>>>>>> with the feature means it needs to be built to satisfy real world use
>>>>>> cases.  Without going through this process, it feels a bit too much like
>>>>>> triggers, UDAs and UDFs  - incomplete, and too much left to the end user.
>>>>>>
>>>>>> To me, punting on thinking through constraints kicks the most
>>>>>> important can down the road.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> In the CEP document there is another example (altho not explicetly
>>>>>>> mentioned) adding a constraint to the max value of an int ->
>>>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>>>
>>>>>>> This basic example can also be used to expand on how to extend this
>>>>>>> functionality with these two initial constraints (size and value), by
>>>>>>> composing them to create new data types with proper validation.
>>>>>>>
>>>>>>> For example, this could create an ipv4 with built in validation:
>>>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>>>   ip_adress inet,
>>>>>>>   subnet_mask int,
>>>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>>>   CONSTRAINT subnet_mask < 32
>>>>>>> )
>>>>>>>
>>>>>>> Or a color type:
>>>>>>> CREATE TYPE keyspace.color (
>>>>>>>   r int,
>>>>>>>   g int,
>>>>>>>   b int,
>>>>>>>   CONSTRAINT r >= 0,
>>>>>>>   CONSTRAINT r < 255,
>>>>>>>   CONSTRAINT g >= 0,
>>>>>>>   CONSTRAINT g < 255,
>>>>>>>   CONSTRAINT b >= 0,
>>>>>>>   CONSTRAINT b < 255,
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>> Another types of constraints and functions can be added in the
>>>>>>> future to provide even more flexibility, but are out of the scope of 
>>>>>>> this
>>>>>>> CEP.
>>>>>>>
>>>>>>> Bernardo
>>>>>>>
>>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>>>>>>>
>>>>>>> The idea is interesting.  I think it would help to have more
>>>>>>> concrete examples.  It's a bit sparse at the moment, and I have a hard 
>>>>>>> time
>>>>>>> getting on board with new features where the main selling point
>>>>>>> is Extensibility over the value they provide on their own.
>>>>>>>
>>>>>>> I think it would help a lot if we knew what types of constraints,
>>>>>>> besides the size check, you were thinking of adding.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>>>> order to work reliably. But, if my understanding is correct, that 
>>>>>>>> statement
>>>>>>>> holds true for the entirety of Guardrails, and not only for this 
>>>>>>>> particular
>>>>>>>> feature.
>>>>>>>>
>>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>>>> [email protected]> wrote:
>>>>>>>>

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Štefan Miklošovič
>>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>>
>>>>>> This basic example can also be used to expand on how to extend this
>>>>>> functionality with these two initial constraints (size and value), by
>>>>>> composing them to create new data types with proper validation.
>>>>>>
>>>>>> For example, this could create an ipv4 with built in validation:
>>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>>   ip_adress inet,
>>>>>>   subnet_mask int,
>>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>>   CONSTRAINT subnet_mask < 32
>>>>>> )
>>>>>>
>>>>>> Or a color type:
>>>>>> CREATE TYPE keyspace.color (
>>>>>>   r int,
>>>>>>   g int,
>>>>>>   b int,
>>>>>>   CONSTRAINT r >= 0,
>>>>>>   CONSTRAINT r < 255,
>>>>>>   CONSTRAINT g >= 0,
>>>>>>   CONSTRAINT g < 255,
>>>>>>   CONSTRAINT b >= 0,
>>>>>>   CONSTRAINT b < 255,
>>>>>> )
>>>>>>
>>>>>>
>>>>>> Another types of constraints and functions can be added in the future
>>>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>>>
>>>>>> Bernardo
>>>>>>
>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>>>>>>
>>>>>> The idea is interesting.  I think it would help to have more concrete
>>>>>> examples.  It's a bit sparse at the moment, and I have a hard time 
>>>>>> getting
>>>>>> on board with new features where the main selling point is Extensibility
>>>>>> over the value they provide on their own.
>>>>>>
>>>>>> I think it would help a lot if we knew what types of constraints,
>>>>>> besides the size check, you were thinking of adding.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>>> order to work reliably. But, if my understanding is correct, that 
>>>>>>> statement
>>>>>>> holds true for the entirety of Guardrails, and not only for this 
>>>>>>> particular
>>>>>>> feature.
>>>>>>>
>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>> That would work reliably in case there is no way how to misconfigure
>>>>>>> guardrails in the cluster. What if you set a guardrail on one node but 
>>>>>>> you
>>>>>>> don’t set it (or set it differently) on the other? If it is configured
>>>>>>> differently and you want to check the guardrails if constraints do not
>>>>>>> violate them, then your query might fail or not based on what node is 
>>>>>>> hit.
>>>>>>>
>>>>>>> I guess that guardrails would need to start to be transactional to
>>>>>>> be sure this is avoided and guardrails are indeed same everywhere 
>>>>>>> (CEP-24
>>>>>>> thread sent recently here in ML).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From: *Bernardo Botella 
>>>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>>>> *To: *[email protected] 
>>>>>>> *Cc: *Miklosovic, Stefan 
>>>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Basically, I am trying to protect the limits set

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Doug Rohrer
>>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>>   CONSTRAINT subnet_mask < 32
>>>>>> ) 
>>>>>> 
>>>>>> Or a color type:
>>>>>> CREATE TYPE keyspace.color (
>>>>>>   r int,
>>>>>>   g int,
>>>>>>   b int,
>>>>>>   CONSTRAINT r >= 0,
>>>>>>   CONSTRAINT r < 255,
>>>>>>   CONSTRAINT g >= 0,
>>>>>>   CONSTRAINT g < 255,
>>>>>>   CONSTRAINT b >= 0,
>>>>>>   CONSTRAINT b < 255,
>>>>>> ) 
>>>>>> 
>>>>>> 
>>>>>> Another types of constraints and functions can be added in the future to 
>>>>>> provide even more flexibility, but are out of the scope of this CEP.
>>>>>> 
>>>>>> Bernardo
>>>>>> 
>>>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad <[email protected]> wrote:
>>>>>>> 
>>>>>>> The idea is interesting.  I think it would help to have more concrete 
>>>>>>> examples.  It's a bit sparse at the moment, and I have a hard time 
>>>>>>> getting on board with new features where the main selling point is 
>>>>>>> Extensibility over the value they provide on their own.  
>>>>>>> 
>>>>>>> I think it would help a lot if we knew what types of constraints, 
>>>>>>> besides the size check, you were thinking of adding.
>>>>>>> 
>>>>>>> Jon
>>>>>>> 
>>>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella 
>>>>>>> <[email protected]> wrote:
>>>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in 
>>>>>>>> order to work reliably. But, if my understanding is correct, that 
>>>>>>>> statement holds true for the entirety of Guardrails, and not only for 
>>>>>>>> this particular feature.
>>>>>>>> 
>>>>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan 
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> That would work reliably in case there is no way how to misconfigure 
>>>>>>>>> guardrails in the cluster. What if you set a guardrail on one node 
>>>>>>>>> but you don’t set it (or set it differently) on the other? If it is 
>>>>>>>>> configured differently and you want to check the guardrails if 
>>>>>>>>> constraints do not violate them, then your query might fail or not 
>>>>>>>>> based on what node is hit. 
>>>>>>>>>  
>>>>>>>>> I guess that guardrails would need to start to be transactional to be 
>>>>>>>>> sure this is avoided and guardrails are indeed same everywhere 
>>>>>>>>> (CEP-24 thread sent recently here in ML).
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> From: Bernardo Botella <[email protected]>
>>>>>>>>> Date: Tuesday, 4 June 2024 at 00:31
>>>>>>>>> To: [email protected]
>>>>>>>>> Cc: Miklosovic, Stefan <[email protected]>
>>>>>>>>> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Basically, I am trying to protect the limits set by the operator 
>>>>>>>>

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Štefan Miklošovič
>>>> extensibility, I think having a variety of actual constraints that ship
>>>> with the feature means it needs to be built to satisfy real world use
>>>> cases.  Without going through this process, it feels a bit too much like
>>>> triggers, UDAs and UDFs  - incomplete, and too much left to the end user.
>>>>
>>>> To me, punting on thinking through constraints kicks the most important
>>>> can down the road.
>>>>
>>>> Jon
>>>>
>>>>
>>>> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
>>>> [email protected]> wrote:
>>>>
>>>>> In the CEP document there is another example (altho not explicetly
>>>>> mentioned) adding a constraint to the max value of an int ->
>>>>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>>>>
>>>>> This basic example can also be used to expand on how to extend this
>>>>> functionality with these two initial constraints (size and value), by
>>>>> composing them to create new data types with proper validation.
>>>>>
>>>>> For example, this could create an ipv4 with built in validation:
>>>>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>>>>   ip_adress inet,
>>>>>   subnet_mask int,
>>>>>   CONSTRAINT subnet_mask > 0,
>>>>>   CONSTRAINT subnet_mask < 32
>>>>> )
>>>>>
>>>>> Or a color type:
>>>>> CREATE TYPE keyspace.color (
>>>>>   r int,
>>>>>   g int,
>>>>>   b int,
>>>>>   CONSTRAINT r >= 0,
>>>>>   CONSTRAINT r < 255,
>>>>>   CONSTRAINT g >= 0,
>>>>>   CONSTRAINT g < 255,
>>>>>   CONSTRAINT b >= 0,
>>>>>   CONSTRAINT b < 255,
>>>>> )
>>>>>
>>>>>
>>>>> Another types of constraints and functions can be added in the future
>>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>>
>>>>> Bernardo
>>>>>
>>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>>>>>
>>>>> The idea is interesting.  I think it would help to have more concrete
>>>>> examples.  It's a bit sparse at the moment, and I have a hard time getting
>>>>> on board with new features where the main selling point is Extensibility
>>>>> over the value they provide on their own.
>>>>>
>>>>> I think it would help a lot if we knew what types of constraints,
>>>>> besides the size check, you were thinking of adding.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>>> order to work reliably. But, if my understanding is correct, that 
>>>>>> statement
>>>>>> holds true for the entirety of Guardrails, and not only for this 
>>>>>> particular
>>>>>> feature.
>>>>>>
>>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> That would work reliably in case there is no way how to misconfigure
>>>>>> guardrails in the cluster. What if you set a guardrail on one node but 
>>>>>> you
>>>>>> don’t set it (or set it differently) on the other? If it is configured
>>>>>> differently and you want to check the guardrails if constraints do not
>>>>>> violate them, then your query might fail or not based on what node is 
>>>>>> hit.
>>>>>>
>>>>>> I guess that guardrails would need to start to be transactional to be
>>>>>> sure this is avoided and guardrails are indeed same everywhere (CEP-24
>>>>>> thread sent recently here in ML).
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Bernardo Botella 
>>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>>> *To: *[email protected] 
>>>>>> *Cc: *Miklosovic, Stefan 
>>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Yifan Cai
>>>>   CONSTRAINT b >= 0,
>>>>   CONSTRAINT b <= 255
>>>> )
>>>>
>>>>
>>>> Other types of constraints and functions can be added in the future
>>>> to provide even more flexibility, but are out of the scope of this CEP.
>>>>
>>>> Bernardo
>>>>
>>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>>>>
>>>> The idea is interesting.  I think it would help to have more concrete
>>>> examples.  It's a bit sparse at the moment, and I have a hard time getting
>>>> on board with new features where the main selling point is Extensibility
>>>> over the value they provide on their own.
>>>>
>>>> I think it would help a lot if we knew what types of constraints,
>>>> besides the size check, you were thinking of adding.
>>>>
>>>> Jon
>>>>
>>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>>>> [email protected]> wrote:
>>>>
>>>>> Yes, that is correct. This particular behavior will need CEP-24 in
>>>>> order to work reliably. But, if my understanding is correct, that
>>>>> statement holds true for the entirety of Guardrails, and not only for
>>>>> this particular feature.
>>>>>
>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>>> [email protected]> wrote:
>>>>>
>>>>> That would work reliably in case there is no way how to misconfigure
>>>>> guardrails in the cluster. What if you set a guardrail on one node but you
>>>>> don’t set it (or set it differently) on the other? If it is configured
>>>>> differently and you want to check the guardrails if constraints do not
>>>>> violate them, then your query might fail or not based on what node is hit.
>>>>>
>>>>> I guess that guardrails would need to start to be transactional to be
>>>>> sure this is avoided and guardrails are indeed same everywhere (CEP-24
>>>>> thread sent recently here in ML).
>>>>>
>>>>>
>>>>>
>>>>> *From: *Bernardo Botella 
>>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>>> *To: *[email protected] 
>>>>> *Cc: *Miklosovic, Stefan 
>>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>>
>>>>>
>>>>>
>>>>> Basically, I am trying to protect the limits set by the operator
>>>>> against misconfigured schemas from the customers.
>>>>>
>>>>> I see the guardrails as a safety limit added by the operator, setting
>>>>> the limits within which the customers owning the actual schema (and
>>>>> their constraints) can operate. With that vision, if a customer tries to
>>>>> “ignore” the actual limits set by the operator by adding more relaxed
>>>>> constraints, it gets a nice message saying that “that is not allowed
>>>>> for the cluster, please contact your admin”.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>> You wrote in the CEP:
>>>>>
>>>>> As we mentioned in the motivation section, we currently have some
>>>>> guardrails for columns size in place which can be extended for other data
>>>>> types.
>>>>> Those guardrails will take preference over the defined constraints in
>>>>> the schema, and a SCHEMA ALTER adding constraints that break the limits
>>>>> defined by the guardrails framework will fail.
>>>>> If the guardrails themselves are modified, operator should get a
>>>>> warning mentioning that there are schemas with offending constraints.
>>>>>
>>>>> I think that this should be the other way around. Guardrails should kick
>>>>> in when there are no constraints and they would be overridden by table
>>>>> schema. That way, there is always a “default” in terms of guardrails
>>>>> (which one can turn off on demand / change) but you can override it by
>>>>> table alteration.
>>>>>
>>>>> Basically, what is in schema should win regardless of how guardrails
>>>>> are configured. They don’t matter when a constraint is explicitly
>>>>> specified in a schema. It should take the defaults in guardrails if there
>>>>> are any and no constraint is specified on schema level.
>>>>>
>>>>> What is your motivation to do it like you suggested?
>>>>>
>>>>>
>>>>> *From: *Bernardo Botella 
>>>>> *Date: *Friday, 31 May 2024 at 23:24
>>>>> *To: *[email protected] 
>>>>> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>>>>>
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I am proposing this CEP:
>>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>>
>>>>>
>>>>> And I’m looking for feedback from the community.
>>>>>
>>>>> Thanks a lot!
>>>>> Bernardo
>>>>>
>>>>>
>>>>>
>>>>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Štefan Miklošovič
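>>>> Yes, that is correct. This particular behavior will need CEP-24 in order
>>>> to work reliably. But, if my understanding is correct, that statement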
>>>> holds true for the entirety of Guardrails, and not only for this particular
>>>> feature.
>>>>
>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>>> [email protected]> wrote:
>>>>
>>>> That would work reliably in case there is no way how to misconfigure
>>>> guardrails in the cluster. What if you set a guardrail on one node but you
>>>> don’t set it (or set it differently) on the other? If it is configured
>>>> differently and you want to check the guardrails if constraints do not
>>>> violate them, then your query might fail or not based on what node is hit.
>>>>
>>>> I guess that guardrails would need to start to be transactional to be
>>>> sure this is avoided and guardrails are indeed same everywhere (CEP-24
>>>> thread sent recently here in ML).
>>>>
>>>>
>>>>
>>>> *From: *Bernardo Botella 
>>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>>> *To: *[email protected] 
>>>> *Cc: *Miklosovic, Stefan 
>>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>>>
>>>>
>>>>
>>>> Basically, I am trying to protect the limits set by the operator
>>>> against misconfigured schemas from the customers.
>>>>
>>>> I see the guardrails as a safety limit added by the operator, setting
>>>> the limits within which the customers owning the actual schema (and their
>>>> constraints) can operate. With that vision, if a customer tries to “ignore”
>>>> the actual limits set by the operator by adding more relaxed constraints,
>>>> it gets a nice message saying that “that is not allowed for the cluster,
>>>> please contact your admin".
>>>>
>>>>
>>>>
>>>>
>>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
>>>> [email protected]> wrote:
>>>>
>>>> You wrote in the CEP:
>>>>
>>>> As we mentioned in the motivation section, we currently have some
>>>> guardrails for columns size in place which can be extended for other data
>>>> types.
>>>> Those guardrails will take preference over the defined constraints in
>>>> the schema, and a SCHEMA ALTER adding constraints that break the limits
>>>> defined by the guardrails framework will fail.
>>>> If the guardrails themselves are modified, operator should get a
>>>> warning mentioning that there are schemas with offending constraints.
>>>>
>>>> I think that this should be the other way around. Guardrails should kick in
>>>> when there are no constraints and they would be overridden by table schema.
>>>> That way, there is always a “default” in terms of guardrails (which one can
>>>> turn off on demand / change) but you can override it by table alteration.
>>>>
>>>> Basically, what is in schema should win regardless of how guardrails
>>>> are configured. They don’t matter when a constraint is explicitly specified
>>>> in a schema. It should take the defaults in guardrails if there are any and
>>>> no constraint is specified on schema level.
>>>>
>>>> What is your motivation to do it like you suggested?
>>>>
>>>>
>>>> *From: *Bernardo Botella 
>>>> *Date: *Friday, 31 May 2024 at 23:24
>>>> *To: *[email protected] 
>>>> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>>>>
>>>>
>>>> Hello everyone,
>>>>
>>>> I am proposing this CEP:
>>>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>>>>
>>>>
>>>> And I’m looking for feedback from the community.
>>>>
>>>> Thanks a lot!
>>>> Bernardo
>>>>
>>>>
>>>>
>>>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Štefan Miklošovič
I agree with Jon that a detailed description of all constraints to be
introduced is necessary. Merely saying that it will be extensible, so that we
can add other constraints later, is not enough. Which other constraints?

On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad  wrote:

> I think there are some promising ideas here, but the CEP needs to be
> developed a bit more.
>
> > Other types of constraints and functions can be added in the future to
> provide even more flexibility, but are out of the scope of this CEP.
>
> > For the third point, I didn’t want to be prescriptive on what those
> validations should be, but the fact that the proposal is extensible to
> those potential use cases is something concrete that, in my opinion, comes
> as a benefit of the actual proposal. I’d be happy to develop a bit more the
> main example used of sizeOf if it helps alleviate your concerns on this
> point.
>
> I disagree, quite strongly, with this.  While I appreciate extensibility,
> I think having a variety of actual constraints that ship with the feature
> means it needs to be built to satisfy real world use cases.  Without going
> through this process, it feels a bit too much like triggers, UDAs and UDFs
> - incomplete, and too much left to the end user.
>
> To me, punting on thinking through constraints kicks the most important
> can down the road.
>
> Jon
>
>
> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
> [email protected]> wrote:
>
>> In the CEP document there is another example (although not explicitly
>> mentioned) adding a constraint to the max value of an int ->
>> `number_of_items int CONSTRAINT number_of_items < 1000`
>>
>> This basic example can also be used to expand on how to extend this
>> functionality with these two initial constraints (size and value), by
>> composing them to create new data types with proper validation.
>>
>> For example, this could create an IPv4 CIDR type with built-in validation:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask <= 32
>> )
>>
>> Or a color type:
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r <= 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g <= 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b <= 255
>> )
>>
>>
>> Other types of constraints and functions can be added in the future to
>> provide even more flexibility, but are out of the scope of this CEP.
>>
>> Bernardo
>>
>> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>>
>> The idea is interesting.  I think it would help to have more concrete
>> examples.  It's a bit sparse at the moment, and I have a hard time getting
>> on board with new features where the main selling point is Extensibility
>> over the value they provide on their own.
>>
>> I think it would help a lot if we knew what types of constraints, besides
>> the size check, you were thinking of adding.
>>
>> Jon
>>
>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
>> [email protected]> wrote:
>>
>>> Yes, that is correct. This particular behavior will need CEP-24 in order
>>> to work reliably. But, if my understanding is correct, that statement holds
>>> true for the entirety of Guardrails, and not only for this particular
>>> feature.
>>>
>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>>> [email protected]> wrote:
>>>
>>> That would work reliably in case there is no way how to misconfigure
>>> guardrails in the cluster. What if you set a guardrail on one node but you
>>> don’t set it (or set it differently) on the other? If it is configured
>>> differently and you want to check the guardrails if constraints do not
>>> violate them, then your query might fail or not based on what node is hit.
>>>
>>> I guess that guardrails would need to start to be transactional to be
>>> sure this is avoided and guardrails are indeed same everywhere (CEP-24
>>> thread sent recently here in ML).
>>>
>>>
>>>
>>> *From: *Bernardo Botella 
>>> *Date: *Tuesday, 4 June 2024 at 00:31
>>> *To: *[email protected] 
>>> *Cc: *Miklosovic, Stefan 
>>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-05 Thread Jon Haddad
I think there are some promising ideas here, but the CEP needs to be
developed a bit more.

> Other types of constraints and functions can be added in the future to
provide even more flexibility, but are out of the scope of this CEP.

> For the third point, I didn’t want to be prescriptive on what those
validations should be, but the fact that the proposal is extensible to
those potential use cases is something concrete that, in my opinion, comes
as a benefit of the actual proposal. I’d be happy to develop a bit more the
main example used of sizeOf if it helps alleviate your concerns on this
point.

I disagree, quite strongly, with this.  While I appreciate extensibility, I
think having a variety of actual constraints that ship with the feature
means it needs to be built to satisfy real world use cases.  Without going
through this process, it feels a bit too much like triggers, UDAs and UDFs
- incomplete, and too much left to the end user.

To me, punting on thinking through constraints kicks the most important can
down the road.

Jon


On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
[email protected]> wrote:

> In the CEP document there is another example (although not explicitly
> mentioned) adding a constraint to the max value of an int ->
> `number_of_items int CONSTRAINT number_of_items < 1000`
>
> This basic example can also be used to expand on how to extend this
> functionality with these two initial constraints (size and value), by
> composing them to create new data types with proper validation.
>
> For example, this could create an IPv4 CIDR type with built-in validation:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_address inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask <= 32
> )
>
> Or a color type:
> CREATE TYPE keyspace.color (
>   r int,
>   g int,
>   b int,
>   CONSTRAINT r >= 0,
>   CONSTRAINT r <= 255,
>   CONSTRAINT g >= 0,
>   CONSTRAINT g <= 255,
>   CONSTRAINT b >= 0,
>   CONSTRAINT b <= 255
> )
>
>
> Other types of constraints and functions can be added in the future to
> provide even more flexibility, but are out of the scope of this CEP.
>
> Bernardo
>
> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>
> The idea is interesting.  I think it would help to have more concrete
> examples.  It's a bit sparse at the moment, and I have a hard time getting
> on board with new features where the main selling point is Extensibility
> over the value they provide on their own.
>
> I think it would help a lot if we knew what types of constraints, besides
> the size check, you were thinking of adding.
>
> Jon
>
> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
> [email protected]> wrote:
>
>> Yes, that is correct. This particular behavior will need CEP-24 in order
>> to work reliably. But, if my understanding is correct, that statement holds
>> true for the entirety of Guardrails, and not only for this particular
>> feature.
>>
>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>> [email protected]> wrote:
>>
>> That would work reliably in case there is no way how to misconfigure
>> guardrails in the cluster. What if you set a guardrail on one node but you
>> don’t set it (or set it differently) on the other? If it is configured
>> differently and you want to check the guardrails if constraints do not
>> violate them, then your query might fail or not based on what node is hit.
>>
>> I guess that guardrails would need to start to be transactional to be
>> sure this is avoided and guardrails are indeed same everywhere (CEP-24
>> thread sent recently here in ML).
>>
>>
>>
>> *From: *Bernardo Botella 
>> *Date: *Tuesday, 4 June 2024 at 00:31
>> *To: *[email protected] 
>> *Cc: *Miklosovic, Stefan 
>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>
>>
>>
>> Basically, I am trying to protect the limits set by the operator against
>> misconfigured schemas from the customers.
>>
>> I see the guardrails as a safety limit added by the operator, setting the
>> limits within which the customers owning the actual schema (and their
>> constraints) can operate. With that vision, if a customer tries to “ignore”
>> the actual limits set by the operator by adding more relaxed constraints,
>> it gets a nice message saying that “that is not allowed for the cluster,

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-04 Thread Bernardo Botella
In the CEP document there is another example (although not explicitly mentioned) 
adding a constraint to the max value of an int -> `number_of_items int 
CONSTRAINT number_of_items < 1000`

This basic example can also be used to expand on how to extend this 
functionality with these two initial constraints (size and value), by composing 
them to create new data types with proper validation. 

For example, this could create an IPv4 CIDR type with built-in validation:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_address inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask <= 32
) 

Or a color type:
CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r <= 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g <= 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b <= 255
) 
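
To make the composed value constraints above a bit more concrete, here is a
minimal Python sketch of how a write to the color type might be checked. It is
not part of the CEP: the `Constraint` class, the `violations` helper, and the
inclusive 0 to 255 channel range are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    """One value constraint on a single field (hypothetical helper, not a Cassandra API)."""
    field: str
    description: str
    check: Callable[[int], bool]


# Assumed bounds: each color channel must lie in 0..255 inclusive.
COLOR_CONSTRAINTS: List[Constraint] = [
    c
    for name in ("r", "g", "b")
    for c in (
        Constraint(name, f"{name} >= 0", lambda v: v >= 0),
        Constraint(name, f"{name} <= 255", lambda v: v <= 255),
    )
]


def violations(value: Dict[str, int], constraints: List[Constraint]) -> List[str]:
    """Return the descriptions of every constraint the value breaks, in declaration order."""
    return [c.description for c in constraints if not c.check(value[c.field])]
```

In this sketch a write would be rejected whenever `violations` returns a
non-empty list, and the returned descriptions could be used in the error
message sent back to the client.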


Other types of constraints and functions can be added in the future to 
provide even more flexibility, but are out of the scope of this CEP.

Bernardo

> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
> 
> The idea is interesting.  I think it would help to have more concrete 
> examples.  It's a bit sparse at the moment, and I have a hard time getting on 
> board with new features where the main selling point is Extensibility over 
> the value they provide on their own.  
> 
> I think it would help a lot if we knew what types of constraints, besides the 
> size check, you were thinking of adding.
> 
> Jon
> 
> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <[email protected]> wrote:
>> Yes, that is correct. This particular behavior will need CEP-24 in order to 
>> work reliably. But, if my understanding is correct, that statement holds 
>> true for the entirety of Guardrails, and not only for this particular 
>> feature.
>> 
>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <[email protected]> wrote:
>>> 
>>> That would work reliably in case there is no way how to misconfigure 
>>> guardrails in the cluster. What if you set a guardrail on one node but you 
>>> don’t set it (or set it differently) on the other? If it is configured 
>>> differently and you want to check the guardrails if constraints do not 
>>> violate them, then your query might fail or not based on what node is hit. 
>>>  
>>> I guess that guardrails would need to start to be transactional to be sure 
>>> this is avoided and guardrails are indeed same everywhere (CEP-24 thread 
>>> sent recently here in ML).
>>>  
>>>  
>>> From: Bernardo Botella <[email protected]>
>>> Date: Tuesday, 4 June 2024 at 00:31
>>> To: [email protected]
>>> Cc: Miklosovic, Stefan <[email protected]>
>>> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
>>> 
>>> 
>>> 
>>> 
>>> Basically, I am trying to protect the limits set by the operator against 
>>> misconfigured schemas from the customers. 
>>>  
>>> I see the guardrails as a safety limit added by the operator, setting the 
>>> limits within which the customers owning the actual schema (and their 
>>> constraints) can operate. With that vision, if a customer tries to “ignore” 
>>> the actual limits set by the operator by adding more relaxed constraints, 
>>> it gets a nice message saying that “that is not allowed for the cluster, 
>>> please contact your admin".
>>>  
>>>  
>>> 
>>> 
>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <[email protected]> wrote:
>>>  
>>> You wrote in the CEP:
>>>  
>>> As we mentioned in the motivation section, we currently have some 
>>> guardrails for columns size in place which can be extended for other data 
>>> types.
>>> Those guardrails will take preference over the defined constraints in the 
>>> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
>>> by the guardrails framework will fail.
>>> If the guardrails themselves are modified, operator should get a warning 
>>> mentioning that there are schemas with offending constraints.

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-04 Thread Jon Haddad
The idea is interesting.  I think it would help to have more concrete
examples.  It's a bit sparse at the moment, and I have a hard time getting
on board with new features where the main selling point is Extensibility
over the value they provide on their own.

I think it would help a lot if we knew what types of constraints, besides
the size check, you were thinking of adding.

Jon

On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
[email protected]> wrote:

> Yes, that is correct. This particular behavior will need CEP-24 in order
> to work reliably. But, if my understanding is correct, that statement holds
> true for the entirety of Guardrails, and not only for this particular
> feature.
>
> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
> [email protected]> wrote:
>
> That would work reliably in case there is no way how to misconfigure
> guardrails in the cluster. What if you set a guardrail on one node but you
> don’t set it (or set it differently) on the other? If it is configured
> differently and you want to check the guardrails if constraints do not
> violate them, then your query might fail or not based on what node is hit.
>
> I guess that guardrails would need to start to be transactional to be sure
> this is avoided and guardrails are indeed same everywhere (CEP-24 thread
> sent recently here in ML).
>
>
>
> *From: *Bernardo Botella 
> *Date: *Tuesday, 4 June 2024 at 00:31
> *To: *[email protected] 
> *Cc: *Miklosovic, Stefan 
> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>
>
>
> Basically, I am trying to protect the limits set by the operator against
> misconfigured schemas from the customers.
>
> I see the guardrails as a safety limit added by the operator, setting the
> limits within which the customers owning the actual schema (and their
> constraints) can operate. With that vision, if a customer tries to “ignore”
> the actual limits set by the operator by adding more relaxed constraints,
> it gets a nice message saying that “that is not allowed for the cluster,
> please contact your admin".
>
>
>
>
> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
> [email protected]> wrote:
>
> You wrote in the CEP:
>
> As we mentioned in the motivation section, we currently have some
> guardrails for columns size in place which can be extended for other data
> types.
> Those guardrails will take preference over the defined constraints in the
> schema, and a SCHEMA ALTER adding constraints that break the limits defined
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning
> mentioning that there are schemas with offending constraints.
>
> I think that this should be the other way around. Guardrails should kick in
> when there are no constraints and they would be overridden by table schema.
> That way, there is always a “default” in terms of guardrails (which one can
> turn off on demand / change) but you can override it by table alteration.
>
> Basically, what is in schema should win regardless of how guardrails are
> configured. They don’t matter when a constraint is explicitly specified in
> a schema. It should take the defaults in guardrails if there are any and no
> constraint is specified on schema level.
>
> What is your motivation to do it like you suggested?
>
>
> *From: *Bernardo Botella 
> *Date: *Friday, 31 May 2024 at 23:24
> *To: *[email protected] 
> *Subject: *[DISCUSS] CEP-42: Constraints Framework
> You don't often get email from [email protected]. Learn why
> this is important <https://aka.ms/LearnAboutSenderIdentification>
>
> *EXTERNAL EMAIL - USE CAUTION when clicking links or attachments *
>
>
> Hello everyone,
>
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>
>
> And I’m looking for feedback from the community.
>
> Thanks a lot!
> Bernardo
>
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Bernardo Botella
Yes, that is correct. This particular behavior will need CEP-24 in order to 
work reliably. But, if my understanding is correct, that statement holds true 
for the entirety of Guardrails, and not only for this particular feature.

> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan wrote:
> 
> That would work reliably in case there is no way how to misconfigure 
> guardrails in the cluster. What if you set a guardrail on one node but you 
> don’t set it (or set it differently) on the other? If it is configured 
> differently and you want to check the guardrails if constraints do not 
> violate them, then your query might fail or not based on what node is hit. 
>  
> I guess that guardrails would need to start to be transactional to be sure 
> this is avoided and guardrails are indeed same everywhere (CEP-24 thread sent 
> recently here in ML).
>  
>  
> From: Bernardo Botella <[email protected]>
> Date: Tuesday, 4 June 2024 at 00:31
> To: [email protected]
> Cc: Miklosovic, Stefan <[email protected]>
> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
> 
> 
> 
> 
> Basically, I am trying to protect the limits set by the operator against 
> misconfigured schemas from the customers. 
>  
> I see the guardrails as a safety limit added by the operator, setting the 
> limits within which the customers owning the actual schema (and their constraints) 
> can operate. With that vision, if a customer tries to “ignore” the actual 
> limits set by the operator by adding more relaxed constraints, it gets a nice 
> message saying that “that is not allowed for the cluster, please contact your 
> admin".
>  
>  
> 
> 
> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev wrote:
>  
> You wrote in the CEP:
>  
> As we mentioned in the motivation section, we currently have some guardrails 
> for columns size in place which can be extended for other data types.
> Those guardrails will take preference over the defined constraints in the 
> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning 
> mentioning that there are schemas with offending constraints.
>  
> I think that this should be the other way around. Guardrails should kick in when 
> there are no constraints and they would be overridden by table schema. That 
> way, there is always a “default” in terms of guardrails (which one can turn 
> off on demand / change) but you can override it by table alteration.
>  
> Basically, what is in schema should win regardless of how guardrails are 
> configured. They don’t matter when a constraint is explicitly specified in a 
> schema. It should take the defaults in guardrails if there are any and no 
> constraint is specified on schema level.
>  
> What is your motivation to do it like you suggested?
>  
> From: Bernardo Botella <[email protected]>
> Date: Friday, 31 May 2024 at 23:24
> To: [email protected]
> Subject: [DISCUSS] CEP-42: Constraints Framework
> 
>  
> 
> Hello everyone, 
>  
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>  
>  
> And I’m looking for feedback from the community.
>  
> Thanks a lot!
> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Miklosovic, Stefan via dev
That would work reliably only if there is no way to misconfigure guardrails in 
the cluster. What if you set a guardrail on one node but you don’t set it (or 
set it differently) on another? If guardrails are configured differently and 
constraints are checked against them, then your query might fail or succeed 
depending on which node is hit.

I guess that guardrails would need to become transactional to be sure this is 
avoided and guardrails are indeed the same everywhere (CEP-24 thread sent 
recently here on the ML).
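
To make the failure mode concrete, here is a minimal Python sketch. Everything 
in it is hypothetical (node names, the single "maximum column size" guardrail, 
the function name); it is not Cassandra code or configuration, only a model of 
one constraint being checked against per-node guardrail values.

```python
# Hypothetical model: each node holds its own guardrail value for the
# maximum column size in bytes; None means the guardrail is not set there.
node_guardrails = {"node1": 1024, "node2": None, "node3": 4096}


def alter_allowed(constraint_max_size, guardrail_max_size):
    """Accept the constraint only if it stays within this node's guardrail."""
    if guardrail_max_size is None:
        return True  # no guardrail configured on this node
    return constraint_max_size <= guardrail_max_size


# The same schema change (a constraint allowing up to 2048 bytes) is
# evaluated by whichever node happens to coordinate the request.
outcomes = {
    node: alter_allowed(2048, limit) for node, limit in node_guardrails.items()
}
print(outcomes)
```

With these assumed values the request is rejected by node1 and accepted by 
node2 and node3, which is the node-dependent behaviour described above.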


From: Bernardo Botella 
Date: Tuesday, 4 June 2024 at 00:31
To: [email protected] 
Cc: Miklosovic, Stefan 
Subject: Re: [DISCUSS] CEP-42: Constraints Framework

Basically, I am trying to protect the limits set by the operator against 
misconfigured schemas from the customers.

I see the guardrails as a safety limit added by the operator, setting the 
limits within which the customers owning the actual schema (and their 
constraints) can operate. With that vision, if a customer tries to “ignore” the 
actual limits set by the operator by adding more relaxed constraints, they get 
a nice message saying “that is not allowed for the cluster, please contact your 
admin”.




On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev 
 wrote:

You wrote in the CEP:

As we mentioned in the motivation section, we currently have some guardrails 
for columns size in place which can be extended for other data types.
Those guardrails will take preference over the defined constraints in the 
schema, and a SCHEMA ALTER adding constraints that break the limits defined by 
the guardrails framework will fail.
If the guardrails themselves are modified, operator should get a warning 
mentioning that there are schemas with offending constraints.

I think that this should be the other way around. Guardrails should kick in 
when there are no constraints and they would be overridden by table schema. 
That way, there is always a “default” in terms of guardrails (which one can 
turn off on demand / change) but you can override it by table alteration.

Basically, what is in schema should win regardless of how guardrails are 
configured. They don’t matter when a constraint is explicitly specified in a 
schema. It should take the defaults in guardrails if there are any and no 
constraint is specified on schema level.

What is your motivation to do it like you suggested?

From: Bernardo Botella
Date: Friday, 31 May 2024 at 23:24
To: [email protected]
Subject: [DISCUSS] CEP-42: Constraints Framework

Hello everyone,

I am proposing this CEP:
CEP-42: Constraints Framework - CASSANDRA - Apache Software 
Foundation<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>


And I’m looking for feedback from the community.

Thanks a lot!
Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Bernardo Botella
Basically, I am trying to protect the limits set by the operator against 
misconfigured schemas from the customers.

I see the guardrails as a safety limit added by the operator, setting the 
limits within which the customers owning the actual schema (and their 
constraints) can operate. With that vision, if a customer tries to “ignore” the 
actual limits set by the operator by adding more relaxed constraints, they get 
a nice message saying “that is not allowed for the cluster, please contact your 
admin”.
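
As a rough illustration of that behaviour, here is a Python sketch with 
hypothetical names (check_alter, offending_constraints, a single size limit); 
it is not how the CEP or Cassandra implements it. It covers the two cases from 
the CEP text: a schema change that exceeds the guardrail is rejected, and a 
tightened guardrail yields the list of tables the operator should be warned 
about.

```python
class GuardrailViolation(Exception):
    """Raised when a constraint is more relaxed than the operator's limit."""


def check_alter(constraint_max_size, guardrail_max_size):
    """Accept a constraint only if it does not exceed the guardrail."""
    if guardrail_max_size is not None and constraint_max_size > guardrail_max_size:
        raise GuardrailViolation(
            "that is not allowed for the cluster, please contact your admin"
        )
    return constraint_max_size


def offending_constraints(table_constraints, new_guardrail_max_size):
    """Tables whose existing constraint exceeds a newly tightened guardrail."""
    return sorted(
        table
        for table, limit in table_constraints.items()
        if limit > new_guardrail_max_size
    )


accepted = check_alter(512, 1024)  # within the operator's limit

try:
    check_alter(2048, 1024)  # more relaxed than the guardrail
    rejected_message = None
except GuardrailViolation as error:
    rejected_message = str(error)

# The operator lowers the guardrail to 800: ks.b now offends and is reported.
to_warn = offending_constraints({"ks.a": 512, "ks.b": 900}, 800)
```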



> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev 
>  wrote:
> 
> You wrote in the CEP:
>  
> As we mentioned in the motivation section, we currently have some guardrails 
> for columns size in place which can be extended for other data types.
> Those guardrails will take preference over the defined constraints in the 
> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning 
> mentioning that there are schemas with offending constraints.
>  
> I think that this should be the other way around. Guardrails should kick in 
> when there are no constraints and they would be overridden by table schema. 
> That way, there is always a “default” in terms of guardrails (which one can 
> turn off on demand / change) but you can override it by table alteration.
>  
> Basically, what is in schema should win regardless of how guardrails are 
> configured. They don’t matter when a constraint is explicitly specified in a 
> schema. It should take the defaults in guardrails if there are any and no 
> constraint is specified on schema level.
>  
> What is your motivation to do it like you suggested?
>  
> From: Bernardo Botella
> Date: Friday, 31 May 2024 at 23:24
> To: [email protected]
> Subject: [DISCUSS] CEP-42: Constraints Framework
> 
> 
> Hello everyone, 
>  
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
>  
> And I’m looking for feedback from the community.
>  
> Thanks a lot!
> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Miklosovic, Stefan via dev
You wrote in the CEP:

As we mentioned in the motivation section, we currently have some guardrails 
for columns size in place which can be extended for other data types.
Those guardrails will take preference over the defined constraints in the 
schema, and a SCHEMA ALTER adding constraints that break the limits defined by 
the guardrails framework will fail.
If the guardrails themselves are modified, operator should get a warning 
mentioning that there are schemas with offending constraints.

I think that this should be the other way around. Guardrails should kick in 
when there are no constraints and they would be overridden by table schema. 
That way, there is always a “default” in terms of guardrails (which one can 
turn off on demand / change) but you can override it by table alteration.

Basically, what is in schema should win regardless of how guardrails are 
configured. They don’t matter when a constraint is explicitly specified in a 
schema. It should take the defaults in guardrails if there are any and no 
constraint is specified on schema level.
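
A small Python sketch of this precedence rule, assuming a single size limit and 
a hypothetical helper name (effective_max_size); it only models the resolution 
order suggested here, not any existing Cassandra behaviour.

```python
def effective_max_size(schema_constraint, guardrail_default):
    """Resolve the limit applied to a write under the 'schema wins' rule.

    schema_constraint: limit declared on the table, or None if absent.
    guardrail_default: limit from the guardrail, or None if disabled.
    """
    if schema_constraint is not None:
        return schema_constraint  # an explicit constraint overrides the guardrail
    return guardrail_default  # otherwise fall back to the default, if any


explicit = effective_max_size(2048, 1024)    # table constraint wins
defaulted = effective_max_size(None, 1024)   # guardrail acts as the default
unlimited = effective_max_size(None, None)   # neither set: no limit
```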

What is your motivation to do it like you suggested?

From: Bernardo Botella 
Date: Friday, 31 May 2024 at 23:24
To: [email protected] 
Subject: [DISCUSS] CEP-42: Constraints Framework

Hello everyone,

I am proposing this CEP:
CEP-42: Constraints Framework - CASSANDRA - Apache Software 
Foundation


And I’m looking for feedback from the community.

Thanks a lot!
Bernardo


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Bernardo Botella
Hi Jeff,

Thanks a lot for your comments. 

To your first question, “Would this be implemented solely in the write path?”, 
the answer is yes. I think enforcing it at reads/compaction/repairs may pose 
problems when an ALTER TABLE adds new or stricter constraints to a table that 
already holds offending data. I think the cleanest way to handle these 
scenarios is to just prevent new data from being added if it does not comply 
with the current constraints.
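
A toy Python sketch of write-path-only enforcement, to show what happens to 
data written before a stricter constraint is added. The Table class, its 
methods and the size-based constraint are assumptions made for illustration, 
not the proposed implementation.

```python
class ConstraintViolation(Exception):
    """Raised when a new write breaks the table's current constraint."""


class Table:
    """Toy table with a single, hypothetical max-value-size constraint."""

    def __init__(self):
        self.rows = {}
        self.max_size = None  # no constraint until one is added

    def alter_constraint(self, max_size):
        # Only the schema changes; rows already stored are not re-validated.
        self.max_size = max_size

    def write(self, key, value):
        # Write path: the only place the constraint is checked.
        if self.max_size is not None and len(value) > self.max_size:
            raise ConstraintViolation(
                f"value of {len(value)} bytes exceeds limit of {self.max_size}"
            )
        self.rows[key] = value

    def read(self, key):
        # Read path: no constraint check, so older offending data is served.
        return self.rows[key]


table = Table()
table.write("old", b"x" * 100)  # accepted: no constraint yet
table.alter_constraint(10)      # stricter constraint added later

old_value = table.read("old")   # still readable although it now offends

try:
    table.write("new", b"x" * 100)
    rejected = False
except ConstraintViolation:
    rejected = True
```

In this model the row written before the ALTER stays readable, while the 
equivalent new write is refused.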

On your second comment: for the third point, I didn’t want to be prescriptive 
about what those validations should be, but the fact that the proposal is 
extensible to those potential use cases is something concrete that, in my 
opinion, is a benefit of the actual proposal. I’d be happy to develop the main 
sizeOf example a bit further if it helps alleviate your concerns on this point.

I still think that the general benefit of allowing flexibility in adding 
limits to what can be written to the database is something positive that helps 
Cassandra users keep healthy clusters.


 

> On Jun 2, 2024, at 12:04 PM, Jeff Jirsa  wrote:
> 
> Separately, when we discuss benefits of a proposal in a CEP, we should talk 
> about what’s concrete and ignore the stuff that’s idealistic. Of these four 
> points:
> 
> This brings to the table several benefits and flexibility. Some examples:
> 
> - Cassandra operators have more control to reason about your data and 
>   appropriately tune for performance.
> - Potential reduction on maintenance overhead, being able to better predict 
>   partition sizes.
> - Extensibility to more complex validations in the future.
> - Potential value in storage engine making decisions based on data size.
> 
> The second is just the first, restated, and the fourth seems incredibly 
> unlikely. The third seems maybe possible, but why not spec out the full range 
> with the CEP instead of assuming iterative implementation?
> 
> 
> 
>> On Jun 2, 2024, at 20:59, Jeff Jirsa  wrote:
>> 
>> 
>> Would this be implemented solely in the write path? Or would you also try to 
>> enforce it in the read and sstable/compaction/repair paths as well?  
>> 
>> 
>> 
>>> On May 31, 2024, at 23:24, Bernardo Botella  
>>> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> I am proposing this CEP:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>>> 
>>> And I’m looking for feedback from the community.
>>> 
>>> Thanks a lot!
>>> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Jeff Jirsa
Separately, when we discuss benefits of a proposal in a CEP, we should talk 
about what’s concrete and ignore the stuff that’s idealistic. Of these four 
points:

This brings to the table several benefits and flexibility. Some examples:

- Cassandra operators have more control to reason about your data and 
  appropriately tune for performance.
- Potential reduction on maintenance overhead, being able to better predict 
  partition sizes.
- Extensibility to more complex validations in the future.
- Potential value in storage engine making decisions based on data size.

The second is just the first, restated, and the fourth seems incredibly 
unlikely. The third seems maybe possible, but why not spec out the full range 
with the CEP instead of assuming iterative implementation?

> On Jun 2, 2024, at 20:59, Jeff Jirsa  wrote:
> 
> Would this be implemented solely in the write path? Or would you also try to 
> enforce it in the read and sstable/compaction/repair paths as well?
> 
>> On May 31, 2024, at 23:24, Bernardo Botella  wrote:
>> 
>> Hello everyone,
>> 
>> I am proposing this CEP:
>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
>> cwiki.apache.org
>> 
>> And I’m looking for feedback from the community.
>> 
>> Thanks a lot!
>> Bernardo

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Jeff Jirsa
Would this be implemented solely in the write path? Or would you also try to 
enforce it in the read and sstable/compaction/repair paths as well?

> On May 31, 2024, at 23:24, Bernardo Botella  wrote:
> 
> Hello everyone,
> 
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
> cwiki.apache.org
> 
> And I’m looking for feedback from the community.
> 
> Thanks a lot!
> Bernardo