Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Avi Kivity via dev
IMO it's wrong to change an aggregate's meaning from "aggregate across
GROUPs or entire SELECT" to "aggregate within column". Aggregation is
long established in SQL and it will just confuse experienced database
users.

PostgreSQL maintains the meaning of max:

CREATE TABLE tab (
    x int[]
);

INSERT INTO tab(x) VALUES ( '{1, 2}' );
INSERT INTO tab(x) VALUES ( '{3, 4}' );

SELECT max(x) FROM tab;

  max
-------
 {3,4}
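The contrast being drawn here can be sketched in a few lines of Python, with lists standing in for Postgres arrays:

```python
rows = [[1, 2], [3, 4]]

# Across-row aggregation, as in SQL max(x): lexicographic comparison
# picks the greatest collection among all rows.
across = max(rows)

# Within-collection aggregation, as in the proposed collection_max:
# the greatest element inside each row's own collection.
within = [max(r) for r in rows]

print(across)  # [3, 4]
print(within)  # [2, 4]
```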

One option is to treat the collection as a tiny table:

SELECT (SELECT max(key) FROM a_set_column) AS m1, (SELECT max(value)
FROM a_map_column) FROM tab;

Though it's better to look for existing practice and emulate it than to
exercise creativity here, IMO.


On Tue, 2022-12-06 at 13:30 +, Benedict wrote:
> Thanks Andres, I think community input on direction here will be
> invaluable. There’s a bunch of interrelated tickets, and my opinions
> are as follows:
> 
> 1. I think it is a mistake to offer a function MAX that operates over
> rows containing collections, returning the collection with the most
> elements. This is just a nonsensical operation to support IMO. We
> should decide as a community whether we “fix” this aggregation, or
> remove it.
> 2. I think "collection_"-prefixed methods are non-intuitive for
> discovery, and all else equal it would be better to use MAX, MIN, etc.,
> same as for aggregations.
> 3. I think it is peculiar to permit methods named collection_ to
> operate over non-collection types when they are explicitly collection
> variants.
> 
> Given (1), (2) becomes simple except for COUNT which remains
> ambiguous, but this could be solved by either providing a separate
> method for collections (e.g. SIZE) which seems fine to me, or by
> offering a precedence order for matching and a keyword for overriding
> the precedence order (e.g. COUNT(collection AS COLLECTION)).
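As a hedged sketch of the precedence idea (names and rules invented for illustration, not Cassandra's actual function-resolution logic): collection variants are matched first for collection arguments, the scalar/aggregate sense is selected last, and a keyword forces the collection reading.

```python
def collection_count(value):
    return len(value)                 # SIZE-style: number of elements

def scalar_count(value):
    return 0 if value is None else 1  # per-row contribution to aggregate COUNT

def resolve_count(value, force_collection=False):
    # Precedence: collection variant wins for collection arguments
    # (or when forced, e.g. COUNT(col AS COLLECTION)); scalar comes last.
    if force_collection or isinstance(value, (list, set, dict, tuple)):
        return collection_count(value)
    return scalar_count(value)

print(resolve_count({1, 2, 3}))  # 3
print(resolve_count(42))         # 1
```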
> 
> Given (2), (3) is a little more difficult. However, I think this can
> be solved several ways. 
>  - We could permit explicit casts to collection types, that for a
> collection type would be a no-op, and for a single value would create
> a collection
>  - With precedence orders, by always selecting the scalar function
> last
>  - By permitting WRITETIME to accept a binary operator reduce
> function to resolve multiple values
> 
> These decisions all imply trade-offs on each other, and affect the
> evolution of CQL, so I think community input would be helpful.
> 
> > On 6 Dec 2022, at 12:44, Andrés de la Peña 
> > wrote:
> > 
> > 
> > This will require some long introduction for context:
> > 
> > The MAX/MIN functions aggregate rows to get the row with min/max
> > column value according to their comparator. For collections, the
> > comparison is on the lexicographical order of the collection
> > elements. That's the very same comparator that is used when
> > collections are used as clustering keys and for ORDER BY.
> > 
> > However, a bug in the MIN/MAX aggregate functions used to present
> > the results in their unserialized form, although the row selection
> > was correct. That bug was recently solved by CASSANDRA-17811.
> > During that ticket, simply disabling MIN/MAX on collections was
> > also considered, since applying those functions to collections
> > doesn't seem super useful. However, that option was quickly
> > discarded and the functions were fixed so that MIN/MAX correctly
> > work for every data type.
> > 
> > As a byproduct of the internal improvements of that fix, CASSANDRA-
> > 8877 introduced a new set of functions that can perform
> > aggregations of the elements of a collection. Those were named
> > "map_keys", "map_values", "collection_min", "collection_max",
> > "collection_sum", and "collection_count". Those are the names
> > mentioned on the mailing list thread about function naming
> > conventions. Despite doing a kind of within-collection aggregation,
> > these functions are not what we usually call aggregate functions,
> > since they don't aggregate multiple rows together.
> > 
> > On a different line of work, CASSANDRA-17425 added to trunk a
> > MAXWRITETIME function to get the max timestamp of a multi-cell
> > column. However, the new collection functions can be used in
> > combination with the WRITETIME and TTL functions to retrieve the
> > min/max/sum/avg timestamp or ttl of a multi-cell column. Since the
> > new functions give a generic way of aggregating timestamps and TTLs
> > of multi-cell columns, CASSANDRA-18078 proposed to remove that
> > MAXWRITETIME function.
> > 
> > Yifan Cai, author of the MAXWRITETIME function, agreed to remove
> > that function in favour of the new generic collection functions.
> > However, the MAXWRITETIME function can work on both single-cell and
> > multi-cell columns, whereas "COLLECTION_MAX(WRITETIME(column))"
> > would only work on multi-cell columns. That's because WRITETIME
> > of a non-multi-cell column doesn't return a collection, and one
> > should simply use "WRITETIME(column)" 
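As a rough executable model of the composition described in this thread (the list-of-timestamps shape is an assumption for illustration, not Cassandra's internal representation): WRITETIME of a multi-cell column yields one timestamp per cell, which the generic collection functions then reduce.

```python
# Assumed model: WRITETIME(multi_cell_col) -> list of per-cell timestamps.
cell_writetimes = [1670000000001, 1670000000005, 1670000000003]

max_writetime = max(cell_writetimes)   # what COLLECTION_MAX(WRITETIME(col)) computes
min_writetime = min(cell_writetimes)
avg_writetime = sum(cell_writetimes) / len(cell_writetimes)

print(max_writetime)  # 1670000000005
```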

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Avi Kivity via dev
Agree with views, or alternatively, column permissions together with 
computed columns:



CREATE TABLE foo (
  id int PRIMARY KEY,
  unmasked_name text,
  name text GENERATED ALWAYS AS (some_mask_function(unmasked_name, 'xxx', 7)) STORED
)


(syntax from postgresql)


GRANT SELECT ON foo.name TO general_use;

GRANT SELECT ON foo.unmasked_name TO top_secret;
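`some_mask_function` is left undefined above. A plausible Python reading, assuming its parameters are a padding string and the number of trailing characters to keep (both names and semantics are guesses for illustration):

```python
def some_mask_function(value, pad='x', keep_last=0):
    # Keep the last `keep_last` characters, mask the rest with `pad`.
    if keep_last <= 0:
        return pad * len(value)
    masked = max(len(value) - keep_last, 0)
    return pad * masked + value[-keep_last:]

print(some_mask_function("Alice Johnson", 'x', 7))  # xxxxxxJohnson
```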


On 26/08/2022 00.10, Benedict wrote:
I’m inclined to agree that this seems a more straightforward approach 
that makes fewer implied promises.


Perhaps we could deliver simple views backed by virtual tables, and 
model our approach on that of Postgres, MySQL et al?


Views in C* would be very simple, just offering a subset of fields 
with some UDFs applied. It would allow users to define roles with 
access only to the views, or for applications to use the views for 
presentation purposes.


It feels like a cleaner approach to me, and we’d get two features for 
the price of one. BUT I don’t feel super strongly about this.


On 25 Aug 2022, at 20:16, Derek Chen-Becker  
wrote:



To make sure I understand, if I wanted to use a masked column for a 
conditional update, you're saying we would need SELECT_MASKED to use 
it in the IF clause? I worry that this proposal is increasing in 
complexity; I would actually be OK starting with something smaller in 
scope. Perhaps just providing the masking functions and not tying 
masking to schema would be sufficient for an initial goal? That 
wouldn't preclude additional permissions, schema integration, or 
perhaps just plain Views in the future.


Cheers,

Derek

On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
 wrote:


I have modified the proposal adding a new SELECT_MASKED
permission. Using masked columns on WHERE/IF clauses would
require having SELECT and either UNMASK or SELECT_MASKED
permissions. Seeing the unmasked values in the query results
would always require both SELECT and UNMASK.

This way we can have the best of both worlds, allowing admins to
decide whether they trust their immediate users or not. wdyt?
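The proposed rules can be written down as a small predicate (permission names from the proposal; the check logic is my paraphrase):

```python
def can_filter_on_masked(perms):
    # WHERE/IF on a masked column: SELECT plus either UNMASK or SELECT_MASKED.
    return "SELECT" in perms and ("UNMASK" in perms or "SELECT_MASKED" in perms)

def can_see_unmasked(perms):
    # Unmasked values in query results: always both SELECT and UNMASK.
    return "SELECT" in perms and "UNMASK" in perms

print(can_filter_on_masked({"SELECT", "SELECT_MASKED"}))  # True
print(can_see_unmasked({"SELECT", "SELECT_MASKED"}))      # False
```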

On Wed, 24 Aug 2022 at 16:06, Henrik Ingo
 wrote:

This is the difference between security and compliance I
guess :-D

The way I see this, the attacker or threat in this concept is
not the developer with access to the database. Rather a
feature like this is just a convenient way to apply some
masking rule in a centralized way. The protection is against
an end user of the application, who should not be able to see
the personal data of someone else. Or themselves, even. As
long as the application end user doesn't have access to run
arbitrary CQL, then these forms of masking prevent
accidental unauthorized use/leaking of personal data.

henrik



On Wed, Aug 24, 2022 at 10:40 AM Benedict
 wrote:

Is it typical for a masking feature to make no effort to
prevent unmasking? I’m just struggling to see the value
of this without such mechanisms. Otherwise it’s just a
default formatter, and we should consider renaming the
feature IMO


On 23 Aug 2022, at 21:27, Andrés de la Peña
 wrote:


As mentioned in the CEP document, dynamic data masking
doesn't try to prevent malicious users with SELECT
permissions from indirectly guessing the real value of a
masked column. This can easily be done by just trying
values on the WHERE clause of SELECT queries. DDM would
not be a replacement for proper column-level permissions.

The data served by the database is usually consumed by
applications that present this data to end users. These
end users are not necessarily the users directly
connecting to the database. With DDM, it would be easy
for applications to mask sensitive data that is going to
be consumed by the end users. However, the users
directly connecting to the database should be trusted,
provided that they have the right SELECT permissions.

In other words, DDM doesn't directly protect the data,
but it eases the production of protected data.

That said, we could later go one step further and add a
way to prevent untrusted users from inferring the masked
data. That could be done by adding a new permission,
different from the current SELECT permission, required
to use certain columns on WHERE clauses. That would
play especially well with column-level permissions,
which is something that we still have pending.

On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz
 wrote:

Applying this should prevent querying on a
field, else you could leak its contents, surely?



Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Avi Kivity via dev
I wasn't referring to specific syntax but to the concept. If a SQL 
dialect (or better, the standard) has a way to select data into a 
variable, let's adopt it.


If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) 
is my preference.


On 8/22/22 19:13, Patrick McFadin wrote:

The replies got trashed pretty badly in the responses.
When you say: "Agree it's better to reuse existing syntax than invent 
new syntax."


Which syntax are you referring to?

Patrick


On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev 
 wrote:


Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:
    > * Avi Kivity via dev  [22/08/14 15:59]:
>
> MySQL supports SELECT ... INTO ... FROM ... WHERE
> ...
>
> PostgreSQL supports pretty much the same syntax.
>
> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var
TYPE and
> MySQL/PostgreSQL SELECT ... INTO?
>
>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>> 
>>> I’ll do my best to express my thinking, as well as how I
would
>>> explain the feature to a user.
>>>
>>> My mental model for LET statements is that they are simply SELECT
>>> statements where the columns that are selected become variables
>>> accessible anywhere in the scope of the transaction. That is
to say, you
>>> should be able to run something like s/LET/SELECT and
>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET
statement
>>> and produce a valid SELECT statement, and vice versa. Both should
>>> perform identically.
>>>
>>> e.g.
>>> SELECT pk AS key, v AS value FROM table
>>>
>>> =>
>>> LET key = pk, value = v FROM table
>>
>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet,
but SQL
>> supports selecting comparisons:
>>
>>
>> $ psql
>> psql (14.3)
>> Type "help" for help.
>>
>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>  ?column? | ?column? | ?column?
>> ----------+----------+----------
>>  f        | t        |
>> (1 row)
>>
>>
>> Using "=" as a syntactic element in LET would make SELECT and LET
>> incompatible once comparisons become valid selectors. Unless
they become
>> mandatory (and then you'd write "LET q = a = b" if you wanted
to select a
>> comparison).
>>
>>
>> I personally prefer the nested query syntax:
>>
>>
>>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>
>>
>> So there aren't two similar-but-not-quite-the-same syntaxes.
SELECT is
>> immediately recognizable by everyone as a query, LET is not.
>>
>>
>>> Identical form, identical behaviour. Every statement should be
directly
>>> translatable with some simple text manipulation.
>>>
>>> We can then make this more powerful for users by simply
expanding SELECT
>>> statements, e.g. by permitting them to declare constants and
tuples in
>>> the column results. In this scheme LET x = * is simply
syntactic sugar
>>> for LET x = (pk, ck, field1, …) This scheme then supports
options 2, 4
>>> and 5 all at once, consistently alongside each other.
>>>
>>> Option 6 is in fact very similar, but is strictly less
flexible for the
>>> user as they have no way to declare multiple scalar variables
without
>>> scoping them inside a tuple.
>>>
>>> e.g.
>>> LET key = pk, value = v FROM table
>>> IF key > 1 AND value > 1 THEN...
>>>
>>> =>
>>> LET row = SELECT pk AS key, v AS value FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> However, both are expressible in the existing proposal, as if
you prefer
>>> this naming scheme you can simply write
>>>
>>> LET row = (pk AS key, v AS value) FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> With respect to auto converting single column results to a
scalar, we do
>>> need a way for the user to say they care whether the row was
null or the
>>> column. I think an implicit conversion here could be
surprising. However
>>> we 

Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Avi Kivity via dev

Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:

* Avi Kivity via dev  [22/08/14 15:59]:

MySQL supports SELECT ... INTO ... FROM ... WHERE
...

PostgreSQL supports pretty much the same syntax.

Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
MySQL/PostgreSQL SELECT ... INTO?


On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would
explain the feature to a user.

My mental model for LET statements is that they are simply SELECT
statements where the columns that are selected become variables
accessible anywhere in the scope of the transaction. That is to say, you
should be able to run something like s/LET/SELECT and
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
and produce a valid SELECT statement, and vice versa. Both should
perform identically.

e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table
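The s/LET/SELECT rewrite described above can be exercised mechanically. A small Python sketch applying the message's column-list regex (with whitespace normalization added, since the bare regex leaves stray spaces):

```python
import re

let_columns = "key = pk, value = v"
# The transformation from the message: s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g
select_columns = re.sub(r'([^=]+)=([^,]+)(,|$)', r'\2 AS \1\3', let_columns)
# Normalize the whitespace the substitution leaves behind.
parts = [' '.join(p.split()) for p in select_columns.split(',') if p.strip()]

print(parts)  # ['pk AS key', 'v AS value']
```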


"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
supports selecting comparisons:


$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET
incompatible once comparisons become valid selectors. Unless they become
mandatory (and then you'd write "LET q = a = b" if you wanted to select a
comparison).


I personally prefer the nested query syntax:


     LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
immediately recognizable by everyone as a query, LET is not.



Identical form, identical behaviour. Every statement should be directly
translatable with some simple text manipulation.

We can then make this more powerful for users by simply expanding SELECT
statements, e.g. by permitting them to declare constants and tuples in
the column results. In this scheme LET x = * is simply syntactic sugar
for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
and 5 all at once, consistently alongside each other.

Option 6 is in fact very similar, but is strictly less flexible for the
user as they have no way to declare multiple scalar variables without
scoping them inside a tuple.

e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer
this naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we do
need a way for the user to say they care whether the row was null or the
column. I think an implicit conversion here could be surprising. However
we could implement tuple expressions anyway and let the user explicitly
declare v as a tuple as Caleb has suggested for the existing proposal as
well.

Assigning constants or other values not selected from a table would also
be a little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is just slightly
more verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. It might
be that Option 6 would be slightly easier, but I’m unsure if there would
be a huge difference.



On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:

I'm really happy to see CEP-15 getting closer to a final
implementation. I'm going to walk through my reasoning for your
proposals wrt trying to explain this to somebody new.

Looking at all the options, the first thing that comes up for me is
the Cassandra project's complicated relationship with NULL.  We have
prior art with EXISTS/NOT EXISTS when creating new tables. IS
NULL/IS NOT NULL is used in materialized views similarly to
proposals 2,4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
   AS SELECT [ (column_list) ]
   FROM [keyspace_name.]table_name
   [ WHERE column_name IS NOT NULL
   [ AND column_name IS NOT NULL ... ] ]
   [ AND relation [ AND ... ] ]
   PRIMARY KEY ( column_list )
   [ WITH [ table_properties ]
   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

  Based on that, I believe 1 and 3 would just confuse users, so -1 on
those.

Trying to explain the difference between row and column operations
with LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds more
syntax.

#5 is verbose and, in my opinion, easier to rea

Re: [Proposal] add pull request template

2022-08-18 Thread Avi Kivity via dev


On 18/08/2022 18.46, Mick Semb Wever wrote:




Until IDEs auto cross-reference JIRA,

I'm going to lightly touch the lid of Pandora's Box here and walk
away slowly. It drives me *nuts* when I'm git blaming a file to
understand the context of why a change was made (to make sure I
continue to respect it!) and I see "merge 3.11 into trunk" or some
other such useless commit message, then have to dig into the git
integration and history, then figure out which merge commits were
real and which were -s ours and silently changed, etc.

So about those merge commits... ;)



The beef I have with this is that it's just not that difficult: just 
look at the second parent of the merge commit.


```
git log -n1 <merge-commit>^2
```
(you can also use `git log --follow .` if you like history without 
merge commits)




There's `git merge --log` which provides a short-form log in the merge 
commit.
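A throwaway-repo demonstration of reading a merge's second parent (assumes a reasonably recent git; repo layout and branch names are invented for the example):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "initial"
git checkout -q -b feature
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "real change"
git checkout -q main
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "trunk work"
git -c user.email=dev@example.com -c user.name=dev merge -q --no-ff -m "merge feature into main" feature
# The merge's second parent is the tip of the branch that was merged in:
git log -n1 --format=%s HEAD^2
```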





Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Avi Kivity via dev


On 14/08/2022 17.50, Benedict Elliott Smith wrote:


> SELECT and LET incompatible once comparisons become valid selectors

I don’t think this would be ambiguous, as = is required in the LET 
syntax as we have to bind the result to a variable name.


But, I like the deconstructed tuple syntax improvement over “Option 
6”. This would also seem to easily support assigning from non-query 
statements, such as LET (a, b) = (someFunc(), someOtherFunc(?))


I don’t think it is ideal to depend on relative position in the tuple 
for assigning results to a variable name, as it leaves more scope for 
errors. It would be nice to have a simple way to deconstruct safely. 
But, I think this proposal is good, and I’d be fine with it as an 
alternative if others concur. I agree that seeing the SELECT 
independently may be more easily recognisable to users.


With this approach there remains the question of how we handle single 
column results. I’d be inclined to treat in the following way:


LET (a) = SELECT val FROM table
IF a > 1 THEN...

LET a = SELECT val FROM table
IF a.val > 1 THEN...



I think SQL dialects require subqueries to be parenthesized (not sure). 
If that's the case I think we should keep the tradition.





There is also the question of whether we support SELECT without a FROM 
clause, e.g.

LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2

Or just LET (since they are no longer equivalent)
e.g.
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())



I see no harm in making FROM optional, as it's recognized by other SQL 
dialects.





Also since LET is only binding variables, is there any reason we 
shouldn’t support multiple SELECT assignments in a single LET?, e.g.

LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))



What if an inner select returns a tuple? Would y be a tuple?


I think this is redundant and atypical enough to not be worth 
supporting. Most people would use separate LETs.





Also whether we support tuples in SELECT statements anyway, e.g.
LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
IF tuple1.a > 1 AND tuple2.d > 1…



Absolutely, this just flows naturally from having tuples. There's no 
difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".






and whether we support nested deconstruction, e.g.
LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…



I think this can be safely deferred. Most people would again separate it 
into separate LETs.



I'd add (to the specification) that LETs cannot override a previously 
defined variable, just to reduce ambiguity.
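A sketch of that no-rebinding rule (illustrative only, not the CEP's specification):

```python
class LetScope:
    """Transaction-scoped variable bindings; rebinding a name is rejected."""

    def __init__(self):
        self._vars = {}

    def let(self, name, value):
        if name in self._vars:
            raise ValueError(f"LET cannot override existing variable {name!r}")
        self._vars[name] = value

    def get(self, name):
        return self._vars[name]
```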










On 14 Aug 2022, at 13:55, Avi Kivity via dev 
 wrote:



On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would 
explain the feature to a user.


My mental model for LET statements is that they are simply SELECT 
statements where the columns that are selected become variables 
accessible anywhere in the scope of the transaction. That is to say, 
you should be able to run something like s/LET/SELECT and 
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET 
statement and produce a valid SELECT statement, and vice versa. Both 
should perform identically.


e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table



"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL 
supports selecting comparisons:



$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET 
incompatible once comparisons become valid selectors. Unless they 
become mandatory (and then you'd write "LET q = a = b" if you wanted 
to select a comparison).



I personally prefer the nested query syntax:


    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT 
is immediately recognizable by everyone as a query, LET is not.





Identical form, identical behaviour. Every statement should be 
directly translatable with some simple text manipulation.


We can then make this more powerful for users by simply expanding 
SELECT statements, e.g. by permitting them to declare constants and 
tuples in the column results. In this scheme LET x = * is simply 
syntactic sugar for LET x = (pk, ck, field1, …) This scheme then 
supports options 2, 4 and 5 all at once, consistently alongside each 
other.


Option 6 is in fact very similar, but is strictly less flexible for 
the user as they have no way to declare multiple scalar variables 
without scoping them inside a tuple.


e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
I

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Avi Kivity via dev


On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would 
explain the feature to a user.


My mental model for LET statements is that they are simply SELECT 
statements where the columns that are selected become variables 
accessible anywhere in the scope of the transaction. That is to say, 
you should be able to run something like s/LET/SELECT and 
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement 
and produce a valid SELECT statement, and vice versa. Both should 
perform identically.


e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table



"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL 
supports selecting comparisons:



$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET 
incompatible once comparisons become valid selectors. Unless they become 
mandatory (and then you'd write "LET q = a = b" if you wanted to select 
a comparison).



I personally prefer the nested query syntax:


    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is 
immediately recognizable by everyone as a query, LET is not.





Identical form, identical behaviour. Every statement should be 
directly translatable with some simple text manipulation.


We can then make this more powerful for users by simply expanding 
SELECT statements, e.g. by permitting them to declare constants and 
tuples in the column results. In this scheme LET x = * is simply 
syntactic sugar for LET x = (pk, ck, field1, …) This scheme then 
supports options 2, 4 and 5 all at once, consistently alongside each 
other.


Option 6 is in fact very similar, but is strictly less flexible for 
the user as they have no way to declare multiple scalar variables 
without scoping them inside a tuple.


e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you 
prefer this naming scheme you can simply write


LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we 
do need a way for the user to say they care whether the row was null 
or the column. I think an implicit conversion here could be 
surprising. However we could implement tuple expressions anyway and 
let the user explicitly declare v as a tuple as Caleb has suggested 
for the existing proposal as well.


Assigning constants or other values not selected from a table would 
also be a little clunky:


LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is just slightly 
more verbose and slightly less flexible.


Which one would be most intuitive to users is hard to predict. It 
might be that Option 6 would be slightly easier, but I’m unsure if 
there would be a huge difference.




On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:

I'm really happy to see CEP-15 getting closer to a final 
implementation. I'm going to walk through my reasoning for your 
proposals wrt trying to explain this to somebody new.


Looking at all the options, the first thing that comes up for me is 
the Cassandra project's complicated relationship with NULL.  We have 
prior art with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS 
NOT NULL is used in materialized views similarly to proposals 2,4 and 5.


CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
  AS SELECT [ (column_list) ]
  FROM [keyspace_name.]table_name
  [ WHERE column_name IS NOT NULL
  [ AND column_name IS NOT NULL ... ] ]
  [ AND relation [ AND ... ] ]
  PRIMARY KEY ( column_list )
  [ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

 Based on that, I believe 1 and 3 would just confuse users, so -1 on 
those.


Trying to explain the difference between row and column operations 
with LET, I can't see the difference between a row and column in #2.


#4 introduces a boolean instead of column names and just adds more 
syntax.


#5 is verbose and, in my opinion, easier to reason when writing a 
query. Thinking top down, I need to know if these exact rows and/or 
column values exist before changing them, so I'll define them first. 
Then I'll iterate over the state I created in my actual changes so I 
know I'm changing precisely what I want.


#5 could use a bit more to be clearer to somebody who doesn't write 
CQL queries daily and wouldn't require memorizing subtle