Re: Aggregate functions on collections, collection functions and MAXWRITETIME
IMO it's wrong to change an aggregate's meaning from "aggregate across GROUPs or entire SELECT" to "aggregate within column". Aggregation is long established in SQL, and changing it will just confuse experienced database users. PostgreSQL maintains the meaning of max:

    CREATE TABLE tab ( x int[] );
    INSERT INTO tab(x) VALUES ( '{1, 2}' );
    INSERT INTO tab(x) VALUES ( '{3, 4}' );
    SELECT max(x) FROM tab;

      max
    -------
     {3,4}

One option is to treat the collection as a tiny table:

    SELECT (SELECT max(key) FROM a_set_column) AS m1,
           (SELECT max(value) FROM a_map_column)
    FROM tab;

Though it's better to look for existing practice and emulate it than to exercise creativity here, IMO.

On Tue, 2022-12-06 at 13:30 +, Benedict wrote:
> Thanks Andres, I think community input on direction here will be
> invaluable. There's a bunch of interrelated tickets, and my opinions
> are as follows:
>
> 1. I think it is a mistake to offer a function MAX that operates over
> rows containing collections, returning the collection with the most
> elements. This is just a nonsensical operation to support IMO. We
> should decide as a community whether we "fix" this aggregation, or
> remove it.
> 2. I think "collection_"-prefixed methods are non-intuitive for
> discovery, and all else equal it would be better to use MAX, MIN, etc.,
> the same as for aggregations.
> 3. I think it is peculiar to permit methods named collection_ to
> operate over non-collection types when they are explicitly collection
> variants.
>
> Given (1), (2) becomes simple except for COUNT, which remains
> ambiguous, but this could be solved by either providing a separate
> method for collections (e.g. SIZE), which seems fine to me, or by
> offering a precedence order for matching and a keyword for overriding
> the precedence order (e.g. COUNT(collection AS COLLECTION)).
>
> Given (2), (3) is a little more difficult. However, I think this can
> be solved several ways.
> - We could permit explicit casts to collection types, that for a
>   collection type would be a no-op, and for a single value would create
>   a collection
> - With precedence orders, by always selecting the scalar function last
> - By permitting WRITETIME to accept a binary operator reduce function
>   to resolve multiple values
>
> These decisions all imply trade-offs on each other, and affect the
> evolution of CQL, so I think community input would be helpful.
>
> > On 6 Dec 2022, at 12:44, Andrés de la Peña wrote:
> >
> > This will require some long introduction for context:
> >
> > The MAX/MIN functions aggregate rows to get the row with the min/max
> > column value according to their comparator. For collections, the
> > comparison is on the lexicographical order of the collection
> > elements. That's the very same comparator that is used when
> > collections are used as clustering keys and for ORDER BY.
> >
> > However, a bug in the MIN/MAX aggregate functions used to mean that
> > the results were presented in their unserialized form, although the
> > row selection was correct. That bug was recently solved by
> > CASSANDRA-17811. During that ticket the option of simply disabling
> > MIN/MAX on collections was also considered, since applying those
> > functions to collections doesn't seem super useful. However, that
> > option was quickly discarded and the operation was fixed so the
> > MIN/MAX functions correctly work for every data type.
> >
> > As a byproduct of the internal improvements of that fix,
> > CASSANDRA-8877 introduced a new set of functions that can perform
> > aggregations of the elements of a collection. Those were named
> > "map_keys", "map_values", "collection_min", "collection_max",
> > "collection_sum", and "collection_count". Those are the names
> > mentioned on the mailing list thread about function naming
> > conventions.
> > Despite doing a kind of within-collection aggregation,
> > these functions are not what we usually call aggregate functions,
> > since they don't aggregate multiple rows together.
> >
> > On a different line of work, CASSANDRA-17425 added to trunk a
> > MAXWRITETIME function to get the max timestamp of a multi-cell
> > column. However, the new collection functions can be used in
> > combination with the WRITETIME and TTL functions to retrieve the
> > min/max/sum/avg timestamp or TTL of a multi-cell column. Since the
> > new functions give a generic way of aggregating timestamps and TTLs
> > of multi-cell columns, CASSANDRA-18078 proposed to remove that
> > MAXWRITETIME function.
> >
> > Yifan Cai, author of the MAXWRITETIME function, agreed to remove
> > that function in favour of the new generic collection functions.
> > However, the MAXWRITETIME function can work on both single-cell and
> > multi-cell columns, whereas "COLLECTION_MAX(WRITETIME(column))"
> > would only work on multi-cell columns. That's because MAXWRITETIME
> > of a non-multi-cell column doesn't return a collection, and one
> > should simply use "WRITETIME(column)".
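To make the distinction in this thread concrete: the disagreement is between aggregating across rows (SQL-style MAX, comparing whole collections lexicographically) and aggregating within one row's collection (the proposed collection_max). A Python sketch, with plain lists standing in for CQL collections (an illustration of the semantics being discussed, not Cassandra's implementation):

```python
# Two rows whose column x holds a collection, mirroring the
# PostgreSQL example earlier in the thread.
rows = [[1, 2], [3, 4]]

# Row aggregation, as SQL max(x): one result per query, collections
# compared lexicographically.
print(max(rows))             # [3, 4]

# Within-collection aggregation, as collection_max(x): one result
# per row, aggregating that row's elements.
print([max(x) for x in rows])  # [2, 4]
```

The thread's point in miniature: both operations are called "max", but the first reduces many rows to one value while the second maps each row to a new value.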
Re: [DISCUSS] CEP-20: Dynamic Data Masking
Agree with views, or alternatively, column permissions together with computed columns:

    CREATE TABLE foo (
        id int PRIMARY KEY,
        unmasked_name text,
        name text GENERATED ALWAYS AS some_mask_function(text, 'xxx', 7)
    );

(syntax from PostgreSQL)

    GRANT SELECT ON foo.name TO general_use;
    GRANT SELECT ON foo.unmasked_name TO top_secret;

On 26/08/2022 00.10, Benedict wrote:
I'm inclined to agree that this seems a more straightforward approach that makes fewer implied promises. Perhaps we could deliver simple views backed by virtual tables, and model our approach on that of Postgres, MySQL et al? Views in C* would be very simple, just offering a subset of fields with some UDFs applied. It would allow users to define roles with access only to the views, or for applications to use the views for presentation purposes. It feels like a cleaner approach to me, and we'd get two features for the price of one. BUT I don't feel super strongly about this.

On 25 Aug 2022, at 20:16, Derek Chen-Becker wrote:
To make sure I understand, if I wanted to use a masked column for a conditional update, you're saying we would need SELECT_MASKED to use it in the IF clause? I worry that this proposal is increasing in complexity; I would actually be OK starting with something smaller in scope. Perhaps just providing the masking functions and not tying masking to schema would be sufficient for an initial goal? That wouldn't preclude additional permissions, schema integration, or perhaps just plain views in the future. Cheers, Derek

On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña wrote:
I have modified the proposal adding a new SELECT_MASKED permission. Using masked columns on WHERE/IF clauses would require having SELECT and either UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the query results would always require both SELECT and UNMASK. This way we can have the best of both worlds, allowing admins to decide whether they trust their immediate users or not. wdyt?
On Wed, 24 Aug 2022 at 16:06, Henrik Ingo wrote:
This is the difference between security and compliance, I guess :-D The way I see this, the attacker or threat in this concept is not the developer with access to the database. Rather, a feature like this is just a convenient way to apply some masking rule in a centralized way. The protection is against an end user of the application, who should not be able to see the personal data of someone else. Or themselves, even. As long as the application end user doesn't have access to run arbitrary CQL, then these forms of masking prevent accidental unauthorized use/leaking of personal data. henrik

On Wed, Aug 24, 2022 at 10:40 AM Benedict wrote:
Is it typical for a masking feature to make no effort to prevent unmasking? I'm just struggling to see the value of this without such mechanisms. Otherwise it's just a default formatter, and we should consider renaming the feature IMO

On 23 Aug 2022, at 21:27, Andrés de la Peña wrote:
As mentioned in the CEP document, dynamic data masking doesn't try to prevent malicious users with SELECT permissions from indirectly guessing the real value of the masked column. This can easily be done by just trying values on the WHERE clause of SELECT queries. DDM would not be a replacement for proper column-level permissions. The data served by the database is usually consumed by applications that present this data to end users. These end users are not necessarily the users directly connecting to the database. With DDM, it would be easy for applications to mask sensitive data that is going to be consumed by the end users. However, the users directly connecting to the database should be trusted, provided that they have the right SELECT permissions. In other words, DDM doesn't directly protect the data, but it eases the production of protected data. That said, we could later go one step further and add a way to prevent untrusted users from inferring the masked data.
That could be done by adding a new permission required to use certain columns on WHERE clauses, different from the current SELECT permission. That would play especially well with column-level permissions, which is something that we still have pending.

On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz wrote:
Applying this should prevent querying on a field, else you could leak its contents, surely?
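For illustration, a masking function of the shape sketched in the computed-columns example earlier in the thread might look like the following. This is a Python sketch under stated assumptions: mask_text, its (value, padding, keep) parameter shape, and its keep-the-tail behaviour are all hypothetical, not part of the CEP or of the quoted some_mask_function:

```python
def mask_text(value, padding="xxx", keep=2):
    """Mask a string for presentation: replace everything except the
    last `keep` characters with `padding`. Returns short values (and
    None) unchanged, since there is nothing meaningful to hide."""
    if value is None or len(value) <= keep:
        return value
    return padding + value[-keep:]

print(mask_text("alice@example.com", "xxx", 4))  # xxx.com
```

This matches the thread's framing of DDM as a presentation convenience: the masked form is what applications hand to end users, while the real value stays in the database behind SELECT/UNMASK permissions.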
Re: CEP-15 multi key transaction syntax
I wasn't referring to specific syntax but to the concept. If a SQL dialect (or better, the standard) has a way to select data into a variable, let's adopt it. If such syntax doesn't exist,

    LET (a, b, c) = (SELECT x, y, z FROM tab)

is my preference.

On 8/22/22 19:13, Patrick McFadin wrote:
The replies got trashed pretty badly in the responses. When you say: "Agree it's better to reuse existing syntax than invent new syntax." Which syntax are you referring to? Patrick

On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev wrote:
Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:
> * Avi Kivity via dev [22/08/14 15:59]:
> > MySQL supports SELECT INTO FROM ... WHERE ...
> > PostgreSQL supports pretty much the same syntax.
>
> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> MySQL/PostgreSQL SELECT ... INTO?
>
>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>>
>>> I'll do my best to express my thinking, as well as how I would
>>> explain the feature to a user.
>>>
>>> My mental model for LET statements is that they are simply SELECT
>>> statements where the columns that are selected become variables
>>> accessible anywhere in the scope of the transaction. That is to say, you
>>> should be able to run something like s/LET/SELECT and
>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>>> and produce a valid SELECT statement, and vice versa. Both should
>>> perform identically.
>>>
>>> e.g.
>>> SELECT pk AS key, v AS value FROM table
>>> =>
>>> LET key = pk, value = v FROM table
>>
>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> supports selecting comparisons:
>>
>> $ psql
>> psql (14.3)
>> Type "help" for help.
>>
>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>  ?column? | ?column? | ?column?
>> ----------+----------+----------
>>  f        | t        |
>> (1 row)
>>
>> Using "=" as a syntactic element in LET would make SELECT and LET
>> incompatible once comparisons become valid selectors. Unless they become
>> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> comparison).
>>
>> I personally prefer the nested query syntax:
>>
>> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>
>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> immediately recognizable by everyone as a query, LET is not.
>>
>>> Identical form, identical behaviour. Every statement should be directly
>>> translatable with some simple text manipulation.
>>>
>>> We can then make this more powerful for users by simply expanding SELECT
>>> statements, e.g. by permitting them to declare constants and tuples in
>>> the column results. In this scheme LET x = * is simply syntactic sugar
>>> for LET x = (pk, ck, field1, …). This scheme then supports options 2, 4
>>> and 5 all at once, consistently alongside each other.
>>>
>>> Option 6 is in fact very similar, but is strictly less flexible for the
>>> user as they have no way to declare multiple scalar variables without
>>> scoping them inside a tuple.
>>>
>>> e.g.
>>> LET key = pk, value = v FROM table
>>> IF key > 1 AND value > 1 THEN...
>>> =>
>>> LET row = SELECT pk AS key, v AS value FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> However, both are expressible in the existing proposal, as if you prefer
>>> this naming scheme you can simply write
>>>
>>> LET row = (pk AS key, v AS value) FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> With respect to auto-converting single-column results to a scalar, we do
>>> need a way for the user to say whether it was the row or the column that
>>> was null. I think an implicit conversion here could be surprising. However we
>>> could implement tuple expressions anyway and let the user explicitly
>>> declare v as a tuple, as Caleb has suggested for the existing proposal as well.
Re: CEP-15 multi key transaction syntax
Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:
* Avi Kivity via dev [22/08/14 15:59]:
MySQL supports SELECT INTO FROM ... WHERE ... PostgreSQL supports pretty much the same syntax. Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and MySQL/PostgreSQL SELECT ... INTO?

On 14/08/2022 01.29, Benedict Elliott Smith wrote:
I'll do my best to express my thinking, as well as how I would explain the feature to a user. My mental model for LET statements is that they are simply SELECT statements where the columns that are selected become variables accessible anywhere in the scope of the transaction. That is to say, you should be able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement and produce a valid SELECT statement, and vice versa. Both should perform identically. e.g.

SELECT pk AS key, v AS value FROM table
=>
LET key = pk, value = v FROM table

"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL supports selecting comparisons:

$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)

Using "=" as a syntactic element in LET would make SELECT and LET incompatible once comparisons become valid selectors. Unless they become mandatory (and then you'd write "LET q = a = b" if you wanted to select a comparison). I personally prefer the nested query syntax:

LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);

So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is immediately recognizable by everyone as a query, LET is not. Identical form, identical behaviour. Every statement should be directly translatable with some simple text manipulation. We can then make this more powerful for users by simply expanding SELECT statements, e.g. by permitting them to declare constants and tuples in the column results.
In this scheme LET x = * is simply syntactic sugar for LET x = (pk, ck, field1, …). This scheme then supports options 2, 4 and 5 all at once, consistently alongside each other. Option 6 is in fact very similar, but is strictly less flexible for the user as they have no way to declare multiple scalar variables without scoping them inside a tuple. e.g.

LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...
=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer this naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto-converting single-column results to a scalar, we do need a way for the user to say whether it was the row or the column that was null. I think an implicit conversion here could be surprising. However, we could implement tuple expressions anyway and let the user explicitly declare v as a tuple, as Caleb has suggested for the existing proposal as well. Assigning constants or other values not selected from a table would also be a little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…
=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical; option 6 is just slightly more verbose and slightly less flexible. Which one would be most intuitive to users is hard to predict. It might be that Option 6 would be slightly easier, but I'm unsure if there would be a huge difference.

On 13 Aug 2022, at 16:59, Patrick McFadin wrote:
I'm really happy to see CEP-15 getting closer to a final implementation. I'm going to walk through my reasoning for your proposals wrt trying to explain this to somebody new. Looking at all the options, the first thing that comes up for me is the Cassandra project's complicated relationship with NULL.
We have prior art with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is used in materialized views similarly to proposals 2, 4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
AS SELECT [ (column_list) ]
FROM [keyspace_name.]table_name
[ WHERE column_name IS NOT NULL [ AND column_name IS NOT NULL ... ] ]
[ AND relation [ AND ... ] ]
PRIMARY KEY ( column_list )
[ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

Based on that, I believe 1 and 3 would just confuse users, so -1 on those. Trying to explain the difference between row and column operations with LET, I can't see the difference between a row and column in #2. #4 introduces a boolean instead of column names and just adds more syntax. #5 is verbose and, in my opinion, easier to reason about when writing a query.
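Benedict's LET-to-SELECT text-manipulation equivalence quoted in this thread can be sketched as a tiny rewriter. This is a Python sketch of the mental model only (let_to_select is a hypothetical helper; the actual proposal is a CQL grammar, not string rewriting):

```python
# Each "name = expr" binding in a LET becomes "expr AS name" in the
# equivalent SELECT, per the s/LET/SELECT mental model in the thread.
def let_to_select(stmt: str) -> str:
    body, _, source = stmt.partition(" FROM ")
    body = body.removeprefix("LET ")
    cols = []
    for binding in body.split(","):
        name, _, expr = binding.partition("=")
        cols.append(f"{expr.strip()} AS {name.strip()}")
    return f"SELECT {', '.join(cols)} FROM {source}"

print(let_to_select("LET key = pk, value = v FROM table"))
# SELECT pk AS key, v AS value FROM table
```

The inverse rewrite (SELECT back to LET) follows the same shape, which is the point of the "identical form, identical behaviour" argument.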
Re: [Proposal] add pull request template
On 18/08/2022 18.46, Mick Semb Wever wrote:
Until IDEs auto cross-reference JIRA, I'm going to lightly touch the lid of Pandora's Box here and walk away slowly. It drives me *nuts* when I'm git blaming a file to understand the context of why a change was made (to make sure I continue to respect it!) and I see "merge 3.11 into trunk" or some other such useless commit message, then have to dig into the git integration and history, then figure out which merge commits were real and which were -s ours and silently changed, etc. So about those merge commits... ;)

The beef I have with this is that it's just not that difficult: just look at the parent-2 commit of the merge.

```
git log -n1 <merge-commit>^2
```

(you can also use `git log --follow .` if you like history without merge commits)

There's `git merge --log`, which provides a short-form log in the merge commit.
Re: CEP-15 multi key transaction syntax
On 14/08/2022 17.50, Benedict Elliott Smith wrote:
> SELECT and LET incompatible once comparisons become valid selectors

I don't think this would be ambiguous, as = is required in the LET syntax as we have to bind the result to a variable name. But I like the deconstructed tuple syntax improvement over "Option 6". This would also seem to easily support assigning from non-query statements, such as

LET (a, b) = (someFunc(), someOtherFunc(?))

I don't think it is ideal to depend on relative position in the tuple for assigning results to a variable name, as it leaves more scope for errors. It would be nice to have a simple way to deconstruct safely. But I think this proposal is good, and I'd be fine with it as an alternative if others concur. I agree that seeing the SELECT independently may be more easily recognisable to users. With this approach there remains the question of how we handle single-column results. I'd be inclined to treat them in the following way:

LET (a) = SELECT val FROM table
IF a > 1 THEN...

LET a = SELECT val FROM table
IF a.val > 1 THEN...

I think SQL dialects require subqueries to be parenthesized (not sure). If that's the case, I think we should keep the tradition.

There is also the question of whether we support SELECT without a FROM clause, e.g.

LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2

Or just LET (since they are no longer equivalent), e.g.

LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())

I see no harm in making FROM optional, as it's recognized by other SQL dialects.

Also, since LET is only binding variables, is there any reason we shouldn't support multiple SELECT assignments in a single LET?, e.g.

LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))

What if an inner select returns a tuple? Would y be a tuple? I think this is redundant and atypical enough to not be worth supporting. Most people would use separate LETs.

Also whether we support tuples in SELECT statements anyway, e.g.
LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
IF tuple1.a > 1 AND tuple2.d > 1…

Absolutely, this just flows naturally from having tuples. There's no difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".

and whether we support nested deconstruction, e.g.

LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…

I think this can be safely deferred. Most people would again separate it into separate LETs.

I'd add (to the specification) that LETs cannot override a previously defined variable, just to reduce ambiguity.

On 14 Aug 2022, at 13:55, Avi Kivity via dev wrote:
On 14/08/2022 01.29, Benedict Elliott Smith wrote:
I'll do my best to express my thinking, as well as how I would explain the feature to a user. My mental model for LET statements is that they are simply SELECT statements where the columns that are selected become variables accessible anywhere in the scope of the transaction. That is to say, you should be able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement and produce a valid SELECT statement, and vice versa. Both should perform identically. e.g.

SELECT pk AS key, v AS value FROM table
=>
LET key = pk, value = v FROM table

"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL supports selecting comparisons:

$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)

Using "=" as a syntactic element in LET would make SELECT and LET incompatible once comparisons become valid selectors. Unless they become mandatory (and then you'd write "LET q = a = b" if you wanted to select a comparison). I personally prefer the nested query syntax:

LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);

So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is immediately recognizable by everyone as a query, LET is not. Identical form, identical behaviour. Every statement should be directly translatable with some simple text manipulation. We can then make this more powerful for users by simply expanding SELECT statements, e.g. by permitting them to declare constants and tuples in the column results. In this scheme LET x = * is simply syntactic sugar for LET x = (pk, ck, field1, …). This scheme then supports options 2, 4 and 5 all at once, consistently alongside each other. Option 6 is in fact very similar, but is strictly less flexible for the user as they have no way to declare multiple scalar variables without scoping them inside a tuple. e.g.

LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...
=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…
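Benedict's worry earlier in this message, that binding "by relative position in the tuple ... leaves more scope for errors", is easy to demonstrate: the same silent mis-binding happens with Python tuple unpacking. The functions below are hypothetical stand-ins for a SELECT, used purely to illustrate the hazard:

```python
# Positional deconstruction binds by position, not by name, so nothing
# ties the variables to the selected columns.
def select_user():
    return (42, "alice")          # as if: SELECT id, name FROM users

user_id, name = select_user()     # user_id=42, name="alice" — fine

def select_user_reordered():
    return ("alice", 42)          # as if someone reordered: SELECT name, id

# Same deconstruction, no error raised, every binding silently wrong:
user_id, name = select_user_reordered()
print(user_id, name)              # alice 42
```

Name-based binding (LET key = pk, or LET row = ... with row.key) fails loudly or keeps working when columns are reordered, which is the safety argument being made.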
Re: CEP-15 multi key transaction syntax
On 14/08/2022 01.29, Benedict Elliott Smith wrote:
I'll do my best to express my thinking, as well as how I would explain the feature to a user. My mental model for LET statements is that they are simply SELECT statements where the columns that are selected become variables accessible anywhere in the scope of the transaction. That is to say, you should be able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement and produce a valid SELECT statement, and vice versa. Both should perform identically. e.g.

SELECT pk AS key, v AS value FROM table
=>
LET key = pk, value = v FROM table

"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL supports selecting comparisons:

$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)

Using "=" as a syntactic element in LET would make SELECT and LET incompatible once comparisons become valid selectors. Unless they become mandatory (and then you'd write "LET q = a = b" if you wanted to select a comparison). I personally prefer the nested query syntax:

LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);

So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is immediately recognizable by everyone as a query, LET is not. Identical form, identical behaviour. Every statement should be directly translatable with some simple text manipulation. We can then make this more powerful for users by simply expanding SELECT statements, e.g. by permitting them to declare constants and tuples in the column results. In this scheme LET x = * is simply syntactic sugar for LET x = (pk, ck, field1, …). This scheme then supports options 2, 4 and 5 all at once, consistently alongside each other. Option 6 is in fact very similar, but is strictly less flexible for the user as they have no way to declare multiple scalar variables without scoping them inside a tuple. e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...
=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer this naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto-converting single-column results to a scalar, we do need a way for the user to say whether it was the row or the column that was null. I think an implicit conversion here could be surprising. However, we could implement tuple expressions anyway and let the user explicitly declare v as a tuple, as Caleb has suggested for the existing proposal as well. Assigning constants or other values not selected from a table would also be a little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…
=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical; option 6 is just slightly more verbose and slightly less flexible. Which one would be most intuitive to users is hard to predict. It might be that Option 6 would be slightly easier, but I'm unsure if there would be a huge difference.

On 13 Aug 2022, at 16:59, Patrick McFadin wrote:
I'm really happy to see CEP-15 getting closer to a final implementation. I'm going to walk through my reasoning for your proposals wrt trying to explain this to somebody new. Looking at all the options, the first thing that comes up for me is the Cassandra project's complicated relationship with NULL. We have prior art with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is used in materialized views similarly to proposals 2, 4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
AS SELECT [ (column_list) ]
FROM [keyspace_name.]table_name
[ WHERE column_name IS NOT NULL [ AND column_name IS NOT NULL ... ] ]
[ AND relation [ AND ... ] ]
PRIMARY KEY ( column_list )
[ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

Based on that, I believe 1 and 3 would just confuse users, so -1 on those. Trying to explain the difference between row and column operations with LET, I can't see the difference between a row and column in #2. #4 introduces a boolean instead of column names and just adds more syntax. #5 is verbose and, in my opinion, easier to reason about when writing a query. Thinking top down, I need to know if these exact rows and/or column values exist before changing them, so I'll define them first. Then I'll iterate over the state I created in my actual changes so I know I'm changing precisely what I want. #5 could use a bit more to be clearer to somebody who doesn't write CQL queries daily and wouldn't require memorizing subtle