Re: CEP-15 multi key transaction syntax

2022-09-22 Thread Caleb Rackliffe

Yeah, if you remember Blake's original syntax proposal, the IF condition
was actually specified outside the body of the transaction following the
COMMIT TRANSACTION phrase. (i.e. COMMIT TRANSACTION IF blah, blah)

The current syntax only changed that superficially, but in doing so, it
left the door open structurally for an arbitrary number of conditional and
unconditional updates.

I've got a large TODO list already to get this ready to review ASAP, but
I'll noodle on it and we can make a final decision when that gets closer. I
also lean toward doing it when "we support arbitrary numbers of IF
statements".

On Wed, Sep 21, 2022 at 3:41 PM David Capwell  wrote:

> I expect that a lot of use cases will update M and insert into N tables
> based on one condition
>
>
> Jeff, the issue is a scope issue
>
> -- works fine today
> IF …
>   UPDATE ….;
>   INSERT …;
> END IF
>
> -- also works today just fine; no condition is used with the mutations
> UPDATE ….;
> INSERT …;
>
> -- does not work today
> IF …
>   UPDATE ….;
>   INSERT …;
> END IF
> -- this breaks the parser as it does not belong to the above IF block
> INSERT …;
>
> So it's not that updating multiple tables is a problem; it's just that
> today the mapping of mutations to conditions depends purely on whether a
> condition exists, and the parser assumes this as well. So all mutations are
> tied to the condition if one is present; otherwise no mutation has a
> condition. The parser helps enforce this by failing if you mix the two.
>
> My inclination is not to support this until we support arbitrary numbers
> of IF statements.
>
>
> That is my feeling as well.  I am cool with v1 having this limitation as
> it does NOT block future versions from enhancing the syntax, and when we
> can support multiple IF statements we will need to decouple this current
> implementation detail anyway, so it is easier to deal with then.
>
> On Sep 21, 2022, at 1:22 PM, Benedict  wrote:
>
> Not quite sure I follow, but the syntax we agreed permits you to update as
> many tables as you like with a single condition, or with no condition, but
> not to mix both conditional and unconditional updates in a single
> transaction.
>
> My preference is to keep this simple until we permit arbitrarily complex
> logic, ie sequences of (potentially nested) ifs and unconditional updates.
>
> On 21 Sep 2022, at 21:04, Jeff Jirsa  wrote:
>
> 
> I expect that a lot of use cases will update M and insert into N tables
> based on one condition, so if that's a problem with the grammar today, I
> think it'd probably be worth the time to sort that out?
>
>
>
> On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:
>
>> Caleb is making great progress on this, and I have been working on CQL
>> fuzz testing the new grammar to make sure we flesh out cases quickly. One
>> thing we hit was mixing conditional and non-conditional updates; I will
>> use an example to illustrate:
>>
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>>     UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>>
>> In this case we have one UPDATE tied to the IF condition, and one INSERT
>> that isn't. For v1, do we need/want to support this, or is it best for v1 to
>> be simple and have all updates tied to the condition when one is present?

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread David Capwell

> I expect that a lot of use cases will update M and insert into N tables based 
> on one condition


Jeff, the issue is a scope issue

-- works fine today
IF …
  UPDATE ….;
  INSERT …;
END IF

-- also works today just fine; no condition is used with the mutations
UPDATE ….;
INSERT …;

-- does not work today
IF …
  UPDATE ….;
  INSERT …;
END IF
-- this breaks the parser as it does not belong to the above IF block
INSERT …;

So it's not that updating multiple tables is a problem; it's just that today
the mapping of mutations to conditions depends purely on whether a condition
exists, and the parser assumes this as well. So all mutations are tied to the
condition if one is present; otherwise no mutation has a condition. The parser
helps enforce this by failing if you mix the two.
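
As a rough illustration only (this is not the actual parser code), the v1
rule could be modeled as below. The Mutation class and the check_v1_rule
function are hypothetical names invented for this sketch; the real grammar
enforces the restriction at parse time rather than on a list of objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mutation:
    """One UPDATE/INSERT in a transaction body (hypothetical model)."""
    text: str
    condition: Optional[str] = None  # the IF condition guarding it, if any


def check_v1_rule(mutations: List[Mutation]) -> str:
    """Classify a transaction body, rejecting shapes v1 does not accept.

    Returns "unconditional" when no mutation is guarded, "conditional" when
    every mutation is guarded by the same single condition, and raises
    ValueError when guarded and unguarded mutations are mixed.
    """
    if all(m.condition is None for m in mutations):
        return "unconditional"
    if any(m.condition is None for m in mutations):
        raise ValueError("cannot mix conditional and unconditional mutations")
    if len({m.condition for m in mutations}) > 1:
        raise ValueError("only a single IF condition is supported")
    return "conditional"
```

With this model, an UPDATE inside the IF block followed by an INSERT outside
it raises ValueError, which mirrors the "does not work today" case.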

> My inclination is not to support this until we support arbitrary numbers of 
> IF statements. 

That is my feeling as well.  I am cool with v1 having this limitation as it
does NOT block future versions from enhancing the syntax, and when we can
support multiple IF statements we will need to decouple this current
implementation detail anyway, so it is easier to deal with then.

> On Sep 21, 2022, at 1:22 PM, Benedict  wrote:
> 
> Not quite sure I follow, but the syntax we agreed permits you to update as 
> many tables as you like with a single condition, or with no condition, but 
> not to mix both conditional and unconditional updates in a single transaction.
> 
> My preference is to keep this simple until we permit arbitrarily complex 
> logic, ie sequences of (potentially nested) ifs and unconditional updates.
> 
>> On 21 Sep 2022, at 21:04, Jeff Jirsa  wrote:
>> 
>> 
>> I expect that a lot of use cases will update M and insert into N tables 
>> based on one condition, so if that's a problem with the grammar today, I 
>> think it'd probably be worth the time to sort that out? 
>> 
>> 
>> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Benedict

Not quite sure I follow, but the syntax we agreed on permits you to update as
many tables as you like with a single condition, or with no condition, but not
to mix both conditional and unconditional updates in a single transaction.

My preference is to keep this simple until we permit arbitrarily complex logic,
i.e. sequences of (potentially nested) IFs and unconditional updates.

> On 21 Sep 2022, at 21:04, Jeff Jirsa  wrote:
> 
> 
> I expect that a lot of use cases will update M and insert into N tables based 
> on one condition, so if that's a problem with the grammar today, I think it'd 
> probably be worth the time to sort that out? 
> 
> 
> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Jeff Jirsa

I expect that a lot of use cases will update M and insert into N tables
based on one condition, so if that's a problem with the grammar today, I
think it'd probably be worth the time to sort that out?



On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates; will
> use a example to better show
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
> UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?
Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Benedict

My inclination is not to support this until we support arbitrary numbers of IF 
statements. It’s one too many arbitrary restrictions and it potentially gets 
confusing.

But I don’t feel super strongly about it.

> On 21 Sep 2022, at 20:56, Patrick McFadin  wrote:
> 
> 
> I'm also working on different use cases and syntax for Accord :D
> 
> I'm +1 on this change and leaving the door open for maybe a few more as we 
> test this out. It needs to be functionally useful for developers in v1, and I 
> think it's worth the changes to get it right. 
> 
> One other thing Caleb and I have been discussing is how, when running a 
> transaction, the statement returns with no message. In CQLSH you have no idea 
> if anything happened unless you select from the tables and look for changes. 
> Even something like LWT adds with "applied=true|false" 
> 
> Patrick
> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Patrick McFadin

I'm also working on different use cases and syntax for Accord :D

I'm +1 on this change and leaving the door open for maybe a few more as we
test this out. It needs to be functionally useful for developers in v1, and
I think it's worth the changes to get it right.

One other thing Caleb and I have been discussing is that, when you run a
transaction, the statement returns with no message. In CQLSH you have no
idea if anything happened unless you select from the tables and look for
changes. Even LWT responds with something like "applied=true|false".

Patrick

On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates; will
> use a example to better show
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
> UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread David Capwell

Caleb is making great progress on this, and I have been working on CQL fuzz
testing the new grammar to make sure we flesh out cases quickly. One thing we
hit was mixing conditional and non-conditional updates; I will use an example
to illustrate:

BEGIN TRANSACTION
  LET a = (SELECT * FROM ….);
  IF a IS NOT NULL THEN
    UPDATE …;
  END IF
  INSERT INTO ...
COMMIT TRANSACTION

In this case we have one UPDATE tied to the IF condition, and one INSERT that
isn't. For v1, do we need/want to support this, or is it best for v1 to be
simple and have all updates tied to the condition when one is present?

> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev <dev@cassandra.apache.org> wrote:
> 
> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
> (or better, the standard) has a way to select data into a variable, let's 
> adopt it.
> 
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is my 
> preference.
> 
> On 8/22/22 19:13, Patrick McFadin wrote:
>> The replies got trashed pretty badly in the responses. 
>> When you say: "Agree it's better to reuse existing syntax than invent new 
>> syntax."
>> 
>> Which syntax are you referring to?
>> 
>> Patrick
>> 
>> 
>> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <dev@cassandra.apache.org> wrote:
>> Agree it's better to reuse existing syntax than invent new syntax.
>> 
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev <dev@cassandra.apache.org> [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT ... INTO ... FROM ... WHERE ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I'll do my best to express my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say, you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
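
A minimal sketch of the text substitution described above, written in
Python. The function name let_to_select is hypothetical, and the pattern is
a whitespace-trimming adaptation of the quoted s/([^=]+)=([^,]+)(,|$)/\2 AS
\1\3/g rather than the exact expression, so that the output has clean
spacing. It assumes a simple "LET <columns> FROM <source>" shape and does
not handle expressions that themselves contain "=" or ",".

```python
import re

# Whitespace-tolerant variant of s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g.
# Group 1 is the variable name, group 2 the selected column, group 3 the
# separator (a comma, or nothing at the end of the column list).
_ASSIGNMENT = re.compile(r"([^=,\s][^=,]*?)\s*=\s*([^,]+?)\s*(,|$)")

# Splits "LET <columns> FROM <source>" into its two parts (assumed shape).
_LET = re.compile(r"^LET\s+(.*?)\s+FROM\s+(.*)$")


def let_to_select(statement: str) -> str:
    """Rewrite 'LET name = col, ... FROM t' as 'SELECT col AS name, ... FROM t'."""
    match = _LET.match(statement.strip())
    if match is None:
        raise ValueError("expected a statement of the form LET ... FROM ...")
    columns, source = match.groups()
    # Swap each "name = column" pair into "column AS name".
    columns = _ASSIGNMENT.sub(r"\2 AS \1\3", columns)
    return f"SELECT {columns} FROM {source}"


print(let_to_select("LET key = pk, value = v FROM table"))
# prints: SELECT pk AS key, v AS value FROM table
```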
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>  ?column? | ?column? | ?column?
>> >> ----------+----------+----------
>> >>  f        | t        |
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we do
>> >>> need a way for the user to say they care whether the row was null or the
>> >>> column. I think an implicit conversion here could be surprising. However
>> >>> we could implement tuple expressions anyway and let the user explicitly
>> >>> declare v as a tuple as Caleb has suggested

Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Avi Kivity via dev

I wasn't referring to specific syntax but to the concept. If a SQL 
dialect (or better, the standard) has a way to select data into a 
variable, let's adopt it.


If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) 
is my preference.


On 8/22/22 19:13, Patrick McFadin wrote:

The replies got trashed pretty badly in the responses.
When you say: "Agree it's better to reuse existing syntax than invent 
new syntax."


Which syntax are you referring to?

Patrick


On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev wrote:


Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:
> * Avi Kivity via dev  [22/08/14 15:59]:
>
> MySQL supports SELECT  INTO  FROM ... WHERE
> ...
>
> PostgreSQL supports pretty much the same syntax.
>
> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> MySQL/PostgreSQL SELECT ... INTO?
>
>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>> 
>>> I’ll do my best to express my thinking, as well as how I would
>>> explain the feature to a user.
>>>
>>> My mental model for LET statements is that they are simply SELECT
>>> statements where the columns that are selected become variables
>>> accessible anywhere in the scope of the transaction. That is to say, you
>>> should be able to run something like s/LET/SELECT and
>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>>> and produce a valid SELECT statement, and vice versa. Both should
>>> perform identically.
>>>
>>> e.g.
>>> SELECT pk AS key, v AS value FROM table
>>>
>>> =>
>>> LET key = pk, value = v FROM table
>>
>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> supports selecting comparisons:
>>
>>
>> $ psql
>> psql (14.3)
>> Type "help" for help.
>>
>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>  ?column? | ?column? | ?column?
>> ----------+----------+----------
>>  f        | t        |
>> (1 row)
>>
>>
>> Using "=" as a syntactic element in LET would make SELECT and LET
>> incompatible once comparisons become valid selectors. Unless they become
>> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> comparison).
>>
>>
>> I personally prefer the nested query syntax:
>>
>>
>>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>
>>
>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> immediately recognizable by everyone as a query, LET is not.
>>
>>
>>> Identical form, identical behaviour. Every statement should be directly
>>> translatable with some simple text manipulation.
>>>
>>> We can then make this more powerful for users by simply expanding SELECT
>>> statements, e.g. by permitting them to declare constants and tuples in
>>> the column results. In this scheme LET x = * is simply syntactic sugar
>>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>>> and 5 all at once, consistently alongside each other.
>>>
>>> Option 6 is in fact very similar, but is strictly less flexible for the
>>> user as they have no way to declare multiple scalar variables without
>>> scoping them inside a tuple.
>>>
>>> e.g.
>>> LET key = pk, value = v FROM table
>>> IF key > 1 AND value > 1 THEN...
>>>
>>> =>
>>> LET row = SELECT pk AS key, v AS value FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> However, both are expressible in the existing proposal, as if you prefer
>>> this naming scheme you can simply write
>>>
>>> LET row = (pk AS key, v AS value) FROM table
>>> IF row.key > 1 AND row.value > 1 THEN…
>>>
>>> With respect to auto converting single column results to a scalar, we do
>>> need a way for the user to say they care whether the row was null or the
>>> column. I think an implicit conversion here could be surprising. However
>>> we could implement tuple expressions anyway and let the user explicitly
>>> declare v as a tuple as Caleb has suggested for the existing proposal as
>>> well.
>>>
>>> Assigning constants or other values not selected from a table would also
>>> be a little clunky:
>>>
>>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>>> IF v1 > 1 AND v2 > 1 THEN…
>>>
>>> =>
>>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>>>
>>> That said, the proposals are /close/ to identical, it is just slightly
>>> more verbose and slightly less flexible.
>>>
>>> Which one would be most intuitive to users is hard to predict.

Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Patrick McFadin

The replies got trashed pretty badly in the responses.
When you say: "Agree it's better to reuse existing syntax than invent new
syntax."

Which syntax are you referring to?

Patrick


On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev wrote:

> Agree it's better to reuse existing syntax than invent new syntax.
>
> On 8/21/22 16:52, Konstantin Osipov wrote:
> > * Avi Kivity via dev  [22/08/14 15:59]:
> >
> > MySQL supports SELECT  INTO  FROM ... WHERE
> > ...
> >
> > PostgreSQL supports pretty much the same syntax.
> >
> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> > MySQL/PostgreSQL SELECT ... INTO?
> >
> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> >>> 
> >>> I’ll do my best to express my thinking, as well as how I would
> >>> explain the feature to a user.
> >>>
> >>> My mental model for LET statements is that they are simply SELECT
> >>> statements where the columns that are selected become variables
> >>> accessible anywhere in the scope of the transaction. That is to say,
> >>> you should be able to run something like s/LET/SELECT and
> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> >>> and produce a valid SELECT statement, and vice versa. Both should
> >>> perform identically.
> >>>
> >>> e.g.
> >>> SELECT pk AS key, v AS value FROM table
> >>>
> >>> =>
> >>> LET key = pk, value = v FROM table
> >>
> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> >> supports selecting comparisons:
> >>
> >>
> >> $ psql
> >> psql (14.3)
> >> Type "help" for help.
> >>
> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
> >>  ?column? | ?column? | ?column?
> >> ----------+----------+----------
> >>  f        | t        |
> >> (1 row)
> >>
> >>
> >> Using "=" as a syntactic element in LET would make SELECT and LET
> >> incompatible once comparisons become valid selectors. Unless they become
> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select
> >> a comparison).
> >>
> >>
> >> I personally prefer the nested query syntax:
> >>
> >>
> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> >>
> >>
> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> >> immediately recognizable by everyone as a query, LET is not.
> >>
> >>
> >>> Identical form, identical behaviour. Every statement should be directly
> >>> translatable with some simple text manipulation.
> >>>
> >>> We can then make this more powerful for users by simply expanding
> >>> SELECT statements, e.g. by permitting them to declare constants and
> >>> tuples in
> >>> the column results. In this scheme LET x = * is simply syntactic sugar
> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
> >>> and 5 all at once, consistently alongside each other.
> >>>
> >>> Option 6 is in fact very similar, but is strictly less flexible for the
> >>> user as they have no way to declare multiple scalar variables without
> >>> scoping them inside a tuple.
> >>>
> >>> e.g.
> >>> LET key = pk, value = v FROM table
> >>> IF key > 1 AND value > 1 THEN...
> >>>
> >>> =>
> >>> LET row = SELECT pk AS key, v AS value FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> However, both are expressible in the existing proposal, as if you
> >>> prefer this naming scheme you can simply write
> >>>
> >>> LET row = (pk AS key, v AS value) FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> With respect to auto converting single column results to a scalar, we
> >>> do need a way for the user to say they care whether the row was null or
> >>> the column. I think an implicit conversion here could be surprising.
> >>> However we could implement tuple expressions anyway and let the user
> >>> explicitly declare v as a tuple as Caleb has suggested for the existing
> >>> proposal as well.
> >>>
> >>> Assigning constants or other values not selected from a table would
> >>> also be a little clunky:
> >>>
> >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
> >>> IF v1 > 1 AND v2 > 1 THEN…
> >>>
> >>> =>
> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
> >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
> >>>
> >>> That said, the proposals are /close/ to identical, it is just slightly
> >>> more verbose and slightly less flexible.
> >>>
> >>> Which one would be most intuitive to users is hard to predict. It might
> >>> be that Option 6 would be slightly easier, but I’m unsure if there
> >>> would be a huge difference.
> >>>
> >>>
>  On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:
> 
>  I'm really happy to see CEP-15 getting closer to a final
>  implementation. I'm going to walk through my reasoning for your
>  proposals wrt trying to explain this to somebody new.
> 
>  Looking at all the options, the first thing that comes up for me is
>  the Cassandra project's complicated relationship with NULL.  We have
>  prior art with EXISTS/NOT EXISTS when

Re: CEP-15 multi key transaction syntax

2022-08-22 Thread Avi Kivity via dev


Agree it's better to reuse existing syntax than invent new syntax.

On 8/21/22 16:52, Konstantin Osipov wrote:

* Avi Kivity via dev  [22/08/14 15:59]:

MySQL supports SELECT  INTO  FROM ... WHERE
...

PostgreSQL supports pretty much the same syntax.

Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
MySQL/PostgreSQL SELECT ... INTO?


On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would
explain the feature to a user.

My mental model for LET statements is that they are simply SELECT
statements where the columns that are selected become variables
accessible anywhere in the scope of the transaction. That is to say, you
should be able to run something like s/LET/SELECT and
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
and produce a valid SELECT statement, and vice versa. Both should
perform identically.

e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table
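
As a rough illustration of the text manipulation described above, the Python sketch below rewrites a LET statement into the equivalent SELECT. It is not part of the proposal: the function name let_to_select is hypothetical, the regular expression is a whitespace-tolerant variant of the one quoted, and it assumes a single FROM clause and no "=" or "," inside the selected expressions.

```python
import re

# Whitespace-tolerant variant of s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g.
# Assumption: no "=" or "," appears inside a selected expression.
ASSIGNMENT = re.compile(r"\s*([^=,]+?)\s*=\s*([^,]+?)\s*(,|$)")

# Assumption: the statement has the shape "LET <assignments> FROM <source>".
LET_STATEMENT = re.compile(r"\s*LET\s+(.*?)\s+FROM\s+(.*)", re.IGNORECASE | re.DOTALL)


def _swap(match):
    name, expression, separator = match.groups()
    # "name = expression" becomes "expression AS name".
    return f"{expression} AS {name}" + (", " if separator else "")


def let_to_select(statement: str) -> str:
    """Rewrite 'LET a = x, b = y FROM t' as 'SELECT x AS a, y AS b FROM t'."""
    parsed = LET_STATEMENT.fullmatch(statement)
    if parsed is None:
        raise ValueError(f"not a LET ... FROM ... statement: {statement!r}")
    assignments, source = parsed.groups()
    return f"SELECT {ASSIGNMENT.sub(_swap, assignments)} FROM {source}"


print(let_to_select("LET key = pk, value = v FROM table"))
```

The sketch only covers the LET-to-SELECT direction; the reverse rewrite would swap the two sides of each AS in the same way.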


"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
supports selecting comparisons:


$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET
incompatible once comparisons become valid selectors. Unless they become
mandatory (and then you'd write "LET q = a = b" if you wanted to select a
comparison).
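
To make the psql result above concrete, here is a minimal Python sketch of SQL-style equality, where a NULL operand makes the comparison itself NULL. The names sql_equals and render are hypothetical and exist only for this illustration; None stands in for NULL.

```python
from typing import Optional


def sql_equals(left, right) -> Optional[bool]:
    """SQL-style equality: any NULL (None) operand makes the result NULL."""
    if left is None or right is None:
        return None
    return left == right


def render(value: Optional[bool]) -> str:
    """Format a result the way the psql session does: t, f, or blank for NULL."""
    if value is None:
        return ""
    return "t" if value else "f"


# Mirrors SELECT 1 = 2, 3 = 3, NULL = NULL
results = [sql_equals(1, 2), sql_equals(3, 3), sql_equals(None, None)]
print(" | ".join(render(r) for r in results))
```

The three results correspond to the f, t, and empty (NULL) columns in the session: a comparison is an ordinary value-producing expression, which is why reusing "=" as the assignment separator in LET could become ambiguous.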


I personally prefer the nested query syntax:


     LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
immediately recognizable by everyone as a query, LET is not.



Identical form, identical behaviour. Every statement should be directly
translatable with some simple text manipulation.

We can then make this more powerful for users by simply expanding SELECT
statements, e.g. by permitting them to declare constants and tuples in
the column results. In this scheme LET x = * is simply syntactic sugar
for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
and 5 all at once, consistently alongside each other.

Option 6 is in fact very similar, but is strictly less flexible for the
user as they have no way to declare multiple scalar variables without
scoping them inside a tuple.

e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer
this naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we do
need a way for the user to say they care whether the row was null or the
column. I think an implicit conversion here could be surprising. However
we could implement tuple expressions anyway and let the user explicitly
declare v as a tuple as Caleb has suggested for the existing proposal as
well.
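
The row-versus-column concern can be sketched in Python as follows. This is only a model under stated assumptions: a query result is represented as an optional dict (None meaning no row matched), and the helper names are hypothetical rather than anything from the prototype.

```python
from typing import Any, Dict, Optional

Row = Optional[Dict[str, Any]]  # None models "no row matched the query"


def implicit_scalar(row: Row, column: str) -> Any:
    """Implicit conversion: a missing row and a null column both become None."""
    return None if row is None else row[column]


def row_exists_but_column_is_null(row: Row, column: str) -> bool:
    """Explicit check that keeps the two cases apart."""
    return row is not None and row[column] is None


missing_row: Row = None
null_column: Row = {"val": None}

# The implicit conversion cannot tell these two situations apart.
print(implicit_scalar(missing_row, "val"), implicit_scalar(null_column, "val"))

# The explicit check can.
print(row_exists_but_column_is_null(missing_row, "val"),
      row_exists_but_column_is_null(null_column, "val"))
```

Both inputs collapse to the same scalar under implicit conversion, which is the surprise being warned about; keeping a handle on the row preserves the distinction.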

Assigning constants or other values not selected from a table would also
be a little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is just slightly
more verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. It might
be that Option 6 would be slightly easier, but I’m unsure if there would
be a huge difference.



On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:

I'm really happy to see CEP-15 getting closer to a final
implementation. I'm going to walk through my reasoning for your
proposals wrt trying to explain this to somebody new.

Looking at all the options, the first thing that comes up for me is
the Cassandra project's complicated relationship with NULL.  We have
prior art with EXISTS/NOT EXISTS when creating new tables. IS
NULL/IS NOT NULL is used in materialized views similarly to
proposals 2,4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
   AS SELECT [ (column_list) ]
   FROM [keyspace_name.]table_name
   [ WHERE column_name IS NOT NULL
   [ AND column_name IS NOT NULL ... ] ]
   [ AND relation [ AND ... ] ]
   PRIMARY KEY ( column_list )
   [ WITH [ table_properties ]
   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

  Based on that, I believe 1 and 3 would just confuse users, so -1 on
those.

Trying to explain the difference between row and column operations
with LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds more
syntax.

#5 is verbose and, in my opinion, easier to reason when writing a
query. Thinking top down, I need to know if these exact rows and/or

Re: CEP-15 multi key transaction syntax

2022-08-21 Thread Benedict




> On 21 Aug 2022, at 14:59, Benedict  wrote:
> 
> SELECT INTO in T-SQL creates a new table with the results. Since our 
> semantics are likely to be different than Postgres and MySQL, I’m not sure 
> it’s less confusing or otherwise beneficial to mimic an existing syntax.
> 
> Personally I find the LET syntax easier to read, and where ANSI SQL isn’t 
> prescriptive it may be better to aim for a more modern look.
> 
> 
> 
>> On 21 Aug 2022, at 14:53, Konstantin Osipov  wrote:
>> 
>> * Avi Kivity via dev  [22/08/14 15:59]:
>> 
>> MySQL supports SELECT  INTO  FROM ... WHERE
>> ...
>> 
>> PostgreSQL supports pretty much the same syntax.
>> 
>> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> MySQL/PostgreSQL SELECT ... INTO?
>> 
>>> 
> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> 
> I’ll do my best to express my thinking, as well as how I would
> explain the feature to a user.
> 
> My mental model for LET statements is that they are simply SELECT
> statements where the columns that are selected become variables
> accessible anywhere in the scope of the transaction. That is to say, you
> should be able to run something like s/LET/SELECT and
> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> and produce a valid SELECT statement, and vice versa. Both should
> perform identically.
> 
> e.g.
> SELECT pk AS key, v AS value FROM table
> 
> =>
> LET key = pk, value = v FROM table
>>> 
>>> 
>>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>>> supports selecting comparisons:
>>> 
>>> 
>>> $ psql
>>> psql (14.3)
>>> Type "help" for help.
>>> 
>>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>>  ?column? | ?column? | ?column?
>>> ----------+----------+----------
>>>  f        | t        |
>>> (1 row)
>>> 
>>> 
>>> Using "=" as a syntactic element in LET would make SELECT and LET
>>> incompatible once comparisons become valid selectors. Unless they become
>>> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>>> comparison).
>>> 
>>> 
>>> I personally prefer the nested query syntax:
>>> 
>>> 
>>>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>> 
>>> 
>>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>>> immediately recognizable by everyone as a query, LET is not.
>>> 
>>> 
 
 Identical form, identical behaviour. Every statement should be directly
 translatable with some simple text manipulation.
 
 We can then make this more powerful for users by simply expanding SELECT
 statements, e.g. by permitting them to declare constants and tuples in
 the column results. In this scheme LET x = * is simply syntactic sugar
 for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
 and 5 all at once, consistently alongside each other.
 
 Option 6 is in fact very similar, but is strictly less flexible for the
 user as they have no way to declare multiple scalar variables without
 scoping them inside a tuple.
 
 e.g.
 LET key = pk, value = v FROM table
 IF key > 1 AND value > 1 THEN...
 
 =>
 LET row = SELECT pk AS key, v AS value FROM table
 IF row.key > 1 AND row.value > 1 THEN…
 
 However, both are expressible in the existing proposal, as if you prefer
 this naming scheme you can simply write
 
 LET row = (pk AS key, v AS value) FROM table
 IF row.key > 1 AND row.value > 1 THEN…
 
 With respect to auto converting single column results to a scalar, we do
 need a way for the user to say they care whether the row was null or the
 column. I think an implicit conversion here could be surprising. However
 we could implement tuple expressions anyway and let the user explicitly
 declare v as a tuple as Caleb has suggested for the existing proposal as
 well.
 
 Assigning constants or other values not selected from a table would also
 be a little clunky:
 
 LET v1 = someFunc(), v2 = someOtherFunc(?)
 IF v1 > 1 AND v2 > 1 THEN…
 
 =>
 LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
 IF row.v1 > 1 AND row.v2 > 1 THEN...
 
 That said, the proposals are /close/ to identical, it is just slightly
 more verbose and slightly less flexible.
 
 Which one would be most intuitive to users is hard to predict. It might
 be that Option 6 would be slightly easier, but I’m unsure if there would
 be a huge difference.
 
 
> On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:
> 
> I'm really happy to see CEP-15 getting closer to a final
> implementation. I'm going to walk through my reasoning for your
> proposals wrt trying to explain this to somebody new.
> 
> Looking at all the options, the first thing that comes up for me is
> the Cassandra project's complicated relationship with NULL.

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Caleb Rackliffe

Just updated the Jira to reflect the latest conversation here.

On Mon, Aug 15, 2022 at 1:06 PM Patrick McFadin  wrote:

> I am +1 on
>
> IS NOT NULL/IS NULL instead of EXISTS/NOT EXISTS
>
> Not requiring (but allowing) SELECT on LET
>
> Patrick
>
> On Mon, Aug 15, 2022 at 11:01 AM Caleb Rackliffe 
> wrote:
>
>> Monday Morning Caleb has digested, and here's where I am...
>>
>> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment,
>> and to be honest, this may make some implementation things easier for me
>> (i.e. the encapsulation of SELECT within LET)
>> 2.) I'm in favor of LET without a select, although I have no strong
>> feeling that it needs to be in v1.
>> 3.) I like Benedict's tuple deconstruction idea, as it restores some of
>> the notational convenience of the previous proposal. Again, though, I don't
>> have a strong feeling this needs to be in v1.
>> 3.b.) When we do implement tuple deconstruction, I'd be in favor of
>> supporting a single level of deconstruction to begin with.
>>
>> Having said all that, on Friday I finished a prototype (based on some of
>> Blake's previous work) of the syntax/grammar we've more or less agreed upon
>> here, including an implementation of what I described as option #5 above:
>> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
>>
>> To look at specific examples, see these tests:
>> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
>>
>> There are only two things that aren't yet congruent w/ our discussion
>> above, but they should both be trivial to fix:
>>
>> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
>> 2.) I don't require SELECT on the RHS of LET yet.
>>
>> If I were to just fix those two items, would we be in agreement on this
>> being both the core of the syntax we want and compatible w/ the wish list
>> for future items?
>>
>>
>> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> 
>>> 
>>>
>>> Verbose version:
>>> LET (a) = SELECT val FROM table
>>> IF a > 1 THEN...
>>>
>>> Less verbose version:
>>> LET a = SELECT val FROM table
>>> IF a.val > 1 THEN...
>>>
>>>
>>>
>>> My intention is that these are actually two different ways of expressing
>>> the same thing, both supported and neither intended to be more or less
>>> verbose than the other. The advantage of permitting both is that you can
>>> also write
>>>
>>> LET a = SELECT val FROM table
>>> IF a IS NOT NULL AND a.val IS NULL THEN …
>>>
>>> Alternatively, for non-queries:
>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>> or less verbose:
>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>>
>>>
>>> I personally prefer clarity over any arbitrary verbosity/succinct
>>> distinction, but we’re in general “taste” territory here. Since this syntax
>>> includes the SELECT on the RHS, it makes sense to only require this for
>>> situations where a query is being performed. Though I think if SELECT
>>> without a FROM is supported then we will likely end up supporting *all
>>> of the above*.
>>>
>>> Weighing in on the "SELECT without a FROM," I think that is fine and, as
>>> Avi stated
>>>
>>>
>>> Yep, definitely fine. Question is just whether we bother to offer it.
>>> Also, evidently, whether we support LET *without* a SELECT on the RHS.
>>> I am strongly in favour of this, as *requiring* a SELECT even when
>>> there’s no table involved is counter-intuitive to me, as LET is now a
>>> distinct concept that looks like variable declaration in other languages.
>>>
>>> Nested:
>>> LET (x, y) = SELECT x, y FROM…
>>>
>>>
>>> Deconstruction here refers to the above, i.e. extracting variables x and
>>> y from the tuple on the RHS
>>>
>>> Nesting is just a question of whether we support either nested tuple
>>> declarations, or nested deconstruction, which might include any of the
>>> following:
>>>
>>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>>> LET (x, (y, z)) = (SELECT x FROM…, (SELECT y, z FROM…))
>>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>>>
>>> IMO, once you start supporting features they need to be sort of
>>> intuitively discoverable by users, so that a concept can be used in all
>>> places you might expect.
>>>
>>> But I would be fine with an arbitrary restriction of at most one SELECT
>>> on the RHS, or even ONLY a SELECT *or* some other tuple, and at most
>>> one level of deconstruction of the RHS.
>>>
>>>
>>>
>>>
>>>
>>> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
>>>
>>> Let me just state my bias right up front. For any kind of QL I lean
>>> heavily toward verbose and explicit based on their lifecycle. A CQL query
>>> will probably need to be

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Caleb Rackliffe

Ha, apologies to Avi ;)

On Mon, Aug 15, 2022 at 2:01 PM Benedict Elliott Smith wrote:

> 
>
> I like Benedict's tuple deconstruction idea
>
>
> For posterity, this was Avi’s idea!
>
> On 15 Aug 2022, at 18:59, Caleb Rackliffe 
> wrote:
>
> Monday Morning Caleb has digested, and here's where I am...
>
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and
> to be honest, this may make some implementation things easier for me (i.e.
> the encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong
> feeling that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of
> the notational convenience of the previous proposal. Again, though, I don't
> have a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of
> supporting a single level of deconstruction to begin with.
>
> Having said all that, on Friday I finished a prototype (based on some of
> Blake's previous work) of the syntax/grammar we've more or less agreed upon
> here, including an implementation of what I described as option #5 above:
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
>
> To look at specific examples, see these tests:
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
>
> There are only two things that aren't yet congruent w/ our discussion
> above, but they should both be trivial to fix:
>
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
>
> If I were to just fix those two items, would we be in agreement on this
> being both the core of the syntax we want and compatible w/ the wish list
> for future items?
>
>
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> 
>> 
>>
>> Verbose version:
>> LET (a) = SELECT val FROM table
>> IF a > 1 THEN...
>>
>> Less verbose version:
>> LET a = SELECT val FROM table
>> IF a.val > 1 THEN...
>>
>>
>>
>> My intention is that these are actually two different ways of expressing
>> the same thing, both supported and neither intended to be more or less
>> verbose than the other. The advantage of permitting both is that you can
>> also write
>>
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>>
>> Alternatively, for non-queries:
>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>> or less verbose:
>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>
>>
>> I personally prefer clarity over any arbitrary verbosity/succinct
>> distinction, but we’re in general “taste” territory here. Since this syntax
>> includes the SELECT on the RHS, it makes sense to only require this for
>> situations where a query is being performed. Though I think if SELECT
>> without a FROM is supported then we will likely end up supporting *all
>> of the above*.
>>
>> Weighing in on the "SELECT without a FROM," I think that is fine and, as
>> Avi stated
>>
>>
>> Yep, definitely fine. Question is just whether we bother to offer it.
>> Also, evidently, whether we support LET *without* a SELECT on the RHS. I
>> am strongly in favour of this, as *requiring* a SELECT even when there’s
>> no table involved is counter-intuitive to me, as LET is now a distinct
>> concept that looks like variable declaration in other languages.
>>
>> Nested:
>> LET (x, y) = SELECT x, y FROM…
>>
>>
>> Deconstruction here refers to the above, i.e. extracting variables x and
>> y from the tuple on the RHS
>>
>> Nesting is just a question of whether we support either nested tuple
>> declarations, or nested deconstruction, which might include any of the
>> following:
>>
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM…, (SELECT y, z FROM…))
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>>
>> IMO, once you start supporting features they need to be sort of
>> intuitively discoverable by users, so that a concept can be used in all
>> places you might expect.
>>
>> But I would be fine with an arbitrary restriction of at most one SELECT
>> on the RHS, or even ONLY a SELECT *or* some other tuple, and at most one
>> level of deconstruction of the RHS.
>>
>>
>>
>>
>>
>> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
>>
>> Let me just state my bias right up front. For any kind of QL I lean
>> heavily toward verbose and explicit based on their lifecycle. A CQL query
>> will probably need to be understood by the next person looking at it, and a
>> few seconds saved typing isn't worth the potential misunderstanding later.
>> My opinion is formed by having to be the second person many times.  :D
>>
>> I just want to make sure I have the

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Benedict Elliott Smith


> I like Benedict's tuple deconstruction idea

For posterity, this was Avi’s idea!

> On 15 Aug 2022, at 18:59, Caleb Rackliffe  wrote:
> 
> Monday Morning Caleb has digested, and here's where I am...
> 
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and to 
> be honest, this may make some implementation things easier for me (i.e. the 
> encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong feeling 
> that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of the 
> notational convenience of the previous proposal. Again, though, I don't have 
> a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of 
> supporting a single level of deconstruction to begin with.
> 
> Having said all that, on Friday I finished a prototype (based on some of 
> Blake's previous work) of the syntax/grammar we've more or less agreed upon 
> here, including an implementation of what I described as option #5 above: 
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
> 
> To look at specific examples, see these tests: 
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
> 
> There are only two things that aren't yet congruent w/ our discussion above, 
> but they should both be trivial to fix:
> 
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
> 
> If I were to just fix those two items, would we be in agreement on this being 
> both the core of the syntax we want and compatible w/ the wish list for 
> future items?
> 
> 
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith  
> wrote:
>> 
>> 
>>> 
>>> Verbose version:
>>> LET (a) = SELECT val FROM table
>>> IF a > 1 THEN...
>>> 
>>> Less verbose version:
>>> LET a = SELECT val FROM table
>>> IF a.val > 1 THEN...
>> 
>> 
>> My intention is that these are actually two different ways of expressing the 
>> same thing, both supported and neither intended to be more or less verbose 
>> than the other. The advantage of permitting both is that you can also write
>> 
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>> 
>>> Alternatively, for non-queries:
>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>> or less verbose:
>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>> 
>> I personally prefer clarity over any arbitrary verbosity/succinct 
>> distinction, but we’re in general “taste” territory here. Since this syntax 
>> includes the SELECT on the RHS, it makes sense to only require this for 
>> situations where a query is being performed. Though I think if SELECT 
>> without a FROM is supported then we will likely end up supporting all of the 
>> above.
>> 
>>> Weighing in on the "SELECT without a FROM," I think that is fine and, as 
>>> Avi stated
>> 
>> Yep, definitely fine. Question is just whether we bother to offer it. Also, 
>> evidently, whether we support LET without a SELECT on the RHS. I am strongly 
>> in favour of this, as requiring a SELECT even when there’s no table involved 
>> is counter-intuitive to me, as LET is now a distinct concept that looks like 
>> variable declaration in other languages.
>> 
>>> Nested:
>>> LET (x, y) = SELECT x, y FROM…
>> 
>> Deconstruction here refers to the above, i.e. extracting variables x and y 
>> from the tuple on the RHS
>> 
>> Nesting is just a question of whether we support either nested tuple 
>> declarations, or nested deconstruction, which might include any of the 
>> following:
>> 
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM…, SELECT y, z FROM…)
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>> 
>> IMO, once you start supporting features they need to be sort of intuitively 
>> discoverable by users, so that a concept can be used in all places you might 
>> expect.
>> 
>> But I would be fine with an arbitrary restriction of at most one SELECT on 
>> the RHS, or even ONLY a SELECT or some other tuple, and at most one level of 
>> deconstruction of the RHS.
>> 
>> 
>> 
>> 
>> 
>>> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
>>> 
>>> Let me just state my bias right up front. For any kind of QL I lean heavily 
>>> toward verbose and explicit based on their lifecycle. A CQL query will 
>>> probably need to be understood by the next person looking at it, and a few 
>>> seconds saved typing isn't worth the potential misunderstanding later.  My 
>>> opinion is formed by having to be the second person many times.  :D 
>>> 
>>> I just want to make sure I have the syntax you are proposing. 
>>> 
>>> Verbose version:
>>>

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Patrick McFadin

I am +1 on

IS NOT NULL/IS NULL instead of EXISTS/NOT EXISTS

Not requiring (but allowing) SELECT on LET

Patrick

On Mon, Aug 15, 2022 at 11:01 AM Caleb Rackliffe 
wrote:

> Monday Morning Caleb has digested, and here's where I am...
>
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and
> to be honest, this may make some implementation things easier for me (i.e.
> the encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong
> feeling that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of
> the notational convenience of the previous proposal. Again, though, I don't
> have a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of
> supporting a single level of deconstruction to begin with.
>
> Having said all that, on Friday I finished a prototype (based on some of
> Blake's previous work) of the syntax/grammar we've more or less agreed upon
> here, including an implementation of what I described as option #5 above:
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
>
> To look at specific examples, see these tests:
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
>
> There are only two things that aren't yet congruent w/ our discussion
> above, but they should both be trivial to fix:
>
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
>
> If I were to just fix those two items, would we be in agreement on this
> being both the core of the syntax we want and compatible w/ the wish list
> for future items?
>
>
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> 
>> 
>>
>> Verbose version:
>> LET (a) = SELECT val FROM table
>> IF a > 1 THEN...
>>
>> Less verbose version:
>> LET a = SELECT val FROM table
>> IF a.val > 1 THEN...
>>
>>
>>
>> My intention is that these are actually two different ways of expressing
>> the same thing, both supported and neither intended to be more or less
>> verbose than the other. The advantage of permitting both is that you can
>> also write
>>
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>>
>> Alternatively, for non-queries:
>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>> or less verbose:
>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>
>>
>> I personally prefer clarity over any arbitrary verbosity/succinct
>> distinction, but we’re in general “taste” territory here. Since this syntax
>> includes the SELECT on the RHS, it makes sense to only require this for
>> situations where a query is being performed. Though I think if SELECT
>> without a FROM is supported then we will likely end up supporting *all
>> of the above*.
>>
>> Weighing in on the "SELECT without a FROM," I think that is fine and, as
>> Avi stated
>>
>>
>> Yep, definitely fine. Question is just whether we bother to offer it.
>> Also, evidently, whether we support LET *without* a SELECT on the RHS. I
>> am strongly in favour of this, as *requiring* a SELECT even when there’s
>> no table involved is counter-intuitive to me, as LET is now a distinct
>> concept that looks like variable declaration in other languages.
>>
>> Nested:
>> LET (x, y) = SELECT x, y FROM…
>>
>>
>> Deconstruction here refers to the above, i.e. extracting variables x and
>> y from the tuple on the RHS
>>
>> Nesting is just a question of whether we support either nested tuple
>> declarations, or nested deconstruction, which might include any of the
>> following:
>>
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM…, SELECT y, z FROM…)
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>>
>> IMO, once you start supporting features they need to be sort of
>> intuitively discoverable by users, so that a concept can be used in all
>> places you might expect.
>>
>> But I would be fine with an arbitrary restriction of at most one SELECT
>> on the RHS, or even ONLY a SELECT *or* some other tuple, and at most one
>> level of deconstruction of the RHS.
>>
>>
>>
>>
>>
>> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
>>
>> Let me just state my bias right up front. For any kind of QL I lean
>> heavily toward verbose and explicit based on their lifecycle. A CQL query
>> will probably need to be understood by the next person looking at it, and a
>> few seconds saved typing isn't worth the potential misunderstanding later.
>> My opinion is formed by having to be the second person many times.  :D
>>
>> I just want to make sure I have the syntax you are proposing.
>>
>> Verbose version:
>> LET (a) = SELECT val

Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Caleb Rackliffe

Monday Morning Caleb has digested, and here's where I am...

1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and
to be honest, this may make some implementation things easier for me (i.e.
the encapsulation of SELECT within LET)
2.) I'm in favor of LET without a select, although I have no strong feeling
that it needs to be in v1.
3.) I like Benedict's tuple deconstruction idea, as it restores some of the
notational convenience of the previous proposal. Again, though, I don't
have a strong feeling this needs to be in v1.
3.b.) When we do implement tuple deconstruction, I'd be in favor of
supporting a single level of deconstruction to begin with.

Having said all that, on Friday I finished a prototype (based on some of
Blake's previous work) of the syntax/grammar we've more or less agreed upon
here, including an implementation of what I described as option #5 above:
https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype

To look at specific examples, see these tests:
https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java

There are only two things that aren't yet congruent w/ our discussion
above, but they should both be trivial to fix:

1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
2.) I don't require SELECT on the RHS of LET yet.

If I were to just fix those two items, would we be in agreement on this
being both the core of the syntax we want and compatible w/ the wish list
for future items?


On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith 
wrote:

> 
> 
>
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
>
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
>
>
>
> My intention is that these are actually two different ways of expressing
> the same thing, both supported and neither intended to be more or less
> verbose than the other. The advantage of permitting both is that you can
> also write
>
> LET a = SELECT val FROM table
> IF a IS NOT NULL AND a.val IS NULL THEN …
>
> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
>
>
> I personally prefer clarity over any arbitrary verbosity/succinct
> distinction, but we’re in general “taste” territory here. Since this syntax
> includes the SELECT on the RHS, it makes sense to only require this for
> situations where a query is being performed. Though I think if SELECT
> without a FROM is supported then we will likely end up supporting *all of
> the above*.
>
> Weighing in on the "SELECT without a FROM," I think that is fine and, as
> Avi stated
>
>
> Yep, definitely fine. Question is just whether we bother to offer it.
> Also, evidently, whether we support LET *without* a SELECT on the RHS. I
> am strongly in favour of this, as *requiring* a SELECT even when there’s
> no table involved is counter-intuitive to me, as LET is now a distinct
> concept that looks like variable declaration in other languages.
>
> Nested:
> LET (x, y) = SELECT x, y FROM…
>
>
> Deconstruction here refers to the above, i.e. extracting variables x and y
> from the tuple on the RHS
>
> Nesting is just a question of whether we support either nested tuple
> declarations, or nested deconstruction, which might include any of the
> following:
>
> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
> LET (x, (y, z)) = SELECT x, someTuple FROM…
> LET (x, (y, z)) = (SELECT x FROM…, SELECT y, z FROM…)
> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>
> IMO, once you start supporting features they need to be sort of
> intuitively discoverable by users, so that a concept can be used in all
> places you might expect.
>
> But I would be fine with an arbitrary restriction of at most one SELECT on
> the RHS, or even ONLY a SELECT *or* some other tuple, and at most one
> level of deconstruction of the RHS.
>
>
>
>
>
> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
>
> Let me just state my bias right up front. For any kind of QL I lean
> heavily toward verbose and explicit based on their lifecycle. A CQL query
> will probably need to be understood by the next person looking at it, and a
> few seconds saved typing isn't worth the potential misunderstanding later.
> My opinion is formed by having to be the second person many times.  :D
>
> I just want to make sure I have the syntax you are proposing.
>
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
>
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
>
> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
>
> Weighing in on

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith



> 
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
> 
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...


My intention is that these are actually two different ways of expressing the 
same thing, both supported and neither intended to be more or less verbose than 
the other. The advantage of permitting both is that you can also write

LET a = SELECT val FROM table
IF a IS NOT NULL AND a.val IS NULL THEN …
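
To make the distinction concrete, here is a minimal Python sketch of the cases such a condition could tell apart. It assumes semantics that are not confirmed in this thread: that `a IS NULL` means "the SELECT matched no row", while `a.val IS NULL` means "a row exists but its val column is null". The names `Row` and `classify` are hypothetical and exist only for illustration.

```python
from typing import Optional

# Hypothetical model: a LET binding holds either no row (None)
# or a row mapping column names to possibly-null values.
Row = dict[str, Optional[int]]


def classify(a: Optional[Row]) -> str:
    """Mirror the three cases the two LET forms let a condition tell apart."""
    if a is None:
        # IF a IS NULL: the SELECT matched no row at all (assumed meaning).
        return "no row"
    if a["val"] is None:
        # IF a IS NOT NULL AND a.val IS NULL: row exists, column is null.
        return "row exists, val is null"
    # A comparison such as IF a.val > 1 can now be evaluated.
    return "row exists, val is set"
```

With only the deconstructed form `LET (a) = ...`, the first two cases would presumably both surface as a null `a`, which is why keeping both forms may be useful.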

> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())

I personally prefer clarity over any arbitrary verbose/succinct distinction, 
but we’re in general “taste” territory here. Since this syntax includes the 
SELECT on the RHS, it makes sense to only require this for situations where a 
query is being performed. Though I think if SELECT without a FROM is supported 
then we will likely end up supporting all of the above.

> Weighing in on the "SELECT without a FROM," I think that is fine and, as Avi 
> stated

Yep, definitely fine. Question is just whether we bother to offer it. Also, 
evidently, whether we support LET without a SELECT on the RHS. I am strongly in 
favour of this, as requiring a SELECT even when there’s no table involved is 
counter-intuitive to me, as LET is now a distinct concept that looks like 
variable declaration in other languages.

> Nested:
> LET (x, y) = SELECT x, y FROM…

Deconstruction here refers to the above, i.e. extracting variables x and y from 
the tuple on the RHS.

Nesting is just a question of whether we support either nested tuple 
declarations, or nested deconstruction, which might include any of the 
following:

LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
LET (x, (y, z)) = SELECT x, someTuple FROM…
LET (x, (y, z)) = (SELECT x FROM…, SELECT y, z FROM…)
LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)

IMO, once you start supporting features they need to be sort of intuitively 
discoverable by users, so that a concept can be used in all places you might 
expect.

But I would be fine with an arbitrary restriction of at most one SELECT on the 
RHS, or even ONLY a SELECT or some other tuple, and at most one level of 
deconstruction of the RHS.
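
As a rough illustration of what "at most one level of deconstruction" could mean, the following Python sketch binds a pattern of names to a tuple value and rejects patterns nested deeper than a configurable limit. It is not how the prototype works; `deconstruct`, `Pattern` and `max_depth` are hypothetical names, and the duplicate-name check is an assumption borrowed from the earlier agreement that a LET cannot override a previously defined variable.

```python
from typing import Any, Union

# A pattern is a variable name or a tuple of patterns, e.g. ("x", ("y", "z")).
Pattern = Union[str, tuple]


def deconstruct(pattern: Pattern, value: Any, max_depth: int = 1, _depth: int = 0) -> dict:
    """Bind the names in `pattern` to the matching positions of `value`."""
    if isinstance(pattern, str):
        # Plain name: bind the whole value, even if it is itself a tuple.
        return {pattern: value}
    if _depth >= max_depth:
        raise ValueError("deconstruction nested deeper than allowed")
    if not isinstance(value, tuple) or len(value) != len(pattern):
        raise ValueError("tuple shape does not match pattern")
    bindings: dict = {}
    for sub_pattern, sub_value in zip(pattern, value):
        nested = deconstruct(sub_pattern, sub_value, max_depth, _depth + 1)
        for name, bound in nested.items():
            if name in bindings:
                raise ValueError(f"variable {name!r} bound twice")
            bindings[name] = bound
    return bindings
```

Under the default limit, a pattern like `("x", "yAndZ")` binds `yAndZ` to the whole inner tuple, while `("x", ("y", "z"))` is rejected unless `max_depth` is raised to 2.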





> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
> 
> Let me just state my bias right up front. For any kind of QL I lean heavily 
> toward verbose and explicit based on their lifecycle. A CQL query will 
> probably need to be understood by the next person looking at it, and a few 
> seconds saved typing isn't worth the potential misunderstanding later.  My 
> opinion is formed by having to be the second person many times.  :D 
> 
> I just want to make sure I have the syntax you are proposing. 
> 
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
> 
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
> 
> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
> 
> Weighing in on the "SELECT without a FROM," I think that is fine and, as Avi 
> stated, already present in the SQL world. I would prefer that over 'SELECT  
> func() FROM dual;' (Looking at you, Oracle)
> 
> Finally, on the topic of deconstructing SELECT statements instead of nesting. 
> If I understand the argument here, I would favor deconstructing over nesting 
> if there is a choice. I think this is what that choice would look like.
> 
> Deconstructed:
> LET x = SELECT x FROM ...
> LET y = SELECT y FROM ...
> 
> Nested:
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
> 
> I'm trying to summate but let me know if I missed something. I apologize in 
> advance to Monday morning Caleb, who will have to digest this thread. 
> 
> Patrick
> 
> On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith  
> wrote:
>> 
>>> 
>>> I think SQL dialects require subqueries to be parenthesized (not sure). If 
>>> that's the case I think we should keep the tradition.
>>> 
>> 
>> This isn’t a sub-query though, since LET is not a query. If we permit at 
>> most one SELECT, and do not permit mixing SELECT with constant assignments, 
>> I don’t see why we would require parentheses.
>> 
>>> I see no harm in making FROM optional, as it's recognized by other SQL 
>>> dialects.
>>> 
>>> Absolutely, this just flows naturally from having tuples. There's no 
>>> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
>> 
>> Neither of these things are supported today, and they’re no longer necessary 
>> with this syntax proposal. The downside of splitting SELECT and LET is that 
>> there’s no impetus to improve the former. So the question was really whether 
>> we bother to improve it anyway, not whether or not they

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Patrick McFadin

Let me just state my bias right up front. For any kind of QL I lean heavily
toward verbose and explicit based on their lifecycle. A CQL query will
probably need to be understood by the next person looking at it, and a few
seconds saved typing isn't worth the potential misunderstanding later.  My
opinion is formed by having to be the second person many times.  :D

I just want to make sure I have the syntax you are proposing.

Verbose version:
LET (a) = SELECT val FROM table
IF a > 1 THEN...

Less verbose version:
LET a = SELECT val FROM table
IF a.val > 1 THEN...

Alternatively, for non-queries:
LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
or less verbose:
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())

Weighing in on the "SELECT without a FROM," I think that is fine and, as
Avi stated, already present in the SQL world. I would prefer that over
'SELECT func() FROM dual;' (Looking at you, Oracle.)

Finally, on the topic of deconstructing SELECT statements instead of
nesting. If I understand the argument here, I would favor deconstructing
over nesting if there is a choice. I think this is what that choice would
look like.

Deconstructed:
LET x = SELECT x FROM ...
LET y = SELECT y FROM ...

Nested:
LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))

I'm trying to summarize, but let me know if I missed something. I apologize in
advance to Monday morning Caleb, who will have to digest this thread.

Patrick

On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith 
wrote:

> 
>
> I think SQL dialects require subqueries to be parenthesized (not sure). If
> that's the case I think we should keep the tradition.
>
>
> This isn’t a sub-query though, since LET is not a query. If we permit at
> most one SELECT, and do not permit mixing SELECT with constant assignments,
> I don’t see why we would require parentheses.
>
> I see no harm in making FROM optional, as it's recognized by other SQL
> dialects.
> Absolutely, this just flows naturally from having tuples. There's no
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
>
>
> Neither of these things are supported today, and they’re no longer
> necessary with this syntax proposal. The downside of splitting SELECT and
> LET is that there’s no impetus to improve the former. So the question was
> really whether we bother to improve it anyway, not whether or not they
> would be good improvements (I think they obviously are).
>
> I think this can be safely deferred. Most people would again separate it
> into separate LETs.
>
> That implies we’ll permit deconstructing a tuple variable in a LET. This
> makes sense to me, but is roughly equivalent to nested deconstruction. It
> might be that v1 we only support deconstructing SELECT statements, but I
> guess all of this is probably up to the implementor.
>
> I'd add (to the specification) that LETs cannot override a previously
> defined variable, just to reduce ambiguity.
>
> Yep, this was already agreed way back with the earlier proposal.
>
>
> On 14 Aug 2022, at 16:30, Avi Kivity  wrote:
>
>
> On 14/08/2022 17.50, Benedict Elliott Smith wrote:
>
> 
> > SELECT and LET incompatible once comparisons become valid selectors
>
> I don’t think this would be ambiguous, as = is required in the LET syntax
> as we have to bind the result to a variable name.
>
> But, I like the deconstructed tuple syntax improvement over “Option 6”.
> This would also seem to easily support assigning from non-query statements,
> such as LET (a, b) = (someFunc(), someOtherFunc(?))
>
> I don’t think it is ideal to depend on relative position in the tuple for
> assigning results to a variable name, as it leaves more scope for errors.
> It would be nice to have a simple way to deconstruct safely. But, I think
> this proposal is good, and I’d be fine with it as an alternative if others
> concur. I agree that seeing the SELECT independently may be more easily
> recognisable to users.
>
> With this approach there remains the question of how we handle single
> column results. I’d be inclined to treat in the following way:
>
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
>
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
>
>
> I think SQL dialects require subqueries to be parenthesized (not sure). If
> that's the case I think we should keep the tradition.
>
>
> 
> There is also the question of whether we support SELECT without a FROM
> clause, e.g.
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>
> Or just LET (since they are no longer equivalent)
> e.g.
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
>
>
> I see no harm in making FROM optional, as it's recognized by other SQL
> dialects.
>
>
> 
> Also since LET is only binding variables, is there any reason we shouldn’t
> support multiple SELECT assignments in a single LET?, e.g.
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>
>
> What if an inner select

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith


> 
> I think SQL dialects require subqueries to be parenthesized (not sure). If 
> that's the case I think we should keep the tradition.
> 

This isn’t a sub-query though, since LET is not a query. If we permit at most 
one SELECT, and do not permit mixing SELECT with constant assignments, I don’t 
see why we would require parentheses.

> I see no harm in making FROM optional, as it's recognized by other SQL 
> dialects.
> 
> Absolutely, this just flows naturally from having tuples. There's no 
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".

Neither of these things are supported today, and they’re no longer necessary 
with this syntax proposal. The downside of splitting SELECT and LET is that 
there’s no impetus to improve the former. So the question was really whether we 
bother to improve it anyway, not whether or not they would be good improvements 
(I think they obviously are).

> I think this can be safely deferred. Most people would again separate it into 
> separate LETs.
> 

That implies we’ll permit deconstructing a tuple variable in a LET. This makes 
sense to me, but is roughly equivalent to nested deconstruction. It might be 
that in v1 we only support deconstructing SELECT statements, but I guess all of 
this is probably up to the implementor.

> I'd add (to the specification) that LETs cannot override a previously defined 
> variable, just to reduce ambiguity.
> 

Yep, this was already agreed way back with the earlier proposal.


> On 14 Aug 2022, at 16:30, Avi Kivity  wrote:
> 
> 
> 
> On 14/08/2022 17.50, Benedict Elliott Smith wrote:
>> 
>> > SELECT and LET incompatible once comparisons become valid selectors
>> 
>> I don’t think this would be ambiguous, as = is required in the LET syntax as 
>> we have to bind the result to a variable name.
>> 
>> But, I like the deconstructed tuple syntax improvement over “Option 6”.
>> This would also seem to easily support assigning from non-query
>> statements, such as LET (a, b) = (someFunc(), someOtherFunc(?))
>> 
>> I don’t think it is ideal to depend on relative position in the tuple for 
>> assigning results to a variable name, as it leaves more scope for errors. It 
>> would be nice to have a simple way to deconstruct safely. But, I think this 
>> proposal is good, and I’d be fine with it as an alternative if others 
>> concur. I agree that seeing the SELECT independently may be more easily 
>> recognisable to users.
>> 
>> With this approach there remains the question of how we handle single column 
>> results. I’d be inclined to treat in the following way:
>> 
>> LET (a) = SELECT val FROM table
>> IF a > 1 THEN...
>> 
>> LET a = SELECT val FROM table
>> IF a.val > 1 THEN...
>> 
> 
> I think SQL dialects require subqueries to be parenthesized (not sure). If 
> that's the case I think we should keep the tradition.
> 
> 
> 
>> 
>> There is also the question of whether we support SELECT without a FROM 
>> clause, e.g.
>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>> 
>> Or just LET (since they are no longer equivalent)
>> e.g.
>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>> LET (v1, v2) = (someFunc(), someOtherFunc())
>> 
> 
> I see no harm in making FROM optional, as it's recognized by other SQL 
> dialects.
> 
> 
> 
>> 
>> Also since LET is only binding variables, is there any reason we shouldn’t 
>> support multiple SELECT assignments in a single LET?, e.g.
>> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>> 
> 
> What if an inner select returns a tuple? Would y be a tuple?
> 
> 
> 
> I think this is redundant and atypical enough to not be worth supporting.
> Most people would use separate LETs.
> 
> 
> 
>> 
>> Also whether we support tuples in SELECT statements anyway, e.g.
>> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
>> IF tuple1.a > 1 AND tuple2.d > 1…
> 
> Absolutely, this just flows naturally from having tuples. There's no 
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
> 
> 
> 
>> 
>> 
>> and whether we support nested deconstruction, e.g.
>> LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
>> IF a > 1 AND d > 1…
>> 
> 
> I think this can be safely deferred. Most people would again separate it into 
> separate LETs.
> 
> 
> 
> I'd add (to the specification) that LETs cannot override a previously defined 
> variable, just to reduce ambiguity.
> 
> 
> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> On 14 Aug 2022, at 13:55, Avi Kivity via dev  
>>> wrote:
>>> 
>>> 
>>> 
>>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
 
>>>> I’ll do my best to express my thinking, as well as how I would
>>>> explain the feature to a user.
>>>>
>>>> My mental model for LET statements is that they are simply SELECT
>>>> statements where the columns that are selected become variables accessible
>>>> anywhere in the scope of the transaction. That is to say, you should be
>>>> able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Avi Kivity via dev



On 14/08/2022 17.50, Benedict Elliott Smith wrote:


>> SELECT and LET incompatible once comparisons become valid selectors
>
> I don’t think this would be ambiguous, as = is required in the LET
> syntax as we have to bind the result to a variable name.
>
> But, I like the deconstructed tuple syntax improvement over “Option
> 6”. This would also seem to easily support assigning from non-query
> statements, such as LET (a, b) = (someFunc(), someOtherFunc(?))
>
> I don’t think it is ideal to depend on relative position in the tuple
> for assigning results to a variable name, as it leaves more scope for
> errors. It would be nice to have a simple way to deconstruct safely.
> But, I think this proposal is good, and I’d be fine with it as an
> alternative if others concur. I agree that seeing the SELECT
> independently may be more easily recognisable to users.
>
> With this approach there remains the question of how we handle single
> column results. I’d be inclined to treat in the following way:
>
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
>
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...



I think SQL dialects require subqueries to be parenthesized (not sure). 
If that's the case I think we should keep the tradition.





> There is also the question of whether we support SELECT without a FROM
> clause, e.g.
>
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>
> Or just LET (since they are no longer equivalent)
> e.g.
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())



I see no harm in making FROM optional, as it's recognized by other SQL 
dialects.





> Also since LET is only binding variables, is there any reason we
> shouldn’t support multiple SELECT assignments in a single LET?, e.g.
>
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))



What if an inner select returns a tuple? Would y be a tuple?


I think this is redundant and atypical enough to not be worth 
supporting. Most people would use separate LETs.





> Also whether we support tuples in SELECT statements anyway, e.g.
> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
> IF tuple1.a > 1 AND tuple2.d > 1…



Absolutely, this just flows naturally from having tuples. There's no 
difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".






and whether we support nested deconstruction, e.g.
LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…



I think this can be safely deferred. Most people would again separate it 
into separate LETs.



I'd add (to the specification) that LETs cannot override a previously 
defined variable, just to reduce ambiguity.










On 14 Aug 2022, at 13:55, Avi Kivity via dev wrote:



On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would 
explain the feature to a user.


My mental model for LET statements is that they are simply SELECT 
statements where the columns that are selected become variables 
accessible anywhere in the scope of the transaction. That is to say, 
you should be able to run something like s/LET/SELECT and 
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET 
statement and produce a valid SELECT statement, and vice versa. Both 
should perform identically.


e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table



"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL 
supports selecting comparisons:



$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET 
incompatible once comparisons become valid selectors. Unless they 
become mandatory (and then you'd write "LET q = a = b" if you wanted 
to select a comparison).



I personally prefer the nested query syntax:


    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT 
is immediately recognizable by everyone as a query, LET is not.





Identical form, identical behaviour. Every statement should be 
directly translatable with some simple text manipulation.


We can then make this more powerful for users by simply expanding 
SELECT statements, e.g. by permitting them to declare constants and 
tuples in the column results. In this scheme LET x = * is simply 
syntactic sugar for LET x = (pk, ck, field1, …) This scheme then 
supports options 2, 4 and 5 all at once, consistently alongside each 
other.


Option 6 is in fact very similar, but is strictly less flexible for 
the user as they have no way to declare multiple scalar variables 
without scoping them inside a tuple.


e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith


> SELECT and LET incompatible once comparisons become valid selectors

I don’t think this would be ambiguous, as = is required in the LET syntax as we 
have to bind the result to a variable name.

But, I like the deconstructed tuple syntax improvement over “Option 6”. This 
would also seem to easily support assigning from non-query statements, such as 
LET (a, b) = (someFunc(), someOtherFunc(?))

I don’t think it is ideal to depend on relative position in the tuple for 
assigning results to a variable name, as it leaves more scope for errors. It 
would be nice to have a simple way to deconstruct safely. But, I think this 
proposal is good, and I’d be fine with it as an alternative if others concur. I 
agree that seeing the SELECT independently may be more easily recognisable to 
users.

With this approach there remains the question of how we handle single column 
results. I’d be inclined to treat them in the following way:

LET (a) = SELECT val FROM table
IF a > 1 THEN...

LET a = SELECT val FROM table
IF a.val > 1 THEN...
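
The two single-column forms above differ only in what the name is bound to. A rough Python analogy, assuming a hypothetical one-row result type `Row` (not part of any proposal), might look like this:

```python
from collections import namedtuple

# Hypothetical stand-in for the single row returned by "SELECT val FROM table".
Row = namedtuple("Row", ["val"])
result = Row(val=2)

# Like "LET (a) = SELECT val ...": the one-element tuple is deconstructed,
# so `a` is the scalar column value.
(a,) = result

# Like "LET a = SELECT val ...": the name is bound to the whole row,
# so the column is reached as row.val.
row = result
```

Here `a > 1` and `row.val > 1` test the same value, which is the distinction between `IF a > 1` and `IF a.val > 1` in the examples.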


There is also the question of whether we support SELECT without a FROM clause, 
e.g.
LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2

Or just LET (since they are no longer equivalent)
e.g.
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())


Also since LET is only binding variables, is there any reason we shouldn’t 
support multiple SELECT assignments in a single LET?, e.g.
LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))


Also whether we support tuples in SELECT statements anyway, e.g.
LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
IF tuple1.a > 1 AND tuple2.d > 1…


and whether we support nested deconstruction, e.g.
LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…
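
As a point of comparison only, Python already allows this kind of nested, positional deconstruction. The sketch below uses a hypothetical `Row` type and made-up values; it also shows the concern raised earlier in this message, that binding by position is not checked against column names:

```python
from collections import namedtuple

# Hypothetical row for "SELECT a, b, someTuple FROM ..".
Row = namedtuple("Row", ["a", "b", "some_tuple"])
result = Row(a=2, b=0, some_tuple=(1, 5))

# Nested deconstruction, analogous to LET (a, b, (c, d)) = SELECT ...
a, b, (c, d) = result

# Positional binding has no name check: swapping the targets is accepted
# silently, and each name ends up holding the other column's value.
b_wrong, a_wrong, (c2, d2) = result
```

A mismatch in the number of targets would raise an error in Python, but a mismatch in order does not, which is the kind of mistake a safer deconstruction syntax would aim to prevent.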







> On 14 Aug 2022, at 13:55, Avi Kivity via dev  wrote:
> 
> 
> 
> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> 
>> I’ll do my best to express my thinking, as well as how I would explain 
>> the feature to a user.
>> 
>> My mental model for LET statements is that they are simply SELECT statements 
>> where the columns that are selected become variables accessible anywhere in 
>> the scope of the transaction. That is to say, you should be able to run 
>> something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the 
>> columns of a LET statement and produce a valid SELECT statement, and vice 
>> versa. Both should perform identically.
>> 
>> e.g. 
>> SELECT pk AS key, v AS value FROM table 
>> 
>> => 
>> LET key = pk, value = v FROM table
> 
> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL supports 
> selecting comparisons:
> 
> 
> 
> $ psql
> psql (14.3)
> Type "help" for help.
> 
> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>  ?column? | ?column? | ?column?
> ----------+----------+----------
>  f        | t        |
> (1 row)
> 
> 
> 
> Using "=" as a syntactic element in LET would make SELECT and LET 
> incompatible once comparisons become valid selectors. Unless they become 
> mandatory (and then you'd write "LET q = a = b" if you wanted to select a 
> comparison).
> 
> 
> 
> I personally prefer the nested query syntax:
> 
> 
> 
> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> 
> 
> 
> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is 
> immediately recognizable by everyone as a query, LET is not.
> 
> 
> 
>> 
>> Identical form, identical behaviour. Every statement should be directly 
>> translatable with some simple text manipulation.
>> 
>> We can then make this more powerful for users by simply expanding SELECT 
>> statements, e.g. by permitting them to declare constants and tuples in the 
>> column results. In this scheme LET x = * is simply syntactic sugar for LET x 
>> = (pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at 
>> once, consistently alongside each other.
>> 
>> Option 6 is in fact very similar, but is strictly less flexible for the user 
>> as they have no way to declare multiple scalar variables without scoping 
>> them inside a tuple.
>> 
>> e.g.
>> LET key = pk, value = v FROM table
>> IF key > 1 AND value > 1 THEN...
>> 
>> =>
>> LET row = SELECT pk AS key, v AS value FROM table
>> IF row.key > 1 AND row.value > 1 THEN…
>> 
>> However, both are expressible in the existing proposal, as if you prefer 
>> this naming scheme you can simply write
>> 
>> LET row = (pk AS key, v AS value) FROM table
>> IF row.key > 1 AND row.value > 1 THEN…
>> 
>> With respect to auto converting single column results to a scalar, we do 
>> need a way for the user to say they care whether the row was null or the 
>> column. I think an implicit conversion here could be surprising. However we 
>> could implement tuple expressions anyway and let the user explicitly declare 
>> v as a tuple as Caleb has suggested for the existing proposal as well.
>> 
>> Assigning constants or other values not selected from a table would also be 
>> a

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Avi Kivity via dev



On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express my thinking, as well as how I would 
explain the feature to a user.


My mental model for LET statements is that they are simply SELECT 
statements where the columns that are selected become variables 
accessible anywhere in the scope of the transaction. That is to say, 
you should be able to run something like s/LET/SELECT and 
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement 
and produce a valid SELECT statement, and vice versa. Both should 
perform identically.


e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table



"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL 
supports selecting comparisons:



$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET 
incompatible once comparisons become valid selectors. Unless they become 
mandatory (and then you'd write "LET q = a = b" if you wanted to select 
a comparison).
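
The psql session shown earlier in this message also illustrates SQL's three-valued logic: `NULL = NULL` yields NULL rather than true. A minimal Python sketch of that behaviour, using `None` for NULL and hypothetical helper names, could be:

```python
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """SQL-style equality: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_null(a) -> bool:
    """SQL-style IS NULL test, which always returns a definite boolean."""
    return a is None


def condition_passes(result: Optional[bool]) -> bool:
    """In a condition, a NULL result is treated as not satisfied."""
    return result is True
```

With these definitions, `sql_eq(1, 2)`, `sql_eq(3, 3)` and `sql_eq(None, None)` correspond to the `f`, `t` and empty columns in the output, and only `is_null` can be used to test for NULLness.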



I personally prefer the nested query syntax:


    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is 
immediately recognizable by everyone as a query, LET is not.





Identical form, identical behaviour. Every statement should be 
directly translatable with some simple text manipulation.


We can then make this more powerful for users by simply expanding 
SELECT statements, e.g. by permitting them to declare constants and 
tuples in the column results. In this scheme LET x = * is simply 
syntactic sugar for LET x = (pk, ck, field1, …) This scheme then 
supports options 2, 4 and 5 all at once, consistently alongside each 
other.


Option 6 is in fact very similar, but is strictly less flexible for 
the user as they have no way to declare multiple scalar variables 
without scoping them inside a tuple.


e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you 
prefer this naming scheme you can simply write


LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we 
do need a way for the user to say they care whether the row was null 
or the column. I think an implicit conversion here could be 
surprising. However we could implement tuple expressions anyway and 
let the user explicitly declare v as a tuple as Caleb has suggested 
for the existing proposal as well.


Assigning constants or other values not selected from a table would 
also be a little clunky:


LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is just slightly 
more verbose and slightly less flexible.


Which one would be most intuitive to users is hard to predict. It 
might be that Option 6 would be slightly easier, but I’m unsure if 
there would be a huge difference.




On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:

I'm really happy to see CEP-15 getting closer to a final 
implementation. I'm going to walk through my reasoning for your 
proposals wrt trying to explain this to somebody new.


Looking at all the options, the first thing that comes up for me is 
the Cassandra project's complicated relationship with NULL.  We have 
prior art with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS 
NOT NULL is used in materialized views similarly to proposals 2,4 and 5.


CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
  AS SELECT [ (column_list) ]
  FROM [keyspace_name.]table_name
  [ WHERE column_name IS NOT NULL
  [ AND column_name IS NOT NULL ... ] ]
  [ AND relation [ AND ... ] ]
  PRIMARY KEY ( column_list )
  [ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

 Based on that, I believe 1 and 3 would just confuse users, so -1 on 
those.


Trying to explain the difference between row and column operations 
with LET, I can't see the difference between a row and column in #2.


#4 introduces a boolean instead of column names and just adds more 
syntax.


#5 is verbose and, in my opinion, easier to reason about when writing a 
query. Thinking top down, I need to know if these exact rows and/or 
column values exist before changing them, so I'll define them first. 
Then I'll iterate over the state I created in my actual changes so I 
know I'm changing precisely what I want.


#5 could use a bit more to be clearer to somebody who doesn't write 
CQL queries daily and wouldn't require memorizing subtle

Re: CEP-15 multi key transaction syntax

2022-08-13 Thread Benedict Elliott Smith

I’ll do my best to express my thinking, as well as how I would explain the 
feature to a user.

My mental model for LET statements is that they are simply SELECT statements 
where the columns that are selected become variables accessible anywhere in the 
scope of the transaction. That is to say, you should be able to run something 
like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a 
LET statement and produce a valid SELECT statement, and vice versa. Both should 
perform identically.

e.g. 
SELECT pk AS key, v AS value FROM table 

=> 
LET key = pk, value = v FROM table

Identical form, identical behaviour. Every statement should be directly 
translatable with some simple text manipulation.
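
The LET-to-SELECT translation described above can be approximated in a few lines. The following Python sketch is illustrative only: the function name `let_to_select` is hypothetical, it splits bindings on commas instead of applying the sed expression verbatim, and it assumes no commas appear inside the bound expressions.

```python
import re


def let_to_select(stmt: str) -> str:
    """Rewrite 'LET a = x, b = y FROM t' as 'SELECT x AS a, y AS b FROM t'."""
    # Separate the binding list from an optional trailing FROM clause.
    m = re.fullmatch(
        r"\s*LET\s+(.*?)(\s+FROM\s+.*)?",
        stmt,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if m is None:
        raise ValueError("not a LET statement")
    columns, tail = m.group(1), m.group(2) or ""

    pairs = []
    for binding in columns.split(","):
        # Split on the first '=' only: the left side is the variable name,
        # the right side is the selected expression.
        name, _, expr = binding.partition("=")
        pairs.append(f"{expr.strip()} AS {name.strip()}")
    return "SELECT " + ", ".join(pairs) + tail
```

For example, `let_to_select("LET key = pk, value = v FROM table")` produces `SELECT pk AS key, v AS value FROM table`, matching the pair of statements shown above.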

We can then make this more powerful for users by simply expanding SELECT 
statements, e.g. by permitting them to declare constants and tuples in the 
column results. In this scheme LET x = * is simply syntactic sugar for LET x = 
(pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at once, 
consistently alongside each other.

Option 6 is in fact very similar, but is strictly less flexible for the user as 
they have no way to declare multiple scalar variables without scoping them 
inside a tuple.

e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer this 
naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we do need a 
way for the user to say they care whether the row was null or the column. I 
think an implicit conversion here could be surprising. However we could 
implement tuple expressions anyway and let the user explicitly declare v as a 
tuple as Caleb has suggested for the existing proposal as well.
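
To make the row-versus-column distinction concrete, here is a small Python model. It is only a sketch: the `Row` type, the `read` helper and the in-memory `table` are all hypothetical and say nothing about how the feature would be implemented.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    v: Optional[int]  # None models a NULL column value


def read(table: dict, key: tuple) -> Optional[Row]:
    """Return the row stored under key, or None when no such row exists."""
    return table.get(key)


# One row exists, but its column v is NULL.
table = {(1, 0): Row(v=None)}

missing = read(table, (0, 0))  # no such row: the row itself is NULL
present = read(table, (1, 0))  # row exists: only the column is NULL

row_is_null = missing is None
column_is_null = present is not None and present.v is None
```

If a single-column result were silently converted to a scalar, both cases would collapse to the same NULL and the user could no longer tell them apart.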

Assigning constants or other values not selected from a table would also be a 
little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are close to identical, it is just slightly more 
verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. It might be that 
Option 6 would be slightly easier, but I’m unsure if there would be a huge 
difference.

> On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:
> 
> I'm really happy to see CEP-15 getting closer to a final implementation. I'm 
> going to walk through my reasoning for your proposals wrt trying to explain 
> this to somebody new. 
> 
> Looking at all the options, the first thing that comes up for me is the 
> Cassandra project's complicated relationship with NULL.  We have prior art 
> with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is used 
> in materialized views similarly to proposals 2,4 and 5. 
> 
> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>   AS SELECT [ (column_list) ]
>   FROM [keyspace_name.]table_name
>   [ WHERE column_name IS NOT NULL
>   [ AND column_name IS NOT NULL ... ] ]
>   [ AND relation [ AND ... ] ] 
>   PRIMARY KEY ( column_list )
>   [ WITH [ table_properties ]
>   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;
> 
>  Based on that, I believe 1 and 3 would just confuse users, so -1 on those. 
> 
> Trying to explain the difference between row and column operations with LET, 
> I can't see the difference between a row and column in #2. 
> 
> #4 introduces a boolean instead of column names and just adds more syntax.
> 
> #5 is verbose and, in my opinion, easier to reason about when writing a query. 
> Thinking top down, I need to know if these exact rows and/or column values 
> exist before changing them, so I'll define them first. Then I'll iterate over 
> the state I created in my actual changes so I know I'm changing precisely 
> what I want. 
> 
> #5 could use a bit more to be clearer to somebody who doesn't write CQL 
> queries daily and wouldn't require memorizing subtle differences. It should 
> be similar to all the other syntax, so learning a little about CQL will let 
> you move into more without completely re-learning the new syntax.  
> 
> So I propose #6)
> BEGIN TRANSACTION
>   LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
>   LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1, row2
>   IF row1 IS NULL AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
> 
> I added the SELECT in the LET just so it's straightforward, you are reading, 
> and it's just like doing a regular select, but you are assigning it to a 
> variable. 
> 
> I removed the confusing 'row1.v' and replaced it with 'row1'. I can't see why 
>

Re: CEP-15 multi key transaction syntax

2022-08-13 Thread Patrick McFadin

I'm really happy to see CEP-15 getting closer to a final implementation.
I'm going to walk through my reasoning for your proposals wrt trying to
explain this to somebody new.

Looking at all the options, the first thing that comes up for me is the
Cassandra project's complicated relationship with NULL.  We have prior art
with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is
used in materialized views similarly to proposals 2,4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
  AS SELECT [ (column_list) ]
  FROM [keyspace_name.]table_name
  [ WHERE column_name IS NOT NULL
  [ AND column_name IS NOT NULL ... ] ]
  [ AND relation [ AND ... ] ]
  PRIMARY KEY ( column_list )
  [ WITH [ table_properties ]
  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

 Based on that, I believe 1 and 3 would just confuse users, so -1 on those.

Trying to explain the difference between row and column operations with
LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds more syntax.

#5 is verbose and, in my opinion, easier to reason about when writing a query.
Thinking top down, I need to know if these exact rows and/or column values
exist before changing them, so I'll define them first. Then I'll iterate
over the state I created in my actual changes so I know I'm changing
precisely what I want.

#5 could use a bit more to be clearer to somebody who doesn't write CQL
queries daily and wouldn't require memorizing subtle differences. It should
be similar to all the other syntax, so learning a little about CQL will let
you move into more without completely re-learning the new syntax.

So I propose #6)
BEGIN TRANSACTION
  LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
  LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1, row2
  IF row1 IS NULL AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

I added the SELECT in the LET just so it's straightforward, you are
reading, and it's just like doing a regular select, but you are assigning
it to a variable.

I removed the confusing 'row1.v' and replaced it with 'row1'. I can't see
why you would need the '.v' vs having the complete variable I created in
the statement above.

EOL

Patrick

On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe wrote:

> ...and one more option...
>
> 5.) Introduce tuple assignments, removing all ambiguity around row vs.
> column operations.
>
> BEGIN TRANSACTION
>   LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
>   LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1.v, row2.v
>   IF row1 IS NULL AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
>
>
> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe wrote:
>
>> via Benedict, here is a 4th option:
>>
>> 4.) Similar to #2, but don't rely on the key element being NULL.
>>
>> If the read returns no result, x effectively becomes NULL. Otherwise, it
>> remains true/NOT NULL.
>>
>> BEGIN TRANSACTION
>>   LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>   SELECT x, row2_v
>>   IF x IS NULL AND row2_v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> Hello again everyone!
>>>
>>> I've been working on a prototype in
>>> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed
>>> on in this thread. One thing that isn't immediately obvious to me is how
>>> the LET syntax handles cases where we want to check for the plain existence
>>> of a row in IF. For example, in this hybrid of the originally proposed
>>> syntax and something more like what we've agreed on (and the RETURNING just
>>> to distinguish between that and SELECT), this could be pretty
>>> straightforward:
>>>
>>> BEGIN TRANSACTION
>>>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>   RETURNING row1.v, row2.v
>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>   END IF
>>> COMMIT TRANSACTION
>>>
>>> The NOT EXISTS operator has row1 to work with. On the other hand, w/
>>> the LET syntax and no naming of reads, it's not clear what the best
>>> solution would be. Here are a few possibilities:
>>>
>>> 1.) Provide a few built-in functions that operate on a whole result row.
>>> If we assume a SQL style IS NULL and IS NOT NULL (see my last post here)
>>> for operations on particular columns, this probably eliminates the need for
>>> EXISTS/NOT EXISTS as well.
>>>
>>> BEGIN TRANSACTION
>>>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>>>   LET row2_v = v

Re: CEP-15 multi key transaction syntax

2022-08-11 Thread Caleb Rackliffe

...and one more option...

5.) Introduce tuple assignments, removing all ambiguity around row vs.
column operations.

BEGIN TRANSACTION
  LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
  LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1.v, row2.v
  IF row1 IS NULL AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION



On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe wrote:

> via Benedict, here is a 4th option:
>
> 4.) Similar to #2, but don't rely on the key element being NULL.
>
> If the read returns no result, x effectively becomes NULL. Otherwise, it
> remains true/NOT NULL.
>
> BEGIN TRANSACTION
>   LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT x, row2_v
>   IF x IS NULL AND row2_v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe wrote:
>
>> Hello again everyone!
>>
>> I've been working on a prototype in
>> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed
>> on in this thread. One thing that isn't immediately obvious to me is how
>> the LET syntax handles cases where we want to check for the plain existence
>> of a row in IF. For example, in this hybrid of the originally proposed
>> syntax and something more like what we've agreed on (and the RETURNING just
>> to distinguish between that and SELECT), this could be pretty
>> straightforward:
>>
>> BEGIN TRANSACTION
>>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>   RETURNING row1.v, row2.v
>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> The NOT EXISTS operator has row1 to work with. On the other hand, w/ the
>> LET syntax and no naming of reads, it's not clear what the best solution
>> would be. Here are a few possibilities:
>>
>> 1.) Provide a few built-in functions that operate on a whole result row.
>> If we assume a SQL style IS NULL and IS NOT NULL (see my last post here)
>> for operations on particular columns, this probably eliminates the need for
>> EXISTS/NOT EXISTS as well.
>>
>> BEGIN TRANSACTION
>>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>   SELECT row1_missing, row2_v
>>   IF row1_missing AND row2_v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> 2.) Assign and check the first primary key element to determine whether
>> the row exists.
>>
>> BEGIN TRANSACTION
>>   LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>   SELECT row1_k, row2_v
>>   IF row1_k IS NULL AND row2_v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> 3.) Reconsider the LET concept toward something that allows us to
>> explicitly name our reads again.
>>
>> BEGIN TRANSACTION
>>   WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>>   WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>>   SELECT row1.v, row2.v
>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>> I don't have a strong affinity for any of these, although #1 seems the
>> most awkward.
>>
>> Does anyone have any other alternatives? Preference for one of the above
>> options?
>>
>> Thanks!
>>
>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> Avi brought up an interesting point around NULLness checking in
>>> CASSANDRA-17762 
>>> ...
>>>
>>>> In SQL, any comparison with NULL is NULL, which is interpreted as FALSE
>>>> in a condition. To test for NULLness, you use IS NULL or IS NOT NULL. But
>>>> LWT uses IF col = NULL as a NULLness test. This is likely to confuse people
>>>> coming from SQL and hamper attempts to extend the dialect.
>>>
>>>
>>> We can leave that Jira open to address what to do in the legacy LWT
>>> case, but I'd support a SQL-congruent syntax here (IS NULL or IS NOT
>>> NULL), where we have something closer to a blank slate.
>>>
>>> Thoughts?
>>>
>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky  wrote:
>>>
>>>> The new syntax looks great, and I’m really excited to see this coming
>>>> together.
>>>>
>>>> One piece of feedback on the proposed syntax is around the use of “=”
>>>> as a declaration in addition to its current use as an equality operator in
>>>> a WHERE clause and an assignment operator in an UPDATE:
>>>>
>>>> BEGIN TRANSACTION
>>>>   LET car_miles = miles_driven, car_is_running = is_running FROM cars WHERE model='pinto'
>>>>   LET user_miles = miles_driven FROM users WHERE name='blake'

Re: CEP-15 multi key transaction syntax

2022-08-11 Thread Caleb Rackliffe

via Benedict, here is a 4th option:

4.) Similar to #2, but don't rely on the key element being NULL.

If the read returns no result, x effectively becomes NULL. Otherwise, it
remains true/NOT NULL.

BEGIN TRANSACTION
  LET x = true FROM ks.tbl WHERE k=0 AND c=0;
  LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT x, row2_v
  IF x IS NULL AND row2_v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION
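
For readers who find the NULL-on-no-result rule easier to follow in code, here is a rough Python model of option 4. Everything in it is an assumption made for illustration: the table is a plain dict keyed by `(k, c)`, `None` stands for NULL, and the helper names are hypothetical.

```python
def let_true(table, key):
    """Model of `LET x = true FROM ... WHERE <key>`: True if a row matched, else None."""
    return True if key in table else None


def let_column(table, key, column):
    """Model of `LET name = column FROM ... WHERE <key>`: None if no row matched."""
    row = table.get(key)
    return None if row is None else row.get(column)


def run_transaction(table):
    """Apply the conditional insert from the example and return the selected values."""
    x = let_true(table, (0, 0))
    row2_v = let_column(table, (1, 0), "v")
    # IF x IS NULL AND row2_v = 3 THEN INSERT ...
    if x is None and row2_v == 3:
        table[(0, 0)] = {"v": 1}
    return x, row2_v
```

Running it against `{(1, 0): {"v": 3}}` inserts the missing row on the first call; a second call sees `x` as true and leaves the table unchanged.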

On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe wrote:

> Hello again everyone!
>
> I've been working on a prototype in
> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed
> on in this thread. One thing that isn't immediately obvious to me is how
> the LET syntax handles cases where we want to check for the plain existence
> of a row in IF. For example, in this hybrid of the originally proposed
> syntax and something more like what we've agreed on (and the RETURNING just
> to distinguish between that and SELECT), this could be pretty
> straightforward:
>
> BEGIN TRANSACTION
>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>   RETURNING row1.v, row2.v
>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> The NOT EXISTS operator has row1 to work with. On the other hand, w/ the
> LET syntax and no naming of reads, it's not clear what the best solution
> would be. Here are a few possibilities:
>
> 1.) Provide a few built-in functions that operate on a whole result row.
> If we assume a SQL style IS NULL and IS NOT NULL (see my last post here)
> for operations on particular columns, this probably eliminates the need for
> EXISTS/NOT EXISTS as well.
>
> BEGIN TRANSACTION
>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1_missing, row2_v
>   IF row1_missing AND row2_v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> 2.) Assign and check the first primary key element to determine whether
> the row exists.
>
> BEGIN TRANSACTION
>   LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1_k, row2_v
>   IF row1_k IS NULL AND row2_v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> 3.) Reconsider the LET concept toward something that allows us to
> explicitly name our reads again.
>
> BEGIN TRANSACTION
>   WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>   WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>   SELECT row1.v, row2.v
>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> I don't have a strong affinity for any of these, although #1 seems the
> most awkward.
>
> Does anyone have any other alternatives? Preference for one of the above
> options?
>
> Thanks!
>
> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe wrote:
>
>> Avi brought up an interesting point around NULLness checking in
>> CASSANDRA-17762 
>> ...
>>
>>> In SQL, any comparison with NULL is NULL, which is interpreted as FALSE
>>> in a condition. To test for NULLness, you use IS NULL or IS NOT NULL. But
>>> LWT uses IF col = NULL as a NULLness test. This is likely to confuse people
>>> coming from SQL and hamper attempts to extend the dialect.
>>
>>
>> We can leave that Jira open to address what to do in the legacy LWT case,
>> but I'd support a SQL-congruent syntax here (IS NULL or IS NOT NULL),
>> where we have something closer to a blank slate.
>>
>> Thoughts?
>>
>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky  wrote:
>>
>>> The new syntax looks great, and I’m really excited to see this coming
>>> together.
>>>
>>> One piece of feedback on the proposed syntax is around the use of “=” as
>>> a declaration in addition to its current use as an equality operator in a
>>> WHERE clause and an assignment operator in an UPDATE:
>>>
>>> BEGIN TRANSACTION
>>>   LET car_miles = miles_driven, car_is_running = is_running FROM cars WHERE model='pinto'
>>>   LET user_miles = miles_driven FROM users WHERE name='blake'
>>>   SELECT something else from some other table
>>>   IF NOT car_is_running THEN ABORT
>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>>> COMMIT TRANSACTION
>>>
>>> This is supported in languages like PL/pgSQL, but in a normal SQL query
>>> kind of local declaration is often expressed as an alias (SELECT col AS
>>> new_col), subquery alias (SELECT col) t, or common table expression (WITH t
>>> AS (SELECT col)).
>>>
>>> Here’s an example of an alternative to the proposed syntax that I’d find
>>> more readable:
>>>

Re: CEP-15 multi key transaction syntax

2022-08-11 Thread Caleb Rackliffe

Hello again everyone!

I've been working on a prototype in CASSANDRA-17719
for a grammar that roughly corresponds to what we've agreed on in this
thread. One thing that isn't immediately obvious to me is how the LET
syntax handles cases where we want to check for the plain existence of a
row in IF. For example, in this hybrid of the originally proposed syntax
and something more like what we've agreed on (and the RETURNING just to
distinguish between that and SELECT), this could be pretty straightforward:

BEGIN TRANSACTION
  SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
  SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
  RETURNING row1.v, row2.v
  IF row1 NOT EXISTS AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

The NOT EXISTS operator has row1 to work with. On the other hand, with the
LET syntax and no naming of reads, it's not clear what the best solution
would be. Here are a few possibilities:

1.) Provide a few built-in functions that operate on a whole result row. If
we assume a SQL style IS NULL and IS NOT NULL (see my last post here) for
operations on particular columns, this probably eliminates the need for
EXISTS/NOT EXISTS as well.

BEGIN TRANSACTION
  LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
  LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1_missing, row2_v
  IF row1_missing AND row2_v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

2.) Assign and check the first primary key element to determine whether the
row exists.

BEGIN TRANSACTION
  LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
  LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
  SELECT row1_k, row2_v
  IF row1_k IS NULL AND row2_v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

3.) Reconsider the LET concept toward something that allows us to
explicitly name our reads again.

BEGIN TRANSACTION
  WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
  WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
  SELECT row1.v, row2.v
  IF row1 NOT EXISTS AND row2.v = 3 THEN
    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
  END IF
COMMIT TRANSACTION

I don't have a strong affinity for any of these, although #1 seems the most
awkward.
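
To make the trade-off behind option 2 concrete, here is a rough Python sketch
(not CQL, and not the prototype's behaviour). It assumes a hypothetical
in-memory stand-in for ks.tbl and a hypothetical let_column() helper that
models a LET read. The point is only that reading v cannot tell a missing row
from a null value, while reading a primary key column can.

```python
from typing import Optional

# Hypothetical stand-in for ks.tbl: (k, c) -> regular columns.
# A row can exist while its regular column v is null.
table = {
    (1, 0): {"v": 3},
    (2, 0): {"v": None},  # row exists, v is null
}


def let_column(k: int, c: int, column: str) -> Optional[int]:
    """Model `LET x = <column> FROM ks.tbl WHERE k=? AND c=?` (assumed semantics).

    A missing row yields None for every column, including key columns.
    """
    row = table.get((k, c))
    if row is None:
        return None
    if column == "k":
        return k
    if column == "c":
        return c
    return row[column]


# Reading v gives None both for a missing row and for a null value.
missing_v = let_column(0, 0, "v")
null_v = let_column(2, 0, "v")

# Reading k only gives None when the row is missing, because a primary
# key column is never null in a row that exists.
row1_k = let_column(0, 0, "k")
row2_v = let_column(1, 0, "v")

# IF row1_k IS NULL AND row2_v = 3 THEN INSERT ...
should_insert = row1_k is None and row2_v == 3
```

With this model, option 2 needs no new operator, at the cost of the reader
having to know that a null key means "no row".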

Does anyone have any other alternatives? Preference for one of the above
options?

Thanks!

On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe 
wrote:

> Avi brought up an interesting point around NULLness checking in
> CASSANDRA-17762 ...
>
> In SQL, any comparison with NULL is NULL, which is interpreted as FALSE in
>> a condition. To test for NULLness, you use IS NULL or IS NOT NULL. But LWT
>> uses IF col = NULL as a NULLness test. This is likely to confuse people
>> coming from SQL and hamper attempts to extend the dialect.
>
>
> We can leave that Jira open to address what to do in the legacy LWT case,
> but I'd support a SQL-congruent syntax here (IS NULL or IS NOT NULL),
> where we have something closer to a blank slate.
>
> Thoughts?
>
> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky  wrote:
>
>> The new syntax looks great, and I’m really excited to see this coming
>> together.
>>
>> One piece of feedback on the proposed syntax is around the use of “=“ as
>> a declaration in addition to its current use as an equality operator in a
>> WHERE clause and an assignment operator in an UPDATE:
>>
>> BEGIN TRANSACTION
>>   LET car_miles = miles_driven, car_is_running = is_running FROM cars
>> WHERE model=’pinto’
>>   LET user_miles = miles_driven FROM users WHERE name=’blake’
>>   SELECT something else from some other table
>>   IF NOT car_is_running THEN ABORT
>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>> COMMIT TRANSACTION
>>
>> This is supported in languages like PL/pgSQL, but in a normal SQL query
>> kind of local declaration is often expressed as an alias (SELECT col AS
>> new_col), subquery alias (SELECT col) t, or common table expression (WITH t
>> AS (SELECT col)).
>>
>> Here’s an example of an alternative to the proposed syntax that I’d find
>> more readable:
>>
>> BEGIN TRANSACTION
>>   WITH car_miles, car_is_running AS (SELECT miles_driven, is_running FROM
>> cars WHERE model=’pinto’),
>>   user_miles AS (SELECT miles_driven FROM users WHERE name=’blake’)
>>   IF NOT car_is_running THEN ABORT
>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>> COMMIT TRANSACTION
>>
>> There’s also the option of naming the transaction like a subquery, and
>> supporting LET via AS (this one I’m less sure about but wanted to propose
>> anyway):
>>
>> BEGIN TRANSACTION t1
>>   SELECT miles_driven AS t1.car_miles, is_running AS t1.car_is_running
>> FROM cars

Re: CEP-15 multi key transaction syntax

2022-07-22 Thread Caleb Rackliffe

Avi brought up an interesting point around NULLness checking in
CASSANDRA-17762 ...

In SQL, any comparison with NULL is NULL, which is interpreted as FALSE in
> a condition. To test for NULLness, you use IS NULL or IS NOT NULL. But LWT
> uses IF col = NULL as a NULLness test. This is likely to confuse people
> coming from SQL and hamper attempts to extend the dialect.


We can leave that Jira open to address what to do in the legacy LWT case,
but I'd support a SQL-congruent syntax here (IS NULL or IS NOT NULL), where
we have something closer to a blank slate.

Thoughts?

On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky  wrote:

> The new syntax looks great, and I’m really excited to see this coming
> together.
>
> One piece of feedback on the proposed syntax is around the use of “=“ as a
> declaration in addition to its current use as an equality operator in a
> WHERE clause and an assignment operator in an UPDATE:
>
> BEGIN TRANSACTION
>   LET car_miles = miles_driven, car_is_running = is_running FROM cars
>     WHERE model='pinto'
>   LET user_miles = miles_driven FROM users WHERE name='blake'
>   SELECT something else from some other table
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> This is supported in languages like PL/pgSQL, but in a normal SQL query this
> kind of local declaration is often expressed as an alias (SELECT col AS
> new_col), subquery alias (SELECT col) t, or common table expression (WITH t
> AS (SELECT col)).
>
> Here’s an example of an alternative to the proposed syntax that I’d find
> more readable:
>
> BEGIN TRANSACTION
>   WITH car_miles, car_is_running AS (SELECT miles_driven, is_running FROM
>     cars WHERE model='pinto'),
>   user_miles AS (SELECT miles_driven FROM users WHERE name='blake')
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> There’s also the option of naming the transaction like a subquery, and
> supporting LET via AS (this one I’m less sure about but wanted to propose
> anyway):
>
> BEGIN TRANSACTION t1
>   SELECT miles_driven AS t1.car_miles, is_running AS t1.car_is_running
>     FROM cars WHERE model='pinto';
>   SELECT miles_driven AS t1.user_miles FROM users WHERE name='blake';
>   IF NOT car_is_running THEN ABORT
>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
> COMMIT TRANSACTION
>
> This also has the benefit of resolving ambiguity in case of naming
> conflicts with existing (or future) column names.
>
> --
> Abe
>

Re: CEP-15 multi key transaction syntax

2022-06-30 Thread Abe Ratnofsky

The new syntax looks great, and I’m really excited to see this coming together.

One piece of feedback on the proposed syntax is around the use of “=” as a
declaration in addition to its current use as an equality operator in a WHERE 
clause and an assignment operator in an UPDATE:

BEGIN TRANSACTION
  LET car_miles = miles_driven, car_is_running = is_running FROM cars
    WHERE model='pinto'
  LET user_miles = miles_driven FROM users WHERE name='blake'
  SELECT something else from some other table
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

This is supported in languages like PL/pgSQL, but in a normal SQL query this
kind of local declaration is often expressed as an alias (SELECT col AS
new_col), subquery alias (SELECT col) t, or common table expression (WITH t AS
(SELECT col)).

Here’s an example of an alternative to the proposed syntax that I’d find more 
readable:

BEGIN TRANSACTION
  WITH car_miles, car_is_running AS (SELECT miles_driven, is_running FROM cars
    WHERE model='pinto'),
  user_miles AS (SELECT miles_driven FROM users WHERE name='blake')
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

There’s also the option of naming the transaction like a subquery, and 
supporting LET via AS (this one I’m less sure about but wanted to propose 
anyway):

BEGIN TRANSACTION t1
  SELECT miles_driven AS t1.car_miles, is_running AS t1.car_is_running
    FROM cars WHERE model='pinto';
  SELECT miles_driven AS t1.user_miles FROM users WHERE name='blake';
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

This also has the benefit of resolving ambiguity in case of naming conflicts 
with existing (or future) column names.

--
Abe

Re: CEP-15 multi key transaction syntax

2022-06-27 Thread Blake Eggleston

I think we’ve converged on a starting syntax. Are there any additional comments 
before I open a JIRA?

> On Jun 16, 2022, at 10:33 AM, Blake Eggleston  wrote:
> 
> I think in any scenario where the same cell is updated multiple times, the 
> last one would win. The final result for s3 in your example would be 2
> 
>> On Jun 16, 2022, at 10:31 AM, Jon Meredith wrote:
>> 
>> The reason I brought up static columns was for cases where multiple 
>> statements update them and there could be ambiguity.
>> 
>> CREATE TABLE tbl
>> (
>>   pk1 int,
>>   ck2 int,
>>   s3 int static,
>>   r4 int static,
>>   PRIMARY KEY (pk1, ck2)
>> );
>> 
>> BEGIN TRANSACTION
>> UPDATE tbl SET s3=1, r4=1 WHERE pk1=1 AND ck2=1;
>> UPDATE tbl SET s3=2, r4=2 WHERE pk1=1 AND ck2=2;
>> COMMIT TRANSACTION
>> 
>> What should the final value be for s3?
>> 
>> This makes me realize I don't understand how upsert statements that touch 
>> the same row would be applied in general within a transaction.
>> If the plan is for only-once-per-row within a transaction, then I think 
>> regular columns and static columns should be split into their own UPSERT 
>> statements.
>> 
>> On Thu, Jun 16, 2022 at 10:40 AM Benedict Elliott Smith wrote:
>> I like Postgres' approach of letting you declare an exceptional condition 
>> and failing if there is not precisely one result (though I would prefer to 
>> differentiate between 0 row->Null and 2 rows->first row), but once you 
>> permit coercing to NULL I think you have to then treat it like NULL and 
>> permit arithmetic (that itself yields NULL)
>> 
>> This is explicitly stipulated in ANSI SQL 92, in 6.12 <numeric value
>> expression>:
>> 
>> General Rules
>> 
>>  1) If the value of any <numeric primary> simply contained in a
>>     <numeric value expression> is the null value, then the result of
>>     the <numeric value expression> is the null value.
>> 
>> 
>> On 2022/06/16 16:02:33 Blake Eggleston wrote:
>> > Yeah I'd say NULL is fine for condition evaluation. Reference assignment 
>> > is a little trickier. Assigning null to a column seems ok, but we should 
>> > raise an exception if they're doing math or something that expects a 
>> > non-null value
>> > 
>> > > On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith wrote:
>> > > 
>> > > AFAICT that standard addresses server-side cursors, not the assignment 
>> > > of a query result to a variable. Could you point to where it addresses 
>> > > variable assignment?
>> > > 
>> > > Postgres has a similar concept, SELECT INTO[1], and it explicitly 
>> > > returns NULL if there are no result rows, unless STRICT is specified in 
>> > > which case an error is returned. My recollection is that T-SQL is also 
>> > > fine with coercing no results to NULL when assigning to a variable or 
>> > > using it in a sub-expression.
>> > > 
>> > > I'm in favour of expanding our functionality here, but I do not see 
>> > > anything fundamentally problematic about the proposal as it stands.
>> > > 
>> > > [1] 
>> > > https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
>> > >  
>> > > 
>> > > 
>> > > 
>> > > 
>> > > On 2022/06/13 14:52:41 Konstantin Osipov wrote:
>> > >> * bened...@apache.org [22/06/13 17:37]:
>> > >>> I believe that is a MySQL specific concept. This is one problem with 
>> > >>> mimicking SQL – it’s not one thing!
>> > >>> 
>> > >>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a 
>> > >>> NULL value submitted to a Boolean operator yields UNKNOWN.
>> > >>> 
>> > >>> IF (X) THEN Y does not run Y if X is UNKNOWN;
>> > >>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
>> > >>> 
>> > >>> So, I think we have evidence that it is fine to interpret NULL
>> > >>> as “false” for the evaluation of IF conditions.
>> > >> 
>> > >> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
>> > >> 
>> > >> In Cassandra results, there is no way to distinguish null values
>> > >> from absence of a row. Branching, thus, without being able to
>> > >> branch based on the absence of a row, whatever specific syntax
>> > >> is used for such branching, is incomplete. 
>> > >> 
>> > >> More broadly, SQL/PSM has exception and condition statements, not
>> > >> just IF statements.
>> > >> 
>> > >> -- 
>> > >> Konstantin Osipov, Moscow, Russia
>> > >> 
>> > 
>> > 
>

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Blake Eggleston

I think in any scenario where the same cell is updated multiple times, the last 
one would win. The final result for s3 in your example would be 2
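
As a rough illustration of that rule, here is a small Python sketch (not
Cassandra code). It assumes static cells are stored once per partition and
that statements inside the transaction are applied in the order written; the
apply_in_order() helper and the data layout are made up for the example.

```python
# Hypothetical model: static cells live once per partition, keyed by pk1.
def apply_in_order(updates):
    """Apply (pk1, assignments) pairs in statement order; later writes win."""
    static_cells = {}
    for pk1, assignments in updates:
        static_cells.setdefault(pk1, {}).update(assignments)
    return static_cells


updates = [
    (1, {"s3": 1, "r4": 1}),  # UPDATE tbl SET s3=1, r4=1 WHERE pk1=1 AND ck2=1
    (1, {"s3": 2, "r4": 2}),  # UPDATE tbl SET s3=2, r4=2 WHERE pk1=1 AND ck2=2
]

result = apply_in_order(updates)
```

Both statements target partition pk1=1, so the clustering key does not matter
for the static cells and the second assignment overwrites the first.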

> On Jun 16, 2022, at 10:31 AM, Jon Meredith  wrote:
> 
> The reason I brought up static columns was for cases where multiple 
> statements update them and there could be ambiguity.
> 
> CREATE TABLE tbl
> (
>   pk1 int,
>   ck2 int,
>   s3 int static,
>   r4 int static,
>   PRIMARY KEY (pk1, ck2)
> );
> 
> BEGIN TRANSACTION
> UPDATE tbl SET s3=1, r4=1 WHERE pk1=1 AND ck2=1;
> UPDATE tbl SET s3=2, r4=2 WHERE pk1=1 AND ck2=2;
> COMMIT TRANSACTION
> 
> What should the final value be for s3?
> 
> This makes me realize I don't understand how upsert statements that touch the 
> same row would be applied in general within a transaction.
> If the plan is for only-once-per-row within a transaction, then I think 
> regular columns and static columns should be split into their own UPSERT 
> statements.
> 
> On Thu, Jun 16, 2022 at 10:40 AM Benedict Elliott Smith wrote:
> I like Postgres' approach of letting you declare an exceptional condition and 
> failing if there is not precisely one result (though I would prefer to 
> differentiate between 0 row->Null and 2 rows->first row), but once you permit 
> coercing to NULL I think you have to then treat it like NULL and permit 
> arithmetic (that itself yields NULL)
> 
> This is explicitly stipulated in ANSI SQL 92, in 6.12 <numeric value
> expression>:
> 
> General Rules
> 
>  1) If the value of any <numeric primary> simply contained in a
>     <numeric value expression> is the null value, then the result of
>     the <numeric value expression> is the null value.
> 
> 
> On 2022/06/16 16:02:33 Blake Eggleston wrote:
> > Yeah I'd say NULL is fine for condition evaluation. Reference assignment is 
> > a little trickier. Assigning null to a column seems ok, but we should raise 
> > an exception if they're doing math or something that expects a non-null 
> > value
> > 
> > > On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith wrote:
> > > 
> > > AFAICT that standard addresses server-side cursors, not the assignment of 
> > > a query result to a variable. Could you point to where it addresses 
> > > variable assignment?
> > > 
> > > Postgres has a similar concept, SELECT INTO[1], and it explicitly returns 
> > > NULL if there are no result rows, unless STRICT is specified in which 
> > > case an error is returned. My recollection is that T-SQL is also fine 
> > > with coercing no results to NULL when assigning to a variable or using it 
> > > in a sub-expression.
> > > 
> > > I'm in favour of expanding our functionality here, but I do not see 
> > > anything fundamentally problematic about the proposal as it stands.
> > > 
> > > [1] 
> > > https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
> > >  
> > > 
> > > 
> > > 
> > > 
> > > On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> > >> * bened...@apache.org [22/06/13 17:37]:
> > >>> I believe that is a MySQL specific concept. This is one problem with 
> > >>> mimicking SQL – it’s not one thing!
> > >>> 
> > >>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
> > >>> value submitted to a Boolean operator yields UNKNOWN.
> > >>> 
> > >>> IF (X) THEN Y does not run Y if X is UNKNOWN;
> > >>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> > >>> 
> > >>> So, I think we have evidence that it is fine to interpret NULL
> > >>> as “false” for the evaluation of IF conditions.
> > >> 
> > >> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> > >> 
> > >> In Cassandra results, there is no way to distinguish null values
> > >> from absence of a row. Branching, thus, without being able to
> > >> branch based on the absence of a row, whatever specific syntax
> > >> is used for such branching, is incomplete. 
> > >> 
> > >> More broadly, SQL/PSM has exception and condition statements, not
> > >> just IF statements.
> > >> 
> > >> -- 
> > >> Konstantin Osipov, Moscow, Russia
> > >> 
> > 
> >

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Jon Meredith

The reason I brought up static columns was for cases where multiple
statements update them and there could be ambiguity.

CREATE TABLE tbl
(
  pk1 int,
  ck2 int,
  s3 int static,
  r4 int static,
  PRIMARY KEY (pk1, ck2)
);

BEGIN TRANSACTION
UPDATE tbl SET s3=1, r4=1 WHERE pk1=1 AND ck2=1;
UPDATE tbl SET s3=2, r4=2 WHERE pk1=1 AND ck2=2;
COMMIT TRANSACTION

What should the final value be for s3?

This makes me realize I don't understand how upsert statements that touch
the same row would be applied in general within a transaction.
If the plan is for only-once-per-row within a transaction, then I think
regular columns and static columns should be split into their own UPSERT
statements.

On Thu, Jun 16, 2022 at 10:40 AM Benedict Elliott Smith 
wrote:

> I like Postgres' approach of letting you declare an exceptional condition
> and failing if there is not precisely one result (though I would prefer to
> differentiate between 0 row->Null and 2 rows->first row), but once you
> permit coercing to NULL I think you have to then treat it like NULL and
> permit arithmetic (that itself yields NULL)
>
> This is explicitly stipulated in ANSI SQL 92, in 6.12 <numeric value
> expression>:
>
> General Rules
>
>  1) If the value of any <numeric primary> simply contained in a
>     <numeric value expression> is the null value, then the result of
>     the <numeric value expression> is the null value.
>
>
> On 2022/06/16 16:02:33 Blake Eggleston wrote:
> > Yeah I'd say NULL is fine for condition evaluation. Reference assignment
> is a little trickier. Assigning null to a column seems ok, but we should
> raise an exception if they're doing math or something that expects a
> non-null value
> >
> > > On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
> > >
> > > AFAICT that standard addresses server-side cursors, not the assignment
> of a query result to a variable. Could you point to where it addresses
> variable assignment?
> > >
> > > Postgres has a similar concept, SELECT INTO[1], and it explicitly
> returns NULL if there are no result rows, unless STRICT is specified in
> which case an error is returned. My recollection is that T-SQL is also fine
> with coercing no results to NULL when assigning to a variable or using it
> in a sub-expression.
> > >
> > > I'm in favour of expanding our functionality here, but I do not see
> anything fundamentally problematic about the proposal as it stands.
> > >
> > > [1]
> https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
> > >
> > >
> > >
> > > On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> > >> * bened...@apache.org  [22/06/13 17:37]:
> > >>> I believe that is a MySQL specific concept. This is one problem with
> mimicking SQL – it’s not one thing!
> > >>>
> > >>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a
> NULL value submitted to a Boolean operator yields UNKNOWN.
> > >>>
> > >>> IF (X) THEN Y does not run Y if X is UNKNOWN;
> > >>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> > >>>
> > >>> So, I think we have evidence that it is fine to interpret NULL
> > >>> as “false” for the evaluation of IF conditions.
> > >>
> > >> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> > >>
> > >> In Cassandra results, there is no way to distinguish null values
> > >> from absence of a row. Branching, thus, without being able to
> > >> branch based on the absence of a row, whatever specific syntax
> > >> is used for such branching, is incomplete.
> > >>
> > >> More broadly, SQL/PSM has exception and condition statements, not
> > >> just IF statements.
> > >>
> > >> --
> > >> Konstantin Osipov, Moscow, Russia
> > >>
> >
> >
>

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Benedict Elliott Smith

I like Postgres' approach of letting you declare an exceptional condition and 
failing if there is not precisely one result (though I would prefer to 
differentiate between 0 row->Null and 2 rows->first row), but once you permit 
coercing to NULL I think you have to then treat it like NULL and permit 
arithmetic (that itself yields NULL)

This is explicitly stipulated in ANSI SQL 92, in 6.12 <numeric value
expression>:

General Rules

 1) If the value of any <numeric primary> simply contained in a
    <numeric value expression> is the null value, then the result of
    the <numeric value expression> is the null value.
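
For anyone who wants to see the two behaviours side by side, here is a small
Python sketch (an illustration only, not a proposal for the implementation).
None stands in for NULL/UNKNOWN, and null_add(), null_eq() and if_then_else()
are hypothetical helpers: arithmetic on NULL yields NULL, a comparison with
NULL yields UNKNOWN, and IF only takes the THEN branch when the condition is
actually TRUE.

```python
from typing import Optional


def null_add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Addition where a NULL (None) operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a + b


def null_eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Equality where a NULL operand yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b


def if_then_else(cond: Optional[bool], then_value, else_value=None):
    """THEN runs only when cond is TRUE; FALSE and UNKNOWN both take ELSE."""
    return then_value if cond is True else else_value


# user_miles + 30 when the read found nothing: the sum is NULL, not an error.
new_miles = null_add(None, 30)

# IF row2_v = 3 THEN ... ELSE ... when row2_v is NULL: the ELSE branch is taken.
branch = if_then_else(null_eq(None, 3), "then", "else")
```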


On 2022/06/16 16:02:33 Blake Eggleston wrote:
> Yeah I'd say NULL is fine for condition evaluation. Reference assignment is a 
> little trickier. Assigning null to a column seems ok, but we should raise an 
> exception if they're doing math or something that expects a non-null value
> 
> > On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith  
> > wrote:
> > 
> > AFAICT that standard addresses server-side cursors, not the assignment of a 
> > query result to a variable. Could you point to where it addresses variable 
> > assignment?
> > 
> > Postgres has a similar concept, SELECT INTO[1], and it explicitly returns 
> > NULL if there are no result rows, unless STRICT is specified in which case 
> > an error is returned. My recollection is that T-SQL is also fine with 
> > coercing no results to NULL when assigning to a variable or using it in a 
> > sub-expression.
> > 
> > I'm in favour of expanding our functionality here, but I do not see 
> > anything fundamentally problematic about the proposal as it stands.
> > 
> > [1] 
> > https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
> > 
> > 
> > 
> > On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> >> * bened...@apache.org  [22/06/13 17:37]:
> >>> I believe that is a MySQL specific concept. This is one problem with 
> >>> mimicking SQL – it’s not one thing!
> >>> 
> >>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
> >>> value submitted to a Boolean operator yields UNKNOWN.
> >>> 
> >>> IF (X) THEN Y does not run Y if X is UNKNOWN;
> >>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> >>> 
> >>> So, I think we have evidence that it is fine to interpret NULL
> >>> as “false” for the evaluation of IF conditions.
> >> 
> >> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> >> 
> >> In Cassandra results, there is no way to distinguish null values
> >> from absence of a row. Branching, thus, without being able to
> >> branch based on the absence of a row, whatever specific syntax
> >> is used for such branching, is incomplete. 
> >> 
> >> More broadly, SQL/PSM has exception and condition statements, not
> >> just IF statements.
> >> 
> >> -- 
> >> Konstantin Osipov, Moscow, Russia
> >> 
> 
>

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Blake Eggleston

Yeah I'd say NULL is fine for condition evaluation. Reference assignment is a 
little trickier. Assigning null to a column seems ok, but we should raise an 
exception if they're doing math or something that expects a non-null value

> On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith  
> wrote:
> 
> AFAICT that standard addresses server-side cursors, not the assignment of a 
> query result to a variable. Could you point to where it addresses variable 
> assignment?
> 
> Postgres has a similar concept, SELECT INTO[1], and it explicitly returns 
> NULL if there are no result rows, unless STRICT is specified in which case an 
> error is returned. My recollection is that T-SQL is also fine with coercing 
> no results to NULL when assigning to a variable or using it in a 
> sub-expression.
> 
> I'm in favour of expanding our functionality here, but I do not see anything 
> fundamentally problematic about the proposal as it stands.
> 
> [1] 
> https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
> 
> 
> 
> On 2022/06/13 14:52:41 Konstantin Osipov wrote:
>> * bened...@apache.org  [22/06/13 17:37]:
>>> I believe that is a MySQL specific concept. This is one problem with 
>>> mimicking SQL – it’s not one thing!
>>> 
>>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
>>> value submitted to a Boolean operator yields UNKNOWN.
>>> 
>>> IF (X) THEN Y does not run Y if X is UNKNOWN;
>>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
>>> 
>>> So, I think we have evidence that it is fine to interpret NULL
>>> as “false” for the evaluation of IF conditions.
>> 
>> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
>> 
>> In Cassandra results, there is no way to distinguish null values
>> from absence of a row. Branching, thus, without being able to
>> branch based on the absence of a row, whatever specific syntax
>> is used for such branching, is incomplete. 
>> 
>> More broadly, SQL/PSM has exception and condition statements, not
>> just IF statements.
>> 
>> -- 
>> Konstantin Osipov, Moscow, Russia
>>

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Benedict Elliott Smith

AFAICT that standard addresses server-side cursors, not the assignment of a 
query result to a variable. Could you point to where it addresses variable 
assignment?

Postgres has a similar concept, SELECT INTO[1], and it explicitly returns NULL 
if there are no result rows, unless STRICT is specified in which case an error 
is returned. My recollection is that T-SQL is also fine with coercing no 
results to NULL when assigning to a variable or using it in a sub-expression.
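
A rough Python model of that PL/pgSQL behaviour, as I read the linked page:
without STRICT the target gets the first row, or NULL if there are none; with
STRICT anything other than exactly one row is an error. The select_into()
function and the exception classes are invented names for this sketch, not a
Postgres or Cassandra API.

```python
from typing import Optional, Sequence


class NoDataFound(Exception):
    """Stand-in for the error raised when STRICT sees zero rows."""


class TooManyRows(Exception):
    """Stand-in for the error raised when STRICT sees more than one row."""


def select_into(rows: Sequence[int], strict: bool = False) -> Optional[int]:
    """Model assigning a query result to a single variable."""
    if strict:
        if len(rows) == 0:
            raise NoDataFound("query returned no rows")
        if len(rows) > 1:
            raise TooManyRows("query returned more than one row")
        return rows[0]
    # Non-strict: first row wins, and no rows is coerced to NULL (None).
    return rows[0] if rows else None


no_rows = select_into([])           # None
first_of_two = select_into([7, 8])  # 7
```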

I'm in favour of expanding our functionality here, but I do not see anything 
fundamentally problematic about the proposal as it stands.

[1] 
https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW

On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> * bened...@apache.org  [22/06/13 17:37]:
> > I believe that is a MySQL specific concept. This is one problem with 
> > mimicking SQL – it’s not one thing!
> > 
> > In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
> > value submitted to a Boolean operator yields UNKNOWN.
> > 
> > IF (X) THEN Y does not run Y if X is UNKNOWN;
> > IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> > 
> > So, I think we have evidence that it is fine to interpret NULL
> > as “false” for the evaluation of IF conditions.
> 
> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> 
> In Cassandra results, there is no way to distinguish null values
> from absence of a row. Branching, thus, without being able to
> branch based on the absence of a row, whatever specific syntax
> is used for such branching, is incomplete. 
> 
> More broadly, SQL/PSM has exception and condition statements, not
> just IF statements.
> 
> -- 
> Konstantin Osipov, Moscow, Russia
>

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Blake Eggleston

I see what you mean. We have the EXISTS/NOT EXISTS condition for explicitly
checking for the existence of a row. One thing the old syntax did define is
how to handle references to columns that don't exist: previously, if any
column reference didn't resolve, the update wouldn't apply. With the new
syntax, if we want to be able to use multiple branches, that's going to be
more difficult, since taking the 'ELSE' path may not make sense from an
application perspective. So returning an exception in that case might be the
right thing to do.

> On Jun 15, 2022, at 2:18 PM, Konstantin Osipov  
> wrote:
> 
> * bened...@apache.org  [22/06/16 00:00]:
>> First some history: static rows are an efficiency sop to those
>> who migrated from the historical wide row world, where you could
>> have “global” partition state fetched with every query, and to
>> support the deprecation of thrift and its horrible data model
>> something needed to give – static rows were the result.
>> 
>> However, is the concept generally consistent? I think so. At
>> least, your examples seem fine to me, and I can’t see how they
>> violate the “relational model” (whatever that may be). If it
>> helps, you can think of the static columns actually creating a
>> second table, so that you now have two separate tables with the
>> same partition key. These tables are implicitly related via a
>> “full outer join” on the partition key, and you can imagine that
>> you are generally querying a view of this relation.
> 
> This is a model I haven't pondered yet. 
> 
>> In this case, you would expect the outcome you see AFAICT. If
>> you have no restriction on the results, and you have no regular
>> rows and one static row, you would see a single static row
>> result with null regular columns (and a count of 1 row). If you
>> imposed a restriction on regular columns, you would not see the
>> static column as the null regular columns would not match the
>> condition.
>> 
>>> In LWT, a static row appears to exist when there is no regular row matching 
>>> WHERE
>> 
>> I assume you mean the IF clause matches against a static row if
>> you UPDATE tbl SET v = a WHERE p = b IF s = c. This could be an
>> inconsistency, but I think it is not. Recall, UPDATE in CQL is
>> not UPDATE in SQL. SQL would do nothing if the row doesn’t
>> exist, whatever the IF clause might say. CQL is really
>> performing UPSERT.
>> 
>> So, what happens when the WHERE clause doesn’t match a primary
>> key with UPSERT? A row is created. In this case, if you consider
>> that this empty nascent row is used to join with the static
>> “table” for evaluating the IF condition, to decide what you
>> UPSERT, then it all makes sense – to me, anyway.
>> 
>>> NULLs are first-class values, distinguishable from unset values
>> 
>> Could you give an example?
> 
> In SQL, if you FETCH into a VARIABLE and there is no matching row, 
> it won't quietly fill your variable with NULLs or a static cells,
> and leave you wondering what to do next. FETCH will RAISE NOT
> FOUND condition, a kind of exception you can handle separately. 
> Totally different in Cassandra where NULL is a deletion marker and
> NULLs are indistinguishable from missing values.
> 
> -- 
> Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Konstantin Osipov

* bened...@apache.org  [22/06/16 00:00]:
> First some history: static rows are an efficiency sop to those
> who migrated from the historical wide row world, where you could
> have “global” partition state fetched with every query, and to
> support the deprecation of thrift and its horrible data model
> something needed to give – static rows were the result.
> 
> However, is the concept generally consistent? I think so. At
> least, your example seem fine to me, and I can’t see how they
> violate the “relational model” (whatever that may be). If it
> helps, you can think of the static columns actually creating a
> second table, so that you now have two separate tables with the
> same partition key. These tables are implicitly related via a
> “full outer join” on the partition key, and you can imagine that
> you are generally querying a view of this relation.

This is a model I haven't pondered yet. 

> In this case, you would expect the outcome you see AFAICT. If
> you have no restriction on the results, and you have no regular
> rows and one static row, you would see a single static row
> result with null regular columns (and a count of 1 row). If you
> imposed a restriction on regular columns, you would not see the
> static column as the null regular columns would not match the
> condition.
> 
> > In LWT, a static row appears to exist when there is no regular row matching 
> > WHERE
> 
> I assume you mean the IF clause matches against a static row if
> you UPDATE tbl SET v = a WHERE p = b IF s = c. This could be an
> inconsistency, but I think it is not. Recall, UPDATE in CQL is
> not UPDATE in SQL. SQL would do nothing if the row doesn’t
> exist, whatever the IF clause might say. CQL is really
> performing UPSERT.
> 
> So, what happens when the WHERE clause doesn’t match a primary
> key with UPSERT? A row is created. In this case, if you consider
> that this empty nascent row is used to join with the static
> “table” for evaluating the IF condition, to decide what you
> UPSERT, then it all makes sense – to me, anyway.
> 
> > NULLs are first-class values, distinguishable from unset values
> 
> Could you give an example?

In SQL, if you FETCH into a VARIABLE and there is no matching row, 
it won't quietly fill your variable with NULLs or static cells
and leave you wondering what to do next. FETCH will raise a NOT
FOUND condition, a kind of exception you can handle separately. 
This is totally different in Cassandra, where NULL is a deletion marker and
NULLs are indistinguishable from missing values.
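
To make the distinction concrete, here is a minimal Python sketch. It is only an illustration of the argument, not SQL/PSM or Cassandra code; the names NotFound, fetch_sql_style and fetch_null_style are hypothetical, and a plain dict stands in for a table.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a SQL NOT FOUND condition."""


def fetch_sql_style(rows, key):
    # A missing row raises a separate condition;
    # an existing row whose value is NULL simply returns None.
    if key not in rows:
        raise NotFound(key)
    return rows[key]


def fetch_null_style(rows, key):
    # A missing row and a NULL value both come back as None,
    # so the caller cannot tell them apart.
    return rows.get(key)


# Row "a" exists and holds NULL; row "b" does not exist at all.
rows = {"a": None}
```

With fetch_null_style both lookups return None, whereas fetch_sql_style returns None for "a" and raises NotFound for "b", which is the case a handler could branch on.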

-- 
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Konstantin Osipov

* bened...@apache.org  [22/06/13 17:37]:
> I believe that is a MySQL specific concept. This is one problem with 
> mimicking SQL – it’s not one thing!
> 
> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL value 
> submitted to a Boolean operator yields UNKNOWN.
> 
> IF (X) THEN Y does not run Y if X is UNKNOWN;
> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> 
> So, I think we have evidence that it is fine to interpret NULL
> as “false” for the evaluation of IF conditions.

NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 

In Cassandra results, there is no way to distinguish null values
from absence of a row. Branching, thus, without being able to
branch based on the absence of a row, whatever specific syntax
is used for such branching, is incomplete. 

More broadly, SQL/PSM has exception and condition statements, not
just IF statements.
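
For reference, the T-SQL rule quoted at the top of this message can be sketched in a few lines of Python. This is an assumption-laden illustration, not any vendor's implementation: None stands in for both NULL and UNKNOWN, and eq3 and run_if are hypothetical helper names.

```python
def eq3(a, b):
    # Three-valued equality: any NULL operand yields UNKNOWN (None).
    if a is None or b is None:
        return None
    return a == b


def run_if(cond, then_branch, else_branch=None):
    # THEN runs only when the condition is strictly TRUE.
    if cond is True:
        return then_branch()
    # FALSE and UNKNOWN both fall through to ELSE, if there is one.
    if else_branch is not None:
        return else_branch()
    return None
```

Note that nothing in this evaluation tells the caller whether the NULL came from a null column or from a row that does not exist, which is the gap described above.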

-- 
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-15 Thread bened...@apache.org

Ok, so I am not a huge fan of static rows, but I disagree with your analysis.

First some history: static rows are an efficiency sop to those who migrated 
from the historical wide row world, where you could have “global” partition 
state fetched with every query, and to support the deprecation of thrift and 
its horrible data model something needed to give – static rows were the result.

However, is the concept generally consistent? I think so. At least, your 
examples seem fine to me, and I can’t see how they violate the “relational 
model” (whatever that may be). If it helps, you can think of the static columns 
actually creating a second table, so that you now have two separate tables with 
the same partition key. These tables are implicitly related via a “full outer 
join” on the partition key, and you can imagine that you are generally querying 
a view of this relation.

In this case, you would expect the outcome you see AFAICT. If you have no 
restriction on the results, and you have no regular rows and one static row, 
you would see a single static row result with null regular columns (and a count 
of 1 row). If you imposed a restriction on regular columns, you would not see 
the static column as the null regular columns would not match the condition.
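
A minimal Python sketch of this mental model, assuming the table t (p, c, r, s static) from the example quoted below. It models the "two tables joined on the partition key" view only; it is not how Cassandra stores or reads data, and query_view is a hypothetical name.

```python
def query_view(static_rows, regular_rows, regular_filter=None):
    """Full outer join of the static 'table' and the regular 'table' on p.

    static_rows:  dict mapping partition key p -> static value s
    regular_rows: list of dicts with keys "p", "c", "r"
    """
    partitions = set(static_rows) | {row["p"] for row in regular_rows}
    view = []
    for p in sorted(partitions):
        s = static_rows.get(p)
        matches = [row for row in regular_rows if row["p"] == p]
        if matches:
            for row in matches:
                view.append({"p": p, "c": row["c"], "r": row["r"], "s": s})
        else:
            # Static row only: regular columns are null in the joined view.
            view.append({"p": p, "c": None, "r": None, "s": s})
    if regular_filter is not None:
        # A restriction on regular columns never matches null columns.
        view = [row for row in view if regular_filter(row)]
    return view


static_only = query_view({1: 1}, [])
restricted = query_view({1: 1}, [], lambda row: row["c"] == 1)
```

Here static_only holds one row with null c and r (so a count of 1), and restricted is empty because the null clustering column cannot satisfy c = 1.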

> In LWT, a static row appears to exist when there is no regular row matching 
> WHERE

I assume you mean the IF clause matches against a static row if you UPDATE tbl 
SET v = a WHERE p = b IF s = c. This could be an inconsistency, but I think it 
is not. Recall, UPDATE in CQL is not UPDATE in SQL. SQL would do nothing if the 
row doesn’t exist, whatever the IF clause might say. CQL is really performing 
UPSERT.

So, what happens when the WHERE clause doesn’t match a primary key with UPSERT? 
A row is created. In this case, if you consider that this empty nascent row is 
used to join with the static “table” for evaluating the IF condition, to decide 
what you UPSERT, then it all makes sense – to me, anyway.
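
The same reading, sketched in Python under stated assumptions: the table has a partition key p, a clustering key c, a regular column v and a static column s, and conditional_upsert is a hypothetical helper that models the semantics described above rather than the actual LWT code path.

```python
def conditional_upsert(static_rows, regular_rows, p, c, new_v, expected_s):
    """Model of: UPDATE tbl SET v = new_v WHERE p = .. AND c = .. IF s = expected_s

    static_rows:  dict mapping p -> static value s
    regular_rows: dict mapping (p, c) -> {"v": value}
    Returns True if the update was applied.
    """
    # UPSERT: if no row matches the primary key, start from an empty nascent row.
    nascent = dict(regular_rows.get((p, c), {"v": None}))
    # Join the (possibly nascent) row with the static "table" on the partition key.
    joined = {**nascent, "s": static_rows.get(p)}
    # The IF condition is evaluated against the joined row.
    if joined["s"] != expected_s:
        return False
    regular_rows[(p, c)] = {"v": new_v}
    return True
```

Under this model the condition can succeed even though no regular row existed beforehand, because the nascent row still joins with the static value.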

> NULLs are first-class values, distinguishable from unset values

Could you give an example?


From: Konstantin Osipov 
Date: Wednesday, 15 June 2022 at 20:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
* bened...@apache.org  [22/06/15 18:38]:
> I expect LET to behave like SELECT, and I don’t expect this work to modify 
> the behaviour of normal CQL expressions. Do you think there is something 
> wrong or inconsistent about the behaviours you mention?
>
> Static columns are a bit weird, but at the very least the following would 
> permit the user to reliably obtain a static value, if it exists:
>
> LET x = some_static_column FROM table WHERE partitionKey = someKey LIMIT 1
>
> This could be mixed with a clustering key query
>
> LET y = some_regular_column FROM table WHERE partitionKey = someKey AND 
> clusteringKey = someOtherKey

I think static rows should not be selectable outside clustering
rows. This violates relational model. Unfortunately currently they
sometimes are.

Here's an example:


> create table t (p int, c int, r int, s int static, primary key(p, c));
OK
> insert into t (p, s) values (1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- that's right, there is a row now; what row though?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+
> -- let's add more rows
> insert into t (p, c, s) values (1,1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- we did not add more rows?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+

In LWT, a static row appears to exist when there is no regular row
matching WHERE. It would be nice to somehow either be consistent
in LET with existing SELECTs, or, so to speak, be consistently
inconsistent, i.e. consistent with some other vendor, and not come
up with a whole new semantics for static rows, different from LWT
and SELECTs.

This is why I was making all these comments about missing rows
-there is no incongruence in classic SQL, any vendor, because a)
there are no static rows b) NULLs are first-class values,
distinguishable from unset values.


--
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-15 Thread Konstantin Osipov

* bened...@apache.org  [22/06/15 18:38]:
> I expect LET to behave like SELECT, and I don’t expect this work to modify 
> the behaviour of normal CQL expressions. Do you think there is something 
> wrong or inconsistent about the behaviours you mention?
> 
> Static columns are a bit weird, but at the very least the following would 
> permit the user to reliably obtain a static value, if it exists:
> 
> LET x = some_static_column FROM table WHERE partitionKey = someKey LIMIT 1
> 
> This could be mixed with a clustering key query
> 
> LET y = some_regular_column FROM table WHERE partitionKey = someKey AND 
> clusteringKey = someOtherKey

I think static rows should not be selectable outside clustering
rows. This violates the relational model. Unfortunately, they
currently sometimes are. 

Here's an example:


> create table t (p int, c int, r int, s int static, primary key(p, c));
OK
> insert into t (p, s) values (1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- that's right, there is a row now; what row though?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+
> -- let's add more rows
> insert into t (p, c, s) values (1,1,1) if not exists;
+-------------+------+------+------+------+
| [applied]   | p    | c    | s    | r    |
|-------------+------+------+------+------|
| True        | null | null | null | null |
+-------------+------+------+------+------+
> -- we did not add more rows?
> select count(*) from t;
+---------+
|   count |
|---------|
|       1 |
+---------+

In LWT, a static row appears to exist when there is no regular row
matching WHERE. It would be nice to somehow either be consistent
in LET with existing SELECTs, or, so to speak, be consistently
inconsistent, i.e. consistent with some other vendor, and not come
up with a whole new semantics for static rows, different from LWT
and SELECTs.

This is why I was making all these comments about missing rows:
there is no incongruence in classic SQL, from any vendor, because a)
there are no static rows and b) NULLs are first-class values,
distinguishable from unset values.


-- 
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-15 Thread bened...@apache.org

I expect LET to behave like SELECT, and I don’t expect this work to modify the 
behaviour of normal CQL expressions. Do you think there is something wrong or 
inconsistent about the behaviours you mention?

Static columns are a bit weird, but at the very least the following would 
permit the user to reliably obtain a static value, if it exists:

LET x = some_static_column FROM table WHERE partitionKey = someKey LIMIT 1

This could be mixed with a clustering key query

LET y = some_regular_column FROM table WHERE partitionKey = someKey AND 
clusteringKey = someOtherKey


From: Konstantin Osipov 
Date: Wednesday, 15 June 2022 at 14:04
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
* bened...@apache.org  [22/06/15 10:00]:
> It sounds like we’re zeroing in on a solution.
>
> To draw attention back to Jon’s email, I think the last open question at this 
> point is the scope of identifiers declared by LET, and how we handle name 
> clashes with table columns in an UPDATE.
>
> I think we have basically two options:
>
> 1. Require LET for all input parameters to an assignment in UPDATE
> 2. Add some additional syntax to local variables to identify them, e.g. 
> 


I'm curious, regardless of the syntax you choose, will LET or
SELECT return the static row if there is no match for the
clustering key, or return NULL row?

I am asking because SELECT currently does not return any rows if
there is no clustering key matching the WHERE clause, but a conditional UPDATE
chooses the static row to check conditions instead, if it's present.

--
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-15 Thread Konstantin Osipov

* bened...@apache.org  [22/06/15 10:00]:
> It sounds like we’re zeroing in on a solution.
> 
> To draw attention back to Jon’s email, I think the last open question at this 
> point is the scope of identifiers declared by LET, and how we handle name 
> clashes with table columns in an UPDATE.
> 
> I think we have basically two options:
> 
> 1. Require LET for all input parameters to an assignment in UPDATE
> 2. Add some additional syntax to local variables to identify them, e.g. 
> 


I'm curious, regardless of the syntax you choose, will LET or
SELECT return the static row if there is no match for the
clustering key, or return a NULL row?

I am asking because SELECT currently does not return any rows if
there is no clustering key matching the WHERE clause, but a conditional UPDATE
chooses the static row to check conditions instead, if it's present.

-- 
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread bened...@apache.org

+1

From: Blake Eggleston 
Date: Tuesday, 14 June 2022 at 21:46
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
I'd lean towards 3, where the statement doesn't parse because `miles` is 
ambiguous

On Jun 14, 2022, at 1:40 PM, bened...@apache.org wrote:

To be clear, the concerning situation is

BEGIN TRANSACTION
  LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  IF running THEN
    UPDATE cars SET miles_driven = miles + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION

But where there’s some additional column also called miles in cars

From: bened...@apache.org
Date: Tuesday, 14 June 2022 at 21:37
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Duplicate declarations are usually rejected by languages, so I think that’s 
fine?

Option 1 would involve something like

BEGIN TRANSACTION
  LET car_miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  LET user_miles = miles_driven FROM users WHERE name='blake'
  SELECT running, car_miles, user_miles
  IF running THEN
    UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
    UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 21:27
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Just to make sure I'm understanding correctly, I've been thinking of LET like a 
variable declaration and assignment, but is that the right mental model? For 
example, this is a valid statement:

BEGIN TRANSACTION
  LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  SELECT running, miles   # let the user know if the transaction takes any action
  IF running THEN
    UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
    UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION

But this isn't, because we're trying to bind to "miles" twice

BEGIN TRANSACTION
  LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  LET miles = miles_driven FROM users WHERE name='blake' # duplicate binding for "miles"
  SELECT running, miles   # let the user know if the transaction takes any action
  IF running THEN
    UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
    UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION

I think that's option #1, but I'm a little confused now that I'm looking at 
some of the examples.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:58 PM bened...@apache.org wrote:
It sounds like we’re zeroing in on a solution.

To draw attention back to Jon’s email, I think the last open question at this 
point is the scope of identifiers declared by LET, and how we handle name 
clashes with table columns in an UPDATE.

I think we have basically two options:

1. Require LET for all input parameters to an assignment in UPDATE
2. Add some additional syntax to local variables to identify them, e.g. 

Any other ideas?

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 20:31
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Sorry, that was in reference to the "Would you require a LIMIT 1 clause if the 
key did not fully specify a row?" question, so I think we're in agreement here.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org wrote:
> It seems like we would want to start with restrictions on number of rows, 
> uniqueness, homogeneity of results, etc

I am not keen on any hard limit on the number of rows, I anticipate a 
configurable guardrail for rejecting queries that are too expensive. I think 
the normal CQL restrictions are likely to apply (must include partition key), 
plus (initially) no range scans, and the aforementioned restrictions on what 
order statements must occur in the transaction.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 18:42
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of an

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread Derek Chen-Becker

OK, that makes sense. One of the examples in an earlier email had duplicate
"LET miles =" so I was confused. I think failing in the face of ambiguous
identifiers is going to be more friendly by not requiring LET for every
field you might want to use, and we can provide a very clear error message
in that case.
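
A rough Python sketch of that fail-fast behaviour, assuming a resolver that sees both the LET bindings and the target table's columns. None of this is the CEP-15 implementation; declare, resolve and the two exception names are hypothetical.

```python
class DuplicateBinding(Exception):
    """Raised when the same LET name is declared twice."""


class AmbiguousIdentifier(Exception):
    """Raised when a name is both a LET binding and a table column."""


def declare(bindings, name, source):
    # Reject a second LET for the same name, as most languages do.
    if name in bindings:
        raise DuplicateBinding(f"'{name}' is already bound by LET")
    bindings[name] = source


def resolve(name, bindings, table_columns):
    # Decide what an identifier on the right-hand side of SET refers to.
    in_let = name in bindings
    in_table = name in table_columns
    if in_let and in_table:
        raise AmbiguousIdentifier(
            f"'{name}' is both a LET variable and a column; rename one of them"
        )
    if in_let:
        return ("let", name)
    if in_table:
        return ("column", name)
    raise LookupError(f"unknown identifier '{name}'")


bindings = {}
declare(bindings, "miles", "cars.miles_driven")
declare(bindings, "running", "cars.is_running")
cars_columns = {"model", "miles_driven", "is_running"}
```

With these inputs, resolve("miles", bindings, cars_columns) yields a LET reference and resolve("miles_driven", ...) yields a column. If a column called miles were later added to cars, the same statement would fail with AmbiguousIdentifier instead of silently changing meaning.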

Cheers,

Derek

On Tue, Jun 14, 2022 at 2:40 PM bened...@apache.org 
wrote:

> To be clear, the concerning situation is
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE
> model=’pinto’
>
>   IF running THEN
>
> UPDATE cars SET miles_driven = miles + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> But where there’s some additional column also called *miles* in *cars*
>
>
>
>
>
> *From: *bened...@apache.org 
> *Date: *Tuesday, 14 June 2022 at 21:37
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Duplicate declarations are usually rejected by languages, so I think
> that’s fine?
>
>
>
> Option 1 would involve something like
>
>
>
> BEGIN TRANSACTION
>
>   LET car_miles = miles_driven, running=is_running FROM cars WHERE
> model=’pinto’
>
>   LET user_miles = miles_driven FROM users WHERE name=’blake’
>
>   SELECT running, car_miles, user_miles
>
>   IF running THEN
>
> UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
> UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Tuesday, 14 June 2022 at 21:27
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Just to make sure I'm understanding correctly, I've been thinking of LET
> like a variable declaration and assignment, but is that the right mental
> model? For example, this is a valid statement:
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE
> model=’pinto’
>
>   SELECT running, miles   # let the user know if the transaction takes any
> action
>
>   IF running THEN
>
> UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
> UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> But this isn't, because we're trying to bind to "miles" twice
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE
> model=’pinto’
>
>   LET miles = miles_driven FROM users WHERE name=’blake’ # duplicate
> binding for "miles"
>
>   SELECT running, miles   # let the user know if the transaction takes any
> action
>
>   IF running THEN
>
> UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
> UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> I think that's option #1, but I'm a little confused now that I'm looking
> at some of the examples.
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Tue, Jun 14, 2022 at 1:58 PM bened...@apache.org 
> wrote:
>
> It sounds like we’re zeroing in on a solution.
>
>
>
> To draw attention back to Jon’s email, I think the last open question at
> this point is the scope of identifiers declared by LET, and how we handle
> name clashes with table columns in an UPDATE.
>
>
>
> I think we have basically two options:
>
>
>
> 1. Require LET for all input parameters to an assignment in UPDATE
>
> 2. Add some additional syntax to local variables to identify them, e.g.
> 
>
>
>
> Any other ideas?
>
>
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Tuesday, 14 June 2022 at 20:31
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Sorry, that was in reference to the "Would you require a LIMIT 1 clause if
> the key did not fully specify a row?" question, so I think we're in
> agreement here.
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org 
> wrote:
>
> > It seems like we would want to start with restrictions on number of
> rows, uniqueness, homogeneity of results, etc
>
>
>
> I am not keen on any hard limit on the number of rows, I anticipate a
> configurable guardrail for rejecting queries that are too expensive. I
> think the normal CQL restrictions are likely to apply (must include
> partition key), plus (initially) no range scans, and th

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread Derek Chen-Becker

Just to make sure I'm understanding correctly, I've been thinking of LET
like a variable declaration and assignment, but is that the right mental
model? For example, this is a valid statement:

BEGIN TRANSACTION
  LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  SELECT running, miles   # let the user know if the transaction takes any action
  IF running THEN
    UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
    UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION

But this isn't, because we're trying to bind to "miles" twice

BEGIN TRANSACTION
  LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
  LET miles = miles_driven FROM users WHERE name='blake' # duplicate binding for "miles"
  SELECT running, miles   # let the user know if the transaction takes any action
  IF running THEN
    UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
    UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
  END IF
COMMIT TRANSACTION


I think that's option #1, but I'm a little confused now that I'm looking at
some of the examples.


Cheers,


Derek

On Tue, Jun 14, 2022 at 1:58 PM bened...@apache.org 
wrote:

> It sounds like we’re zeroing in on a solution.
>
>
>
> To draw attention back to Jon’s email, I think the last open question at
> this point is the scope of identifiers declared by LET, and how we handle
> name clashes with table columns in an UPDATE.
>
>
>
> I think we have basically two options:
>
>
>
> 1. Require LET for all input parameters to an assignment in UPDATE
>
> 2. Add some additional syntax to local variables to identify them, e.g.
> 
>
>
>
> Any other ideas?
>
>
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Tuesday, 14 June 2022 at 20:31
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Sorry, that was in reference to the "Would you require a LIMIT 1 clause if
> the key did not fully specify a row?" question, so I think we're in
> agreement here.
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org 
> wrote:
>
> > It seems like we would want to start with restrictions on number of
> rows, uniqueness, homogeneity of results, etc
>
>
>
> I am not keen on any hard limit on the number of rows, I anticipate a
> configurable guardrail for rejecting queries that are too expensive. I
> think the normal CQL restrictions are likely to apply (must include
> partition key), plus (initially) no range scans, and the aforementioned
> restrictions on what order statements must occur in the transaction.
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Tuesday, 14 June 2022 at 18:42
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> "MIXED" means, "hey, this might not be my standard PGSQL transaction" :)
>
>
>
> I do think that surprise is a meaningful measure, from the perspective of
> an individual developer coming to Cassandra from any arbitrary RDBMS. My
> own experience is that a non-trivial number of developers are essentially
> blindly following guidance given to them by someone else when it comes to
> features like transactions, so making syntax that looks superficially
> similar to SQL transactions but acts subtly different (or uses slightly
> different syntax) is going to be surprising. I think we get diminishing
> marginal returns on "it looks just like SQL!" when we start to venture
> further into territory where even different RDBMSs disagree. I would rather
> use some syntax that is clearly Cassandra-specific, even if the structure
> would be similar to a SQL transaction, just to ensure that developers
> understand that it's different and actually look at the docs.
>
>
>
> I completely agree on focusing on clarity and consistency, and I think
> considering how we think it might evolve is good, but that can't be an
> open-ended exercise. My primary concern is how we can start getting
> incremental improvements into end users' hands more quickly, since the
> alternative right now is to basically roll your own, right?
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org 
> wrote:
>
> What on earth does MIXED mean?
>
>
>
> I agree with the sentiment we should minimise surprise, but everyone is
> surprised differently so it becomes a sort of pointless rubric, everyone
> claiming it supports their view. I think it is only useful in cases where
> there is clear agreement that so

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread bened...@apache.org

(or 3. Let schema updates break the statement – this might actually be 
preferable, so long as it fails-fast rather than corrupts behaviour)

From: bened...@apache.org 
Date: Tuesday, 14 June 2022 at 20:58
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
It sounds like we’re zeroing in on a solution.

To draw attention back to Jon’s email, I think the last open question at this 
point is the scope of identifiers declared by LET, and how we handle name 
clashes with table columns in an UPDATE.

I think we have basically two options:

1. Require LET for all input parameters to an assignment in UPDATE
2. Add some additional syntax to local variables to identify them, e.g. 

Any other ideas?

From: Derek Chen-Becker 
Date: Tuesday, 14 June 2022 at 20:31
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Sorry, that was in reference to the "Would you require a LIMIT 1 clause if the 
key did not fully specify a row?" question, so I think we're in agreement here.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org wrote:
> It seems like we would want to start with restrictions on number of rows, 
> uniqueness, homogeneity of results, etc

I am not keen on any hard limit on the number of rows, I anticipate a 
configurable guardrail for rejecting queries that are too expensive. I think 
the normal CQL restrictions are likely to apply (must include partition key), 
plus (initially) no range scans, and the aforementioned restrictions on what 
order statements must occur in the transaction.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 18:42
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of an 
individual developer coming to Cassandra from any arbitrary RDBMS. My own 
experience is that a non-trivial number of developers are essentially blindly 
following guidance given to them by someone else when it comes to features like 
transactions, so making syntax that looks superficially similar to SQL 
transactions but acts subtly different (or uses slightly different syntax) is 
going to be surprising. I think we get diminishing marginal returns on "it 
looks just like SQL!" when we start to venture further into territory where 
even different RDBMSs disagree. I would rather use some syntax that is clearly 
Cassandra-specific, even if the structure would be similar to a SQL 
transaction, just to ensure that developers understand that it's different and 
actually look at the docs.

I completely agree on focusing on clarity and consistency, and I think 
considering how we think it might evolve is good, but that can't be an 
open-ended exercise. My primary concern is how we can start getting incremental 
improvements into end users' hands more quickly, since the alternative right 
now is to basically roll your own, right?

Cheers,

Derek

On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org wrote:
What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently so it becomes a sort of pointless rubric, everyone 
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek

On Jun 13, 2022, at 12:14 PM, Blake Eggleston 
mailto:beggl

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread bened...@apache.org

It sounds like we’re zeroing in on a solution.

To draw attention back to Jon’s email, I think the last open question at this 
point is the scope of identifiers declared by LET, and how we handle name 
clashes with table columns in an UPDATE.

I think we have basically two options:

1. Require LET for all input parameters to an assignment in UPDATE
2. Add some additional syntax to local variables to identify them, e.g. 

Any other ideas?

From: Derek Chen-Becker 
Date: Tuesday, 14 June 2022 at 20:31
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Sorry, that was in reference to the "Would you require a LIMIT 1 clause if the 
key did not fully specify a row?" question, so I think we're in agreement here.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org wrote:
> It seems like we would want to start with restrictions on number of rows, 
> uniqueness, homogeneity of results, etc

I am not keen on any hard limit on the number of rows, I anticipate a 
configurable guardrail for rejecting queries that are too expensive. I think 
the normal CQL restrictions are likely to apply (must include partition key), 
plus (initially) no range scans, and the aforementioned restrictions on what 
order statements must occur in the transaction.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 18:42
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of an 
individual developer coming to Cassandra from any arbitrary RDBMS. My own 
experience is that a non-trivial number of developers are essentially blindly 
following guidance given to them by someone else when it comes to features like 
transactions, so making syntax that looks superficially similar to SQL 
transactions but acts subtly different (or uses slightly different syntax) is 
going to be surprising. I think we get diminishing marginal returns on "it 
looks just like SQL!" when we start to venture further into territory where 
even different RDBMSs disagree. I would rather use some syntax that is clearly
Cassandra-specific, even if the structure would be similar to a SQL 
transaction, just to ensure that developers understand that it's different and 
actually look at the docs.

I completely agree on focusing on clarity and consistency, and I think 
considering how we think it might evolve is good, but that can't be an 
open-ended exercise. My primary concern is how we can start getting incremental 
improvements into end users' hands more quickly, since the alternative right 
now is to basically roll your own, right?

Cheers,

Derek

On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org wrote:
What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently so it becomes a sort of pointless rubric, everyone
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek

On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the sa

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread Derek Chen-Becker

Sorry, that was in reference to the "Would you require a LIMIT 1 clause if
the key did not fully specify a row?" question, so I think we're in
agreement here.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org 
wrote:

> > It seems like we would want to start with restrictions on number of
> rows, uniqueness, homogeneity of results, etc
>
>
>
> I am not keen on any hard limit on the number of rows, I anticipate a
> configurable guardrail for rejecting queries that are too expensive. I
> think the normal CQL restrictions are likely to apply (must include
> partition key), plus (initially) no range scans, and the aforementioned
> restrictions on what order statements must occur in the transaction.
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Tuesday, 14 June 2022 at 18:42
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> "MIXED" means, "hey, this might not be my standard PGSQL transaction" :)
>
>
>
> I do think that surprise is a meaningful measure, from the perspective of
> an individual developer coming to Cassandra from any arbitrary RDBMS. My
> own experience is that a non-trivial number of developers are essentially
> blindly following guidance given to them by someone else when it comes to
> features like transactions, so making syntax that looks superficially
> similar to SQL transactions but acts subtly different (or uses slightly
> different syntax) is going to be surprising. I think we get diminishing
> marginal returns on "it looks just like SQL!" when we start to venture
> further into territory where even different RDBMSs disagree. I would rather
> use some syntax that is clearly Cassandra-specific, even if the structure
> would be similar to a SQL transaction, just to ensure that developers
> understand that it's different and actually look at the docs.
>
>
>
> I completely agree on focusing on clarity and consistency, and I think
> considering how we think it might evolve is good, but that can't be an
> open-ended exercise. My primary concern is how we can start getting
> incremental improvements into end users' hands more quickly, since the
> alternative right now is to basically roll your own, right?
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org 
> wrote:
>
> What on earth does MIXED mean?
>
>
>
> I agree with the sentiment we should minimise surprise, but everyone is
> surprised differently so it becomes a sort of pointless rubric, everyone
> claiming it supports their view. I think it is only useful in cases where
> there is clear agreement that something is surprising, but unhelpful when
> choosing between subtle variations on approach.
>
>
>
> The main goal IMO should be clarity and consistency, so that the user can
> reason about the constructs easily, and so we can evolve them.
>
>
>
> For instance, we should be sure to consider how the syntax will look if we
> **do** offer interactive transactions, or JOINs, or anything else we
> might add in future.
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Monday, 13 June 2022 at 23:09
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston 
> wrote:
>
> I prefer an approach that supports an accurate mental model of what’s
> happening behind the scenes. I think that should be a design priority for
> the syntax. We’ll be able to build things on top of accord, but the core
> multi-key cas operation isn’t going to change too much.
>
>
>
> +1, the principle of least surprise tells me that if this doesn't behave
> exactly like SQL transactions (for whatever SQL actually means), it could
> be more clear to not try and emulate it halfway
>
>
>
> BEGIN MIXED TRANSACTION?
>
>
>
> Derek
>
>
>
>
>
>
>
> On Jun 13, 2022, at 12:14 PM, Blake Eggleston 
> wrote:
>
>
>
> Does the IF <...> ABORT simplify reasoning though? If you restrict it to
> only dealing with the most recent row it would, but referencing the name
> implies you’d be able to include references from other operations, in which
> case you’d have the same problem.
>
> > return instead an exception if the transaction is aborted
>
> Since the txn is not actually interactive, I think it would be better to
> receive values instead of an exception, to understand why the operation was
> rolled back.
>
>
>
> On Jun 13, 2022, at 10:32 AM, Aaron Ploetz  wrote:
>
>
>
> Benedict,
>
>
>
> I'm really excited about this feature.  I've been observing

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread bened...@apache.org

> It seems like we would want to start with restrictions on number of rows, 
> uniqueness, homogeneity of results, etc

I am not keen on any hard limit on the number of rows, I anticipate a 
configurable guardrail for rejecting queries that are too expensive. I think 
the normal CQL restrictions are likely to apply (must include partition key), 
plus (initially) no range scans, and the aforementioned restrictions on what 
order statements must occur in the transaction.
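
The restrictions listed above could be checked roughly as follows. This is only a minimal sketch, not anything from the CEP-15 implementation: the TxnRead type, the rejection_reasons function and the MAX_ROWS_GUARDRAIL default are hypothetical names invented for illustration, and the row estimate is assumed to be supplied by the caller.

```python
from dataclasses import dataclass

# Hypothetical default; the thread only says the guardrail would be configurable.
MAX_ROWS_GUARDRAIL = 1000


@dataclass
class TxnRead:
    """Illustrative description of one read inside a transaction."""
    table: str
    has_partition_key: bool      # the WHERE clause restricts the partition key
    is_range_scan: bool = False  # the read scans a range instead of specific keys
    estimated_rows: int = 1      # caller-supplied estimate of rows touched


def rejection_reasons(read, max_rows=MAX_ROWS_GUARDRAIL):
    """Return the reasons a read would be rejected; an empty list means accepted."""
    reasons = []
    if not read.has_partition_key:
        reasons.append("must include partition key")
    if read.is_range_scan:
        reasons.append("range scans are not supported (initially)")
    if read.estimated_rows > max_rows:
        reasons.append("rejected by guardrail: too expensive")
    return reasons
```

Note that the row count is a soft, configurable threshold here rather than a hard limit in the syntax, which matches the preference stated above.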

From: Derek Chen-Becker 
Date: Tuesday, 14 June 2022 at 18:42
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of an 
individual developer coming to Cassandra from any arbitrary RDBMS. My own 
experience is that a non-trivial number of developers are essentially blindly 
following guidance given to them by someone else when it comes to features like 
transactions, so making syntax that looks superficially similar to SQL 
transactions but acts subtly different (or uses slightly different syntax) is 
going to be surprising. I think we get diminishing marginal returns on "it 
looks just like SQL!" when we start to venture further into territory where 
even different RDBMSs disagree. I would rather use some syntax that is clearly
Cassandra-specific, even if the structure would be similar to a SQL 
transaction, just to ensure that developers understand that it's different and 
actually look at the docs.

I completely agree on focusing on clarity and consistency, and I think 
considering how we think it might evolve is good, but that can't be an 
open-ended exercise. My primary concern is how we can start getting incremental 
improvements into end users' hands more quickly, since the alternative right 
now is to basically roll your own, right?

Cheers,

Derek

On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org wrote:
What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently so it becomes a sort of pointless rubric, everyone
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.

From: Derek Chen-Becker <de...@chen-becker.org>
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek

On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was
rolled back.

On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <aaronplo...@gmail.com> wrote:

Benedict,

I'm really excited about this feature.  I've been observing this conversation 
for a while now, and I'm happy to add some thoughts.

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

I think taking small steps forward, to build a few complete features as close 
to SQL as possible is a good approach.

question we are currently asking: do we want to have a more LWT-like 
approach... or do we want a more SQL-like approach

For years now we've been fighting this notion that Cassandra is difficult t

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread Derek Chen-Becker

"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of
an individual developer coming to Cassandra from any arbitrary RDBMS. My
own experience is that a non-trivial number of developers are essentially
blindly following guidance given to them by someone else when it comes to
features like transactions, so making syntax that looks superficially
similar to SQL transactions but acts subtly different (or uses slightly
different syntax) is going to be surprising. I think we get diminishing
marginal returns on "it looks just like SQL!" when we start to venture
further into territory where even different RDBMSs disagree. I would rather
use some syntax that is clearly Cassandra-specific, even if the structure
would be similar to a SQL transaction, just to ensure that developers
understand that it's different and actually look at the docs.

I completely agree on focusing on clarity and consistency, and I think
considering how we think it might evolve is good, but that can't be an
open-ended exercise. My primary concern is how we can start getting
incremental improvements into end users' hands more quickly, since the
alternative right now is to basically roll your own, right?

Cheers,

Derek

On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org 
wrote:

> What on earth does MIXED mean?
>
>
>
> I agree with the sentiment we should minimise surprise, but everyone is
> surprised differently so it becomes a sort of pointless rubric, everyone
> claiming it supports their view. I think it is only useful in cases where
> there is clear agreement that something is surprising, but unhelpful when
> choosing between subtle variations on approach.
>
>
>
> The main goal IMO should be clarity and consistency, so that the user can
> reason about the constructs easily, and so we can evolve them.
>
>
>
> For instance, we should be sure to consider how the syntax will look if we
> **do** offer interactive transactions, or JOINs, or anything else we
> might add in future.
>
>
>
>
>
> *From: *Derek Chen-Becker 
> *Date: *Monday, 13 June 2022 at 23:09
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston 
> wrote:
>
> I prefer an approach that supports an accurate mental model of what’s
> happening behind the scenes. I think that should be a design priority for
> the syntax. We’ll be able to build things on top of accord, but the core
> multi-key cas operation isn’t going to change too much.
>
>
>
> +1, the principle of least surprise tells me that if this doesn't behave
> exactly like SQL transactions (for whatever SQL actually means), it could
> be more clear to not try and emulate it halfway
>
>
>
> BEGIN MIXED TRANSACTION?
>
>
>
> Derek
>
>
>
>
>
>
>
> On Jun 13, 2022, at 12:14 PM, Blake Eggleston 
> wrote:
>
>
>
> Does the IF <...> ABORT simplify reasoning though? If you restrict it to
> only dealing with the most recent row it would, but referencing the name
> implies you’d be able to include references from other operations, in which
> case you’d have the same problem.
>
> > return instead an exception if the transaction is aborted
>
> Since the txn is not actually interactive, I think it would be better to
> receive values instead of an exception, to understand why the operation was
> rolled back.
>
>
>
> On Jun 13, 2022, at 10:32 AM, Aaron Ploetz  wrote:
>
>
>
> Benedict,
>
>
>
> I'm really excited about this feature.  I've been observing this
> conversation for a while now, and I'm happy to add some thoughts.
>
>
>
> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.
>
>
>
> I think taking small steps forward, to build a few complete features as
> close to SQL as possible is a good approach.
>
>
>
> question we are currently asking: do we want to have a more LWT-like
> approach... or do we want a more SQL-like approach
>
>
>
> For years now we've been fighting this notion that Cassandra is difficult
> to use.  Coming up with specialized syntax isn't going to bridge that
> divide.  From a (new?) user perspective, the best plan is to stay as
> consistent with SQL as possible.
>
>
>
> I believe that is a MySQL specific concept. This is one problem with
> mimicking SQL – it’s not one thing!
>
>
>
> Right?!?!  As if this needed to be more complex.
>
>
>
> I think we have evidence that it

Re: CEP-15 multi key transaction syntax

2022-06-14 Thread bened...@apache.org

> I … couldn't find an implementation that wasn't vendor specific.

I’ve fallen into the same trap as others. You’re right, all control flow is 
vendor specific it turns out. So, we either need to consciously pick an SQL 
dialect to mimic (probably the safest would be pgsql), or make sure we are 
distinct.

I should say that I’m *not* opposed to the COMMIT IF syntax (and this factoid 
above further cements that lack of opposition), I just want to be sure our 
syntax can handle evolution of the feature without getting confused/confusing.

If we rule out ever offering the succinct UPDATE x … AS syntax then I don’t 
think it will ever be confusing, it might simply become defunct (not ideal, but 
not the end of the world).

We have a few more things to figure out:

1) Do we automatically turn SELECT statements with a single row into something 
addressable? I like the brevity offered, but I’m not aware of other SQL-like 
languages where this happens. I think the norm is IF (SELECT x FROM…) THEN; or 
you can declare variables.

This also makes it hard right now to declare SELECT statements that return to 
the user and those that do not without introducing additional non-standard 
modifications to COMMIT or SELECT.

Alternatives might be to either require a full SELECT inside the IF for now; or 
to introduce a LET x=, y= FROM… AS z, to make clear we’re declaring some 
variables we can use in expressions.

2) How do we return success/failure. The IF (X) THEN… approach would naturally 
return nothing, nor throw an exception, so we might want to offer the user the 
ability to perform SELECT within the IF so that the presence of a resultset 
defines success/failure – we could even offer the user SELECT ? or to 
return the value of the boolean SELECT X; IF (X) THEN UPDATE y…; END IF

3) The AS syntax – do we want this to look more like SQL, i.e. SELECT X FROM 
tbl AS mytableref?

From: Blake Eggleston 
Date: Tuesday, 14 June 2022 at 00:33
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> It’s something to hammer out in more detail once we get these other questions 
> pinned down, as I think we can figure out a good compromise.

+1

> self contained, one-off statement like these

I meant with the if statements inline? I hadn't encountered them before myself, 
and couldn't find an implementation that wasn't vendor specific.

Regarding commit if, I'd be totally fine settling on this:

BEGIN TRANSACTION
IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END
COMMIT TRANSACTION

I prefer it to if...abort, and commit if ... isn't popular.

On Jun 13, 2022, at 4:14 PM, bened...@apache.org wrote:

> Like I mentioned in my earlier email, the if/abort syntax throwing an 
> exception would, at least as described, limit useful data returned to the 
> client

Right, I agree. I think this is orthogonal to the other syntax questions. I 
think it is also preferable not to mix success/failure with data results, and 
that might be preferable for both syntaxes. It’s something to hammer out in 
more detail once we get these other questions pinned down, as I think we can 
figure out a good compromise.

> At a higher level, what I meant was that SQL doesn’t have a self contained, 
> one-off statement like these

I’m not sure what you mean? It definitely does? In fact, this was how I most 
often used SQL when I worked with it – non-interactively, with explicit 
transactions as part of a single submission to the server, as this reduced the 
number of round-trips but kept the SQL in version control. Stored procedures 
are just a way of doing this with the SQL saved server-side, and accepting 
explicit parameters, but they’re just a convenience?

> and Cassandra doesn’t have interactive transactions

Yet!

> Incidentally, I think it would be useful to eventually have multiple IF 
> branches inline, and had meant the COMMIT IF as a shorthand for it

I agree it would be nice to support more general IF statements, for both 
positive and negative control flow (i.e. IF (X) THEN UPDATE Y, but also IF (X) 
THEN ABORT/ROLLBACK/RAISERROR).

I’m not sure if COMMIT IF really works as syntactic sugar for the more complex 
construct you outlined, though? Perhaps we could instead offer

IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END

For now we could require that at most one such statement occurs per 
transaction, and encapsulates the whole transaction, e.g.

BEGIN TRANSACTION
IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END
COMMIT TRANSACTION

It would be q

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org

> Like I mentioned in my earlier email, the if/abort syntax throwing an 
> exception would, at least as described, limit useful data returned to the 
> client

Right, I agree. I think this is orthogonal to the other syntax questions. I 
think it is also preferable not to mix success/failure with data results, and 
that might be preferable for both syntaxes. It’s something to hammer out in 
more detail once we get these other questions pinned down, as I think we can 
figure out a good compromise.

> At a higher level, what I meant was that SQL doesn’t have a self contained, 
> one-off statement like these

I’m not sure what you mean? It definitely does? In fact, this was how I most 
often used SQL when I worked with it – non-interactively, with explicit 
transactions as part of a single submission to the server, as this reduced the 
number of round-trips but kept the SQL in version control. Stored procedures 
are just a way of doing this with the SQL saved server-side, and accepting 
explicit parameters, but they’re just a convenience?

> and Cassandra doesn’t have interactive transactions

Yet!

> Incidentally, I think it would be useful to eventually have multiple IF 
> branches inline, and had meant the COMMIT IF as a shorthand for it

I agree it would be nice to support more general IF statements, for both 
positive and negative control flow (i.e. IF (X) THEN UPDATE Y, but also IF (X) 
THEN ABORT/ROLLBACK/RAISERROR).

I’m not sure if COMMIT IF really works as syntactic sugar for the more complex 
construct you outlined, though? Perhaps we could instead offer

IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END

For now we could require that at most one such statement occurs per 
transaction, and encapsulates the whole transaction, e.g.

BEGIN TRANSACTION
IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END
COMMIT TRANSACTION

It would be quite easy to relax this (maybe even before release), but it gets 
us off the starting block without planned obsolescence.
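
To make the interim restriction concrete, here is a rough sketch of a check that a parsed transaction body contains either no IF block, or exactly one IF block that wraps every mutation. The tuple representation and the is_valid_body name are hypothetical, chosen only for illustration; rejecting nested IF blocks is an extra assumption, since nesting is not discussed above.

```python
def is_valid_body(statements):
    """Check the 'at most one IF, and it encapsulates the whole body' rule.

    statements is a list of tuples, either ("UPDATE", text) or
    ("IF", condition, [nested statements]). This shape is an assumption
    made for the sketch, not a real parser output.
    """
    if_blocks = [s for s in statements if s[0] == "IF"]
    if not if_blocks:
        # Purely unconditional mutations are fine.
        return True
    if len(if_blocks) > 1:
        # Multiple IF blocks are not supported yet.
        return False
    if len(statements) != 1:
        # A mutation outside the IF block mixes conditional and unconditional work.
        return False
    # Assumption: nested IF blocks are also rejected for now.
    return all(s[0] != "IF" for s in if_blocks[0][2])
```

Relaxing the rule later would only mean loosening this check, so statements accepted today would stay valid.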

From: Blake Eggleston 
Date: Monday, 13 June 2022 at 23:57
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> I think it is far more problematic to introduce a syntax that would not be 
> consistent with future enhancements to transactional functionality. Then we 
> would have to introduce a third syntax, and more syntaxes makes for a messy 
> language IMO.
> I have a very strong preference for choosing a syntax we can evolve 
> consistently, so that users just gain additional keywords or have 
> restrictions relaxed as the feature evolves.

I think our views and goals are pretty strongly aligned here.

> How so? I think all we’re really considering is *not* introducing the IF part 
> of the COMMIT syntax, which is not-SQL-like

Like I mentioned in my earlier email, the if/abort syntax throwing an exception 
would, at least as described, limit useful data returned to the client. 
Solvable depending on how we settle on what data is returned to the client 
though.

At a higher level, what I meant was that SQL doesn’t have a self contained, 
one-off statement like these (stored procedures/functions are close[1], but 
different), and Cassandra doesn’t have interactive transactions. So the 
argument that something is more SQL like when putting syntax meant for 
interactive transactions into Cassandra’s atomic txns isn’t very convincing imo.

Incidentally, I think it would be useful to eventually have multiple IF 
branches inline, and had meant the COMMIT IF as a shorthand for it. Something 
like

BEGIN TRANSACTION;
SELECT * FROM sometable WHERE key=5 AS sel;
UPDATE sometable SET lastread=now() WHERE key=5;
IF sel.someval = 3 THEN
UPDATE someothertable SET anotherval=14 WHERE key=10;
ELSE IF sel.someval = 4 THEN
UPDATE someothertable SET anotherval=13 WHERE key=10;
ELSE
UPDATE someothertable SET anotherval=12 WHERE key=10;
ENDIF;
COMMIT TRANSACTION;
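
For anyone trying to picture the intended semantics of the mockup above, here is a small sketch that models it with in-memory dictionaries. It assumes the named SELECT is evaluated once against the pre-transaction state and that exactly one branch is applied; the run_txn and choose_anotherval names are hypothetical, and plain dicts stand in for tables.

```python
def choose_anotherval(sel):
    """Pick the value written by the IF / ELSE IF / ELSE chain in the mockup."""
    someval = sel.get("someval")
    if someval == 3:
        return 14
    elif someval == 4:
        return 13
    return 12


def run_txn(sometable, someothertable, now):
    """Apply the mockup transaction to dict-based tables keyed by primary key."""
    # SELECT ... AS sel: snapshot the row before any write in this transaction.
    sel = dict(sometable[5])
    # The unconditional UPDATE always applies.
    sometable[5]["lastread"] = now
    # Exactly one of the conditional UPDATEs applies, chosen from the snapshot.
    someothertable[10]["anotherval"] = choose_anotherval(sel)
    return sel
```

The snapshot is returned so the client can see which values drove the branch taken.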

And for extra fun, here’s an early mockup I did based on the Postgres function 
syntax: https://gist.github.com/bdeggleston/51d5510450a1d7549f725e06d871cc60

> Do we require these to be declared first? If so, the problem of ambiguity 
> goes away at least, ignoring everything else.
> Perhaps we can do that initially either way? It makes both syntaxes easier to 
> implement, so we get our MVP more easily. But if we settle what our preferred 
> syntax is, we can see if there’s time to deliver it before a release. Either 
> way, the syntax evolves on a consistent path.

Yes, that’s the idea.

On Jun 13, 2022, at 1:21 PM, bened...@apache.org wrote:

> Don’t call these transactions, the term implies things accord do

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org

> is there a subset … that could be implemented as an initial version and then 
> grown over time to include more powerful features?

This is what I would like to aim for, but it’s hard as we probably don’t agree
on what direction the feature will develop in.

My view is that we are more likely than not to develop creeping SQL-like 
functionality over time, in which case it is perhaps good to plan for this 
intentionally from the start.

SQL has decades of work behind it, so we run less risk of hitting a design
dead end and finding ourselves in a bind when further evolving the language.

I think the way to approach that is to ensure that we do a mix of the following:

1) Ensure any keywords we copy from SQL work very similarly to their SQL 
counterpart, with only some additional restrictions (esp. when we expect to be 
able to later relax them)
2) Where we can’t reasonably do that, introduce new keywords that look and feel 
like SQL but aren’t, so there is no confusion

From: Derek Chen-Becker 
Date: Monday, 13 June 2022 at 23:07
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
I'm coming to this thread fresh and admittedly I'm still trying to catch up and 
wrap my head around it. I think it's already been called out, but what looked 
superficially simple at the beginning of the thread has quickly become 
something that I'm having to take notes on to make sure I understand the 
semantics. I'm a little worried that there are complexities here that we might 
not realize. I like the idea, and I think it's a really powerful addition to 
CQL, but I think we need to make sure we're not setting up users for confusion. 
CQL is great because it leverages knowledge of SQL, but the devil is in the 
differences.

Also, related to complexity, is there a subset of what's being discussed that 
could be implemented as an initial version and then grown over time to include 
more powerful features?

In terms of things that have been discussed so far, in no particular order, the 
AS keyword seems to give the user reasonable control over whether they get the 
pre- or post-update version of the record. Similarly, I think the IF...ABORT 
syntax is much clearer if using AS, since that keyword then decides which 
version of the row to use for the condition. Consider the following (possibly 
incorrect) example:

BEGIN TRANSACTION
SELECT * from cars where ... AS car
IF car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ...
COMMIT TRANSACTION

vs

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS current_car
IF current_car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ... AS car
COMMIT TRANSACTION

Cheers,

Derek

On Sun, Jun 12, 2022 at 5:34 AM bened...@apache.org wrote:
> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but also SQL is designed to be interactive. Depending 
on the dialect there’s probably a lot of ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate the functionality exactly as we 
do not (yet) support interactive transactions that they were designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
ROLLBACK
RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we wan

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org

What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently so it becomes a sort of pointless rubric, everyone 
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.

From: Derek Chen-Becker 
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek

On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was 
rolled back.

On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <aaronplo...@gmail.com> wrote:

Benedict,

I'm really excited about this feature.  I've been observing this conversation 
for a while now, and I'm happy to add some thoughts.

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

I think taking small steps forward, to build a few complete features as close 
to SQL as possible is a good approach.

question we are currently asking: do we want to have a more LWT-like 
approach... or do we want a more SQL-like approach

For years now we've been fighting this notion that Cassandra is difficult to 
use.  Coming up with specialized syntax isn't going to bridge that divide.  
From a (new?) user perspective, the best plan is to stay as consistent with SQL 
as possible.

I believe that is a MySQL specific concept. This is one problem with mimicking 
SQL – it’s not one thing!

Right?!?!  As if this needed to be more complex.

I think we have evidence that it is fine to interpret NULL as “false” for the 
evaluation of IF conditions.

Agree.  Null == false isn't too much of a leap.
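
For what it's worth, here is a minimal Python sketch of that rule. It is only a toy model: `None` stands in for a CQL NULL, and the helper name `condition_holds` is made up for illustration, not anything in Cassandra.

```python
# Toy model: a NULL operand makes the IF condition evaluate as false.
# None stands in for a CQL NULL; condition_holds is a hypothetical helper.

def condition_holds(value, predicate):
    """Evaluate predicate(value), treating a missing (NULL) value as false."""
    if value is None:
        return False
    return bool(predicate(value))

# A row with no "miles" value never satisfies "miles > 10".
print(condition_holds(None, lambda miles: miles > 10))
print(condition_holds(11, lambda miles: miles > 10))
```

The design choice here is that a NULL never satisfies a condition, so a guard such as IF car.miles > 10 simply does not fire for a row that has no value.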

Thanks for taking up the charge on this one.  Glad to see it moving forward!

Thanks,

Aaron

On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org wrote:
Welcome Li, and thanks for your input

> When I first saw the syntax, I took it for granted that the condition was 
> evaluated against the state AFTER the updates

Depending on what you mean, I think this is one of the options being considered. 
At least, it seems this syntax is most likely to be evaluated against the 
values written by preceding statements in the batch, but not the statement 
itself (or later ones), as this could lead to nonsensical statements like

BEGIN TRANSACTION
UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
COMMIT TRANSACTION IF tbl.v = 0

Where v is never 0 afterwards, so this never succeeds. I take it in this simple 
case you would expect the condition to be evaluated against the state prior to 
the statement (i.e. the initial state)?
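
To make the two readings concrete, here is a minimal Python sketch. It is a toy model only, not Accord or CQL: the function `run`, the `evaluate_against` switch, and the dict-based rows are all assumptions for illustration. It evaluates the condition `tbl.v = 0` either against the initial state or against the state the UPDATE would produce.

```python
# Toy model of: UPDATE tbl SET v = 1 ... COMMIT TRANSACTION IF tbl.v = 0
# Everything here is illustrative; none of these names exist in Cassandra.

def run(initial_v, evaluate_against):
    """Return the final value of v; the update applies only if the condition holds."""
    before = {"v": initial_v}   # state prior to the statement
    after = {"v": 1}            # state the UPDATE would produce
    observed = before if evaluate_against == "initial" else after
    if observed["v"] == 0:      # COMMIT TRANSACTION IF tbl.v = 0
        return after["v"]       # committed
    return before["v"]          # aborted, nothing changes

print(run(0, "initial"))  # condition sees the initial v, so the update can commit
print(run(0, "post"))     # condition sees v = 1, so it can never commit
```

Under the "post" reading the condition is unsatisfiable whatever the initial value is, which is the nonsensical case described above.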

But we have a blank slate, so every option is available to us! We just need to 
make sure it makes sense to the user, even in uncommon cases.

> The IF (Boolean expr) ABORT TRANSACTION would suffer less because users may 
> tend to put the condition closer to the related SELECT statement.

This is probably not going to matter in practice. The SELECTs all happen 
upfront no matter what the CQL might look like, and the UPDATEs all happen only 
after the IF conditions are evaluated. This is all just a question of how the 
user expresses things.

In future we may offer interactive transactions, or transactions that are 
multi-step, in which case this would be more relevant and could have an 
efficiency impact.
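
The ordering described above (all SELECTs first, then the IF conditions, then the UPDATEs) can be sketched roughly as follows. This is a toy Python model under stated assumptions: the `execute` function, its parameters, and the dict-backed store are hypothetical, and it says nothing about how Accord actually coordinates replicas.

```python
# Toy three-phase executor: reads, then conditions, then writes.
# The names (execute, reads, abort_if, writes) are hypothetical.

def execute(store, reads, abort_if, writes):
    # Phase 1: every SELECT runs against the initial state.
    bound = {name: dict(store[key]) for name, key in reads.items()}
    # Phase 2: every IF condition is checked before any write is applied.
    for condition in abort_if:
        if condition(bound):
            return False, bound   # aborted; hand back what was read
    # Phase 3: writes are applied only once all conditions have passed.
    for key, column, value in writes:
        store[key][column] = value
    return True, bound

store = {"car1": {"miles": 5, "next_service": 0}}
ok, seen = execute(
    store,
    reads={"car": "car1"},
    abort_if=[lambda b: b["car"]["miles"] > 10],
    writes=[("car1", "next_service", 10)],
)
print(ok, seen, store)
```

In this model, where the IF appears in the statement text makes no difference to what it observes, since every named read is taken before any write. Returning the read values on abort is one possible choice, not a settled one.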

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Derek Chen-Becker

 sure, and will defer to the
>> opinions of others here. There won’t be any optimisation impact, as we
>> simply check if the transaction contains any updates, but some validation
>> could be helpful for the user.
>>
>>
>>
>> > Finally, I wonder if the community would be interested in idempotency
>> support.
>>
>>
>>
>> This is something that has been considered, and that Accord is able to
>> support (in a couple of ways), but as an end-to-end feature this requires
>> client support and other scaffolding that is not currently
>> planned/scheduled. The simplest (least robust) approach is for the server
>> to include the transaction’s identifier in its timeout, so that it can be
>> queried by the client to establish if it has been made durable. This should
>> be quite easy to deliver on the server-side, but would require some
>> application or client integration, and is unreliable in the face of
>> coordinator failure (so the transaction id is unknown to the client). The
>> more complete approach is for the client to include an idempotency token in
>> its submission to the server, and for C* to record this alongside the
>> transaction id, and for some bounded time window to either reject
>> re-submissions of this token or to evaluate it as a no-op. This requires
>> much tighter integration from the clients, and more work server-side.
>>
>>
>>
>> Which is simply to say, this is on our radar but I can’t make promises
>> about what form it will take, or when it will arrive, only that it has been
>> planned for enough to ensure we can achieve it when resources permit.
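
As a rough illustration of the token-based approach quoted above, here is a minimal Python sketch. It is purely hypothetical bookkeeping: the class `IdempotencyRegistry`, its `submit` method, and the in-memory dict are assumptions for illustration and do not correspond to any Accord or Cassandra API. It treats a re-submitted token inside a bounded time window as a no-op and reports the original transaction id.

```python
# Hypothetical server-side bookkeeping; not an Accord or Cassandra API.

class IdempotencyRegistry:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}  # token -> (txn_id, recorded_at)

    def submit(self, token, txn_id, now):
        """Return (txn_id, applied). A repeated token inside the window is a no-op."""
        # Forget tokens whose window has passed.
        self.seen = {t: (i, at) for t, (i, at) in self.seen.items()
                     if now - at < self.window}
        if token in self.seen:
            return self.seen[token][0], False  # duplicate: report the original txn id
        self.seen[token] = (txn_id, now)
        return txn_id, True

registry = IdempotencyRegistry(window_seconds=60)
print(registry.submit("token-a", "txn-1", now=0))    # first submission is applied
print(registry.submit("token-a", "txn-2", now=30))   # retry inside the window is a no-op
print(registry.submit("token-a", "txn-3", now=100))  # window expired, applied again
```

The time is passed in explicitly so the window logic is easy to follow; a real implementation would also need the record to be durable and replicated, which this sketch ignores.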
>>
>>
>>
>> *From: *Li Boxuan 
>> *Date: *Sunday, 12 June 2022 at 16:14
>> *To: *dev@cassandra.apache.org 
>> *Subject: *Re: CEP-15 multi key transaction syntax
>>
>> Correcting my typo:
>>
>>
>>
>> >  I took it for granted that the condition was evaluated against the
>> state before the updates
>>
>>
>>
>> I took it for granted that the condition was evaluated against the state
>> AFTER the updates
>>
>>
>>
>> On Jun 12, 2022, at 11:07 AM, Li Boxuan  wrote:
>>
>>
>>
>> Thank you team for this exciting update! I just joined the dev mailing
>> list to take part in this discussion. I am not a Cassandra developer and
>> haven’t understood Accord myself yet, so my questions are more from a
>> user’s standpoint.
>>
>>
>>
>> > The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we
>> only get one chance to make this API right.
>>
>>
>>
>> I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax,
>> I took it for granted that the condition was evaluated against the state
>> after the updates, but it turned out to be the opposite. Thus I prefer the
>> IF (Boolean expr) ABORT TRANSACTION idea. In addition, when the transaction
>> is large and there are many conditions, using the COMMIT IF syntax might
>> make the CQL query uglier and developers’ life harder. Another very subtle
>> point is if there are many conditions combined using AND clauses, wouldn't
>> it make the execution slightly slower because, for each SELECT statement,
>> you would have to check every condition? The IF (Boolean expr) ABORT
>> TRANSACTION would suffer less because users may tend to put the condition
>> closer to the related SELECT statement.
>>
>>
>>
>> > read-only transactions involving multiple tables will definitely be
>> supported.
>>
>>
>>
>> Would you consider allowing users to start a read-only transaction
>> explicitly like BEGIN TRANSACTION READONLY? This could help catch some
>> developers’ bugs like unintentional updates. This might also give Cassandra
>> a hint for optimization.
>>
>>
>>
>> Finally, I wonder if the community would be interested in idempotency
>> support. DynamoDB has this interesting feature (
>> https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
>> which guards the situation where the same transaction is submitted multiple
>> times due to a connection time-out or other connectivity issue. I have no
>> idea how that is implemented under the hood and I don’t even know if this
>> is technically possible with the Accord design, but I thought it would be
>> interesting to think about.
>>
>>
>>
>> Best regards,
>>
>> Boxuan
>>
>>
>>
>>
>>
>> On Jun 12, 2022, at 7:31 AM, bened...@a

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Derek Chen-Becker

I'm coming to this thread fresh and admittedly I'm still trying to catch up
and wrap my head around it. I think it's already been called out, but what
looked superficially simple at the beginning of the thread has quickly
become something that I'm having to take notes on to make sure I understand
the semantics. I'm a little worried that there are complexities here that
we might not realize. I like the idea, and I think it's a really powerful
addition to CQL, but I think we need to make sure we're not setting up
users for confusion. CQL is great because it leverages knowledge of SQL,
but the devil is in the differences.

Also, related to complexity, is there a subset of what's being discussed
that could be implemented as an initial version and then grown over time to
include more powerful features?

In terms of things that have been discussed so far, in no particular order,
the AS keyword seems to give the user reasonable control over whether they
get the pre- or post-update version of the record. Similarly, I think the
IF...ABORT syntax is much clearer if using AS, since that keyword then
decides which version of the row to use for the condition. Consider the
following (possibly incorrect) example:

BEGIN TRANSACTION
SELECT * from cars where ... AS car
IF car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ...
COMMIT TRANSACTION

vs

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS current_car
IF current_car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ... AS car
COMMIT TRANSACTION

Cheers,

Derek

On Sun, Jun 12, 2022 at 5:34 AM bened...@apache.org 
wrote:

> > I would love hearing from people on what they think.
>
>
>
> ^^ It would be great to have more participants in this conversation
>
>
>
> > For context, my questions earlier were based on my 20+ years of using
> SQL transactions across different systems.
>
>
>
> We probably don’t come from a very different place. I spent too many years
> with T-SQL.
>
>
>
> > When you start a SQL transaction, you are creating a branch of your
> data that you can operate with until you reach your desired state and then
> merge it back with a commit.
>
>
>
> That’s the essential complexity we’re grappling with: how much do we
> permit your “branch” to do, how do we let you express it, and how do we let
> you express conditions?
>
>
>
> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.
>
>
>
> Right now, we have the issue that read-your-writes introduces some
> complexity to the semantics, particularly around the conditions of
> execution.
>
>
>
> LWTs impose conditions on the state of all records prior to execution, but
> their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean
> expr) is most consistent with this approach. This can be confusing, though,
> if the condition is evaluated on a value that has been updated by a prior
> statement in the batch – what value does this global condition get
> evaluated against?*
>
>
>
> SQL has no such concept, but also SQL is designed to be interactive.
> Depending on the dialect there’s probably a lot of ways to do this
> non-interactively in SQL, but we probably cannot cheaply replicate the
> functionality exactly as we do not (yet) support interactive transactions
> that they were designed for. To submit a conditional non-interactive
> transaction in SQL, you would likely use
>
>
>
> IF (X) THEN
>
> ROLLBACK
>
> RETURN (ERRCODE)
>
> END IF
>
>
>
> or
>
>
>
> IF (X) THEN RAISERROR
>
>
>
> So, that is in essence the question we are currently asking: do we want to
> have a more LWT-like approach (and if so, how do we address this complexity
> for the user), or do we want a more SQL-like approach (and if so, how do we
> modify it to make non-interactive transactions convenient, and
> implementation tractable)
>
>
>
> * This is anyway a shortcoming of existing batches, I think? So it might
> be we can sweep it under the rug, but I think it will be more relevant here
> as people execute more complex transactions, and we should ideally have
> semantics that will work well into the future – including if we later
> introduce interactive transactions.
>
>
>
> *From: *Patrick McFadin 
> *Date: *Saturday, 11 June 2022 at 15:33
> *To: *dev 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> I think the syntax is evolving into something pretty complicated, which
> may be warranted but I wanted to take a step back a

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org

> Don’t call these transactions, the term implies things accord doesn’t do. 
> Maybe call them CAS BATCH, and terminate them with APPLY or APPLY IF.

The condition is optional, so CAS is not accurate. These are also definitely 
transactions, they are only non-interactive - transactions in SQL are also 
often non-interactive (e.g. within a stored procedure).

I think it is far more problematic to introduce a syntax that would not be 
consistent with future enhancements to transactional functionality. Then we 
would have to introduce a third syntax, and more syntaxes make for a messy 
language IMO.

I have a very strong preference for choosing a syntax we can evolve 
consistently, so that users just gain additional keywords or have restrictions 
relaxed as the feature evolves.

> Supporting an SQL like syntax implies capabilities that we can’t provide, so 
> you’re delivering something that looks familiar

How so? I think all we’re really considering is *not* introducing the IF part 
of the COMMIT syntax, which is not-SQL-like, and instead offering a way of 
aborting transactions consistent with how it might be done in SQL. This doesn’t 
implement partial SQL functionality, nor look especially not-CQL, it’s just 
more similar control flow, and so more familiar.

> Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
> dealing with the most recent row it would

If a condition is evaluated against the current value of any record (as of the 
point of the condition’s declaration) then it would seem more obvious than were 
the COMMIT IF to be evaluated against the state prior to the value’s 
declaration, as the IF appears to execute last.

> Remove named updates, column references must come from selects. More verbose, 
> but crystal clear with regards to when/where values come from.

Do we require these to be declared first? If so, the problem of ambiguity goes 
away at least, ignoring everything else.

Perhaps we can do that initially either way? It makes both syntaxes easier to 
implement, so we get our MVP more easily. But if we settle what our preferred 
syntax is, we can see if there’s time to deliver it before a release. Either 
way, the syntax evolves on a consistent path.

From: Blake Eggleston 
Date: Monday, 13 June 2022 at 20:57
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Regarding modeling syntax after SQL... that approach has pros and cons. 
Supporting an SQL like syntax implies capabilities that we can’t provide, so 
you’re delivering something that looks familiar, but behaves differently, which 
doesn’t help us with usability.

I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

So I have 2 contrarian proposals:
1. Remove named updates, column references must come from selects. More 
verbose, but crystal clear with regards to when/where values come from.
2. Don’t call these transactions, the term implies things accord doesn’t do. 
Maybe call them CAS BATCH, and terminate them with APPLY or APPLY IF.

Although less exciting, this would simplify the initial implementation, and let 
feature requests and first hand experience inform where and how the syntax 
develops from there.

Blake

On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was 
rolled back.

On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <aaronplo...@gmail.com> wrote:

Benedict,

I'm really excited about this feature.  I've been observing this conversation 
for a while now, and I'm happy to add some thoughts.

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

I think taking small steps forward, to build a few complete features as close 
to SQL as possible is a good approach.

question we are currently asking: do we want to have a more LWT-like 
approach... or do we want a more SQL-like approach

For years now we've been fighting this notion that Cassandra is difficult to 
use.  Coming up with specialized syntax isn't going to bridge that divide.  
From a (new?) user perspective, the best plan is to stay as consistent with SQL 
as possible.

I belie

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Blake Eggleston

ent.
>> 
>>  
>> 
>> This is probably not going to matter in practice. The SELECTs all happen 
>> upfront no matter what the CQL might look like, and the UPDATEs all happen 
>> only after the IF conditions are evaluated. This is all just a question of 
>> how the user expresses things.
>> 
>>  
>> 
>> In future we may offer interactive transactions, or transactions that are 
>> multi-step, in which case this would be more relevant and could have an 
>> efficiency impact.
>> 
>>  
>> 
>> > Would you consider allowing users to start a read-only transaction 
>> > explicitly like BEGIN TRANSACTION READONLY?
>> 
>>  
>> 
>> Good question. I would be OK with this, for sure, and will defer to the 
>> opinions of others here. There won’t be any optimisation impact, as we 
>> simply check if the transaction contains any updates, but some validation 
>> could be helpful for the user.
>> 
>>  
>> 
>> > Finally, I wonder if the community would be interested in idempotency 
>> > support. 
>> 
>>  
>> 
>> This is something that has been considered, and that Accord is able to 
>> support (in a couple of ways), but as an end-to-end feature this requires 
>> client support and other scaffolding that is not currently 
>> planned/scheduled. The simplest (least robust) approach is for the server to 
>> include the transaction’s identifier in its timeout, so that it can be queried 
>> by the client to establish if it has been made durable. This should be quite 
>> easy to deliver on the server-side, but would require some application or 
>> client integration, and is unreliable in the face of coordinator failure (so 
>> the transaction id is unknown to the client). The more complete approach is 
>> for the client to include an idempotency token in its submission to the 
>> server, and for C* to record this alongside the transaction id, and for some 
>> bounded time window to either reject re-submissions of this token or to 
>> evaluate it as a no-op. This requires much tighter integration from the 
>> clients, and more work server-side.
>> 
>>  
>> 
>> Which is simply to say, this is on our radar but I can’t make promises about 
>> what form it will take, or when it will arrive, only that it has been 
>> planned for enough to ensure we can achieve it when resources permit.
>> 
>>  
>> 
>> From: Li Boxuan <libox...@connect.hku.hk>
>> Date: Sunday, 12 June 2022 at 16:14
>> To: dev@cassandra.apache.org
>> Subject: Re: CEP-15 multi key transaction syntax
>> 
>> Correcting my typo: 
>> 
>>  
>> 
>> >  I took it for granted that the condition was evaluated against the state 
>> > before the updates
>> 
>> 
>> 
>> 
>> I took it for granted that the condition was evaluated against the state 
>> AFTER the updates
>> 
>> 
>> 
>> 
>> 
>> On Jun 12, 2022, at 11:07 AM, Li Boxuan <libox...@connect.hku.hk> wrote:
>> 
>>  
>> 
>> Thank you team for this exciting update! I just joined the dev mailing list 
>> to take part in this discussion. I am not a Cassandra developer and haven’t 
>> understood Accord myself yet, so my questions are more from a user’s 
>> standpoint.
>> 
>>  
>> 
>> > The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we 
>> > only get one chance to make this API right.
>> 
>>  
>> 
>> I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I 
>> took it for granted that the condition was evaluated against the state after 
>> the updates, but it turned out to be the opposite. Thus I prefer the IF 
>> (Boolean expr) ABORT TRANSACTION idea. In addition, when the transaction is 
>> large and there are many conditions, using the COMMIT IF syntax might make 
>> the CQL query uglier and developers’ life harder. Another very subtle point 
>> is if there are many conditions combined using AND clauses, wouldn't it make 
>> the execution slightly slower because, for each SELECT statement, you would 
>> have to check every condition? The IF (Boolean expr) ABORT TRANSACTION would 
>> suffer less because users may tend to put the condition closer to the 
>> related SELECT statement.
>> 
>>  
>> 
>> > read-only transactions involving multiple tables will definitely be 
>> > supported.

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Blake Eggleston

y to 
> deliver on the server-side, but would require some application or client 
> integration, and is unreliable in the face of coordinator failure (so the 
> transaction id is unknown to the client). The more complete approach is for 
> the client to include an idempotency token in its submission to the server, 
> and for C* to record this alongside the transaction id, and for some bounded 
> time window to either reject re-submissions of this token or to evaluate it 
> as a no-op. This requires much tighter integration from the clients, and more 
> work server-side.
> 
>  
> 
> Which is simply to say, this is on our radar but I can’t make promises about 
> what form it will take, or when it will arrive, only that it has been planned 
> for enough to ensure we can achieve it when resources permit.
> 
>  
> 
> From: Li Boxuan <libox...@connect.hku.hk>
> Date: Sunday, 12 June 2022 at 16:14
> To: dev@cassandra.apache.org
> Subject: Re: CEP-15 multi key transaction syntax
> 
> Correcting my typo: 
> 
>  
> 
> >  I took it for granted that the condition was evaluated against the state 
> > before the updates
> 
> 
> 
> 
> I took it for granted that the condition was evaluated against the state 
> AFTER the updates
> 
> 
> 
> 
> 
> On Jun 12, 2022, at 11:07 AM, Li Boxuan <libox...@connect.hku.hk> wrote:
> 
>  
> 
> Thank you team for this exciting update! I just joined the dev mailing list 
> to take part in this discussion. I am not a Cassandra developer and haven’t 
> understood Accord myself yet, so my questions are more from a user’s 
> standpoint.
> 
>  
> 
> > The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we 
> > only get one chance to make this API right.
> 
>  
> 
> I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I 
> took it for granted that the condition was evaluated against the state after 
> the updates, but it turned out to be the opposite. Thus I prefer the IF 
> (Boolean expr) ABORT TRANSACTION idea. In addition, when the transaction is 
> large and there are many conditions, using the COMMIT IF syntax might make 
> the CQL query uglier and developers’ life harder. Another very subtle point 
> is if there are many conditions combined using AND clauses, wouldn't it make 
> the execution slightly slower because, for each SELECT statement, you would 
> have to check every condition? The IF (Boolean expr) ABORT TRANSACTION would 
> suffer less because users may tend to put the condition closer to the related 
> SELECT statement.
> 
>  
> 
> > read-only transactions involving multiple tables will definitely be 
> > supported.
> 
>  
> 
> Would you consider allowing users to start a read-only transaction explicitly 
> like BEGIN TRANSACTION READONLY? This could help catch some developers’ bugs 
> like unintentional updates. This might also give Cassandra a hint for 
> optimization.
> 
>  
> 
> Finally, I wonder if the community would be interested in idempotency 
> support. DynamoDB has this interesting feature 
> (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
> which guards the situation where the same transaction is submitted multiple 
> times due to a connection time-out or other connectivity issue. I have no 
> idea how that is implemented under the hood and I don’t even know if this is 
> technically possible with the Accord design, but I thought it would be 
> interesting to think about.
> 
>  
> 
> Best regards,
> 
> Boxuan
> 
>  
> 
> 
> 
> 
> On Jun 12, 2022, at 7:31 AM, bened...@apache.org wrote:
> 
>  
> 
> > I would love hearing from people on what they think.
> 
>  
> ^^ It would be great to have more participants in this conversation
> 
>  
> > For context, my questions earlier were based on my 20+ years of using SQL 
> > transactions across different systems.
> 
>  
> We probably don’t come from a very different place. I spent too many years 
> with T-SQL.
> 
>  
> > When you start a SQL transaction, you are creating a branch of your data 
> > that you can operate with until you reach your desired state and then merge 
> > it back with a commit.
> 
>  
> That’s the essential complexity we’re grappling with: how much do we permit 
> your “branch” to do, how do we let you express

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Aaron Ploetz

Benedict,

I'm really excited about this feature.  I've been observing this
conversation for a while now, and I'm happy to add some thoughts.

We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.


I think taking small steps forward, to build a few complete features as
close to SQL as possible is a good approach.

question we are currently asking: do we want to have a more LWT-like
> approach... or do we want a more SQL-like approach
>

For years now we've been fighting this notion that Cassandra is difficult
to use.  Coming up with specialized syntax isn't going to bridge that
divide.  From a (new?) user perspective, the best plan is to stay as
consistent with SQL as possible.

I believe that is a MySQL specific concept. This is one problem with
> mimicking SQL – it’s not one thing!


Right?!?!  As if this needed to be more complex.

> I think we have evidence that it is fine to interpret NULL as “false” for
> the evaluation of IF conditions.
>

Agree.  Null == false isn't too much of a leap.

Thanks for taking up the charge on this one.  Glad to see it moving forward!

Thanks,

Aaron



On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org 
wrote:

> Welcome Li, and thanks for your input
>
>
>
> > When I first saw the syntax, I took it for granted that the condition
> was evaluated against the state AFTER the updates
>
>
>
> Depending what you mean, I think this is one of the options being
> considered. At least, it seems this syntax is most likely to be evaluated
> against the values written by preceding statements in the batch, but not
> the statement itself (or later ones), as this could lead to nonsensical
> statements like
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
>
> COMMIT TRANSACTION IF tbl.v = 0
>
>
>
> Where v is never 0 afterwards, so this never succeeds. I take it in this
> simple case you would expect the condition to be evaluated against the
> state prior to the statement (i.e. the initial state)?
>
>
>
> But we have a blank slate, so every option is available to us! We just
> need to make sure it makes sense to the user, even in uncommon cases.
>
>
>
> > The IF (Boolean expr) ABORT TRANSACTION would suffer less because users
> may tend to put the condition closer to the related SELECT statement.
>
>
>
> This is probably not going to matter in practice. The SELECTs all happen
> upfront no matter what the CQL might look like, and the UPDATEs all happen
> only after the IF conditions are evaluated. This is all just a question of
> how the user expresses things.
>
>
>
> In future we may offer interactive transactions, or transactions that are
> multi-step, in which case this would be more relevant and could have an
> efficiency impact.
>
>
>
> > Would you consider allowing users to start a read-only transaction
> explicitly like BEGIN TRANSACTION READONLY?
>
>
>
> Good question. I would be OK with this, for sure, and will defer to the
> opinions of others here. There won’t be any optimisation impact, as we
> simply check if the transaction contains any updates, but some validation
> could be helpful for the user.
>
>
>
> > Finally, I wonder if the community would be interested in idempotency
> support.
>
>
>
> This is something that has been considered, and that Accord is able to
> support (in a couple of ways), but as an end-to-end feature this requires
> client support and other scaffolding that is not currently
> planned/scheduled. The simplest (least robust) approach is for the server
> to include the transaction’s identifier in its timeout, so that it can be
> queried by the client to establish if it has been made durable. This should
> be quite easy to deliver on the server-side, but would require some
> application or client integration, and is unreliable in the face of
> coordinator failure (so the transaction id is unknown to the client). The
> more complete approach is for the client to include an idempotency token in
> its submission to the server, and for C* to record this alongside the
> transaction id, and for some bounded time window to either reject
> re-submissions of this token or to evaluate it as a no-op. This requires
> much tighter integration from the clients, and more work server-side.
>
>
>
> Which is simply to say, this is on our radar but I can’t make promises
> about what form it will take, or when it will arrive, only that it has been
> planned for enough to ensure we can achieve it when resources permit.
>
>
>
> *From: *Li Boxuan 
> *Date: *Sunday,

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org

I believe that is a MySQL specific concept. This is one problem with mimicking 
SQL – it’s not one thing!

In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL value 
submitted to a Boolean operator yields UNKNOWN.

IF (X) THEN Y does not run Y if X is UNKNOWN;
IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.

So, I think we have evidence that it is fine to interpret NULL as “false” for 
the evaluation of IF conditions.

[1] 
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/else-if-else-transact-sql?view=sql-server-ver16
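
For illustration only (not from the thread, and not Cassandra code): a minimal Python sketch of the three-valued behaviour described above, assuming a NULL operand makes a comparison UNKNOWN and that IF only runs its THEN branch when the condition is TRUE. The names Tri, tri_equals and run_if are hypothetical.

```python
from enum import Enum


class Tri(Enum):
    """Three-valued Boolean result, as in T-SQL."""
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"


def tri_equals(left, right):
    """Compare two values; a None (NULL) operand yields UNKNOWN."""
    if left is None or right is None:
        return Tri.UNKNOWN
    return Tri.TRUE if left == right else Tri.FALSE


def run_if(condition, then_branch, else_branch=None):
    """Run then_branch only when condition is TRUE.

    FALSE and UNKNOWN are treated alike: the ELSE branch runs if one
    was given, otherwise nothing runs.
    """
    if condition is Tri.TRUE:
        return then_branch()
    if else_branch is not None:
        return else_branch()
    return None
```

In this sketch UNKNOWN never selects the THEN branch, which is the sense in which NULL is interpreted as "false" for IF conditions.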



From: Konstantin Osipov 
Date: Monday, 13 June 2022 at 14:57
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> IF (X) THEN
> ROLLBACK
> RETURN (ERRCODE)
> END IF
>
> or
>
> IF (X) THEN RAISERROR
>
> So, that is in essence the question we are currently asking: do
> we want to have a more LWT-like approach (and if so, how do we
> address this complexity for the user), or do we want a more
> SQL-like approach (and if so, how do we modify it to make
> non-interactive transactions convenient, and implementation
> tractable)
>
> * This is anyway a shortcoming of existing batches, I think? So
> it might be we can sweep it under the rug, but I think it will
> be more relevant here as people execute more complex
> transactions, and we should ideally have semantics that will
> work well into the future – including if we later introduce
> interactive transactions.

I'd start with answering the question how the syntax should handle
NOT FOUND condition. In SQL, that would trigger activation of a
CONTINUE handler.

It's hard to see how one can truly branch the logic without it.
Relying on NULL content of a cell would be full of gotchas.

--
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Konstantin Osipov

> IF (X) THEN
> ROLLBACK
> RETURN (ERRCODE)
> END IF
> 
> or
> 
> IF (X) THEN RAISERROR
> 
> So, that is in essence the question we are currently asking: do
> we want to have a more LWT-like approach (and if so, how do we
> address this complexity for the user), or do we want a more
> SQL-like approach (and if so, how do we modify it to make
> non-interactive transactions convenient, and implementation
> tractable)
> 
> * This is anyway a shortcoming of existing batches, I think? So
> it might be we can sweep it under the rug, but I think it will
> be more relevant here as people execute more complex
> transactions, and we should ideally have semantics that will
> work well into the future – including if we later introduce
> interactive transactions.

I'd start with answering the question how the syntax should handle
NOT FOUND condition. In SQL, that would trigger activation of a
CONTINUE handler. 

It's hard to see how one can truly branch the logic without it.
Relying on NULL content of a cell would be full of gotchas.

-- 
Konstantin Osipov, Moscow, Russia

Re: CEP-15 multi key transaction syntax

2022-06-12 Thread bened...@apache.org

Welcome Li, and thanks for your input

> When I first saw the syntax, I took it for granted that the condition was 
> evaluated against the state AFTER the updates

Depending what you mean, I think this is one of the options being considered. 
At least, it seems this syntax is most likely to be evaluated against the 
values written by preceding statements in the batch, but not the statement 
itself (or later ones), as this could lead to nonsensical statements like

BEGIN TRANSACTION
UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
COMMIT TRANSACTION IF tbl.v = 0

Where v is never 0 afterwards, so this never succeeds. I take it in this simple 
case you would expect the condition to be evaluated against the state prior to 
the statement (i.e. the initial state)?

But we have a blank slate, so every option is available to us! We just need to 
make sure it makes sense to the user, even in uncommon cases.

> The IF (Boolean expr) ABORT TRANSACTION would suffer less because users may 
> tend to put the condition closer to the related SELECT statement.

This is probably not going to matter in practice. The SELECTs all happen 
upfront no matter what the CQL might look like, and the UPDATEs all happen only 
after the IF conditions are evaluated. This is all just a question of how the 
user expresses things.
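
For illustration only: a rough Python sketch of the ordering described above, assuming a transaction can be modelled as three phases (all reads against the initial state, then the conditions, then the writes). The function name execute and the dict-based "table" are hypothetical and say nothing about how Accord actually executes.

```python
def execute(state, reads, condition, writes):
    """Run a non-interactive transaction in three phases.

    state:     dict mapping key -> value (stands in for a table)
    reads:     dict mapping alias -> key to read
    condition: function(results) -> bool
    writes:    list of (key, function(results) -> new value)
    """
    # Phase 1: every read is served from the initial state, wherever
    # the SELECT appeared in the submitted text.
    results = {alias: state.get(key) for alias, key in reads.items()}

    # Phase 2: conditions are evaluated against those reads only.
    if not condition(results):
        return False, results

    # Phase 3: writes are applied only once the conditions have passed.
    for key, compute in writes:
        state[key] = compute(results)
    return True, results
```

Under this model, moving an IF closer to its SELECT changes how the transaction reads, not what work is done.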

In future we may offer interactive transactions, or transactions that are 
multi-step, in which case this would be more relevant and could have an 
efficiency impact.

> Would you consider allowing users to start a read-only transaction explicitly 
> like BEGIN TRANSACTION READONLY?

Good question. I would be OK with this, for sure, and will defer to the 
opinions of others here. There won’t be any optimisation impact, as we simply 
check if the transaction contains any updates, but some validation could be 
helpful for the user.
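
For illustration only: a small Python sketch of the kind of validation mentioned above, assuming a transaction is just a list of statement strings and that a declared read-only transaction should be rejected if it contains a mutation. ReadOnlyViolation, is_read_only and validate are hypothetical names; a real implementation would inspect parsed statements rather than text prefixes.

```python
class ReadOnlyViolation(Exception):
    """Raised when a transaction declared read-only contains an update."""


# Statement keywords treated as mutations in this sketch.
MUTATING = ("INSERT", "UPDATE", "DELETE")


def is_read_only(statements):
    """True if no statement starts with a mutating keyword."""
    return not any(s.strip().upper().startswith(MUTATING) for s in statements)


def validate(statements, declared_read_only):
    """Reject a declared read-only transaction that contains updates.

    Returns whether the transaction is in fact read-only, which is the
    same check the server would make with or without the declaration.
    """
    read_only = is_read_only(statements)
    if declared_read_only and not read_only:
        raise ReadOnlyViolation("read-only transaction contains an update")
    return read_only
```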

> Finally, I wonder if the community would be interested in idempotency support.

This is something that has been considered, and that Accord is able to support 
(in a couple of ways), but as an end-to-end feature this requires client 
support and other scaffolding that is not currently planned/scheduled. The 
simplest (least robust) approach is for the server to include the transaction’s 
identifier in its timeout, so that it can be queried by the client to establish if 
it has been made durable. This should be quite easy to deliver on the 
server-side, but would require some application or client integration, and is 
unreliable in the face of coordinator failure (so the transaction id is unknown 
to the client). The more complete approach is for the client to include an 
idempotency token in its submission to the server, and for C* to record this 
alongside the transaction id, and for some bounded time window to either reject 
re-submissions of this token or to evaluate it as a no-op. This requires much 
tighter integration from the clients, and more work server-side.
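
For illustration only: a Python sketch of the second (idempotency token) approach, assuming the server keeps an in-memory map from token to transaction id and treats a re-submission within the window as a no-op. IdempotencyRegistry and its parameters are hypothetical; nothing here reflects an actual Cassandra or Accord API.

```python
import time


class IdempotencyRegistry:
    """Remember idempotency tokens for a bounded time window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen = {}  # token -> (txn_id, time recorded)

    def submit(self, token, txn_id, apply):
        """Apply the transaction unless the token was seen recently.

        Returns (txn_id, applied). For a re-submission inside the window,
        the original txn_id is returned and apply() is not called.
        """
        now = self._clock()
        # Drop tokens that have fallen outside the window.
        self._seen = {
            t: (i, at) for t, (i, at) in self._seen.items()
            if now - at < self._window
        }
        if token in self._seen:
            return self._seen[token][0], False
        apply()
        self._seen[token] = (txn_id, now)
        return txn_id, True
```

Rejecting the re-submission instead of evaluating it as a no-op would only change the branch that handles a known token.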

Which is simply to say, this is on our radar but I can’t make promises about 
what form it will take, or when it will arrive, only that it has been planned 
for enough to ensure we can achieve it when resources permit.

From: Li Boxuan 
Date: Sunday, 12 June 2022 at 16:14
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Correcting my typo:

>  I took it for granted that the condition was evaluated against the state 
> before the updates

I took it for granted that the condition was evaluated against the state AFTER 
the updates

On Jun 12, 2022, at 11:07 AM, Li Boxuan <libox...@connect.hku.hk> wrote:

Thank you team for this exciting update! I just joined the dev mailing list to 
take part in this discussion. I am not a Cassandra developer and haven’t 
understood Accord myself yet, so my questions are more from a user’s standpoint.

> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
> get one chance to make this API right.

I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I took 
it for granted that the condition was evaluated against the state after the 
updates, but it turned out to be the opposite. Thus I prefer the IF (Boolean 
expr) ABORT TRANSACTION idea. In addition, when the transaction is large and 
there are many conditions, using the COMMIT IF syntax might make the CQL query 
uglier and developers’ life harder. Another very subtle point is if there are 
many conditions combined using AND clauses, wouldn't it make the execution 
slightly slower because, for each SELECT statement, you would have to check 
every condition? The IF (Boolean expr) ABORT TRANSACTION would suffer less 
because users may tend to put the condition closer to the related SELECT 
statement.

> read-only transactions involving multiple tables will definitely be supported.

Would you consider allowing users to start a read-only transaction explicitly 
like BEGIN TRANSACTION READONLY? This could help catch some de

Re: CEP-15 multi key transaction syntax

2022-06-12 Thread Li Boxuan

 the future – including if we later introduce interactive 
transactions.

From: Patrick McFadin <pmcfa...@gmail.com>
Date: Saturday, 11 June 2022 at 15:33
To: dev <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
I think the syntax is evolving into something pretty complicated, which may be 
warranted but I wanted to take a step back and be a bit more reflective on what 
we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL 
transactions across different systems. That's my personal bias when I see the 
word "database transaction" in this case. When you start a SQL transaction, you 
are creating a branch of your data that you can operate with until you reach 
your desired state and then merge it back with a commit. Or if you don't like 
what you see, use a rollback and act like it never happened. That was the 
thinking when I asked about interactive sessions. If you are using a driver, 
that all happens in a batch. I realize that is out of scope here, but that's 
probably knowledge that is pre-installed in the majority of the user community.

Getting to the point, which is developer experience. I'm seeing a philosophical 
fork in the road which hopefully will generate some comments in the larger user 
community.

Path 1)
Mimic what's already been available in the SQL community, using existing CQL 
syntax. (SQL Example using JDBC: https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people on 
what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org wrote:
This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.

From: bened...@apache.org
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some more, I think if there’s an option that 
*doesn’t* require the user to reason about the point at which the read happens 
in order to understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.

From: Blake Eggleston <beggles...@apple.com>
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow on work.

On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of statement against which it is being evaluated, but after the prior 
statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
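
For illustration only: a Python sketch of the evaluation rule described above, assuming each statement's alias captures the value as it stood before that statement ran but after all earlier statements. The second update is modelled as incrementing the value it reads itself, which is one reading of the example that matches the stated outcome (q2 reads 1 and sets 2). run_transaction and its argument shapes are hypothetical.

```python
def run_transaction(initial, updates, commit_if):
    """Apply updates in order, recording what each one read.

    initial:   dict mapping key -> value
    updates:   list of (alias, key, function(before, aliases) -> new value)
    commit_if: function(aliases) -> bool
    Returns (committed, resulting state, aliases).
    """
    working = dict(initial)
    aliases = {}
    for alias, key, compute in updates:
        before = working.get(key)      # state prior to this statement
        aliases[alias] = before        # what the alias exposes to conditions
        working[key] = compute(before, aliases)
    if commit_if(aliases):
        return True, working, aliases
    # Condition failed: none of the writes become visible.
    return False, dict(initial), aliases
```

Starting from a = 0, the two updates in the example record q1 = 0 and q2 = 1, so the condition holds and the final value is 2.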

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.

From: Blake Eggleston m

Re: CEP-15 multi key transaction syntax

2022-06-12 Thread Li Boxuan

Thank you team for this exciting update! I just joined the dev mailing list to 
take part in this discussion. I am not a Cassandra developer and haven’t 
understood Accord myself yet, so my questions are more from a user’s standpoint.

> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
> get one chance to make this API right.

I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I took 
it for granted that the condition was evaluated against the state before the 
updates, but it turned out to be the opposite. Thus I prefer the IF (Boolean 
expr) ABORT TRANSACTION idea. In addition, when the transaction is large and 
there are many conditions, using the COMMIT IF syntax might make the CQL query 
uglier and developers’ life harder. Another very subtle point is if there are 
many conditions combined using AND clauses, wouldn't it make the execution 
slightly slower because, for each SELECT statement, you would have to check 
every condition? The IF (Boolean expr) ABORT TRANSACTION would suffer less 
because users may tend to put the condition closer to the related SELECT 
statement.

> read-only transactions involving multiple tables will definitely be supported.

Would you consider allowing users to start a read-only transaction explicitly 
like BEGIN TRANSACTION READONLY? This could help catch some developers’ bugs 
like unintentional updates. This might also give Cassandra a hint for 
optimization.

Finally, I wonder if the community would be interested in idempotency support. 
DynamoDB has this interesting feature 
(https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
 which guards the situation where the same transaction is submitted multiple 
times due to a connection time-out or other connectivity issue. I have no idea 
how that is implemented under the hood and I don’t even know if this is 
technically possible with the Accord design, but I thought it would be 
interesting to think about.

Best regards,
Boxuan

On Jun 12, 2022, at 7:31 AM, bened...@apache.org wrote:

> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but also SQL is designed to be interactive. Depending 
on the dialect there’s probably a lot of ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate the functionality exactly as we 
do not (yet) support interactive transactions that they were designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
ROLLBACK
RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we want a more SQL-like approach (and if so, how do we modify it 
to make non-interactive transactions convenient, and implementation tractable)

* This is anyway a shortcoming of existing batches, I think? So it might be we 
can sweep it under the rug, but I think it will be more relevant here as people 
execute more complex transactions, and we should ideally have semantics that 
will work well into the future – including if we later introduce interactive 
transactions.

From: Patrick McFadin <pmcfa...@gmail.com>
Date: Saturday, 11 June 2022 at 15:33
To: dev <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
I think the syntax is evolving into somethin

Re: CEP-15 multi key transaction syntax

2022-06-12 Thread bened...@apache.org

> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but also SQL is designed to be interactive. Depending 
on the dialect there’s probably a lot of ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate the functionality exactly as we 
do not (yet) support interactive transactions that they were designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
ROLLBACK
RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we want a more SQL-like approach (and if so, how do we modify it 
to make non-interactive transactions convenient, and implementation tractable)

* This is anyway a shortcoming of existing batches, I think? So it might be we 
can sweep it under the rug, but I think it will be more relevant here as people 
execute more complex transactions, and we should ideally have semantics that 
will work well into the future – including if we later introduce interactive 
transactions.

From: Patrick McFadin 
Date: Saturday, 11 June 2022 at 15:33
To: dev 
Subject: Re: CEP-15 multi key transaction syntax
I think the syntax is evolving into something pretty complicated, which may be 
warranted but I wanted to take a step back and be a bit more reflective on what 
we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL 
transactions across different systems. That's my personal bias when I see the 
word "database transaction" in this case. When you start a SQL transaction, you 
are creating a branch of your data that you can operate with until you reach 
your desired state and then merge it back with a commit. Or if you don't like 
what you see, use a rollback and act like it never happened. That was the 
thinking when I asked about interactive sessions. If you are using a driver, 
that all happens in a batch. I realize that is out of scope here, but that's 
probably knowledge that is pre-installed in the majority of the user community.

Getting to the point, which is developer experience. I'm seeing a philosophical 
fork in the road which hopefully will generate some comments in the larger user 
community.

Path 1)
Mimic what's already been available in the SQL community, using existing CQL 
syntax. (SQL Example using JDBC: https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people on 
what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org wrote:
This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.

From: bened...@apache.org
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some mor

Re: CEP-15 multi key transaction syntax

2022-06-11 Thread Patrick McFadin

I think the syntax is evolving into something pretty complicated, which may
be warranted but I wanted to take a step back and be a bit more reflective
on what we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL
transactions across different systems. That's my personal bias when I see
the word "database transaction" in this case. When you start a SQL
transaction, you are creating a branch of your data that you can operate
with until you reach your desired state and then merge it back with a
commit. Or if you don't like what you see, use a rollback and act like it
never happened. That was the thinking when I asked about interactive
sessions. If you are using a driver, that all happens in a batch. I realize
that is out of scope here, but that's probably knowledge that is
pre-installed in the majority of the user community.

Getting to the point, which is developer experience. I'm seeing a
philosophical fork in the road which hopefully will generate some comments
in the larger user community.

Path 1)
Mimic what's already been available in the SQL community, using existing
CQL syntax. (SQL Example using JDBC:
https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people
on what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org 
wrote:

> This might also permit us to remove one result set (the success/failure
> one) and return instead an exception if the transaction is aborted. This is
> also more consistent with SQL, if memory serves. That might conflict with
> returning the other result sets in the event of abort (though that’s up to
> us ultimately), but it feels like a nicer API for the user – depending on
> how these exceptions are surfaced in client APIs.
>
>
>
> *From: *bened...@apache.org 
> *Date: *Friday, 10 June 2022 at 19:59
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> So, thinking on it myself some more, I think if there’s an option that
> *doesn’t* require the user to reason about the point at which the read
> happens in order to understand how the condition is applied would probably
> be better.
>
>
>
> What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?
>
>
>
> It’s compatible with more advanced IF functionality later, and probably
> not much trickier to implement?
>
>
>
> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we
> only get one chance to make this API right.
>
>
>
>
>
> *From: *Blake Eggleston 
> *Date: *Friday, 10 June 2022 at 18:56
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Yeah I think that’s intuitive enough. I had been thinking about multiple
> condition branches, but was thinking about something closer to
>
> IF select.column=5
>   UPDATE ... SET ... WHERE key=1;
> ELSE IF select.column=6
>   UPDATE ... SET ... WHERE key=2;
> ELSE
>   UPDATE ... SET ... WHERE key=3;
> ENDIF
> COMMIT TRANSACTION;
>
> Which would make the proposed COMMIT IF we're talking about now a
> shorthand. Of course this would be follow on work.
>
>
>
>
> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
>
>
>
> I imagine that conditions would be evaluated against the state prior to
> the execution of statement against which it is being evaluated, but after
> the prior statements. I *think* that should be OK to reason about.
>
>
>
> i.e. we might have a contrived example like:
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>
>
>
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>
>
>
> I think this is probably adequately intuitive? It is a bit atypical to
> have conditions that wrap the whole transaction though.
>
>
>
> We have another option, of course, which is to offer IF x ROLLBACK
> TRANSACTION, which is closer to SQL, which would translate the above to:
>
>
>
> BEGIN TRANSACTION
>
> SELECT a FROM tbl WHERE k = 1 AS q0
>
> IF q0.a != 0 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> IF q1.a != 1 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION
>
>
>
> This is less succinct, but might be more familiar to users. We could also
> eschew the ability to read from UPDATE statements entirely in this scheme,
> as this would then look very much like SQL.
>
>
>
>
>
> *From: *Blake Eggleston 
> *Date: *Wednesday, 8 June 2

Re: CEP-15 multi key transaction syntax

2022-06-10 Thread bened...@apache.org

This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.
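
For illustration only: a Python sketch of surfacing an abort as an exception rather than as a separate success/failure result set, assuming the exception can still carry the results read before the abort. TransactionAborted, select, abort_if, update and run are hypothetical names, not a proposed client API.

```python
class TransactionAborted(Exception):
    """Raised when an abort condition fires; keeps the reads made so far."""

    def __init__(self, reason, results):
        super().__init__(reason)
        self.results = results


def select(alias, key):
    def step(working, results):
        results[alias] = working.get(key)
    return step


def abort_if(predicate, reason):
    def step(working, results):
        if predicate(results):
            raise TransactionAborted(reason, dict(results))
    return step


def update(key, compute):
    def step(working, results):
        working[key] = compute(results)
    return step


def run(state, steps):
    """Run steps against a copy of state; publish the copy only on success."""
    working = dict(state)
    results = {}
    for step in steps:
        step(working, results)  # may raise TransactionAborted
    state.clear()
    state.update(working)
    return results
```

Whether the other result sets should travel with the exception is exactly the open question raised above; the sketch simply shows that both are possible together.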

From: bened...@apache.org 
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some more, I think an option that *doesn’t* require 
the user to reason about the point at which the read happens in order to 
understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.

From: Blake Eggleston 
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow on work.

On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of statement against which it is being evaluated, but after the prior 
statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.

From: Blake Eggleston <beggles...@apple.com>
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.

On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of

Re: CEP-15 multi key transaction syntax

2022-06-10 Thread bened...@apache.org

So, thinking on it myself some more, I think an option that *doesn’t* require 
the user to reason about the point at which the read happens in order to 
understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.

From: Blake Eggleston 
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow on work.

On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of statement against which it is being evaluated, but after the prior 
statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.

From: Blake Eggleston <beggles...@apple.com>
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.

On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.

From: Blake Eggleston <beggles...@apple.com>
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tomb

Re: CEP-15 multi key transaction syntax

2022-06-10 Thread Blake Eggleston

Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to 

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow on work.
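
A small Python sketch of how the branch form above could be read, with COMMIT IF as the one-branch special case. This is only an illustration of the idea; `choose_branch`, `commit_if`, and the tuple encoding of statements are hypothetical names, not part of any proposed implementation.

```python
def choose_branch(reads, branches, otherwise=()):
    """Return the updates of the first branch whose condition holds.

    `branches` is a list of (condition, updates) pairs tried in order;
    `otherwise` plays the role of the ELSE block and may be empty.
    """
    for condition, updates in branches:
        if condition(reads):
            return list(updates)
    return list(otherwise)


def commit_if(condition, updates):
    """COMMIT IF as shorthand: one branch and an empty ELSE."""
    return [(condition, updates)], ()


# Assumed encoding of the IF / ELSE IF / ELSE example.
branches = [
    (lambda r: r["select"]["column"] == 5, [("UPDATE", 1)]),
    (lambda r: r["select"]["column"] == 6, [("UPDATE", 2)]),
]
otherwise = [("UPDATE", 3)]

print(choose_branch({"select": {"column": 6}}, branches, otherwise))
print(choose_branch({"select": {"column": 9}}, branches, otherwise))
```

Under this reading, a failed COMMIT IF simply selects the empty ELSE and applies nothing.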

> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
> 
> I imagine that conditions would be evaluated against the state prior to the 
> execution of statement against which it is being evaluated, but after the 
> prior statements. I think that should be OK to reason about.
>  
> i.e. we might have a contrived example like:
>  
> BEGIN TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>  
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>  
> I think this is probably adequately intuitive? It is a bit atypical to have 
> conditions that wrap the whole transaction though.
>  
> We have another option, of course, which is to offer IF x ROLLBACK 
> TRANSACTION, which is closer to SQL, which would translate the above to:
>  
> BEGIN TRANSACTION
> SELECT a FROM tbl WHERE k = 1 AS q0
> IF q0.a != 0 ROLLBACK TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> IF q1.a != 1 ROLLBACK TRANSACTION
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION
>  
> This is less succinct, but might be more familiar to users. We could also 
> eschew the ability to read from UPDATE statements entirely in this scheme, as 
> this would then look very much like SQL.
>  
>  
> From: Blake Eggleston 
> Date: Wednesday, 8 June 2022 at 20:59
> To: dev@cassandra.apache.org 
> Subject: Re: CEP-15 multi key transaction syntax
> 
> > It affects not just RETURNING but also conditions that are evaluated 
> > against the row, and if we in future permit using the values from one 
> > select in a function call / write to another table (which I imagine we 
> > will).
> 
> I hadn’t thought about that... using intermediate or even post update values 
> in condition evaluation or function calls seems like it would make it 
> difficult to understand why a condition is or is not applying. On the other 
> hand, it would be powerful, especially when using things like database generated 
> values in queries (auto incrementing integer clustering keys or server 
> generated timeuuids being examples that come to mind). Additionally, if we 
> return these values, I guess that would solve the visibility issues I’m 
> worried about. 
> 
> Agreed intermediate values would be straightforward to calculate though.
> 
> 
> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
>  
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).
>  
> I think that for it to be intuitive we need it to make sense sequentially, 
> which means either calculating it or restricting what can be stated (or 
> abandoning the syntax).
>  
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
> overlapping DELETE (and as many SELECT as you like) that would perhaps make 
> it simple enough? Require for now that SELECTS go first, then DELETE and then 
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>  
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
> restricted to single rows we are updating, so we could simply maintain a 
> collection of rows and upsert into them as we process the execution. Most 
> transactions won’t need it, I suspect, so we don’t need to worry about 
> perfect efficiency.
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org
> Subject: Re: CEP-15 multi key transaction syntax
> 
> That's a good question. I'd lean towards returning the final state of things, 
> although I could understand expecting to see intermediate state. Regarding 
> range tombstones, we could require them to precede any updates like selects, 
> but there's still the question of how to handle multiple updates to the same 
> cell when the user has requested we return the post-update state of the cell.
> 
> 
> 
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org
> w

Re: CEP-15 multi key transaction syntax

2022-06-08 Thread bened...@apache.org

I imagine that conditions would be evaluated against the state prior to the 
execution of statement against which it is being evaluated, but after the prior 
statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.
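
A minimal Python sketch of this reading, under stated assumptions: each named statement records the row as it stood just before that statement ran (after all earlier statements), and the COMMIT IF condition is evaluated over those recorded reads. `run_transaction` and the tuple encoding are hypothetical. How `q1.a` resolves inside a SET clause is not settled in the thread, so the sketch sidesteps it by letting each update compute from the current working row.

```python
def run_transaction(table, statements, commit_if):
    """Apply named updates in order on a working copy of `table`.

    `statements` is a list of (name, key, update) where `update` maps the
    current working row to the columns to upsert. Returns
    (committed, resulting_table, reads).
    """
    working = {key: dict(row) for key, row in table.items()}
    reads = {}
    for name, key, update in statements:
        row = working.setdefault(key, {})
        reads[name] = dict(row)   # state before this statement, after earlier ones
        row.update(update(row))   # upsert into the working row
    if commit_if(reads):
        return True, working, reads
    return False, table, reads


table = {1: {"a": 0}}
statements = [
    ("q1", 1, lambda row: {"a": 1}),
    ("q2", 1, lambda row: {"a": row["a"] + 1}),
]
committed, result, reads = run_transaction(
    table, statements,
    lambda r: r["q1"]["a"] == 0 and r["q2"]["a"] == 1,
)
print(committed, result, reads)
```

With the starting row a = 0, q1 records a = 0, q2 records a = 1, and the final value is a = 2, matching the outcome described above.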

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.

From: Blake Eggleston 
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed intermediate values would be straightforward to calculate though.

On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.

From: Blake Eggleston <beggles...@apple.com>
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects, 
but there's still the question of how to handle multiple updates to the same 
cell when the user has requested we return the post-update state of the cell.

On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:

> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is I think highly desirable for user experience, but this 
does complicate it a bit if we want to remain intuitive.

From: Blake Eggleston <beggles...@apple.com>
Date: Monday, 6 June 2022 at 23:17
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond

Re: CEP-15 multi key transaction syntax

2022-06-08 Thread Blake Eggleston

> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would 
be powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about. 

Agreed intermediate values would be straightforward to calculate though.

> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
> 
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).
>  
> I think that for it to be intuitive we need it to make sense sequentially, 
> which means either calculating it or restricting what can be stated (or 
> abandoning the syntax).
>  
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
> overlapping DELETE (and as many SELECT as you like) that would perhaps make 
> it simple enough? Require for now that SELECTS go first, then DELETE and then 
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>  
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
> restricted to single rows we are updating, so we could simply maintain a 
> collection of rows and upsert into them as we process the execution. Most 
> transactions won’t need it, I suspect, so we don’t need to worry about 
> perfect efficiency.
>  
>  
> From: Blake Eggleston 
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org 
> Subject: Re: CEP-15 multi key transaction syntax
> 
> That's a good question. I'd lean towards returning the final state of things, 
> although I could understand expecting to see intermediate state. Regarding 
> range tombstones, we could require them to precede any updates like selects, 
> but there's still the question of how to handle multiple updates to the same 
> cell when the user has requested we return the post-update state of the cell.
> 
> 
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:
>  
> > if multiple updates end up touching the same cell, I’d expect the last one 
> > to win
>  
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to 
> mix with inserts over the same key range.
>  
> What’s your present thinking about the idea of handling returning the values 
> as of a given point in the sequential execution then?
>  
> The succinct syntax is I think highly desirable for user experience, but this 
> does complicate it a bit if we want to remain intuitive.
>  
>  
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Monday, 6 June 2022 at 23:17
> To: dev@cassandra.apache.org
> Subject: Re: CEP-15 multi key transaction syntax
> 
> Hi all,
> 
> Thanks for all the input and questions so far. Glad people are excited about 
> this!
> 
> I didn’t have any free time to respond this weekend, although it looks like 
> Benedict has responded to most of the questions so far, so if I don’t respond 
> to a question you asked here, you can interpret that as “what Benedict said” 
> :).
> 
> 
> Jeff, 
> 
> > Is there a new keyword for “partition (not) exists” or is it inferred by 
> > the select?
> 
> I'd intended this to be worked out from the select statement, ie: if the 
> read/reference is null/empty, then it doesn't exist, whether you're 
> interested in the partition, row, or cell. So I don't think we'd need an 
> additional keyword there. I think that would address partition exists / not 
> exists use cases?
> 
> > And would you allow a transaction that had > 1 named select and no 
> > modification statements, but commit if 1=1 ?
> 
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) 
> would be part of the syntax. Also, running a txn that doesn’t contain updates 
> wouldn’t be a problem.
> 
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
> 
> Alex,
> 
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from seconda

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collection of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.
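
The restriction suggested above (SELECTs first, then DELETE, then INSERT/UPDATE, with at most one INSERT/UPDATE per key) could be checked with something like the following Python sketch. It is only an illustration: `validate`, the `ORDER` table, and the (kind, key) encoding are hypothetical, and the thread leaves open whether the order should be reversed.

```python
# Assumed ordering: lower rank must not appear after a higher rank.
ORDER = {"SELECT": 0, "DELETE": 1, "INSERT": 2, "UPDATE": 2}


def validate(statements):
    """Check a list of (kind, key) pairs; return a list of error strings."""
    errors = []
    written = set()   # keys already targeted by an INSERT or UPDATE
    highest = 0       # highest rank seen so far
    for position, (kind, key) in enumerate(statements):
        rank = ORDER[kind]
        if rank < highest:
            errors.append(f"statement {position}: {kind} appears too late")
        highest = max(highest, rank)
        if kind in ("INSERT", "UPDATE"):
            if key in written:
                errors.append(
                    f"statement {position}: second INSERT/UPDATE to key {key!r}"
                )
            written.add(key)
    return errors


print(validate([("SELECT", 1), ("DELETE", 1), ("DELETE", 1), ("UPDATE", 1)]))
print(validate([("UPDATE", 1), ("INSERT", 1)]))
```

Overlapping DELETEs and any number of SELECTs pass, while a second write to the same key or an out-of-order statement is reported.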

From: Blake Eggleston 
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects, 
but there's still the question of how to handle multiple updates to the same 
cell when the user has requested we return the post-update state of the cell.

On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:

> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is I think highly desirable for user experience, but this 
does complicate it a bit if we want to remain intuitive.

From: Blake Eggleston <beggles...@apple.com>
Date: Monday, 6 June 2022 at 23:17
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).

Jeff,

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement w

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Blake Eggleston

That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects, 
but there's still the question of how to handle multiple updates to the same 
cell when the user has requested we return the post-update state of the cell.

> On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:
> 
> > if multiple updates end up touching the same cell, I’d expect the last one 
> > to win
>  
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to 
> mix with inserts over the same key range.
>  
> What’s your present thinking about the idea of handling returning the values 
> as of a given point in the sequential execution then?
>  
> The succinct syntax is I think highly desirable for user experience, but this 
> does complicate it a bit if we want to remain intuitive.
>  
>  
>  
>  
> From: Blake Eggleston 
> Date: Monday, 6 June 2022 at 23:17
> To: dev@cassandra.apache.org 
> Subject: Re: CEP-15 multi key transaction syntax
> 
> Hi all,
> 
> Thanks for all the input and questions so far. Glad people are excited about 
> this!
> 
> I didn’t have any free time to respond this weekend, although it looks like 
> Benedict has responded to most of the questions so far, so if I don’t respond 
> to a question you asked here, you can interpret that as “what Benedict said” 
> :).
> 
> 
> Jeff, 
> 
> > Is there a new keyword for “partition (not) exists” or is it inferred by 
> > the select?
> 
> I'd intended this to be worked out from the select statement, ie: if the 
> read/reference is null/empty, then it doesn't exist, whether you're 
> interested in the partition, row, or cell. So I don't think we'd need an 
> additional keyword there. I think that would address partition exists / not 
> exists use cases?
> 
> > And would you allow a transaction that had > 1 named select and no 
> > modification statements, but commit if 1=1 ?
> 
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) 
> would be part of the syntax. Also, running a txn that doesn’t contain updates 
> wouldn’t be a problem.
> 
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
> 
> Alex,
> 
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from secondary index (or SASI)
> > 5. UPDATE with predicate on non-primary key
> 
> The full primary key must be defined as part of the statement, and you can’t 
> use column references to define them, so you wouldn’t be able to run these.
> 
> > MVs
> 
> To prevent being spread too thin, both in syntax design and implementation 
> work, I’d like to limit read and write operations in the initial 
> implementation to vanilla selects, updates, inserts, and deletes. Once we 
> have a solid implementation of multi-key/table transactions supporting 
> foundational operations, we can start figuring out how the more advanced 
> pieces can be best supported. Not a great answer to your question, but a 
> related tangent I should have included in my initial email.
> 
> > ... RETURNING ...
> 
> I like the idea of the returning statement, but to echo what Benedict said, I 
> think any scheme for specifying data to be returned should apply the same to 
> select and update statements, since updates can have underlying reads that 
> the user may be interested in. I’d mentioned having an optional RETURN 
> statement in addition to automatically returning selects in my original email.
> 
> > ... WITH ...
> 
> I like the idea of defining statement names at the beginning of a statement, 
> since I could imagine mapping names to selects might get difficult if there 
> are a lot of columns in the select or update, but beginning each statement 
> with `WITH ` reduces readability imo. Maybe putting the name after the 
> first term of the statement (ie: `SELECT * AS  WHERE...`, `UPDATE table 
> AS  SET ...`, `INSERT INTO table AS  (...) VALUES (...);`) would 
> be better for finding names without harming overall readability?
> 
> Benedict,
> 
> > I agree that SELECT statements should be required to go first.
> 
> +1
> 
> > There only remains the issue of conditions imposed upon 
> > UPDATE/INSERT/DELETE statements when there are multiple statements that 
> > affect the same primary key. I think we can (and should) simply reject such 
> > queries for now, as it doesn’t make much sense to have multiple statements 
> > for the same primary key in the same transaction.
> 
> Unfortunately, I think there are use cases for both multiple select

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org

> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is I think highly desirable for user experience, but this 
does complicate it a bit if we want to remain intuitive.




From: Blake Eggleston 
Date: Monday, 6 June 2022 at 23:17
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).


Jeff,

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement with 
`WITH ` reduces readability imo. Maybe putting the name after the first 
term of the statement (ie: `SELECT * AS  WHERE...`, `UPDATE table AS 
 SET ...`, `INSERT INTO table AS  (...) VALUES (...);`) would be 
better for finding names without harming overall readability?

Benedict,

> I agree that SELECT statements should be required to go first.

+1

> There only remains the issue of conditions imposed upon UPDATE/INSERT/DELETE 
> statements when there are multiple statements that affect the same primary 
> key. I think we can (and should) simply reject such queries for now, as it 
> doesn’t make much sense to have multiple statements for the same primary key 
> in the same transaction.

Unfortunately, I think there are use cases for both multiple selects and 
updates for the same primary key in a txn. Selects aren’t as problematic, but 
if multiple updates end up touching the same cell, I’d expect the last one to 
win. This would make dealing with range tombstones a little trickier, since the 
default behavior of alternating updates and range tombstones affecting the same 
cells is not intuitive, but I don’t think it would be too bad.


Something that’s come up a few times, and that I’ve also been thinking about is 
whether to return the values that were originally read, or the values written 
with the update to the client, and there are use cases for both. I don’t 
remember who suggested it, but I think returning the original values from named 
select statements, and the post-update values from named update statements is a 
good way to handle both. Also, while returning the contents of the mutation 
would be the easiest, implementation wise, swapping cell values from the 
updates named read would be most useful,

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Blake Eggleston

Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).

Jeff, 

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?
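
As a rough illustration of inferring existence from the read itself, here is a minimal Python sketch. The dict-based table model and the `exists` helper are assumptions made for this example only; they are not part of the proposal or of any Cassandra API.

```python
# Hypothetical model: a table is a dict mapping a primary key to a row dict.
def exists(table, key, column=None):
    """Return True if the referenced row (or cell) reads as non-null/non-empty."""
    row = table.get(key)
    if not row:
        # Missing or empty read result: treated as "does not exist".
        return False
    if column is None:
        # Row-level check: any non-empty row counts as existing.
        return True
    # Cell-level check: a null cell counts as "does not exist".
    return row.get(column) is not None


users = {"blake": {"home_state": "CA", "nickname": None}}
```

Under this model the same check covers row and cell references, so no separate EXISTS keyword is needed; a partition-level check would follow the same pattern with a partition key lookup.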

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.
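
The restriction can be pictured as a simple validation step. The following Python sketch is only an illustration: `ColumnRef` and `key_is_fully_specified` are hypothetical names, and the real check would live in the CQL parser and statement validation.

```python
class ColumnRef:
    """Hypothetical marker for a reference such as user.home_state."""

    def __init__(self, name):
        self.name = name


def key_is_fully_specified(primary_key, where):
    """Check that every primary key column is bound to a literal value.

    primary_key: list of primary key column names
    where:       dict mapping column name to a literal or a ColumnRef
    """
    for column in primary_key:
        value = where.get(column)
        # Reject missing key columns and keys defined by column references.
        if value is None or isinstance(value, ColumnRef):
            return False
    return True
```

For example, `key_is_fully_specified(["name"], {"name": "blake"})` passes, while a WHERE clause that binds `name` to `ColumnRef("user.home_state")` is rejected.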

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement with 
`WITH ` reduces readability imo. Maybe putting the name after the first 
term of the statement (ie: `SELECT * AS  WHERE...`, `UPDATE table AS 
 SET ...`, `INSERT INTO table AS  (...) VALUES (...);`) would be 
better for finding names without harming overall readability?

Benedict,

> I agree that SELECT statements should be required to go first.

+1

> There only remains the issue of conditions imposed upon UPDATE/INSERT/DELETE 
> statements when there are multiple statements that affect the same primary 
> key. I think we can (and should) simply reject such queries for now, as it 
> doesn’t make much sense to have multiple statements for the same primary key 
> in the same transaction.

Unfortunately, I think there are use cases for both multiple selects and 
updates for the same primary key in a txn. Selects aren’t as problematic, but 
if multiple updates end up touching the same cell, I’d expect the last one to 
win. This would make dealing with range tombstones a little trickier, since the 
default behavior of alternating updates and range tombstones affecting the same 
cells is not intuitive, but I don’t think it would be too bad.
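
To make the "last one wins" expectation concrete, here is a minimal Python sketch. It assumes a row is a plain dict and that statement order alone decides the winner; it deliberately ignores timestamps and range tombstones, which is where the trickier cases mentioned above come from.

```python
def apply_in_order(row, updates):
    """Apply updates (each a dict of column -> value) in statement order."""
    result = dict(row)  # work on a copy so the original read is preserved
    for update in updates:
        # Later statements overwrite cells written by earlier ones.
        result.update(update)
    return result


row = {"c": 1, "d": 2}
final = apply_in_order(row, [{"c": 10}, {"d": 20}, {"c": 30}])
# final == {"c": 30, "d": 20}
```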

Something that’s come up a few times, and that I’ve also been thinking about is 
whether to return the values that were originally read, or the values written 
with the update to the client, and there are use cases for both. I don’t 
remember who suggested it, but I think returning the original values from named 
select statements, and the post-update values from named update statements is a 
good way to handle both. Also, while returning the contents of the mutation 
would be the easiest, implementation wise, swapping cell values from the 
updates named read would be most useful, since a txn won’t always result in an 
update, in which case we’d just return the select.

Thanks,

Blake

> On Jun 6, 2022, at 9:41 AM, Henrik Ingo  wrote:
> 
> On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org wrote:
> > One way to make it obvious is to require the user to explicitly type the 
> > SELECTs and then to require that all SELECTs appear before 
> > UPDATE/INSERT/DELETE.
> 
>  
> 
> Yes, I agree that SELECT statements should be required to go first.
> 
>  
> 
> However, I think this is sufficient and we can retain the shorter format for 
> RETURNING. There only remains the issue of conditions

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo

On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org 
wrote:

> > One way to make it obvious is to require the user to explicitly type
> the SELECTs and then to require that all SELECTs appear before
> UPDATE/INSERT/DELETE.
>
>
>
> Yes, I agree that SELECT statements should be required to go first.
>
>
>
> However, I think this is sufficient and we can retain the shorter format
> for RETURNING. There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
>
>
I guess I was thinking ahead to a future where an UPDATE write set may or
may not intersect with a previous update due to allowing WHERE clause to
use secondary keys, etc.

That said, I'm not saying we SHOULD require explicit SELECT statements for
every update. I'm sure that would be more annoying than useful. I was just
following a train of thought.



>
>
> > Returning the "result" from an UPDATE presents the question should it
> be the data at the start of the transaction or end state?
>
>
>
> I am inclined to only return the new values (as proposed by Alex) for the
> purpose of returning new auto-increment values etc. If you require the
> prior value, SELECT is available to express this.
>
>
That's a great point!


>
>
> > I was thinking the following coordinator-side implementation would
> allow to use also old drivers
>
>
>
> I am inclined to return just the first result set to old clients. I think
> it’s fine to require a client upgrade to get multiple result sets.
>
>
Possibly. I just wanted to share an idea for consideration. IMO the temp
table idea might not be too hard to implement*, but sure the syntax does
feel a bit bolted on.

*) I'm maybe the wrong person to judge that, of course :-)

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>


Re: CEP-15 multi key transaction syntax

2022-06-06 Thread bened...@apache.org

> One way to make it obvious is to require the user to explicitly type the 
> SELECTs and then to require that all SELECTs appear before 
> UPDATE/INSERT/DELETE.

Yes, I agree that SELECT statements should be required to go first.

However, I think this is sufficient and we can retain the shorter format for 
RETURNING. There only remains the issue of conditions imposed upon 
UPDATE/INSERT/DELETE statements when there are multiple statements that affect 
the same primary key. I think we can (and should) simply reject such queries 
for now, as it doesn’t make much sense to have multiple statements for the same 
primary key in the same transaction.

> Returning the "result" from an UPDATE presents the question should it be the 
> data at the start of the transaction or end state?

I am inclined to only return the new values (as proposed by Alex) for the 
purpose of returning new auto-increment values etc. If you require the prior 
value, SELECT is available to express this.

> I was thinking the following coordinator-side implementation would allow to 
> use also old drivers

I am inclined to return just the first result set to old clients. I think it’s 
fine to require a client upgrade to get multiple result sets.

From: Henrik Ingo 
Date: Monday, 6 June 2022 at 15:18
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
Thank you Blake and team!

Just some personal reactions and thoughts...

First instinct is to support the shorter format where UPDATE ... AS car  is 
also its own implicit select.

However, a subtle thing to note is that a reasonable user might expect that in 
a sequence of multiple UPDATEs, each of them is also read at the position where 
the UPDATE is in the list of statements. The fact that Accord executes all 
reads first is not at all obvious from the syntax. One way to make it obvious 
is to require the user to explicitly type the SELECTs and then to require that 
all SELECTs appear before UPDATE/INSERT/DELETE.

I like the idea of a RETURN or RETURNING keyword to specify what exactly you 
want to return. This would allow to also return results from UPDATE/INSERT 
since the user explicitly told us to do so.

Returning the "result" from an UPDATE presents the question should it be the 
data at the start of the transaction or end state? Interestingly the MongoDB 
$findAndModify operation allows you to choose between both options. There seems 
to be a valid use case for both. The obvious examples are:

  UPDATE t SET c=100 WHERE id=1 AS t RETURNING BEFORE c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the value of what c was before I replaced with a new value.

  INSERT INTO t (c) VALUES (100) AS t RETURNING AFTER d;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the defaulted value of d. (...as was already pointed out in 
another email.)

  UPDATE t SET c+=1 WHERE id=1 AS t RETURNING AFTER c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the result of c after the transaction. (Which I know will be at 
most 100, but I want to know exactly.)

I kind of sympathize with the intuitive opinion that we should return the 
values from the start of the transaction, since that's how Accord works: reads 
first, updates second.

Finally, I wanted to share a thought on how to implement the returning of 
multiple result sets. While you don't address it, I'm assuming the driver api 
will get new functionality where you can get a specific result set out of many.

I was thinking the following coordinator-side implementation would allow to use 
also old drivers:

BEGIN TRANSACTION;
   SELECT * FROM table1 WHERE  AS t1;
   SELECT * FROM table2 WHERE  AS t2;
   UPDATE something...
COMMIT TRANSACTION;
SELECT * FROM t1;
SELECT * FROM t2;

The coordinator-level implementation here would be to store the results of the 
SELECTs inside a transaction into temporary tables that the client can then read 
from after the transaction. Even if those later selects are outside the 
transaction, their contents would be a constant snapshot representing the state 
of those rows at the time of the transaction. The tables should be visible only 
to the same client session and until the start of the next transaction or a 
timeout, whichever comes first.

henrik

On Fri, Jun 3, 2022 at 6:39 PM Blake Eggleston <beggles...@apple.com> wrote:
Hi dev@,

I’ve been working on a draft syntax for Accord transactions and wanted to bring 
what I have to the dev list to solicit feedback and build consensus before 
moving forward with it. The proposed transaction syntax is intended to be an 
extended batch syntax. Basically batches with selects, and an optional 
condition at the end. To facilitate conditions against an arbitrary number of 
select statements, you can also name the statements, and reference columns in 
the results. To cut down on the number of operations needed, select values can

Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo

Thank you Blake and team!

Just some personal reactions and thoughts...

First instinct is to support the shorter format where UPDATE ... AS car  is
also its own implicit select.

However, a subtle thing to note is that a reasonable user might expect that
in a sequence of multiple UPDATEs, each of them is also read at the
position where the UPDATE is in the list of statements. The fact that
Accord executes all reads first is not at all obvious from the syntax. One
way to make it obvious is to require the user to explicitly type the
SELECTs and then to require that all SELECTs appear before
UPDATE/INSERT/DELETE.
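
A rule like this would be cheap to enforce when the transaction is parsed. The Python sketch below is an assumption about how such a check might look, with statements reduced to their leading keyword; it is not taken from any existing parser.

```python
WRITE_KINDS = {"UPDATE", "INSERT", "DELETE"}


def selects_come_first(statement_kinds):
    """Return True if no SELECT appears after an UPDATE/INSERT/DELETE."""
    seen_write = False
    for kind in statement_kinds:
        if kind in WRITE_KINDS:
            seen_write = True
        elif kind == "SELECT" and seen_write:
            # A read placed after a write would suggest it observes that
            # write, which is not how reads-first execution behaves.
            return False
    return True
```

With this check, `["SELECT", "SELECT", "UPDATE"]` is accepted and `["UPDATE", "SELECT"]` is rejected, which makes the reads-first execution order visible in the syntax.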

I like the idea of a RETURN or RETURNING keyword to specify what exactly
you want to return. This would allow to also return results from
UPDATE/INSERT since the user explicitly told us to do so.

Returning the "result" from an UPDATE presents the question should it be
the data at the start of the transaction or end state? Interestingly the
MongoDB $findAndModify operation allows you to choose between both options.
There seems to be a valid use case for both. The obvious examples are:

  UPDATE t SET c=100 WHERE id=1 AS t RETURNING BEFORE c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the value of what c was before I replaced with a new value.

  INSERT INTO t (c) VALUES (100) AS t RETURNING AFTER d;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the defaulted value of d. (...as was already pointed out in
another email.)

  UPDATE t SET c+=1 WHERE id=1 AS t RETURNING AFTER c;
COMMIT TRANSACTION IF t.c <= 100;

I want to know the result of c after the transaction. (Which I know will be
at most 100, but I want to know exactly.)

I kind of sympathize with the intuitive opinion that we should return the
values from the start of the transaction, since that's how Accord works:
reads first, updates second.
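
The difference between the two options can be shown with a small Python model. The `update_returning` function and the dict-based row are illustrative assumptions; RETURNING BEFORE and RETURNING AFTER are only the hypothetical syntax used in the examples above.

```python
def update_returning(row, column, new_value, when):
    """Apply an update to a copy of row and return (new_row, returned_value).

    when is "BEFORE" (value read at the start of the transaction) or
    "AFTER" (value as written by the update).
    """
    before = row.get(column)   # the read happens first
    new_row = dict(row)
    new_row[column] = new_value  # the update happens second
    if when == "BEFORE":
        return new_row, before
    if when == "AFTER":
        return new_row, new_row[column]
    raise ValueError("when must be 'BEFORE' or 'AFTER'")


row = {"id": 1, "c": 42}
new_row, old_c = update_returning(row, "c", 100, "BEFORE")
# old_c == 42, new_row["c"] == 100
```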

Finally, I wanted to share a thought on how to implement the returning of
multiple result sets. While you don't address it, I'm assuming the driver
api will get new functionality where you can get a specific result set out
of many.

I was thinking the following coordinator-side implementation would allow to
use also old drivers:

BEGIN TRANSACTION;
   SELECT * FROM table1 WHERE  AS t1;
   SELECT * FROM table2 WHERE  AS t2;
   UPDATE something...
COMMIT TRANSACTION;
SELECT * FROM t1;
SELECT * FROM t2;

The coordinator-level implementation here would be to store the results of
the SELECTs inside a transaction into temporary tables that the client can
then read from after the transaction. Even if those later selects are
outside the transaction, their contents would be a constant snapshot
representing the state of those rows at the time of the transaction. The
tables should be visible only to the same client session and until the
start of the next transaction or a timeout, whichever comes first.
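
A toy Python model of this idea follows. `SnapshotStore` and its methods are hypothetical names, time is passed in explicitly to keep the sketch deterministic, and nothing here reflects how a coordinator would actually hold the data.

```python
class SnapshotStore:
    """Per-session snapshots of SELECT results from the last transaction."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (stored_at, {name: rows})

    def begin_transaction(self, session_id):
        # Starting a new transaction discards the previous snapshots.
        self._data.pop(session_id, None)

    def store(self, session_id, name, rows, now):
        _, tables = self._data.setdefault(session_id, (now, {}))
        tables[name] = list(rows)  # copy, so the snapshot stays constant

    def read(self, session_id, name, now):
        entry = self._data.get(session_id)
        if entry is None:
            return None  # unknown session: snapshots are never shared
        stored_at, tables = entry
        if now - stored_at > self.ttl:
            del self._data[session_id]  # timeout reached
            return None
        return tables.get(name)
```

A client would call `store` for each named SELECT during the transaction and `read` afterwards; the snapshot disappears on the next `begin_transaction` for that session or once the timeout passes, whichever comes first.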

henrik

On Fri, Jun 3, 2022 at 6:39 PM Blake Eggleston  wrote:

> Hi dev@,
>
> I’ve been working on a draft syntax for Accord transactions and wanted to
> bring what I have to the dev list to solicit feedback and build consensus
> before moving forward with it. The proposed transaction syntax is intended
> to be an extended batch syntax. Basically batches with selects, and an
> optional condition at the end. To facilitate conditions against an
> arbitrary number of select statements, you can also name the statements,
> and reference columns in the results. To cut down on the number of
> operations needed, select values can also be used in updates, including
> some math operations. Parameterization of literals is supported the same as
> other statements.
>
> Here's an example selecting a row from 2 tables, and issuing updates for
> each row if a condition is met:
>
> BEGIN TRANSACTION;
>   SELECT * FROM users WHERE name='blake' AS user;
>   SELECT * from cars WHERE model='pinto' AS car;
>   UPDATE users SET miles_driven = user.miles_driven + 30 WHERE
> name='blake';
>   UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
> COMMIT TRANSACTION IF car.is_running;
>
> This can be simplified by naming the updates with an AS  syntax. If
> updates are named, a corresponding read is generated behind the scenes and
> its values inform the update.
>
> Here's an example, the query is functionally identical to the previous
> query. In the case of the user update, a read is still performed behind the
> scenes to enable the calculation of miles_driven + 30, but doesn't need to
> be named since it's not referenced anywhere else.
>
> BEGIN TRANSACTION;
>   UPDATE users SET miles_driven += 30 WHERE name='blake';
>   UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
> COMMIT TRANSACTION IF car.is_running;
>
> Here’s another example, performing the canonical bank transfer:
>
> BEGIN TRANSACTION;
>   UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>   UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
> COMMIT

Re: CEP-15 multi key transaction syntax

2022-06-05 Thread bened...@apache.org

> In the case that the condition is met, is the mutation applied at that point, 
> or has it already happened and there is something like a rollback segment?

The condition is a part of the transaction execution, so no mutation is applied 
until it has been evaluated – there is no rollback.

> What is the case when the condition is not met and what is presented to the 
> end-user?

I think you can expect to have any SELECT/RETURN (whatever we settle on) 
results returned, along with FALSE for the executed result set.
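
The two answers above (no mutation before the condition is evaluated, and reads returned together with FALSE when it fails) can be sketched in Python as follows. The `run_transaction` function and the dict-based storage are assumptions for illustration and say nothing about how Accord itself is implemented.

```python
def run_transaction(db, reads, condition, writes):
    """Model of a conditional transaction: read, check, then maybe write.

    db:        dict mapping a key to a row dict (hypothetical storage)
    reads:     dict mapping a statement name to the key it selects
    condition: callable taking the named read results, or None
    writes:    callable taking the read results, returning (key, changes) pairs
    """
    # 1. Perform every named read first.
    results = {name: dict(db.get(key, {})) for name, key in reads.items()}
    # 2. Evaluate the condition against the read results only.
    applied = bool(condition(results)) if condition is not None else True
    # 3. Apply mutations only if the condition held, so nothing needs rollback.
    if applied:
        for key, changes in writes(results):
            db.setdefault(key, {}).update(changes)
    return applied, results


pinto = ("cars", "pinto")
db = {pinto: {"is_running": False, "miles_driven": 100}}
applied, results = run_transaction(
    db,
    reads={"car": pinto},
    condition=lambda r: r["car"].get("is_running"),
    writes=lambda r: [(pinto, {"miles_driven": r["car"]["miles_driven"] + 30})],
)
# applied is False, db is unchanged, and results still holds the read row
```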

> More importantly, what happens with respect to the A & I in ACID when the 
> transaction is applied?

Not sure what you mean? They’re maintained at all times, but would be happy to 
explain more if I can understand the question better.

> If UPDATE is used, returning the number of rows changed would be helpful.

Do we support updates that affect an uncertain number of rows at the moment? 
Besides DELETE, for which we don’t want to calculate it, as it’s costlier.

> Is this something that can be done interactively in cqlsh or does it all have 
> to be submitted in one statement block?

These are non-interactive, so it needs to be declared in a single statement. I 
think Accord can be extended to natively support interactive transactions in 
future, in a manner consistent with its fast non-interactive transactions, but 
that’s a whole other endeavour.

From: Patrick McFadin 
Date: Sunday, 5 June 2022 at 01:47
To: dev 
Subject: Re: CEP-15 multi key transaction syntax
I've been waiting for this email! I'll echo what Jeff said about how exciting 
this is for the project.

On the SELECT inside the transaction:

In the first example, I'm making an assumption that you are doing a select on a 
partition key and only expect one result but is any valid CQL SELECT allowed 
here? If 'model' were a non-partition key column name and was indexed, then you 
could potentially have multiple rows returned and that isn't an allowed 
operation. Are only partition key lookups allowed or is there some logic 
looking for only one row?

I'm asking because I can see in reverse time series models where you can select 
the latest temperature
  SELECT temperature FROM weather_station WHERE id=1234 AND DATE='2022-06-04' 
LIMIT 1;

(also, horrible example. Everyone knows that the return value for a 
Pinto.is_running will always evaluate to FALSE)

On COMMIT TRANSACTION:

So much to unpack here. In the case that the condition is met, is the mutation 
applied at that point, or has it already happened and there is something like a 
rollback segment? What is the case when the condition is not met and what is 
presented to the end-user? More importantly, what happens with respect to the A 
& I in ACID when the transaction is applied?

If UPDATE is used, returning the number of rows changed would be helpful.

Is this something that can be done interactively in cqlsh or does it all have 
to be submitted in one statement block?

I'll stop here for now.

Patrick

On Sat, Jun 4, 2022 at 3:34 PM bened...@apache.org wrote:
> The returned result set is after the updates are applied?
Returning the prior values is probably more powerful, as you can perform 
unconditional updates and respond to the prior state, that you otherwise would 
not know. It’s also simpler to implement.

My inclination is to require that SELECT statements are declared first, so that 
we leave open the option of (in future) supporting SELECT statements in any 
place in the transaction, returning the values as of their position in a 
sequential execution of the statements.

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

My preference is that the IF condition is anyway optional, as it is much more 
obvious to a user than concocting some always-true condition. But yes, 
read-only transactions involving multiple tables will definitely be supported.

From: Jeff Jirsa <jji...@gmail.com>
Date: Saturday, 4 June 2022 at 22:49
To: dev@cassandra.apache.org
Subject: Re: CEP-15 multi key transaction syntax

And would you allow a transaction that had > 1 named select and no modification 
statements, but commit if 1=1 ?

> On Jun 4, 2022, at 2:45 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> 
>
>> On Jun 3, 2022, at 8:39 AM, Blake Eggleston <beggles...@apple.com> wrote:
>>
>> Hi dev@,
>
> First, I’m ridiculously excited to see this.
>
>>
>> I’ve been working on a draft syntax for Accord transactions and wanted to 
>> bring what I have to the dev list to solicit feedback and build consensus 
>> before moving forward with it. The proposed transaction syntax is intended 
>> to be an exte

Re: CEP-15 multi key transaction syntax

2022-06-05 Thread bened...@apache.org

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

So, I think these are all likely to be rejected the same way they are today, as 
the individual statements would not parse [1,2] or be validated [3,5], as I’m 
fairly sure UPDATE and INSERT require a primary key to be specified and that 
only SELECT supports secondary indexes.

It could be nice to have dedicated messages explaining the limitation for 
[1,2], at least until the restriction is lifted.

> 4. The presence of a materialized view

This is a bit more complex. I think in principle MVs could function as they do 
today, i.e. with eventually consistent update. MVs remain experimental however, 
with known shortcomings, and I am not keen to validate them with Accord.

Since I think our plan is to opt tables into transactional behaviour (to 
minimise the potential for misusing them, unlike LWTs, which are easily used 
unsafely), I would prefer to ensure that MVs are mutually exclusive with 
transactions for now.

I anticipate follow up work will deliver global secondary indexes on top of 
Accord. I’ve no idea if that will replace or coexist with MVs as they exist 
today, perhaps it will be possible to create MVs and specify their consistency 
properties on creation once the existing MVs are reliable.

> 6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

I’m not sure what our plans are around aggregations and transactions, perhaps 
Blake can speak more to his thoughts. Since aggregations are relatively new I 
am inclined to exclude them initially, at least for write transactions, since 
LWTs do not support them.

Otherwise we will need some deterministic measure for aborting transactions – 
even after we have agreed to execute them. E.g. a 5000 row limit on live rows 
read as input before a transaction is converted to a no-op. We will have to be 
especially careful here for unconditional transactions without any 
SELECT/RETURN, as these must still wait for the result of execution before 
notifying the user of the outcome, if it may be aborted.
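
One way to picture such a deterministic measure is the Python sketch below. The 5000 figure is the example limit from the paragraph above; `execute_with_limit` and its return values are hypothetical. The point is that the decision depends only on the transaction's input, so every participant reaches the same outcome.

```python
MAX_LIVE_ROWS = 5000  # example limit taken from the text above


def execute_with_limit(live_rows_read, apply_writes, limit=MAX_LIVE_ROWS):
    """Turn the transaction into a no-op if it read too many live rows.

    live_rows_read: number of live rows read as input to the transaction
    apply_writes:   zero-argument callable that applies the mutations
    """
    if live_rows_read > limit:
        # Deterministic abort: no writes are applied, and the client must be
        # told the outcome only after this decision has been made.
        return "ABORTED"
    apply_writes()
    return "APPLIED"
```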

Suggestions welcome here.

> 7. Triggers

Good question!

It looks like LWTs don’t integrate with triggers today, so I guess we can 
ignore them too. I don’t know how stable triggers are, or how widely they are 
used. I’m sure we have some use cases, but I’m not aware of any community 
members that use them so it is likely sparse.

In principle a trigger could modify the transaction submitted by a client to 
include additional updates, but this would likely require changes to the 
trigger API. I anticipate ignoring them until we have community demand.

> Random Syntax Thoughts

I like the RETURNING syntax, and consistency with SQL dialects is a plus. I’m 
concerned about consistency with SELECT statements, though – these already 
imply RETURNING, but we might use them to compute constraint clauses on tables 
we are not updating, and this would leave no consistent way of doing this 
without returning all of its fields to the user, at least not without multiple 
SELECT statements over the same data.

We could introduce a new keyword such as CONSTRAIN in this case, with syntax 
equivalent to UPDATE/DELETE but supporting RETURNING and by default not 
returning any fields?

The idea of a RETURNING syntax on the transaction itself was previously floated 
and is nice, but I worry about having multiple inconsistent ways of returning 
data that can be co-mingled. How would you envisage these keywords interacting?



Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin

Love the Oracle/Postgres RETURNING syntax and just generally +1 to adding
INSERT and DELETE.


And for each DML type...
>  - INSERT ... RETURNING returns inserted data (useful for defaulted or
> autoincrement columns).
>  - UPDATE ... RETURNING returns the modified data.
>  - DELETE ... RETURNING returns the now-deleted data.
>
>

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Alex Miller

All of my text below largely extends your question of syntax in a few
directions:
 - What is the user experience of trying to run different statements
with this syntax?
 - How do transactions interact with other Cassandra constructs?
 - What are the execution semantics of these statements?
which I do acknowledge is a moderate re-scoping of the question.

Also, please take my understanding of existing CQL and DDL constructs with
an impractically large grain of salt.


Undesirable Transactions
------------------------

I tried to match CQL docs up against a number of ways of writing statements
which Accord wouldn't like, or users might not like the effect of running.
I'm assuming it'd be good to think through how one would express the error
message or guidance given to users?  Or at least just making sure I
understand correctly what is writable but not executable or desirable.

=== Likely Unexecutable

All the cases here are predicated on the lack of automatic reconnaissance
transaction support.

1. Dependent SELECTs

CREATE TABLE users (name text primary key, home_state text);
CREATE TABLE states (name text primary key, population int);

BEGIN TRANSACTION;
  /*1*/ SELECT home_state FROM users WHERE name='blake' AS user;
  /*2*/ SELECT population FROM states WHERE name=user.home_state AS state;
COMMIT TRANSACTION;

The primary key for SELECT (2) depends on a value produced by SELECT (1), so
the full read conflict set cannot be produced ahead of time for Accord.
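
To make the problem concrete, here is a minimal Python sketch of deriving a conflict set before execution. `Ref`, `Statement` and `conflict_keys` are hypothetical names for illustration, not Accord's actual data structures. A statement whose key is a literal contributes a known (table, key) pair; a statement whose key refers to another statement's result cannot be resolved up front.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """A reference such as user.home_state to a value read by a named SELECT."""
    select_name: str
    column: str


@dataclass(frozen=True)
class Statement:
    table: str
    key: Union[str, Ref]  # literal primary key, or a reference to a read value


def conflict_keys(statements: List[Statement]) -> Optional[Set[Tuple[str, str]]]:
    """Return the (table, key) pairs touched, or None if any key is only
    known after another statement has been executed."""
    keys: Set[Tuple[str, str]] = set()
    for stmt in statements:
        if isinstance(stmt.key, Ref):
            return None  # key depends on a read result: not known up front
        keys.add((stmt.table, stmt.key))
    return keys


independent = [Statement("users", "blake"), Statement("cars", "pinto")]
dependent = [Statement("users", "blake"),
             Statement("states", Ref("user", "home_state"))]
```

Here `conflict_keys(independent)` yields both keys, while `conflict_keys(dependent)` yields `None`, mirroring example (1): the `states` key is unknown until the `users` row has been read.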

2. Dependent UPDATEs

CREATE TABLE users (name text primary key, home_state text);
CREATE TABLE states (name text primary key, population int);

BEGIN TRANSACTION;
  SELECT home_state FROM users WHERE name='blake' AS user;
  UPDATE states SET population += 1 WHERE name=user.home_state AS state;
COMMIT TRANSACTION;

The primary key for UPDATE depends on a value produced by SELECT, so the full
write conflict set cannot be produced ahead of time for Accord.

3. UPDATE from secondary index (or SASI)

CREATE TABLE users (id int primary key, name text, home_state text, miles_driven int);
CREATE INDEX users_by_name ON users (name);

BEGIN TRANSACTION;
  UPDATE users SET miles_driven += 30 WHERE name='blake';
COMMIT TRANSACTION;

This is just a rephrasing of (2), but hiding the SELECT behind an implicit query
to the secondary index for the primary key.

(But an UPDATE from a covering index would be okay!)

4. The presence of a materialized view which can implicitly add
any/all of the above

CREATE TABLE users (
  name text primary key,
  miles_driven int,
  home_state text
);
CREATE MATERIALIZED VIEW users_by_home_state_by_miles AS
  SELECT * FROM users
  WHERE home_state IS NOT NULL
  PRIMARY KEY (home_state, miles_driven, name);

BEGIN TRANSACTION;
  UPDATE users SET miles_driven += 30 WHERE name='blake';
COMMIT TRANSACTION;

This is a rephrasing of (2), but the UPDATE is to the materialized view, and
the SELECT is to get miles_driven out of the UPDATE on users.  Some
materialized views would be fine to transactionally update though.

=== Poor Performance

5. UPDATE with predicate on non-primary key

CREATE TABLE users (
  id int primary key,
  name text,
  miles_driven int
);

BEGIN TRANSACTION;
  UPDATE users SET miles_driven += 30 WHERE name='blake';
COMMIT TRANSACTION;

Any transaction which touches the 'blake' row in `users` now has to wait
behind a full table scan completing in Accord's transaction executor. Any
transaction which conflicts with one of those then has to wait as well, and
the delays eventually snowball into a cascading stall in transaction
processing.

Support for these kinds of things could be useful to some users though?  It
might be wise to consider extending Accord with a table-level lock concept
to avoid having to maintain conflict information on every key in the table
individually.

6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

CREATE TABLE users (name text primary key, state text, miles_driven int);

BEGIN TRANSACTION;
  SELECT sum(miles_driven) FROM users WHERE state='Ohio';
COMMIT TRANSACTION;

Read-only transactions should be cheap, and I don't think the computation
would be notably more expensive than the non-transactional version of
this.

However, what maybe feels similar to a user:

BEGIN TRANSACTION;
  SELECT sum(miles_driven) FROM users WHERE state='Ohio';
  UPDATE miles_by_state SET total=sum(miles_driven) WHERE state='Ohio';
COMMIT TRANSACTION;

does mean we're back in the bad idea category.

=== I Have Literally No Idea

7. Triggers

What are the transactional guarantees of triggers?  These are
implemented in Java, so that'd just be outright banned by Accord?  Unless
triggers can have an API to spit out extra conflict ranges for the partition
they live on?  Sounds like an Accord

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin

Oops. I missed this part: "full primary key or a limit of 1"

Still curious what the end-user would see if there is more than one row
returned.


Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Patrick McFadin

I've been waiting for this email! I'll echo what Jeff said about how
exciting this is for the project.

On the SELECT inside the transaction:

In the first example, I'm making an assumption that you are doing a select
on a partition key and only expect one result but is any valid CQL SELECT
allowed here? If 'model' were a non-partition key column name and was
indexed, then you could potentially have multiple rows returned and that
isn't an allowed operation. Are only partition key lookups allowed or is
there some logic looking for only one row?

I'm asking because I can see in reverse time series models where you can
select the latest temperature
  SELECT temperature FROM weather_station WHERE id=1234 AND
DATE='2022-06-04' LIMIT 1;

(also, horrible example. Everyone knows that the return value for a
Pinto.is_running will always evaluate to FALSE)

On COMMIT TRANSACTION:

So much to unpack here. In the case that the condition is met, is the
mutation applied at that point, or has it already happened and there is
something like a rollback segment? What is the case when the condition is
not met and what is presented to the end-user? More importantly, what
happens with respect to the A & I in ACID when the transaction is applied?

If UPDATE is used, returning the number of rows changed would be helpful.

Is this something that can be done interactively in cqlsh or does it all
have to be submitted in one statement block?

I'll stop here for now.

Patrick

On Sat, Jun 4, 2022 at 3:34 PM bened...@apache.org 
wrote:

> > The returned result set is after the updates are applied?
>
> Returning the prior values is probably more powerful, as you can perform
> unconditional updates and respond to the prior state, that you otherwise
> would not know. It’s also simpler to implement.
>
>
>
> My inclination is to require that SELECT statements are declared first, so
> that we leave open the option of (in future) supporting SELECT statements
> in any place in the transaction, returning the values as of their position
> in a sequential execution of the statements.
>
>
>
> > And would you allow a transaction that had > 1 named select and no
> modification statements, but commit if 1=1 ?
>
>
>
> My preference is that the IF condition is anyway optional, as it is much
> more obvious to a user than concocting some always-true condition. But yes,
> read-only transactions involving multiple tables will definitely be
> supported.
>
>
>
>
>
> *From: *Jeff Jirsa 
> *Date: *Saturday, 4 June 2022 at 22:49
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
>
> And would you allow a transaction that had > 1 named select and no
> modification statements, but commit if 1=1 ?
>
> > On Jun 4, 2022, at 2:45 PM, Jeff Jirsa  wrote:
> >
> > 
> >
> >> On Jun 3, 2022, at 8:39 AM, Blake Eggleston 
> wrote:
> >>
> >> Hi dev@,
> >
> > First, I’m ridiculously excited to see this.
> >
> >>
> >> I’ve been working on a draft syntax for Accord transactions and wanted
> to bring what I have to the dev list to solicit feedback and build
> consensus before moving forward with it. The proposed transaction syntax is
> intended to be an extended batch syntax. Basically batches with selects,
> and an optional condition at the end. To facilitate conditions against an
> arbitrary number of select statements, you can also name the statements,
> and reference columns in the results. To cut down on the number of
> operations needed, select values can also be used in updates, including
> some math operations. Parameterization of literals is supported the same as
> other statements.
> >>
> >> Here's an example selecting a row from 2 tables, and issuing updates
> for each row if a condition is met:
> >>
> >> BEGIN TRANSACTION;
> >> SELECT * FROM users WHERE name='blake' AS user;
> >> SELECT * from cars WHERE model='pinto' AS car;
> >> UPDATE users SET miles_driven = user.miles_driven + 30 WHERE
> name='blake';
> >> UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE
> model='pinto';
> >> COMMIT TRANSACTION IF car.is_running;
> >>
> >> This can be simplified by naming the updates with an AS  syntax.
> If updates are named, a corresponding read is generated behind the scenes
> and its values inform the update.
> >>
> >> Here's an example, the query is functionally identical to the previous
> query. In the case of the user update, a read is still performed behind the
> scenes to enable the calculation of miles_driven + 30, but doesn't need to
> be named since it's not referenced anywhere else.
> >>
> >> BEGIN TRANSACTION

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread bened...@apache.org

> The returned result set is after the updates are applied?

Returning the prior values is probably more powerful, as you can perform 
unconditional updates and respond to the prior state, that you otherwise would 
not know. It’s also simpler to implement.

My inclination is to require that SELECT statements are declared first, so that 
we leave open the option of (in future) supporting SELECT statements in any 
place in the transaction, returning the values as of their position in a 
sequential execution of the statements.

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

My preference is that the IF condition is anyway optional, as it is much more 
obvious to a user than concocting some always-true condition. But yes, 
read-only transactions involving multiple tables will definitely be supported.


From: Jeff Jirsa 
Date: Saturday, 4 June 2022 at 22:49
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax

And would you allow a transaction that had > 1 named select and no modification 
statements, but commit if 1=1 ?

> On Jun 4, 2022, at 2:45 PM, Jeff Jirsa  wrote:
>
> 
>
>> On Jun 3, 2022, at 8:39 AM, Blake Eggleston  wrote:
>>
>> Hi dev@,
>
> First, I’m ridiculously excited to see this.
>
>>
>> I’ve been working on a draft syntax for Accord transactions and wanted to 
>> bring what I have to the dev list to solicit feedback and build consensus 
>> before moving forward with it. The proposed transaction syntax is intended 
>> to be an extended batch syntax. Basically batches with selects, and an 
>> optional condition at the end. To facilitate conditions against an arbitrary 
>> number of select statements, you can also name the statements, and reference 
>> columns in the results. To cut down on the number of operations needed, 
>> select values can also be used in updates, including some math operations. 
>> Parameterization of literals is supported the same as other statements.
>>
>> Here's an example selecting a row from 2 tables, and issuing updates for 
>> each row if a condition is met:
>>
>> BEGIN TRANSACTION;
>> SELECT * FROM users WHERE name='blake' AS user;
>> SELECT * from cars WHERE model='pinto' AS car;
>> UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
>> UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
>> COMMIT TRANSACTION IF car.is_running;
>>
>> This can be simplified by naming the updates with an AS  syntax. If 
>> updates are named, a corresponding read is generated behind the scenes and 
>> its values inform the update.
>>
>> Here's an example, the query is functionally identical to the previous 
>> query. In the case of the user update, a read is still performed behind the 
>> scenes to enable the calculation of miles_driven + 30, but doesn't need to 
>> be named since it's not referenced anywhere else.
>>
>> BEGIN TRANSACTION;
>> UPDATE users SET miles_driven += 30 WHERE name='blake';
>> UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
>> COMMIT TRANSACTION IF car.is_running;
>>
>> Here’s another example, performing the canonical bank transfer:
>>
>> BEGIN TRANSACTION;
>> UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>> UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
>> COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;
>>
>> As you can see from the examples, column values can be referenced via a dot 
>> syntax, ie: . -> select1.value. Since the read portion 
>> of the transaction is performed before evaluating conditions or applying 
>> updates, values read can be freely applied to non-primary key values in 
>> updates. Select statements used either in checking a condition or creating 
>> an update must be restricted to a single row, either by specifying the full 
>> primary key or a limit of 1. Multi-row selects are allowed, but only for 
>> returning data to the client (see below).
>>
>> For evaluating conditions, = & != are available for all types, <, <=, >, >= 
>> are available for numerical types, and EXISTS, NOT EXISTS can be used for 
>> partitions, rows, and values. If any column references cannot be satisfied 
>> by the result of the reads, the condition implicitly fails. This prevents 
>> having to include a bunch of exists statements.
>
> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?
>
>>
>> On completion, an operation would return a boolean value indicating the 
>> operation had been applied, and a result set for each named select (but not 
>> named update). We could also support an optional RETURN keyword, which would 
>> allow the user to only return specific named selects (ie: RETURN select1, 
>> select2).
>>
>
> The returned result set is after the updates are applied?
>
>
>> Let me know what you think!
>>
>> Blake

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Jeff Jirsa



And would you allow a transaction that had > 1 named select and no modification 
statements, but commit if 1=1 ? 

> On Jun 4, 2022, at 2:45 PM, Jeff Jirsa  wrote:
> 
> 
> 
>> On Jun 3, 2022, at 8:39 AM, Blake Eggleston  wrote:
>> 
>> Hi dev@,
> 
> First, I’m ridiculously excited to see this. 
> 
>> 
>> I’ve been working on a draft syntax for Accord transactions and wanted to 
>> bring what I have to the dev list to solicit feedback and build consensus 
>> before moving forward with it. The proposed transaction syntax is intended 
>> to be an extended batch syntax. Basically batches with selects, and an 
>> optional condition at the end. To facilitate conditions against an arbitrary 
>> number of select statements, you can also name the statements, and reference 
>> columns in the results. To cut down on the number of operations needed, 
>> select values can also be used in updates, including some math operations. 
>> Parameterization of literals is supported the same as other statements.
>> 
>> Here's an example selecting a row from 2 tables, and issuing updates for 
>> each row if a condition is met:
>> 
>> BEGIN TRANSACTION;
>> SELECT * FROM users WHERE name='blake' AS user;
>> SELECT * from cars WHERE model='pinto' AS car;
>> UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
>> UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
>> COMMIT TRANSACTION IF car.is_running;
>> 
>> This can be simplified by naming the updates with an AS  syntax. If 
>> updates are named, a corresponding read is generated behind the scenes and 
>> its values inform the update.
>> 
>> Here's an example, the query is functionally identical to the previous 
>> query. In the case of the user update, a read is still performed behind the 
>> scenes to enable the calculation of miles_driven + 30, but doesn't need to 
>> be named since it's not referenced anywhere else.
>> 
>> BEGIN TRANSACTION;
>> UPDATE users SET miles_driven += 30 WHERE name='blake';
>> UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
>> COMMIT TRANSACTION IF car.is_running;
>> 
>> Here’s another example, performing the canonical bank transfer:
>> 
>> BEGIN TRANSACTION;
>> UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>> UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
>> COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;
>> 
>> As you can see from the examples, column values can be referenced via a dot 
>> syntax, ie: . -> select1.value. Since the read portion 
>> of the transaction is performed before evaluating conditions or applying 
>> updates, values read can be freely applied to non-primary key values in 
>> updates. Select statements used either in checking a condition or creating 
>> an update must be restricted to a single row, either by specifying the full 
>> primary key or a limit of 1. Multi-row selects are allowed, but only for 
>> returning data to the client (see below).
>> 
>> For evaluating conditions, = & != are available for all types, <, <=, >, >= 
>> are available for numerical types, and EXISTS, NOT EXISTS can be used for 
>> partitions, rows, and values. If any column references cannot be satisfied 
>> by the result of the reads, the condition implicitly fails. This prevents 
>> having to include a bunch of exists statements.
> 
> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?
> 
>> 
>> On completion, an operation would return a boolean value indicating the 
>> operation had been applied, and a result set for each named select (but not 
>> named update). We could also support an optional RETURN keyword, which would 
>> allow the user to only return specific named selects (ie: RETURN select1, 
>> select2).
>> 
> 
> The returned result set is after the updates are applied? 
> 
> 
>> Let me know what you think!
>> 
>> Blake

Re: CEP-15 multi key transaction syntax

2022-06-04 Thread Jeff Jirsa




> On Jun 3, 2022, at 8:39 AM, Blake Eggleston  wrote:
> 
> Hi dev@,

First, I’m ridiculously excited to see this. 

> 
> I’ve been working on a draft syntax for Accord transactions and wanted to 
> bring what I have to the dev list to solicit feedback and build consensus 
> before moving forward with it. The proposed transaction syntax is intended to 
> be an extended batch syntax. Basically batches with selects, and an optional 
> condition at the end. To facilitate conditions against an arbitrary number of 
> select statements, you can also name the statements, and reference columns in 
> the results. To cut down on the number of operations needed, select values 
> can also be used in updates, including some math operations. Parameterization 
> of literals is supported the same as other statements.
> 
> Here's an example selecting a row from 2 tables, and issuing updates for each 
> row if a condition is met:
> 
> BEGIN TRANSACTION;
>  SELECT * FROM users WHERE name='blake' AS user;
>  SELECT * from cars WHERE model='pinto' AS car;
>  UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
>  UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
> COMMIT TRANSACTION IF car.is_running;
> 
> This can be simplified by naming the updates with an AS  syntax. If 
> updates are named, a corresponding read is generated behind the scenes and 
> its values inform the update.
> 
> Here's an example, the query is functionally identical to the previous query. 
> In the case of the user update, a read is still performed behind the scenes 
> to enable the calculation of miles_driven + 30, but doesn't need to be named 
> since it's not referenced anywhere else.
> 
> BEGIN TRANSACTION;
>  UPDATE users SET miles_driven += 30 WHERE name='blake';
>  UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
> COMMIT TRANSACTION IF car.is_running;
> 
> Here’s another example, performing the canonical bank transfer:
> 
> BEGIN TRANSACTION;
>  UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>  UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
> COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;
> 
> As you can see from the examples, column values can be referenced via a dot 
> syntax, ie: . -> select1.value. Since the read portion 
> of the transaction is performed before evaluating conditions or applying 
> updates, values read can be freely applied to non-primary key values in 
> updates. Select statements used either in checking a condition or creating an 
> update must be restricted to a single row, either by specifying the full 
> primary key or a limit of 1. Multi-row selects are allowed, but only for 
> returning data to the client (see below).
> 
> For evaluating conditions, = & != are available for all types, <, <=, >, >= 
> are available for numerical types, and EXISTS, NOT EXISTS can be used for 
> partitions, rows, and values. If any column references cannot be satisfied by 
> the result of the reads, the condition implicitly fails. This prevents having 
> to include a bunch of exists statements.

Is there a new keyword for “partition (not) exists” or is it inferred by the 
select?

> 
> On completion, an operation would return a boolean value indicating the 
> operation had been applied, and a result set for each named select (but not 
> named update). We could also support an optional RETURN keyword, which would 
> allow the user to only return specific named selects (ie: RETURN select1, 
> select2).
> 

The returned result set is after the updates are applied? 


> Let me know what you think!
> 
> Blake

CEP-15 multi key transaction syntax

2022-06-03 Thread Blake Eggleston

Hi dev@,

I’ve been working on a draft syntax for Accord transactions and wanted to bring 
what I have to the dev list to solicit feedback and build consensus before 
moving forward with it. The proposed transaction syntax is intended to be an 
extended batch syntax. Basically batches with selects, and an optional 
condition at the end. To facilitate conditions against an arbitrary number of 
select statements, you can also name the statements, and reference columns in 
the results. To cut down on the number of operations needed, select values can 
also be used in updates, including some math operations. Parameterization of 
literals is supported the same as other statements.

Here's an example selecting a row from 2 tables, and issuing updates for each 
row if a condition is met:

BEGIN TRANSACTION;
  SELECT * FROM users WHERE name='blake' AS user;
  SELECT * from cars WHERE model='pinto' AS car;
  UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
COMMIT TRANSACTION IF car.is_running;

This can be simplified by naming the updates with an AS name syntax (e.g. AS 
car). If updates are named, a corresponding read is generated behind the scenes 
and its values inform the update.

Here's an example, the query is functionally identical to the previous query. 
In the case of the user update, a read is still performed behind the scenes to 
enable the calculation of miles_driven + 30, but doesn't need to be named since 
it's not referenced anywhere else.

BEGIN TRANSACTION;
  UPDATE users SET miles_driven += 30 WHERE name='blake';
  UPDATE cars SET miles_driven += 30 WHERE model='pinto' AS car;
COMMIT TRANSACTION IF car.is_running;

Here’s another example, performing the canonical bank transfer:

BEGIN TRANSACTION;
  UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
  UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;

As you can see from the examples, column values can be referenced via a dot 
syntax, i.e. name.column -> select1.value. Since the read portion of 
the transaction is performed before evaluating conditions or applying updates, 
values read can be freely applied to non-primary key values in updates. Select 
statements used either in checking a condition or creating an update must be 
restricted to a single row, either by specifying the full primary key or a 
limit of 1. Multi-row selects are allowed, but only for returning data to the 
client (see below).
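
The ordering described here (reads first, then the condition, then the updates) can be illustrated with a small Python sketch of the bank transfer. This is an in-memory model under assumed names (`run_transaction`, `accounts`), not how Accord executes transactions; it only shows that the condition and the updates both work from the values read before any write is applied.

```python
from typing import Callable, Dict, Tuple

Row = Dict[str, int]
Table = Dict[str, Row]


def run_transaction(table: Table,
                    reads: Dict[str, str],
                    condition: Callable[[Dict[str, Row]], bool],
                    updates: Dict[str, Callable[[Row], Row]]
                    ) -> Tuple[bool, Dict[str, Row]]:
    """Read first, then check the condition, then apply every update or none."""
    # 1. Read phase: snapshot each named row before anything is modified.
    snapshot = {name: dict(table[key])
                for name, key in reads.items() if key in table}
    # 2. Condition phase: evaluated against the snapshot only.
    if not condition(snapshot):
        return False, snapshot
    # 3. Write phase: each update is computed from the values read in step 1.
    for name, update in updates.items():
        table[reads[name]] = update(snapshot[name])
    return True, snapshot


accounts: Table = {"blake": {"balance": 50}, "benedict": {"balance": 120}}

applied, prior = run_transaction(
    accounts,
    reads={"blake": "blake", "benedict": "benedict"},
    # IF blake EXISTS AND benedict.balance >= 100
    condition=lambda r: "blake" in r
    and r.get("benedict", {}).get("balance", 0) >= 100,
    updates={
        "blake": lambda row: {**row, "balance": row["balance"] + 100},
        "benedict": lambda row: {**row, "balance": row["balance"] - 100},
    },
)
```

The returned snapshot holds the values as they were read, before the updates were applied; whether the real syntax returns prior or updated values is a separate question for the thread.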

For evaluating conditions, = & != are available for all types, <, <=, >, >= are 
available for numerical types, and EXISTS, NOT EXISTS can be used for 
partitions, rows, and values. If any column references cannot be satisfied by 
the result of the reads, the condition implicitly fails. This prevents having 
to include a bunch of exists statements.
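
As a rough illustration of the implicit-failure rule, the Python sketch below evaluates a single comparison against the results of named reads. The helper names (`lookup`, `compare`, `exists`) are assumptions made for this example, and the sketch does not enforce the restriction of <, <=, >, >= to numerical types.

```python
import operator
from typing import Any, Dict, Optional

# select name -> row read, or None if the read found no row
Results = Dict[str, Optional[Dict[str, Any]]]

COMPARATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def lookup(results: Results, name: str, column: str) -> Any:
    """Return the referenced value, or None when the row or column is absent."""
    row = results.get(name)
    return None if row is None else row.get(column)


def compare(results: Results, name: str, column: str,
            op: str, literal: Any) -> bool:
    value = lookup(results, name, column)
    if value is None:
        return False  # unsatisfied reference: the condition implicitly fails
    return COMPARATORS[op](value, literal)


def exists(results: Results, name: str) -> bool:
    return results.get(name) is not None


results: Results = {"blake": {"balance": 50}, "benedict": None}
```

With these results, a check on `benedict.balance` fails without a separate EXISTS test because the `benedict` row was not found, which is the convenience the paragraph above describes.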

On completion, an operation would return a boolean value indicating the 
operation had been applied, and a result set for each named select (but not 
named update). We could also support an optional RETURN keyword, which would 
allow the user to only return specific named selects (ie: RETURN select1, 
select2).

Let me know what you think!

Blake
