[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-27 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613077#comment-15613077
 ] 

James Taylor edited comment on PHOENIX-6 at 10/27/16 8:43 PM:
--

bq. How about a covered column test case for local indexes too?
Sounds good - I've made the parameterized local index covered.

bq. Do you think queries with a subquery in the update clause will compile? 
We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not 
possible to have a subquery (or any query) there.

bq. Also what if the UDF has a row key column. Will the expression compiler be 
able to figure that out?
The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF 
would yield a syntax error. It's valid to use a UDF on the right-hand-side (and 
equally valid to refer to row key columns too). In that case, you'd be invoking 
the UDF not causing the row key to change.

Thanks for the reviews, [~samarthjain]!


was (Author: jamestaylor):
bq. How about a covered column test case for local indexes too?
Sounds good - I've made the parameterized local index covered.

Do you think queries with a subquery in the update clause will compile? 
We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not 
possible to have a subquery (or any query) there.

 Also what if the UDF has a row key column. Will the expression compiler be 
able to figure that out?
The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF 
would yield a syntax error. It's valid to use a UDF on the right-hand-side (and 
equally valid to refer to row key columns too). In that case, you'd be invoking 
the UDF not causing the row key to change.

Thanks for the reviews, [~samarthjain]!

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, 
> PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, 
> PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, 
> PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> * The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> * This new clause cannot be used with
> ** Immutable tables since the whole point is to atomically update a row in 
> place which isn't allowed for immutable tables. 
> ** Transactional tables because these use optimistic concurrency as their 
> mechanism for consistency and isolation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 7:15 PM:
--

It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT

Does the first FROM SELECT see the results of the first DUPLICATE KEY? 
Does the second DUPLICATE KEY see the results from the first FROM SELECT? 
Does the second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some of the results from 
statements in the same commit batch could occur in different orders. It might 
be better, at least long term, just to support per-statement auto-commit 
batching at the client level (since commit basically handles non-autocommit 
batching), though the difficulty would be higher.


was (Author: cameron.hatfield):
It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT

Does the first FROM SELECT see the results of the first DUPLICATE KEY? 
Does the second DUPLICATE KEY see the results from the first FROM SELECT? 
Does the second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM:
--

It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT

Does the first FROM SELECT see the results of the first DUPLICATE KEY? 
Does the second DUPLICATE KEY see the results from the first FROM SELECT? 
Does the second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.


was (Author: cameron.hatfield):
It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT

Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM:
--

It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.


was (Author: cameron.hatfield):
It would work for me, though I'm not sure if it would be consistent the 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM:
--

It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.


was (Author: cameron.hatfield):
It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
*UPSERT DUPLICATE KEY
*UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-14 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM:
--

It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT

Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.


was (Author: cameron.hatfield):
It would work for me, though I'm not sure if it would be consistent the other 
statements that exist within phoenix, as far as read isolation goes. 
Specifically, with the following ordering: 
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
* UPSERT DUPLICATE KEY
* UPSERT FROM SELECT
Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the 
second DUPLICATE KEY see the results from the first FROM SELECT? Does the 
second FROM SELECT see the results from the first FROM SELECT?

Would definitely be counter intuitive if only some statements in the same 
commit batch could occur in different orders. It might be better, at least long 
term, just to support per-statement auto-commit batching at the client level 
(since commit basically handles non-autocommit batching), though the difficulty 
would be higher.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-6_wip1.patch
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547135#comment-15547135
 ] 

James Taylor edited comment on PHOENIX-6 at 10/11/16 12:33 AM:
---

Would you mind filing a separate JIRA for your use case? If I understand it 
correctly, you'd like to be able to have multiple atomic insert/update 
statements for the same row in the same commit batch and have them sequentially 
execute? Would the atomicity be for the entire commit batch (for all rows being 
operated on) or if not, how would potentially overlapping rows from other 
batches be handled?

For your question, I think you meant to use the same table name for both 
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see 
the data from the UPSERT VALUES. When auto commit is off, the data is kept in 
memory on the client until the commit occurs. The UPSERT VALUES would cause a 
new row to be cached on the client (in memory) and the UPSERT SELECT would 
cause N rows to be cached on the client (where N is the number of rows 
currently in the table). The commit essentially generates the HBase Puts and 
does a batch mutate.

For transactional tables, you do see your own updates. In that case, the UPSERT 
SELECT would see the row from the UPSERT VALUES call.

One more thought I had, [~cameron.hatfield]. If we allowed multiple UPSERT 
VALUES ... ON DUPLICATE KEY statements for the same row key, it might work as 
you'd like. This could potentially be accomplished by having a different 
version for each Cell. In that case, each statement would operate under its own 
row lock on the latest value. Is that more of what you're looking for?


was (Author: jamestaylor):
Would you mind filing a separate JIRA for your use case? If I understand it 
correctly, you'd like to be able to have multiple atomic insert/update 
statements for the same row in the same commit batch and have them sequentially 
execute? Would the atomicity be for the entire commit batch (for all rows being 
operated on) or if not, how would potentially overlapping rows from other 
batches be handled?

For your question, I think you meant to use the same table name for both 
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see 
the data from the UPSERT VALUES. When auto commit is off, the data is kept in 
memory on the client until the commit occurs. The UPSERT VALUES would cause a 
new row to be cached on the client (in memory) and the UPSERT SELECT would 
cause N rows to be cached on the client (where N is the number of rows 
currently in the table). The commit essentially generates the HBase Puts and 
does a batch mutate.

For transactional tables, you do see your own updates. In that case, the UPSERT 
SELECT would see the row from the UPSERT VALUES call.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:56 PM:
--

For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, and the same PK is in 
both sides of the batch, an extra two rows should exist. The most likely time 
for this to show up, of course, is multiple upsert selects within the same 
commit on a large enough table. In playing with this in squirrel sql, I don't 
really have enough knowledge to what would actually trigger multiple batches to 
happen, so I have no repro for this :).

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc. if the problem exists at 
all.


was (Author: cameron.hatfield):
For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, and the same PK is in 
both sides of the batch, an extra two rows should exist. The most likely time 
for this to show up, of course, is multiple upsert selects within the same 
commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on i

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:34 PM:
--

For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, and the same PK is in 
both sides of the batch, an extra two rows should exist. The most likely time 
for this to show up, of course, is multiple upsert selects within the same 
commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.


was (Author: cameron.hatfield):
For the help with my use case question, not really, due to the round-trip per 
row insert (same reasoning as the other "batch checkandput" bugs within phoenix 
and HBase).

For the question about commit, and this is just me trying to understand it, if 
you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that 
order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a 
delete, or select), according to that wording. Is that correct? For example, 
what happens in the following case (when using jdbc driver commands for 
autocommit / commit):
{code:sql}
AUTOCOMMIT OFF;
UPSERT INTO T (PK, V) VALUES (0,0);

-- Run batch size + 1 times
UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff;

COMMIT;
{code}

In theory, according to SQL + the docs, I would expect to see only one new row 
added to the table. But if more then one batch occurs, an extra two rows should 
exist. The most likely time for this to show up, of course, is multiple upsert 
selects within the same commit on a large enough table.

Anyways, this is only mildly related to this bug, as it this same problem would 
be more pronounced with an on duplicate key command, as you would probably 
actually be using it for things like increments / etc.

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:45 PM:
-

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase
{code:sql}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
{code}
After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


inter-query (Same query as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
{code:sql}
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
{code}
After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


inter-query (Same query as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546436#comment-15546436
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:43 PM:
-

Except, in the second case (two statements, same batch), the value of row 'a' 
of would non-existent, as they have never been inserted before. So wouldn't the 
result be 0,0 if it was in the same batch?


was (Author: cameron.hatfield):
Except, in the second case (two statements, same batch), the value of row 'a' 
of would non-existent, as they have never been interested before. So wouldn't 
the result be 0,0 if it was in the same batch?

> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> =, ...] ]
> {code}
> The following restrictions will apply:
> - The  may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> - If the table is immutable, the  may not appear in a secondary 
> index. This is because the mutations for indexes on immutable tables are 
> calculated on the client-side, while this new syntax would potentially modify 
> the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-04 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 6:01 PM:
-

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


inter-query (Same query as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Featu

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-03 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:21 PM:
--

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client. Aside from that, since 
batching is usually a performance optimization, I wouldn't want the 
optimization actually affecting the results.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a ne

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-03 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:13 PM:
--

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
COUNTER1, COUNTER2
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
COUNTER1, COUNTER2
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomical

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

2016-10-03 Thread Cameron Hatfield (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705
 ] 

Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:12 PM:
--

At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Statements ran in order, in two separate execute round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.




was (Author: cameron.hatfield):
At what point will the expressions be evaluated in the context of update 
statement?

Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the 
RHS of each expression):
Both run in two separate round trips to Phoenix/HBase

UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1;

After the second upsert, which would be the result?
1: 1, 1
2: 1, 2
3: 2, 1

Last time I looked at this during the update statement thread, I believe it was 
number 1 that should be the result according to the SQL standard.


intra-query (Same querie as example query in description, twice):
Multiple queries, within the same batch sent to the Phoenix coprocessor
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;

After the batch is run, which would be the result?
1: 0,0
2: 1,1

This is the one we had issues with choosing how we wanted to do it with the CAS 
change. Since they aren't in the memstore until the whole batch is finished, 
you have to keep state within the coprocessor, for each edit, to ensure it came 
out with number 2. On top of that, if you have two different queries that will 
have very different results in different orders, then batching can readily 
affect the outcome. Right now, even though all of the warnings within HBase 
talk about ordering not guaranteed for batching, for a single row, it seems to 
be ordered in batch order as sent by the client.

>From a SQL standard direction, I believe 2 would be correct here, as there is 
>no idea of a "transaction" for a batch in phoenix (even per key), so I would 
>assume that it would be equiv. to auto commit after each statement.



> Support ON DUPLICATE KEY construct
> --
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would o