[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613077#comment-15613077 ] James Taylor edited comment on PHOENIX-6 at 10/27/16 8:43 PM: -- bq. How about a covered column test case for local indexes too? Sounds good - I've made the parameterized local index covered. bq. Do you think queries with a subquery in the update clause will compile? We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not possible to have a subquery (or any query) there. bq. Also what if the UDF has a row key column. Will the expression compiler be able to figure that out? The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF would yield a syntax error. It's valid to use a UDF on the right-hand-side (and equally valid to refer to row key columns too). In that case, you'd be invoking the UDF not causing the row key to change. Thanks for the reviews, [~samarthjain]! was (Author: jamestaylor): bq. How about a covered column test case for local indexes too? Sounds good - I've made the parameterized local index covered. Do you think queries with a subquery in the update clause will compile? We only allow expressions in the ON DUPLICATE KEY UPDATE syntax, so it's not possible to have a subquery (or any query) there. Also what if the UDF has a row key column. Will the expression compiler be able to figure that out? The left-hand-side of the ON DUPLICATE KEY UPDATE may only be a column - a UDF would yield a syntax error. It's valid to use a UDF on the right-hand-side (and equally valid to refer to row key columns too). In that case, you'd be invoking the UDF not causing the row key to change. Thanks for the reviews, [~samarthjain]! > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6.patch, PHOENIX-6_4.x-HBase-0.98.patch, > PHOENIX-6_v2.patch, PHOENIX-6_v3.patch, PHOENIX-6_v4.patch, > PHOENIX-6_v5.patch, PHOENIX-6_wip1.patch, PHOENIX-6_wip2.patch, > PHOENIX-6_wip3.patch, PHOENIX-6_wip4.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > * The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > * This new clause cannot be used with > ** Immutable tables since the whole point is to atomically update a row in > place which isn't allowed for immutable tables. > ** Transactional tables because these use optimistic concurrency as their > mechanism for consistency and isolation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 7:15 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some of the results from statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:43 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: *UPSERT DUPLICATE KEY *UPSERT FROM SELECT *UPSERT DUPLICATE KEY *UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576149#comment-15576149 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/14/16 6:44 PM: -- It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. was (Author: cameron.hatfield): It would work for me, though I'm not sure if it would be consistent the other statements that exist within phoenix, as far as read isolation goes. Specifically, with the following ordering: * UPSERT DUPLICATE KEY * UPSERT FROM SELECT * UPSERT DUPLICATE KEY * UPSERT FROM SELECT Does the first FROM SELECT see the results of the first DUPLICATE KEY? Does the second DUPLICATE KEY see the results from the first FROM SELECT? Does the second FROM SELECT see the results from the first FROM SELECT? Would definitely be counter intuitive if only some statements in the same commit batch could occur in different orders. It might be better, at least long term, just to support per-statement auto-commit batching at the client level (since commit basically handles non-autocommit batching), though the difficulty would be higher. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > Attachments: PHOENIX-6_wip1.patch > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY UPDATE COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547135#comment-15547135 ] James Taylor edited comment on PHOENIX-6 at 10/11/16 12:33 AM: --- Would you mind filing a separate JIRA for your use case? If I understand it correctly, you'd like to be able to have multiple atomic insert/update statements for the same row in the same commit batch and have them sequentially execute? Would the atomicity be for the entire commit batch (for all rows being operated on) or if not, how would potentially overlapping rows from other batches be handled? For your question, I think you meant to use the same table name for both statements, right? For non transactional tables, the UPSERT SELECT wouldn't see the data from the UPSERT VALUES. When auto commit is off, the data is kept in memory on the client until the commit occurs. The UPSERT VALUES would cause a new row to be cached on the client (in memory) and the UPSERT SELECT would cause N rows to be cached on the client (where N is the number of rows currently in the table). The commit essentially generates the HBase Puts and does a batch mutate. For transactional tables, you do see your own updates. In that case, the UPSERT SELECT would see the row from the UPSERT VALUES call. One more thought I had, [~cameron.hatfield]. If we allowed multiple UPSERT VALUES ... ON DUPLICATE KEY statements for the same row key, it might work as you'd like. This could potentially be accomplished by having a different version for each Cell. In that case, each statement would operate under its own row lock on the latest value. Is that more of what you're looking for? was (Author: jamestaylor): Would you mind filing a separate JIRA for your use case? If I understand it correctly, you'd like to be able to have multiple atomic insert/update statements for the same row in the same commit batch and have them sequentially execute? Would the atomicity be for the entire commit batch (for all rows being operated on) or if not, how would potentially overlapping rows from other batches be handled? For your question, I think you meant to use the same table name for both statements, right? For non transactional tables, the UPSERT SELECT wouldn't see the data from the UPSERT VALUES. When auto commit is off, the data is kept in memory on the client until the commit occurs. The UPSERT VALUES would cause a new row to be cached on the client (in memory) and the UPSERT SELECT would cause N rows to be cached on the client (where N is the number of rows currently in the table). The commit essentially generates the HBase Puts and does a batch mutate. For transactional tables, you do see your own updates. In that case, the UPSERT SELECT would see the row from the UPSERT VALUES call. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > To handle the maintenance of immutable indexes, we'll need to push the > maintenance to the server side. > This is because the mutations for indexes on immutable tables are calculated > on the client-side, while this new syntax would potentially modify the value > on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:56 PM: -- For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. In playing with this in squirrel sql, I don't really have enough knowledge to what would actually trigger multiple batches to happen, so I have no repro for this :). Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. if the problem exists at all. was (Author: cameron.hatfield): For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on i
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546874#comment-15546874 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 10:34 PM: -- For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, and the same PK is in both sides of the batch, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. was (Author: cameron.hatfield): For the help with my use case question, not really, due to the round-trip per row insert (same reasoning as the other "batch checkandput" bugs within phoenix and HBase). For the question about commit, and this is just me trying to understand it, if you run an UPSERT VALUES and an UPSERT SELECT in the same commit, in that order, the UPSERT SELECT should not see the results of UPSERT VALUES (or a delete, or select), according to that wording. Is that correct? For example, what happens in the following case (when using jdbc driver commands for autocommit / commit): {code:sql} AUTOCOMMIT OFF; UPSERT INTO T (PK, V) VALUES (0,0); -- Run batch size + 1 times UPSERT INTO entity.stuff (PK, V) SELECT PK + 1 as PK, V + 1 FROM entity.stuff; COMMIT; {code} In theory, according to SQL + the docs, I would expect to see only one new row added to the table. But if more then one batch occurs, an extra two rows should exist. The most likely time for this to show up, of course, is multiple upsert selects within the same commit on a large enough table. Anyways, this is only mildly related to this bug, as it this same problem would be more pronounced with an on duplicate key command, as you would probably actually be using it for things like increments / etc. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:45 PM: - At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase {code:sql} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; {code} After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor {code:sql} UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; {code} After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546436#comment-15546436 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 7:43 PM: - Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been inserted before. So wouldn't the result be 0,0 if it was in the same batch? was (Author: cameron.hatfield): Except, in the second case (two statements, same batch), the value of row 'a' of would non-existent, as they have never been interested before. So wouldn't the result be 0,0 if it was in the same batch? > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would obviously have a negative impact on performance. For > an example of similar syntax , see MySQL documentation at > http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > See this discussion for more detail: > https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. > A related discussion is on PHOENIX-2909. > Initially we'd support the following: > # This would prevent the setting of VAL to 0 if the row already exists: > {code} > UPSERT INTO T (PK, VAL) VALUES ('a',0) > ON DUPLICATE KEY IGNORE; > {code} > # This would increment the valueS of COUNTER1 and COUNTER2 if the row already > exists and otherwise initialize them to 0: > {code} > UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) > ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; > {code} > So the general form is: > {code} > UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE > =, ...] ] > {code} > The following restrictions will apply: > - The may not be part of the primary key constraint - only KeyValue > columns will be allowed. > - If the table is immutable, the may not appear in a secondary > index. This is because the mutations for indexes on immutable tables are > calculated on the client-side, while this new syntax would potentially modify > the value on the server-side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/4/16 6:01 PM: - At what point will the expressions be evaluated in the context of update statement? Specifically, intra-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. inter-query (Same query as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Featu
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:21 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. Aside from that, since batching is usually a performance optimization, I wouldn't want the optimization actually affecting the results. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a ne
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:13 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? COUNTER1, COUNTER2 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? COUNTER1, COUNTER2 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomical
[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct
[ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543705#comment-15543705 ] Cameron Hatfield edited comment on PHOENIX-6 at 10/3/16 11:12 PM: -- At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Statements ran in order, in two separate execute round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. was (Author: cameron.hatfield): At what point will the expressions be evaluated in the context of update statement? Specifically, inter-query order (Notice the use of COUNTER1 and COUNTER2 on the RHS of each expression): Both run in two separate round trips to Phoenix/HBase UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER2 + 1, COUNTER2 = COUNTER1 + 1; After the second upsert, which would be the result? 1: 1, 1 2: 1, 2 3: 2, 1 Last time I looked at this during the update statement thread, I believe it was number 1 that should be the result according to the SQL standard. intra-query (Same querie as example query in description, twice): Multiple queries, within the same batch sent to the Phoenix coprocessor UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1; After the batch is run, which would be the result? 1: 0,0 2: 1,1 This is the one we had issues with choosing how we wanted to do it with the CAS change. Since they aren't in the memstore until the whole batch is finished, you have to keep state within the coprocessor, for each edit, to ensure it came out with number 2. On top of that, if you have two different queries that will have very different results in different orders, then batching can readily affect the outcome. Right now, even though all of the warnings within HBase talk about ordering not guaranteed for batching, for a single row, it seems to be ordered in batch order as sent by the client. >From a SQL standard direction, I believe 2 would be correct here, as there is >no idea of a "transaction" for a batch in phoenix (even per key), so I would >assume that it would be equiv. to auto commit after each statement. > Support ON DUPLICATE KEY construct > -- > > Key: PHOENIX-6 > URL: https://issues.apache.org/jira/browse/PHOENIX-6 > Project: Phoenix > Issue Type: New Feature >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.9.0 > > > To support inserting a new row only if it doesn't already exist, we should > support the "on duplicate key" construct for UPSERT. With this construct, the > UPSERT VALUES statement would run atomically and would thus require a read > before write which would o