[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Thomas Tauber-Marshall has uploaded a new patch set (#3).
Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..
IMPALA-3725 Support Kudu UPSERT in Impala
This patch introduces a new query statement, UPSERT, for Kudu
tables which operates like an INSERT and uses all of the analysis,
planning, and execution machinery as INSERT, except that if
there's a primary key collision instead of returning an error an
update is performed instead.
New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
select_stmt
| VALUES (value [, value...]) [, (value [, (value...)]) ...]
}
where column list must contain all of the key columns in table_name,
if specified, and table_name must be a Kudu table.
Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
M fe/src/main/java/com/cloudera/impala/planner/TableSink.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
15 files changed, 392 insertions(+), 35 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/4047/3
--
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Thomas Tauber-Marshall has posted comments on this change.
Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..
Patch Set 1:
(9 comments)
http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG
Commit Message:
PS1, Line 22: all of the columns in table_name
> The original point of this was to do the simplest thing until we had suppor
Done
http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:
PS1, Line 447: All columns if this is an UPSERT.
> See my comment in the commit msg. I am not sure this restriction is correct
Done
PS1, Line 689: !isUpsert_
> flip the conditions to avoid using a negative, just easier to read
Done
http://gerrit.cloudera.org:8080/#/c/4047/2/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:
PS2, Line 445: ars
> typo
Done
PS2, Line 447:* - All columns if this is an UPSERT.
> As discussed, if Kudu leaves non-referenced columns as-is, then let's remov
Done
PS2, Line 689: if (!isUpsert_) {
: strBuilder.append("INSERT ");
: } else {
: strBuilder.append("UPSERT");
: }
> Flip the logic; easier to read when, all else being equal, there are fewer
Done
http://gerrit.cloudera.org:8080/#/c/4047/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:
PS2, Line 524:
:
:
:
:
:
:
:
> not needed, remove
Done
PS2, Line 541:
:
:
:
:
:
:
> same
Done
PS2, Line 560:
:
:
:
:
:
:
:
:
:
:
:
> same
Done
--
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3725 Support Kudu UPSERT in Impala .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG Commit Message: PS1, Line 22: all of the columns in table_name > This was Matt's suggestion. We don't currently know what the primary key co The original point of this was to do the simplest thing until we had support for nullable columns, at which point we expected the code to change -- i.e. to avoid wasted work. However, as we've discussed, we now want to see if Kudu will just leave non-referenced columns as-is, and error on violations itself (and we handle them gracefully). -- To view, visit http://gerrit.cloudera.org:8080/4047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..
Patch Set 2:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:
PS1, Line 689: !isUpsert_
flip the conditions to avoid using a negative, just easier to read
http://gerrit.cloudera.org:8080/#/c/4047/2/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:
PS2, Line 445: ars
typo
PS2, Line 447:* - All columns if this is an UPSERT.
As discussed, if Kudu leaves non-referenced columns as-is, then let's remove
this restriction. I think PK columns will still be required.
PS2, Line 689: if (!isUpsert_) {
: strBuilder.append("INSERT ");
: } else {
: strBuilder.append("UPSERT ");
: }
Flip the logic; easier to read when, all else being equal, there are fewer
negatives.
http://gerrit.cloudera.org:8080/#/c/4047/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:
PS2, Line 524: SCANRANGELOCATIONS
: NODE 0:
: HDFS SPLIT
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt
0:20853
: DISTRIBUTEDPLAN
: UPSERT INTO KUDU [functional_kudu.testtbl]
: |
: 00:SCAN HDFS [functional.alltypes]
:partitions=1/24 files=1 size=20.36KB
not needed, remove
PS2, Line 541: SCANRANGELOCATIONS
:
: DISTRIBUTEDPLAN
: UPSERT INTO KUDU [functional_kudu.testtbl]
: |
: 00:UNION
:constant-operands=2
same
PS2, Line 560: SCANRANGELOCATIONS
: NODE 0:
: HDFS SPLIT
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt
0:20853
: DISTRIBUTEDPLAN
: UPSERT INTO KUDU [functional_kudu.testtbl]
: |
: 01:EXCHANGE [UNPARTITIONED]
: | limit: 10
: |
: 00:SCAN HDFS [functional.alltypes]
:partitions=1/24 files=1 size=20.36KB
:limit: 10
same
--
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3725 Support Kudu UPSERT in Impala .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG Commit Message: PS1, Line 22: all of the columns in table_name > Why is that? This is not a requirement for INSERT. Until we have nullable p This was Matt's suggestion. We don't currently know what the primary key columns are during analysis, as far as I can tell, so all we can do is pass the values off to Kudu and see if it returns an error. If the number of values to upsert is large, we might run into an error in the middle and end up in an inconsistent state, with some of the upsert values applied and some not. Of course, we can relax this requirement if we start storing info about primary keys so that we could throw an error during analysis, but its better to err on the side of being more restrictive for now, since its easier to add functionality than to take it away. There's also the issue of the semantics of leaving out columns, eg. do they get overwritten with null or are they left alone. I don't personally feel strongly about it either way, though, especially since as you point out we don't require this for insert. http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 363: DeleteStmt delete_stmt > I believe you can write nonterminal InsertStmt insert_stmt, upsert_stmt; in Done PS1, Line 662: RESULT = upsert; : :} > nit: I think indentation is off by 1 :) Done PS1, Line 679:RESULT = new InsertStmt(w, table, true, list, hints, query, col_perm, ignore, false); > nit: long lines Done http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java: Line 127: // True if this is an UPSERT operation. Only supported for Kudu tables. > You may want to mention that USERT is only supported for Kudu tables. Done -- To view, visit http://gerrit.cloudera.org:8080/4047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala
Thomas Tauber-Marshall has uploaded a new patch set (#2).
Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..
IMPALA-3725 Support Kudu UPSERT in Impala
This patch introduces a new query statement, UPSERT, for Kudu
tables which operates like an INSERT and uses all of the analysis,
planning, and execution machinery as INSERT, except that if
there's a primary key collision instead of returning an error an
update is performed instead.
New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
select_stmt
| VALUES (value [, value...]) [, (value [, (value...)]) ...]
}
where column list must contain all of the columns in table_name,
if specified, and table_name must be a Kudu table.
This patch still needs end-to-end testing.
Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
M fe/src/main/java/com/cloudera/impala/planner/TableSink.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
14 files changed, 361 insertions(+), 33 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/4047/2
--
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Dimitris Tsirogiannis
