[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..

IMPALA-3725 Support Kudu UPSERT in Impala

This patch introduces a new query statement, UPSERT, for Kudu
tables which operates like an INSERT and uses all of the analysis,
planning, and execution machinery as INSERT, except that if
there's a primary key collision instead of returning an error an
update is performed instead.

New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
  select_stmt
 | VALUES (value [, value...]) [, (value [, (value...)]) ...]
}

where column list must contain all of the key columns in table_name,
if specified, and table_name must be a Kudu table.

Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
M fe/src/main/java/com/cloudera/impala/planner/TableSink.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
15 files changed, 392 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/4047/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG
Commit Message:

PS1, Line 22: all of the columns in table_name
> The original point of this was to do the simplest thing until we had suppor
Done


http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

PS1, Line 447: All columns if this is an UPSERT.
> See my comment in the commit msg. I am not sure this restriction is correct
Done


PS1, Line 689: !isUpsert_
> flip the conditions to avoid using a negative, just easier to read
Done


http://gerrit.cloudera.org:8080/#/c/4047/2/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

PS2, Line 445: ars
> typo
Done


PS2, Line 447:* - All columns if this is an UPSERT.
> As discussed, if Kudu leaves non-referenced columns as-is, then let's remov
Done


PS2, Line 689: if (!isUpsert_) {
 :   strBuilder.append("INSERT ");
 : } else {
 :   strBuilder.append("UPSERT");
 : }
> Flip the logic; easier to read when, all else being equal, there are fewer 
Done


http://gerrit.cloudera.org:8080/#/c/4047/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

PS2, Line 524: 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
> not needed, remove
Done


PS2, Line 541: 
 : 
 : 
 : 
 : 
 : 
 : 
> same
Done


PS2, Line 560: 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG
Commit Message:

PS1, Line 22: all of the columns in table_name
> This was Matt's suggestion. We don't currently know what the primary key co
The original point of this was to do the simplest thing until we had support 
for nullable columns, at which point we expected the code to change -- i.e. to 
avoid wasted work. However, as we've discussed, we now want to see if Kudu will 
just leave non-referenced columns as-is, and error on violations itself (and we 
handle them gracefully).


-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-29 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

PS1, Line 689: !isUpsert_
flip the conditions to avoid using a negative, just easier to read


http://gerrit.cloudera.org:8080/#/c/4047/2/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

PS2, Line 445: ars
typo


PS2, Line 447:* - All columns if this is an UPSERT.
As discussed, if Kudu leaves non-referenced columns as-is, then let's remove 
this restriction. I think PK columns will still be required.


PS2, Line 689: if (!isUpsert_) {
 :   strBuilder.append("INSERT ");
 : } else {
 :   strBuilder.append("UPSERT ");
 : }
Flip the logic; easier to read when, all else being equal, there are fewer 
negatives.


http://gerrit.cloudera.org:8080/#/c/4047/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

PS2, Line 524:  SCANRANGELOCATIONS
 : NODE 0:
 :   HDFS SPLIT 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt 
0:20853
 :  DISTRIBUTEDPLAN
 : UPSERT INTO KUDU [functional_kudu.testtbl]
 : |
 : 00:SCAN HDFS [functional.alltypes]
 :partitions=1/24 files=1 size=20.36KB
not needed, remove


PS2, Line 541:  SCANRANGELOCATIONS
 : 
 :  DISTRIBUTEDPLAN
 : UPSERT INTO KUDU [functional_kudu.testtbl]
 : |
 : 00:UNION
 :constant-operands=2
same


PS2, Line 560:  SCANRANGELOCATIONS
 : NODE 0:
 :   HDFS SPLIT 
hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt 
0:20853
 :  DISTRIBUTEDPLAN
 : UPSERT INTO KUDU [functional_kudu.testtbl]
 : |
 : 01:EXCHANGE [UNPARTITIONED]
 : |  limit: 10
 : |
 : 00:SCAN HDFS [functional.alltypes]
 :partitions=1/24 files=1 size=20.36KB
 :limit: 10
same


-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4047/1//COMMIT_MSG
Commit Message:

PS1, Line 22: all of the columns in table_name
> Why is that? This is not a requirement for INSERT. Until we have nullable p
This was Matt's suggestion. We don't currently know what the primary key 
columns are during analysis, as far as I can tell, so all we can do is pass the 
values off to Kudu and see if it returns an error.

If the number of values to upsert is large, we might run into an error in the 
middle and end up in an inconsistent state, with some of the upsert values 
applied and some not.

Of course, we can relax this requirement if we start storing info about primary 
keys so that we could throw an error during analysis, but its better to err on 
the side of being more restrictive for now, since its easier to add 
functionality than to take it away.

There's also the issue of the semantics of leaving out columns, eg. do they get 
overwritten with null or are they left alone.

I don't personally feel strongly about it either way, though, especially since 
as you point out we don't require this for insert.


http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 363: DeleteStmt delete_stmt
> I believe you can write nonterminal InsertStmt insert_stmt, upsert_stmt; in
Done


PS1, Line 662:  RESULT = upsert;
 :   :}
> nit: I think indentation is off by 1 :)
Done


PS1, Line 679:RESULT = new InsertStmt(w, table, true, list, hints, query, 
col_perm, ignore, false);
> nit: long lines
Done


http://gerrit.cloudera.org:8080/#/c/4047/1/fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java:

Line 127:   // True if this is an UPSERT operation. Only supported for Kudu 
tables.
> You may want to mention that USERT is only supported for Kudu tables.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3725 Support Kudu UPSERT in Impala

2016-08-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-3725 Support Kudu UPSERT in Impala
..

IMPALA-3725 Support Kudu UPSERT in Impala

This patch introduces a new query statement, UPSERT, for Kudu
tables which operates like an INSERT and uses all of the analysis,
planning, and execution machinery as INSERT, except that if
there's a primary key collision instead of returning an error an
update is performed instead.

New syntax:
[with_clause] UPSERT INTO [TABLE] table_name [(column list)]
{
  select_stmt
 | VALUES (value [, value...]) [, (value [, (value...)]) ...]
}

where column list must contain all of the columns in table_name,
if specified, and table_name must be a Kudu table.

This patch still needs end-to-end testing.

Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/planner/KuduTableSink.java
M fe/src/main/java/com/cloudera/impala/planner/TableSink.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
14 files changed, 361 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/4047/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8df5cea36b642e267f85ff6b163f3dd96b8386e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis