[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Baike Xia (Code Review) Mon, 24 Oct 2022 05:17:04 -0700

Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )


Change subject: IMPALA-3119: DDL support for bucketed tables
......................................................................


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@18
PS10, Line 18: CREATE TABLE tbl (i int COMMENT 'hello', s string)
> It'd be better if we can highlight this line since it's the only new part.
Got.


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29:   the hash partition is equivalent to a bucket,
> Do you mean "CLUSTERED" has been used as a hint so we can't use it as a key
If add supported "CLUSTERED" in cup file, execute SQL with CLUTERED in hint and 
an error will be reported.I.E. execute sql - "create  /* +CLUSTERED */ test  as 
select * from tpcds.item;", error messge:
`
Query: create /* +CLUSTERED */ test
as select * from tpcds.item
Query submitted at: 2022-10-24 09:09:52 (Coordinator: http://d403ca04eda0:25000)
ERROR: ParseException: Syntax error in line 1:
create /* +CLUSTERED */ test
           ^
Encountered: CLUSTERED
Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER

CAUSED BY: Exception: Syntax error
`


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@30
PS10, Line 30:   and the optimization rule applies to join query;
> Are these recognized by Hive? i.e. if Hive inserts data into the table, is
If HASH is used, the behavior is the same as hive. If not, the hive behavior is 
incompatible with the Hive behavior.
If Hive inserts data into the table, it's considered a HASH, which is what we 
expect.

Multiple bucket hash functions are used because hive's bucket hash algorithm is 
different from kudu's bucket hash algorithm. To be compatible with bucket join 
optimization in kudu table, multiple bucket hash functions are used. In other 
words, the kudu table is not supported in HASH mode. Using KUDU_HASH, however, 
results in tabular forms not being recognized by computing engines other than 
impala.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   | opt_bucket_desc:bucket
> I think we don't need two switches here. Like other optional fields, we can
Yes, but adding empty to opt_bucket_desc causes a compilation error. So I took 
this approach. Or, can you give me some advice?
`
Warning : *** Reduce/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and     opt_sort_cols ::= (*)
  under symbols: {}
  Resolved in favor of the first production.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list 
RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list 
RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_sort_cols ::= (*)
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list 
RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and     opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list 
RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.
`


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = new Pair<List<String>, TSortingOrder>(null, 
TSortingOrder.LEXICAL); :}
> nit: Let's skip reformatting unrelated codes.
I Got.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex@178
PS10, Line 178: kudu_has
> The commit message mentions "kudu_hash".
Done


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2844
PS10, Line 2844: ber mu
> Can we add tests for "HASH" and "KUDUHASH" without the parentheses?
Yes, I can.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@a4347
PS10, Line 4347:
> Can we keep this since this still doesn't work?
Yes, we can keep this since. This was taken off when I tried clustered.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3074
PS10, Line 3074:     ParsesOk("CREATE TABLE bucketed_test (i int COMMENT 
'hello', s string) " +
> Could you add some tests for KUDU_HASH?
Done



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 24 Oct 2022 12:16:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

Reply via email to