Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18676 )

Change subject: WIP KUDU-2671 support adding a range with custom hash schema
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18676/5/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/18676/5/fe/src/main/cup/sql-parser.cup@1730
PS5, Line 1730: hash_partition_param_list:hashspec
As Kurt mentioned offline, this can't parse multiple hash partition params 
since the COMMA in it is considered the separator in range_params_list. I.e

 Query: create TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2))
 PARTITION BY HASH(id) PARTITIONS 3,
 RANGE (c2)
 (
   PARTITION 10 <= VALUES < 20 HASH PARTITIONS 5, HASH PARTITIONS 3,
   PARTITION 20 <= VALUES < 30 HASH(id) PARTITIONS 5
 )
 STORED AS KUDU
 ERROR: ParseException: Syntax error in line 6:
 ...S < 20 HASH PARTITIONS 5, HASH PARTITIONS 3,
                              ^
 Encountered: HASH
 Expected: PARTITION

I think the reason is CUP just look ahead one token since it's an LALR parser. 
Generally, the LALR parser refers to the LALR(1) parser and LALR(1) = 
LA(1)LR(0) (1 token of lookahead, LR(0)). To correctly parse the above query, 
CUP has to look ahead two tokens to see whether the token after COMMA is 
KW_HASH or KW_PARTITION. So this can't be parsed by CUP.

I might be wrong if this is just a shift/reduce conflict. But the doc mentions 
that CUP resolves shift/reduce conflicts by shifting. The above failure shows 
CUP choose reducing instead of shiftting.

> During parser construction the system may detect that an ambiguous situation 
> would occur at runtime. This is called a conflict. In general, the parser may 
> be unable to decide whether to shift (read another symbol) or reduce (replace 
> the recognized right hand side of a production with its left hand side). This 
> is called a shift/reduce conflict. Similarly, the parser may not be able to 
> decide between reduction with two different productions. This is called a 
> reduce/reduce conflict. Normally, if one or more of these conflicts occur, 
> parser generation is aborted. However, in certain carefully considered cases 
> it may be advantageous to arbitrarily break such a conflict. In this case CUP 
> uses YACC convention and resolves shift/reduce conflicts by shifting, and 
> reduce/reduce conflicts using the "highest priority" production (the one 
> declared first in the specification). In order to enable automatic breaking 
> of conflicts the -expect option must be given indicating exactly how many 
> conflicts are expected. Conflicts resolved by precedences and associativities 
> are not reported.

If CUP can't parse such a syntax, I'd suggest adding KW_WITH and replace COMMA 
with KW_AND inside the range_param, to make it sufficient to just look ahead 
one token. E.g.

 create TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2))
 PARTITION BY HASH(id) PARTITIONS 3,
 RANGE (c2)
 (
   PARTITION 10 <= VALUES < 20 WITH HASH PARTITIONS 5 AND HASH PARTITIONS 3,
   PARTITION 20 <= VALUES < 30 WITH HASH(id) PARTITIONS 5
 )
 STORED AS KUDU

To support this, we just need a custom hash partition param list:

 custom_hash_partition_param_list ::=
   KW_WITH hash_partition_param:dc
   {: RESULT = Lists.newArrayList(dc); :}
   | custom_hash_partition_param_list:list KW_AND hash_partition_param:d
   {:
     list.add(d);
     RESULT = list;
   :}
   ;

https://en.wikipedia.org/wiki/LALR_parser
http://www2.cs.tum.edu/projects/cup/docs.php



--
To view, visit http://gerrit.cloudera.org:8080/18676
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I981056e0827f4957580706d6e73742e4e6743c1c
Gerrit-Change-Number: 18676
Gerrit-PatchSet: 5
Gerrit-Owner: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 11 Jul 2022 08:22:40 +0000
Gerrit-HasComments: Yes

Reply via email to