Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18676 )
Change subject: WIP KUDU-2671 support adding a range with custom hash schema ...................................................................... Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/18676/5/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/18676/5/fe/src/main/cup/sql-parser.cup@1730 PS5, Line 1730: hash_partition_param_list:hashspec As Kurt mentioned offline, this can't parse multiple hash partition params since the COMMA in it is considered the separator in range_params_list. I.e Query: create TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2)) PARTITION BY HASH(id) PARTITIONS 3, RANGE (c2) ( PARTITION 10 <= VALUES < 20 HASH PARTITIONS 5, HASH PARTITIONS 3, PARTITION 20 <= VALUES < 30 HASH(id) PARTITIONS 5 ) STORED AS KUDU ERROR: ParseException: Syntax error in line 6: ...S < 20 HASH PARTITIONS 5, HASH PARTITIONS 3, ^ Encountered: HASH Expected: PARTITION I think the reason is CUP just look ahead one token since it's an LALR parser. Generally, the LALR parser refers to the LALR(1) parser and LALR(1) = LA(1)LR(0) (1 token of lookahead, LR(0)). To correctly parse the above query, CUP has to look ahead two tokens to see whether the token after COMMA is KW_HASH or KW_PARTITION. So this can't be parsed by CUP. I might be wrong if this is just a shift/reduce conflict. But the doc mentions that CUP resolves shift/reduce conflicts by shifting. The above failure shows CUP choose reducing instead of shiftting. > During parser construction the system may detect that an ambiguous situation > would occur at runtime. This is called a conflict. In general, the parser may > be unable to decide whether to shift (read another symbol) or reduce (replace > the recognized right hand side of a production with its left hand side). This > is called a shift/reduce conflict. Similarly, the parser may not be able to > decide between reduction with two different productions. This is called a > reduce/reduce conflict. Normally, if one or more of these conflicts occur, > parser generation is aborted. However, in certain carefully considered cases > it may be advantageous to arbitrarily break such a conflict. In this case CUP > uses YACC convention and resolves shift/reduce conflicts by shifting, and > reduce/reduce conflicts using the "highest priority" production (the one > declared first in the specification). In order to enable automatic breaking > of conflicts the -expect option must be given indicating exactly how many > conflicts are expected. Conflicts resolved by precedences and associativities > are not reported. If CUP can't parse such a syntax, I'd suggest adding KW_WITH and replace COMMA with KW_AND inside the range_param, to make it sufficient to just look ahead one token. E.g. create TABLE t1 (id int, c2 int, PRIMARY KEY(id, c2)) PARTITION BY HASH(id) PARTITIONS 3, RANGE (c2) ( PARTITION 10 <= VALUES < 20 WITH HASH PARTITIONS 5 AND HASH PARTITIONS 3, PARTITION 20 <= VALUES < 30 WITH HASH(id) PARTITIONS 5 ) STORED AS KUDU To support this, we just need a custom hash partition param list: custom_hash_partition_param_list ::= KW_WITH hash_partition_param:dc {: RESULT = Lists.newArrayList(dc); :} | custom_hash_partition_param_list:list KW_AND hash_partition_param:d {: list.add(d); RESULT = list; :} ; https://en.wikipedia.org/wiki/LALR_parser http://www2.cs.tum.edu/projects/cup/docs.php -- To view, visit http://gerrit.cloudera.org:8080/18676 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I981056e0827f4957580706d6e73742e4e6743c1c Gerrit-Change-Number: 18676 Gerrit-PatchSet: 5 Gerrit-Owner: Kurt Deschler <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 11 Jul 2022 08:22:40 +0000 Gerrit-HasComments: Yes
