I think we should restrict the order of range/hash hence I opened a Jira to cover this: https://issues.apache.org/jira/browse/IMPALA-7686
Having those 2 HASH clauses makes sense for me. Cheers, Gabor On Tue, Oct 9, 2018 at 6:25 PM Boris Tyukin <bo...@boristyukin.com> wrote: > thanks, guys. thanks for looking into it, Gabor. It is a nice piece of > code to bookmark :) > > So it is intentional but I still wonder why. I understand that RANGE > partitions can be added later, after a table has been created but it still > does not explain why range/hash combo is not valid while hash/range is > valid. I would think it would work another way around. Do you guys think it > makes sense for me to submit a JIRA? > > I think I am going to use two HASH partitions like below and it should > have a similar effect. What do you think? > > CREATE TABLE test2 > > ( > > source_system_id BIGINT, > > id1 BIGINT, > > id2 BIGINT, > > name STRING, > > PRIMARY KEY (source_system_id,id1,id2) > > ) > > PARTITION BY HASH(source_system_id) PARTITIONS 3, > > HASH(id1,id2) PARTITIONS 3 > > STORED AS KUDU; > > On Tue, Oct 9, 2018 at 10:32 AM Zoltan Borok-Nagy <borokna...@cloudera.com> > wrote: > >> Hi Boris, >> >> I found the following sentence in the docs: "Zero or more hash partition >> levels can be combined with an optional range partition level." >> https://kudu.apache.org/docs/schema_design.html#multilevel-partitioning >> So it seems like it is an intentional limitation. >> >> BR, >> Zoltan >> >> >> On Tue, Oct 9, 2018 at 4:21 PM Gabor Kaszab <gaborkas...@cloudera.com> >> wrote: >> >>> Hi Boris, >>> >>> I think you accidentally copy-pasted the same example twice but I get >>> the point and managed to reproduce what you observed. I made a research in >>> the code about what syntax Impala accepts here and this is the relevant >>> code I found: >>> >>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1388 >>> >>> As I read, in the 'PARTITION BY' you can either have a: >>> - list of HASH PARTITIONS or >>> - RANGE PARTITIONS or >>> - a list of HASH PARTITIONS followed by a COMMA and then by RANGE >>> PARTITIONS. >>> >>> Please note the comment by this section: >>> >>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1381 >>> // The PARTITION BY clause contains any number of HASH() clauses >>> followed by exactly zero >>> // or one RANGE clauses >>> >>> Is it mentioned somewhere in the docs that this can be done the other >>> way around? >>> Note, I'm not sure which Impala version you use, I took cdh5-trunk to do >>> this research. >>> >>> I hope this helps! >>> Gabor >>> >>> >>> On Tue, Oct 9, 2018 at 3:39 PM Boris Tyukin <bo...@boristyukin.com> >>> wrote: >>> >>>> Hey guys, >>>> >>>> we've reviewed the docs but still not sure why query 1 works fine and >>>> query 2 errors out with a syntax error. The only difference is that I have >>>> Hash partition first and then a Range. >>>> >>>> ERROR: AnalysisException: Syntax error in line 13: >>>> ), >>>> ^ >>>> Encountered: COMMA >>>> Expected: AS, CACHED, COMMENT, LOCATION, ROW, SORT, STORED, >>>> TBLPROPERTIES, UNCACHED, WITH >>>> CAUSED BY: Exception: Syntax error >>>> >>>> >>>> >>>> *Query 1:* >>>> >>>> CREATE TABLE test2 >>>> ( >>>> source_system_id BIGINT, >>>> id1 BIGINT, >>>> id2 BIGINT, >>>> name STRING, >>>> PRIMARY KEY (source_system_id,id1,id2) >>>> ) >>>> PARTITION BY >>>> RANGE(source_system_id) ( >>>> PARTITION VALUE = 10000000, >>>> PARTITION VALUE = 20000000, >>>> PARTITION VALUE = 30000000 >>>> ), >>>> HASH(id1,id2) PARTITIONS 10 >>>> STORED AS KUDU; >>>> >>>> *Query 2:* >>>> >>>> CREATE TABLE test2 >>>> ( >>>> source_system_id BIGINT, >>>> id1 BIGINT, >>>> id2 BIGINT, >>>> name STRING, >>>> PRIMARY KEY (source_system_id,id1,id2) >>>> ) >>>> PARTITION BY >>>> RANGE(source_system_id) ( >>>> PARTITION VALUE = 10000000, >>>> PARTITION VALUE = 20000000, >>>> PARTITION VALUE = 30000000 >>>> ), >>>> HASH(id1,id2) PARTITIONS 10 >>>> STORED AS KUDU; >>>> >>>