if someone else will be looking for similar mixed range/hash partitions, I just learned from Kudu user group that the order of partitions does not matter for Kudu.
hash(id), range ... would be totally equivalent to range ..., hash(id) and tablets will be stored/sized exactly the same way. Boris On Wed, Oct 10, 2018 at 7:57 AM Boris Tyukin <[email protected]> wrote: > Great, thanks Gabor > > On Wed, Oct 10, 2018, 06:35 Gabor Kaszab <[email protected]> wrote: > >> I think we should restrict the order of range/hash hence I opened a Jira >> to cover this: >> https://issues.apache.org/jira/browse/IMPALA-7686 >> >> Having those 2 HASH clauses makes sense for me. >> >> Cheers, >> Gabor >> >> On Tue, Oct 9, 2018 at 6:25 PM Boris Tyukin <[email protected]> >> wrote: >> >>> thanks, guys. thanks for looking into it, Gabor. It is a nice piece of >>> code to bookmark :) >>> >>> So it is intentional but I still wonder why. I understand that RANGE >>> partitions can be added later, after a table has been created but it still >>> does not explain why range/hash combo is not valid while hash/range is >>> valid. I would think it would work another way around. Do you guys think it >>> makes sense for me to submit a JIRA? >>> >>> I think I am going to use two HASH partitions like below and it should >>> have a similar effect. What do you think? >>> >>> CREATE TABLE test2 >>> >>> ( >>> >>> source_system_id BIGINT, >>> >>> id1 BIGINT, >>> >>> id2 BIGINT, >>> >>> name STRING, >>> >>> PRIMARY KEY (source_system_id,id1,id2) >>> >>> ) >>> >>> PARTITION BY HASH(source_system_id) PARTITIONS 3, >>> >>> HASH(id1,id2) PARTITIONS 3 >>> >>> STORED AS KUDU; >>> >>> On Tue, Oct 9, 2018 at 10:32 AM Zoltan Borok-Nagy < >>> [email protected]> wrote: >>> >>>> Hi Boris, >>>> >>>> I found the following sentence in the docs: "Zero or more hash >>>> partition levels can be combined with an optional range partition level." >>>> https://kudu.apache.org/docs/schema_design.html#multilevel-partitioning >>>> So it seems like it is an intentional limitation. >>>> >>>> BR, >>>> Zoltan >>>> >>>> >>>> On Tue, Oct 9, 2018 at 4:21 PM Gabor Kaszab <[email protected]> >>>> wrote: >>>> >>>>> Hi Boris, >>>>> >>>>> I think you accidentally copy-pasted the same example twice but I get >>>>> the point and managed to reproduce what you observed. I made a research in >>>>> the code about what syntax Impala accepts here and this is the relevant >>>>> code I found: >>>>> >>>>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1388 >>>>> >>>>> As I read, in the 'PARTITION BY' you can either have a: >>>>> - list of HASH PARTITIONS or >>>>> - RANGE PARTITIONS or >>>>> - a list of HASH PARTITIONS followed by a COMMA and then by RANGE >>>>> PARTITIONS. >>>>> >>>>> Please note the comment by this section: >>>>> >>>>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1381 >>>>> // The PARTITION BY clause contains any number of HASH() clauses >>>>> followed by exactly zero >>>>> // or one RANGE clauses >>>>> >>>>> Is it mentioned somewhere in the docs that this can be done the other >>>>> way around? >>>>> Note, I'm not sure which Impala version you use, I took cdh5-trunk to >>>>> do this research. >>>>> >>>>> I hope this helps! >>>>> Gabor >>>>> >>>>> >>>>> On Tue, Oct 9, 2018 at 3:39 PM Boris Tyukin <[email protected]> >>>>> wrote: >>>>> >>>>>> Hey guys, >>>>>> >>>>>> we've reviewed the docs but still not sure why query 1 works fine and >>>>>> query 2 errors out with a syntax error. The only difference is that I >>>>>> have >>>>>> Hash partition first and then a Range. >>>>>> >>>>>> ERROR: AnalysisException: Syntax error in line 13: >>>>>> ), >>>>>> ^ >>>>>> Encountered: COMMA >>>>>> Expected: AS, CACHED, COMMENT, LOCATION, ROW, SORT, STORED, >>>>>> TBLPROPERTIES, UNCACHED, WITH >>>>>> CAUSED BY: Exception: Syntax error >>>>>> >>>>>> >>>>>> >>>>>> *Query 1:* >>>>>> >>>>>> CREATE TABLE test2 >>>>>> ( >>>>>> source_system_id BIGINT, >>>>>> id1 BIGINT, >>>>>> id2 BIGINT, >>>>>> name STRING, >>>>>> PRIMARY KEY (source_system_id,id1,id2) >>>>>> ) >>>>>> PARTITION BY >>>>>> RANGE(source_system_id) ( >>>>>> PARTITION VALUE = 10000000, >>>>>> PARTITION VALUE = 20000000, >>>>>> PARTITION VALUE = 30000000 >>>>>> ), >>>>>> HASH(id1,id2) PARTITIONS 10 >>>>>> STORED AS KUDU; >>>>>> >>>>>> *Query 2:* >>>>>> >>>>>> CREATE TABLE test2 >>>>>> ( >>>>>> source_system_id BIGINT, >>>>>> id1 BIGINT, >>>>>> id2 BIGINT, >>>>>> name STRING, >>>>>> PRIMARY KEY (source_system_id,id1,id2) >>>>>> ) >>>>>> PARTITION BY >>>>>> RANGE(source_system_id) ( >>>>>> PARTITION VALUE = 10000000, >>>>>> PARTITION VALUE = 20000000, >>>>>> PARTITION VALUE = 30000000 >>>>>> ), >>>>>> HASH(id1,id2) PARTITIONS 10 >>>>>> STORED AS KUDU; >>>>>> >>>>>
