Re: Cannot create Impala Kudu table with mixed partitions

Gabor Kaszab Wed, 10 Oct 2018 03:36:01 -0700

I think we should restrict the order of range/hash hence I opened a Jira to
cover this:
https://issues.apache.org/jira/browse/IMPALA-7686


Having those 2 HASH clauses makes sense for me.

Cheers,
Gabor

On Tue, Oct 9, 2018 at 6:25 PM Boris Tyukin <bo...@boristyukin.com> wrote:

> thanks, guys. thanks for looking into it, Gabor. It is a nice piece of
> code to bookmark :)
>
> So it is intentional but I still wonder why. I understand that RANGE
> partitions can be added later, after a table has been created but it still
> does not explain why range/hash combo is not valid while hash/range is
> valid. I would think it would work another way around. Do you guys think it
> makes sense for me to submit a JIRA?
>
> I think I am going to use two HASH partitions like below and it should
> have a similar effect. What do you think?
>
> CREATE TABLE test2
>
> (
>
> source_system_id BIGINT,
>
> id1 BIGINT,
>
> id2 BIGINT,
>
> name STRING,
>
> PRIMARY KEY (source_system_id,id1,id2)
>
> )
>
> PARTITION BY HASH(source_system_id) PARTITIONS 3,
>
> HASH(id1,id2) PARTITIONS 3
>
> STORED AS KUDU;
>
> On Tue, Oct 9, 2018 at 10:32 AM Zoltan Borok-Nagy <borokna...@cloudera.com>
> wrote:
>
>> Hi Boris,
>>
>> I found the following sentence in the docs: "Zero or more hash partition
>> levels can be combined with an optional range partition level."
>> https://kudu.apache.org/docs/schema_design.html#multilevel-partitioning
>> So it seems like it is an intentional limitation.
>>
>> BR,
>>     Zoltan
>>
>>
>> On Tue, Oct 9, 2018 at 4:21 PM Gabor Kaszab <gaborkas...@cloudera.com>
>> wrote:
>>
>>> Hi Boris,
>>>
>>> I think you accidentally copy-pasted the same example twice but I get
>>> the point and managed to reproduce what you observed. I made a research in
>>> the code about what syntax Impala accepts here and this is the relevant
>>> code I found:
>>>
>>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1388
>>>
>>> As I read, in the 'PARTITION BY' you can either have a:
>>> - list of HASH PARTITIONS or
>>> - RANGE PARTITIONS or
>>> - a list of HASH PARTITIONS followed by a COMMA and then by RANGE
>>> PARTITIONS.
>>>
>>> Please note the comment by this section:
>>>
>>> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/cup/sql-parser.cup#L1381
>>> // The PARTITION BY clause contains any number of HASH() clauses
>>> followed by exactly zero
>>> // or one RANGE clauses
>>>
>>> Is it mentioned somewhere in the docs that this can be done the other
>>> way around?
>>> Note, I'm not sure which Impala version you use, I took cdh5-trunk to do
>>> this research.
>>>
>>> I hope this helps!
>>> Gabor
>>>
>>>
>>> On Tue, Oct 9, 2018 at 3:39 PM Boris Tyukin <bo...@boristyukin.com>
>>> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> we've reviewed the docs but still not sure why query 1 works fine and
>>>> query 2 errors out with a syntax error. The only difference is that I have
>>>> Hash partition first and then a Range.
>>>>
>>>> ERROR: AnalysisException: Syntax error in line 13:
>>>>     ),
>>>>      ^
>>>> Encountered: COMMA
>>>> Expected: AS, CACHED, COMMENT, LOCATION, ROW, SORT, STORED,
>>>> TBLPROPERTIES, UNCACHED, WITH
>>>> CAUSED BY: Exception: Syntax error
>>>>
>>>>
>>>>
>>>> *Query 1:*
>>>>
>>>> CREATE TABLE test2
>>>> (
>>>>   source_system_id BIGINT,
>>>>   id1 BIGINT,
>>>>   id2 BIGINT,
>>>>   name STRING,
>>>>   PRIMARY KEY (source_system_id,id1,id2)
>>>> )
>>>> PARTITION BY
>>>>     RANGE(source_system_id) (
>>>>         PARTITION VALUE = 10000000,
>>>>         PARTITION VALUE = 20000000,
>>>>         PARTITION VALUE = 30000000
>>>>     ),
>>>>     HASH(id1,id2) PARTITIONS 10
>>>> STORED AS KUDU;
>>>>
>>>> *Query 2:*
>>>>
>>>> CREATE TABLE test2
>>>> (
>>>>   source_system_id BIGINT,
>>>>   id1 BIGINT,
>>>>   id2 BIGINT,
>>>>   name STRING,
>>>>   PRIMARY KEY (source_system_id,id1,id2)
>>>> )
>>>> PARTITION BY
>>>>     RANGE(source_system_id) (
>>>>         PARTITION VALUE = 10000000,
>>>>         PARTITION VALUE = 20000000,
>>>>         PARTITION VALUE = 30000000
>>>>     ),
>>>>     HASH(id1,id2) PARTITIONS 10
>>>> STORED AS KUDU;
>>>>
>>>

Re: Cannot create Impala Kudu table with mixed partitions

Reply via email to