Re: why distribute by partition column while creating flat hive table?

ShaoFeng Shi Tue, 23 Aug 2016 17:56:56 -0700

In 1.5.3, Kylin redistributes the source records by the "shard by" column
(if the user has selected one); "shard by" is defined on the cube's
"Advanced Setting" page. The "shard by" column should be a high-cardinality
column. In your case, I guess you set the partition column's "shard by" to
true by mistake; please set it to false and then resubmit the build request.
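
To illustrate why the choice of column matters: DISTRIBUTE BY sends all rows
with the same key to the same reducer, so a date column that holds a single
value per daily build puts every row on one reducer. Below is a rough Python
simulation of that effect, not Kylin or Hive code; the reducer count, the row
count, the CRC32 hash and the "user_id" column are assumptions made only for
this sketch.

```python
import zlib
from collections import Counter

def rows_per_reducer(keys, num_reducers):
    """Count the rows each reducer would receive under hash distribution.

    CRC32 is a stand-in for the real hash function (an assumption); the
    point is only that equal keys always land on the same reducer.
    """
    counts = Counter(
        zlib.crc32(str(key).encode("utf-8")) % num_reducers for key in keys
    )
    return [counts.get(r, 0) for r in range(num_reducers)]

NUM_REDUCERS = 8   # hypothetical reducer count
NUM_ROWS = 10_000  # hypothetical size of one daily segment

# Distribute by the partition date: one distinct value per daily build.
by_date = rows_per_reducer(["2016-08-23"] * NUM_ROWS, NUM_REDUCERS)

# Distribute by a high-cardinality column (hypothetical user_id).
by_user = rows_per_reducer(range(NUM_ROWS), NUM_REDUCERS)

print("by date   :", by_date)
print("by user_id:", by_user)
```

With the date key a single reducer receives all of the rows while the others
stay idle; with the high-cardinality key the rows are spread across the
reducers.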


2016-08-23 18:34 GMT+08:00 赵天烁 <[email protected]>:

> I have a table whose data grows hugely every day, at the billion-row level.
> When I build a cube on that table, it gets stuck at the step that creates
> the flat Hive table... forever.
> I checked the MR job and found that the SQL for this step ends
> with "DISTRIBUTE BY  ${partition date column}".
> I tried to run the same SQL manually, but with the "DISTRIBUTE BY" removed,
> and everything finished fine within 10 minutes.
> As far as I know, creating a flat table is helpful when I have a
> star schema, but all I have is the fact table. So why bother to
> create a table with the same structure and even the same data? The only
> difference is the table name....
> So I wonder: is it possible to just create a view with the intermediate
> table name that Kylin needs when I haven't defined any lookup table? That
> would eliminate this long-running task, which seems to achieve nothing.
>
> ------------------------------
>
> 赵天烁
>
> Kevin Zhao
>
> *[email protected] <[email protected]>*
>
>
>
> 珠海市魅族科技有限公司
>
> MEIZU Technology Co., Ltd.
>
> 广东省珠海市科技创新海岸魅族科技楼
>
> MEIZU Tech Bldg., Technology & Innovation Coast
>
> Zhuhai, 519085, Guangdong, China
>
> meizu.com
>



-- 
Best regards,

Shaofeng Shi
