Re: Data skew when redistributing the intermediate table

Chao Long Thu, 13 Jun 2019 19:53:08 -0700

Hi wang,
   "DISTRIBUTE BY RAND()" may cause data inconsistency, so we changed the
redistribute step to distribute by the first few columns of the rowkey; the
default is the first 3 columns. See the following issues for more details.
If you don't have a data skew problem, you can simply disable the
"redistribute" step by setting "kylin.source.hive.redistribute-flat-table"
to false.
https://issues.apache.org/jira/browse/KYLIN-3388
https://issues.apache.org/jira/browse/KYLIN-3457
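
To illustrate why distributing by a few low-cardinality columns can skew
the data, here is a rough Python sketch. It is only a simulation, not Kylin
or Hive code: the rows, the bucket count and the CRC32 hash are all
assumptions made for the example, and Hive's real hash function is
different.

```python
import random
import zlib
from collections import Counter

NUM_BUCKETS = 10  # hypothetical number of reducers / output files


def bucket_by_columns(row, num_columns, num_buckets):
    """Pick a bucket from a hash of the first `num_columns` values of a row."""
    key = "|".join(str(v) for v in row[:num_columns]).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def bucket_sizes(rows, num_buckets, chooser):
    """Count how many rows land in each bucket under the given chooser."""
    counts = Counter(chooser(row) for row in rows)
    return [counts.get(b, 0) for b in range(num_buckets)]


rng = random.Random(42)

# Hypothetical flat-table rows: three low-cardinality dimensions
# (2 x 2 x 2 = 8 distinct combinations) followed by one measure.
rows = [
    (rng.choice("AB"), rng.choice("XY"), rng.choice([0, 1]), rng.random())
    for _ in range(10000)
]

# Distribute by the first 3 columns: at most 8 buckets can receive rows.
by_columns = bucket_sizes(
    rows, NUM_BUCKETS, lambda r: bucket_by_columns(r, 3, NUM_BUCKETS)
)

# Distribute randomly: rows spread roughly evenly over all buckets.
by_random = bucket_sizes(
    rows, NUM_BUCKETS, lambda r: rng.randrange(NUM_BUCKETS)
)

print("by first 3 columns:", by_columns)
print("random            :", by_random)
```

With only 8 distinct key combinations, at least 2 of the 10 buckets stay
empty and the rest carry more rows, which is the kind of skew described in
the question. Random assignment is even, but it is not repeatable across
runs, unlike a hash of the column values.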


On Thu, Jun 13, 2019 at 5:03 PM [email protected] <[email protected]>
wrote:

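> According to the documentation, the intermediate table is redistributed
> randomly with DISTRIBUTE BY RAND(). My cube does not specify a shard-by
> column, yet the data is not distributed randomly; instead, the first 3
> dimension columns are used. Because the cube has no high-cardinality
> dimensions, this causes data skew.
> How can I configure it to distribute randomly, or do you have any other
> suggestions?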
>
> ------------------------------
> [email protected]
>
