Hi Wang,

"DISTRIBUTE BY RAND()" may cause data inconsistency, so we changed the redistribution step to distribute by the first few columns of the rowkey; the default is the first 3 columns. See the following issues for more details:

https://issues.apache.org/jira/browse/KYLIN-3388
https://issues.apache.org/jira/browse/KYLIN-3457

If you don't have a data skew problem, you can simply disable the "redistribute" step by setting "kylin.source.hive.redistribute-flat-table" to false.
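
For reference, a minimal sketch of that setting, assuming it goes in the instance-wide conf/kylin.properties file (the file location is an assumption here; only the property name and value come from the note above):

```properties
# Skip the "redistribute flat table" step entirely, so the intermediate
# Hive table is not redistributed by the leading rowkey columns.
# Assumption: placed in conf/kylin.properties.
kylin.source.hive.redistribute-flat-table=false
```

With the step disabled, the intermediate table keeps whatever file layout Hive produced when building it, so this is mainly suitable when the source data itself is reasonably evenly distributed.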
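
On Thu, Jun 13, 2019 at 5:03 PM [email protected] <[email protected]> wrote:

> According to the documentation, the intermediate table is redistributed
> randomly, using DISTRIBUTE BY RAND(). My cube has no shard-by column
> specified, yet the data is not distributed randomly; instead, the first 3
> dimension columns are used. Because the cube has no high-cardinality
> dimension, this causes data skew.
> How can I configure it to distribute randomly, or do you have any other
> suggestions?
>
> ------------------------------
> [email protected]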
