This article may help, to some extent: https://kylin.apache.org/docs/howto/howto_optimize_build.html
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

On Monday, September 2, 2019 at 10:23 AM, ITzhangqiang <[email protected]> wrote:

> Hi Yaqian,
>
> Thanks for your reply!
>
> I understand what you said, but I would like to know more detail.
>
> *From:* Yaqian Zhang <[email protected]>
> *Sent:* September 1, 2019 16:03
> *To:* [email protected]
> *Subject:* Re: Details about "Extract Fact Table Distinct Columns and Build Dimension Dictionary"
>
> Hi Johnson,
>
> In these steps, Kylin calculates the cardinality of each dimension column and builds a dictionary for it.
>
> To save space and improve efficiency, Kylin encodes and compresses dimension values, and it uses dictionary encoding by default. Dictionary encoding builds a mapping table from string to int covering all values of a dimension, then serializes that dictionary for storage, which greatly reduces the storage size. The dictionary is order-preserving: if string A is greater than string B, the code for A is also greater than the code for B. This lets the encoded values be used directly in HBase queries without decoding.
>
> However, because dictionary encoding requires maintaining a mapping table, you need to consider the dimension's cardinality, that is, the number of distinct values in the dimension column. If the cardinality is very high, the dictionary becomes very large and is not suitable for loading into memory; in that case, choose a different encoding. By default, Kylin allows dictionary encoding for at most 5 million distinct values, a limit configured by the parameter kylin.dictionary.max.cardinality.
>
> > On Aug 30, 2019, at 8:29 PM, Johnson <[email protected]> wrote:
> >
> > Hi all,
> >
> > · I would like to know the details of these two steps: Extract Fact Table Distinct Columns and Build Dimension Dictionary. What do these steps do, and how do they work?
> >
> > · Looking forward to your reply.
> >
> > ----------------------
> > Best wishes,
> > Johnson
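To make the two steps in the quoted reply more concrete, here is a minimal in-memory sketch, assuming a tiny list of fact rows instead of a real fact table. It is not Kylin's code: Kylin runs these steps as build jobs over the whole fact table and uses its own serialized dictionary structures. The names `extract_distinct_columns`, `OrderPreservingDict`, `build_dimension_dictionaries` and `MAX_DICT_CARDINALITY` are hypothetical, chosen only for illustration. The sketch (1) collects the distinct values of each dimension column, (2) rejects any column whose cardinality exceeds a limit mirroring the documented 5 million default of kylin.dictionary.max.cardinality, and (3) assigns ids in sorted order so that comparing codes gives the same result as comparing the original strings.

```python
from typing import Dict, Iterable, List, Mapping, Set

# Hypothetical constant mirroring the documented default of
# kylin.dictionary.max.cardinality (5 million distinct values).
MAX_DICT_CARDINALITY = 5_000_000


def extract_distinct_columns(
    rows: Iterable[Mapping[str, str]], dims: List[str]
) -> Dict[str, Set[str]]:
    """Collect the distinct values of each dimension column in one pass over the rows."""
    distinct: Dict[str, Set[str]] = {d: set() for d in dims}
    for row in rows:
        for d in dims:
            distinct[d].add(row[d])
    return distinct


class OrderPreservingDict:
    """Maps each distinct string to an int so that a < b implies encode(a) < encode(b)."""

    def __init__(self, values: Set[str]) -> None:
        # Sorting before assigning ids is what makes the encoding order-preserving.
        self._id_to_value: List[str] = sorted(values)
        self._value_to_id: Dict[str, int] = {
            v: i for i, v in enumerate(self._id_to_value)
        }

    def encode(self, value: str) -> int:
        return self._value_to_id[value]

    def decode(self, code: int) -> str:
        return self._id_to_value[code]


def build_dimension_dictionaries(
    distinct: Dict[str, Set[str]], max_cardinality: int = MAX_DICT_CARDINALITY
) -> Dict[str, OrderPreservingDict]:
    """Build one dictionary per dimension, refusing columns whose cardinality is too high."""
    dicts: Dict[str, OrderPreservingDict] = {}
    for dim, values in distinct.items():
        if len(values) > max_cardinality:
            # Too many distinct values to keep in memory: another encoding is needed.
            raise ValueError(
                f"{dim}: cardinality {len(values)} exceeds {max_cardinality}; "
                "choose a non-dictionary encoding"
            )
        dicts[dim] = OrderPreservingDict(values)
    return dicts


if __name__ == "__main__":
    rows = [
        {"city": "Shanghai", "category": "books"},
        {"city": "Beijing", "category": "toys"},
        {"city": "Shanghai", "category": "games"},
    ]
    distinct = extract_distinct_columns(rows, ["city", "category"])
    dicts = build_dimension_dictionaries(distinct)
    city = dicts["city"]
    print(city.encode("Beijing"), city.encode("Shanghai"))  # prints: 0 1
```

Sorting the distinct values before numbering them is the simplest way to get the ordering property described in the reply. Because of that property, a filter such as `city > 'Beijing'` could be evaluated on the integer codes alone.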
