Hi Johnson:
        In these two steps, Kylin calculates the cardinality of each dimension 
column and builds a dictionary for it.
        To save space and improve efficiency, Kylin encodes and compresses 
dimension values, using dictionary encoding by default. Dictionary encoding 
builds a mapping table from string to int for all the values in a dimension 
column and then serializes the dictionary for storage, which greatly reduces 
storage size. The dictionary is order-preserving: if string A is greater than 
string B, the encoded value of A is greater than the encoded value of B. This 
allows the encoded values to be used directly in HBase queries without 
decoding.
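
To illustrate the idea, here is a minimal Python sketch of an order-preserving 
dictionary. It is not Kylin's implementation; the function name 
build_dictionary and the sample column are made up for the example, and it 
relies on Python's string ordering rather than whatever comparison Kylin uses 
internally.

```python
def build_dictionary(values):
    """Assign int IDs to distinct values in sorted order (hypothetical helper)."""
    distinct = sorted(set(values))
    encode = {value: idx for idx, value in enumerate(distinct)}
    decode = distinct  # the list index is the ID, so decoding is a lookup
    return encode, decode


# Sample dimension column (made-up data).
column = ["Shanghai", "Beijing", "Shenzhen", "Beijing", "Hangzhou"]
encode, decode = build_dictionary(column)
encoded = [encode[v] for v in column]

# Because IDs follow the sort order, a range filter on the strings
# ("Hangzhou" <= city < "Shenzhen") can be evaluated on the IDs alone.
lo, hi = encode["Hangzhou"], encode["Shenzhen"]
matches = [decode[i] for i in encoded if lo <= i < hi]
```

Since the IDs are compared instead of the strings, the filter never has to 
decode a value until the final result is returned.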
        However, because dictionary encoding requires maintaining a mapping 
table, you need to consider the dimension cardinality, that is, the number of 
distinct values in the dimension column. If the cardinality is very high, the 
dictionary becomes very large and is no longer suitable for loading into 
memory. In that case, choose another encoding method. By default, Kylin allows 
dictionary encoding for at most 5 million distinct values; the limit is 
configured by the parameter kylin.dictionary.max.cardinality.
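
The cardinality check can be sketched roughly as follows. This is only an 
illustration of the reasoning above, not Kylin code: the function names, the 
returned labels, the exact distinct count, and the choice of "<=" for the 
comparison are all assumptions. Only the default of 5 million comes from the 
parameter described above.

```python
# Default limit described above (kylin.dictionary.max.cardinality).
MAX_CARDINALITY = 5_000_000


def cardinality(values):
    """Exact number of distinct values in a column (hypothetical helper)."""
    return len(set(values))


def choose_encoding(distinct_count, max_cardinality=MAX_CARDINALITY):
    """Return "dict" when the dictionary is small enough, else "non-dict".

    The labels and the inclusive comparison are assumptions for this sketch.
    """
    return "dict" if distinct_count <= max_cardinality else "non-dict"


# Made-up sample column with three distinct values.
column = ["Shanghai", "Beijing", "Shenzhen", "Beijing"]
decision = choose_encoding(cardinality(column))
```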

> On Aug 30, 2019, at 8:29 PM, Johnson <[email protected]> wrote:
> 
> Hi all,
> I want to know the details of these two steps: Extract Fact Table Distinct 
> Columns and Build Dimension Dictionary. What do these steps do, and how do 
> they work?
> Looking forward to your reply.
> 
> ----------------------
> Best wishes,
> Johnson
> 
