Hi Johnson,
Together, these two steps extract the distinct values of each dimension
column, calculate its cardinality, and build a dictionary for it.
To save space and improve efficiency, Kylin encodes and compresses
dimension values, using dictionary encoding by default. Dictionary encoding
builds a mapping table from string to int for all the values in a dimension
column, then serializes and stores that dictionary, which greatly reduces
storage size. The dictionary preserves order: if string A is greater than
string B, the encoded value of A is also greater than the encoded value of B.
This allows the encoded values to be used directly in HBase queries without
decoding.
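
To make the order-preserving idea concrete, here is a minimal Python sketch.
It is only an illustration and not Kylin's actual implementation; the names
build_dictionary and encode, and the sample city values, are made up for
this example.

```python
def build_dictionary(values):
    """Map each distinct value to an int ID that follows the sort order."""
    distinct_sorted = sorted(set(values))
    return {value: idx for idx, value in enumerate(distinct_sorted)}


def encode(dictionary, column):
    """Replace every value in the column with its dictionary ID."""
    return [dictionary[value] for value in column]


# Hypothetical dimension column with repeated values.
column = ["shanghai", "beijing", "shenzhen", "beijing"]
dictionary = build_dictionary(column)
encoded = encode(dictionary, column)

# Because IDs follow string order, a range filter on the strings
# (city >= "shanghai") can be evaluated on the IDs alone.
threshold = dictionary["shanghai"]
matches_by_id = [v for v, code in zip(column, encoded) if code >= threshold]
matches_by_string = [v for v in column if v >= "shanghai"]
```

Since the IDs are assigned in sorted order, comparing two IDs gives the same
result as comparing the two original strings, which is why a filter can run
on the encoded values.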
However, because dictionary encoding requires maintaining a mapping table,
you need to consider the dimension's cardinality, that is, the number of
distinct values in the dimension column. If the cardinality is very high, the
dictionary becomes very large and is no longer suitable for loading into
memory. In that case, choose a different encoding method. By default, the
maximum cardinality Kylin allows for dictionary encoding is 5 million, which
is controlled by the parameter kylin.dictionary.max.cardinality.
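
As a rough sketch of the cardinality check, the Python below counts distinct
values and compares the count with a limit. The function names are
hypothetical, and treating the limit as inclusive is an assumption on my
part; Kylin's own check may differ in detail.

```python
# Default taken from the text above (kylin.dictionary.max.cardinality).
MAX_DICT_CARDINALITY = 5_000_000


def cardinality(values):
    """Number of distinct values in a dimension column."""
    return len(set(values))


def can_use_dictionary(values, max_cardinality=MAX_DICT_CARDINALITY):
    """True if the column is small enough for dictionary encoding.

    Assumption: the limit is inclusive.
    """
    return cardinality(values) <= max_cardinality


# Hypothetical columns: one low-cardinality, one where every value is unique.
city = ["beijing", "shanghai", "beijing", "shenzhen"]
order_id = [f"order-{i}" for i in range(1000)]

city_ok = can_use_dictionary(city)
# A tiny limit is used here only to show the high-cardinality case.
order_id_ok = can_use_dictionary(order_id, max_cardinality=100)
```

A column like order_id, where almost every row has its own value, is the
typical case where another encoding is the better choice.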
> On Aug 30, 2019, at 8:29 PM, Johnson <[email protected]> wrote:
>
> Hi all,
> I want to know the details of these two steps: Extract Fact Table Distinct
> Columns and Build Dimension Dictionary. What do these steps do, and how do
> they work?
> Looking forward to your reply.
>
> ----------------------
> Best wishes,
> Johnson