With carefully designed dimensions and aggregation groups, Kylin can work with ultra-high-cardinality (UHC) columns. That requires understanding the analysis scenario first.
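To illustrate the aggregation-group design mentioned above, a cube descriptor fragment might look like the following. This is only a sketch: the field names follow the aggregation-group format of later Apache Kylin releases, and the dimension names (CATEGORY, REGION, SELLER_ID) are hypothetical, so check the cube descriptor documentation for your Kylin version.

```json
{
  "aggregation_groups": [
    {
      "includes": ["CATEGORY", "REGION", "SELLER_ID"],
      "select_rule": {
        "mandatory_dims": ["CATEGORY"],
        "hierarchy_dims": [],
        "joint_dims": [["REGION", "SELLER_ID"]]
      }
    }
  ]
}
```

Grouping a high-cardinality dimension with others via joint_dims limits the number of cuboids that materialize it, which keeps the cube size manageable.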
As to those specific lines: consider whether the UHC column can be replaced with a low-cardinality one. For example, a data set may have a URL column which is UHC, but what is really useful in analysis is only the domain name. In such a case, ETL can pre-process the URLs into domain names, so the cube does not have to deal with the UHC column directly.

On Fri, Jan 8, 2016 at 8:54 AM, zhong zhang <[email protected]> wrote:

> Hi All,
>
> The cube we are trying to build includes several ultra-high-cardinality
> columns; their cardinality is over 50 million. From this link
> <http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin?next_slideshow=5>,
> it says:
>
> Avoid UHC as much as possible.
> - if it's used as an indicator, then put the indicator in the cube.
> - try to categorize values or derive features from the UHC rather than
>   putting the original value in the cube.
>
> I'm sorry that I'm a newbie to Kylin and cube things. Can anyone
> give a little more detailed explanation of the above two suggestions?
>
> Best regards,
> Zhong
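The URL-to-domain pre-processing step described above can be sketched as follows. This is a minimal illustration in Python; the helper name `url_to_domain` and the sample URLs are hypothetical, and a real ETL job would apply the same transformation in Hive/Spark before loading the fact table.

```python
from urllib.parse import urlparse

def url_to_domain(url):
    """Hypothetical ETL helper: reduce a UHC URL value to its
    low-cardinality domain name. urlparse extracts the host from
    scheme://host/path URLs; values without a parseable host fall
    back to the original string."""
    netloc = urlparse(url).netloc
    return netloc or url

# Millions of distinct URLs collapse to a handful of domains.
urls = [
    "http://kylin.apache.org/docs/index.html",
    "http://kylin.apache.org/download/",
    "https://example.com/page?id=12345",
]
domains = [url_to_domain(u) for u in urls]
print(sorted(set(domains)))  # → ['example.com', 'kylin.apache.org']
```

With the domain stored as a dimension instead of the raw URL, the cube's dictionary stays small while the analysis questions (traffic per site, etc.) remain answerable.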
