With carefully designed dimensions and aggregation groups, Kylin can work with ultra-high-cardinality (UHC) columns. That requires understanding the analysis scenario first.
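To illustrate the aggregation-group design mentioned above, a cube descriptor fragment might look like the following. This is only a sketch: the field names follow the aggregation-group format of later Apache Kylin releases, and the dimension names (CATEGORY, REGION, SELLER_ID) are hypothetical, so check the cube descriptor documentation for your Kylin version.

```json
{
  "aggregation_groups": [
    {
      "includes": ["CATEGORY", "REGION", "SELLER_ID"],
      "select_rule": {
        "mandatory_dims": ["CATEGORY"],
        "hierarchy_dims": [],
        "joint_dims": [["REGION", "SELLER_ID"]]
      }
    }
  ]
}
```

Grouping a high-cardinality dimension with others via joint_dims limits the number of cuboids that materialize it, which keeps the cube size manageable.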
As to those specific lines: consider whether the UHC column can be replaced with a low-cardinality one. For example, a data set may have a URL column which is UHC, but what is really useful in analysis is only the domain name. In such a case, ETL can pre-process the URLs into domain names, so the cube does not have to deal with the UHC column directly.

On Fri, Jan 8, 2016 at 8:54 AM, zhong zhang <[email protected]> wrote:

> Hi All,
>
> The cube we are trying to build includes several ultra-high-cardinality
> columns; their cardinality is over 50 million. From this link
> <http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin?next_slideshow=5>,
> it says:
>
> Avoid UHC as much as possible.
> - if it's used as an indicator, then put the indicator in the cube.
> - try to categorize values or derive features from the UHC rather than
>   putting the original value in the cube.
>
> I'm sorry that I'm a newbie to Kylin and cube things. Can anyone
> give a little more detailed explanation of the above two suggestions?
>
> Best regards,
> Zhong
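The URL-to-domain pre-processing step described above can be sketched as follows. This is a minimal illustration in Python; the helper name `url_to_domain` and the sample URLs are hypothetical, and a real ETL job would apply the same transformation in Hive/Spark before loading the fact table.

```python
from urllib.parse import urlparse

def url_to_domain(url):
    """Hypothetical ETL helper: reduce a UHC URL value to its
    low-cardinality domain name. urlparse extracts the host from
    scheme://host/path URLs; values without a parseable host fall
    back to the original string."""
    netloc = urlparse(url).netloc
    return netloc or url

# Millions of distinct URLs collapse to a handful of domains.
urls = [
    "http://kylin.apache.org/docs/index.html",
    "http://kylin.apache.org/download/",
    "https://example.com/page?id=12345",
]
domains = [url_to_domain(u) for u in urls]
print(sorted(set(domains)))  # → ['example.com', 'kylin.apache.org']
```

With the domain stored as a dimension instead of the raw URL, the cube's dictionary stays small while the analysis questions (traffic per site, etc.) remain answerable.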
