Hi,

It is a good question. Kylin does not use a specific hashing function. It uses a Global Dictionary to encode each string into an integer; please refer to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/ for its design. Because the dictionary assigns IDs to values as they appear, the range of IDs grows with the number of distinct values actually seen, so it does not have to be fixed in advance. Newer versions can also build the global dictionary in a distributed way; see http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html. On the other hand, the cardinality is calculated in the Extract Fact Table Distinct Columns step.

If you are interested in the details and can read Chinese, please see https://blog.bcmeng.com/post/kylin-distinct-count-global-dict.html, https://blog.bcmeng.com/post/kylin-distinct-count.html and https://hexiaoqiao.github.io/blog/2016/11/27/exact-count-and-global-dictionary-of-apache-kylin/ for further information.
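To make the idea concrete, here is a toy sketch in Python, assuming a simple append-only dictionary and using a Python int as the bitmap. The names (AppendOnlyDictionary, to_bitmap, count_distinct) are hypothetical; this is not Kylin's actual dictionary or bitmap implementation. It only shows why no fixed range is needed: each new value gets the next integer ID, so the IDs always fall in [0, number of distinct values seen), and bitmaps from different segments can be merged with a bitwise OR.

```python
class AppendOnlyDictionary:
    """Hypothetical toy dictionary: gives each new string the next integer ID.

    Illustration only; this is not Kylin's global dictionary implementation.
    """

    def __init__(self):
        self._ids = {}  # value -> assigned integer ID

    def encode(self, value: str) -> int:
        # IDs are handed out in first-seen order: 0, 1, 2, ...
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]

    def size(self) -> int:
        # All IDs lie in [0, size()), so the range tracks the real cardinality.
        return len(self._ids)


def to_bitmap(values, dictionary) -> int:
    """Build a bitmap (a Python int used as a bit set) of encoded IDs."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << dictionary.encode(v)
    return bitmap


def count_distinct(bitmap: int) -> int:
    # Each set bit corresponds to exactly one distinct value.
    return bin(bitmap).count("1")


if __name__ == "__main__":
    d = AppendOnlyDictionary()
    day1 = to_bitmap(["alice", "bob", "alice"], d)
    day2 = to_bitmap(["bob", "carol"], d)
    print(count_distinct(day1))         # 2
    print(count_distinct(day2))         # 2
    print(count_distinct(day1 | day2))  # 3 (merged across both days)
    print(d.size())                     # 3
```

Because the same dictionary is shared, "bob" gets the same ID on both days, which is what makes the OR merge give an exact distinct count. Kylin itself stores the IDs in a compressed bitmap rather than a plain bit array; the blog post above describes the details.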
----------------
Best wishes,
Xiaoxiang Yu

From: "[email protected]" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, September 27, 2019 10:23
To: kylin-user <[email protected]>
Subject: global dictionary scope

Hi,

As the official documentation describes, Kylin uses a bitmap to achieve precise count (distinct). The premise is that each value is mapped to a number using a specific hash function. But if we don't know the cardinality in advance, how do we determine the range of the mapped numbers? If the range is set too large, memory is wasted; if it is set too small, it is not large enough.

________________________________
[email protected]
