+1
------------------
Best Regards,
Chao Long

------------------ Original Message ------------------
From: "Zhong, Yanghong"<yangzh...@ebay.com.INVALID>;
Sent: Monday, March 18, 2019, 10:30 AM
To: "dev@kylin.apache.org"<dev@kylin.apache.org>;
Cc: "Xiaoxiang Yu"<xiaoxiang...@kyligence.io>; 
Subject: Re: [Discussion] Enable shrunken dictionary by default



+1.

Best regards,
Yanghong Zhong

On 2019/3/18, 10:27 AM, "Xiaoxiang Yu" <xiaoxiang...@kyligence.io> wrote:

    Dear all,
    I suggest enabling "kylin.dictionary.shrunken-from-global-enabled" by 
default (it is currently disabled by default), because I found that enabling it 
speeds up the cube build process when a cube has a count distinct (bitmap) 
measure on a high-cardinality column. This feature was contributed in KYLIN-3491.
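
    For anyone who wants to try it before the default changes, the setting 
can be turned on explicitly. A minimal sketch, assuming the usual 
conf/kylin.properties file is where your deployment keeps its settings (the 
file location is an assumption; the property name is the one above):

```properties
# Assumed location: $KYLIN_HOME/conf/kylin.properties
# Build a per-InputSplit (shrunken) dictionary from the global dictionary
kylin.dictionary.shrunken-from-global-enabled=true
```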
    
    When a count distinct (bitmap) measure is used on a high-cardinality 
column (which requires a global dictionary), the BuildBaseCuboid step needs 
frequent cache swaps, so it cannot finish within a reasonable time. KYLIN-3491 
adds a new step that builds a separate dictionary for each InputSplit before 
the BuildBaseCuboid step. Each mapper of the BuildBaseCuboid step then only has 
to load a smaller dictionary of its own (without unused values) instead of the 
larger global dictionary. This reduces cache swapping and lets the 
BuildBaseCuboid step run as quickly as possible.
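
    The idea can be illustrated with a small sketch. This is not the 
KYLIN-3491 implementation; it is a simplified Python model under stated 
assumptions, and every function name in it is hypothetical. It only shows that 
a per-split dictionary keeps the same IDs as the global dictionary while 
holding far fewer entries:

```python
# Illustrative model only; all names here are hypothetical, not Kylin APIs.

def build_global_dict(values):
    """Assign each distinct value an integer ID (stand-in for the global dictionary)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def build_shrunken_dict(global_dict, split):
    """Keep only the global-dictionary entries whose values occur in this split."""
    return {v: global_dict[v] for v in set(split)}


def encode_split(split, dictionary):
    """Encode every value of a split as its dictionary ID, as a mapper would."""
    return [dictionary[v] for v in split]


# A column with 1000 distinct values, read as two input splits.
column = ["u%d" % i for i in range(1000)]
splits = [column[0:10] + column[0:5], column[500:520]]

global_dict = build_global_dict(column)
shrunken = [build_shrunken_dict(global_dict, s) for s in splits]

for split, small in zip(splits, shrunken):
    # The shrunken dictionary yields the same IDs as the global one.
    assert encode_split(split, small) == encode_split(split, global_dict)
    print(len(small), "entries instead of", len(global_dict))
```

    Each mapper in this model only holds the entries for its own split, which 
is why far less dictionary data has to be cached per mapper.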
    
    In my test environment, the Hadoop cluster is a CDH cluster with 56 vcores 
and 110 GB of memory. I created a model with a fact table (153326740 rows) and 
three dimension tables. It has three count distinct (bitmap) measures, and the 
largest cardinality of a single column is 55200325. With ShrunkenDict disabled, 
the BuildBaseCuboid step could not complete within 22 hours. By comparison, with 
ShrunkenDict enabled, the build completed in a reasonable time (the extra 
dictionary step took 5 minutes and Build Base Cuboid took 5 minutes).
    
    
https://user-images.githubusercontent.com/14030549/54363305-ad25e200-46a5-11e9-8bc7-fe2c385c0278.png
    
    If you want to know more, please check 
https://issues.apache.org/jira/browse/KYLIN-3491.
 If you have any suggestions, please let me know.
    
    ----------------
    Best wishes,
    Xiaoxiang Yu
