Hi all,
The first step of a cube merge fails with this error:
java.lang.RuntimeException: Too big dictionary, dictionary cannot be bigger than 2GB
    at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:421)
    at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:408)
    at org.apache.kylin.dict.DictionaryGenerator$StringDictBuilder.build(DictionaryGenerator.java:165)
    at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
    at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
    at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:102)
    at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:268)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:145)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:135)
    at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:67)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Cardinality of “SALE_ORD_ID”: 157644463
Measure SALE: COUNT_DISTINCT, Value: SALE_ORD_ID, Type: column, Return type: bitmap
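A rough back-of-envelope check of why a dictionary over ~157.6 million distinct values can run into the 2GB cap. This is only a sketch: the average value length (`AVG_VALUE_BYTES`) is a hypothetical assumption, not a measured figure, and the estimate ignores the trie's prefix sharing and per-node overhead, so it is not how Kylin actually sizes the dictionary. It only uses the fact that a single Java `byte[]` is int-indexed and therefore holds at most `Integer.MAX_VALUE` bytes.

```java
public class DictSizeEstimate {
    // Cardinality of SALE_ORD_ID as reported above.
    static final long CARDINALITY = 157_644_463L;

    // HYPOTHETICAL: average length of one SALE_ORD_ID value in bytes (assumed, not measured).
    static final int AVG_VALUE_BYTES = 16;

    // A Java byte[] is int-indexed, so one array holds at most Integer.MAX_VALUE bytes (~2GB).
    static final long JAVA_ARRAY_LIMIT = Integer.MAX_VALUE;

    // Raw bytes of all distinct values laid end to end (no trie compression, no overhead).
    static long rawValueBytes(long cardinality, int avgValueBytes) {
        return cardinality * avgValueBytes;
    }

    public static void main(String[] args) {
        long raw = rawValueBytes(CARDINALITY, AVG_VALUE_BYTES);
        System.out.printf("raw value bytes: %,d (limit %,d) -> %s%n",
                raw, JAVA_ARRAY_LIMIT, raw > JAVA_ARRAY_LIMIT ? "exceeds" : "fits");
    }
}
```

With the assumed 16 bytes per value the raw total is about 2.52 billion bytes, above the limit; with 13 bytes it is about 2.05 billion and would fit. The point is only that, at this cardinality, the merged dictionary sits right around the 2GB boundary, which is why the error appears at the merge step when segment dictionaries are combined.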
I'm wondering: does this mean high-cardinality columns can't be used for a precise (accurate) COUNT_DISTINCT metric?