Hi Shrikant,

Please refer to http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/
You might find it useful.
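
In case it helps while reading that post, here is a minimal sketch of what a dimension dictionary does, as far as I understand it: it maps each distinct dimension value to a small integer ID (and back), so later steps can work with compact IDs instead of raw strings. This is not Kylin's TrieDictionary code. The class name SimpleDictionary and its methods are hypothetical, and a sorted list with binary search stands in for the trie. It does show why cardinality matters: every distinct value of the column has to be held in memory while the dictionary is built.

```python
from bisect import bisect_left


class SimpleDictionary:
    """Hypothetical order-preserving dictionary: distinct values <-> dense integer IDs.

    Illustration only; a sorted list plus binary search stands in for a trie.
    """

    def __init__(self, values):
        # Keep each distinct value once, in sorted order, so IDs preserve value order.
        # Memory use grows with the number of distinct values (the cardinality).
        self._values = sorted(set(values))

    def id_of(self, value):
        # Binary search for the value; its position in the sorted list is its ID.
        i = bisect_left(self._values, value)
        if i == len(self._values) or self._values[i] != value:
            raise KeyError(value)
        return i

    def value_of(self, id_):
        # Reverse lookup, used to decode stored IDs back into values.
        return self._values[id_]

    def __len__(self):
        return len(self._values)


# Encode a column of dimension values as compact integer IDs, then decode it.
column = ["paris", "berlin", "paris", "tokyo", "berlin"]
dictionary = SimpleDictionary(column)
encoded = [dictionary.id_of(v) for v in column]
decoded = [dictionary.value_of(i) for i in encoded]
```

Because the IDs follow the sort order of the values, comparing two IDs gives the same answer as comparing the two original values.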

Regards,
Ashish

On Thu, Aug 16, 2018 at 10:33 AM, Shrikant Bang <[email protected]> wrote:

> Thank you, ShaoFeng & Billy, for the responses.
>
> I was able to set hierarchies in the dimensions.
>
> While building the cube, the "fact distinct column" job is failing in a
> reducer with an out-of-memory exception:
>
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
>     at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
>     at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
>     at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
>     at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
>     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
>     at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> I tried debugging and understood that the dictionary is built in the
> reducer's cleanup method.
>
> I am curious to learn the internals. Can you please help me with the
> following:
>
> 1. Any pointer/reference/JIRA for understanding how the trie (dictionary)
>    of a dimension's values is used in the next steps?
>
> 2. Any best practices/references for tuning the "fact distinct column" job
>    for those reducers which have high cardinality? I am trying to increase
>    memory for now, as the partitioning and the number of reducers depend
>    on the number of cuboids.
>
> P.S. I am using v2.4 of Kylin with HBase 1.x.
>
> Thank You,
> Shrikant Bang
>
> On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi <[email protected]> wrote:
>
>> For question 1), in the Cube's "advanced setting" step, you can specify
>> the cuboid whitelist to build.
>>
>> 2018-08-13 22:26 GMT+08:00 Billy Liu <[email protected]>:
>>
>>> Hello Shrikant,
>>>
>>> For 1, it seems the 4 dimensions form a hierarchy structure. You could
>>> define them as hierarchy dimensions in the Cube, and leave A as a
>>> mandatory dimension.
>>>
>>> For 2, select 'user_activity' as the partition column in the model
>>> design. There are a few built-in formats; most date types are supported.
>>>
>>> With Warm regards
>>>
>>> Billy Liu
>>>
>>> Shrikant Bang <[email protected]> wrote on Mon, Aug 13, 2018 at 5:39 PM:
>>> >
>>> > Hi Team,
>>> >
>>> > We are doing a PoC on building OLAP cubes. Could you please help me
>>> > get answers to the queries below?
>>> >
>>> > Selective Cuboids:
>>> > We need to have selective cuboids as part of the OLAP cubes.
>>> > Let's say we have 4 dimensions: A, B, C, D. Then we need just
>>> > (A,B,C,D), (A,B,C), (A,B) and (A).
>>> >
>>> > Refresh Settings:
>>> > How do we specify the partition column and format while building a
>>> > cube for the fact table?
>>> > e.g. user_activity is partitioned by date 'yyyy-MM-dd' and the cube
>>> > should be refreshed every day with the previous day's computation.
>>> >
>>> > Thank You,
>>> > Shrikant Bang
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
