Thank you, ShaoFeng and Billy, for your responses. I was able to set hierarchies in the dimensions.
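For anyone reading this thread later, here is why the hierarchy setting covers the selective-cuboid requirement. In a hierarchy A > B > C > D, a child level can appear only together with its parent, and making A mandatory removes the empty cuboid. That leaves only (A), (A,B), (A,B,C) and (A,B,C,D) out of the 16 possible combinations. The Java sketch below is a simplified model of those two rules, written for illustration only. `HierarchyCuboids` and `validCuboids` are hypothetical names, not Kylin APIs. Kylin's real cuboid scheduler also handles aggregation groups, joint dimensions and other settings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Kylin code: models only the "hierarchy" and
// "mandatory" pruning rules on a single chain of dimensions.
public class HierarchyCuboids {

    // `hierarchy` is ordered parent-first (e.g. A, B, C, D); `mandatory` must
    // appear in every cuboid. Returns the surviving dimension subsets.
    public static List<List<String>> validCuboids(List<String> hierarchy, String mandatory) {
        List<List<String>> result = new ArrayList<>();
        int n = hierarchy.size();
        // Each bitmask is one candidate cuboid; bit i means hierarchy.get(i) is included.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> cuboid = new ArrayList<>();
            boolean valid = true;
            for (int i = 0; i < n; i++) {
                boolean included = (mask & (1 << i)) != 0;
                boolean parentIncluded = i == 0 || (mask & (1 << (i - 1))) != 0;
                // Hierarchy rule: a child level may appear only with its direct parent.
                if (included && !parentIncluded) {
                    valid = false;
                    break;
                }
                if (included) {
                    cuboid.add(hierarchy.get(i));
                }
            }
            // Mandatory rule: drop any cuboid that lacks the mandatory dimension.
            if (valid && cuboid.contains(mandatory)) {
                result.add(cuboid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> cuboids = validCuboids(List.of("A", "B", "C", "D"), "A");
        System.out.println(cuboids.size() + " of " + (1 << 4) + " possible cuboids");
        cuboids.forEach(System.out::println);
    }
}
```

Running `main` prints the count followed by the four prefix cuboids, from [A] up to [A, B, C, D], which matches the list in the original question.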
While building the cube, the "fact distinct column" step fails in one reducer with an OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
    at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
    at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
    at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
    at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
    at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
    at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

While debugging, I found that the dictionary is built in the reducer's cleanup method (FactDistinctColumnsReducer.doCleanup). I would like to understand the internals, so could you please help me with the following:

1. Is there any pointer, reference, or JIRA that explains how the trie dictionary of a dimension's values is used in the subsequent build steps?
2. Are there any best practices or references for tuning the "fact distinct column" job for reducers that handle high-cardinality columns? For now I am increasing the reducer memory, since the partitioning and the number of reducers depend on the number of cuboids.

P.S.
I am using Kylin v2.4 with HBase 1.x.

Thank you,
Shrikant Bang

On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi wrote:
> For question 1), in the cube's "Advanced Setting" step, you can specify the
> cuboid whitelist to build.
>
> 2018-08-13 22:26 GMT+08:00 Billy Liu wrote:
>
>> Hello Shrikant,
>>
>> For 1, it seems the 4 dimensions form a hierarchy. You could define them
>> as hierarchy dimensions in the cube, and leave A as a mandatory dimension.
>>
>> For 2, select 'user_activity' as the partition column in the model design.
>> There are a few built-in formats; most date types are supported.
>>
>> With warm regards,
>>
>> Billy Liu
>>
>> On Mon, Aug 13, 2018 at 5:39 PM, Shrikant Bang wrote:
>> >
>> > Hi Team,
>> >
>> > We are doing a PoC on building OLAP cubes. Could you please help me
>> > answer the queries below?
>> >
>> > Selective Cuboids:
>> > We need to have selective cuboids as part of the OLAP cubes.
>> > Let's say we have 4 dimensions: A, B, C, D. Then we need just
>> > (A,B,C,D), (A,B,C), (A,B), and (A).
>> >
>> > Refresh Settings:
>> > How do we specify the partition column and its format while building the
>> > cube for the fact table? For example, user_activity is partitioned by
>> > date 'yyyy-MM-dd', and the cube should be refreshed every day with the
>> > previous day's computation.
>> >
>> > Thank you,
>> > Shrikant Bang
>>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
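On question 1 about how the trie dictionary is used, a rough mental model may help until someone points to the relevant JIRA or design notes. As I understand it, Kylin encodes a dictionary-encoded dimension's distinct values into integer IDs with an order-preserving dictionary (the TrieDictionary in the stack trace), so later steps can store and compare compact IDs instead of raw strings. The Java sketch below is a deliberate simplification that uses a sorted `TreeSet` plus a `HashMap` instead of a trie. `SortedDictionarySketch` is a hypothetical class, not Kylin's `TrieDictionaryBuilder`. In this sketch every distinct value of the column has to be held on the heap before IDs can be assigned, so memory grows with cardinality. That is consistent with the stack trace, where the heap runs out inside `TrieDictionaryBuilder.buildTrieBytes`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for an order-preserving dictionary.
// Not Kylin's implementation: a sorted array replaces the compact trie.
public class SortedDictionarySketch {
    // Index == ID; values are stored in ascending sort order.
    private final String[] idToValue;
    // Reverse lookup used when encoding a value into its ID.
    private final Map<String, Integer> valueToId = new HashMap<>();

    public SortedDictionarySketch(Iterable<String> values) {
        // Collect every distinct value in memory first; this is the step whose
        // heap usage scales with the column's cardinality.
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : values) {
            distinct.add(v);
        }
        idToValue = distinct.toArray(new String[0]);
        for (int id = 0; id < idToValue.length; id++) {
            valueToId.put(idToValue[id], id);
        }
    }

    // Encode: value -> ID. IDs follow the sort order of the values.
    public int getId(String value) {
        Integer id = valueToId.get(value);
        if (id == null) {
            throw new IllegalArgumentException("Value not in dictionary: " + value);
        }
        return id;
    }

    // Decode: ID -> original value.
    public String getValue(int id) {
        return idToValue[id];
    }

    public int size() {
        return idToValue.length;
    }

    public static void main(String[] args) {
        SortedDictionarySketch dict = new SortedDictionarySketch(
                List.of("shanghai", "berlin", "austin", "berlin"));
        System.out.println(dict.size());              // 3
        System.out.println(dict.getId("austin"));     // 0
        System.out.println(dict.getId("shanghai"));   // 2
        System.out.println(dict.getValue(1));         // berlin
    }
}
```

Because IDs preserve the order of the values, a filter such as city >= "berlin" can be evaluated as ID >= 1 without decoding; that property is the reason this sketch assigns IDs in sorted order.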
