Will you recommend using "integer" type for UHC (3+ millions) dimension and then have derived columns for relative dimensions (look-ups) where type is not "integer"? >> This depends on the relative cardinality of the two columns. For example, "user_id" and "email" are close to 1:1, so deriving one from the other works well. Deriving "sex" from "user_id" is not a good idea, because the cardinality of "sex" is much smaller than that of "user_id", which means a lot of post-aggregation will happen after the derivation. Usually we suggest a ratio of around 10:1 or less, but this is not a fixed rule; you can choose based on your performance requirements.
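To make the cardinality point concrete, here is a minimal Python sketch (not Kylin code) of what deriving a column from a host key implies at query time. The function `post_aggregate`, the `snapshot` dict, and the sample data are all hypothetical; the only idea illustrated is that rows stored per host key have to be mapped through the lookup snapshot and aggregated again, so the more host-key values collapse into each derived value, the more work is left after the scan.

```python
from collections import defaultdict

def post_aggregate(rows, snapshot):
    """Map (host_key, measure) rows through a lookup snapshot and re-aggregate.

    rows     -- iterable of (host_key, measure) pairs, one per host-key value
    snapshot -- dict mapping host_key -> derived value (hypothetical lookup snapshot)
    Returns (totals per derived value, rows-in / groups-out ratio).
    """
    totals = defaultdict(int)
    n_rows = 0
    for host_key, measure in rows:
        totals[snapshot[host_key]] += measure
        n_rows += 1
    ratio = n_rows / len(totals) if totals else 0.0
    return dict(totals), ratio

# Hypothetical data: 6 users, one measure value each.
rows = [(uid, 10) for uid in range(1, 7)]

# user_id -> email is 1:1, so nothing collapses.
email = {uid: f"user{uid}@example.com" for uid in range(1, 7)}
# user_id -> sex collapses many users into two groups.
sex = {uid: ("F" if uid % 2 else "M") for uid in range(1, 7)}

email_totals, email_ratio = post_aggregate(rows, email)
sex_totals, sex_ratio = post_aggregate(rows, sex)
print(email_ratio)  # 1.0
print(sex_ratio)    # 3.0
print(sex_totals)   # {'F': 30, 'M': 30}
```

With real data the ratio would be the host column's cardinality divided by the derived column's cardinality, which is the figure the 10:1 rule of thumb refers to.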
Is derived column's aggregation happens at HBase Co-Processor side? Any JIRA/doc for my learnings? >> No, the derivation is calculated only in the Kylin node; it is not pushed down to the HBase coprocessor, because the lookup table's snapshot is loaded only in the Kylin node.

2018-08-27 19:00 GMT+08:00 Shrikant Bang <[email protected]>:

> Thanks, ShaoFeng for response!
>
> I have started using memory 2G (default cluster setting) and OOM got solved when memory increased to 4G.
>
> Will you recommend using "integer" type for UHC (3+ millions) dimension and then have derived columns for relative dimensions (look-ups) where type is not "integer"?
>
> Is derived column's aggregation happens at HBase Co-Processor side? Any JIRA/doc for my learnings?
>
> please suggest.
>
> Thank You,
> Shrikant Bang
>
> On Tue, Aug 21, 2018 at 6:36 PM ShaoFeng Shi <[email protected]> wrote:
>
>> Hi Shrikant,
>>
>> How much memory are you allocating to Reducer? Please consider to allocate more mem to reducer, as Kylin builds the dictionary in the reducers.
>>
>> You can also disable this, then Kylin will build dict in its own JVM. This may cause your Kylin process OOM if there is an ultra high cardinality (UHC) column.
>>
>> kylin.engine.mr.build-dict-in-reducer=false
>>
>> Do you know how high the cardinality of that dimension? For UHC which cardinality > 3 millions, we don't recommend to use dictionary as the encoding. You may need to use "fixed_length" or "integer"(if it is in type of integer).
>>
>> 2018-08-16 16:50 GMT+08:00 Ashish Singhi <[email protected]>:
>>
>>> Hi Shrikant,
>>>
>>> Refer http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/
>>> You might find it useful.
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Thu, Aug 16, 2018 at 10:33 AM, Shrikant Bang <[email protected]> wrote:
>>>
>>>> Thank you, ShaoFeng & Billy for responses.
>>>>
>>>> I could able to set hierarchies in dimension.
>>>>
>>>> While building cube, step "fact distinct column" job is failing in a reducer with Out Of Memory exception.
>>>>
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
>>>>     at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
>>>>     at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
>>>>     at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
>>>>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
>>>>     at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
>>>>     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
>>>>     at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>
>>>> I tried debugging and understood that dictionary is getting built in reducer's clean up method.
>>>>
>>>> I am curious to learn internals. Can you please help me in below :
>>>>
>>>> 1. Any pointer/reference/JIRA for understanding how TRIE (dictionary) of dimension's value getting used in next steps?
>>>>
>>>> 2. Any best practice/references in tuning "fact distinct column" job for those reducer which have high cardinality.
>>>> I am trying with increasing memory as of now as partitioning and number of reducers are depends on cuboids number.
>>>>
>>>> P.S. I am using v2.4 of Kylin with HBase 1.x
>>>>
>>>> Thank You,
>>>> Shrikant Bang
>>>>
>>>> On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi <[email protected]> wrote:
>>>>
>>>>> For question 1), in Cube's "advanced setting" step, you can specify the cuboid whitelist to build.
>>>>>
>>>>> 2018-08-13 22:26 GMT+08:00 Billy Liu <[email protected]>:
>>>>>
>>>>>> Hello Shrikant,
>>>>>>
>>>>>> For 1, seems the 4 dimensions are hierarchy structure. You could define them as hierarchy dimensions in Cube, and leave A as mandatory dimension.
>>>>>>
>>>>>> For 2, select 'user_activity' as partition column in model design. There are a few built-in formats, most date types are supported.
>>>>>>
>>>>>> With Warm regards
>>>>>>
>>>>>> Billy Liu
>>>>>> Shrikant Bang <[email protected]> wrote on Mon, Aug 13, 2018 at 5:39 PM:
>>>>>> >
>>>>>> > Hi Team,
>>>>>> >
>>>>>> > We are doing a PoC on building OLAP cubes. Could you please help me to get answer of below queries?
>>>>>> >
>>>>>> > Selective Cuboids:
>>>>>> > We need to have selective cuboids as part of OLAP cubes.
>>>>>> > Let say if we have 4 dimensions : A, B, C, D then we need just (A,B,C,D) , (A,B,C), (A,B) and (A)
>>>>>> >
>>>>>> > Refresh Settings:
>>>>>> > How to specify partition column and format while building cube for fact table.
>>>>>> > e.g. user_activity is partitioned by date 'yyyy-MM-dd' and cube should be refreshed everyday with previous day's computation.
>>>>>> >
>>>>>> > Thank You,
>>>>>> > Shrikant Bang
>>>>>> >
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
