Thank you, ShaoFeng, for the response! I have learned a lot about Kylin's internals from this mail thread. I appreciate your help.
Regards,
Shrikant Bang

On Tue, Aug 28, 2018 at 7:14 PM ShaoFeng Shi <[email protected]> wrote:

> Hi Shrikant,
>
> Do you mean reusing the lookup table snapshot across cubes? As you know,
> Kylin takes snapshots of lookup tables and loads them into memory at
> query time.
>
> When Kylin takes a snapshot, it checks whether the lookup table has
> changed since the last snapshot. If there is no change, it reuses the
> existing snapshot. The detailed logic is in
> SnapshotManager.buildSnapshot().
>
> So, from this point of view, if your calendar lookup table is stable, its
> snapshot will be reused by multiple cubes.
>
> Hope this helps.
>
> 2018-08-28 17:04 GMT+08:00 Shrikant Bang <[email protected]>:
>
>> Thank you, ShaoFeng, for the response!
>>
>> Apart from the UHC column, we have other dimensions which will be used
>> by multiple cubes,
>>
>> e.g. calendar_dimension (date, day, week, month, quarter, etc.), which
>> is immutable.
>>
>> A few of the calendar columns become part of the cube and a few become
>> derived columns.
>>
>> Is there any way I can cache it on Kylin's node and keep using it for
>> every other cube? It would be a kind of global cache for all cubes
>> under a project.
>>
>> Thank You,
>> Shrikant Bang.
>>
>> On Tue, Aug 28, 2018 at 9:05 AM ShaoFeng Shi <[email protected]>
>> wrote:
>>
>>>> Would you recommend using the "integer" type for a UHC (3+ million)
>>>> dimension and then having derived columns for related dimensions
>>>> (look-ups) whose type is not "integer"?
>>>
>>> This depends on the cardinality of the two columns. For example,
>>> "user_id" and "email" are close to 1:1, so this derivation is good.
>>> But "user_id" and "sex" is not good, because the cardinality of "sex"
>>> is much smaller than that of "user_id", which means a lot of
>>> post-aggregation will happen after the derivation. Usually, we suggest
>>> a ratio of around 10:1 or less, but this is not fixed; you can choose
>>> depending on the performance requirement.
>>>
>>>> Does the derived column's aggregation happen on the HBase coprocessor
>>>> side? Is there any JIRA/doc I can learn from?
>>>
>>> No, the derivation calculation happens only in the Kylin node and is
>>> not pushed down, because the lookup table's snapshot is loaded only in
>>> the Kylin node.
>>>
>>> 2018-08-27 19:00 GMT+08:00 Shrikant Bang <[email protected]>:
>>>
>>>> Thanks, ShaoFeng, for the response!
>>>>
>>>> I started with 2G of memory (the default cluster setting), and the
>>>> OOM was resolved when the memory was increased to 4G.
>>>>
>>>> Would you recommend using the "integer" type for a UHC (3+ million)
>>>> dimension and then having derived columns for related dimensions
>>>> (look-ups) whose type is not "integer"?
>>>>
>>>> Does the derived column's aggregation happen on the HBase coprocessor
>>>> side? Is there any JIRA/doc I can learn from?
>>>>
>>>> Please suggest.
>>>>
>>>> Thank You,
>>>> Shrikant Bang
>>>>
>>>> On Tue, Aug 21, 2018 at 6:36 PM ShaoFeng Shi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Shrikant,
>>>>>
>>>>> How much memory are you allocating to the reducer? Please consider
>>>>> allocating more memory to the reducer, as Kylin builds the
>>>>> dictionary in the reducers.
>>>>>
>>>>> You can also disable this; Kylin will then build the dictionary in
>>>>> its own JVM. This may cause your Kylin process to run out of memory
>>>>> if there is an ultra high cardinality (UHC) column.
>>>>>
>>>>> kylin.engine.mr.build-dict-in-reducer=false
>>>>>
>>>>> Do you know how high the cardinality of that dimension is? For a UHC
>>>>> column whose cardinality is > 3 million, we don't recommend using
>>>>> the dictionary as the encoding. You may need to use "fixed_length"
>>>>> or "integer" (if the column is of integer type).
>>>>>
>>>>> 2018-08-16 16:50 GMT+08:00 Ashish Singhi <[email protected]>:
>>>>>
>>>>>> Hi Shrikant,
>>>>>>
>>>>>> Refer to http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/
>>>>>> You might find it useful.
>>>>>>
>>>>>> Regards,
>>>>>> Ashish
>>>>>>
>>>>>> On Thu, Aug 16, 2018 at 10:33 AM, Shrikant Bang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you, ShaoFeng & Billy, for the responses.
>>>>>>>
>>>>>>> I was able to set hierarchies in the dimensions.
>>>>>>>
>>>>>>> While building the cube, the "fact distinct column" job step is
>>>>>>> failing in a reducer with an out-of-memory exception:
>>>>>>>
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>     at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
>>>>>>>     at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
>>>>>>>     at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
>>>>>>>     at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
>>>>>>>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
>>>>>>>     at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
>>>>>>>     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
>>>>>>>     at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
>>>>>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>>>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>>
>>>>>>> I tried debugging and understood that the dictionary is built in
>>>>>>> the reducer's cleanup method.
>>>>>>>
>>>>>>> I am curious to learn the internals. Could you please help me with
>>>>>>> the following:
>>>>>>>
>>>>>>> 1. Any pointer/reference/JIRA for understanding how the trie
>>>>>>> (dictionary) of a dimension's values is used in the next steps?
>>>>>>>
>>>>>>> 2. Any best practices/references for tuning the "fact distinct
>>>>>>> column" job for reducers that handle high-cardinality columns?
>>>>>>> For now I am increasing memory, since the partitioning and the
>>>>>>> number of reducers depend on the number of cuboids.
>>>>>>>
>>>>>>> P.S. I am using Kylin v2.4 with HBase 1.x.
>>>>>>>
>>>>>>> Thank You,
>>>>>>> Shrikant Bang
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For question 1), in the Cube's "advanced setting" step, you can
>>>>>>>> specify the cuboid whitelist to build.
>>>>>>>>
>>>>>>>> 2018-08-13 22:26 GMT+08:00 Billy Liu <[email protected]>:
>>>>>>>>
>>>>>>>>> Hello Shrikant,
>>>>>>>>>
>>>>>>>>> For 1, it seems the 4 dimensions form a hierarchy. You could
>>>>>>>>> define them as hierarchy dimensions in the Cube, and leave A as
>>>>>>>>> a mandatory dimension.
>>>>>>>>>
>>>>>>>>> For 2, select 'user_activity' as the partition column in the
>>>>>>>>> model design. There are a few built-in formats; most date types
>>>>>>>>> are supported.
>>>>>>>>>
>>>>>>>>> With Warm regards
>>>>>>>>>
>>>>>>>>> Billy Liu
>>>>>>>>> Shrikant Bang <[email protected]> wrote on Mon, Aug 13, 2018 at
>>>>>>>>> 5:39 PM:
>>>>>>>>> >
>>>>>>>>> > Hi Team,
>>>>>>>>> >
>>>>>>>>> > We are doing a PoC on building OLAP cubes. Could you please
>>>>>>>>> > help me get answers to the queries below?
>>>>>>>>> >
>>>>>>>>> > Selective Cuboids:
>>>>>>>>> > We need to have selective cuboids as part of OLAP cubes.
>>>>>>>>> > Let's say we have 4 dimensions: A, B, C, D; then we need just
>>>>>>>>> > (A,B,C,D), (A,B,C), (A,B) and (A).
>>>>>>>>> >
>>>>>>>>> > Refresh Settings:
>>>>>>>>> > How do we specify the partition column and format while
>>>>>>>>> > building a cube for a fact table?
>>>>>>>>> > e.g. user_activity is partitioned by date 'yyyy-MM-dd' and the
>>>>>>>>> > cube should be refreshed every day with the previous day's
>>>>>>>>> > computation.
>>>>>>>>> >
>>>>>>>>> > Thank You,
>>>>>>>>> > Shrikant Bang
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Shaofeng Shi 史少锋
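
For anyone reading this thread later, the derived-column rule of thumb quoted above (derive when the two columns are close to 1:1, and be wary once the cardinality ratio goes well beyond roughly 10:1) can be sketched as a quick check. This is only an illustration in Python and is not part of Kylin: the function names, the example cardinalities, and the hard 10:1 cut-off are all assumptions made for the sketch, and as ShaoFeng says the ratio is not fixed.

```python
def derivation_ratio(host_cardinality: int, derived_cardinality: int) -> float:
    """Return host-column cardinality divided by derived-column cardinality.

    Hypothetical helper: a large ratio means many host values collapse onto
    one derived value, so more post-aggregation is needed at query time.
    """
    if host_cardinality <= 0 or derived_cardinality <= 0:
        raise ValueError("cardinalities must be positive")
    return host_cardinality / derived_cardinality


def is_good_derivation(host_cardinality: int, derived_cardinality: int,
                       max_ratio: float = 10.0) -> bool:
    """Apply the 'around 10:1 or less' guideline (threshold is an assumption)."""
    return derivation_ratio(host_cardinality, derived_cardinality) <= max_ratio


if __name__ == "__main__":
    # Made-up cardinalities, chosen only to mirror the examples in the thread.
    print(is_good_derivation(3_000_000, 2_900_000))  # user_id -> email, near 1:1
    print(is_good_derivation(3_000_000, 2))          # user_id -> sex, very skewed
```

The threshold is left as a parameter because the thread describes it as a starting point to tune against the query performance you need, not a hard limit.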
