Thank you, ShaoFeng and Billy, for your responses.

I was able to set up the hierarchies in the dimensions.

While building the cube, the "fact distinct column" step fails in a
reducer with an OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


I tried debugging and found that the dictionary is built in the
reducer's cleanup method.

I am curious to learn about the internals. Could you please help me with
the following:

  1.  Is there any pointer, reference, or JIRA that explains how the trie
dictionary of a dimension's values is used in the subsequent steps?

  2.  Are there any best practices or references for tuning the "fact distinct
column" job for reducers that handle high-cardinality columns? For now I am
trying to increase memory, since the partitioning and the number of reducers
depend on the number of cuboids.
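
For concreteness, the kind of memory increase I mean can be sketched as
overrides in kylin.properties (or at the cube level). This is only a sketch:
it assumes the "kylin.engine.mr.config-override." prefix is honored by this
Kylin version, and the values are illustrative placeholders, not measured
settings from my cluster.

```properties
# Assumption: Hadoop job properties can be passed through with this prefix.
# Values below are illustrative only.

# YARN container size for each reducer, in MB
kylin.engine.mr.config-override.mapreduce.reduce.memory.mb=4096

# JVM heap for the reducer; kept below the container size to leave headroom
kylin.engine.mr.config-override.mapreduce.reduce.java.opts=-Xmx3500m
```

The heap is kept somewhat smaller than the container so that non-heap JVM
memory does not push the task over the YARN limit.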


P.S. I am using Kylin v2.4 with HBase 1.x.

Thank You,
Shrikant Bang

On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi <[email protected]> wrote:

> For question 1), in Cube's "advanced setting" step, you can specify the
> cuboid whitelist to build.
>
> 2018-08-13 22:26 GMT+08:00 Billy Liu <[email protected]>:
>
>> Hello Shrikant,
>>
>> For 1, it seems the 4 dimensions form a hierarchy. You could
>> define them as hierarchy dimensions in the Cube, and leave A as a
>> mandatory dimension.
>>
>> For 2, select 'user_activity' as partition column in model design.
>> There are a few built-in formats, most date types are supported.
>>
>> With Warm regards
>>
>> Billy Liu
>> On Mon, Aug 13, 2018 at 5:39 PM Shrikant Bang <[email protected]> wrote:
>> >
>> > Hi Team,
>> >
>> >      We are doing a PoC on building OLAP cubes. Could you please help
>> > me get answers to the queries below?
>> >
>> > Selective Cuboids:
>> > We need to have selective cuboids as part of OLAP cubes.
>> > Let's say we have 4 dimensions: A, B, C, D; then we need just
>> > (A,B,C,D), (A,B,C), (A,B) and (A).
>> >
>> > Refresh Settings:
>> > How do we specify the partition column and format for the fact table
>> > when building the cube?
>> > e.g. user_activity is partitioned by date 'yyyy-MM-dd' and the cube should
>> > be refreshed every day with the previous day's computation.
>> >
>> >
>> > Thank You,
>> > Shrikant Bang
>> >
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
