Or is it possible to use mandatory dimensions instead of joint or hierarchy ones? 
In that case, the cube won't be exploded as much. Please advise.

Can I set 60 dimensions as mandatory and 20 as hierarchy?

Regards,
Manoj

From: Kumar, Manoj H
Sent: Saturday, February 03, 2018 10:40 AM
To: 'user@kylin.apache.org' <user@kylin.apache.org>
Subject: RE: optimal parameters

Thanks for your inputs. Is there any other way to get 80+ dimensions into one 
cube?

Can we split the cube into smaller cubes of 20 dimensions each?

Cube 1: 20 dimensions
Cube 2: 20 dimensions

A query should then take data from both cubes (Cube 1 + Cube 2), so that Tableau 
will have 40 dimensions in one worksheet. Please advise.

Regards,
Manoj

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: Friday, February 02, 2018 4:09 PM
To: user <user@kylin.apache.org>
Subject: Re: optimal parameters

Hi Manoj,


450 million records in one build is a common case for Kylin. But 80+ dimensions 
is too many, as by default the cube will have 2^N dimension combinations (N is 
the number of dimensions). I think you have optimized the aggregation groups, as 
by default Kylin only allows 2048 combinations in one cube.
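
To make the 2^N point concrete, here is a rough sketch of how aggregation-group rules shrink the number of dimension combinations. It is a simplified model, not Kylin's own planner: it assumes a single aggregation group, that mandatory dimensions add no extra combinations, that a hierarchy of k levels contributes k + 1 combinations, and that a joint group contributes 2. The function name and its parameters are hypothetical, chosen only for illustration.

```python
def cuboid_count(total_dims, mandatory=0, hierarchies=(), joints=()):
    """Approximate number of dimension combinations in one aggregation group.

    Simplified model (an assumption, not Kylin's exact planner):
      - a normal dimension is either present or absent: factor 2
      - a mandatory dimension is always present: factor 1
      - a hierarchy with k levels allows k + 1 prefixes: factor k + 1
      - a joint group appears all together or not at all: factor 2
    """
    grouped = mandatory + sum(hierarchies) + sum(joints)
    if grouped > total_dims:
        raise ValueError("rules cover more dimensions than the cube has")

    normal = total_dims - grouped
    count = 2 ** normal
    for levels in hierarchies:
        count *= levels + 1
    for _ in joints:
        count *= 2
    return count


# 80 plain dimensions versus 60 mandatory plus four 5-level hierarchies.
print(cuboid_count(80))
print(cuboid_count(80, mandatory=60, hierarchies=(5, 5, 5, 5)))
```

Under this model the second layout stays well below the configured combination limit, but whether it suits the queries depends on which dimensions really are always filtered or grouped together.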

If the build is very slow, a possible cause is the cluster's capacity. Please 
try a smaller data set with a simpler cube first, and then scale up based on 
the performance.

2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>:
Any updates on this? How can we process 450 million records in one partition? 
The fact table has this much data for one COB.

Regards,
Manoj

From: Kumar, Manoj H
Sent: Friday, February 02, 2018 11:45 AM
To: 'user@kylin.apache.org' <user@kylin.apache.org>
Subject: optimal parameters
Importance: High

Hi Folks, I need your inputs on optimizing the Kylin cube build process. We 
have approx. 450 million records in one partition and 80-90 dimensions to be 
picked up from the tables. Can you please advise on this? What would be the 
optimal way of running the jobs? We have a Cloudera cluster of 16 nodes, with 
8 cores on each node.

This process has been running for 60 minutes.

2018-02-01 23:54:16,257 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
2018-02-01 23:54:16,258 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11)
2018-02-01 23:54:16,263 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
2018-02-01 23:54:16,271 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
2018-02-01 23:54:16,275 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
2018-02-01 23:54:16,358 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
2018-02-01 23:54:16,371 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:  -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
2018-02-01 23:54:16,424 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
2018-02-01 23:54:16,775 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
2018-02-01 23:54:17,345 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,347 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
2018-02-01 23:54:17,354 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,470 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
2018-02-01 23:54:17,471 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:17,478 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
2018-02-01 23:54:18,864 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapred.FileInputFormat:249 : Total input paths to process : 482
2018-02-01 23:54:19,518 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:202 : number of splits:482
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:291 : Submitting tokens for job: job_1516848187601_12793
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:19,821 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
2018-02-01 23:54:19,825 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.Job:1311 : The url to track the job: http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio




Please also advise on the Spark parameters.

kylin.engine.mr.reduce-input-mb=400
#kylin.engine.mr.max-reducer-number=300
kylin.engine.mr.mapper-input-rows=500000
#kylin.engine.mr.build-dict-in-reducer=true
kylin.engine.mr.uhc-reducer-count=2
#### CUBE | DICTIONARY ###
kylin.cube.algorithm=inmem
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
kylin.cube.aggrgroup.max-combination=61440
kylin.snapshot.max-mb=1500
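
As a rough sanity check on kylin.engine.mr.mapper-input-rows, the sketch below estimates how many mappers a purely row-based split would ask for with 450 million rows. It assumes the mapper count is simply the row count divided by mapper-input-rows, rounded up, which ignores file layout and any other limit the engine applies; the function name is hypothetical.

```python
def estimated_mappers(total_rows, mapper_input_rows):
    """Row-based mapper estimate: ceil(total_rows / mapper_input_rows).

    Assumption: rows are spread evenly and nothing else caps the split count.
    """
    if mapper_input_rows <= 0:
        raise ValueError("mapper_input_rows must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_rows + mapper_input_rows - 1) // mapper_input_rows


# 450 million rows from the partition, 500000 rows per mapper from the config.
print(estimated_mappers(450_000_000, 500_000))  # 900
```

On a 16-node cluster with 8 cores per node, that many mappers would run in several waves, so this value is worth revisiting together with the cluster capacity.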



kylin.engine.spark.rdd-partition-cut-mb=800
kylin.engine.spark.min-partition=1
## Max partition numbers of rdd
kylin.engine.spark.max-partition=500
kylin.engine.spark-conf.spark.yarn.queue=XXXX
kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.executor.cores=6
kylin.engine.spark-conf.spark.executor.instances=10
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=XXXX
kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
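
For context, the executor settings above can be compared with the cluster size given earlier (16 nodes with 8 cores each). The sketch below only adds up what the executors request; it ignores the Spark driver, YARN memory overhead, and any queue limits, and the function name is hypothetical.

```python
def spark_executor_request(instances, cores_per_executor, memory_gb_per_executor):
    """Total cores and memory requested by the executors alone."""
    return {
        "cores": instances * cores_per_executor,
        "memory_gb": instances * memory_gb_per_executor,
    }


cluster_cores = 16 * 8  # nodes * cores per node, as stated in the post
request = spark_executor_request(instances=10, cores_per_executor=6,
                                 memory_gb_per_executor=8)

print(request)                           # {'cores': 60, 'memory_gb': 80}
print(request["cores"] / cluster_cores)  # 0.46875
```

So these settings ask for a bit under half of the cluster's cores, assuming the YARN queue lets the job have them.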

Regards,
Manoj


This message is confidential and subject to terms at: 
http://www.jpmorgan.com/emaildisclaimer
 including on confidentiality, legal privilege, viruses and monitoring of 
electronic messages. If you are not the intended recipient, please delete this 
message and notify the sender immediately. Any unauthorized use is strictly 
prohibited.



--
Best regards,

Shaofeng Shi 史少锋

