Hi Manoj,

In this case, splitting the dimensions into two cubes might not work: if a user selects one dimension from cube1 and another from cube2, neither cube1 nor cube2 can answer the query.
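To illustrate the point, here is a minimal sketch of the idea. It is a simplified model, not Kylin's actual query routing code: it only assumes that a cube can serve a query when it contains every dimension the query uses. The cube names, dimension names, and the function `cubes_that_can_answer` are made up for the example.

```python
def cubes_that_can_answer(query_dims, cubes):
    """Return the names of cubes whose dimension set covers every queried dimension."""
    wanted = set(query_dims)
    return [name for name, dims in cubes.items() if wanted <= set(dims)]


# Hypothetical layout: 40 dimensions split evenly across two cubes.
cubes = {
    "cube1": {f"d{i}" for i in range(1, 21)},   # d1 .. d20
    "cube2": {f"d{i}" for i in range(21, 41)},  # d21 .. d40
}

# Both dimensions live in cube1, so cube1 can serve the query.
print(cubes_that_can_answer(["d1", "d5"], cubes))

# One dimension from each cube: no single cube covers both.
print(cubes_that_can_answer(["d1", "d25"], cubes))
```

In this model the second query gets an empty list, which is the situation a Tableau worksheet would hit when it mixes columns from both cubes.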
Adding all of them to one cube is doable, but please note that the maximum number of physical dimensions in one cube (excluding derived columns in lookup tables) is 64, because the cuboid ID is a Long, which is 8 bytes (64 bits). Besides, please use mandatory, joint, and hierarchy dimensions to control the number of combinations. If your dataset is not huge, you can even set most of them as mandatory or joint to greatly reduce the pre-aggregation.

2018-02-05 21:41 GMT+08:00 Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>:

> Any inputs on this…. It's very important to have large no of columns in
> Tableau worksheet. Pls. advise how can I achieve it?
>
> Regards,
> Manoj
>
> *From:* Kumar, Manoj H
> *Sent:* Monday, February 05, 2018 9:58 AM
> *To:* 'user@kylin.apache.org' <user@kylin.apache.org>
> *Subject:* RE: optimal parameters
>
> Or is it possible to use mandatory dimensions instead of joint/hierarchical
> ones? In that case, the Cube won't be exploded as such. Pls. advise.
>
> Can I put mandatory – 60
> Hierarchy – 20
>
> Regards,
> Manoj
>
> *From:* Kumar, Manoj H
> *Sent:* Saturday, February 03, 2018 10:40 AM
> *To:* 'user@kylin.apache.org' <user@kylin.apache.org>
> *Subject:* RE: optimal parameters
>
> Thanks for your inputs.. Is there any other way to get 80+ dimensions into
> one Cube?
>
> Can we split the cube – 20 Dimension
>
> Cube 1 – 20 dimensions
> Cube2 – 20 dimensions
>
> Query should take the data from both cube – Cube1+cube2 – so that Tableau
> will have 40 dimensions into one worksheet. Pls. advise.
>
> Regards,
> Manoj
>
> *From:* ShaoFeng Shi [mailto:shaofeng...@apache.org <shaofeng...@apache.org>]
> *Sent:* Friday, February 02, 2018 4:09 PM
> *To:* user <user@kylin.apache.org>
> *Subject:* Re: optimal parameters
>
> Hi Manoj,
>
> 450 millions in one build is a common case for Kylin. But 80+ dimensions
> is too many, as by default the cube will have 2^N dimension combinations (N
> is dimension number).
> I think you have optimized the aggregation group, as
> by default Kylin only allows 2048 combinations in one Cube.
>
> If you see the build is very slow, a possible reason is the cluster's
> capacity. Please try a smaller data set with a simpler Cube first, and then
> increase that based on the performance.
>
> 2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>:
>
> > Any updates on this?? How to process 450 millions of records in one
> > partition – fact table has this much data for one COB.
> >
> > Regards,
> > Manoj
> >
> > *From:* Kumar, Manoj H
> > *Sent:* Friday, February 02, 2018 11:45 AM
> > *To:* 'user@kylin.apache.org' <user@kylin.apache.org>
> > *Subject:* optimal parameters
> > *Importance:* High
> >
> > Hi Folks – Need your inputs for optimizing the kylin Cube build process –
> > We have approx. 450 millions of records in one Partition & 80-90
> > Dimensions to be picked up from the tables. Can you pls. advise on this?
> > What would be optimal way of running the jobs. We have Cloudera cluster of
> > 16 nodes – with 8 cores machine for each node.
> >
> > This process is running since 60 minutes.
> >
> > 2018-02-01 23:54:16,257 INFO [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
> > 2018-02-01 23:54:16,258 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11)
> > 2018-02-01 23:54:16,263 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
> > 2018-02-01 23:54:16,271 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
> > 2018-02-01 23:54:16,275 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
> > 2018-02-01 23:54:16,358 INFO [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
> > 2018-02-01 23:54:16,371 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable: -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
> > 2018-02-01 23:54:16,424 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
> > 2018-02-01 23:54:16,775 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
> > 2018-02-01 23:54:16,784 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
> > 2018-02-01 23:54:16,784 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
> > 2018-02-01 23:54:17,345 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> > 2018-02-01 23:54:17,347 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
> > 2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
> > 2018-02-01 23:54:17,354 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> > 2018-02-01 23:54:17,470 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
> > 2018-02-01 23:54:17,471 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
> > 2018-02-01 23:54:17,478 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
> > 2018-02-01 23:54:18,864 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapred.FileInputFormat:249 : Total input paths to process : 482
> > 2018-02-01 23:54:19,518 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:202 : number of splits:482
> > 2018-02-01 23:54:19,566 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:291 : Submitting tokens for job: job_1516848187601_12793
> > 2018-02-01 23:54:19,566 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
> > 2018-02-01 23:54:19,821 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
> > 2018-02-01 23:54:19,825 INFO [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.Job:1311 : The url to track the job: http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio
> >
> > Also pls. advise on Spark parameter as well.
> >
> > kylin.engine.mr.reduce-input-mb=400
> > #kylin.engine.mr.max-reducer-number=300
> > kylin.engine.mr.mapper-input-rows=500000
> > #kylin.engine.mr.build-dict-in-reducer=true
> > kylin.engine.mr.uhc-reducer-count=2
> > #### CUBE | DICTIONARY ###
> > kylin.cube.algorithm=inmem
> > ## A smaller threshold prefers layer, a larger threshold prefers in-mem
> > #kylin.cube.algorithm.layer-or-inmem-threshold=7
> > kylin.cube.aggrgroup.max-combination=61440
> > kylin.snapshot.max-mb=1500
> >
> > kylin.engine.spark.rdd-partition-cut-mb=800
> > kylin.engine.spark.min-partition=1
> > ## Max partition numbers of rdd
> > kylin.engine.spark.max-partition=500
> > kylin.engine.spark-conf.spark.yarn.queue=XXXX
> > kylin.engine.spark-conf.spark.executor.memory=8G
> > kylin.engine.spark-conf.spark.executor.cores=6
> > kylin.engine.spark-conf.spark.executor.instances=10
> > kylin.engine.spark-conf.spark.eventLog.enabled=true
> > kylin.engine.spark-conf.spark.eventLog.dir=XXXX
> > kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
> > kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
> >
> > Regards,
> > Manoj
> >
> > This message is confidential and subject to terms at:
> > http://www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
> > privilege, viruses and monitoring of electronic messages. If you are not
> > the intended recipient, please delete this message and notify the sender
> > immediately. Any unauthorized use is strictly prohibited.
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
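
For reference, the effect of mandatory, joint, and hierarchy dimensions on the combination count can be estimated with a rough sketch like the one below. It is only an approximation for a single aggregation group, under these assumptions: a mandatory dimension is always present (factor 1), a hierarchy of k levels allows k+1 choices, a joint group is either all in or all out (factor 2), and every remaining dimension doubles the count. The function name `estimate_combinations` and the example numbers are hypothetical; the number Kylin actually reports may differ.

```python
MAX_DIMS = 64  # the cuboid ID is a Long (8 bytes), so at most 64 physical dimensions


def estimate_combinations(total_dims, mandatory=0, hierarchies=(), joints=()):
    """Approximate the dimension combinations in ONE aggregation group.

    hierarchies: number of levels in each hierarchy, e.g. (4, 4)
    joints:      number of dimensions in each joint group, e.g. (8,)
    """
    if total_dims > MAX_DIMS:
        raise ValueError(f"{total_dims} dimensions exceed the limit of {MAX_DIMS}")
    constrained = mandatory + sum(hierarchies) + sum(joints)
    if constrained > total_dims:
        raise ValueError("mandatory/hierarchy/joint dimensions exceed total_dims")

    free = total_dims - constrained  # dimensions with no rule applied
    combos = 2 ** free               # each free dimension is either in or out
    for levels in hierarchies:
        combos *= levels + 1         # none, level 1, levels 1-2, ...
    combos *= 2 ** len(joints)       # each joint group is all-in or all-out
    return combos


# 20 unconstrained dimensions already give 2^20 combinations.
print(estimate_combinations(20))

# Hypothetical 64-dimension group: 40 mandatory, two 4-level hierarchies,
# one joint group of 8, and 8 free dimensions.
print(estimate_combinations(64, mandatory=40, hierarchies=(4, 4), joints=(8,)))
```

Under these assumptions the second layout comes to 12800 combinations, which stays below the kylin.cube.aggrgroup.max-combination=61440 setting quoted in the thread, while 80+ dimensions cannot fit in one cube at all.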