Hi Manoj,

450 million records in one build is a common case for Kylin. But 80+ dimensions
is too many, because by default the cube will have 2^N dimension combinations
(N is the number of dimensions). I assume you have already optimized the
aggregation groups, since by default Kylin only allows 2048 combinations in one
cube.
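
To make the 2^N point concrete, here is a rough Python sketch. The first function is just the 2^N count. The second is a simplified estimate of how one aggregation group prunes combinations, assuming hierarchy and joint dimensions behave the way I describe in the comments; the function names are made up for illustration and are not Kylin APIs, and the real planner may count differently. The two limits are the 2048 mentioned above and the 61440 from your posted configuration.

```python
from math import prod


def full_cube_combinations(n_dims):
    """Combinations with no pruning: every subset of N dimensions."""
    return 2 ** n_dims


def agg_group_combinations(normal, hierarchies=(), joints=0):
    """Simplified estimate for ONE aggregation group (an assumption, not Kylin code).

    normal      -- dimensions that may appear or not, independently (factor 2 each)
    hierarchies -- level count of each hierarchy; k levels give k + 1 choices
    joints      -- joint groups, each treated as a single on/off unit
    Mandatory dimensions are always present, so they add no factor and are omitted.
    """
    return 2 ** normal * prod(levels + 1 for levels in hierarchies) * 2 ** joints


LIMIT_QUOTED_DEFAULT = 2048     # figure quoted in this reply
LIMIT_IN_POSTED_CONFIG = 61440  # kylin.cube.aggrgroup.max-combination in the question

print("80 unpruned dimensions:", full_cube_combinations(80))

# Hypothetical group: 6 normal dimensions, two hierarchies (3 and 4 levels), 2 joints.
example = agg_group_combinations(normal=6, hierarchies=(3, 4), joints=2)
print("example group:", example)
print("within quoted default:", example <= LIMIT_QUOTED_DEFAULT)
print("within posted config:", example <= LIMIT_IN_POSTED_CONFIG)
```

The takeaway is that moving dimensions into hierarchies, joints, or the mandatory set shrinks the exponent, which is the only practical way to handle 80+ dimensions.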

If the build is still very slow, a possible reason is the cluster's capacity.
Please try a smaller data set with a simpler cube first, and then scale up
based on the performance you observe.
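
As a back-of-envelope capacity check, the sketch below compares the cores your Spark settings request with what the cluster has in total. All numbers are taken from your message (16 nodes with 8 cores each, 10 executors with 6 cores each); it ignores YARN and OS overhead and other tenants on the queue, so treat it as a rough indication only.

```python
# Figures quoted in the original question; overhead is ignored for simplicity.
NODES = 16
CORES_PER_NODE = 8
EXECUTOR_INSTANCES = 10  # kylin.engine.spark-conf.spark.executor.instances
EXECUTOR_CORES = 6       # kylin.engine.spark-conf.spark.executor.cores


def core_usage(nodes, cores_per_node, executors, cores_per_executor):
    """Return (total cores, cores requested by Spark, requested share of total)."""
    total = nodes * cores_per_node
    requested = executors * cores_per_executor
    return total, requested, requested / total


total, requested, share = core_usage(
    NODES, CORES_PER_NODE, EXECUTOR_INSTANCES, EXECUTOR_CORES
)
print(f"cluster cores: {total}, requested by Spark: {requested} ({share:.0%})")
```

If the requested share is well below what the queue can give you, that is one knob to revisit once the smaller test cube builds in reasonable time.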

2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>:

> Any updates on this? How should we process 450 million records in one
> partition? The fact table has this much data for one COB.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Friday, February 02, 2018 11:45 AM
> *To:* 'user@kylin.apache.org' <user@kylin.apache.org>
> *Subject:* optimal parameters
> *Importance:* High
>
>
>
> Hi Folks, we need your input on optimizing the Kylin cube build process.
> We have approx. 450 million records in one partition and 80-90
> dimensions to be picked up from the tables. Can you please advise on this?
> What would be the optimal way of running the jobs? We have a Cloudera
> cluster of 16 nodes, with 8 cores on each node.
>
>
>
> This process has been running for 60 minutes.
>
>
>
> 2018-02-01 23:54:16,257 INFO  [pool-9-thread-1]
> threadpool.DefaultScheduler:116 :
> CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd,
> name=BUILD CUBE - Deposits - 20170929000000_20170930000000 -
> GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
>
> 79923 2018-02-01 23:54:16,258 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD
> CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00
> 2018-02-02 12:37:11)
>
> 79924 2018-02-01 23:54:16,263 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
> from READY to RUNNING
>
> 79925 2018-02-01 23:54:16,271 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract
> Fact Table Distinct Columns)
>
> 79926 2018-02-01 23:54:16,275 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.ExecutableManager:425 : job 
> id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02
> from READY to RUNNING
>
> 79927 2018-02-01 23:54:16,358 INFO  [pool-9-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual
> running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0
> discarded, 0 others
>
> 79928 2018-02-01 23:54:16,371 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:
> -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml
> -cubename Deposits
> -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns
> -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true
> -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics
> -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step
> -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
>
> 79929 2018-02-01 23:54:16,424 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> steps.FactDistinctColumnsJob:106 : Starting:
> Kylin_Fact_Distinct_Columns_Deposits_Step
>
> 79930 2018-02-01 23:54:16,775 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:386 : Trying to connect to metastore with URI
> thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
>
> 79931 2018-02-01 23:54:16,784 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:431 : Opened a connection to metastore, current connections:
> 3
>
> 79932 2018-02-01 23:54:16,784 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:483 : Connected to metastore.
>
> 79933 2018-02-01 23:54:17,345 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.KylinConfigBase:162 : Kylin Config was updated with
> kylin.metadata.url :
> /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
>
> 79934 2018-02-01 23:54:17,347 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> persistence.ResourceStore:79 : Using metadata url
> /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> for resource store
>
> 79935 2018-02-01 23:54:17,354 DEBUG [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.AbstractHadoopJob:547 : Dump resources to
> /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> took 9 ms
>
> 79936 2018-02-01 23:54:17,354 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.AbstractHadoopJob:505 : HDFS meta dir is:
> file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
>
> 79937 2018-02-01 23:54:17,470 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN
> owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn,
> realUser=, issueDate=1517547257468, maxDate=1518152057468,
> sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
>
> 79938 2018-02-01 23:54:17,471 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind:
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for
> a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM,
> renewer=yarn, realUser=, issueDate=1517547257468,
> maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
>
> 79939 2018-02-01 23:54:17,478 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
>
> 79940 2018-02-01 23:54:18,864 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapred.FileInputFormat:249 : Total input paths to process : 482
>
> 79941 2018-02-01 23:54:19,518 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:202 : number of splits:482
>
> 79942 2018-02-01 23:54:19,566 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:291 : Submitting tokens for job:
> job_1516848187601_12793
>
> 79943 2018-02-01 23:54:19,566 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service:
> ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN
> owner=a_rcmo...@naeast.ad.jpmorganchase.com, renewer=yarn, realUser=,
> issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925,
> masterKeyId=921)
>
> 79944 2018-02-01 23:54:19,821 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> impl.YarnClientImpl:260 : Submitted application
> application_1516848187601_12793
>
> 79945 2018-02-01 23:54:19,825 INFO  [Job 
> 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.Job:1311 : The url to track the job:
> http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio
>
>
> Please also advise on the Spark parameters.
>
>
>
> kylin.engine.mr.reduce-input-mb=400
> #kylin.engine.mr.max-reducer-number=300
> kylin.engine.mr.mapper-input-rows=500000
> #kylin.engine.mr.build-dict-in-reducer=true
> kylin.engine.mr.uhc-reducer-count=2
>
> #### CUBE | DICTIONARY ###
> kylin.cube.algorithm=inmem
> ## A smaller threshold prefers layer, a larger threshold prefers in-mem
> #kylin.cube.algorithm.layer-or-inmem-threshold=7
> kylin.cube.aggrgroup.max-combination=61440
> kylin.snapshot.max-mb=1500
>
> kylin.engine.spark.rdd-partition-cut-mb=800
> kylin.engine.spark.min-partition=1
> ## Max partition numbers of rdd
> kylin.engine.spark.max-partition=500
> kylin.engine.spark-conf.spark.yarn.queue=XXXX
> kylin.engine.spark-conf.spark.executor.memory=8G
> kylin.engine.spark-conf.spark.executor.cores=6
> kylin.engine.spark-conf.spark.executor.instances=10
> kylin.engine.spark-conf.spark.eventLog.enabled=true
> kylin.engine.spark-conf.spark.eventLog.dir=XXXX
> kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
> kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
>
>
>
> Regards,
>
> Manoj
>
>
>



-- 
Best regards,

Shaofeng Shi 史少锋
