Re: optimal parameters

ShaoFeng Shi Mon, 05 Feb 2018 16:52:53 -0800

Hi Manoj,

In this case, splitting the dimensions into two cubes will not work: if a
user selects one dimension from cube1 and another from cube2, neither cube1
nor cube2 can answer the query.
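A minimal sketch of why the split fails. It assumes a simplified model: each cube is just the set of dimension names it was built with, and a query can only be routed to a single cube whose dimensions cover everything the query uses. Derived columns, lookup tables and measures are ignored. `answering_cubes` and the `d1` to `d40` names are hypothetical, for illustration only:

```python
def answering_cubes(cubes: dict[str, set[str]], query_dims: set[str]) -> list[str]:
    """Names of the cubes that contain every dimension the query uses.

    Simplified model: one query is answered by one cube, never by joining two.
    """
    return [name for name, dims in cubes.items() if query_dims <= dims]


# Hypothetical split of 40 dimensions into two 20-dimension cubes.
cubes = {
    "cube1": {f"d{i}" for i in range(1, 21)},   # d1 to d20
    "cube2": {f"d{i}" for i in range(21, 41)},  # d21 to d40
}

print(answering_cubes(cubes, {"d3", "d7"}))   # ['cube1']
print(answering_cubes(cubes, {"d3", "d25"}))  # [] : no single cube has both
```

The second query mixes one dimension from each cube, so the candidate list is empty, which is exactly the situation described above.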


Adding all of them to one cube is doable, but please note that the maximum
number of physical dimensions (excluding derived columns from lookup tables)
in one cube is 64, because the cuboid ID is a Long, which is 8 bytes (64
bits). Besides, please use mandatory, joint, and hierarchy dimensions to
control the number of combinations. If your dataset is not huge, you can even
set most of them as mandatory or joint to greatly reduce the pre-aggregation.
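To make the two limits concrete, here is a rough sketch. It assumes the aggregation-group rules as commonly described for Kylin: within one aggregation group, mandatory dimensions are always present (one choice), a hierarchy of k levels gives k + 1 choices, a joint group gives 2 choices (all or none), and every remaining normal dimension gives 2. The function names and the bit order of the cuboid ID are illustrative assumptions, not Kylin's own code, and the count ignores any special handling of the empty or base cuboid, so the number Kylin reports may differ slightly.

```python
from math import prod

# A cuboid ID is a Java long: 8 bytes = 64 bits, one bit per physical dimension.
MAX_PHYSICAL_DIMS = 64


def cuboid_id(rowkey_dims: list[str], selected: set[str]) -> int:
    """Encode a set of dimensions as a bitmask (illustrative order: first column = highest bit)."""
    if len(rowkey_dims) > MAX_PHYSICAL_DIMS:
        raise ValueError(f"{len(rowkey_dims)} dimensions do not fit in a 64-bit cuboid ID")
    n = len(rowkey_dims)
    return sum(1 << (n - 1 - i) for i, dim in enumerate(rowkey_dims) if dim in selected)


def combinations_in_group(normal: int, mandatory: int = 0,
                          hierarchies: tuple[int, ...] = (),
                          joints: tuple[int, ...] = ()) -> int:
    """Approximate number of cuboids produced by ONE aggregation group.

    hierarchies: number of levels in each hierarchy
    joints: number of dimensions in each joint group
    """
    total = normal + mandatory + sum(hierarchies) + sum(joints)
    if total > MAX_PHYSICAL_DIMS:
        raise ValueError(f"{total} physical dimensions exceed the limit of {MAX_PHYSICAL_DIMS}")
    # Mandatory dims contribute a factor of 1, so they do not appear in the product.
    return 2 ** normal * prod(levels + 1 for levels in hierarchies) * 2 ** len(joints)


print(cuboid_id(["a", "b", "c"], {"a", "c"}))  # 5, i.e. 0b101
print(combinations_in_group(normal=11))        # 2048
print(combinations_in_group(normal=4, mandatory=40, hierarchies=(10,), joints=(5, 5)))  # 704
```

The last call uses exactly 64 physical dimensions yet yields only 704 combinations, while passing 80 normal dimensions raises `ValueError`: the combination count can be tamed with mandatory/joint/hierarchy settings, but the 64-dimension ceiling cannot.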

2018-02-05 21:41 GMT+08:00 Kumar, Manoj H <[email protected]>:

> Any inputs on this? It's very important to have a large number of columns in
> a Tableau worksheet. Pls. advise how I can achieve it.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Monday, February 05, 2018 9:58 AM
> *To:* '[email protected]' <[email protected]>
> *Subject:* RE: optimal parameters
>
>
>
> Or is it possible to use mandatory dimensions instead of joint/hierarchical
> ones? In that case, the cube won't explode as much. Pls. advise.
>
>
>
> Can I set 60 dimensions as mandatory and 20 as hierarchy?
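> A rough count for this proposal (plain arithmetic under the usual
> aggregation-group rules, assuming all 80 columns sit in one aggregation
> group and the 20 hierarchy columns form a single chain):
>
> $$\underbrace{1}_{60\ \text{mandatory}} \times \underbrace{(20 + 1)}_{\text{one 20-level hierarchy}} = 21\ \text{cuboids}, \qquad 60 + 20 = 80 > 64$$
>
> The combination count is tiny, but 80 physical dimensions still exceed the
> 64-dimension limit mentioned in the reply at the top of this thread.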
>
>
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Saturday, February 03, 2018 10:40 AM
> *To:* '[email protected]' <[email protected]>
> *Subject:* RE: optimal parameters
>
>
>
> Thanks for your inputs. Is there any other way to get 80+ dimensions into
> one cube?
>
>
>
> Can we split the cube into cubes of 20 dimensions each?
>
> Cube1: 20 dimensions
>
> Cube2: 20 dimensions
>
> The query should take the data from both cubes (Cube1 + Cube2) so that
> Tableau will have 40 dimensions in one worksheet. Pls. advise.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* ShaoFeng Shi [mailto:[email protected]
> <[email protected]>]
> *Sent:* Friday, February 02, 2018 4:09 PM
> *To:* user <[email protected]>
> *Subject:* Re: optimal parameters
>
>
>
> Hi Manoj,
>
>
>
>
>
> 450 million records in one build is a common case for Kylin. But 80+
> dimensions is too many, as by default the cube will have 2^N dimension
> combinations (N is the number of dimensions). I assume you have already
> optimized the aggregation groups, as by default Kylin only allows 2048
> combinations in one cube.
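> For scale (plain arithmetic), with N = 80 or 90 as in the original
> question, compared with the default cap:
>
> $$2^{80} \approx 1.21 \times 10^{24}, \qquad 2^{90} \approx 1.24 \times 10^{27}, \qquad 2048 = 2^{11}$$
>
> In other words, the 2048 default leaves room for only about 11 dimensions
> that vary freely; the rest have to be constrained with mandatory,
> hierarchy or joint settings.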
>
>
>
> If the build is very slow, a possible reason is the cluster's capacity.
> Please try a smaller dataset with a simpler cube first, and then scale up
> based on the performance.
>
>
>
> 2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <[email protected]>:
>
> Any updates on this? How can we process 450 million records in one
> partition? The fact table has this much data for one COB.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Friday, February 02, 2018 11:45 AM
> *To:* '[email protected]' <[email protected]>
> *Subject:* optimal parameters
> *Importance:* High
>
>
>
> Hi folks, I need your inputs on optimizing the Kylin cube build process.
> We have approx. 450 million records in one partition and 80-90
> dimensions to be picked up from the tables. Can you pls. advise on this?
> What would be the optimal way of running the jobs? We have a Cloudera
> cluster of 16 nodes, each with 8 cores.
>
>
>
> This process has been running for 60 minutes.
>
>
>
> 2018-02-01 23:54:16,257 INFO  [pool-9-thread-1]
> threadpool.DefaultScheduler:116 :
> CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd,
> name=BUILD CUBE - Deposits - 20170929000000_20170930000000 -
> GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
>
> 2018-02-01 23:54:16,258 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD
> CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00
> 2018-02-02 12:37:11)
>
> 2018-02-01 23:54:16,263 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
> from READY to RUNNING
>
> 2018-02-01 23:54:16,271 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract
> Fact Table Distinct Columns)
>
> 2018-02-01 23:54:16,275 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> execution.ExecutableManager:425 : job
> id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
>
> 2018-02-01 23:54:16,358 INFO  [pool-9-thread-1]
> threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual
> running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0
> discarded, 0 others
>
> 2018-02-01 23:54:16,371 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:
> -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml
> -cubename Deposits
> -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns
> -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true
> -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics
> -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step
> -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
>
> 2018-02-01 23:54:16,424 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
>
> 2018-02-01 23:54:16,775 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:386 : Trying to connect to metastore with URI
> thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
>
> 2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:431 : Opened a connection to metastore, current connections: 3
>
> 2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hive.metastore:483 : Connected to metastore.
>
> 2018-02-01 23:54:17,345 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.KylinConfigBase:162 : Kylin Config was updated with
> kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
>
> 2018-02-01 23:54:17,347 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> persistence.ResourceStore:79 : Using metadata url
> /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> for resource store
>
> 2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.AbstractHadoopJob:547 : Dump resources to
> /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
> took 9 ms
>
> 2018-02-01 23:54:17,354 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> common.AbstractHadoopJob:505 : HDFS meta dir is:
> file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
>
> 2018-02-01 23:54:17,470 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN
> [email protected], renewer=yarn,
> realUser=, issueDate=1517547257468, maxDate=1518152057468,
> sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
>
> 2018-02-01 23:54:17,471 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind:
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for
> a_rcmo_nd: HDFS_DELEGATION_TOKEN [email protected],
> renewer=yarn, realUser=, issueDate=1517547257468,
> maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
>
> 2018-02-01 23:54:17,478 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
>
> 2018-02-01 23:54:18,864 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapred.FileInputFormat:249 : Total input paths to process : 482
>
> 2018-02-01 23:54:19,518 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:202 : number of splits:482
>
> 2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:291 : Submitting tokens for job:
> job_1516848187601_12793
>
> 2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service:
> ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN
> [email protected], renewer=yarn, realUser=,
> issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925,
> masterKeyId=921)
>
> 2018-02-01 23:54:19,821 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
>
> 2018-02-01 23:54:19,825 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761]
> mapreduce.Job:1311 : The url to track the job:
> http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio
>
> Also, pls. advise on the Spark parameters as well.
>
>
>
> kylin.engine.mr.reduce-input-mb=400
> #kylin.engine.mr.max-reducer-number=300
> kylin.engine.mr.mapper-input-rows=500000
> #kylin.engine.mr.build-dict-in-reducer=true
> kylin.engine.mr.uhc-reducer-count=2
>
> #### CUBE | DICTIONARY ###
> kylin.cube.algorithm=inmem
> ## A smaller threshold prefers layer, a larger threshold prefers in-mem
> #kylin.cube.algorithm.layer-or-inmem-threshold=7
> kylin.cube.aggrgroup.max-combination=61440
> kylin.snapshot.max-mb=1500
>
> kylin.engine.spark.rdd-partition-cut-mb=800
> kylin.engine.spark.min-partition=1
> ## Max partition numbers of rdd
> kylin.engine.spark.max-partition=500
> kylin.engine.spark-conf.spark.yarn.queue=XXXX
> kylin.engine.spark-conf.spark.executor.memory=8G
> kylin.engine.spark-conf.spark.executor.cores=6
> kylin.engine.spark-conf.spark.executor.instances=10
> kylin.engine.spark-conf.spark.eventLog.enabled=true
> kylin.engine.spark-conf.spark.eventLog.dir=XXXX
> kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
> kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
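>
> For a sense of scale (plain arithmetic; YARN also adds per-executor memory
> overhead and a driver container, which are not counted here), the Spark
> executor settings above request the following from the 16-node,
> 8-core-per-node cluster described at the start of this message:
>
> $$10 \times 6 = 60\ \text{executor cores out of}\ 16 \times 8 = 128, \qquad 10 \times 8\,\text{GB} = 80\,\text{GB of executor heap}$$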
>
>
>
> Regards,
>
> Manoj
>
>
>
> This message is confidential and subject to terms at: http://
> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
> privilege, viruses and monitoring of electronic messages. If you are not
> the intended recipient, please delete this message and notify the sender
> immediately. Any unauthorized use is strictly prohibited.
>
>
>
>
>
> --
>
> Best regards,
>
>
>
> Shaofeng Shi 史少锋
>
>



-- 
Best regards,

Shaofeng Shi 史少锋
