RE: optimal parameters

2018-02-05 Thread Kumar, Manoj H
Thanks, ShaoFeng. So the maximum number of physical dimensions in one cube is
64 (using mandatory, hierarchy, and joint dimensions).

Regards,
Manoj


Re: optimal parameters

2018-02-05 Thread ShaoFeng Shi
Hi Manoj,

In this case, splitting the dimensions into two cubes might not work: if a
user selects one dimension from cube1 and another from cube2, neither cube1
nor cube2 can answer the query.
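
A minimal sketch of that point, assuming a cube can serve a query only when every dimension the query references exists in that single cube. The cube names, dimension names (d1, d2, ...) and the helper function are hypothetical, and this simplifies Kylin's actual query routing.

```python
def cubes_that_can_answer(query_dims, cubes):
    """Return the names of cubes whose dimensions cover all query dimensions."""
    needed = set(query_dims)
    return [name for name, dims in cubes.items() if needed <= set(dims)]

# Hypothetical split: 40 dimensions divided evenly across two cubes.
cubes = {
    "cube1": [f"d{i}" for i in range(1, 21)],
    "cube2": [f"d{i}" for i in range(21, 41)],
}

print(cubes_that_can_answer(["d1", "d2"], cubes))   # ['cube1']
print(cubes_that_can_answer(["d1", "d25"], cubes))  # [] (spans both cubes)
```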

Adding all of them to one cube is doable, but please note that the maximum
number of physical dimensions in one cube (excluding derived columns in
lookup tables) is 64, because the cuboid ID is a Long, which is 8 bytes
(64 bits). Besides, please use mandatory, joint, and hierarchy dimensions to
control the number of combinations. If your dataset is not huge, you can even
set most of them as mandatory or joint to greatly reduce the pre-aggregation.
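
A rough sketch of both points, under stated assumptions: each physical dimension takes one bit of a 64-bit cuboid ID, and combinations are counted with a simplified model (a mandatory dimension is always present and contributes a factor of 1, a hierarchy of k levels contributes k + 1, a joint group contributes 2, every other dimension contributes 2). The function names are hypothetical and Kylin's exact cuboid count may differ.

```python
MAX_DIMS = 64  # one bit per physical dimension in an 8-byte cuboid ID

def cuboid_id(selected, all_dims):
    """Encode the selected dimensions as a bitmask, one bit per dimension."""
    if len(all_dims) > MAX_DIMS:
        raise ValueError(f"{len(all_dims)} dimensions do not fit in {MAX_DIMS} bits")
    mask = 0
    for pos, dim in enumerate(all_dims):
        if dim in selected:
            mask |= 1 << pos
    return mask

def estimated_combinations(normal, mandatory=0, hierarchies=(), joints=()):
    """Simplified combination count for a single aggregation group.

    hierarchies: number of levels in each hierarchy
    joints: number of dimensions in each joint group
    """
    total_dims = normal + mandatory + sum(hierarchies) + sum(joints)
    if total_dims > MAX_DIMS:
        raise ValueError(f"{total_dims} dimensions exceed the {MAX_DIMS} limit")
    count = 2 ** normal            # each normal dimension is present or absent
    for levels in hierarchies:     # k levels: k prefixes plus "none selected"
        count *= levels + 1
    count *= 2 ** len(joints)      # each joint group is all-in or all-out
    return count                   # mandatory dimensions add a factor of 1

print(cuboid_id({"a", "c"}, ["a", "b", "c"]))                        # 5 (binary 101)
print(estimated_combinations(normal=10))                             # 1024
print(estimated_combinations(0, mandatory=40, hierarchies=(5, 5, 5, 5)))  # 1296
```

In this model, 60 dimensions declared as 40 mandatory plus four 5-level hierarchies yield 6^4 = 1296 combinations, whereas 60 unconstrained dimensions would yield 2^60.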


RE: optimal parameters

2018-02-05 Thread Kumar, Manoj H
Any inputs on this? It's very important to have a large number of columns in
the Tableau worksheet. Please advise how I can achieve it.

Regards,
Manoj


RE: optimal parameters

2018-02-04 Thread Kumar, Manoj H
Or is it possible to use mandatory dimensions instead of joint/hierarchy ones?
In that case, the cube won't explode as much. Please advise.

Can I put:
Mandatory: 60
Hierarchy: 20

Regards,
Manoj


RE: optimal parameters

2018-02-02 Thread Kumar, Manoj H
Thanks for your inputs. Is there any other way to get 80+ dimensions into one
cube?

Can we split the cube, with 20 dimensions in each?

Cube 1: 20 dimensions
Cube 2: 20 dimensions

The query should take data from both cubes (Cube 1 + Cube 2) so that Tableau
will have 40 dimensions in one worksheet. Please advise.

Regards,
Manoj


RE: optimal parameters

2018-02-02 Thread Kumar, Manoj H
Any updates on this? How can we process 450 million records in one partition?
The fact table has this much data for one COB.

Regards,
Manoj

From: Kumar, Manoj H
Sent: Friday, February 02, 2018 11:45 AM
To: 'user@kylin.apache.org' 
Subject: optimal parameters
Importance: High

Hi Folks, I need your inputs on optimizing the Kylin cube build process. We
have approx. 450 million records in one partition and 80-90 dimensions to be
picked up from the tables. Can you please advise on this? What would be the
optimal way of running the jobs? We have a Cloudera cluster of 16 nodes, with
8 cores on each node.

This process has been running for 60 minutes.

2018-02-01 23:54:16,257 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 2017092900_2017093000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
2018-02-01 23:54:16,258 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 2017092900_2017093000 - GMT+08:00 2018-02-02 12:37:11)
2018-02-01 23:54:16,263 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
2018-02-01 23:54:16,271 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
2018-02-01 23:54:16,275 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
2018-02-01 23:54:16,358 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
2018-02-01 23:54:16,371 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:  -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
2018-02-01 23:54:16,424 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
2018-02-01 23:54:16,775 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
2018-02-01 23:54:17,345 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,347 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
2018-02-01 23:54:17,354 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,470 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo...@naeast.ad.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha

Re: optimal parameters

2018-02-02 Thread ShaoFeng Shi
Hi Manoj,


450 million records in one build is a common case for Kylin. But 80+
dimensions is too many, as by default the cube will have 2^N dimension
combinations (N is the number of dimensions). I think you have already
optimized the aggregation group, as by default Kylin only allows 2048
combinations in one cube.
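
For a sense of scale, a short sketch of the 2^N growth; both helper names are hypothetical. With no pruning at all, 11 dimensions already reach the 2048 combinations mentioned above.

```python
def full_cuboid_count(n_dims):
    """Number of dimension combinations with no pruning: 2^N."""
    return 2 ** n_dims

def max_unpruned_dims(limit):
    """Largest N such that 2^N does not exceed the combination limit (limit >= 1)."""
    return limit.bit_length() - 1

for n in (10, 20, 40, 80):
    print(f"{n} dimensions -> {full_cuboid_count(n):,} combinations")

print(max_unpruned_dims(2048))  # 11, since 2^11 = 2048
```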

If you see that the build is very slow, a possible reason is the cluster's
capacity. Please try a smaller data set with a simpler cube first, and then
scale up based on the performance.
