I want to add another data point: if I disable the topn cube design, then the 
same query would go through smoothly and correctly. Her eis the kylin log:

2019-01-07 08:18:16,256 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] service.QueryService:387 : The 
original query:   SELECT  (COUNT(DISTINCT 
ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID)), (SUM(ZETTICSDW.A_MA_HOURLY_V.HITS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.PAGE_VIEWS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSIONS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSION_TIME)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DOWN_BYTES)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DATA_CONSUMED)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.UP_BYTES))  FROM  ZETTICSDW.A_MA_HOURLY_V   WHERE  
((ZETTICSDW.A_MA_HOURLY_V.THEDATE >= '20180501') AND 
(ZETTICSDW.A_MA_HOURLY_V.THEDATE <= '20180501'))

2019-01-07 08:18:16,263 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] service.QueryService:563 : The 
corrected query: SELECT  (COUNT(DISTINCT 
ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID)), (SUM(ZETTICSDW.A_MA_HOURLY_V.HITS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.PAGE_VIEWS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSIONS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSION_TIME)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DOWN_BYTES)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DATA_CONSUMED)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.UP_BYTES))  FROM  ZETTICSDW.A_MA_HOURLY_V   WHERE  
((ZETTICSDW.A_MA_HOURLY_V.THEDATE >= '20180501') AND 
(ZETTICSDW.A_MA_HOURLY_V.THEDATE <= '20180501'))
LIMIT 50000
2019-01-07 08:18:16,284 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] routing.QueryRouter:58 : Find 
candidates by table ZETTICSDW.A_MA_HOURLY_V and project=Anovadata : 
CUBE[name=ma_aggs_cube_6]
2019-01-07 08:18:16,285 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, 
realizations before: [CUBE[name=ma_aggs_cube_6]], realizations after: 
[CUBE[name=ma_aggs_cube_6]]
2019-01-07 08:18:16,285 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, 
realizations before: [CUBE[name=ma_aggs_cube_6]], realizations after: 
[CUBE[name=ma_aggs_cube_6]]
2019-01-07 08:18:16,286 INFO  [Query 
a78321d0-5913-758b-6eb1-db5a39f92735-134702] rules.RealizationSortRule:40 : 
CUBE[name=ma_aggs_cube_6] priority 1 cost 279.

Kang-sen

From: Kang-Sen Lu <[email protected]>
Sent: Monday, January 07, 2019 8:06 AM
To: [email protected]
Subject: question about how kylin chooses a specific cube design over others

I am using kylin 2.5.1. I have a data model and two cube designs on top of that 
data model.

One data model is used to perform aggregation over a set of aggregation groups. 
So the metric are all using "COUNT DISTINCT", and "SUM" functions. To speed up 
TOPN application, I have created another smaller cube design which addresses 
the TOPN application specifically, i.e. the metric contains only "COUNT 
DISTINCT", and "TOPN", but no "SUM" aggregation.

Here is a normal query, and somehow I am surprised that the TOPN cube was 
sleected. That caused over 10,000,000 row of data being returned and failed.

>From the kylin.log, you can see the "SELECT" statement and the cost evaluation 
>data. I am not sure what has caused the wrongly selection of the cube. I hope 
>someone can provide me with some hint or references.


2019-01-07 07:55:46,137 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] service.QueryService:387 : The 
original query:  SELECT  (COUNT(DISTINCT 
ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID)), (SUM(ZETTICSDW.A_MA_HOURLY_V.HITS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.PAGE_VIEWS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSIONS)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.SESSION_TIME)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DOWN_BYTES)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.DATA_CONSUMED)), 
(SUM(ZETTICSDW.A_MA_HOURLY_V.UP_BYTES))  FROM  ZETTICSDW.A_MA_HOURLY_V   WHERE  
((ZETTICSDW.A_MA_HOURLY_V.THEDATE >= '20180501') AND 
(ZETTICSDW.A_MA_HOURLY_V.THEDATE <= '20180501'))

2019-01-07 07:55:46,184 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:58 : Find 
candidates by table ZETTICSDW.A_MA_HOURLY_V and project=Anovadata : 
CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]
2019-01-07 07:55:46,185 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, 
realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], 
realizations after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]]
2019-01-07 07:55:46,185 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, 
realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], 
realizations after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]]
2019-01-07 07:55:46,185 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] rules.RealizationSortRule:40 : 
CUBE[name=ma_aggs_cube_6] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube] 
priority 1 cost 105.
2019-01-07 07:55:46,186 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying 
rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, 
realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], 
realizations after: [CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]]
2019-01-07 07:55:46,186 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:95 : Adjust 
DimensionAsMeasure for FunctionDesc [expression=COUNT_DISTINCT, 
parameter=ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID, returnType=null]
2019-01-07 07:55:46,186 INFO  [Query 
06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:75 : The 
realizations remaining: 
[CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]],and the final chosen 
one for current olap context 0 is CUBE[name=ma_aggs_topn_cube]

Thanks.

Kang-sen

Reply via email to