I am using kylin 2.5.1. I have a data model and two cube designs on top of that data model.
One data model is used to perform aggregation over a set of aggregation groups. So the metric are all using "COUNT DISTINCT", and "SUM" functions. To speed up TOPN application, I have created another smaller cube design which addresses the TOPN application specifically, i.e. the metric contains only "COUNT DISTINCT", and "TOPN", but no "SUM" aggregation. Here is a normal query, and somehow I am surprised that the TOPN cube was sleected. That caused over 10,000,000 row of data being returned and failed. >From the kylin.log, you can see the "SELECT" statement and the cost evaluation >data. I am not sure what has caused the wrongly selection of the cube. I hope >someone can provide me with some hint or references. 2019-01-07 07:55:46,137 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] service.QueryService:387 : The original query: SELECT (COUNT(DISTINCT ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID)), (SUM(ZETTICSDW.A_MA_HOURLY_V.HITS)), (SUM(ZETTICSDW.A_MA_HOURLY_V.PAGE_VIEWS)), (SUM(ZETTICSDW.A_MA_HOURLY_V.SESSIONS)), (SUM(ZETTICSDW.A_MA_HOURLY_V.SESSION_TIME)), (SUM(ZETTICSDW.A_MA_HOURLY_V.DOWN_BYTES)), (SUM(ZETTICSDW.A_MA_HOURLY_V.DATA_CONSUMED)), (SUM(ZETTICSDW.A_MA_HOURLY_V.UP_BYTES)) FROM ZETTICSDW.A_MA_HOURLY_V WHERE ((ZETTICSDW.A_MA_HOURLY_V.THEDATE >= '20180501') AND (ZETTICSDW.A_MA_HOURLY_V.THEDATE <= '20180501')) 2019-01-07 07:55:46,184 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:58 : Find candidates by table ZETTICSDW.A_MA_HOURLY_V and project=Anovadata : CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube] 2019-01-07 07:55:46,185 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]] 2019-01-07 07:55:46,185 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]] 2019-01-07 07:55:46,185 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] rules.RealizationSortRule:40 : CUBE[name=ma_aggs_cube_6] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube] priority 1 cost 105. 2019-01-07 07:55:46,186 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations after: [CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]] 2019-01-07 07:55:46,186 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:95 : Adjust DimensionAsMeasure for FunctionDesc [expression=COUNT_DISTINCT, parameter=ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID, returnType=null] 2019-01-07 07:55:46,186 INFO [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]],and the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube] Thanks. Kang-sen
