[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781854#comment-16781854 ]

KANG-SEN LU commented on KYLIN-2620:

If a TOPN(SUM(X), GROUP-BY D1) metric is configured in a Kylin cube, a query must meet all of the following conditions before its SUM is rewritten to that TOPN:
# The GROUP-BY list includes dimension D1.
# The query has ORDER-BY SUM(X).
# The query has LIMIT n, where n <= the TOPN's limit.

Conditions 2 and 3 are mentioned in the bug description, but I think condition 1 is important as well. We don't want Kylin to use TOPN(SUM(X), GROUP-BY D1) when the query does not have GROUP-BY D1. If Kylin rewrites SUM(X) to TOPN(SUM(X)) in that case, it has to aggregate over all D1 values, which may lose accuracy if Kylin did not save every D1 value in its cuboid.

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
>
> When running the following query
> select sum(measure) from table group by col_id
> if there exists TOPN(measure, group by col_id) measure,
> TopNMeasureType.isTopNCompatibleSum() will pass, so the SUM is rewritten
> to TOPN. This confuses the user since they may expect an accurate result for
> every distinct value of group by column(s).
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the
> query to determine whether to rewrite.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work
[ https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779823#comment-16779823 ]

KANG-SEN LU commented on KYLIN-3322:

If the second point is handled properly, then the first point is not important.

Could it be that when Kylin decides to use TOPN(SUM(X), GROUP-BY B), it does not check whether the GROUP-BY list of the actual query includes dimension B? That would be a serious problem.

I am curious why you are willing to accept the unnecessary requirement that SUM(X) must be defined in the same cube in which TOPN(SUM(X)) is configured. I suspect that some code assumes SUM(X) is a metric when it processes TOPN(SUM(X)). Maybe that buggy code is just too difficult to debug.

Kang-sen

> TopN requires a SUM to work
> ---
>
> Key: KYLIN-3322
> URL: https://issues.apache.org/jira/browse/KYLIN-3322
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: liyang
> Assignee: Na Zhai
> Priority: Major
>
> Currently if user creates a measure of TopN seller by sum of price, it is
> required that user also creates a measure of SUM(price). Otherwise, NPE will
> be thrown at query time.
[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work
[ https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776848#comment-16776848 ]

KANG-SEN LU commented on KYLIN-3322:

Hi, Shaofeng:

Thanks for your response. I have two points to add.

# What if I have already put SUM(X) in a separate cube? Why do I have to add SUM(X) to a second cube when I define TOPN(X) there? If it were just redundant metadata, I would not complain about the extra human effort. My worry is that Kylin may not find the right cube to compute SUM(X), because there are now two cubes that are, supposedly, equally qualified to answer the query. That makes the job of Kylin's cost evaluation function harder.
# My experiment suggests that when SUM(X) without GROUP-BY B was issued, the cost evaluation function sent the query to the cube containing both TOPN(SUM(X)) and SUM(X) and, more importantly, that it read TOPN(SUM(X)) first and then performed SUM(X). That took more than 20 seconds in my test case; going to SUM(X) directly took less than 0.2 seconds. I think the way Kylin computes SUM(X) in a cube containing both TOPN(SUM(X)) and SUM(X) may not be correct. That is the main reason I am against the decision that a cube containing TOPN(SUM(X)) must also configure SUM(X).
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743363#comment-16743363 ]

KANG-SEN LU commented on KYLIN-2620:

The fix for this bug would restrict selection of the topn metric to cases where the query is better served by the topn cube. However, the cube cost evaluation algorithm in core-metadata/src/main/java/org/apache/kylin/measure/topn/TopNMeasureType.java, function influenceCapabilityCheck(), must also be enhanced for the case where more than one cube is associated with the same data model.

The current problem is that when "select sum(x) from fact_table" is issued and two cube specs can both answer the query, Kylin prefers the topn cube, even though that means retrieving a limited number of rows for "group by col_id" and aggregating them afterwards. That is not only inefficient but also incorrect.
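To illustrate why answering an ungrouped SUM from a capped TopN can be incorrect, here is a minimal sketch. It assumes, as a simplification, that the TopN measure retains only the largest `capacity` per-group totals; Kylin's real TopN storage is more involved, and `TopNTruncationDemo`, `exactSum` and `sumOfLargest` are hypothetical names used only for this example.

```java
import java.util.Arrays;

final class TopNTruncationDemo {
    /** Exact SUM over every per-group total. */
    static long exactSum(long[] perGroupTotals) {
        long sum = 0;
        for (long value : perGroupTotals) {
            sum += value;
        }
        return sum;
    }

    /** SUM over only the 'capacity' largest totals, as if the rest had been dropped. */
    static long sumOfLargest(long[] perGroupTotals, int capacity) {
        long[] sorted = perGroupTotals.clone();
        Arrays.sort(sorted); // ascending, so the largest values sit at the end
        long sum = 0;
        for (int i = sorted.length - 1; i >= 0 && i >= sorted.length - capacity; i--) {
            sum += sorted[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] totals = {50, 30, 10, 5, 5}; // made-up per-col_id sums
        System.out.println(exactSum(totals));        // 100
        System.out.println(sumOfLargest(totals, 2)); // 80
    }
}
```

With these made-up numbers the truncated answer is 80 instead of 100; every group that falls outside the retained entries is silently missing from the total.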
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743356#comment-16743356 ]

KANG-SEN LU commented on KYLIN-2620:

I have a doubt about this sentence:

"Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the query to determine whether to rewrite."

I think it should be corrected to:

Kylin should check if "ORDER BY measure LIMIT topncapacity" is present in the query to determine whether to rewrite.

Am I right?
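The rewrite conditions raised in the KYLIN-2620 comments (the GROUP BY list contains the TopN dimension, the ORDER BY is on the SUM measure rather than on the group-by column, and the LIMIT fits within the TopN capacity) can be sketched as a single predicate. This is only an illustration under stated assumptions: `QueryShape` and `TopNRewriteCheck` are hypothetical types, not Kylin classes, and the real check in TopNMeasureType works on Kylin's own query model.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified description of a query; not Kylin's real query model. */
final class QueryShape {
    final List<String> groupByColumns;
    final String orderByExpression; // null when the query has no ORDER BY
    final Integer limit;            // null when the query has no LIMIT

    QueryShape(List<String> groupByColumns, String orderByExpression, Integer limit) {
        this.groupByColumns = groupByColumns;
        this.orderByExpression = orderByExpression;
        this.limit = limit;
    }
}

final class TopNRewriteCheck {
    /** Returns true only when all three conditions from the discussion hold. */
    static boolean canRewriteSumAsTopN(QueryShape query, String topNDimension,
                                       String sumExpression, int topNCapacity) {
        boolean groupsByDimension = query.groupByColumns.contains(topNDimension);
        // equalsIgnoreCase(null) is false, so a missing ORDER BY fails this condition
        boolean ordersBySum = sumExpression.equalsIgnoreCase(query.orderByExpression);
        boolean limitFits = query.limit != null && query.limit <= topNCapacity;
        return groupsByDimension && ordersBySum && limitFits;
    }

    public static void main(String[] args) {
        QueryShape ordered = new QueryShape(Arrays.asList("col_id"), "SUM(measure)", 10);
        QueryShape plain = new QueryShape(Arrays.asList("col_id"), null, null);
        System.out.println(canRewriteSumAsTopN(ordered, "col_id", "SUM(measure)", 100)); // true
        System.out.println(canRewriteSumAsTopN(plain, "col_id", "SUM(measure)", 100));   // false
    }
}
```

The second query is the one from the issue description: it groups by col_id but has neither ORDER BY nor LIMIT, so under these conditions it would not be rewritten.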
[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
[ https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691765#comment-16691765 ]

KANG-SEN LU commented on KYLIN-3636:

Hi, Shaofeng:

I just created a new cube design, and its storage_type was set to 0. Here is the JSON file of the cube:

{
  "uuid": "8b9c51f3-e9b6-45ae-998c-c66fbf87dcad",
  "last_modified": 1542636675073,
  "version": "2.5.1.20500",
  "name": "test",
  "is_draft": false,
  "model_name": "ma_aggs_model",
  "description": "",
  "null_string": null,
  "dimensions": [
    { "name": "APPLICATION_NAME", "table": "A_MA_HOURLY_V", "column": "APPLICATION_NAME", "derived": null },
    { "name": "BRAND_NAME", "table": "A_MA_HOURLY_V", "column": "BRAND_NAME", "derived": null }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": { "type": "constant", "value": "1" },
        "returntype": "bigint"
      }
    }
  ],
  "dictionaries": [],
  "rowkey": {
    "rowkey_columns": [
      { "column": "A_MA_HOURLY_V.APPLICATION_NAME", "encoding": "dict", "encoding_version": 1, "isShardBy": false },
      { "column": "A_MA_HOURLY_V.BRAND_NAME", "encoding": "dict", "encoding_version": 1, "isShardBy": false }
    ]
  },
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          { "qualifier": "M", "measure_refs": [ "_COUNT_" ] }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [ "A_MA_HOURLY_V.APPLICATION_NAME", "A_MA_HOURLY_V.BRAND_NAME" ],
      "select_rule": { "hierarchy_dims": [], "mandatory_dims": [], "joint_dims": [] }
    }
  ],
  "signature": "iBgKI2sCq9L9zGNctnmryw==",
  "notify_list": [],
  "status_need_notify": [ "ERROR", "DISCARDED", "SUCCEED" ],
  "partition_date_start": 0,
  "partition_date_end": 31536,
  "auto_merge_time_ranges": [ 60480, 241920 ],
  "volatile_range": 0,
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 0,
  "override_kylin_properties": {},
  "cuboid_black_list": [],
  "parent_forward": 3,
  "mandatory_dimension_set_list": [],
  "snapshot_table_desc_list": []
}

Here is the Java code in our Kylin 2.5.1 sandbox (git blame output):

4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 180) @JsonProperty("engine_type")
c56c741a92 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (shaofengshi 2017-11-05 16:57:47 +0800 181) private int engineType = IEngineAware.ID_MR_V2;
4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 182) @JsonProperty("storage_type")
4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 183) private int storageType = IStorageAware.ID_HBASE;

> in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
>
> Key: KYLIN-3636
> URL: https://issues.apache.org/jira/browse/KYLIN-3636
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.4.1
> Reporter: KANG-SEN LU
> Assignee: Shaofeng SHI
> Priority: Major
> Fix For: v2.6.0
>
> Hi, ShaoFeng:
>
> Thanks for the reply. I missed this email and not responded earlier, I am
> sorry.
>
> I tried to reproduce this problem with the sample database, and it did not
> happen.
>
> So I am hoping by collecting enough "clue", someone can figure out why this
> problem occurred.
>
> --
> I issued the following query at the sample project to exercise the topn
> aggregation:
>
> select seller_id, SUM(price) as total from kylin_sales group by seller_id
> order by total limit 5;
>
> With my own added debugging, I saw the following log in the kylin.log: (the
> query worked OK).
>
> 2018-10-16 16:18:19,963 INFO [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] model.FunctionDesc:59 : KSL2, getRewriteFieldName=_KY_SUM_KYLIN_SALES_PRICE_
> 2018-10-16 16:18:19,963 INFO [kylin-coproc--pool12-t1] v2.CubeHBaseEndpointRPC:217 : Query-a747f16f-4b12-cc97-08d2-9b45c27a529f:
[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
[ https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688096#comment-16688096 ]

KANG-SEN LU commented on KYLIN-3636:

I checked my cube design; the engine type is set to 2, i.e. MR. In addition, the sample.sh shell script forces both "engine-type" and "storage-type" to 2.

I found that the Kylin code assumes the storage-type is always 2 and therefore determines that the rowkey buffer header is 10 bytes long. For some reason, however, the default storage-type was set to 0 when I created a new cube. I had to change the cube's storage-type from 0 to 2 to make the cube build work.
[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
[ https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688068#comment-16688068 ]

KANG-SEN LU commented on KYLIN-3636:

Hi, Shaofeng:

I would like to add that one may argue there is NO bug in the Kylin code, because the problems I faced with the TOPN aggregation function were caused by "improper documentation about how to configure TOPN metric". So if we can improve the usage documentation, there will be more happy users.
[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
[ https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688039#comment-16688039 ]

KANG-SEN LU commented on KYLIN-3636:

Hi, Shaofeng:

I believe it is MR, but I am not sure. How do I know whether MR or Spark was used as the build engine? Is it controlled by kylin.properties? If yes, which parameter makes this selection?

Kang-sen

> in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
>
> Key: KYLIN-3636
> URL: https://issues.apache.org/jira/browse/KYLIN-3636
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.4.1
> Reporter: KANG-SEN LU
> Assignee: Shaofeng SHI
> Priority: Major
> Fix For: v2.6.0
>
> Hi, ShaoFeng:
>
> Thanks for the reply. I missed this email and not responded earlier, I am
> sorry.
>
> I tried to reproduce this problem with the sample database, and it did not
> happen.
>
> So I am hoping by collecting enough "clue", someone can figure out why this
> problem occurred.
>
> --
> I issued the following query at the sample project to exercise the topn
> aggregation:
>
> select seller_id, SUM(price) as total from kylin_sales group by seller_id
> order by total limit 5;
>
> With my own added debugging, I saw the following log in the kylin.log: (the
> query worked OK).
>
> 2018-10-16 16:18:19,963 INFO [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] model.FunctionDesc:59 : KSL2, getRewriteFieldName=_KY_SUM_KYLIN_SALES_PRICE_
> 2018-10-16 16:18:19,963 INFO [kylin-coproc--pool12-t1] v2.CubeHBaseEndpointRPC:217 : Query-a747f16f-4b12-cc97-08d2-9b45c27a529f: send request to the init region server anovadata4.anovadata.local on table ANOVA_KYLIN_25X_K758MEAWJG
> 2018-10-16 16:18:19,963 INFO [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] topn.TopNMeasureType:399 : KSL888: in TopNMeasureType.java, sumFieldName= _KY_SUM_KYLIN_SALES_PRICE_
>
> When I was executing my project query, I issued the following select
> statement:
>
> SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID \"ZETTICSDW_A_VL_HOURLY_V_IMSIID\",
>   SUM(ZETTICSDW.A_VL_HOURLY_V.SIG_EVENT_COUNT) \"vl_aggs_model___USERS_BY_ERROR_3XX\"
> FROM ZETTICSDW.A_VL_HOURLY_V
> inner JOIN ZETTICSDW.T_VL_TRANSACTION_RULE_V ON (
>   ZETTICSDW.A_VL_HOURLY_V.CAUSE_CODE_KEY = ZETTICSDW.T_VL_TRANSACTION_RULE_V.CAUSE_CODE_KEY AND
>   ZETTICSDW.A_VL_HOURLY_V.REASON_CODE_KEY = ZETTICSDW.T_VL_TRANSACTION_RULE_V.REASON_CODE_KEY AND
>   ZETTICSDW.A_VL_HOURLY_V.TRANSACTION_TYPE_KEY = ZETTICSDW.T_VL_TRANSACTION_RULE_V.TRANSACTION_TYPE_KEY)
> WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20180209') AND
>   ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '02') AND
>   (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '03'))) AND
>   ZETTICSDW.T_VL_TRANSACTION_RULE_V.DISPLAY_STRING LIKE '%+3%'
> GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID
> ORDER BY \"vl_aggs_model___USERS_BY_ERROR_3XX\"
> LIMIT 25
>
> An exception occurred within the method "private ColumnRowType
> buildColumnRowType()" of
> "query/src/main/java/org/apache/kylin/query/relnode/OLAPTableScan.java".
>
> if (columns.size() != rowType.getFieldCount()) {
>     throw new IllegalStateException("RowType=" + rowType.getFieldCount() + ", ColumnRowType=" + columns.size());
> }
>
> It printed "RowType=133, ColumnRowType=132".
> The RowType list contains one extra column name: "ANY _KY_SUM_1_3a1aedef_SIG_EVENT_COUNT_".
>
> I think this has something to do with the bug fix "KYLIN-3359 Support
> sum(expression) if possible".
>
> After this bug fix was submitted, I noticed that a lot of column name was
> added into rowType like "_KY_SUM_XXX".
>
> This strange column name "_KY_SUM_1_3a1aedef_SIG_EVENT_COUNT_" is very
> similar.
>
> I also found that this extra column name, only existed in RowType but not in
> ColumnRowType, was added in the method "public void
> implementRewrite(RewriteImplementor implementor)" within "OLAPJoinRel.java".
>
> With my own debug statement, I saw this debug text in kylin.log:
>
> relnode.OLAPJoinRel:362 : KSL54: newField= #132: _KY_SUM_1_6735969a_SIG_EVENT_COUNT_ ANY
>
> I hope someone with a deeper technical knowledge in kylin query engine can
> figure out what was causing the problem I have seen.
>
> Thanks again.
>
> Kang-sen
>
> *From:* ShaoFeng Shi [[mailto:shaofeng...@apache.org]]
> *Sent:* Friday, October 05, 2018 9:59 PM
> *To:* user <[u...@kylin.apache.org|mailto:u...@kylin.apache.org]>
> *Subject:* Re: any body see topn in kylin 2.5.1 working?
>
> Hi Kang-Sen,
>
> Didn't see this; Can you reproduce the problem with the sample cube?
>
> Kang-Sen Lu <[k...@anovadata.com|mailto:k...@anovadata.com]>
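When two field lists disagree in size, as with "RowType=133, ColumnRowType=132" in the report, listing the names that appear in one list but not in the other can point directly at the extra field. The following is a minimal debugging sketch, not Kylin code: `FieldListDiff` and `onlyInFirst` are hypothetical names, plain string lists stand in for the row type and the ColumnRowType, and the field names in `main` are shortened placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FieldListDiff {
    /** Returns the names that occur in 'first' but not in 'second', in their original order. */
    static List<String> onlyInFirst(List<String> first, List<String> second) {
        Set<String> known = new HashSet<>(second);
        List<String> extra = new ArrayList<>();
        for (String name : first) {
            if (!known.contains(name)) {
                extra.add(name);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        // Placeholder field names; a real dump would come from added debug logging.
        List<String> rowTypeFields = Arrays.asList("IMSIID", "SIG_EVENT_COUNT", "_KY_SUM_EXTRA_");
        List<String> columnRowTypeFields = Arrays.asList("IMSIID", "SIG_EVENT_COUNT");
        System.out.println(onlyInFirst(rowTypeFields, columnRowTypeFields)); // [_KY_SUM_EXTRA_]
    }
}
```

Logging such a difference just before the size check would name the offending field in the same message that reports the mismatch.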
[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
[ https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688005#comment-16688005 ]

KANG-SEN LU commented on KYLIN-3636:

After adding some debugging statements, I found two problems that can affect a cube design.

The first problem is that if a cube's storage type is not 2, the cube build can fail with an arraycopy exception.

The second problem is that if a TOPN(SUM())/GROUP-BY() metric is configured in a cube, then SUM() must also be configured as a metric in the same cube. Otherwise we may see a "null" exception at query time.

As far as TOPN aggregation support is concerned, the Kylin group wrote a good technical blog post, but we also need a description of how to configure TOPN in a cube definition. Some hidden restrictions can hamper successful use of the TOPN support.
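The second restriction (a TOPN measure needs a SUM measure on the same column in the same cube) is the kind of rule that could be checked when a cube is designed rather than discovered at query time. Below is a minimal sketch of such a check. `MeasureSpec` and `TopNCompanionCheck` are hypothetical types, not part of Kylin, and the expression names "TOP_N" and "SUM" are assumed to match what the cube descriptor uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified measure entry: an expression name and the column it aggregates. */
final class MeasureSpec {
    final String expression; // assumed values: "SUM", "TOP_N", "COUNT", ...
    final String column;

    MeasureSpec(String expression, String column) {
        this.expression = expression;
        this.column = column;
    }
}

final class TopNCompanionCheck {
    /** Returns the columns that have a TOP_N measure but no SUM measure in the same list. */
    static List<String> topNColumnsMissingSum(List<MeasureSpec> measures) {
        Set<String> summedColumns = new HashSet<>();
        for (MeasureSpec measure : measures) {
            if ("SUM".equals(measure.expression)) {
                summedColumns.add(measure.column);
            }
        }
        List<String> missing = new ArrayList<>();
        for (MeasureSpec measure : measures) {
            if ("TOP_N".equals(measure.expression) && !summedColumns.contains(measure.column)) {
                missing.add(measure.column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<MeasureSpec> measures = Arrays.asList(
                new MeasureSpec("COUNT", "1"),
                new MeasureSpec("TOP_N", "PRICE"));
        System.out.println(topNColumnsMissingSum(measures)); // [PRICE]
    }
}
```

An empty result would mean every TOPN measure in the list has its companion SUM; a non-empty result names the columns that would need an extra SUM measure under the restriction described above.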