[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871973#comment-16871973 ]

ASF subversion and git services commented on KYLIN-2620:

Commit 2bfda2d6c2a19586d0854e487bd3ceb805922940 in kylin's branch refs/heads/2.6.x from chao long
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=2bfda2d ]

KYLIN-2620 TopN can't match when multi sort columns

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v3.0.0, v2.6.3
>
> Attachments: cube_desc.png, sql.png
>
> When running the following query
>     select sum(measure) from table group by col_id
> if there exists a TOPN(measure, group by col_id) measure,
> TopNMeasureType.isTopNCompatibleSum() will pass, so the SUM is rewritten
> to TOPN. This confuses users, since they may expect an accurate result for
> every distinct value of the group-by column(s).
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the
> query to determine whether to rewrite.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851728#comment-16851728 ]

ASF GitHub Bot commented on KYLIN-2620:

nichunen commented on pull request #647: KYLIN-2620 TopN can't match when multi sort columns
URL: https://github.com/apache/kylin/pull/647

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v3.0.0, v2.6.3
>
> Attachments: cube_desc.png, sql.png
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851727#comment-16851727 ]

ASF subversion and git services commented on KYLIN-2620:

Commit 670c2a98a82a699481eceddb685168104d5b185c in kylin's branch refs/heads/master from chao long
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=670c2a9 ]

KYLIN-2620 TopN can't match when multi sort columns

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v3.0.0, v2.6.3
>
> Attachments: cube_desc.png, sql.png
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843389#comment-16843389 ]

ASF GitHub Bot commented on KYLIN-2620:

Wayne1c commented on pull request #647: KYLIN-2620 TopN can't match when multi sort columns
URL: https://github.com/apache/kylin/pull/647

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: Future, v3.0.0
>
> Attachments: cube_desc.png, sql.png
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838674#comment-16838674 ]

Na Zhai commented on KYLIN-2620:

I tested with this SQL:
{code:sql}
SELECT SELLER_ID, SUM(PRICE) FROM KYLIN_SALES group by SELLER_ID order by SUM(PRICE), seller_id DESC limit 10;
{code}
It did not hit the top_n measure.

!sql.png!

!cube_desc.png!

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: Future, v3.0.0
>
> Attachments: cube_desc.png, sql.png
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802425#comment-16802425 ]

ASF subversion and git services commented on KYLIN-2620:

Commit d5c37dcaeaca6d08f2098a645d78b2c800482db4 in kylin's branch refs/heads/2.6.x from chao long
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=d5c37dc ]

KYLIN-2620 Make the condition stricter to answer query with topN

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783279#comment-16783279 ]

ASF subversion and git services commented on KYLIN-2620:

Commit f968e31140e6b5e68ef3c8124192987752c32a03 in kylin's branch refs/heads/master from chao long
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f968e31 ]

KYLIN-2620 Make the condition stricter to answer query with topN

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783278#comment-16783278 ]

ASF GitHub Bot commented on KYLIN-2620:

shaofengshi commented on pull request #489: KYLIN-2620 Make the condition stricter to answer query with topN
URL: https://github.com/apache/kylin/pull/489

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781854#comment-16781854 ]

KANG-SEN LU commented on KYLIN-2620:

If a TOPN(SUM(X), GROUP-BY D1) metric is configured in a Kylin cube, a query must meet the following conditions to be answered by it:
 # The GROUP-BY list includes dimension D1.
 # The query has ORDER-BY SUM(X).
 # The query has LIMIT n, where n <= TOPN's limit.

Conditions 2 and 3 are mentioned in the bug description, but I think condition 1 is also important. We don't want Kylin to use TOPN(SUM(X), GROUP-BY D1) when the query has no GROUP-BY D1. If Kylin rewrote SUM(X) to TOPN(SUM(X)) in that case, it would have to aggregate over all D1 values, which may lose accuracy if Kylin did not save every D1 value in its cuboid.

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
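The three conditions listed in the comment above can be sketched as a single predicate. This is a minimal illustration in Python, not Kylin's code: the `TopNMeasure` and `Query` classes, their fields, and `can_answer_with_topn` are hypothetical names. Treating ORDER-BY SUM(X) as the leading sort key, and requiring the query to sum the same column as the measure, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TopNMeasure:
    """Hypothetical model of TOPN(SUM(X), GROUP-BY D1)."""
    sum_column: str   # X
    group_by: str     # D1
    capacity: int     # TOPN's limit


@dataclass(frozen=True)
class Query:
    """Hypothetical model of the parts of a query that matter here."""
    sum_column: str
    group_by: Sequence[str] = ()
    order_by: Sequence[str] = ()   # sort keys as text, e.g. ("SUM(PRICE)",)
    limit: Optional[int] = None


def can_answer_with_topn(query: Query, measure: TopNMeasure) -> bool:
    """Return True only if all three conditions from the comment hold."""
    order_key = f"SUM({measure.sum_column})"
    return (
        # Assumption: the query sums the same column as the measure.
        query.sum_column == measure.sum_column
        # Condition 1: GROUP-BY list includes D1.
        and measure.group_by in query.group_by
        # Condition 2: ORDER-BY SUM(X), assumed here to be the leading key.
        and len(query.order_by) > 0
        and query.order_by[0] == order_key
        # Condition 3: LIMIT n, where n <= TOPN's limit.
        and query.limit is not None
        and query.limit <= measure.capacity
    )


measure = TopNMeasure(sum_column="PRICE", group_by="SELLER_ID", capacity=100)
ranked = Query("PRICE", ("SELLER_ID",), ("SUM(PRICE)",), 10)
plain_sum = Query("PRICE", ("SELLER_ID",))   # no ORDER BY, no LIMIT
```

Under these assumptions `can_answer_with_topn(ranked, measure)` is true, while `plain_sum`, the query shape from the bug description, is rejected because it has neither ORDER BY nor LIMIT.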
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778974#comment-16778974 ]

ASF GitHub Bot commented on KYLIN-2620:

Wayne1c commented on pull request #489: KYLIN-2620 Make the condition stricter to answer query with topN
URL: https://github.com/apache/kylin/pull/489

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Shaofeng SHI
> Priority: Major
> Fix For: v2.6.1
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743572#comment-16743572 ]

Shaofeng SHI commented on KYLIN-2620:

I will check it.

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Shaofeng SHI
> Priority: Major
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743363#comment-16743363 ]

KANG-SEN LU commented on KYLIN-2620:

Fixing this bug would restrict selection of the topn metric to cases where the query is better served by the topn cube. However, the cube cost evaluation algorithm in core-metadata/src/main/java/org/apache/kylin/measure/topn/TopNMeasureType.java, function influenceCapabilityCheck(), must also be enhanced for the case where more than one cube is associated with the same data model.

The current problem is that when "select sum(x) from fact_table" is issued and two cube specs can both answer the query, Kylin prefers the topn cube, even though that means retrieving a limited number of rows from "group by col_id" and aggregating them afterwards. That is not only inefficient, but also incorrect.

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Reporter: Lin Tingmao
> Priority: Major
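A toy model shows why answering an ungrouped SUM from a TopN measure gives a wrong total, as the comment above argues. The sketch assumes the measure keeps only the `capacity` largest groups, with exact sums; Kylin's real TopN storage is not modelled, and the function names and sample rows are made up for illustration.

```python
from collections import defaultdict


def exact_sums(rows):
    """Exact SUM(x) per group key, as a regular SUM measure would give."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def keep_top_n(totals, capacity):
    """Assumption: a TopN measure retains only the `capacity` largest groups."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:capacity])


# Hypothetical fact rows: (col_id, x)
rows = [("a", 50), ("b", 30), ("c", 15), ("d", 5)]

exact = exact_sums(rows)
kept = keep_top_n(exact, capacity=2)

total_from_sum_cube = sum(exact.values())   # aggregates every group
total_from_topn_cube = sum(kept.values())   # groups "c" and "d" are missing
```

With these rows the two totals differ (100 versus 80), because the groups that fell outside the TopN capacity never reach the final aggregation.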
[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
[ https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743356#comment-16743356 ]

KANG-SEN LU commented on KYLIN-2620:

I have a doubt about this sentence:

"Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the query to determine whether to rewrite."

I think it should be corrected to:

Kylin should check if "ORDER BY measure LIMIT topncapacity" is present in the query to determine whether to rewrite.

Am I right?

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Reporter: Lin Tingmao
> Priority: Major