Hi, Kang sen, Never mind.
And for the "user guide" kind of document", I believe that the document link given by Shaofeng is for the end user. And the blog I mentioned is an extended reading that helps you better understanding Kylin's TOP_N. An end user should read documents first finding how to use it and then go extended reading. Kang-Sen Lu <[email protected]> 于2019年1月19日周六 下午12:59写道: > Hi, Jia Tao: > > > I am sorry I typed your name incorrectly by accident. I was composing the > email on amazon kindle's silk browser. It has a "feature" to automatically > correct any typo. So, when I typed "Jia Tao", the software thinks I had a > typo, and automatically changed it to "His Tao". I should have > proof-reading before I hit send button. As most of the engineers, I did not > have patience to proof-reading. > > > Kang-sen > > > ------------------------------ > *From:* Kang-Sen Lu <[email protected]> > *Sent:* Saturday, January 19, 2019 2:14:10 AM > *To:* [email protected] > *Subject:* Re: question about how to configure TOPN aggregation function > > Hi, His Tao: > > Thanks for your reply. > > I had read the article you mentioned sometime ago. It is a very > informative article about the technical aspects of top n measure feature. > It describes the purpose and principle of top n aggregation. > However, it does not talk about how the end user will provide top n > configuration. So user must guess how to use nylon-fui to configure cube > when he/she wants to take advantage of this good feature. > Unless you are thinking from user's petspective, you won't find it is > useful to create a "user guide" kind of document. > Kang-sen > ------------------------------ > *From:* JiaTao Tao <[email protected]> > *Sent:* Friday, January 18, 2019 10:04:51 PM > *To:* [email protected] > *Subject:* Re: question about how to configure TOPN aggregation function > > Hi Kang-sen > There's a blog written by Shaofeng and it's good for understanding TOPN_N > in Kylin, may it helps: > http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/ > Apache Kylin | Approximate Top-N support in Kylin > <http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/> > kylin.apache.org > Approximate Top-N support in Kylin. Mar 19, 2016 • Shaofeng Shi. > Background. Find the Top-N (or Top-K) entities from a dataset is a common > scenario and requirement in data minding; We often see the reports or news > like “Top 100 companies in the world”, “Most popular 20 electronics” sold > on a big e-commerce platform, etc. Exploring and analysising the top > entities can always find some ... > > And for the typical scenario, we shouldn't define the TOP_N's group by col > in Kyiln's dimensions, or it will no benefits for TOP N query. > > Kang-Sen Lu <[email protected]> 于2019年1月18日周五 下午2:48写道: > > Through several experiments, I have learned that (1) when a TOPN(SUM(x), > group-by y) is configured, currently I must configure SUM(x) as well. But > there is a bug addressing this problem (KYLIN-3322). In the aggregation > froup configuration, the dim y does NOT have to be configured in any agg. > > (2) To prevent kylin from routing a non-topn query to the topn cube, bug > KYLIN-2620 is touching the bases with this issue. I am hoping the bug fix > for KYLIN-2620 will be able to accurately dis-qualify a non-topn query from > being routed to a topn cube, by recognizing that the query does not include > ORDER-BY and LIMIT info. > > > Currently, I have hacked the topn cost multiplier function to get by this > issue. Because I am not fully aware of how KYLIN works, the hack may cause > problem in an unexpected way. So when a true solution is available, I will > switch over. > > > Thanks. > > > Kang-sen > ------------------------------ > *From:* ShaoFeng Shi <[email protected]> > *Sent:* Friday, January 18, 2019 9:38:42 AM > *To:* user > *Subject:* Re: question about how to configure TOPN aggregation function > > Hi Kang sen, > > It is in the cube tutorial, please search "TOP_N" in it: > > https://kylin.apache.org/docs/tutorial/create_cube.html > > We will investigate this bug in next week; If you have any findings, > please share with us; also, a patch or PR is welcomed! > > Best regards, > > Shaofeng Shi 史少锋 > Apache Kylin PMC > Work email: [email protected] > Kyligence Inc: https://kyligence.io/ > > Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html > Join Kylin user mail group: [email protected] > Join Kylin dev mail group: [email protected] > > > > > Kang-Sen Lu <[email protected]> 于2019年1月18日周五 下午10:31写道: > > Hi, ShaoFeng: > > > Thanks for the info. So what I found is a bug in kylin. > > > I am curious if there are any tutorial about hwo to use KYLIN GUI to > configure the TOPN measure, i.e what is the minimum info being configured > to make it work? > > > I can see in sample project json files how kylin expects the cube > configuration. But how does a user using KYLIN GUI to accomplish the same > effect is not clear. > > > Kang-sen > ------------------------------ > *From:* ShaoFeng Shi <[email protected]> > *Sent:* Friday, January 18, 2019 9:16:47 AM > *To:* user > *Subject:* Re: question about how to configure TOPN aggregation function > > In theory, it doesn't need a separate SUM() measure; > > Your issue seems to be the same as: > > https://issues.apache.org/jira/browse/KYLIN-3322 > > Best regards, > > Shaofeng Shi 史少锋 > Apache Kylin PMC > Work email: [email protected] > Kyligence Inc: https://kyligence.io/ > > Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html > Join Kylin user mail group: [email protected] > Join Kylin dev mail group: [email protected] > > > > > Kang-Sen Lu <[email protected]> 于2019年1月18日周五 下午10:13写道: > > I am running kylin 2.5.1. I have a question about topn aggregation > function usage. Because I did not find document about how to configure TOPN > aggregation function, so I am not sure if the problem I am facing is > expected or a bug. > > > Here is my test case: > > > one data model, and one cube configured. > in the cube, only TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID) was configured. > No SUM(HITS) was configured in the cube. > Built one hour of cube data. > Issued the following query: > select SUBSCRIBER_ID, sum(hits) > from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY > SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ; > > The query had "null" Exception. > > 2019-01-18 08:58:28,740 INFO [Query > d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying > rule: class > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, > realizations before: > [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations > after: [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]] > 2019-01-18 08:58:28,741 INFO [Query > d666c666-af7e-8c39-ef57-e80d49590e87-514] rules.RealizationSortRule:40 : > CUBE[name=ma_aggs_cube_5] priority 1 cost 279. > CUBE[name=ma_aggs_topn_cube_test] priority 1 cost 27. > 2019-01-18 08:58:28,741 INFO [Query > d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:51 : Applying > rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, > realizations before: > [CUBE[name=ma_aggs_cube_5],CUBE[name=ma_aggs_topn_cube_test]], realizations > after: [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]] > 2019-01-18 08:58:28,741 INFO [Query > d666c666-af7e-8c39-ef57-e80d49590e87-514] routing.QueryRouter:75 : The > realizations remaining: > [CUBE[name=ma_aggs_topn_cube_test],CUBE[name=ma_aggs_cube_5]],and the final > chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube_test] > 2019-01-18 08:58:28,767 ERROR [Query > d666c666-af7e-8c39-ef57-e80d49590e87-514] service.QueryService:480 : > Exception while executing query > java.sql.SQLException: Error while executing SQL "select SUBSCRIBER_ID, > sum(hits) > from a_ma_hourly_v where THEDATE='20180501' and THEHOUR='07' GROUP BY > SUBSCRIBER_ID ORDER BY sum(hits) DESC LIMIT 100 ": null > at > org.apache.calcite.avatica.Helper.createException(Helper.java:56) > at > org.apache.calcite.avatica.Helper.createException(Helper.java:41) > > My question is that when "TOPN(SUM(HITS),GROUP-BY SUBSCRIBER_ID)" is > configured in a cube, is it necessary to also configure the "SUM(HITS)" in > the cube? > > > Kang-sen > > > > > > -- > > > Regards! > > Aron Tao > -- Regards! Aron Tao
