[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN

2019-03-01 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781854#comment-16781854
 ] 

KANG-SEN LU commented on KYLIN-2620:


If a TOPN(SUM(X), GROUP-BY D1) metric is configured in a Kylin cube, the query 
at hand must meet the following conditions before that metric is used:
 # The GROUP-BY list includes dimension D1.
 # The query has ORDER-BY SUM(X).
 # The query has LIMIT n, where n <= the TOPN's limit.

Conditions 2 and 3 are mentioned in the bug description, but I think condition 
1 is also important. We don't want Kylin to use TOPN(SUM(X), GROUP-BY D1) when 
the query does not have GROUP-BY D1. If Kylin rewrote SUM(X) to TOPN(SUM(X)) in 
that case, it would have to aggregate over all D1 values, which may lose 
accuracy if Kylin did not save every D1 value in its cuboid.
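
The three conditions above can be sketched as a single check. This is only a 
minimal illustration in Java, not Kylin's actual code: the class 
TopNRewriteCheck, the QueryShape holder and the method canRewriteSumAsTopN are 
all hypothetical names, and expressions are compared as plain strings for 
simplicity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Kylin.
public class TopNRewriteCheck {

    // Simplified description of a query: GROUP BY columns, ORDER BY expression, LIMIT.
    static final class QueryShape {
        final Set<String> groupByColumns;
        final String orderByExpression; // null when the query has no ORDER BY
        final Integer limit;            // null when the query has no LIMIT

        QueryShape(Set<String> groupByColumns, String orderByExpression, Integer limit) {
            this.groupByColumns = groupByColumns;
            this.orderByExpression = orderByExpression;
            this.limit = limit;
        }
    }

    // Returns true only when all three conditions from the comment hold.
    static boolean canRewriteSumAsTopN(QueryShape query, String topNGroupByColumn,
                                       String topNSumExpression, int topNCapacity) {
        // Condition 1: the GROUP BY list includes the TopN dimension.
        boolean groupsByTopNColumn = query.groupByColumns.contains(topNGroupByColumn);
        // Condition 2: the query is ordered by the summed measure.
        boolean ordersBySum = topNSumExpression.equalsIgnoreCase(query.orderByExpression);
        // Condition 3: a LIMIT is present and does not exceed the TopN capacity.
        boolean limitWithinCapacity = query.limit != null && query.limit <= topNCapacity;
        return groupsByTopNColumn && ordersBySum && limitWithinCapacity;
    }

    public static void main(String[] args) {
        Set<String> groupByD1 = new HashSet<>(Arrays.asList("D1"));
        QueryShape topNQuery = new QueryShape(groupByD1, "SUM(X)", 10);
        QueryShape plainSum = new QueryShape(groupByD1, null, null);
        QueryShape noGroupBy = new QueryShape(new HashSet<>(), "SUM(X)", 10);

        System.out.println(canRewriteSumAsTopN(topNQuery, "D1", "SUM(X)", 100));
        System.out.println(canRewriteSumAsTopN(plainSum, "D1", "SUM(X)", 100));
        System.out.println(canRewriteSumAsTopN(noGroupBy, "D1", "SUM(X)", 100));
    }
}
```

Only the first query in main satisfies all three conditions; the other two 
fail condition 2/3 and condition 1 respectively, so under this sketch they 
would be left to an ordinary SUM measure.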

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
> 
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
>  Issue Type: Bug
>  Components: Measure - TopN
>Reporter: Lin Tingmao
>Assignee: Chao Long
>Priority: Major
> Fix For: v2.6.2
>
>
> When running the following query
> select sum(measure) from table group by col_id
> if there exists a TOPN(measure, group by col_id) measure, 
> TopNMeasureType.isTopNCompatibleSum() will pass, so the SUM is rewritten 
> to TOPN. This confuses users, since they may expect an accurate result for 
> every distinct value of the group by column(s). 
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the 
> query to determine whether to rewrite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work

2019-02-27 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779823#comment-16779823
 ] 

KANG-SEN LU commented on KYLIN-3322:


If the second point is handled properly, then the first point is not important. 
Could it be that when Kylin decides to access TOPN(SUM(X), GROUP-BY B), it does 
not check whether the GROUP-BY list of the actual query includes dimension B? 
That would be a serious problem.

I am curious why you are willing to accept the unnecessary requirement that 
SUM(X) must be defined in the same cube in which TOPN(SUM(X)) is configured. I 
suspect some code assumes that SUM(X) is a metric when processing 
TOPN(SUM(X)). Maybe this buggy code is just too difficult to debug.

Kang-sen

> TopN requires a SUM to work
> ---
>
> Key: KYLIN-3322
> URL: https://issues.apache.org/jira/browse/KYLIN-3322
> Project: Kylin
>  Issue Type: Bug
>  Components: Measure - TopN
>Reporter: liyang
>Assignee: Na Zhai
>Priority: Major
>
> Currently if user creates a measure of TopN seller by sum of price, it is 
> required that user also creates a measure of SUM(price). Otherwise, NPE will 
> be thrown at query time.





[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work

2019-02-25 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776848#comment-16776848
 ] 

KANG-SEN LU commented on KYLIN-3322:


Hi, Shaofeng:

 

Thanks for your response. I have two points to add.
 # What if I have already put SUM(X) in a separate cube? Why do I have to add 
SUM(X) to the second cube when I define TOPN(X) there? If it were just 
redundant metadata, I would not complain about the extra human effort. My 
worry is that Kylin may not be able to find the right cube to compute SUM(X), 
because there are now two cubes that are both, supposedly, equally qualified 
to answer the query. That makes Kylin's cost evaluation function more 
challenging.
 # My experiment seems to suggest that when SUM(X) without group by B is 
issued, the cost evaluation function sends the query to the cube containing 
both TOPN(SUM(X)) and SUM(X) and, more importantly, goes after TOPN(SUM(X)) 
and then performs SUM(X). That takes more than 20 seconds in my test case; 
going after SUM(X) directly takes less than 0.2 seconds. I think the way Kylin 
tries to compute SUM(X) in a cube containing both TOPN(SUM(X)) and SUM(X) may 
not be correct. That is the main reason I am against the decision that a cube 
containing TOPN(SUM(X)) must also configure SUM(X).

 






[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN

2019-01-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743363#comment-16743363
 ] 

KANG-SEN LU commented on KYLIN-2620:


Fixing this bug would limit selection of the TopN metric to queries that are 
better served by the TopN cube.

However, the cube cost evaluation algorithm in

core-metadata/src/main/java/org/apache/kylin/measure/topn/TopNMeasureType.java, 
function influenceCapabilityCheck(),

must also be enhanced for the case where more than one cube is associated with 
the same data model.

The current problem is that when "select sum(x) from fact_table" is issued and 
two cube specs can both answer the query, Kylin prefers the TopN cube, even 
though that means retrieving a limited number of rows from "group by col_id" 
and aggregating them afterwards. That is not only inefficient but also 
incorrect.
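
To illustrate why this is incorrect, here is a small Java sketch with made-up 
numbers. It is not Kylin code: the class TopNTruncationDemo and its methods are 
hypothetical, and it simply assumes a TopN structure that keeps only the 
largest `capacity` per-col_id sums and drops the rest.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo; it models a TopN measure as "keep the N largest sums".
public class TopNTruncationDemo {

    // Exact SUM(x): add up the per-key sums of every key.
    static long exactTotal(Map<String, Long> sumByKey) {
        long total = 0;
        for (long value : sumByKey.values()) {
            total += value;
        }
        return total;
    }

    // SUM(x) derived from a TopN that retained only the `capacity` largest keys.
    static long totalFromTopN(Map<String, Long> sumByKey, int capacity) {
        return sumByKey.values().stream()
                .sorted(Comparator.reverseOrder())
                .limit(capacity)
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        // Made-up per-col_id sums of x.
        Map<String, Long> sumByColId = new LinkedHashMap<>();
        sumByColId.put("a", 50L);
        sumByColId.put("b", 30L);
        sumByColId.put("c", 15L);
        sumByColId.put("d", 5L);

        System.out.println("exact total      = " + exactTotal(sumByColId));
        System.out.println("total from top-2 = " + totalFromTopN(sumByColId, 2));
    }
}
```

With these numbers the exact total is 100, while the total derived from a 
top-2 structure is only 80, because the sums for "c" and "d" were never 
retained.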






[jira] [Commented] (KYLIN-2620) Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN

2019-01-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743356#comment-16743356
 ] 

KANG-SEN LU commented on KYLIN-2620:


I have a doubt about this sentence:

"Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the 
query to determine whether to rewrite."

I think it should be corrected to:

Kylin should check if "ORDER BY measure LIMIT topncapacity" is present in the 
query to determine whether to rewrite.

Am I right?






[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception

2018-11-19 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691765#comment-16691765
 ] 

KANG-SEN LU commented on KYLIN-3636:


Hi, Shaofeng:

 

I just created a new cube design, and its storage_type was set to 0. Here is 
the JSON file of the cube:

{
  "uuid": "8b9c51f3-e9b6-45ae-998c-c66fbf87dcad",
  "last_modified": 1542636675073,
  "version": "2.5.1.20500",
  "name": "test",
  "is_draft": false,
  "model_name": "ma_aggs_model",
  "description": "",
  "null_string": null,
  "dimensions": [
    {
      "name": "APPLICATION_NAME",
      "table": "A_MA_HOURLY_V",
      "column": "APPLICATION_NAME",
      "derived": null
    },
    {
      "name": "BRAND_NAME",
      "table": "A_MA_HOURLY_V",
      "column": "BRAND_NAME",
      "derived": null
    }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1"
        },
        "returntype": "bigint"
      }
    }
  ],
  "dictionaries": [],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "A_MA_HOURLY_V.APPLICATION_NAME",
        "encoding": "dict",
        "encoding_version": 1,
        "isShardBy": false
      },
      {
        "column": "A_MA_HOURLY_V.BRAND_NAME",
        "encoding": "dict",
        "encoding_version": 1,
        "isShardBy": false
      }
    ]
  },
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_"
            ]
          }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [
        "A_MA_HOURLY_V.APPLICATION_NAME",
        "A_MA_HOURLY_V.BRAND_NAME"
      ],
      "select_rule": {
        "hierarchy_dims": [],
        "mandatory_dims": [],
        "joint_dims": []
      }
    }
  ],
  "signature": "iBgKI2sCq9L9zGNctnmryw==",
  "notify_list": [],
  "status_need_notify": [
    "ERROR",
    "DISCARDED",
    "SUCCEED"
  ],
  "partition_date_start": 0,
  "partition_date_end": 31536,
  "auto_merge_time_ranges": [
    60480,
    241920
  ],
  "volatile_range": 0,
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 0,
  "override_kylin_properties": {},
  "cuboid_black_list": [],
  "parent_forward": 3,
  "mandatory_dimension_set_list": [],
  "snapshot_table_desc_list": []
}

Here is the java code in our kylin 2.5.1 sandbox:

4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 180) @JsonProperty("engine_type")
c56c741a92 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (shaofengshi 2017-11-05 16:57:47 +0800 181) private int engineType = IEngineAware.ID_MR_V2;
4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 182) @JsonProperty("storage_type")
4af0f33248 core-cube/src/main/java/org/apache/kylin/cube/model/CubeDesc.java (honma 2015-09-09 10:01:55 +0800 183) private int storageType = IStorageAware.ID_HBASE;
> in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
> 
>
> Key: KYLIN-3636
> URL: https://issues.apache.org/jira/browse/KYLIN-3636
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.4.1
>Reporter: KANG-SEN LU
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.6.0
>
>
> Hi, ShaoFeng:
>  
> Thanks for the reply. I missed this email and did not respond earlier; I am 
> sorry.
>  
> I tried to reproduce this problem with the sample database, and it did not 
> happen.
>  
> So I am hoping that, by collecting enough clues, someone can figure out why 
> this problem occurred.
>  
> --
> I issued the following query at the sample project to exercise the topn 
> aggregation:
>  
> select seller_id, SUM(price) as total from kylin_sales group by seller_id 
> order by total limit 5;
>  
> With my own added debugging, I saw the following log in the kylin.log: (the 
> query worked OK).
>  
> 2018-10-16 16:18:19,963 INFO  [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] 
> model.FunctionDesc:59 : KSL2, 
> getRewriteFieldName=_KY_SUM_KYLIN_SALES_PRICE_
> 2018-10-16 16:18:19,963 INFO  [kylin-coproc--pool12-t1] 
> v2.CubeHBaseEndpointRPC:217 : Query-a747f16f-4b12-cc97-08d2-9b45c27a529f: 
> 

[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception

2018-11-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688096#comment-16688096
 ] 

KANG-SEN LU commented on KYLIN-3636:


I checked my cube design; the engine type is set to 2, i.e. MR.

In addition, in sample.sh, the shell script forces both "engine-type" and 
"storage-type" to 2.

I found that the Kylin code assumes the storage-type is always 2 and therefore 
determines that the rowkey buffer header is 10 bytes long. But for some reason 
the default storage-type was set to 0 when I created a new cube. I had to edit 
the cube's storage-type from 0 to 2 to make the cube build work.

[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception

2018-11-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688068#comment-16688068
 ] 

KANG-SEN LU commented on KYLIN-3636:


Hi, Shaofeng:

 

I would like to add that one may argue there is NO bug in the Kylin code, 
because the problems I faced with the TOPN aggregation function were caused by 
"improper documentation about how to configure TOPN metric". So if we can 
improve the usage documentation, there will be more happy users.

[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception

2018-11-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688039#comment-16688039
 ] 

KANG-SEN LU commented on KYLIN-3636:


Hi, Shaofeng:

I believe it is MR, but I am not sure. How do I know whether MR or Spark is 
being used as the build engine? Is it controlled by kylin.properties? If so, 
which parameter makes this selection?

Kang-sen

> in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception
> 
>
> Key: KYLIN-3636
> URL: https://issues.apache.org/jira/browse/KYLIN-3636
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.4.1
>Reporter: KANG-SEN LU
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.6.0
>
>
> Hi, ShaoFeng:
>  
> Thanks for the reply. I missed this email and did not respond earlier; I am 
> sorry.
>  
> I tried to reproduce this problem with the sample database, and it did not 
> happen.
>  
> So I am hoping that, by collecting enough clues, someone can figure out why 
> this problem occurred.
>  
> --
> I issued the following query at the sample project to exercise the topn 
> aggregation:
>  
> select seller_id, SUM(price) as total from kylin_sales group by seller_id 
> order by total limit 5;
>  
> With my own added debugging, I saw the following log in the kylin.log: (the 
> query worked OK).
>  
> 2018-10-16 16:18:19,963 INFO  [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] 
> model.FunctionDesc:59 : KSL2, 
> getRewriteFieldName=_KY_SUM_KYLIN_SALES_PRICE_
> 2018-10-16 16:18:19,963 INFO  [kylin-coproc--pool12-t1] 
> v2.CubeHBaseEndpointRPC:217 : Query-a747f16f-4b12-cc97-08d2-9b45c27a529f: 
> send request to the init region server anovadata4.anovadata.local on table 
> ANOVA_KYLIN_25X_K758MEAWJG
> 2018-10-16 16:18:19,963 INFO  [Query a747f16f-4b12-cc97-08d2-9b45c27a529f-90] 
> topn.TopNMeasureType:399 : KSL888: in TopNMeasureType.java, sumFieldName= 
> _KY_SUM_KYLIN_SALES_PRICE_
>  
>  
> When I was executing my project query, I issued the following select 
> statement:
>  
> SELECT  ZETTICSDW.A_VL_HOURLY_V.IMSIID \"ZETTICSDW_A_VL_HOURLY_V_IMSIID\", 
> SUM(ZETTICSDW.A_VL_HOURLY_V.SIG_EVENT_COUNT) 
> \"vl_aggs_model___USERS_BY_ERROR_3XX\"  FROM  ZETTICSDW.A_VL_HOURLY_V inner 
> JOIN ZETTICSDW.T_VL_TRANSACTION_RULE_V ON ( 
> ZETTICSDW.A_VL_HOURLY_V.CAUSE_CODE_KEY = 
> ZETTICSDW.T_VL_TRANSACTION_RULE_V.CAUSE_CODE_KEY AND 
> ZETTICSDW.A_VL_HOURLY_V.REASON_CODE_KEY = 
> ZETTICSDW.T_VL_TRANSACTION_RULE_V.REASON_CODE_KEY AND 
> ZETTICSDW.A_VL_HOURLY_V.TRANSACTION_TYPE_KEY = 
> ZETTICSDW.T_VL_TRANSACTION_RULE_V.TRANSACTION_TYPE_KEY) 
> WHERE  ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20180209') AND 
> ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '02') AND 
> (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '03'))) AND 
> ZETTICSDW.T_VL_TRANSACTION_RULE_V.DISPLAY_STRING LIKE '%+3%'  
> GROUP BY  ZETTICSDW.A_VL_HOURLY_V.IMSIID  
> ORDER BY  \"vl_aggs_model___USERS_BY_ERROR_3XX\"
> LIMIT 25
>  
> An exception occurred within the method "private ColumnRowType 
> buildColumnRowType()" of 
> "query/src/main/java/org/apache/kylin/query/relnode/OLAPTableScan.java".
>  
> if (columns.size() != rowType.getFieldCount()) {
>     throw new IllegalStateException("RowType=" + rowType.getFieldCount()
>             + ", ColumnRowType=" + columns.size());
> }
>  
> It printed "RowType=133, ColumnRowType=132".
> The RowType list contains one extra column name: "ANY 
> _KY_SUM_1_3a1aedef_SIG_EVENT_COUNT_".
>  
>  
> I think this has something to do with the bug fix "KYLIN-3359 Support 
> sum(expression) if possible".
>  
> After this bug fix was submitted, I noticed that many column names like 
> "_KY_SUM_XXX" were added to rowType.
>  
> This strange column name "_KY_SUM_1_3a1aedef_SIG_EVENT_COUNT_" is very 
> similar.
>  
> I also found that this extra column name, which exists only in RowType and 
> not in ColumnRowType, was added in the method "public void 
> implementRewrite(RewriteImplementor implementor)" within "OLAPJoinRel.java".
>  
> With my own debug statement, I saw this debug text in kylin.log:
>  
> relnode.OLAPJoinRel:362 : KSL54: newField= #132: 
> _KY_SUM_1_6735969a_SIG_EVENT_COUNT_ ANY
>  
>  
> I hope someone with deeper technical knowledge of the Kylin query engine can 
> figure out what caused the problem I have seen.
>  
> Thanks again.
>  
> Kang-sen
>  
> *From:* ShaoFeng Shi [[mailto:shaofeng...@apache.org]] 
>  *Sent:* Friday, October 05, 2018 9:59 PM
>  *To:* user <[u...@kylin.apache.org|mailto:u...@kylin.apache.org]>
>  *Subject:* Re: any body see topn in kylin 2.5.1 working?
>  
> Hi Kang-Sen,
>  
> Didn't see this; Can you reproduce the problem with the sample cube? 
>  
> Kang-Sen Lu <[k...@anovadata.com|mailto:k...@anovadata.com]> 

[jira] [Commented] (KYLIN-3636) in kylin 2.4.1 and 2.5.1 topn aggregation query caused exception

2018-11-15 Thread KANG-SEN LU (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688005#comment-16688005
 ] 

KANG-SEN LU commented on KYLIN-3636:


After adding some debugging statements, I found two problems that can affect a 
cube design.

 

The first problem is that if a cube's storage type is not 2, the cube build can 
fail with an arraycopy exception.

The second problem is that if a TOPN(SUM())/GROUP-BY() metric is configured in 
a cube, then the SUM() must also be configured as a metric in the same cube. 
Otherwise we may see a "null" exception at query time.

As far as TOPN aggregation support is concerned, the Kylin group wrote a good 
technical blog post, but we also need a description of how to configure TOPN 
in a cube definition. Some hidden restrictions can hamper successful use of 
TOPN support.
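
The second restriction could in principle be caught at design time rather than 
at query time. The following is only a sketch in Java under stated 
assumptions: the Measure holder, the expression names "SUM" and "TOP_N", and 
the method topNMeasuresMissingSum are illustrative and are not taken from 
Kylin's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical design-time check; not part of Kylin.
public class TopNSumValidator {

    // Simplified measure: a name, an aggregation expression, and the measured column.
    static final class Measure {
        final String name;
        final String expression; // assumed values: "SUM", "TOP_N", "COUNT", ...
        final String column;

        Measure(String name, String expression, String column) {
            this.name = name;
            this.expression = expression;
            this.column = column;
        }
    }

    // Returns the names of TOP_N measures whose column has no SUM measure in the same cube.
    static List<String> topNMeasuresMissingSum(List<Measure> measures) {
        Set<String> summedColumns = new HashSet<>();
        for (Measure m : measures) {
            if ("SUM".equals(m.expression)) {
                summedColumns.add(m.column);
            }
        }
        List<String> missing = new ArrayList<>();
        for (Measure m : measures) {
            if ("TOP_N".equals(m.expression) && !summedColumns.contains(m.column)) {
                missing.add(m.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up cube: one TopN on PRICE with a matching SUM, one TopN on QTY without.
        List<Measure> measures = Arrays.asList(
                new Measure("SUM_PRICE", "SUM", "PRICE"),
                new Measure("TOP_SELLER_BY_PRICE", "TOP_N", "PRICE"),
                new Measure("TOP_SELLER_BY_QTY", "TOP_N", "QTY"));
        System.out.println("TopN measures without SUM: " + topNMeasuresMissingSum(measures));
    }
}
```

A check of this kind, run when the cube definition is saved, would turn the 
"null" exception at query time into an explicit configuration warning.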

 
