[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

2018-03-18 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3149:
-
Fix Version/s: v2.4.0

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> 
>
> Key: KYLIN-3149
> URL: https://issues.apache.org/jira/browse/KYLIN-3149
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: hongbin ma
>Assignee: yiming.xu
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: dump.txt
>
>
> for queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case 
> WHEN  '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, 
> ''), 1, 1)
> when  '1030101' = '1030102' then substring(COALESCE(OPS_REGION, 
> ''), 1, 1)  
> when  '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> when  '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
> OLAPLimitRel(ctx=[], fetch=[5])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, 
> ''), 1, 1)], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> however the actual is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[5])
> OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), 
> SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', 
> '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), 
> =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 
> 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, 
> ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as 
> expected. If we dump the internal state of this VolcanoPlanner 
> (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), line 19-21 from the 
> complete dump is attached:
> {code}
>   rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
>   
> rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101',
>  '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), 
> =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, 
> ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT 
> NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), 
> SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), 
> null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative 
> cost={15.0 rows, 25.05 cpu, 0.0 io}
>   
> rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS
>  NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, 
> cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> we see two rels with same cost:  #339 and #348, where #339 is created from 
> LogicalProject = (OLAPProjectRule)=> OLAPProject, and #348 is created from 
> LogicalProject =( ReduceExpressionsRule) => Reduced LogicalProject 
> =(OLAPProjectRule)=> Reduced OLAPProject . Since ReduceExpressionsRule 
> require Logical Project rather than OLAP Project, #339 is never reduced.
> The worse thing is that cost of #339 and #348 are same. By current volcano 
> planner algorithm  the first met rel will be chosen, so unexpected rel is 
> chosen
> A simple approach to fix this is to refine the rel choosing algorithm: when 
> two rels are equal in cost, choose a "simpler" one. Since we don't have a 
> perfect measurement of "simple", we simply choose the rel with smaller 
> toString() length



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

2018-01-03 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-3149:
--
Attachment: dump.txt

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> 
>
> Key: KYLIN-3149
> URL: https://issues.apache.org/jira/browse/KYLIN-3149
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: hongbin ma
> Attachments: dump.txt
>
>
> for queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case 
> WHEN  '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, 
> ''), 1, 1)
> when  '1030101' = '1030102' then substring(COALESCE(OPS_REGION, 
> ''), 1, 1)  
> when  '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> when  '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, 
> ''), 1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
> OLAPLimitRel(ctx=[], fetch=[5])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, 
> ''), 1, 1)], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> however the actual is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[5])
> OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>   OLAPAggregateRel(group=[{0, 1}], ctx=[])
> OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), 
> SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', 
> '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), 
> =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 
> 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, 
> ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>   OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 
> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as 
> expected. If we dump the internal state of this VolcanoPlanner 
> (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), line 19-21 from the 
> complete dump is attached:
> {code}
>   rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
>   
> rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101',
>  '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), 
> =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, 
> ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT 
> NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), 
> SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), 
> null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE 
> "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative 
> cost={15.0 rows, 25.05 cpu, 0.0 io}
>   
> rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS
>  NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, 
> cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> we see two rels with same cost:  #339 and #348, where #339 is created from 
> LogicalProject = (OLAPProjectRule)=> OLAPProject, and #348 is created from 
> LogicalProject =( ReduceExpressionsRule) => Reduced LogicalProject 
> =(OLAPProjectRule)=> Reduced OLAPProject . Since ReduceExpressionsRule 
> require Logical Project rather than OLAP Project, #339 is never reduced.
> The worse thing is that cost of #339 and #348 are same. By current volcano 
> planner algorithm  the first met rel will be chosen, so unexpected rel is 
> chosen
> A simple approach to fix this is to refine the rel choosing algorithm: when 
> two rels are equal in cost, choose a "simpler" one. Since we don't have a 
> perfect measurement of "simple", we simply choose the rel with smaller 
> toString() length



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)