[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
[ https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Liu updated KYLIN-3149:
-----------------------------
    Fix Version/s: v2.4.0

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-3149
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3149
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.2.0
>            Reporter: hongbin ma
>            Assignee: yiming.xu
>            Priority: Major
>             Fix For: v2.4.0
>
>         Attachments: dump.txt
>
>
> For queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case
>     WHEN '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, ''), 1, 1)
>     when '1030101' = '1030102' then substring(COALESCE(OPS_REGION, ''), 1, 1)
>     when '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
>     when '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, ''), 1, 1)
>     end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>     OLAPLimitRel(ctx=[], fetch=[5])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> However, the actual plan is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[5])
>     OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> It looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE is not working as
> expected. If we dump the internal state of the VolcanoPlanner
> (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), lines 19-21 of the
> complete dump (attached) are:
> {code}
> rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
>   rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, ''), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, ''), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
>   rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> We see two rels with the same cost, #339 and #348:
> - #339 is created from LogicalProject =(OLAPProjectRule)=> OLAPProject
> - #348 is created from LogicalProject =(ReduceExpressionsRule)=> Reduced
>   LogicalProject =(OLAPProjectRule)=> Reduced OLAPProject
> Since ReduceExpressionsRule requires a LogicalProject rather than an
> OLAPProject, #339 is never reduced. Worse, #339 and #348 have the same cost.
> With the current Volcano planner algorithm, the first rel encountered is
> chosen, so the unexpected (unreduced) rel is chosen.
> A simple approach to fix this is to refine the rel-choosing algorithm: when
> two rels are equal in cost, choose the "simpler" one. Since we don't have a
> perfect measurement of "simple", we simply choose the rel with the smaller
> toString() length.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
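The tie-breaking idea proposed in the description can be illustrated with a small standalone sketch. This is not Kylin's or Calcite's actual code: the `Candidate` class, the `chooseBest` method, and the single-number cost are hypothetical simplifications (Calcite's real cost has rows, cpu, and io components), and the first digest is abbreviated from the dump above. It only shows the rule "on equal cost, prefer the shorter toString()".

```java
import java.util.Arrays;
import java.util.List;

final class TieBreakSketch {

    /** Hypothetical stand-in for a planner candidate: an id, a cost, and its printed form. */
    static final class Candidate {
        final String id;
        final double cost;   // simplification: one number instead of {rows, cpu, io}
        final String digest; // what toString() would print for the rel

        Candidate(String id, double cost, String digest) {
            this.id = id;
            this.cost = cost;
            this.digest = digest;
        }

        @Override
        public String toString() {
            return digest;
        }
    }

    /**
     * Picks the cheapest candidate. When two candidates have equal cost,
     * the one with the shorter toString() replaces the current best, so the
     * result no longer depends on which candidate was registered first.
     */
    static Candidate chooseBest(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            boolean cheaper = best != null && c.cost < best.cost;
            boolean simplerAtSameCost = best != null
                    && c.cost == best.cost
                    && c.toString().length() < best.toString().length();
            if (best == null || cheaper || simplerAtSameCost) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Digests modeled on rel#339 (unreduced, abbreviated here) and rel#348 (reduced).
        Candidate unreduced = new Candidate("rel#339", 15.0,
                "$f0=CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1), null))");
        Candidate reduced = new Candidate("rel#348", 15.0,
                "$f0=SUBSTRING(CASE(IS NOT NULL($9), $9, ''), 1, 1)");

        // The unreduced rel comes first, as in the dump, yet the reduced one is chosen.
        System.out.println(chooseBest(Arrays.asList(unreduced, reduced)).id); // prints rel#348
    }
}
```

String length is only a rough proxy for simplicity, as the description itself notes; the sketch keeps cost as the primary criterion and uses length purely to break ties.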
[jira] [Updated] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
[ https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hongbin ma updated KYLIN-3149:
------------------------------
    Attachment: dump.txt

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-3149
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3149
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.2.0
>            Reporter: hongbin ma
>         Attachments: dump.txt