[jira] [Commented] (CALCITE-5769) Optimizing 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'
[ https://issues.apache.org/jira/browse/CALCITE-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730357#comment-17730357 ]

Jinpeng Wu commented on CALCITE-5769:
-------------------------------------

Please note that this transformation is not always valid. For example, cast('abc' as double) may return null even though 'abc' is not null.

> Optimizing 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'
> --------------------------------------------------------
>
> Key: CALCITE-5769
> URL: https://issues.apache.org/jira/browse/CALCITE-5769
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.34.0
> Reporter: xiong duan
> Priority: Major
>
> According to CALCITE-5156, we should support these optimizations:
> * 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'
> * 'CAST(e AS t) IS NULL' to 'e IS NULL'

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
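A minimal sketch of the counterexample in the comment above. It assumes a lenient cast that yields null on malformed input, which is the behaviour the comment describes; the class and method names are hypothetical and this is not Calcite code.

```java
final class LenientCastSketch {
  /**
   * Hypothetical lenient CAST(e AS DOUBLE): returns null when the input
   * is null or is not a valid double literal.
   */
  static Double castToDouble(String e) {
    if (e == null) {
      return null;
    }
    try {
      return Double.valueOf(e.trim());
    } catch (NumberFormatException ex) {
      return null; // malformed input becomes NULL instead of an error
    }
  }

  public static void main(String[] args) {
    String e = "abc";
    boolean original = castToDouble(e) != null; // CAST(e AS DOUBLE) IS NOT NULL
    boolean rewritten = e != null;              // e IS NOT NULL
    System.out.println("original=" + original + ", rewritten=" + rewritten);
  }
}
```

Under this assumption the two predicates disagree for 'abc', so the rewrite would only be safe for casts that can never turn a non-null input into null.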
[jira] [Updated] (CALCITE-4920) Introduce logical space pruning to TopDownRuleDriver
[ https://issues.apache.org/jira/browse/CALCITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinpeng Wu updated CALCITE-4920:
--------------------------------
    Description:
Last year, we submitted a PR introducing the TopDownRuleDriver. The rule driver implements the top-down search strategy suggested by the Cascades framework[1] and provides a basic branch-and-bound pruning mechanism based on upper-bound and lower-bound costs, as suggested by the Columbia paper[2]. However, the previous version of TopDownRuleDriver can only prune implementation rules and enforcement rules, not transformation rules.

The main reason lies in logical properties. In the classic Volcano/Cascades model, logical properties, such as output row count, are bound to an equivalence set and never change during optimization. The Columbia optimizer[2] depends heavily on this premise. Calcite, however, does not obey this rule: the logical properties of a RelSubset are likely to change during optimization. Calcite is not the only optimizer that suffers from this. In Orca, the logical properties of an equivalence set also change, and it cannot do logical pruning either.

How does the logical properties problem prevent logical pruning? Take this plan as an example: sink <- op1 <- op2 <- scan. By applying a transformation rule, op1 <- op2 is transformed to op3 <- op4. So we get a new alternative plan, sink <- op3 <- op4 <- scan, in which op3 is in the same equivalence set as op1. After implementations and enforcements, the sub-plan (op1 <- op2 <- scan) gets fully optimized and yields a winner with cost C1. Now we are going to optimize op3. We know that another plan in the same equivalence set has a cost of C1, so we can use C1 as a cost limit while optimizing op3. As the first step, we build op3 into a physical plan, say impl-op3, and compute its self-cost SC3. Ideally, if SC3 is already greater than C1, we can decide that op3 will never be part of the best plan, and the optimization of op4 can be skipped. That is the basic idea of group pruning in the Columbia optimizer[2].

Here comes the problem: when we calculate the self-cost of impl-op3, we need to leverage the metadata of impl-op3, such as row count, which in turn asks impl-op3's input to derive its own metadata. However, the equivalence set of op4 is not yet fully explored, and its row count may not be final. So the self-cost of impl-op3 may be incorrect. If we simply apply group pruning according to such a cost, op4 will lose its opportunity to be explored, and with it the opportunity to become part of the best plan. To ensure correctness, we require that all descendants be fully explored before a node's cost is calculated. That is why our first version of TopDownRuleDriver only prunes implementation rules and enforcement rules.

Over the past year, we tried several ways to solve the problem. For example, we tried to make Calcite's logical properties stable, as Xiening proposed. But the proposal was rejected because changes of metadata after transformations are natural. We also tried to identify, by categories or annotations, rules that never change the logical properties, and to give up pruning for the other rules. But this also failed because it introduced too much complexity for rule designers.

Those failures drove us to reconsider the problem from its essence: if we cannot make SC3 stable, what if we give up using SC3 and leverage other costs for pruning?

Here is a simple description of the new idea. After obtaining C1, we eagerly build op3 and op4 without further exploring them. Because op4's input, the scan, was fully optimized during the optimization of op1, we can compute a stable cumulative cost of impl-op4. Let's denote it as C4. If we find that C4 is already greater than C1, then we know impl-op4 will never be part of the best plan, and some optimization steps can be skipped (for simplicity, let impl-op4 be the only input of impl-op3):
1. The enforcement rules among impl-sink, impl-op3 and impl-op4, as well as trait pass-through. These steps were not handled properly in the previous version.
2. The trait derivation of impl-op4 and impl-op3.
3. The exploration of op3, if the substitutions produced by exploration always use op4 as input. This is the key to logical pruning. I will explain it in more detail later on.

Note that the exploration of op4 is not pruned, as we don't know whether op4's other alternatives would yield a lower cost. Moreover, the implementation of op3 is not skipped, as it has already been applied. But the implementation of other alternatives of op3 can be skipped if the exploration is pruned.

The new solution is a hybrid of top-down and bottom-up optimization. Optimization requests with cost limits are passed down in a top-down manner, while cost propagation and pruning take place in a bottom-up manner. And it ensures
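
The pruning decision described in this issue (compare the stable cumulative cost C4 against the cost limit C1) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: costs are plain doubles rather than Calcite cost objects, and the class, enum and method names are hypothetical and do not come from TopDownRuleDriver.

```java
import java.util.EnumSet;
import java.util.Set;

final class LogicalPruningSketch {
  /** Steps named in the description. EXPLORE_OP4 is listed only to show it is never skipped. */
  enum Step { ENFORCEMENT_AND_PASS_THROUGH, TRAIT_DERIVATION, EXPLORE_OP3, EXPLORE_OP4 }

  /**
   * Returns the steps that may be skipped.
   *
   * @param c1 cost of the winner already found in op3's equivalence set (the cost limit)
   * @param c4 cumulative cost of impl-op4, stable because its input is fully optimized
   * @param op3SubstitutionsAlwaysUseOp4 whether every substitution of op3 keeps op4 as input
   */
  static Set<Step> skippableSteps(double c1, double c4, boolean op3SubstitutionsAlwaysUseOp4) {
    Set<Step> skipped = EnumSet.noneOf(Step.class);
    if (c4 <= c1) {
      return skipped; // impl-op4 may still be part of the best plan: prune nothing
    }
    skipped.add(Step.ENFORCEMENT_AND_PASS_THROUGH);
    skipped.add(Step.TRAIT_DERIVATION);
    if (op3SubstitutionsAlwaysUseOp4) {
      skipped.add(Step.EXPLORE_OP3);
    }
    // EXPLORE_OP4 is never added: other alternatives of op4 may be cheaper.
    return skipped;
  }

  public static void main(String[] args) {
    System.out.println(skippableSteps(100.0, 120.0, true));
    System.out.println(skippableSteps(100.0, 80.0, true));
  }
}
```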
[jira] [Created] (CALCITE-4920) Introduce logical space pruning to TopDownRuleDriver
Jinpeng Wu created CALCITE-4920:
-----------------------------------

Summary: Introduce logical space pruning to TopDownRuleDriver
Key: CALCITE-4920
URL: https://issues.apache.org/jira/browse/CALCITE-4920
Project: Calcite
Issue Type: Improvement
Reporter: Jinpeng Wu
Assignee: Jinpeng Wu
[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.
[ https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250183#comment-17250183 ]

Jinpeng Wu commented on CALCITE-4432:
-------------------------------------

I have no further ideas at the moment. I know that [~hyuan] spent some time on this problem. Maybe [~hyuan] can share some findings.

> When the RelNode's input is the same subset as the node belonged to, not choose this node as best.
> --------------------------------------------------------------------------------------------------
>
> Key: CALCITE-4432
> URL: https://issues.apache.org/jira/browse/CALCITE-4432
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Ziwei Liu
> Assignee: Ziwei Liu
> Priority: Major
>
> If a subset has a cyclic node, the node's input is this subset itself. If the beset

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.
[ https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248795#comment-17248795 ]

Jinpeng Wu commented on CALCITE-4432:
-------------------------------------

Hi, Julian. I think the top-down rule driver may be a general solution for this problem:
# Transformation rules that lead to set merging are generally fired before implementation rules.
# During the implementation/optimization phase, optimization stops immediately when a cycle is detected.

So cyclic nodes should have no chance to become the best of their RelSubset.
[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.
[ https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247170#comment-17247170 ]

Jinpeng Wu commented on CALCITE-4432:
-------------------------------------

For example, suppose there is a RelSubset A whose best is X, and RelSubset B, whose best is Y, is the input subset of X. When A is merged with B, A's best should be replaced by Y, because X's cost should always be greater than Y's cost. This bug is triggered when X's cost is not greater than Y's cost.

There are all kinds of reasons why X's cost may fail to be greater than Y's cost. For example, X's selfCost underflows or X's totalCost overflows. But these are issues of the cost model, not of Calcite core.
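
To illustrate how X's cost can fail to exceed Y's cost, here is a small sketch using a hypothetical scalar cost model (total cost = self cost + input's total cost, stored as a double). Calcite's real cost objects are richer than this; the names below are assumptions made only for the example.

```java
final class CostAbsorptionSketch {
  /** Hypothetical scalar model: a node's total cost is its self-cost plus its input's total cost. */
  static double totalCost(double selfCost, double inputTotalCost) {
    return selfCost + inputTotalCost;
  }

  public static void main(String[] args) {
    double y = 1e10; // total cost of Y, the best of the input subset

    // Normal case: X = self + Y is strictly greater than Y.
    System.out.println(totalCost(5.0, y) > y);

    // Self-cost too small to register: it is lost to floating-point rounding,
    // so X's total cost equals Y's and X is no longer strictly more expensive.
    System.out.println(totalCost(1e-10, y) > y);

    // Saturated total: once the input's cost is infinite, adding to it changes nothing.
    double inf = Double.POSITIVE_INFINITY;
    System.out.println(totalCost(5.0, inf) > inf);
  }
}
```

In the last two cases a strict "X costs more than Y" comparison fails even though X sits on top of Y, which is the situation the comment attributes to the cost model.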
[jira] [Commented] (CALCITE-4360) Apply SubstitutionRule first in top-down driven rule apply
[ https://issues.apache.org/jira/browse/CALCITE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221928#comment-17221928 ]

Jinpeng Wu commented on CALCITE-4360:
-------------------------------------

Yes. This looks like a typo in the previous commit. Thanks for fixing it.

> Apply SubstitutionRule first in top-down driven rule apply
> ----------------------------------------------------------
>
> Key: CALCITE-4360
> URL: https://issues.apache.org/jira/browse/CALCITE-4360
> Project: Calcite
> Issue Type: Improvement
> Reporter: Chunwei Lei
> Assignee: Chunwei Lei
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2020-10-27-21-55-55-155.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In the current implementation, TopDownRuleQueue wrongly adds substitution rules at the end. The SubstitutionRule should be executed first.
> !image-2020-10-27-21-55-55-155.png!
[jira] [Updated] (CALCITE-4050) Traits Propagation for EnumerableMergeJoin Produces Incorrect Result
[ https://issues.apache.org/jira/browse/CALCITE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinpeng Wu updated CALCITE-4050:
--------------------------------
    Description:
In EnumerableMergeJoin's deriveTraits method, it uses a Map to record mapping from left keys to right keys (the keyMap variable). However, the left keys could have duplicate entries.

One example is JdbcTest.testJoinInCorrelatedSubQuery, the expected plan is

EnumerableProject(deptno=[$0], name=[$1], employees=[$2], location=[$3])
  EnumerableMergeJoin(condition=[AND(=($0, $5), =($0, $4))], joinType=[inner])
    EnumerableSort(sort0=[$0], dir0=[ASC])
      EnumerableTableScan(table=[[hr, depts]])
    EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
      ...

where left keys are [0, 0], and right keys are [1, 0]. Deriving right child's traits may result in incorrect output.

> Traits Propagation for EnumerableMergeJoin Produces Incorrect Result
> --------------------------------------------------------------------
>
> Key: CALCITE-4050
> URL: https://issues.apache.org/jira/browse/CALCITE-4050
> Project: Calcite
> Issue Type: Bug
> Reporter: Jinpeng Wu
> Priority: Major
[jira] [Created] (CALCITE-4050) Traits Propagation for EnumerableMergeJoin Produces Incorrect Result
Jinpeng Wu created CALCITE-4050:
-----------------------------------

Summary: Traits Propagation for EnumerableMergeJoin Produces Incorrect Result
Key: CALCITE-4050
URL: https://issues.apache.org/jira/browse/CALCITE-4050
Project: Calcite
Issue Type: Bug
Reporter: Jinpeng Wu
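
To illustrate the duplicate-left-key problem reported in CALCITE-4050, the sketch below feeds the key lists from the issue (left keys [0, 0], right keys [1, 0]) into a single-valued map and into a one-to-many map. It is not the actual deriveTraits code, and the one-to-many variant is only one conceivable remedy, not necessarily the fix that was adopted; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateLeftKeysSketch {
  /** Single-valued map: a later pair with the same left key overwrites the earlier one. */
  static Map<Integer, Integer> singleValued(int[] leftKeys, int[] rightKeys) {
    Map<Integer, Integer> keyMap = new HashMap<>();
    for (int i = 0; i < leftKeys.length; i++) {
      keyMap.put(leftKeys[i], rightKeys[i]);
    }
    return keyMap;
  }

  /** One-to-many map: every right key paired with a left key is retained, in order. */
  static Map<Integer, List<Integer>> multiValued(int[] leftKeys, int[] rightKeys) {
    Map<Integer, List<Integer>> keyMap = new HashMap<>();
    for (int i = 0; i < leftKeys.length; i++) {
      keyMap.computeIfAbsent(leftKeys[i], k -> new ArrayList<>()).add(rightKeys[i]);
    }
    return keyMap;
  }

  public static void main(String[] args) {
    int[] left = {0, 0};
    int[] right = {1, 0};
    System.out.println(singleValued(left, right)); // {0=0}: the pairing 0 -> 1 is lost
    System.out.println(multiValued(left, right));  // {0=[1, 0]}
  }
}
```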
[jira] [Commented] (CALCITE-3997) Problem with MERGE JOIN: java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
[ https://issues.apache.org/jira/browse/CALCITE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108725#comment-17108725 ]

Jinpeng Wu commented on CALCITE-3997:
-------------------------------------

[~rubenql] I also think that physical transformation rules usually lead to duplicate rule firings and should be avoided. In your case,

LogicalProject -> EnumerableCalc
LogicalProject -> LogicalCalc -> EnumerableCalc

maybe what you need is a ProjectMergeRule, not EnumerableCalcMergeRule.

> Problem with MERGE JOIN: java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
> -----------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-3997
> URL: https://issues.apache.org/jira/browse/CALCITE-3997
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.23.0
> Reporter: Enrico Olivelli
> Priority: Blocker
> Fix For: 1.23.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> I have a couple of problems with HerdDB.
> 1) JOIN order unsorted columns in presence of a WHERE over other columns
> This is my case:
> CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
> CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
> SELECT t1.k1 as first, t2.k1 as second
> FROM tblspace1.table1 t1
> INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
> WHERE t1.n1 + 1 = t2.n3
> In this case for table1 and table3 no column is physically sorted (no column with a collation)
> I have this Planner error:
> java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
> at org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
> at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
> at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
> at GeneratedMetadataHandler_Collation.collations(Unknown Source)
> at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
> at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
> at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
> at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
> at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
> at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
> at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
> at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
> at org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
> at org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
> at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
> at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292)
> If I remove the "WHERE" clause then no error is reported.
> we have many other test cases about JOINs and this one is the only one that fails
> This is the failing test case on HerdDB
> https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
> We are using the default set of rules Programs.ofRules(Programs.RULE_SET)
[jira] [Comment Edited] (CALCITE-3916) Support cascades style top-down driven rule apply
[ https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100596#comment-17100596 ]

Jinpeng Wu edited comment on CALCITE-3916 at 5/6/20, 9:35 AM:
--------------------------------------------------------------

PR: https://github.com/apache/calcite/pull/1950

There might be two ways to accomplish this. The first is to design another Planner, while the second is to modify the VolcanoPlanner directly and make sure it won't break the current logic. The pros and cons are discussed here:
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E

The code in this PR is currently on the first track, because I am still trying some aggressive optimizations. If keeping a single VolcanoPlanner is the consensus, it's definitely possible to combine this PR with VolcanoPlanner.

was (Author: fatlittle):
PR: https://github.com/apache/calcite/pull/1950

There might be two ways to accomplish this. The first is to design another Planner, while the second is to modify the VolcanoPlanner directly and make sure it won't break the current logic. The pros and cons are discussed here:
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E

My code is currently on the first track. For now it should not be difficult to switch to the second one. However, I am still trying some aggressive optimizations, so I am not going to take the second way unless many people insist.

Thanks

> Support cascades style top-down driven rule apply
> -------------------------------------------------
>
> Key: CALCITE-3916
> URL: https://issues.apache.org/jira/browse/CALCITE-3916
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Haisheng Yuan
> Assignee: Jinpeng Wu
> Priority: Major
>
> Apply rules by leaf RelSet -> root RelSet order. For every RelNode in a RelSet, rule is matched and applied sequentially. No RuleQueue and DeferringRuleCall is needed anymore. This will make space pruning and rule mutual exclusivity check possible.
> Rule that use AbstractConverter as operand is an exception, to keep backward compatibility, this kind of rule still needs top-down apply.
> This should be done after CALCITE-3896.
[jira] [Commented] (CALCITE-3916) Support cascades style top-down driven rule apply
[ https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100596#comment-17100596 ]

Jinpeng Wu commented on CALCITE-3916:
-------------------------------------

PR: https://github.com/apache/calcite/pull/1950

There might be two ways to accomplish this. The first is to design another Planner, while the second is to modify the VolcanoPlanner directly and make sure it won't break the current logic. The pros and cons are discussed here:
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E

My code is currently on the first track. For now it should not be difficult to switch to the second one. However, I am still trying some aggressive optimizations, so I am not going to take the second way unless many people insist.

Thanks
[jira] [Assigned] (CALCITE-3916) Support cascades style top-down driven rule apply
[ https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinpeng Wu reassigned CALCITE-3916:
-----------------------------------

Assignee: Jinpeng Wu (was: Haisheng Yuan)
[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100463#comment-17100463 ]

Jinpeng Wu commented on CALCITE-3963:
-------------------------------------

I think we all agree that RelNodes in a RelSet should share the same logical properties. The difference is in how to achieve this.

I agree with Julian that MetadataQuery is a good design for propagating logical properties to new RelNodes. Storing a concrete value associated with a RelSet requires complicated logic to maintain and invalidate the cached value. If some logic is considered flawed, it is a bug in the metadata handler. It should be the metadata handler's job to ensure that logical properties are consistent across the RelSet.

Haisheng mentioned that we have to decide when this value can be used for logical space pruning. I think we can add a state field to RelSet, for example EXPLORED or SUBSTITUTION_APPLIED. The MetadataHandler can also leverage this state in its logic. The state needs to be invalidated when RelSets are merged, but that should be much simpler than storing a concrete metadata result.

This strategy is somewhat like combining option one and option two. When a new RelNode is registered into a RelSet, logical properties are recomputed because the cache in RelMetadataQuery is invalidated. The value cannot be used for logical space pruning until the RelSet is in a suitable state. And how do we decide the state? It may be difficult now, but it is much simpler with a top-down rule-applying strategy.

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
> Key: CALCITE-3963
> URL: https://issues.apache.org/jira/browse/CALCITE-3963
> Project: Calcite
> Issue Type: Bug
> Reporter: Xiening Dai
> Assignee: Xiening Dai
> Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc) are maintained at RelNode level. This creates a number of meta data consistency problems, e.g. CALCITE-1048, CALCITE-2166.
> In theory, all RelNodes in a RelSet should share the same logical properties per definition of relational equivalence. So it makes more sense to keep logical properties at RelSet level, rather than the RelNode. And such properties shouldn't change when new sub set is created or subset's best is changed.
> Specifically I think the built-in metadata below should fall into the logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)
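
The state-field idea from the comment on CALCITE-3963 could look roughly like the sketch below. Everything here is hypothetical: the EquivalenceSet class merely stands in for RelSet, the UNEXPLORED state is added for illustration, and treating only EXPLORED as safe for pruning is an assumption rather than something the comment specifies.

```java
final class RelSetStateSketch {
  /** EXPLORED and SUBSTITUTION_APPLIED are the examples from the comment; UNEXPLORED is assumed. */
  enum State { UNEXPLORED, SUBSTITUTION_APPLIED, EXPLORED }

  /** Hypothetical stand-in for a RelSet; only the proposed state field is modeled. */
  static final class EquivalenceSet {
    private State state = State.UNEXPLORED;

    /** Moves the state forward; requests to move backward are ignored. */
    void advanceTo(State next) {
      if (next.ordinal() > state.ordinal()) {
        state = next;
      }
    }

    /** Assumption: metadata is trusted for logical pruning only once the set is fully explored. */
    boolean usableForLogicalPruning() {
      return state == State.EXPLORED;
    }

    /** Merging invalidates the state of both sets, since new members may change the metadata. */
    void mergeWith(EquivalenceSet other) {
      this.state = State.UNEXPLORED;
      other.state = State.UNEXPLORED;
    }
  }

  public static void main(String[] args) {
    EquivalenceSet a = new EquivalenceSet();
    a.advanceTo(State.EXPLORED);
    System.out.println(a.usableForLogicalPruning());
    a.mergeWith(new EquivalenceSet());
    System.out.println(a.usableForLogicalPruning());
  }
}
```

Only a small enum has to be reset on merge, which is the simplification the comment argues for compared with caching concrete metadata values per set.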
[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085662#comment-17085662 ] Jinpeng Wu commented on CALCITE-3896: - > Is this the one of the physical plan after applying all the physical rules? Yes, Danny. But this plan somehow depends on the passThrough framework. A plan must be fired as a candidate before it can win the competition under the cost model. I was asking where Project5 can get fired. I am not against the idea itself. Even without considering my comments, this proposal is still a big improvement to the current Calcite framework. I was just raising some lessons I have learnt to make it even better. > Pass through parent trait requests to child operators > - > > Key: CALCITE-3896 > URL: https://issues.apache.org/jira/browse/CALCITE-3896 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Haisheng Yuan >Priority: Major > > This is not on-demand trait requests as described in [mailing > list|http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e], > which requires the overhaul of the core planner. This ticket tries to enable > VolcanoPlanner with a basic and minimal ability to pass through parent trait > requests to child operators without rules. It may not be flexible or > powerful, but it should be able to work with current Calcite applications with > minimal changes. > The method for physical operators to implement would be: > {code:java} > interface RelNode { > RelNode passThrough(RelTraitSet required); > } > {code} > Given that Calcite's physical operators decide their child operators' traits > when the physical operator is created in a physical implementation rule, there > are some drawbacks that can't be avoided.
e.g., given the following plan: > {code:java} > StreamAgg on [a] >+-- MergeJoin on [a, b, c] >|--- TableScan foo >+--- TableScan bar > {code} > Suppose the MergeJoin implementation rule generates several merge joins that > distribute by [a], [a,b], [a,b,c] separately. Then we pass the parent operator > StreamAgg's trait request to MergeJoin. Since MergeJoin[a] satisfies the parent's > request, there is nothing to do. Next we pass the request to MergeJoin[a,b] and get > MergeJoin[a]; then we pass the request to MergeJoin[a,b,c] and get MergeJoin[a] > again. We know they are redundant and there is no need to pass through the parent > operator's trait request, but these MergeJoin operators are independent and > agnostic of each other's existence. > The ideal way is that in a physical implementation rule, during the creation of > a physical operator, it should not care about its own and its child operators' > physical traits. But this is a different topic. > Anyway, better than nothing: once it is done, we can provide the option to > obsolete or disable {{AbstractConverter}}, but still be able to do property > enforcement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
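The redundancy described in the MergeJoin example can be reproduced with a small, self-contained sketch. Everything here is an assumption made for illustration: ToyMergeJoin, its passThrough method, and the "exact match means nothing to do" rule are hypothetical stand-ins, not Calcite classes or the behaviour proposed in the ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the example; none of these classes are Calcite classes. */
class PassThroughRedundancyDemo {

  /** A toy physical merge join identified only by the keys it is distributed by. */
  static final class ToyMergeJoin {
    final List<String> distributionKeys;

    ToyMergeJoin(String... keys) {
      this.distributionKeys = Arrays.asList(keys);
    }

    /**
     * Toy pass-through: returns null when this node already delivers the
     * required keys, otherwise derives a new node that delivers them.
     */
    ToyMergeJoin passThrough(List<String> required) {
      if (distributionKeys.equals(required)) {
        return null; // nothing to do
      }
      return new ToyMergeJoin(required.toArray(new String[0]));
    }

    @Override public String toString() {
      return "MergeJoin" + distributionKeys;
    }
  }

  /** Passes the request to every alternative and collects what comes back. */
  static List<String> derive(List<ToyMergeJoin> alternatives, List<String> required) {
    List<String> derived = new ArrayList<>();
    for (ToyMergeJoin join : alternatives) {
      ToyMergeJoin result = join.passThrough(required);
      if (result != null) {
        derived.add(result.toString());
      }
    }
    return derived;
  }

  public static void main(String[] args) {
    List<ToyMergeJoin> alternatives = Arrays.asList(
        new ToyMergeJoin("a"),
        new ToyMergeJoin("a", "b"),
        new ToyMergeJoin("a", "b", "c"));
    List<String> derived = derive(alternatives, Arrays.asList("a"));
    // Each alternative answers independently, so the same node is derived twice.
    Set<String> distinct = new LinkedHashSet<>(derived);
    System.out.println("derived:  " + derived);
    System.out.println("distinct: " + distinct);
  }
}
```

In this toy model the duplicates are only visible to the caller that collects the results, which mirrors the point that the individual MergeJoin operators are agnostic of each other's existence.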
[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085500#comment-17085500 ] Jinpeng Wu commented on CALCITE-3896: - And I hope this "won't and shouldn't" can be enforced by the interface, not just noted in the Javadoc. For example: interface RelNode { Pair passThrough(RelTraitSet required); } It allows implementations to return only the RelTraitSets of Project3 and AbstractConverter2 in your example.
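The idea of enforcing "won't and shouldn't generate new child physical operators" through the return type can be sketched as follows. This is only an illustration under stated assumptions: ToyTraitSet, TraitPair, ToyPhysicalNode and ToyProject are hypothetical names, and the element types of the Pair in the comment above are not given in the source, so the pair of (delivered traits, traits required of the input) used here is a guess at the intent.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Calcite. */
class TraitOnlyPassThroughDemo {

  /** Stand-in for a trait set: just an ordered list of sort keys. */
  static final class ToyTraitSet {
    final List<String> keys;

    ToyTraitSet(String... keys) {
      this.keys = Collections.unmodifiableList(Arrays.asList(keys));
    }

    @Override public String toString() {
      return keys.toString();
    }
  }

  /** The answer to a request: traits the node would deliver and traits it asks of its input. */
  static final class TraitPair {
    final ToyTraitSet delivered;
    final ToyTraitSet requiredOfInput;

    TraitPair(ToyTraitSet delivered, ToyTraitSet requiredOfInput) {
      this.delivered = delivered;
      this.requiredOfInput = requiredOfInput;
    }
  }

  /** The return type carries traits only, so an implementation cannot hand back new operators. */
  interface ToyPhysicalNode {
    /** Returns null when the request cannot be passed through. */
    TraitPair passThrough(ToyTraitSet required);
  }

  /** A toy project that forwards some columns unchanged and can push a request on them down as-is. */
  static final class ToyProject implements ToyPhysicalNode {
    final List<String> forwardedColumns;

    ToyProject(String... forwardedColumns) {
      this.forwardedColumns = Arrays.asList(forwardedColumns);
    }

    @Override public TraitPair passThrough(ToyTraitSet required) {
      if (!forwardedColumns.containsAll(required.keys)) {
        return null; // the request refers to a column this project does not forward
      }
      return new TraitPair(required, required);
    }
  }

  public static void main(String[] args) {
    ToyProject project = new ToyProject("a", "b");
    TraitPair ok = project.passThrough(new ToyTraitSet("a"));
    System.out.println("delivered=" + ok.delivered + ", requiredOfInput=" + ok.requiredOfInput);
    System.out.println("unsupported request -> " + project.passThrough(new ToyTraitSet("z")));
  }
}
```

With a signature of this shape, building the actual operators from the returned traits would be left to the planner rather than to each implementation.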
[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085494#comment-17085494 ] Jinpeng Wu edited comment on CALCITE-3896 at 4/17/20, 7:07 AM: --- >> When passing through parent requests, it won't and shouldn't generate new >> child physical operators So how would a candidate such as Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2 be generated? This is most likely the best plan. was (Author: fatlittle): When passing through parent requests, it won't and shouldn't generate new child physical operators - So how to generate such candidate: Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2. This is most possible the best plan.
[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085494#comment-17085494 ] Jinpeng Wu commented on CALCITE-3896: - > When passing through parent requests, it won't and shouldn't generate new child physical operators So how would a candidate such as Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2 be generated? This is most likely the best plan.
[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082903#comment-17082903 ] Jinpeng Wu commented on CALCITE-3896: - Hi, [~hyuan].
# Got it.
# For example, some rule may decide that a logical agg fires the one-phase agg candidate only when its input is small enough, or when, by looking at its input, it finds that the input is already distributed by the group keys. Admittedly, this is not a very good example; I am just wondering whether there may be some exceptions.
# An actual case that I have come across: AC<-Project(With RexCall) could generate
## candidate 1: NONE. It is better when the calls generate data of smaller size (like extracting a small part of the data from a big JSON).
## candidate 2: Project(With RexCall)<-AC. Better when AC perfectly matches the children's delivered traits.
## candidate 3: Project(With RexCalls)<-AC<-Project(Column Pruning Only). Better when column pruning is available.
## candidate 4: Project(Other RexCalls)<-AC<-Project(Containing part of the RexCalls that may shrink data size). We don't have an exact cost model here, so this candidate may produce multiple results that could possibly be the best.
[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812 ] Jinpeng Wu edited comment on CALCITE-3896 at 4/14/20, 6:02 AM: --- Hi, [~hyuan]. Useful idea! Are you intending to use this to replace AC? I think we need to consider several cases before this:
1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a], in order to avoid redundancy? Enforcing only needs to generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, and thus a different cost.
2. This requires applying all transformation rules and implementation rules before enforcing. So implementation rules cannot decide which candidates are valid according to the input's delivered traits.
3. The method passThrough could generate multiple candidates or no candidates.
was (Author: fatlittle): Hi, [~hyuan] . Useful idea! Are you intending to use this to replace AC? I think we need to consider several cases before this: 1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a] in order to avoid redundancy. Enforcing needs only generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, thus different cost. 2. This requires applying all transformation rules and implementation rules before enforcing. So implementation rules can not decide which candidate is valid or not according to the input's delivering traits. 3. The method passThough could generate more than multiple candidates or none candidates
[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812 ] Jinpeng Wu edited comment on CALCITE-3896 at 4/8/20, 4:37 AM: -- Hi, [~hyuan]. Useful idea! Are you intending to use this to replace AC? I think we need to consider several cases before this:
1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a], in order to avoid redundancy? Enforcing only needs to generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, and thus a different cost.
2. This requires applying all transformation rules and implementation rules before enforcing. So implementation rules cannot decide which candidates are valid according to the input's delivered traits.
3. The method passThrough could generate multiple candidates or no candidates.
was (Author: fatlittle): Hi, [~hyuan] . Looks like some useful change. Are you intending to use this to replace AC? I think we need to consider several cases before this: 1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a] in order to avoid redundancy. Enforcing needs only generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, thus different cost. 2. This requires applying all transformation rules and implementation rules before enforcing. So implementation rules can not decide which candidate is valid or not according to the input traits to. 3. The method passThough could generate more than multiple candidates or none candidates
[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812 ] Jinpeng Wu edited comment on CALCITE-3896 at 4/8/20, 4:34 AM: -- Hi, [~hyuan]. Looks like a useful change. Are you intending to use this to replace AC? I think we need to consider several cases before this:
1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a], in order to avoid redundancy? Enforcing only needs to generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, and thus a different cost.
2. This requires applying all transformation rules and implementation rules before enforcing. So implementation rules cannot decide which candidates are valid according to the input traits.
3. The method passThrough could generate multiple candidates or no candidates.
was (Author: fatlittle): Hi, [~hyuan] . Looks like some useful change. Are you intending to use this to replace AC? I think we need to consider several cases before this: 1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a] in order to avoid redundancy. Enforcing needs only generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, thus different cost. 2. What if the implementation rule generating MergeJoin[a] comes after passThrough [a] to MergeJoin[a,b] 3. The method passThough could generate more than multiple candidates or none candidates
[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators
[ https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812 ] Jinpeng Wu commented on CALCITE-3896: - Hi, [~hyuan]. Looks like a useful change. Are you intending to use this to replace AC? I think we need to consider several cases before this:
1. How can the planner know that passing request [a] to MergeJoin[a,b] will generate exactly the same MergeJoin[a], in order to avoid redundancy? Enforcing only needs to generate an output that satisfies [a], not exactly [a]. Or it could be another MergeJoin[a] with different inputs, and thus a different cost.
2. What if the implementation rule generating MergeJoin[a] fires after [a] has been passed through to MergeJoin[a,b]?
3. The method passThrough could generate multiple candidates or no candidates.
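Points 1 and 3 of the comment above can be illustrated with a small sketch. It rests on assumptions that are not in the ticket: traits are modelled as plain ordered key lists, "satisfies" is modelled as a prefix test (as for a sort order, where data sorted by [a, b] is also sorted by [a]), and the names SatisfiesVersusExactDemo, satisfies and candidatesFor are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; traits are modelled as ordered key lists, not Calcite trait sets. */
class SatisfiesVersusExactDemo {

  /**
   * Assumed prefix semantics: the delivered keys satisfy the request when
   * they start with the required keys, even if they are not equal to them.
   */
  static boolean satisfies(List<String> delivered, List<String> required) {
    return delivered.size() >= required.size()
        && delivered.subList(0, required.size()).equals(required);
  }

  /**
   * Toy pass-through that returns every derivable key list satisfying the
   * request, so the result may hold several candidates or none at all.
   */
  static List<List<String>> candidatesFor(
      List<List<String>> derivable, List<String> required) {
    List<List<String>> candidates = new ArrayList<>();
    for (List<String> keys : derivable) {
      if (satisfies(keys, required)) {
        candidates.add(keys);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<List<String>> derivable = Arrays.asList(
        Arrays.asList("a"),
        Arrays.asList("a", "b"),
        Arrays.asList("b", "a"));
    System.out.println("request [a]: " + candidatesFor(derivable, Arrays.asList("a")));
    System.out.println("request [c]: " + candidatesFor(derivable, Arrays.asList("c")));
  }
}
```

Because more than one key list can satisfy the same request, a planner comparing results by exact equality would not recognise them as answers to the same request, which is the concern raised in point 1.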