[jira] [Commented] (CALCITE-5769) Optimizing 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'

2023-06-07 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730357#comment-17730357
 ] 

Jinpeng Wu commented on CALCITE-5769:
-

Please note that this transformation is not always valid. For example, 
cast('abc' as double) may return null even though 'abc' is not null. 
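
A minimal sketch of the counter-example, assuming a lenient cast that yields 
null when the conversion fails. The helper lenientCastToDouble is hypothetical 
and is not Calcite's implementation; it only models the behaviour described 
above.

```java
final class CastNullabilitySketch {
  /** Hypothetical lenient cast: returns null if the string is not a number. */
  static Double lenientCastToDouble(String s) {
    if (s == null) {
      return null;
    }
    try {
      return Double.parseDouble(s.trim());
    } catch (NumberFormatException ex) {
      return null; // conversion failed, so the cast result is null
    }
  }

  /** Models CAST(e AS DOUBLE) IS NOT NULL. */
  static boolean castIsNotNull(String e) {
    return lenientCastToDouble(e) != null;
  }

  /** Models e IS NOT NULL. */
  static boolean isNotNull(String e) {
    return e != null;
  }

  public static void main(String[] args) {
    for (String e : new String[] {"1.5", "abc", null}) {
      System.out.println(e + ": CAST(e) IS NOT NULL = " + castIsNotNull(e)
          + ", e IS NOT NULL = " + isNotNull(e));
    }
  }
}
```

For 'abc' the two predicates disagree, so under this assumption the rewrite is 
only safe when the cast cannot fail.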

> Optimizing 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'
> 
>
> Key: CALCITE-5769
> URL: https://issues.apache.org/jira/browse/CALCITE-5769
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.34.0
>Reporter: xiong duan
>Priority: Major
>
> According to CALCITE-5156, we should support optimizing:
>  * 'CAST(e AS t) IS NOT NULL' to 'e IS NOT NULL'
>  * 'CAST(e AS t) IS NULL' to 'e IS NULL'



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (CALCITE-4920) Introduce logical space pruning to TopDownRuleDriver

2021-12-02 Thread Jinpeng Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinpeng Wu updated CALCITE-4920:

Description: 
Last year, we submitted a PR introducing the TopDownRuleDriver. The rule driver 
implements the top-down search strategy suggested by the Cascades 
framework[1] and provides a basic branch-and-bound pruning mechanism based on 
the upper bound cost and lower bound cost, as suggested by the Columbia 
paper[2].

However, the previous version of TopDownRuleDriver can only prune 
implementation rules and enforcement rules, not transformation rules. The 
reason mainly concerns logical properties.
In the classic Volcano/Cascades model, logical properties, such as output row 
count, are bound to an equivalent set and never change during optimization. 
The Columbia optimizer[2] depends heavily on this premise. 
However, Calcite does not obey this rule. In Calcite, the logical properties 
of a RelSubset are likely to change during optimization. Calcite is not the 
only optimizer that suffers from this. In Orca, the logical properties of an 
equivalent set also change, so it cannot do logical pruning either.

How does the logical properties problem prevent logical pruning? Take this plan 
as an example: sink <- op1 <- op2 <- scan.
By applying a transformation rule, op1 <- op2 is transformed to op3 <- op4. So 
we get a new alternative plan, say sink <- op3 <- op4 <- scan, in which op3 is 
in the same equivalent set as op1.
After implementations and enforcements, the sub-plan (op1 <- op2 <- scan) gets 
fully optimized and yields a winner with cost C1.
Now we are going to optimize op3. We know another plan in the same 
equivalent set has a cost of C1, so we can use C1 as a cost limit while 
optimizing op3. As the first step, we build op3 into a physical plan, 
say impl-op3, and compute its self-cost SC3.
Ideally, if SC3 is already greater than C1, we can decide that op3 will 
never be part of the best plan, and thus the optimization of op4 can be 
skipped. That's the basic idea of group pruning in the Columbia optimizer[2].
Here comes the problem: when we calculate the self-cost of impl-op3, we need to 
use the metadata of impl-op3, such as its row count, which in turn asks 
impl-op3's input to derive its own metadata. However, the equivalent set of op4 
is not yet fully explored and its row count may not be the final one. So the 
self-cost of impl-op3 may be incorrect. If we simply apply group pruning 
according to such a cost, op4 will lose its opportunity to be explored, and 
with it the opportunity to become the best.

To ensure correctness, we require that all descendants are fully explored when 
calculating a node's cost. That's why our first version of TopDownRuleDriver 
only prunes implementation rules and enforcement rules.

Over the past year, we tried several ways to solve the problem. For example, 
we tried to make Calcite's logical properties stable, as Xiening proposed. But 
the proposal was rejected because changes of metadata after transformations are 
natural. We also tried to identify, by categories or annotations, rules that 
never change the logical properties, and to give up pruning for the other 
rules. But that also failed because it introduced too much complexity for rule 
designers.

Those failures drove us to reconsider the problem from its very essence: if we 
cannot make SC3 stable, what if we give up using SC3 and leverage 
other costs for pruning?

Here is a simple description of the new idea. After obtaining C1, we eagerly 
build op3 and op4, without further exploration of them. Because op4's input, 
the scan, is fully optimized during the optimization of op1, we can compute a 
stable cumulative cost of impl-op4. Let's denote it as C4. If we find that 
C4 is already greater than C1, then we know impl-op4 will never be the best 
node and some optimization steps can be skipped (to keep it simple, let 
impl-op4 be the only input of impl-op3):
1. The enforcement rules among impl-sink, impl-op3 and impl-op4, as well as 
trait pass-through. These steps were not handled properly in the previous 
version.
2. The trait derivation of impl-op4 and impl-op3.
3. The explorations of op3, if the substitutes produced by exploration always 
use op4 as input. This is the key to logical pruning. I will explain it in more 
detail later on.
Note that the exploration of op4 is not pruned, as we don't know whether op4's 
other alternatives would yield a lower cost. Moreover, the implementation of 
op3 is not skipped, as it has already been applied. But the implementation of 
other alternatives of op3 can be skipped if the exploration is pruned.
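
The decision described above can be sketched roughly as follows. This is only 
an illustration under simplifying assumptions: costs are plain doubles, and the 
class, enum and method names are hypothetical, not part of TopDownRuleDriver.

```java
import java.util.EnumSet;

final class LogicalPruningSketch {
  /** The optimization steps discussed above (hypothetical names). */
  enum Step {
    ENFORCEMENT_AND_PASS_THROUGH, // step 1
    TRAIT_DERIVATION,             // step 2
    EXPLORE_OP3,                  // step 3
    EXPLORE_OP4                   // never pruned by this check
  }

  /**
   * Returns the steps that still have to run.
   *
   * @param c1 cost of the winner already found in op1's equivalent set
   * @param c4 stable cumulative cost of impl-op4
   * @param op3SubstitutesAlwaysUseOp4 whether every substitute produced by
   *        exploring op3 keeps op4 as its input
   */
  static EnumSet<Step> stepsToRun(double c1, double c4,
      boolean op3SubstitutesAlwaysUseOp4) {
    EnumSet<Step> steps = EnumSet.allOf(Step.class);
    if (c4 > c1) {
      // impl-op4 alone already exceeds the cost limit.
      steps.remove(Step.ENFORCEMENT_AND_PASS_THROUGH);
      steps.remove(Step.TRAIT_DERIVATION);
      if (op3SubstitutesAlwaysUseOp4) {
        steps.remove(Step.EXPLORE_OP3);
      }
    }
    // op4 is still explored: its other alternatives may be cheaper.
    return steps;
  }
}
```

For example, stepsToRun(10.0, 12.0, true) leaves only EXPLORE_OP4, while 
stepsToRun(10.0, 8.0, true) keeps all four steps.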

The new solution is a hybrid of top-down and bottom-up optimization. 
Optimization requests with cost limits are passed down in a top-down manner 
while cost propagation and pruning take place in a bottom-up manner. And it 
ensures 

[jira] [Created] (CALCITE-4920) Introduce logical space pruning to TopDownRuleDriver

2021-12-02 Thread Jinpeng Wu (Jira)
Jinpeng Wu created CALCITE-4920:
---

 Summary: Introduce logical space pruning to TopDownRuleDriver
 Key: CALCITE-4920
 URL: https://issues.apache.org/jira/browse/CALCITE-4920
 Project: Calcite
  Issue Type: Improvement
Reporter: Jinpeng Wu
Assignee: Jinpeng Wu


[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.

2020-12-16 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250183#comment-17250183
 ] 

Jinpeng Wu commented on CALCITE-4432:
-

I have no further ideas at the moment.

I know that [~hyuan] spent some time on this problem. Maybe [~hyuan] can share 
some findings. 

> When the RelNode's input is the same subset as the node belonged to, not 
> choose this node as best.
> --
>
> Key: CALCITE-4432
> URL: https://issues.apache.org/jira/browse/CALCITE-4432
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Ziwei Liu
>Assignee: Ziwei Liu
>Priority: Major
>
> If a subset has a cyclic node, the node's input is this subset itself. If 
> the beset 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.

2020-12-14 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248795#comment-17248795
 ] 

Jinpeng Wu commented on CALCITE-4432:
-

Hi, Julian. I think the top-down rule driver may be a general solution for this 
problem:
 # Transformation rules that lead to set merging are generally fired before 
implementation rules.
 # During the implementation/optimization phase, optimization stops 
immediately when a cycle is detected. So cyclic nodes should have no chance to 
become the best of their RelSubset. 
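
A rough sketch of the second point, using a simplified model in which a group 
stands for a RelSubset and costs are plain doubles. All names here are 
hypothetical and the real top-down driver is considerably more involved.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleSkipSketch {
  /** A candidate node: its own cost plus the ids of its input groups. */
  static final class Node {
    final String name;
    final double selfCost;
    final List<Integer> inputs;

    Node(String name, double selfCost, List<Integer> inputs) {
      this.name = name;
      this.selfCost = selfCost;
      this.inputs = inputs;
    }
  }

  private final Map<Integer, List<Node>> groups;
  private final Set<Integer> inProgress = new HashSet<>();

  CycleSkipSketch(Map<Integer, List<Node>> groups) {
    this.groups = groups;
  }

  /** Returns the best cost of a group, or infinity if it has no valid plan. */
  double optimize(int groupId) {
    if (!inProgress.add(groupId)) {
      // The group is already being optimized further up the stack: a cycle.
      return Double.POSITIVE_INFINITY;
    }
    double best = Double.POSITIVE_INFINITY;
    for (Node node : groups.getOrDefault(groupId, List.of())) {
      double total = node.selfCost;
      for (int input : node.inputs) {
        total += optimize(input);
      }
      if (total < best) {
        best = total; // a cyclic node has infinite total and never wins
      }
    }
    inProgress.remove(groupId);
    return best;
  }
}
```

In this model a node whose input is its own group always ends up with an 
infinite total cost, so it can never be chosen as the best of the group.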

 

> When the RelNode's input is the same subset as the node belonged to, not 
> choose this node as best.
> --
>
> Key: CALCITE-4432
> URL: https://issues.apache.org/jira/browse/CALCITE-4432
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Ziwei Liu
>Assignee: Ziwei Liu
>Priority: Major
>
> If a subset has a cyclic node, the node's input is this subset itself. If 
> the beset 





[jira] [Commented] (CALCITE-4432) When the RelNode's input is the same subset as the node belonged to, not choose this node as best.

2020-12-10 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247170#comment-17247170
 ] 

Jinpeng Wu commented on CALCITE-4432:
-

For example, there is a RelSubset A with best X. RelSubset B, with best Y, is 
the input subset of X. When A is merged with B, A's best should be replaced by 
Y, as X's cost should always be greater than Y's cost.  

This bug is triggered when X's cost is not greater than Y's cost. 

There are all kinds of reasons why X's cost may not be greater than Y's cost. 
For example, X's selfCost underflows or X's totalCost overflows. 

But these are issues with the cost model, not with Calcite core. 
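
To illustrate how this can happen with floating-point costs, here is a small 
sketch. It assumes costs are plain doubles and that X's total cost is its self 
cost plus the cost of its input Y; the class and method names are hypothetical.

```java
final class CostOrderingSketch {
  /** Total cost of X: its own cost plus the cost of its input Y. */
  static double totalCost(double selfCost, double inputCost) {
    return selfCost + inputCost;
  }

  /** True if X (built on top of Y) is strictly more expensive than Y. */
  static boolean inputIsCheaper(double selfCost, double inputCost) {
    return totalCost(selfCost, inputCost) > inputCost;
  }

  public static void main(String[] args) {
    // Normal case: X = 1 + 10 is greater than Y = 10.
    System.out.println(inputIsCheaper(1.0, 10.0));
    // The self cost is lost to rounding: 1e300 + 1 is still 1e300.
    System.out.println(inputIsCheaper(1.0, 1e300));
    // Y has already overflowed to infinity, so X cannot be greater.
    System.out.println(inputIsCheaper(1.0, Double.POSITIVE_INFINITY));
  }
}
```

Only the first call reports X as more expensive than Y. In the other two the 
ordering that the merge logic relies on no longer holds.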

 

> When the RelNode's input is the same subset as the node belonged to, not 
> choose this node as best.
> --
>
> Key: CALCITE-4432
> URL: https://issues.apache.org/jira/browse/CALCITE-4432
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Ziwei Liu
>Assignee: Ziwei Liu
>Priority: Major
>
> If a subset has a cyclic node, the node's input is this subset itself. If 
> the beset 





[jira] [Commented] (CALCITE-4360) Apply SubstitutionRule first in top-down driven rule apply

2020-10-27 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221928#comment-17221928
 ] 

Jinpeng Wu commented on CALCITE-4360:
-

Yes. This looks like a typo in the previous commit. Thanks for fixing it. 

> Apply SubstitutionRule first in top-down driven rule apply
> --
>
> Key: CALCITE-4360
> URL: https://issues.apache.org/jira/browse/CALCITE-4360
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Chunwei Lei
>Assignee: Chunwei Lei
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-10-27-21-55-55-155.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the current implementation, TopDownRuleQueue wrongly adds substitution 
> rules at the end. The SubstitutionRule should be executed first.
> !image-2020-10-27-21-55-55-155.png!





[jira] [Updated] (CALCITE-4050) Traits Propagation for EnumerableMergeJoin Produces Incorrect Result

2020-06-08 Thread Jinpeng Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinpeng Wu updated CALCITE-4050:

Description: 
In EnumerableMergeJoin's deriveTraits method, it uses a Map to record mapping 
from left keys to right keys (the keyMap variable). However, the left keys 
could have duplicate entries. 
 One example is JdbcTest.testJoinInCorrelatedSubQuery, the expected plan is

EnumerableProject(deptno=[$0], name=[$1], employees=[$2], location=[$3])
  EnumerableMergeJoin(condition=[AND(=($0, $5), =($0, $4))], joinType=[inner])
    EnumerableSort(sort0=[$0], dir0=[ASC])
      EnumerableTableScan(table=[[hr, depts]])
    EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
      ...

where left keys are [0, 0] , and right keys are [1, 0]. Deriving right child's 
traits may result in incorrect output.
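
The effect of duplicate left keys on a Map can be sketched as follows. This is 
not the code of deriveTraits; the class and method names are hypothetical, and 
the pair list is shown only for contrast, not as the actual fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateLeftKeysSketch {
  /** Records left-to-right key mappings in a Map, as keyMap does. */
  static Map<Integer, Integer> asMap(int[] leftKeys, int[] rightKeys) {
    Map<Integer, Integer> keyMap = new HashMap<>();
    for (int i = 0; i < leftKeys.length; i++) {
      // A repeated left key overwrites the earlier mapping.
      keyMap.put(leftKeys[i], rightKeys[i]);
    }
    return keyMap;
  }

  /** Keeps every (left, right) pair, including repeated left keys. */
  static List<int[]> asPairs(int[] leftKeys, int[] rightKeys) {
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < leftKeys.length; i++) {
      pairs.add(new int[] {leftKeys[i], rightKeys[i]});
    }
    return pairs;
  }

  public static void main(String[] args) {
    int[] left = {0, 0};
    int[] right = {1, 0};
    System.out.println(asMap(left, right));          // one entry survives
    System.out.println(asPairs(left, right).size()); // both pairs are kept
  }
}
```

With left keys [0, 0] and right keys [1, 0], the map ends up holding only 
0 -> 0, so the mapping 0 -> 1 is lost.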

  was:
In EnumerableMergeJoin's deriveTraits method, it uses a Map to record mapping 
from left keys to right keys (the keyMap variable). However, the left keys 
could have duplicate entries. 
One example is JdbcTest.testJoinInCorrelatedSubQuery, the expected plan is 

EnumerableProject(deptno=[$0], name=[$1], employees=[$2], location=[$3])
  EnumerableMergeJoin(condition=[AND(=($0, $5), =($0, $4))], joinType=[inner])
EnumerableSort(sort0=[$0], dir0=[ASC])
  EnumerableTableScan(table=[[hr, depts]])
EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
  ...

where left keys are [0, 0] , and  right keys are [1, 0]. Deriving right child's 
traits may result in incorrect output. 


> Traits Propagation for EnumerableMergeJoin Produces Incorrect Result
> 
>
> Key: CALCITE-4050
> URL: https://issues.apache.org/jira/browse/CALCITE-4050
> Project: Calcite
>  Issue Type: Bug
>Reporter: Jinpeng Wu
>Priority: Major
>
> In EnumerableMergeJoin's deriveTraits method, it uses a Map to record mapping 
> from left keys to right keys (the keyMap variable). However, the left keys 
> could have duplicate entries. 
>  One example is JdbcTest.testJoinInCorrelatedSubQuery, the expected plan is
> EnumerableProject(deptno=[$0], name=[$1], employees=[$2], location=[$3])
>   EnumerableMergeJoin(condition=[AND(=($0, $5), =($0, $4))], joinType=[inner])
>     EnumerableSort(sort0=[$0], dir0=[ASC])
>       EnumerableTableScan(table=[[hr, depts]])
>     EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
>       ...
> where left keys are [0, 0] , and right keys are [1, 0]. Deriving right 
> child's traits may result in incorrect output.





[jira] [Created] (CALCITE-4050) Traits Propagation for EnumerableMergeJoin Produces Incorrect Result

2020-06-08 Thread Jinpeng Wu (Jira)
Jinpeng Wu created CALCITE-4050:
---

 Summary: Traits Propagation for EnumerableMergeJoin Produces 
Incorrect Result
 Key: CALCITE-4050
 URL: https://issues.apache.org/jira/browse/CALCITE-4050
 Project: Calcite
  Issue Type: Bug
Reporter: Jinpeng Wu


In EnumerableMergeJoin's deriveTraits method, it uses a Map to record mapping 
from left keys to right keys (the keyMap variable). However, the left keys 
could have duplicate entries. 
One example is JdbcTest.testJoinInCorrelatedSubQuery, the expected plan is 

EnumerableProject(deptno=[$0], name=[$1], employees=[$2], location=[$3])
  EnumerableMergeJoin(condition=[AND(=($0, $5), =($0, $4))], joinType=[inner])
    EnumerableSort(sort0=[$0], dir0=[ASC])
      EnumerableTableScan(table=[[hr, depts]])
    EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
      ...

where left keys are [0, 0] , and  right keys are [1, 0]. Deriving right child's 
traits may result in incorrect output. 





[jira] [Commented] (CALCITE-3997) Problem with MERGE JOIN: java.lang.AssertionError: cannot merge join: left input is not sorted on left keys

2020-05-15 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108725#comment-17108725
 ] 

Jinpeng Wu commented on CALCITE-3997:
-

[~rubenql] I also think that physical transformation rules usually cause 
duplicate rule firings and should be avoided. In your case, 
LogicalProject -> EnumerableCalc
LogicalProject -> LogicalCalc -> EnumerableCalc
maybe what you need is a ProjectMergeRule, not an EnumerableCalcMergeRule.

> Problem with MERGE JOIN: java.lang.AssertionError: cannot merge join: left 
> input is not sorted on left keys
> ---
>
> Key: CALCITE-3997
> URL: https://issues.apache.org/jira/browse/CALCITE-3997
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.23.0
>Reporter: Enrico Olivelli
>Priority: Blocker
> Fix For: 1.23.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> I have a couple of problems with HerdDB.
> 1) JOIN order unsorted columns in presence of a WHERE over other columns
> This is my case:
> CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
> CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
> SELECT t1.k1 as first, t2.k1 as second
> FROM tblspace1.table1 t1 
>  INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
>  WHERE t1.n1 + 1 = t2.n3
> In this case for table1 and table3 no column is physically sorted (no column 
> with a collation)  
> I have this Planner error:
> java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
> at org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
> at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
> at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
> at GeneratedMetadataHandler_Collation.collations(Unknown Source)
> at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
> at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
> at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
> at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
> at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
> at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
> at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
> at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
> at org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
> at org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
> at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
> at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292) 
> If I remove the "WHERE" clause then no error is reported.
> We have many other test cases about JOINs, and this is the only one that 
> fails.
> This is the failing test case on HerdDB
> https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
> We are using the default set of rules Programs.ofRules(Programs.RULE_SET)





[jira] [Comment Edited] (CALCITE-3916) Support cascades style top-down driven rule apply

2020-05-06 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100596#comment-17100596
 ] 

Jinpeng Wu edited comment on CALCITE-3916 at 5/6/20, 9:35 AM:
--

PR: https://github.com/apache/calcite/pull/1950 

There might be two ways to accomplish this. The first one is designing another 
Planner, while the second is modifying the VolcanoPlanner directly and making 
sure it won't break the current logic. The pros and cons are discussed here: 
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E
 

The code in this PR is now generally on the first track because I am still 
trying some aggressive optimizations. If keeping one VolcanoPlanner is the 
consensus, it's definitely possible to combine this PR with VolcanoPlanner. 



was (Author: fatlittle):
PR: https://github.com/apache/calcite/pull/1950 

There might be two ways to accomplish this. The first one is designing another 
Planner while the second is modifying the VolcanoPlanner directly and make sure 
it won't break the current logic. The pros and cons are discussed: 
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E
 

My code is now generally on the first track. Currently it should not be 
difficult to switch to the second one. However, I am still trying some 
aggressive optimizations. So I am not going to take the second way until many 
people insist. Thanks

> Support cascades style top-down driven rule apply
> -
>
> Key: CALCITE-3916
> URL: https://issues.apache.org/jira/browse/CALCITE-3916
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Assignee: Jinpeng Wu
>Priority: Major
>
> Apply rules in leaf RelSet -> root RelSet order. For every RelNode in a 
> RelSet, rules are matched and applied sequentially. RuleQueue and 
> DeferringRuleCall are no longer needed. This will make space pruning and rule 
> mutual exclusivity checks possible.
> Rules that use AbstractConverter as an operand are an exception: to keep 
> backward compatibility, this kind of rule still needs top-down application.
> This should be done after CALCITE-3896.





[jira] [Commented] (CALCITE-3916) Support cascades style top-down driven rule apply

2020-05-06 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100596#comment-17100596
 ] 

Jinpeng Wu commented on CALCITE-3916:
-

PR: https://github.com/apache/calcite/pull/1950 

There might be two ways to accomplish this. The first one is designing another 
Planner, while the second is modifying the VolcanoPlanner directly and making 
sure it won't break the current logic. The pros and cons are discussed here: 
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E
 

My code is now generally on the first track. Currently it should not be 
difficult to switch to the second one. However, I am still trying some 
aggressive optimizations, so I am not going to take the second way unless many 
people insist. Thanks.

> Support cascades style top-down driven rule apply
> -
>
> Key: CALCITE-3916
> URL: https://issues.apache.org/jira/browse/CALCITE-3916
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Assignee: Jinpeng Wu
>Priority: Major
>
> Apply rules in leaf RelSet -> root RelSet order. For every RelNode in a 
> RelSet, rules are matched and applied sequentially. RuleQueue and 
> DeferringRuleCall are no longer needed. This will make space pruning and rule 
> mutual exclusivity checks possible.
> Rules that use AbstractConverter as an operand are an exception: to keep 
> backward compatibility, this kind of rule still needs top-down application.
> This should be done after CALCITE-3896.





[jira] [Assigned] (CALCITE-3916) Support cascades style top-down driven rule apply

2020-05-06 Thread Jinpeng Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinpeng Wu reassigned CALCITE-3916:
---

Assignee: Jinpeng Wu  (was: Haisheng Yuan)

> Support cascades style top-down driven rule apply
> -
>
> Key: CALCITE-3916
> URL: https://issues.apache.org/jira/browse/CALCITE-3916
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Assignee: Jinpeng Wu
>Priority: Major
>
> Apply rules in leaf RelSet -> root RelSet order. For every RelNode in a 
> RelSet, rules are matched and applied sequentially. RuleQueue and 
> DeferringRuleCall are no longer needed. This will make space pruning and rule 
> mutual exclusivity checks possible.
> Rules that use AbstractConverter as an operand are an exception: to keep 
> backward compatibility, this kind of rule still needs top-down application.
> This should be done after CALCITE-3896.





[jira] [Commented] (CALCITE-3963) Maintains logical properties at RelSet (equivalent group) instead of RelNode

2020-05-05 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100463#comment-17100463
 ] 

Jinpeng Wu commented on CALCITE-3963:
-

I think we all agree that RelNodes in a RelSet should share the same logical 
properties. The difference is how to achieve this. 

I agree with Julian that MetadataQuery is a good design for propagating logical 
properties to a new RelNode. Storing a concrete value associated with a RelSet 
requires complicated logic to maintain and invalidate the cached value. If some 
logic is considered flawed, it is a bug in the metadata handler. It should be 
the metadata handler's job to ensure that logical properties across the RelSet 
are consistent. 

Haisheng mentioned that we have to decide when this value is used for logical 
space pruning. I think we can add a state field to RelSet, for example, 
EXPLORED or SUBSTITUTION_APPLIED. MetadataHandler can also leverage this value 
to decide its logic. This value requires invalidation when RelSets get merged. 
But it should be much simpler than storing a concrete metadata result.  

This strategy is somewhat like a combination of option one and option two. When 
a new RelNode is registered into a RelSet, logical properties are recomputed 
because the cache in RelMetadataQuery is invalidated. This value cannot be used 
for logical space pruning until the RelSet is in a suitable state. And how do 
we decide the state? It may be difficult now, but it is much simpler with a 
top-down rule-application strategy. 
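
A minimal sketch of the state field idea, assuming a stand-alone class rather 
than Calcite's actual RelSet. EXPLORED and SUBSTITUTION_APPLIED are the states 
named above; INITIAL and all method names are hypothetical.

```java
final class RelSetStateSketch {
  /** Exploration state of an equivalent set. */
  enum State { INITIAL, SUBSTITUTION_APPLIED, EXPLORED }

  private State state = State.INITIAL;

  State state() {
    return state;
  }

  void markSubstitutionApplied() {
    state = State.SUBSTITUTION_APPLIED;
  }

  void markExplored() {
    state = State.EXPLORED;
  }

  /**
   * Merging brings in alternatives that may not have been explored yet,
   * so the state is invalidated instead of any cached metadata value.
   */
  void mergeWith(RelSetStateSketch other) {
    state = State.INITIAL;
  }

  /** Metadata is trusted for logical space pruning only once explored. */
  boolean usableForLogicalPruning() {
    return state == State.EXPLORED;
  }
}
```

Only a single enum value has to be reset on merge, which is the simplification 
compared with storing and invalidating a concrete metadata result.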

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> 
>
> Key: CALCITE-3963
> URL: https://issues.apache.org/jira/browse/CALCITE-3963
> Project: Calcite
>  Issue Type: Bug
>Reporter: Xiening Dai
>Assignee: Xiening Dai
>Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> by the definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than at the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the built-in metadata below should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)





[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-17 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085662#comment-17085662
 ] 

Jinpeng Wu commented on CALCITE-3896:
-

> Is this the one of the physical plan after applying all the physical rules? 

Yes, Danny. But this plan somehow depends on the passThrough framework. A plan 
must be generated as a candidate before it can win the competition under the 
cost model. I was asking where Project5 would be generated. 

I am not against the idea itself. Even without considering my comments, this 
proposal is still a great improvement to the current Calcite framework. I was 
just raising some lessons I have learned, to make it even better. 

> Pass through parent trait requests to child operators
> -
>
> Key: CALCITE-3896
> URL: https://issues.apache.org/jira/browse/CALCITE-3896
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> This is not the on-demand trait request described in the [mailing 
> list|http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e],
>  which requires an overhaul of the core planner. This ticket tries to give 
> VolcanoPlanner a basic, minimal ability to pass parent trait requests 
> through to child operators without rules. It may not be flexible or 
> powerful, but it should work with current Calcite applications with 
> minimal changes.
> The method for physical operators to implement would be:
> {code:java}
> interface RelNode {
>   RelNode passThrough(RelTraitSet required);
> }
> {code}
> Given that Calcite's physical operators decide their child operators' traits 
> when they are created in a physical implementation rule, there 
> are some drawbacks that can't be avoided. For example, given the following plan:
> {code:java}
> StreamAgg on [a]
>   +-- MergeJoin on [a, b, c]
>       |--- TableScan foo
>       +--- TableScan bar
> {code}
> Suppose the MergeJoin implementation rule generates several merge joins that 
> distribute by [a], [a,b], and [a,b,c] respectively. Then we pass the parent 
> operator StreamAgg's trait request to MergeJoin. Since MergeJoin[a] satisfies 
> the parent's request, there is nothing to do. Next we pass the request to 
> MergeJoin[a,b] and get MergeJoin[a]; then we pass the request to 
> MergeJoin[a,b,c] and get MergeJoin[a] again. We know these are redundant and 
> there is no need to pass through the parent operator's trait request, but the 
> MergeJoin operators are independent and agnostic of each other's existence.
> Ideally, a physical implementation rule should not care about the physical 
> traits of the operator it creates or of that operator's children. But that is 
> a different topic.
> Anyway, this is better than nothing. Once it is done, we can provide the 
> option to obsolete or disable {{AbstractConverter}} while still being able to 
> do property enforcement. 
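
The redundancy described in the quoted description can be made concrete with a small sketch. It assumes a hypothetical `MergeJoin` class whose only trait is a list of key columns, and a `passThrough` that derives a new operator when the request is a prefix of its own keys. These names and the prefix rule are illustrative assumptions, not Calcite's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical physical operator: a merge join distributed by {@code keys}. */
final class MergeJoin {
  final List<String> keys;

  MergeJoin(List<String> keys) {
    this.keys = List.copyOf(keys);
  }

  /** Simplification: a request is satisfied only by exactly matching keys. */
  boolean satisfies(List<String> required) {
    return keys.equals(required);
  }

  /**
   * Sketch of passThrough: derive a variant that delivers {@code required},
   * or return null when there is nothing to do or the request cannot be pushed.
   */
  MergeJoin passThrough(List<String> required) {
    if (satisfies(required)) {
      return null; // already satisfies the parent's request
    }
    // Assumption: only a request on a prefix of the join keys can be pushed.
    boolean isPrefix = required.size() <= keys.size()
        && keys.subList(0, required.size()).equals(required);
    return isPrefix ? new MergeJoin(required) : null;
  }

  String digest() {
    return "MergeJoin" + keys;
  }
}

final class PassThroughDemo {
  /** Passes one request to every candidate and collects the derived operators. */
  static List<MergeJoin> passToAll(List<MergeJoin> candidates, List<String> required) {
    List<MergeJoin> derived = new ArrayList<>();
    for (MergeJoin candidate : candidates) {
      MergeJoin result = candidate.passThrough(required);
      if (result != null) {
        derived.add(result);
      }
    }
    return derived;
  }

  public static void main(String[] args) {
    List<MergeJoin> candidates = List.of(
        new MergeJoin(List.of("a")),
        new MergeJoin(List.of("a", "b")),
        new MergeJoin(List.of("a", "b", "c")));
    List<MergeJoin> derived = passToAll(candidates, List.of("a"));
    // Each candidate answers independently, so equal results are produced twice.
    Set<String> distinct = new LinkedHashSet<>();
    for (MergeJoin join : derived) {
      distinct.add(join.digest());
    }
    System.out.println(derived.size() + " derived, " + distinct.size() + " distinct");
  }
}
```

Because each operator answers the request on its own, the only place the duplicates can be detected in this sketch is afterwards, by comparing digests.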





[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-17 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085500#comment-17085500
 ] 

Jinpeng Wu commented on CALCITE-3896:
-

And I hope this "won't and shouldn't" can be enforced by the interface, not 
just noted in the javadoc.  

For example: 
interface RelNode {
  Pair passThrough(RelTraitSet required);
}
This only allows implementations to return the RelTraitSets of Project3 and 
AbstractConverter2 in your example. 
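
A sketch of what such a trait-only interface could look like follows. The type parameters of `Pair` are not visible in the snippet above, so the shape used here (the traits the node itself would deliver, paired with one required trait set per input) is an assumption. `TraitSet`, `TraitPair`, `PhysicalOp`, `IdentityProject` and `HashAgg` are hypothetical stand-ins, not Calcite types.

```java
import java.util.List;

/** Hypothetical stand-in for a trait set: here just an ordered list of sort keys. */
final class TraitSet {
  final List<String> sortKeys;

  TraitSet(List<String> sortKeys) {
    this.sortKeys = List.copyOf(sortKeys);
  }

  @Override public String toString() {
    return "sorted" + sortKeys;
  }
}

/** Hypothetical pair: traits the node delivers, and traits required of each input. */
final class TraitPair {
  final TraitSet self;
  final List<TraitSet> inputs;

  TraitPair(TraitSet self, List<TraitSet> inputs) {
    this.self = self;
    this.inputs = List.copyOf(inputs);
  }
}

interface PhysicalOp {
  /** Returns traits only, or null if the request cannot be passed through. */
  TraitPair passThroughTraits(TraitSet required);
}

/** Keeps every column, so any sort request can be forwarded to its single input. */
final class IdentityProject implements PhysicalOp {
  @Override public TraitPair passThroughTraits(TraitSet required) {
    return new TraitPair(required, List.of(required));
  }
}

/** A hash aggregate does not preserve ordering, so it declines every request. */
final class HashAgg implements PhysicalOp {
  @Override public TraitPair passThroughTraits(TraitSet required) {
    return null;
  }
}

final class TraitOnlyDemo {
  public static void main(String[] args) {
    TraitSet required = new TraitSet(List.of("a"));
    TraitPair pair = new IdentityProject().passThroughTraits(required);
    System.out.println(pair.self + " requires inputs " + pair.inputs);
    System.out.println(new HashAgg().passThroughTraits(required));
  }
}
```

Since the return type carries traits and nothing else, an implementation has no way to hand back a newly built child operator; creating nodes stays with the planner.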



[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-17 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085494#comment-17085494
 ] 

Jinpeng Wu edited comment on CALCITE-3896 at 4/17/20, 7:07 AM:
---

>> When passing through parent requests, it won't and shouldn't generate new 
>> child physical operators

So how would a candidate like this be generated: 
Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2?
This is most likely the best plan.  


was (Author: fatlittle):
When passing through parent requests, it won't and shouldn't generate new child 
physical operators

-

So how to generate such candidate: 
Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2.
  This is most possible the best plan.  



[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-17 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085494#comment-17085494
 ] 

Jinpeng Wu commented on CALCITE-3896:
-

When passing through parent requests, it won't and shouldn't generate new child 
physical operators

-

So how would a candidate like this be generated: 
Project4(WithCalculation)<-PhysicalConverter<-Project5(ColumnPruningFromProject1)<-Filter2?
This is most likely the best plan.  



[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-14 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082903#comment-17082903
 ] 

Jinpeng Wu commented on CALCITE-3896:
-

Hi, [~hyuan].
 # Got it.
 # For example, some rule may decide that a logical agg fires the one-phase 
agg candidate only when its input is small enough, or when it can see that its 
input has already been distributed by the group keys. Well, this case is not 
a very good one. I am just wondering whether there may be some exceptions.
 # An actual case that I have come across: AC<-Project(With RexCall) 
could generate 
 ## candidate 1: NONE. This is better when the calls generate data of smaller 
size (like extracting a small part of the data from a big JSON).
 ## candidate 2: Project(With RexCall)<-AC. Better when AC perfectly matches the 
children's delivered traits.
 ## candidate 3: Project(With RexCalls)<-AC<-Project(Column Pruning Only). 
Better when column pruning is available.
 ## candidate 4: Project(Other RexCalls)<-AC<-Project(Containing part of the 
RexCalls that may shrink data size). We don't have an exact cost model here, 
so this candidate may produce multiple results that could possibly be the best.
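
The candidates listed above suggest a passThrough that returns a collection and leaves the choice to the cost model. The sketch below is one possible reading, not the proposal's API: `Candidate` and `MultiCandidateDemo` are hypothetical names, the two boolean parameters are invented switches, and the costs are made-up numbers, since no exact cost model is available here. An empty result is taken to correspond to "candidate 1: NONE".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical candidate plan: a textual shape plus an estimated cost. */
final class Candidate {
  final String shape;
  final double cost;

  Candidate(String shape, double cost) {
    this.shape = shape;
    this.cost = cost;
  }
}

final class MultiCandidateDemo {
  /**
   * Sketch of a passThrough that may return several candidates or none.
   * The costs are illustrative inputs only.
   */
  static List<Candidate> passThrough(boolean canPushConverterDown, boolean canPruneColumns) {
    List<Candidate> out = new ArrayList<>();
    if (canPushConverterDown) {
      // Shape of candidate 2: the converter moves below the projection.
      out.add(new Candidate("Project(RexCalls) <- AC", 10.0));
    }
    if (canPushConverterDown && canPruneColumns) {
      // Shape of candidate 3: an extra pruning-only projection under the converter.
      out.add(new Candidate("Project(RexCalls) <- AC <- Project(pruning only)", 7.0));
    }
    return out; // an empty list means no new candidate is offered
  }

  /** The planner, not the operator, chooses among the candidates by cost. */
  static Optional<Candidate> cheapest(List<Candidate> candidates) {
    return candidates.stream().min(Comparator.comparingDouble(c -> c.cost));
  }

  public static void main(String[] args) {
    List<Candidate> candidates = passThrough(true, true);
    System.out.println(candidates.size() + " candidates");
    cheapest(candidates).ifPresent(best -> System.out.println("best: " + best.shape));
  }
}
```

Returning every plausible shape keeps the operator from guessing which one wins; every returned candidate still has to be registered and costed by the planner.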



[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-14 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812
 ] 

Jinpeng Wu edited comment on CALCITE-3896 at 4/14/20, 6:02 AM:
---

Hi, [~hyuan]. Useful idea! Do you intend to use this to replace AC? I 
think we need to consider several cases first:

1. How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a], so that it can avoid redundancy? Enforcing 
only needs to generate an output that satisfies [a], not exactly [a]. Or it could 
be another MergeJoin[a] with different inputs, and thus a different cost.

2. This requires applying all transformation rules and implementation rules 
before enforcing. So implementation rules cannot decide whether a candidate is 
valid according to the input's delivered traits.

3. The method passThrough could generate multiple candidates or none.

 


was (Author: fatlittle):
Hi, [~hyuan] . Useful idea! Are you intending to use this to replace AC?  I 
think we need to consider several cases before this:

1.  How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a]  in order to avoid redundancy. Enforcing 
needs only generate an output that satisfies [a], not exactly [a].  Or it could 
be another MergeJoin[a] with different inputs, thus different cost.

2. This requires applying all transformation rules and implementation rules 
before enforcing. So implementation rules can not decide which candidate is 
valid or not according to the input's delivering traits.

3. The method passThough could generate more than multiple candidates or none 
candidates

 



[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-07 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812
 ] 

Jinpeng Wu edited comment on CALCITE-3896 at 4/8/20, 4:37 AM:
--

Hi, [~hyuan]. Useful idea! Do you intend to use this to replace AC? I 
think we need to consider several cases first:

1. How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a], so that it can avoid redundancy? Enforcing 
only needs to generate an output that satisfies [a], not exactly [a]. Or it could 
be another MergeJoin[a] with different inputs, and thus a different cost.

2. This requires applying all transformation rules and implementation rules 
before enforcing. So implementation rules cannot decide whether a candidate is 
valid according to the input's delivered traits.

3. The method passThrough could generate multiple candidates or none.

 


was (Author: fatlittle):
Hi, [~hyuan] . Looks like some useful change. Are you intending to use this to 
replace AC?  I think we need to consider several cases before this:

1.  How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a]  in order to avoid redundancy. Enforcing 
needs only generate an output that satisfies [a], not exactly [a].  Or it could 
be another MergeJoin[a] with different inputs, thus different cost.

2. This requires applying all transformation rules and implementation rules 
before enforcing. So implementation rules can not decide which candidate is 
valid or not according to the input traits to. 

3. The method passThough could generate more than multiple candidates or none 
candidates

 



[jira] [Comment Edited] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-07 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812
 ] 

Jinpeng Wu edited comment on CALCITE-3896 at 4/8/20, 4:34 AM:
--

Hi, [~hyuan]. This looks like a useful change. Do you intend to use this to 
replace AC? I think we need to consider several cases first:

1. How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a], so that it can avoid redundancy? Enforcing 
only needs to generate an output that satisfies [a], not exactly [a]. Or it could 
be another MergeJoin[a] with different inputs, and thus a different cost.

2. This requires applying all transformation rules and implementation rules 
before enforcing. So implementation rules cannot decide whether a candidate is 
valid according to the input traits. 

3. The method passThrough could generate multiple candidates or none.

 


was (Author: fatlittle):
Hi, [~hyuan] . Looks like some useful change. Are you intending to use this to 
replace AC?  I think we need to consider several cases before this:

1.  How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a]  in order to avoid redundancy. Enforcing 
needs only generate an output that satisfies [a], not exactly [a].  Or it could 
be another MergeJoin[a] with different inputs, thus different cost.

2. What if the implementation rule generating MergeJoin[a] comes after 
passThrough [a] to MergeJoin[a,b] 

3. The method passThough could generate more than multiple candidates or none 
candidates

 



[jira] [Commented] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-07 Thread Jinpeng Wu (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077812#comment-17077812
 ] 

Jinpeng Wu commented on CALCITE-3896:
-

Hi, [~hyuan]. This looks like a useful change. Do you intend to use this to 
replace AC? I think we need to consider several cases first:

1. How can the planner know that passing request [a] to MergeJoin[a,b] will 
generate exactly the same MergeJoin[a], so that it can avoid redundancy? Enforcing 
only needs to generate an output that satisfies [a], not exactly [a]. Or it could 
be another MergeJoin[a] with different inputs, and thus a different cost.

2. What if the implementation rule that generates MergeJoin[a] fires after 
[a] has been passed through to MergeJoin[a,b]? 

3. The method passThrough could generate multiple candidates or none.

 
