[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2020-01-08 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010906#comment-17010906
 ] 

Ruben Q L commented on CALCITE-2166:


Thanks for the clarification, [~xndai].

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
> EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 0.0 io}, id = 4206
>   

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2020-01-08 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010880#comment-17010880
 ] 

Xiening Dai commented on CALCITE-2166:
--

[~rubenql] No, part of the fix of CALCITE-3479 was to revert fix of 
CALCITE-2166.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
> EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 0.0 io}, id = 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2020-01-08 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010877#comment-17010877
 ] 

Ruben Q L commented on CALCITE-2166:


[~hyuan], it seems that CALCITE-3479 got resolved, should we update the status 
of the current ticket?

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Vova Vysotskyi
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
> EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2019-09-16 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930395#comment-16930395
 ] 

Xiening Dai commented on CALCITE-2166:
--

hi [~danny0405], sorry I just saw your comments here in JIRA. The reason I 
chose #2 is because with #1 we would possibly get into the other error "rel x 
has lower cost than the best of subset", which is the problem we tried to fix 
in the first place. #2 pay slightly higher cost but does guarantee we have the 
cheapest rel in subset as the best rel. There are multiple places broken in the 
memo we need to fix them one by one. I agree 2018 is more fundamental, and 
personally I prefer get that fixed first. But the previous consensus was to fix 
2166 first.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Danny Chan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2019-09-07 Thread Danny Chan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925043#comment-16925043
 ] 

Danny Chan commented on CALCITE-2166:
-

Thanks [~xndai],

Well, i think there are 2 cases that option 1 & 2 can not fix up the problem 
perfectly:
 # one rel may belongs to multi RelSubSet, but because we alway use the DP 
first search algorithm to update parents of one RelSubSet, the other RelSubSet 
may also be affected.
 # The plan may be a DAG, one rel may have multiple parents.

I have reviewed your PR, it seems that you choose the #2 proposal, not #1, i 
see you add a for loop in each layer of the DP search, can you give a 
performance test about how much overhead the patch has introduced ? 

Because #1 and #2 are both not perfect fix, we may need to choose #1 while #2 
has some overhead but not significant improvement.

PS: i'm still a little skeptical about whether this patch do better or worse, 
just like Julian said, with CALCITE-2018, we just comes from a metadata 
breaking state to another one.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Danny Chan
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2019-09-04 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922727#comment-16922727
 ] 

Xiening Dai commented on CALCITE-2166:
--

hi [~vvysotskyi] I send out a PR using #1 proposal. Please help take a look. 
Note that the PR wouldn't completely eliminate this error. There are also other 
cases where error is due to the slate cache item (RelSubSet row count is not 
updated properly). Those cases should be addressed by CALCITE-2018. Let's hope 
we can move it forward. Thanks.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Danny Chan
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2019-08-21 Thread Volodymyr Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912124#comment-16912124
 ] 

Volodymyr Vysotskyi commented on CALCITE-2166:
--

Thanks, Xiening Dai, for moving this issue forward! I've also considered 
similar options, and here are my thoughts regarding them:
1 & 2: Currently we store the best cost value to avoid recalculating it every 
time when a new rel node is added to the rel subset. For the case when we have 
to recalculate its cost, we also have to clear rel metadata cache for current 
rel node, and for the case when something is changed, also recalculate parent 
rel nodes costs.

I agree that #3 is too complex, and I don't have other solutions for this issue.

But in either case, it would be good if 1 or 2 option will help to fix this 
issue partially without introducing significant performance degradation.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Danny Chan
>Priority: Critical
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
> EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>   EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
> EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], 

[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

2019-08-20 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911714#comment-16911714
 ] 

Xiening Dai commented on CALCITE-2166:
--

hi [~danny0405][~vvysotskyi], not sure what is the progress of this issue. I 
spend some time looking into this and have a couple of proposals as below 
(ranking from the easier to the hardiest) -

1. In propagateCostImprovements0() if we find bestrel cost is increased, we 
just updated RelSubset.bestCost

This is the most trivial fix we can do. With that we can guarantee the internal 
state of memo is consistent, but the plan generated can be sub-optimal.

2. In propagateCostImprovements0() if we find bestrel cost is increased, we 
recompute cost for all rels within current subset, and choose the cheapest one 
as the best.

This goes one step further than #1 to look for a potentially better plan when 
the original bestrel cost is increased. Although this may generate better plans 
under some cases, it is still not able to guarantee the optimal result.

3. Redesign the whole bestrel logic, so a given subset could have more than one 
bestrel, and each best candidate would map to one or more parents.

The issue here is the fundamental assumption of our DP search model is broken 
by exception case like this bug. The best of current subset does not 
necessarliy uses the best of the input subset, because although the best of 
input subset grantees the minimal of input cost, it doesn't necessarily 
minimize the self cost of parent node since the row count could increase. With 
this in mind, in order to find the optimal plan, we would need to have multiple 
"best" candiates and take input rows into account when calculate parent's best. 
Theoretically this would grantee an optimal plan, but it's fairly complex and 
could incur significant overhead on runtime performance.

At this moment, I'd prefer proposal #1. #3 is too complex and requires drastic 
change on the core framework which I don't see very feasible. #2 adds overhead 
but again, same as #1, it doesn't grantee an optimal plan.

Let me know your thoughts. I would love to see this resolved so we can move on 
to CALCITE-2018 which I believe is fairly important to the functionality of 
core framework.



> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -
>
> Key: CALCITE-2166
> URL: https://issues.apache.org/jira/browse/CALCITE-2166
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Danny Chan
>Priority: Critical
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>   if (subset.best != null) {
> RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
> if (!subset.bestCost.equals(bestCost)) {
>   throw new AssertionError(
> "relSubset [" + subset.getDescription()
>   + "] has wrong best cost "
>   + subset.bestCost + ". Correct cost is " + bestCost);
> }
>   }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367