[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388579721 ## File path: core/src/test/resources/sql/sub-query.iq ## @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], expr#5=[IS TRUE($t4)], e EnumerableLimit(fetch=[1]) EnumerableSort(sort0=[$0], dir0=[DESC]) EnumerableAggregate(group=[{0}], c=[COUNT()]) - EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3]) + EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8]) Review comment: OK. You propose a change. It results in generating bad plans, thus it introduces a technical regression. There's a technical justification for -1. I know cost model has awful lot of inconsistencies. However, it turns out that all those 100 tiny inconsistencies cancel each other, and Calcite manages to produce "sane" plans. Now you fix one or two such defects (which has good merit), however, the net result becomes that there are 98 inconsistencies in the cost model which **no longer** cancel each other. Calcite purpose is optimizer, and it is really sad to introduce regressions to the optimizer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388573407 ## File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java ## @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider, + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n" + "group by \"the_year\"") .enableMaterializations(true) -.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT NULL($t0)], " -+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, $t3)], C=[$t4])\n" -+ " EnumerableAggregate(group=[{0}])\n" -+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])") +.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n" ++ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n" ++ "EnumerableAggregate(group=[{0}])\n" Review comment: Who will fix the costing model then? I think it is unfair to merge a change that is not really compatible with the costing model. If the change to optimizer requires adjustments to the costing model, then could you please do that in a single PR, so we see the net changes for both plans and the response times? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388573407 ## File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java ## @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider, + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n" + "group by \"the_year\"") .enableMaterializations(true) -.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT NULL($t0)], " -+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, $t3)], C=[$t4])\n" -+ " EnumerableAggregate(group=[{0}])\n" -+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])") +.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n" ++ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n" ++ "EnumerableAggregate(group=[{0}])\n" Review comment: Who will fix the costing model then? I think it is unfair to merge a change that is not really compatible with the costing model. If the change to optimizer requires adjustments to the costing model, then could you please do that in a single commit, so we see the net changes for both plans and the response times? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388570453 ## File path: core/src/test/resources/sql/sub-query.iq ## @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], expr#5=[IS TRUE($t4)], e EnumerableLimit(fetch=[1]) EnumerableSort(sort0=[$0], dir0=[DESC]) EnumerableAggregate(group=[{0}], c=[COUNT()]) - EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3]) + EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8]) Review comment: Let me put it in another way: you change the optimizer, and now it favours bad plans. What the optimizer now does it introduces a dummy always_true filter, and it thinks the filter would reduce the number of rows and so on. It does not look like a well-behaving optimizer :-/ Even though the change reduces `slow test` execution, that reduction might be the result of "skipping some rules" rather than removing importance. So currently it looks like some rules do not fire which result in a noticeable amount of useless predicates floating around. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388570453 ## File path: core/src/test/resources/sql/sub-query.iq ## @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], expr#5=[IS TRUE($t4)], e EnumerableLimit(fetch=[1]) EnumerableSort(sort0=[$0], dir0=[DESC]) EnumerableAggregate(group=[{0}], c=[COUNT()]) - EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3]) + EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8]) Review comment: Let me put it in another way: you change the optimizer, and now it favours bad plans. What it does it introduces a dummy always_true filter, and it thinks the filter would reduce the number of rows and so on. It does not look like a well-behaving optimizer :-/ Even though the change reduces `slow test` execution, that reduction might be the result of "skipping some rules" rather than removing importance. So currently it looks like some rules do not fire which result in a noticeable amount of useless predicates floating around. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388218456 ## File path: core/src/test/resources/sql/sub-query.iq ## @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], expr#5=[IS TRUE($t4)], e EnumerableLimit(fetch=[1]) EnumerableSort(sort0=[$0], dir0=[DESC]) EnumerableAggregate(group=[{0}], c=[COUNT()]) - EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3]) + EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8]) Review comment: This looks like a plan degradation. For instance `expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)]` is the same as `null:BOOLEAN`. Do you know the reason for this plan degradation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r388213305 ## File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java ## @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider, + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n" + "group by \"the_year\"") .enableMaterializations(true) -.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT NULL($t0)], " -+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, $t3)], C=[$t4])\n" -+ " EnumerableAggregate(group=[{0}])\n" -+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])") +.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n" ++ " EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n" ++ "EnumerableAggregate(group=[{0}])\n" Review comment: This looks like a plan degradation, doesn't it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance URL: https://github.com/apache/calcite/pull/1840#discussion_r387935235 ## File path: core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java ## @@ -87,36 +85,11 @@ * according to a dynamic programming algorithm. */ public class VolcanoPlanner extends AbstractRelOptPlanner { - protected static final double COST_IMPROVEMENT = .5; //~ Instance fields protected RelSubset root; - /** - * If true, the planner keeps applying rules as long as they continue to - * reduce the cost. If false, the planner terminates as soon as it has found - * any implementation, no matter how expensive. - */ - protected boolean ambitious = true; - - /** - * If true, and if {@link #ambitious} is true, the planner waits a finite - * number of iterations for the cost to improve. - * - * The number of iterations K is equal to the number of iterations - * required to get the first finite plan. After the first finite plan, it - * continues to fire rules to try to improve it. The planner sets a target - * cost of the current best cost multiplied by {@link #COST_IMPROVEMENT}. If - * it does not meet that cost target within K steps, it quits, and uses the - * current best plan. If it meets the cost, it sets a new, lower target, and - * has another K iterations to meet it. And so forth. - * - * If false, the planner continues to fire rules until the rule queue is - * empty. - */ - protected boolean impatient = false; Review comment: Can we rename `impatient` flag to [`useSimulatedAnnealing`](https://en.wikipedia.org/wiki/Simulated_annealing) ? It would sound sophisticated enough to keep it from removing :) Jokes aside, I think the planning cannot be always fit into a finite amount of time, so it makes sense to have a predefined timeout for the planner. That is it could give up and just return the best it has so far. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services