[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388579721
 
 

 ##
 File path: core/src/test/resources/sql/sub-query.iq
 ##
 @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], 
expr#5=[IS TRUE($t4)], e
 EnumerableLimit(fetch=[1])
   EnumerableSort(sort0=[$0], dir0=[DESC])
 EnumerableAggregate(group=[{0}], c=[COUNT()])
-  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], 
expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], 
expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
 
 Review comment:
   OK. You propose a change. It results in generating bad plans, thus it 
introduces a technical regression. There's a technical justification for -1.
   
   I know cost model has awful lot of inconsistencies. However, it turns out 
that all those 100 tiny inconsistencies cancel each other, and Calcite manages 
to produce "sane" plans.
   Now you fix one or two such defects (which has good merit), however, the net 
result becomes that there are 98 inconsistencies in the cost model which **no 
longer** cancel each other.
   
   Calcite purpose is optimizer, and it is really sad to introduce regressions 
to the optimizer.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388573407
 
 

 ##
 File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java
 ##
 @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider,
 + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n"
 + "group by \"the_year\"")
 .enableMaterializations(true)
-.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT 
NULL($t0)], "
-+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, 
$t3)], C=[$t4])\n"
-+ "  EnumerableAggregate(group=[{0}])\n"
-+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])")
+.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n"
++ "  EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n"
++ "EnumerableAggregate(group=[{0}])\n"
 
 Review comment:
   Who will fix the costing model then?
   I think it is unfair to merge a change that is not really compatible with 
the costing model.
   
   If the change to optimizer requires adjustments to the costing model, then 
could you please do that in a single PR, so we see the net changes for both 
plans and the response times?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388573407
 
 

 ##
 File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java
 ##
 @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider,
 + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n"
 + "group by \"the_year\"")
 .enableMaterializations(true)
-.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT 
NULL($t0)], "
-+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, 
$t3)], C=[$t4])\n"
-+ "  EnumerableAggregate(group=[{0}])\n"
-+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])")
+.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n"
++ "  EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n"
++ "EnumerableAggregate(group=[{0}])\n"
 
 Review comment:
   Who will fix the costing model then?
   I think it is unfair to merge a change that is not really compatible with 
the costing model.
   
   If the change to optimizer requires adjustments to the costing model, then 
could you please do that in a single commit, so we see the net changes for both 
plans and the response times?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388570453
 
 

 ##
 File path: core/src/test/resources/sql/sub-query.iq
 ##
 @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], 
expr#5=[IS TRUE($t4)], e
 EnumerableLimit(fetch=[1])
   EnumerableSort(sort0=[$0], dir0=[DESC])
 EnumerableAggregate(group=[{0}], c=[COUNT()])
-  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], 
expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], 
expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
 
 Review comment:
   Let me put it in another way: you change the optimizer, and now it favours 
bad plans.
   What the optimizer now does it introduces a dummy always_true filter, and it 
thinks the filter would reduce the number of rows and so on. It does not look 
like a well-behaving optimizer :-/
   
   Even though the change reduces `slow test` execution, that reduction might 
be the result of "skipping some rules" rather than removing importance.
   
   So currently it looks like some rules do not fire which result in a 
noticeable amount of useless predicates floating around.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388570453
 
 

 ##
 File path: core/src/test/resources/sql/sub-query.iq
 ##
 @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], 
expr#5=[IS TRUE($t4)], e
 EnumerableLimit(fetch=[1])
   EnumerableSort(sort0=[$0], dir0=[DESC])
 EnumerableAggregate(group=[{0}], c=[COUNT()])
-  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], 
expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], 
expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
 
 Review comment:
   Let me put it in another way: you change the optimizer, and now it favours 
bad plans.
   What it does it introduces a dummy always_true filter, and it thinks the 
filter would reduce the number of rows and so on. It does not look like a 
well-behaving optimizer :-/
   
   Even though the change reduces `slow test` execution, that reduction might 
be the result of "skipping some rules" rather than removing importance.
   
   So currently it looks like some rules do not fire which result in a 
noticeable amount of useless predicates floating around.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388218456
 
 

 ##
 File path: core/src/test/resources/sql/sub-query.iq
 ##
 @@ -831,7 +831,7 @@ EnumerableCalc(expr#0..3=[{inputs}], expr#4=[NOT($t2)], 
expr#5=[IS TRUE($t4)], e
 EnumerableLimit(fetch=[1])
   EnumerableSort(sort0=[$0], dir0=[DESC])
 EnumerableAggregate(group=[{0}], c=[COUNT()])
-  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], cs=[$t3])
+  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[false], expr#4=[123], 
expr#5=[null:INTEGER], expr#6=[=($t4, $t5)], expr#7=[IS NULL($t5)], 
expr#8=[OR($t6, $t7)], cs=[$t3], $condition=[$t8])
 
 Review comment:
   This looks like a plan degradation.
   For instance `expr#4=[123], expr#5=[null:INTEGER], expr#6=[=($t4, $t5)]` is 
the same as `null:BOOLEAN`.
   
   Do you know the reason for this plan degradation?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-05 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r388213305
 
 

 ##
 File path: core/src/test/java/org/apache/calcite/test/LatticeTest.java
 ##
 @@ -677,10 +677,10 @@ private void checkTileAlgorithm(String statisticProvider,
 + "join \"foodmart\".\"time_by_day\" using (\"time_id\")\n"
 + "group by \"the_year\"")
 .enableMaterializations(true)
-.explainContains("EnumerableCalc(expr#0=[{inputs}], expr#1=[IS NOT 
NULL($t0)], "
-+ "expr#2=[1:BIGINT], expr#3=[0:BIGINT], expr#4=[CASE($t1, $t2, 
$t3)], C=[$t4])\n"
-+ "  EnumerableAggregate(group=[{0}])\n"
-+ "EnumerableTableScan(table=[[adhoc, m{32, 36}]])")
+.explainContains("EnumerableCalc(expr#0..1=[{inputs}], C=[$t1])\n"
++ "  EnumerableAggregate(group=[{0}], C=[COUNT($0)])\n"
++ "EnumerableAggregate(group=[{0}])\n"
 
 Review comment:
   This looks like a plan degradation, doesn't it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [calcite] vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule queue importance

2020-03-04 Thread GitBox
vlsi commented on a change in pull request #1840: [CALCITE-3753] Remove rule 
queue importance
URL: https://github.com/apache/calcite/pull/1840#discussion_r387935235
 
 

 ##
 File path: 
core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java
 ##
 @@ -87,36 +85,11 @@
  * according to a dynamic programming algorithm.
  */
 public class VolcanoPlanner extends AbstractRelOptPlanner {
-  protected static final double COST_IMPROVEMENT = .5;
 
   //~ Instance fields 
 
   protected RelSubset root;
 
-  /**
-   * If true, the planner keeps applying rules as long as they continue to
-   * reduce the cost. If false, the planner terminates as soon as it has found
-   * any implementation, no matter how expensive.
-   */
-  protected boolean ambitious = true;
-
-  /**
-   * If true, and if {@link #ambitious} is true, the planner waits a finite
-   * number of iterations for the cost to improve.
-   *
-   * The number of iterations K is equal to the number of iterations
-   * required to get the first finite plan. After the first finite plan, it
-   * continues to fire rules to try to improve it. The planner sets a target
-   * cost of the current best cost multiplied by {@link #COST_IMPROVEMENT}. If
-   * it does not meet that cost target within K steps, it quits, and uses the
-   * current best plan. If it meets the cost, it sets a new, lower target, and
-   * has another K iterations to meet it. And so forth.
-   *
-   * If false, the planner continues to fire rules until the rule queue is
-   * empty.
-   */
-  protected boolean impatient = false;
 
 Review comment:
   Can we rename `impatient` flag to 
[`useSimulatedAnnealing`](https://en.wikipedia.org/wiki/Simulated_annealing) ?
   It would sound sophisticated enough to keep it from removing :)
   
   Jokes aside, I think the planning cannot be always fit into a finite amount 
of time, so it makes sense to have a predefined timeout for the planner. That 
is it could give up and just return the best it has so far.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services