[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800380#comment-16800380 ]

Danny Chan commented on CALCITE-2936:
-------------------------------------

So MinRowCount and MaxRowCount are both estimated, but safe, bounds. We should update the Javadoc, I think.

> Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
> --------------------------------------------------------------
>
>                 Key: CALCITE-2936
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2936
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Haisheng Yuan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> An EXISTS or NOT EXISTS sub-query whose inner child is an aggregate with no
> grouping columns should be simplified to a Boolean constant.
> Example:
> {code:java}
> exists(select sum(i) from X) --> true
> not exists(select sum(i) from X) --> false
> {code}
> Repro:
> {code:java}
> @Test public void testExistentialSubquery() {
>   final String sql = "SELECT e1.empno\n"
>       + "FROM emp e1 where exists\n"
>       + "(select avg(sal) from emp e2 where e1.empno = e2.empno )";
>   sql(sql).decorrelate(true).ok();
> }
> {code}
> We got plan:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
>     LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject(EMPNO=[$0], $f0=[true])
>           LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
>             LogicalProject(EMPNO=[$0], SAL=[$5])
>               LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> The preferred plan should be:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
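The rule in the issue description rests on one SQL fact: an aggregate with no grouping columns returns exactly one row, even when its input is empty. The following is a minimal sketch of that fact in plain Java. It is a toy model, not Calcite code; the class `GlobalAggregateDemo` and its methods are hypothetical names introduced only for illustration.

```java
// Toy model, not Calcite code: all names here are hypothetical.
final class GlobalAggregateDemo {
  /**
   * Models "SELECT SUM(i) FROM X" with no GROUP BY columns.
   * Returns the result rows; there is always exactly one, and it
   * holds null when X is empty, as SQL SUM does.
   */
  static Integer[] sumWithoutGroupBy(int[] x) {
    Integer sum = null;
    for (int i : x) {
      sum = (sum == null) ? i : sum + i;
    }
    return new Integer[] {sum};
  }

  /** Models EXISTS: true if and only if the sub-query returns at least one row. */
  static boolean exists(Object[] rows) {
    return rows.length > 0;
  }

  public static void main(String[] args) {
    // Empty input still yields one row, so EXISTS is true.
    System.out.println(exists(sumWithoutGroupBy(new int[] {})));        // true
    System.out.println(exists(sumWithoutGroupBy(new int[] {1, 2, 3}))); // true
  }
}
```

Because the row count never depends on the contents of X, EXISTS over such a sub-query can be replaced by TRUE, and NOT EXISTS by FALSE, without looking at the data.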
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799519#comment-16799519 ]

Julian Hyde commented on CALCITE-2936:
--------------------------------------

{{MinRowCount}} is an estimate, but it is a safe lower bound. If {{MinRowCount}} evaluates to 2 for a particular relational expression, then the relational expression might return 2 rows or 200, but will never return 1 or 0 rows. The same holds for {{MaxRowCount}}, as a safe upper bound. They are intended for precisely these kinds of optimizations. I'm sorry that the documentation doesn't make that crystal clear.
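The safe-bound reasoning in this comment can be sketched as follows. This is a hypothetical illustration in plain Java, not Calcite's API: the bounds are passed in as plain doubles, and `ExistsSimplifier`, `Result`, `simplifyExists` and `simplifyNotExists` are names made up for the example. The FALSE branch (an upper bound below one row) is an assumption extrapolated from "the same holds for MaxRowCount"; the thread itself only discusses the lower bound.

```java
// Hypothetical sketch, not Calcite's API: bounds are modeled as plain doubles.
final class ExistsSimplifier {
  enum Result { TRUE, FALSE, UNKNOWN }

  /**
   * Simplifies EXISTS(sub-query) using safe bounds on its row count.
   * minRowCount: the sub-query never returns fewer rows than this.
   * maxRowCount: the sub-query never returns more rows than this.
   */
  static Result simplifyExists(double minRowCount, double maxRowCount) {
    if (minRowCount >= 1.0) {
      return Result.TRUE;    // at least one row is guaranteed
    }
    if (maxRowCount < 1.0) {
      return Result.FALSE;   // assumption: no row can ever be returned
    }
    return Result.UNKNOWN;   // bounds are inconclusive; keep the sub-query
  }

  /** NOT EXISTS is the negation wherever EXISTS could be decided. */
  static Result simplifyNotExists(double minRowCount, double maxRowCount) {
    switch (simplifyExists(minRowCount, maxRowCount)) {
      case TRUE:
        return Result.FALSE;
      case FALSE:
        return Result.TRUE;
      default:
        return Result.UNKNOWN;
    }
  }

  public static void main(String[] args) {
    // Aggregate without grouping columns: exactly one row.
    System.out.println(simplifyExists(1.0, 1.0));                      // TRUE
    // Plain table scan: anywhere from zero rows upward.
    System.out.println(simplifyExists(0.0, Double.POSITIVE_INFINITY)); // UNKNOWN
    // An input known to be empty.
    System.out.println(simplifyExists(0.0, 0.0));                      // FALSE
  }
}
```

The UNKNOWN case is what makes the bounds "safe": when they cannot decide the question, the rewrite is simply not applied, so a loose bound costs an optimization but never correctness.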
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799268#comment-16799268 ]

Haisheng Yuan commented on CALCITE-2936:
----------------------------------------

[~danny0405] Let's continue the discussion here. I understand that the description in the code concerns you, but I think the comments are misleading: the word should be "determine", not "estimate". minRowCount and maxRowCount are provided to help determine whether we can do further optimizations, such as aggregate or sort removal and existential checks, not for cardinality or cost estimation. I don't know how much value they would provide if they were only estimates. Should we update the comments, if I have not misunderstood the intention?
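As a rough illustration of the aggregate and sort removal mentioned in this comment, the sketch below decides both from an upper bound on the input's row count. It is plain Java with hypothetical names (`BoundBasedRemoval`, `sortIsRedundant`, `distinctIsRedundant`), not Calcite code, and it assumes a Sort with no LIMIT or OFFSET and an Aggregate that has grouping keys but no aggregate calls (that is, a DISTINCT).

```java
// Hypothetical sketch, not Calcite code: the bound is modeled as a plain double.
final class BoundBasedRemoval {
  /**
   * An ORDER BY with no LIMIT/OFFSET changes nothing when the input
   * can never hold more than one row.
   */
  static boolean sortIsRedundant(double inputMaxRowCount) {
    return inputMaxRowCount <= 1.0;
  }

  /**
   * A DISTINCT (grouping keys, no aggregate calls) changes nothing
   * when the input can never hold more than one row.
   */
  static boolean distinctIsRedundant(double inputMaxRowCount) {
    return inputMaxRowCount <= 1.0;
  }

  public static void main(String[] args) {
    double scalarAggregateMax = 1.0;                 // e.g. "SELECT SUM(i) FROM X"
    double tableScanMax = Double.POSITIVE_INFINITY;  // no known upper bound

    System.out.println(sortIsRedundant(scalarAggregateMax));     // true
    System.out.println(sortIsRedundant(tableScanMax));           // false
    System.out.println(distinctIsRedundant(scalarAggregateMax)); // true
  }
}
```

Both checks are only sound if the value passed in is a guaranteed bound. With a mere cardinality estimate of 1.0, the input could still hold several rows, and dropping the Sort or the DISTINCT would change the query result.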
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799254#comment-16799254 ]

Haisheng Yuan commented on CALCITE-2936:
----------------------------------------

In order to remove the semijoin, we have to determine that for every tuple from the outer rel there is one and only one matching tuple in the inner rel. I don't see how to establish that for the example query in this JIRA.
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798331#comment-16798331 ]

Julian Hyde commented on CALCITE-2936:
--------------------------------------

Yeah, I agree it's a bit weird to use metadata during sql-to-rel conversion. However, I think it's OK (especially as MinRowCount is "safe", not an estimate). We already use it when {{SqlToRelConverter}} calls {{RelBuilder.aggregate}}.

It's a long shot, but is there any way to push this logic into {{RelBuilder.semiJoin}}? Then people would get it whether they use {{SqlToRelConverter}} or some other means (say, within a rule).
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797659#comment-16797659 ]

Haisheng Yuan commented on CALCITE-2936:
----------------------------------------

I retract my previous statement. We can use getMinRowCount in SqlToRelConverter.
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797653#comment-16797653 ]

Haisheng Yuan commented on CALCITE-2936:
----------------------------------------

In that case, the logical relational node we get from SqlToRelConverter has already been converted into a Join or Correlate. I think it might be too late, or too complex, to use RelMdRowCount stats to do the simplification.
[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
[ https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797623#comment-16797623 ]

Julian Hyde commented on CALCITE-2936:
--------------------------------------

Consider using the RelMdRowCount statistic for this. If it returns a value 1.0 or higher, it is safe to convert EXISTS to TRUE and NOT EXISTS to FALSE.