[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-24 Thread Danny Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800380#comment-16800380
 ] 

Danny Chan commented on CALCITE-2936:
-

So MinRowCount and MaxRowCount are both estimates, but safe bounds. We should
update the Javadoc, I think.

> Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
> --
>
> Key: CALCITE-2936
> URL: https://issues.apache.org/jira/browse/CALCITE-2936
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Haisheng Yuan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> An EXISTS or NOT EXISTS sub-query whose inner child is an aggregate with no 
> grouping columns should be simplified to a Boolean constant.
> Example:
> {code:java}
> exists(select sum(i) from X) --> true
> not exists(select sum(i) from X) --> false
> {code}
> Repro:
> {code:java}
> @Test public void testExistentialSubquery() {
>   final String sql = "SELECT e1.empno\n"
>       + "FROM emp e1 where exists\n"
>       + "(select avg(sal) from emp e2 where e1.empno = e2.empno )";
>   sql(sql).decorrelate(true).ok();
> }
> {code}
> We got this plan:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
>     LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject(EMPNO=[$0], $f0=[true])
>           LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
>             LogicalProject(EMPNO=[$0], SAL=[$5])
>               LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> The preferred plan should be:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
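A toy, in-memory Java sketch of the claim in the description (not Calcite code; the class and method names are hypothetical): an aggregate with an empty GROUP BY emits exactly one row even when its input is empty, so EXISTS over it is always true.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Calcite code) of why an aggregate with an
 * empty GROUP BY always produces exactly one row.
 */
final class EmptyGroupByDemo {
  /** Evaluates "SELECT SUM(i) FROM x" over the given column values. */
  static List<Integer> sumNoGroupBy(List<Integer> values) {
    Integer sum = null; // SQL SUM over zero (non-null) rows is NULL, not 0
    for (Integer v : values) {
      if (v != null) { // SUM ignores NULL inputs
        sum = (sum == null) ? v : sum + v;
      }
    }
    List<Integer> rows = new ArrayList<>();
    rows.add(sum); // exactly one output row, even when the input is empty
    return rows;
  }

  /** EXISTS is true iff the sub-query returns at least one row. */
  static boolean exists(List<Integer> subQueryRows) {
    return !subQueryRows.isEmpty();
  }
}
```

For example, {{exists(sumNoGroupBy(List.of()))}} is true even though the input is empty, which is why the corresponding NOT EXISTS can only be false.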



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-22 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799519#comment-16799519
 ] 

Julian Hyde commented on CALCITE-2936:
--

{{MinRowCount}} is an estimate, but it is a safe lower bound. If 
{{MinRowCount}} evaluates to 2 for a particular relational expression, then the 
relational expression might return 2 rows or 200, but will never return 1 or 0 
rows.

Similarly, {{MaxRowCount}} is a safe upper bound.

They are intended for precisely these kinds of optimizations. I'm sorry that 
the documentation doesn't make that crystal clear.
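To illustrate what "safe bound" means, here is a minimal sketch over a hypothetical toy plan model (not Calcite's {{RelMdMinRowCount}}/{{RelMdMaxRowCount}} handlers; the names and per-operator rules are assumptions chosen for illustration). Each operator derives a [min, max] interval that the actual row count can never leave.

```java
/** Hypothetical toy plan model; names are illustrative, not Calcite's API. */
final class RowCountBounds {
  final double min;
  final double max; // Double.POSITIVE_INFINITY when unbounded

  RowCountBounds(double min, double max) {
    this.min = min;
    this.max = max;
  }

  /** A table scan with no known statistics: anything from 0 rows upward. */
  static RowCountBounds scan() {
    return new RowCountBounds(0, Double.POSITIVE_INFINITY);
  }

  /** A filter may drop every row, but never adds rows. */
  static RowCountBounds filter(RowCountBounds input) {
    return new RowCountBounds(0, input.max);
  }

  /** An aggregate with no grouping columns always emits exactly one row. */
  static RowCountBounds aggregateNoGroupBy(RowCountBounds input) {
    return new RowCountBounds(1, 1);
  }

  /** An aggregate with grouping columns emits one row per distinct key. */
  static RowCountBounds aggregateGroupBy(RowCountBounds input) {
    return new RowCountBounds(input.min >= 1 ? 1 : 0, input.max);
  }

  /** True if an observed row count lies within the bounds. */
  boolean admits(long actualRows) {
    return min <= actualRows && actualRows <= max;
  }
}
```

Applied to the repro sub-query, {{aggregateNoGroupBy(filter(scan()))}} gives [1, 1]. In the decorrelated plan the AVG aggregate has {{group=[{0}]}}, and a group-by aggregate over a scan (ignoring the projection) only gets a lower bound of 0, so the simplification is no longer visible there.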



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-22 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799268#comment-16799268
 ] 

Haisheng Yuan commented on CALCITE-2936:


[~danny0405] Let's continue the discussion here. I understand that the description
in the code concerns you, but I think the comments are misleading: the word
should be "determine", not "estimate". The minRowCount and maxRowCount are
provided to help determine whether we can do further optimizations, like
aggregate/sort removal or existential checks, not for cardinality and cost
estimation. I don't know how much value they would provide if they were only
estimates. If I understand the intention correctly, should we update the comments?
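As an example of the "determine" use described above, here is a hypothetical sort-removal check (a sketch, not Calcite's actual rule). It is only correct if {{maxRowCount}} is a guaranteed upper bound: if it were merely an estimate of 1 and the input actually produced more rows, dropping the sort would change the result order.

```java
/** Hypothetical helper illustrating a "determine"-style use of maxRowCount. */
final class SortRemoval {
  /**
   * A sort with no OFFSET/FETCH is redundant when its input is guaranteed
   * to produce at most one row, because a single row is trivially ordered.
   * maxRowCount must be a safe upper bound, never a guess.
   */
  static boolean isSortRedundant(double maxRowCount, boolean hasOffsetOrFetch) {
    return !hasOffsetOrFetch && maxRowCount <= 1;
  }
}
```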



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-22 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799254#comment-16799254
 ] 

Haisheng Yuan commented on CALCITE-2936:


To remove the semi-join, we have to determine that for every tuple from the
outer rel there is one and only one matching tuple in the inner rel. I don't
see how to do that for the example query in this JIRA.
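For reference, a toy in-memory semi-join (illustrative only; the class and method names are hypothetical, not Calcite code) keeps an outer row whenever the correlated inner query returns at least one row. For the repro query, the inner {{select avg(sal) from emp e2 where e1.empno = e2.empno}} has no GROUP BY, so it returns one row (possibly NULL) for every outer row, and the semi-join keeps every outer row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical toy semi-join over in-memory lists; not Calcite code. */
final class SemiJoinDemo {
  /**
   * Keeps each outer row for which the correlated inner query returns at
   * least one row. Outer rows are never duplicated.
   */
  static <T, R> List<T> semiJoin(List<T> outer, Function<T, List<R>> inner) {
    List<T> result = new ArrayList<>();
    for (T row : outer) {
      if (!inner.apply(row).isEmpty()) {
        result.add(row);
      }
    }
    return result;
  }

  /** Models "SELECT AVG(sal) FROM emp e2 WHERE e2.empno = ?": always one row. */
  static List<Double> avgSalFor(int empno, int[] empnos, double[] sals) {
    double sum = 0;
    int count = 0;
    for (int i = 0; i < empnos.length; i++) {
      if (empnos[i] == empno) {
        sum += sals[i];
        count++;
      }
    }
    List<Double> rows = new ArrayList<>();
    rows.add(count == 0 ? null : sum / count); // AVG over no rows is NULL
    return rows;
  }
}
```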



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-21 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798331#comment-16798331
 ] 

Julian Hyde commented on CALCITE-2936:
--

Yeah, I agree it's a bit weird to use metadata during sql-to-rel conversion.
However, I think it's OK (especially as MinRowCount is "safe", not an
estimate). We already use it when {{SqlToRelConverter}} calls
{{RelBuilder.aggregate}}.

It's a long shot, but is there any way to push this logic into
{{RelBuilder.semiJoin}}? Then people would get it whether they use
{{SqlToRelConverter}} or some other means (say, within a rule).



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-20 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797659#comment-16797659
 ] 

Haisheng Yuan commented on CALCITE-2936:


I retract my previous statement. We can use getMinRowCount in SqlToRelConverter.



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-20 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797653#comment-16797653
 ] 

Haisheng Yuan commented on CALCITE-2936:


In that case, the logical relational node we get from SqlToRelConverter has
already been converted into a Join or Correlate. I think it might be too late,
or too complex, to use RelMdRowCount stats to do the simplification.



[jira] [Commented] (CALCITE-2936) Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"

2019-03-20 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797623#comment-16797623
 ] 

Julian Hyde commented on CALCITE-2936:
--

Consider using the RelMdRowCount statistic for this. If it returns a value of
1.0 or higher, it is safe to convert EXISTS to TRUE and NOT EXISTS to FALSE.
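A minimal sketch of this folding rule, assuming the inputs are safe bounds such as {{MinRowCount}}/{{MaxRowCount}} rather than a cardinality estimate. The class and method names are hypothetical and this is not Calcite's API; the {{maxRowCount == 0}} branch is an extra case beyond the rule stated above.

```java
import java.util.Optional;

/** Hypothetical sketch of folding EXISTS / NOT EXISTS using safe row-count bounds. */
final class ExistsSimplifier {
  /**
   * Returns the constant value of EXISTS(subQuery) if the bounds decide it,
   * or empty if the sub-query must be kept. Either bound may be null (unknown).
   */
  static Optional<Boolean> simplifyExists(Double minRowCount, Double maxRowCount) {
    if (minRowCount != null && minRowCount >= 1.0) {
      return Optional.of(true); // at least one row is guaranteed
    }
    if (maxRowCount != null && maxRowCount == 0.0) {
      return Optional.of(false); // no row can ever be returned
    }
    return Optional.empty(); // cannot decide; keep the sub-query
  }

  /** NOT EXISTS is the negation of EXISTS, so it folds the same way. */
  static Optional<Boolean> simplifyNotExists(Double minRowCount, Double maxRowCount) {
    return simplifyExists(minRowCount, maxRowCount).map(b -> !b);
  }
}
```

For the aggregate-without-GROUP-BY case in the description, both bounds are 1.0, so EXISTS folds to TRUE and NOT EXISTS to FALSE.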

> Simplify EXISTS or NOT EXISTS sub-query that has "GROUP BY ()"
> --
>
> Key: CALCITE-2936
> URL: https://issues.apache.org/jira/browse/CALCITE-2936
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Haisheng Yuan
>Priority: Major
>
> An EXISTS or NOT EXISTS sub-query whose inner child is an aggregate with no 
> grouping columns should be simplified to a Boolean constant.
> Example:
> {code:java}
> exists(select sum(i) from X) --> true
> not exists(select sum(i) from X) --> false
> {code}
> Repro:
> {code:java}
> @Test public void testExistentialSubquery() {
> final String sql = "SELECT e1.empno\n"
> + "FROM emp e1 where exists\n"
> + "(select avg(sal) from emp e2 where e1.empno = e2.empno )";
> sql(sql).decorrelate(true).ok();
>   }
> {code}
> We got plan:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], EMPNO0=[CAST($9):INTEGER], 
> $f1=[CAST($10):BOOLEAN])
> LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(EMPNO=[$0], $f0=[true])
>   LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
> LogicalProject(EMPNO=[$0], SAL=[$5])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> The preferred plan should be:
> {code:java}
> LogicalProject(EMPNO=[$0])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)