Vineet Garg created CALCITE-1494:
------------------------------------

             Summary: Inefficient plan for co-related subqueries
                 Key: CALCITE-1494
                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
             Project: Calcite
          Issue Type: Improvement
          Components: core
            Reporter: Vineet Garg
            Assignee: Julian Hyde


For co-related queries such as 
{noformat} select sal from emp where empno IN (select deptno from dept where 
emp.job = dept.name) {noformat}

Calcite generates following plan (SubqueryRemove Rule + Decorrelation) 

{noformat}
LogicalProject(SAL=[$5])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalAggregate(group=[{0, 1}])
        LogicalProject(DEPTNO=[$0], JOB=[$1])
          LogicalProject(DEPTNO=[$0], JOB=[$2])
            LogicalJoin(condition=[=($2, $1)], joinType=[inner])
              LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
              LogicalAggregate(group=[{0}])
                LogicalProject(JOB=[$2])
                  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{noformat}

As you can notice there is a outer table scan (EMP in this case) to retrieve 
all distinct values for co-related column (EMP.JOB here), which is then joined 
with inner table (DEPT). 
I am not sure why is this step required. After this join Calcite is anyway 
doing group by to generate all distinct values for correlated and result column 
(DEPTNO, JOB) which is then joined with outer table. 
I think the scan + join of outer table with inner table to generate co-rrelated 
values is un-necessary and is not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to