[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2020-07-07 Thread Jiatao Tao (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152732#comment-17152732
 ] 

Jiatao Tao commented on CALCITE-2630:
-

+1, keep in clause. I do this in two companies

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2019-08-28 Thread Juhwan Kim (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918163#comment-16918163
 ] 

Juhwan Kim commented on CALCITE-2630:
-

I also agree with the idea of leaving IN as is. Of course, we could let IN 
exist only in SqlNode land, convert OR to IN before code generation, and let 
the runtime engine do the hash lookup. However, this gets trickier if we have 
NOT IN. After firing ReduceExpressionsRule, NOT with OR conditions would 
probably end up becoming AND with not equal conditions. We could explicitly 
handle this case and generate IN from there, but, having IN as it is seems to 
be a better option.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2019-02-01 Thread Haisheng Yuan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758607#comment-16758607
 ] 

Haisheng Yuan commented on CALCITE-2630:


We hit this issue quite often. Sometimes the query is generated from BI tools, 
and may have tens of thousands of IN constants. Converting IN expression in 
Project list to Join would make the plan complex. Yet converting to OR 
expressions would increase the number of Rex nodes dramatically, which may 
cause optimization overhead, as CALCITE-2792 shows. Converting constant IN list 
to join also leads to 
[CALCITE-2696|https://issues.apache.org/jira/browse/CALCITE-2696]. I would 
prefer leave IN scalar operator as it is if the number of constants greater 
than a threshold, and let the runtime engine do the hash lookup.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-20 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657907#comment-16657907
 ] 

Julian Hyde commented on CALCITE-2630:
--

The other principle I am drawing on - which has served us well - is minimalism. 
The standard says that IN is expanded to OR. Therefore OR is syntactic sugar. 
If people write the same query using equivalent IN or OR formulations we should 
generate the same plan. With your scheme we would not. 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-20 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657901#comment-16657901
 ] 

Julian Hyde commented on CALCITE-2630:
--

But why add a new operator to Rex? (IN exists only in SqlNode land; adding a 
new operator, with all of the attendant optimizations and simplifications, is 
an expensive task.) You can represent the expression as an OR and convert it to 
IN just before code generation. 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-20 Thread Rann Tao (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657770#comment-16657770
 ] 

Rann Tao commented on CALCITE-2630:
---

i agree with the [~pzw2018],We can keep the way of or and join, but we should 
also provide an in-expression (unoptimized), let the specific engine do the 
runtime implementation.


> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-19 Thread Andrei Sereda (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657727#comment-16657727
 ] 

Andrei Sereda commented on CALCITE-2630:


We have also run into similar problem with [Geode|http://geode.apache.org/] 
(gemfire) in terms of performance : 

{{x = 1 ... or x = N}} is much slower than {{x in (1, ..., N)}}

For elastic search, generated payload (JSON) for query {{x = 1 or x = 2}} is 
much larger than for {{x in (1,2)}}

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-19 Thread pengzhiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656440#comment-16656440
 ] 

pengzhiwei commented on CALCITE-2630:
-

_you can have your optimized physical IN operator in your generated code_

[~julianhyde] "In Operator" has been translated to "OR" or "Join" after logical 
plan,If people want to build a physical IN operator in there own engine ,they 
may need to rebuild IN operator from the "OR" or "join" logical plan.It need 
lot of work.

So why we not keep "IN" as it is and leave the space for engine to implement 
there own physical IN.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-19 Thread pengzhiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656410#comment-16656410
 ] 

pengzhiwei commented on CALCITE-2630:
-

   _By adjusting the subQuertThreshold you can avoid the translation to a join 
and have always the OR expansion which does not seem much more complicated than 
the equivalent IN._

[~zabetak]  The problem I meet in our engine is that when I adjust the 
_subQuertThreshold_ to a large value .e.g. 1000,the execution time by OR 
expression is

intolerable.

 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-19 Thread Stamatis Zampetakis (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656363#comment-16656363
 ] 

Stamatis Zampetakis commented on CALCITE-2630:
--

{quote}

I don't very much like the idea of adding another Rex operator.

{quote}

[~julianhyde] : I suppose you mean not creating a RexCall with SqlInOperator 
but this is possible anyways if you pass from the RelBuilder and RexBuilder.  

{quote}

I think it is much clear and good performance than translate "IN" to a complex 
"Join".

{quote}

[~pzw2018] : By adjusting the subQuertThreshold you can avoid the translation 
to a join and have always the OR expansion which does not seem much more 
complicated than the equivalent IN.

If you really want to have the IN in the RexCall, then the discussion so far in 
the dev list is leaning towards creating appropriate rules.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread pengzhiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656256#comment-16656256
 ] 

pengzhiwei commented on CALCITE-2630:
-

[~zabetak] You are right that it may break the 3rd party rules. Should we add a 
config to enable it and disable by default just like the "inSubQueryThreshold" 
does ?

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread pengzhiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656240#comment-16656240
 ] 

pengzhiwei commented on CALCITE-2630:
-

Hi [~julianhyde],We are talking about keeping "In (constant1,constant2,...)" as 
it is but not translate it to a very complex "join" logical plan as I have 
described in the description.We need not add another Rex Operator just like 
"plus","minus" and so on.It just a RexCall with a SqlInOperator.

In the runtime of calcite,we can provide a InExpression to compute the value of 
 "in". We can put all the constants into a "Set" to reduce the computational 
complexity .

I think it is much clear and good performance than translate "IN" to a complex 
"Join". 

 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1668#comment-1668
 ] 

Julian Hyde commented on CALCITE-2630:
--

I don't very much like the idea of adding another Rex operator. A new operator 
-- especially a boolean one, which needs to participate in all kinds of 
transformations, simplifications, and 3-valued logic -- is a lot of work.

If we're talking about the case "column in (constant1, constant2, ..., 
constantN)" then we already have two ways:
* First, "column = constant1 OR column = constant2 ... OR column = constantN" 
(and note that OR is an n-ary operator in Rex land, so there's one call to OR 
with N arguments).
* Second, "RexSubQuery(SqlStdOperatorTable.IN, columnRef, LogicalValues(...))".

The latter form is a hybrid scalar/relational (Rel inside Rex is unusual) but 
we could make it work. With both forms, RelToSql could recognize the form and 
translate it to an IN SqlCall, and if you have another code generator, you 
could generate code accordingly. 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655368#comment-16655368
 ] 

Zoltan Haindrich commented on CALCITE-2630:
---

Thank you [~pzw2018] for starting this conversation!

I think it would help a lot if the logic which rewrites INs (into ORs or 
subqueries) would be available as a rule instead of a built-in feature of 
sql2rel - I feel that it might also help in phasing in the runtime support for 
IN.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread Stamatis Zampetakis (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655243#comment-16655243
 ] 

Stamatis Zampetakis commented on CALCITE-2630:
--

Sure, I just wanted to be sure that we are not going to break something about 
subqueries and semi-joins.

{quote}
We can implement a InExpression for calcite runtime
{quote}
Great!

{quote}
The current translation for "IN expressions" to "join" is much harder to 
implement for other sql-engine.
{quote}
That may be true but we shouldn't forget that some systems and rules are 
already built and work with the current transformation. Depending on how the 
change is going to be introduced it can break existing 3rd party rules and 
systems. 

Last but not least it is worth checking some previous discussions in the dev 
list. The most recent one can be found 
[here|https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E].
 

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread pengzhiwei (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654859#comment-16654859
 ] 

pengzhiwei commented on CALCITE-2630:
-

[~zabetak]  well,the "in subquery" is not included in  this  plan.

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

2018-10-18 Thread Stamatis Zampetakis (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654853#comment-16654853
 ] 

Stamatis Zampetakis commented on CALCITE-2630:
--

As far as it concerns IN expressions with a list of literals it might make 
sense. However, when it comes to IN expressions with subqueries I am not sure 
if it will be beneficial. In particular, because IN with subquery can often be 
transformed to a semi-join, which has clear semantics and many known properties 
for optimization. Moreover, note that the existing runtime does not provide an 
implementation for the IN operator (see 
[CALCITE-2618|https://issues.apache.org/jira/browse/CALCITE-2618]).

> Convert SqlInOperator to In-Expression
> --
>
> Key: CALCITE-2630
> URL: https://issues.apache.org/jira/browse/CALCITE-2630
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.17.0
>Reporter: pengzhiwei
>Assignee: Julian Hyde
>Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>  LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> LogicalProject(ROW_VALUE=[$0], $f1=[true])
>   LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)