[jira] [Created] (CALCITE-3761) How to write a rule with optional intermediate operands?

2020-01-30 Thread anjali shrishrimal (Jira)
anjali shrishrimal created CALCITE-3761:
---

 Summary: How to write a rule with optional intermediate operands?
 Key: CALCITE-3761
 URL: https://issues.apache.org/jira/browse/CALCITE-3761
 Project: Calcite
  Issue Type: Wish
  Components: core
Reporter: anjali shrishrimal


I want to write a rule that matches a plan based only on the root (top) RelNode 
and the leaf RelNode; all intermediate RelNodes are optional.
What operands should be passed to such a rule?

 

Suppose Logical Plan is like given below.
{code:java}
LogicalRelNode4
  LogicalRelNode3 (optional)
    LogicalRelNode2 (optional)
      LogicalRelNode1
{code}
LogicalRelNode2 and LogicalRelNode3 are optional. The rule should match the 
structure irrespective of the presence of these optional nodes.

The rule should match all of the following structures.
{code:java}
1. LogicalRelNode4
     LogicalRelNode3
       LogicalRelNode2
         LogicalRelNode1

2. LogicalRelNode4
     LogicalRelNode2
       LogicalRelNode1

3. LogicalRelNode4
     LogicalRelNode3
       LogicalRelNode1

4. LogicalRelNode4
     LogicalRelNode1
{code}
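
One possible workaround, assuming each rule instance can only describe a single fixed tree shape, is to enumerate every shape and register one rule instance per shape. The sketch below is plain Java and does not use Calcite's API; the class OptionalOperandChains and its enumerate method are hypothetical names used only to illustrate how the four structures above can be generated from the root, the optional nodes, and the leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Calcite. */
final class OptionalOperandChains {
  private OptionalOperandChains() {}

  /**
   * Returns every chain that starts at root, ends at leaf, and contains any
   * subset of the optional intermediates in their original relative order.
   */
  static List<List<String>> enumerate(
      String root, List<String> optional, String leaf) {
    List<List<String>> chains = new ArrayList<>();
    int n = optional.size();
    // Bit i of mask decides whether optional node i is present in the chain.
    for (int mask = (1 << n) - 1; mask >= 0; mask--) {
      List<String> chain = new ArrayList<>();
      chain.add(root);
      for (int i = 0; i < n; i++) {
        if ((mask & (1 << i)) != 0) {
          chain.add(optional.get(i));
        }
      }
      chain.add(leaf);
      chains.add(chain);
    }
    return chains;
  }

  public static void main(String[] args) {
    List<List<String>> chains = enumerate(
        "LogicalRelNode4",
        Arrays.asList("LogicalRelNode3", "LogicalRelNode2"),
        "LogicalRelNode1");
    for (List<String> chain : chains) {
      System.out.println(String.join(" -> ", chain));
    }
  }
}
```

With two optional nodes this yields four chains; the number doubles with each additional optional node, so the approach is only practical for a small number of them.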
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027214#comment-17027214
 ] 

Julian Hyde commented on CALCITE-3760:
--

I wasn't aware of Couchbase and N1QL's {{LET}}. Thanks for sharing.

That said, for these purposes we don't need to add {{LET}} to SQL or even to 
the {{SqlNode}} language. It would be sufficient to add it to the {{RexNode}} 
language.

And in fact we already have {{RexProgram}}, which allows you to define several 
expressions based on temporary expressions. We don't use {{RexProgram}} very 
much these days, because it is just a little harder to write transformation 
rules against a {{RexProgram}} than against a list of {{RexNode}}. On 
reflection, I think history would repeat itself, and adding variables would 
complicate too many places.

So, maybe the best way is to use a Project on a Project:
{noformat}
 select coalesce(udf(c), 100)
 from foo
{noformat}
becomes
{noformat}
select case when x is not null then x else 100 end
from (
  select udf(c) as x
  from foo)
{noformat}
As we discussed recently, it would be illegal to merge those Projects because 
of the UDF. So the udf would be called exactly once per row.
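
The effect of the two stacked Projects can be sketched in plain Java, without any Calcite classes. The class ProjectOnProject and its methods are hypothetical and only model the idea: the inner stage evaluates the UDF once per row, and the outer stage applies the CASE logic to the already computed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical model of a Project on a Project; not Calcite code. */
final class ProjectOnProject {
  private ProjectOnProject() {}

  /** Inner projection: calls the UDF exactly once for each input row. */
  static List<Integer> innerProject(
      List<Integer> rows, Function<Integer, Integer> udf) {
    List<Integer> out = new ArrayList<>();
    for (Integer c : rows) {
      out.add(udf.apply(c));
    }
    return out;
  }

  /** Outer projection: CASE WHEN x IS NOT NULL THEN x ELSE fallback END. */
  static List<Integer> outerProject(List<Integer> xs, int fallback) {
    List<Integer> out = new ArrayList<>();
    for (Integer x : xs) {
      out.add(x != null ? x : fallback);
    }
    return out;
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Stand-in for a non-deterministic UDF: every second call returns null.
    Function<Integer, Integer> udf =
        c -> calls.incrementAndGet() % 2 == 0 ? null : c;
    List<Integer> xs = innerProject(Arrays.asList(1, 2, 3), udf);
    System.out.println(outerProject(xs, 100) + ", udf calls: " + calls.get());
  }
}
```

Because the outer stage only reads the stored value, the number of UDF calls equals the number of rows regardless of how often the value is referenced.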

> Rewriting non-deterministic function can break query semantics
> --
>
> Key: CALCITE-3760
> URL: https://issues.apache.org/jira/browse/CALCITE-3760
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrites some *SqlFunctions* during validation, but it does not 
> consider whether the function is deterministic. For a non-deterministic 
> operator, the rewriting can break semantics. Additionally, there is no 
> interface for users to specify the determinism of a UDF/UDAF. 
> Say I have a non-deterministic UDF and UDAF and run SQL like the following:
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;{code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null else udaf(col0) end
> from foo{code}
> As we can see, the non-deterministic UDF and UDAF are called multiple times 
> after the rewrite, so the condition in the WHEN clause might no longer hold 
> when the result expression is evaluated.
> We need to provide an interface for users to specify the determinism of a 
> UDF/UDAF, and to consider whether a SqlNode is deterministic when rewriting.





[jira] [Updated] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-3760:

Labels: pull-request-available  (was: )



[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027196#comment-17027196
 ] 

Jin Xing edited comment on CALCITE-3760 at 1/31/20 5:06 AM:


Hi, [~julianhyde] [~amaliujia] Thanks a lot for the feedback ~

Yes, a *LET* clause would be very helpful. It allows us to store the result of 
a sub-expression, e.g. the result of a non-deterministic UDF/UDAF, and use it 
in subsequent clauses, which ensures that non-deterministic expressions are 
evaluated a consistent number of times. It is already supported by some 
vendors [1]. But I would prefer to keep the rewriting within the scope of 
common, standard SQL: a common scenario is that we want to convert the 
expression back to a SQL string and run it in the JDBC convention. An uncommon 
clause might make it harder to run the SQL in other dialects. So I propose 
that we skip the rewrites when the expression is non-deterministic.
{quote}Related issues are re-ordering of the branches of AND and OR conditions, 
and behavior when an expression throws.
{quote}
Currently RexSimplify already takes the determinism of an expression into 
consideration (there might be room for improvement). What is missing is an 
interface for a UDF/UDAF to specify whether it is deterministic.

 

[1]https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/let.html
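
The proposal above (skip the rewrite when the function is non-deterministic) can be sketched as follows. This is plain Java over SQL strings, not Calcite's validator; FunctionInfo, its deterministic flag, and rewriteCoalesce are hypothetical names standing in for whatever interface would carry the user-declared determinism.

```java
/** Hypothetical sketch of guarding a rewrite on a determinism flag. */
final class DeterminismGuard {
  private DeterminismGuard() {}

  /** Hypothetical descriptor carrying a user-declared determinism flag. */
  static final class FunctionInfo {
    final String name;
    final boolean deterministic;

    FunctionInfo(String name, boolean deterministic) {
      this.name = name;
      this.deterministic = deterministic;
    }
  }

  /** Expands COALESCE into CASE only when the call is safe to duplicate. */
  static String rewriteCoalesce(FunctionInfo f, String arg, String fallback) {
    String call = f.name + "(" + arg + ")";
    if (!f.deterministic) {
      // Duplicating the call could change the result, so keep it as-is.
      return "coalesce(" + call + ", " + fallback + ")";
    }
    return "case when " + call + " is not null then " + call
        + " else " + fallback + " end";
  }

  public static void main(String[] args) {
    System.out.println(
        rewriteCoalesce(new FunctionInfo("udf", false), "col0", "100"));
    System.out.println(
        rewriteCoalesce(new FunctionInfo("udf", true), "col0", "100"));
  }
}
```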





[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027196#comment-17027196
 ] 

Jin Xing commented on CALCITE-3760:
---

Hi, [~julianhyde] [~amaliujia] Thanks a lot for the feedback ~

Yes, a LET clause would be very helpful. It allows us to store the result of a 
sub-expression, e.g. the result of a non-deterministic UDF/UDAF, and use it in 
subsequent clauses, which ensures that non-deterministic expressions are 
evaluated a consistent number of times. It is already supported by some 
vendors [1]. But I would prefer to keep the rewriting within the scope of the 
SQL standard: a common scenario is that we want to convert the expression back 
to a SQL string and run it in the JDBC convention. A non-standard clause might 
make it harder to run the SQL in other dialects. So I propose that we skip the 
rewrites when the expression is non-deterministic.
{quote}Related issues are re-ordering of the branches of AND and OR conditions, 
and behavior when an expression throws.
{quote}
Currently RexSimplify already takes the determinism of an expression into 
consideration (there might be room for improvement). What is missing is an 
interface for a UDF/UDAF to specify whether it is deterministic.

 

[1]https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/let.html



[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering

2020-01-30 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027058#comment-17027058
 ] 

Xiening Dai commented on CALCITE-3753:
--

{quote}
Roman Kondakov Calcite's engine always had the capability of "Cascades style 
optimization with aggressive search space pruning". That is achieved by the 
'importance' concept, the sorted queue of rule matches, and the ability to stop 
optimization when the plan stops improving.
{quote}

In my opinion, space pruning and the current Calcite importance concept are 
different. Space pruning is achieved through top-down optimization, using 
lower-bound and upper-bound calculations to eliminate alternatives that are 
*guaranteed* to be worse. Calcite's rule importance setting is more heuristic 
and cannot guarantee that the best plan is found. The "impatient" mode is 
non-deterministic, which makes it hardly useful in practice.

{quote}
Top-down is a subtlety in the Volcano paper that I missed. If top-down (or 
something else) would solve the problem of requested traits then we should 
consider it.
{quote}

I think top-down is not just useful for requested traits, but also necessary 
for space pruning: lower-bound/upper-bound pruning can only be done with a 
top-down approach. Unfortunately, the current design of Calcite has many 
aspects that work against top-down search. For example, in some cases an 
implementation rule (or even an enforcement rule) can generate a logical rel, 
which then requires logical transformations to be applied again 
(CALCITE-2970), so the planner might have to go back to the parent nodes. If 
we move to a completely top-down approach, we would have to put some 
limitations on the current RelOptRule (perhaps some interface changes), and 
then backward compatibility would also become a problem.
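
To illustrate what upper-bound pruning in a top-down search looks like, here is a minimal branch-and-bound sketch in plain Java. It is not VolcanoPlanner code: Group, Alt, and best are hypothetical names, costs are assumed to be non-negative, and there is no memoization. A parent passes its remaining cost budget down to each input, and any alternative that cannot beat the budget is skipped.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical branch-and-bound sketch; not Calcite's planner. */
final class TopDownPruning {
  private TopDownPruning() {}

  /** One alternative of a group: its own cost plus the groups it reads. */
  static final class Alt {
    final double localCost;
    final List<Group> inputs;

    Alt(double localCost, Group... inputs) {
      this.localCost = localCost;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** A set of equivalent alternatives. */
  static final class Group {
    final List<Alt> alts;

    Group(Alt... alts) {
      this.alts = Arrays.asList(alts);
    }
  }

  /**
   * Returns the cheapest total cost of the group that is strictly below
   * upperBound, or positive infinity if no alternative fits the bound.
   */
  static double best(Group group, double upperBound) {
    double best = Double.POSITIVE_INFINITY;
    double bound = upperBound;
    for (Alt alt : group.alts) {
      double cost = alt.localCost;
      if (cost >= bound) {
        continue; // already no better than the best known plan: prune
      }
      boolean pruned = false;
      for (Group input : alt.inputs) {
        // The input may spend only what is left of the budget.
        double inputCost = best(input, bound - cost);
        if (inputCost == Double.POSITIVE_INFINITY) {
          pruned = true;
          break;
        }
        cost += inputCost;
      }
      if (!pruned && cost < bound) {
        best = cost;
        bound = cost; // tighten the bound for the remaining alternatives
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Group scan = new Group(new Alt(10.0));
    Group root = new Group(new Alt(5.0, scan), new Alt(20.0, scan));
    System.out.println(best(root, Double.POSITIVE_INFINITY));
  }
}
```

In the example, the second alternative of the root group is skipped without visiting its input, because its local cost alone exceeds the cost of the plan already found.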

> Always try to match and execute substitution rule first and remove rulematch 
> ordering
> -
>
> Key: CALCITE-3753
> URL: https://issues.apache.org/jira/browse/CALCITE-3753
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
> Attachments: image-2020-01-27-20-27-57-957.png
>
>
> In VolcanoPlanner, some rules, e.g. ProjectMergeRule and PruneEmptyRule, can 
> be defined as SubstitutionRule, so that we can always try to match and 
> execute them first (without deferring the rule call). All the other rule 
> matches don't need to be sorted, and rules can be executed in whatever order 
> they matched, since we are going to execute all of them anyway, sooner or 
> later. Computing and comparing importances causes a lot of latency.





[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering

2020-01-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026950#comment-17026950
 ] 

Julian Hyde commented on CALCITE-3753:
--

[~zabetak], [~hyuan], Top-down is a subtlety in the Volcano paper that I 
missed. If top-down (or something else) would solve the problem of requested 
traits then we should consider it.



[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering

2020-01-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026946#comment-17026946
 ] 

Julian Hyde commented on CALCITE-3753:
--

[~rkondakov] Calcite's engine always had the capability of "Cascades style 
optimization with aggressive search space pruning". That is achieved by the 
'importance' concept, the sorted queue of rule matches, and the ability to stop 
optimization when the plan stops improving.

But we didn't use the capability because no one ever tuned the 'importance' and 
'when to stop' metrics. That empirical tuning is not a matter for the engine.



[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026943#comment-17026943
 ] 

Julian Hyde edited comment on CALCITE-3760 at 1/30/20 7:22 PM:
---

SQL is a [strict 
language|https://en.wikipedia.org/wiki/Strict_programming_language] (with the 
exception of a few constructs such as CASE) but becomes non-strict when you add 
non-deterministic UDFs. As you point out, some of our rewrites assume 
strictness. It would be helpful if we had a 'let' construct, e.g. 
{{coalesce(e1, e2)}} becomes {{let x = e1 in case when x is not null then x 
else e2 end}}. It would ensure that expressions are evaluated the correct 
number of times. Without {{let}} or something similar I don't know how we could 
do these rewrites.

Related issues are re-ordering of the branches of AND and OR conditions, and 
behavior when an expression throws.





[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026943#comment-17026943
 ] 

Julian Hyde commented on CALCITE-3760:
--

SQL is a [strict 
language|https://en.wikipedia.org/wiki/Strict_programming_language] (with the 
exception of a few constructs such as CASE) but becomes non-strict when you add 
non-deterministic UDFs. As you point out, some of our rewrites assume 
strictness. It would be helpful if we had a 'let' construct, e.g. 
{{coalesce(e1, e2)}} becomes {{let x = e1 in case when x is not null then x 
else e2 end}}. It would ensure that expressions are evaluated the correct 
number of times. Without {{let}} or something similar I don't know how we could 
do these rewrites.



[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026927#comment-17026927
 ] 

Rui Wang commented on CALCITE-3760:
---

It makes sense.

Regarding UDFs/UDAFs: because they are user-defined, we usually cannot control 
what code users actually write. Sometimes, even if users tell us a UDF is 
deterministic, it might not be. In that case, adding a parameter for users 
does not solve the problem at its root.

In production practice on my side, we usually just establish a contract with 
users: say, we expect your UDF to satisfy A, B and C. If your UDF does not 
satisfy those properties, the query result will be unpredictable.



[jira] [Commented] (CALCITE-3759) Class memory leak due to code generation

2020-01-30 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026911#comment-17026911
 ] 

Rui Wang commented on CALCITE-3759:
---

Ah, I misunderstood what "class leak" meant in this Jira: I thought it meant 
that Calcite releases some classes that do not belong to Calcite (e.g. class 
names not starting with org.apache.calcite). But it seems the class leak 
discussed here is about objects staying in memory and never being GCed. Sorry 
I wasn't helpful at the beginning.

> Class memory leak due to code generation
> 
>
> Key: CALCITE-3759
> URL: https://issues.apache.org/jira/browse/CALCITE-3759
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.21.0
>Reporter: Mike Villa
>Priority: Major
> Attachments: image-2020-01-28-15-55-43-215.png
>
>
> Hi, I'm using Calcite and writing unit tests to check its performance, but 
> with VisualVM or JConsole I have observed a class leak. Maybe it's my fault.
> I would be grateful if someone could help me find the error!
> I have created a GitHub project to reproduce this error.
>  https://github.com/mvillafuertem/calcite-error.git
>  
> !image-2020-01-28-15-55-43-215.png!
>  





[jira] [Updated] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-30 Thread Jin Xing (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Xing updated CALCITE-3760:
--
Summary: Rewriting non-deterministic function can break query semantics  
(was: Rewriting function without considering determinism can break query 
semantics)



[jira] [Created] (CALCITE-3760) Rewriting function without considering determinism can break query semantics

2020-01-30 Thread Jin Xing (Jira)
Jin Xing created CALCITE-3760:
-

 Summary: Rewriting function without considering determinism can 
break query semantics
 Key: CALCITE-3760
 URL: https://issues.apache.org/jira/browse/CALCITE-3760
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Jin Xing
Assignee: Jin Xing


Calcite rewrites some *SqlFunctions* during validation, but it does not 
consider whether the function is deterministic. For a non-deterministic 
operator, the rewriting can break semantics. Additionally, there is no 
interface for users to specify the determinism of a UDF/UDAF. 

Say I have a non-deterministic UDF and UDAF and run SQL like the following:
{code:java}
select coalesce(udf(col0), 100) from foo;
select nullif(udaf(col0), 1024) from foo;{code}
They will be rewritten as
{code:java}
select case when udf(col0) is not null then udf(col0) else 100 end
from foo;

select case when udaf(col0)=1024 then null else udaf(col0) end
from foo{code}
As we can see, the non-deterministic UDF and UDAF are called multiple times 
after the rewrite, so the condition in the WHEN clause might no longer hold 
when the result expression is evaluated.

We need to provide an interface for users to specify the determinism of a 
UDF/UDAF, and to consider whether a SqlNode is deterministic when rewriting.
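
The problem can be reproduced outside Calcite with a small plain-Java sketch. DoubleEvaluation and alternatingUdf are hypothetical names; the "UDF" is a stand-in that returns 7 on odd-numbered calls and null on even-numbered calls. Evaluating COALESCE directly calls it once, while the CASE form calls it twice and can return a value that COALESCE with a non-null fallback never would.

```java
import java.util.function.Supplier;

/** Hypothetical demonstration of the double-evaluation problem. */
final class DoubleEvaluation {
  private DoubleEvaluation() {}

  /** Stand-in non-deterministic UDF: 7 on odd calls, null on even calls. */
  static Supplier<Integer> alternatingUdf() {
    int[] calls = {0};
    return () -> (++calls[0] % 2 == 1) ? Integer.valueOf(7) : null;
  }

  /** COALESCE(udf(), fallback): the UDF is evaluated once. */
  static Integer coalesce(Supplier<Integer> udf, int fallback) {
    Integer x = udf.get();
    if (x != null) {
      return x;
    }
    return fallback;
  }

  /** CASE WHEN udf() IS NOT NULL THEN udf() ELSE fallback END. */
  static Integer coalesceAsCase(Supplier<Integer> udf, int fallback) {
    if (udf.get() != null) {
      return udf.get(); // second evaluation may differ from the first
    }
    return fallback;
  }

  public static void main(String[] args) {
    System.out.println(coalesce(alternatingUdf(), 100));
    System.out.println(coalesceAsCase(alternatingUdf(), 100));
  }
}
```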





[jira] [Commented] (CALCITE-3759) Class memory leak due to code generation

2020-01-30 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026688#comment-17026688
 ] 

Stamatis Zampetakis commented on CALCITE-3759:
--

Hey [~mikevm], I played around with your example and I don't observe any leak. 
It is normal for the number of loaded classes to increase, but nothing seems 
to hold references to these classes. If you request a GC (you can do this via 
VisualVM or another tool), you can see that the classes are unloaded right 
away.
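
The same check can be done programmatically with the standard java.lang.management API instead of a GUI tool. The sketch below is not taken from the linked project; ClassCountProbe is a hypothetical name. It reads the JVM's class-loading counters before and after requesting a GC. Note that System.gc() is only a request, so the JVM is free to ignore it.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical probe that reports the JVM's class-loading counters. */
final class ClassCountProbe {
  private ClassCountProbe() {}

  /** Returns {currently loaded, total loaded since start, unloaded}. */
  static long[] snapshot() {
    ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
    return new long[] {
        bean.getLoadedClassCount(),
        bean.getTotalLoadedClassCount(),
        bean.getUnloadedClassCount()
    };
  }

  public static void main(String[] args) {
    long[] before = snapshot();
    System.gc(); // a request only; unloading is up to the JVM
    long[] after = snapshot();
    System.out.printf("loaded: %d -> %d, unloaded: %d -> %d%n",
        before[0], after[0], before[2], after[2]);
  }
}
```

If the unloaded count grows after a GC while the loaded count stays roughly flat across test iterations, the generated classes are being collected rather than leaked.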



[jira] [Updated] (CALCITE-3759) Class memory leak due to code generation

2020-01-30 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated CALCITE-3759:
-
Summary: Class memory leak due to code generation  (was: class leak)



[jira] [Updated] (CALCITE-3724) Implement PrestoSqlDialect

2020-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CALCITE-3724:

Labels: pull-request-available  (was: )

> Implement PrestoSqlDialect
> --
>
> Key: CALCITE-3724
> URL: https://issues.apache.org/jira/browse/CALCITE-3724
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>
> Implement PrestoSqlDialect





[jira] [Commented] (CALCITE-2885) SqlValidatorImpl fails when processing an InferTypes.FIRST_KNOWN function containing a function with a dynamic parameter as first operand

2020-01-30 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026547#comment-17026547
 ] 

Ruben Q L commented on CALCITE-2885:


Thanks for your comment [~jinxing6...@126.com]. It's been a while since I 
logged this issue, so I need to remind myself of the specifics. I'll take a 
look at your suggestion.

> SqlValidatorImpl fails when processing an InferTypes.FIRST_KNOWN function 
> containing a function with a dynamic parameter as first operand
> -
>
> Key: CALCITE-2885
> URL: https://issues.apache.org/jira/browse/CALCITE-2885
> Project: Calcite
>  Issue Type: Bug
>Affects Versions: 1.18.0
>Reporter: Ruben Q L
>Priority: Major
>
> The problem can be reproduced by adding the following tests (e.g. to 
> SqlValidatorDynamicTest.java):
> {code:java}
> @Test public void testDynamicParameter1() throws Exception {
>   final String sql = "select 4 = 2*?";
>   sql(sql).ok();
> }
> @Test public void testDynamicParameter2() throws Exception {
>   final String sql = "select 2*? = 4";
>   sql(sql).ok();
> }
> {code}
> The first test will run successfully, but the second one (which is the same 
> query reversing the equality operands) will fail with the exception:
> {code}
> org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '*' to 
> arguments of type ' * '. Supported form(s): ' * 
> '   ' * '   ' * 
> '
> {code}


