[jira] [Created] (CALCITE-3761) How to write a rule with optional intermediate operands?
anjali shrishrimal created CALCITE-3761:
---

Summary: How to write a rule with optional intermediate operands?
Key: CALCITE-3761
URL: https://issues.apache.org/jira/browse/CALCITE-3761
Project: Calcite
Issue Type: Wish
Components: core
Reporter: anjali shrishrimal

I want to write a rule that matches a plan based only on the root (top) RelNode and the leaf RelNode; all intermediate RelNodes are optional. What operands should be passed to such a rule?

Suppose the logical plan looks like the one below.

{code:java}
LogicalRelNode4
  LogicalRelNode3 (optional)
    LogicalRelNode2 (optional)
      LogicalRelNode1
{code}

LogicalRelNode2 and LogicalRelNode3 are optional. The rule should match the structure irrespective of the presence of these optional nodes, that is, it should match all of the following structures.

{code:java}
1. LogicalRelNode4
     LogicalRelNode3
       LogicalRelNode2
         LogicalRelNode1
2. LogicalRelNode4
     LogicalRelNode2
       LogicalRelNode1
3. LogicalRelNode4
     LogicalRelNode3
       LogicalRelNode1
4. LogicalRelNode4
     LogicalRelNode1
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
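The matching behaviour requested above can be illustrated with a small plain-Java sketch. It deliberately does not use any Calcite classes: `PlanNode` and `OptionalChainMatcher` are hypothetical names invented for this illustration, and the sketch assumes single-input nodes and that the optional nodes, when present, appear in a fixed top-down order. In Calcite itself a check of this shape could presumably live inside a rule that matches only the root operand and then walks the inputs, but that is an assumption, not something confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Matches root -> [optional nodes, in the given order] -> leaf. Illustration only. */
final class OptionalChainMatcher {

  /** Toy single-input plan node; a stand-in for RelNode, not a Calcite class. */
  static final class PlanNode {
    final String type;
    final PlanNode input; // null when the node has no input

    PlanNode(String type, PlanNode input) {
      this.type = type;
      this.input = input;
    }
  }

  private final String rootType;
  private final List<String> optionalTypes; // top-down order
  private final String leafType;

  OptionalChainMatcher(String rootType, List<String> optionalTypes, String leafType) {
    this.rootType = rootType;
    this.optionalTypes = optionalTypes;
    this.leafType = leafType;
  }

  boolean matches(PlanNode root) {
    if (root == null || !root.type.equals(rootType)) {
      return false;
    }
    PlanNode current = root.input;
    for (String optional : optionalTypes) {
      // Step over an optional node only if it is present at this position.
      if (current != null && current.type.equals(optional)) {
        current = current.input;
      }
    }
    // Whatever remains must be the required leaf type.
    return current != null && current.type.equals(leafType);
  }

  public static void main(String[] args) {
    OptionalChainMatcher matcher = new OptionalChainMatcher("LogicalRelNode4",
        Arrays.asList("LogicalRelNode3", "LogicalRelNode2"), "LogicalRelNode1");
    PlanNode leaf = new PlanNode("LogicalRelNode1", null);
    PlanNode full = new PlanNode("LogicalRelNode4",
        new PlanNode("LogicalRelNode3", new PlanNode("LogicalRelNode2", leaf)));
    PlanNode shortest = new PlanNode("LogicalRelNode4", leaf);
    System.out.println("full chain matches: " + matcher.matches(full));
    System.out.println("root + leaf matches: " + matcher.matches(shortest));
  }
}
```

The design choice here is to treat the optional nodes as an ordered list that is skipped greedily, so one matcher covers all four structures instead of enumerating one operand tree per structure.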
[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027214#comment-17027214 ] Julian Hyde commented on CALCITE-3760: -- I wasn't aware of Couchbase and N1QL's {{LET}}. Thanks for sharing. That said, for these purposes we don't need to add {{LET}} to SQL or even to the {{SqlNode}} language. It would be sufficient to add it to the {{RexNode}} language. And in fact we already have {{RexProgram}}, which allows you to define several expressions based on temporary expressions. We don't use {{RexProgram}} very much these days, because it is just a little harder to write transformation rules against a {{RexProgram}} than against a list of {{RexNode}}. On reflection, I think history would repeat itself, and adding variables would complicate too many places. So, maybe the best way is to use a Project on a Project: {noformat} select coalesce(udf(c), 100) from foo {noformat} becomes {noformat} select case when x is not null then x else 100 end from ( select udf(c) from foo) {noformat} As we discussed recently, it would be illegal to merge those Projects because of the UDF. So the udf would be called exactly once per row. > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. 
> Say I have a non-deterministic UDF and UDAF and run SQL like the following:
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;
> {code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null else udaf(col0) end
> from foo
> {code}
> As we can see, the non-deterministic UDF and UDAF are called multiple times
> after rewriting. Thus the condition in the WHEN clause might not hold all the
> time.
> We need to provide an interface for users to specify the determinism of a
> UDF/UDAF, and consider whether a SqlNode is deterministic when rewriting.
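The difference between the inlined CASE rewrite and the Project-on-Project form discussed in this message can be sketched in plain Java. This is only an illustration under stated assumptions: `FlakyUdf` is a hypothetical stand-in for a non-deterministic UDF (it alternates between a value and null), and the two methods mimic the two query shapes rather than calling any Calcite code.

```java
import java.util.function.Supplier;

final class NonDeterministicRewriteDemo {

  /** Hypothetical non-deterministic UDF: returns 7 on odd calls, null on even calls. */
  static final class FlakyUdf implements Supplier<Integer> {
    int calls = 0;

    @Override public Integer get() {
      calls++;
      return (calls % 2 == 1) ? Integer.valueOf(7) : null;
    }
  }

  /** Mimics: case when udf(c) is not null then udf(c) else 100 end (udf called twice). */
  static Integer coalesceInlined(Supplier<Integer> udf) {
    if (udf.get() != null) {
      return udf.get(); // second call may return something else, even null
    }
    return 100;
  }

  /** Mimics: select case when x is not null then x else 100 end from (select udf(c) as x ...). */
  static Integer coalesceNested(Supplier<Integer> udf) {
    Integer x = udf.get(); // inner Project: evaluated exactly once per row
    return x != null ? x : Integer.valueOf(100); // outer Project only reads x
  }

  public static void main(String[] args) {
    FlakyUdf first = new FlakyUdf();
    System.out.println("inlined: " + coalesceInlined(first) + " after " + first.calls + " calls");
    FlakyUdf second = new FlakyUdf();
    System.out.println("nested: " + coalesceNested(second) + " after " + second.calls + " calls");
  }
}
```

With this particular stand-in, the inlined form can return null even though COALESCE with a non-null fallback should never do so, while the nested form evaluates the function once and stays consistent.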
[jira] [Updated] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-3760: Labels: pull-request-available (was: ) > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > Labels: pull-request-available > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null udaf(col0) > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after written. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027196#comment-17027196 ] Jin Xing edited comment on CALCITE-3760 at 1/31/20 5:06 AM: Hi, [~julianhyde] [~amaliujia] Thanks a lot for feedback ~ Yes, a *LET* clause would be very helpful. It allows us to store the result of a sub-expression, e.g. result generated from a non-deterministic udf/udaf, and use it in subsequent clauses. Thus to ensure non-deterministic expressions are evaluated the consistent number of times. It's already supported by some vendors [1]. But I would prefer the rewriting within scope of common and standard sql, a common scenario is we always want to convert expression back to Sql string and run in jdbc convention. A non-common clause might bring obstacle to run the sql in other dialects. So I propose to don't do the rewrites when found non-deterministic. {quote}Related issues are re-ordering of the branches of AND and OR conditions, and behavior when an expression throws. {quote} Currently RexSimplify already takes determinism of expression into consideration (there might be space to improve). A missed part is to add an interface for udf/udaf to specify whether it's deterministic. [1]https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/let.html was (Author: jinxing6...@126.com): Hi, [~julianhyde] [~amaliujia] Thanks a lot for feedback ~ Yes, a LET clause would be very helpful. It allows us to store the result of a sub-expression, e.g. result generated from a non-deterministic udf/udaf, and use it in subsequent clauses. Thus to ensure non-deterministic expressions are evaluated the consistent number of times. It's already supported by some vendors [1]. But I would prefer the rewriting within scope of sql standard, a common scenario is we always want to convert expression back to Sql string and run in jdbc convention. 
A non-standard clause might bring obstacle to run the sql in other dialects. So I propose to don't do the rewrites when found non-deterministic. {quote}Related issues are re-ordering of the branches of AND and OR conditions, and behavior when an expression throws. {quote} Currently RexSimplify already takes determinism of expression into consideration (there might be space to improve). A missed part is to add an interface for udf/udaf to specify whether it's deterministic. [1]https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/let.html > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null udaf(col0) > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after written. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
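The proposal in this comment (let a UDF declare its determinism, and skip the rewrite when it is non-deterministic) might look roughly like the following sketch. Everything here is hypothetical: `UserFunction`, `SimpleFunction` and `rewriteCoalesce` are names invented for the illustration and are not existing Calcite APIs, and SQL is modelled as plain strings to keep the example self-contained.

```java
final class DeterminismGuardSketch {

  /** Hypothetical interface through which a UDF declares its determinism. */
  interface UserFunction {
    String name();

    boolean isDeterministic();
  }

  /** Minimal hypothetical implementation used for the demonstration. */
  static final class SimpleFunction implements UserFunction {
    private final String name;
    private final boolean deterministic;

    SimpleFunction(String name, boolean deterministic) {
      this.name = name;
      this.deterministic = deterministic;
    }

    @Override public String name() {
      return name;
    }

    @Override public boolean isDeterministic() {
      return deterministic;
    }
  }

  /** Expands COALESCE into CASE only when duplicating the call is safe. */
  static String rewriteCoalesce(UserFunction f, String arg, String fallback) {
    String call = f.name() + "(" + arg + ")";
    if (!f.isDeterministic()) {
      // Leave the expression untouched so the function is still called once.
      return "coalesce(" + call + ", " + fallback + ")";
    }
    return "case when " + call + " is not null then " + call
        + " else " + fallback + " end";
  }

  public static void main(String[] args) {
    System.out.println(rewriteCoalesce(new SimpleFunction("udf", false), "col0", "100"));
    System.out.println(rewriteCoalesce(new SimpleFunction("udf", true), "col0", "100"));
  }
}
```

Keeping the original COALESCE for non-deterministic functions stays within standard SQL, which fits the concern raised above about converting expressions back to SQL strings for other dialects.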
[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027196#comment-17027196 ] Jin Xing commented on CALCITE-3760: --- Hi, [~julianhyde] [~amaliujia] Thanks a lot for feedback ~ Yes, a LET clause would be very helpful. It allows us to store the result of a sub-expression, e.g. result generated from a non-deterministic udf/udaf, and use it in subsequent clauses. Thus to ensure non-deterministic expressions are evaluated the consistent number of times. It's already supported by some vendors [1]. But I would prefer the rewriting within scope of sql standard, a common scenario is we always want to convert expression back to Sql string and run in jdbc convention. A non-standard clause might bring obstacle to run the sql in other dialects. So I propose to don't do the rewrites when found non-deterministic. {quote}Related issues are re-ordering of the branches of AND and OR conditions, and behavior when an expression throws. {quote} Currently RexSimplify already takes determinism of expression into consideration (there might be space to improve). A missed part is to add an interface for udf/udaf to specify whether it's deterministic. [1]https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/let.html > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. 
> Say I have a non-deterministic UDF and UDAF and run SQL like the following:
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;
> {code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null else udaf(col0) end
> from foo
> {code}
> As we can see, the non-deterministic UDF and UDAF are called multiple times
> after rewriting. Thus the condition in the WHEN clause might not hold all the
> time.
> We need to provide an interface for users to specify the determinism of a
> UDF/UDAF, and consider whether a SqlNode is deterministic when rewriting.
[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering
[ https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027058#comment-17027058 ] Xiening Dai commented on CALCITE-3753: -- {quote} Roman Kondakov Calcite's engine always had the capability of "Cascades style optimization with aggressive search space pruning". That is achieved by the 'importance' concept, the sorted queue of rule matches, and the ability to stop optimization when the plan stops improving. {quote} In my opinion, space pruning and the current Calcite importance concept are different. Space pruning is achieved through top-down optimization, using lower-bound and upper-bound calculations to eliminate alternatives that are *guaranteed* to be worse. But Calcite's rule importance setting is more heuristic and cannot guarantee that the best plan is found. The "impatient" mode is non-deterministic, which makes it hardly useful in reality. {quote} Top-down is a subtlety in the Volcano paper that I missed. If top-down (or something else) would solve the problem of requested traits then we should consider it. {quote} I think top-down is not just useful for requested traits, but also necessary for space pruning: the lower-bound/upper-bound pruning can only be done through a top-down approach. Unfortunately the current design of Calcite has many aspects that would work against top-down searching. For example, in some cases an implementation rule (or even an enforcement rule) can generate a logical rel, which would then require logical transformations to be applied again (CALCITE-2970). So the plan might have to go back to the parent nodes again. If we move to a complete top-down approach, we would have to put some limitations on the current RelOptRule (maybe some interface changes), and then backward compatibility would also become a problem. 
> Always try to match and execute substitution rule first and remove rulematch > ordering > - > > Key: CALCITE-3753 > URL: https://issues.apache.org/jira/browse/CALCITE-3753 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Haisheng Yuan >Priority: Major > Attachments: image-2020-01-27-20-27-57-957.png > > > In VolcanoPlanner, some rules e.g. ProjectMergeRule, PruneEmptyRule can be > defined as SubstitutionRule, so that we can always try to match and execute > them first (without deferring rule call). All the other rulematches doesn't > need to be sorted and rules can be executed in any order they matched, since > we are going to execute all of them anyway, sooner or later. Computing and > comparing importances cause a lot of latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
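The upper-bound pruning described in the comment above can be sketched as a tiny top-down search. This is a toy model, not Calcite's VolcanoPlanner and not a full Cascades implementation: `Group`, `Alternative` and `optimize` are hypothetical names, costs are assumed to be non-negative, and memoization of already-optimized groups is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopDownPruningSketch {

  /** One way to implement a group: a local cost plus the groups it reads from. */
  static final class Alternative {
    final double localCost;
    final List<Group> inputs;

    Alternative(double localCost, List<Group> inputs) {
      this.localCost = localCost;
      this.inputs = inputs;
    }
  }

  /** A set of equivalent alternatives. */
  static final class Group {
    final List<Alternative> alternatives = new ArrayList<>();

    Group add(double localCost, Group... inputs) {
      alternatives.add(new Alternative(localCost, Arrays.asList(inputs)));
      return this;
    }
  }

  int pruned = 0; // alternatives abandoned because they could not beat the bound

  /** Returns the cheapest cost strictly below upperBound, or +Infinity if there is none. */
  double optimize(Group group, double upperBound) {
    double best = Double.POSITIVE_INFINITY;
    double bound = upperBound;
    for (Alternative alt : group.alternatives) {
      double cost = alt.localCost; // lower bound on the full cost of this alternative
      boolean overBudget = cost >= bound;
      for (int i = 0; !overBudget && i < alt.inputs.size(); i++) {
        // Each input only gets the budget that is left under the current bound.
        cost += optimize(alt.inputs.get(i), bound - cost);
        overBudget = cost >= bound;
      }
      if (overBudget) {
        pruned++; // guaranteed not to improve on the bound, so stop exploring it
      } else {
        best = cost;
        bound = cost; // tighten the bound for the remaining alternatives
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Group scan = new Group().add(10.0).add(4.0);
    Group root = new Group().add(1.0, scan).add(6.0, scan);
    TopDownPruningSketch planner = new TopDownPruningSketch();
    System.out.println("best cost: " + planner.optimize(root, Double.POSITIVE_INFINITY));
    System.out.println("pruned alternatives: " + planner.pruned);
  }
}
```

The point of the sketch is that the bound flows from the parent down to its inputs, which is why this kind of pruning needs a top-down traversal: an alternative whose local cost already reaches the bound is dropped without ever visiting its inputs.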
[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering
[ https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026950#comment-17026950 ] Julian Hyde commented on CALCITE-3753: -- [~zabetak], [~hyuan], Top-down is a subtlety in the Volcano paper that I missed. If top-down (or something else) would solve the problem of requested traits then we should consider it. > Always try to match and execute substitution rule first and remove rulematch > ordering > - > > Key: CALCITE-3753 > URL: https://issues.apache.org/jira/browse/CALCITE-3753 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Haisheng Yuan >Priority: Major > Attachments: image-2020-01-27-20-27-57-957.png > > > In VolcanoPlanner, some rules e.g. ProjectMergeRule, PruneEmptyRule can be > defined as SubstitutionRule, so that we can always try to match and execute > them first (without deferring rule call). All the other rulematches doesn't > need to be sorted and rules can be executed in any order they matched, since > we are going to execute all of them anyway, sooner or later. Computing and > comparing importances cause a lot of latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering
[ https://issues.apache.org/jira/browse/CALCITE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026946#comment-17026946 ] Julian Hyde commented on CALCITE-3753: -- [~rkondakov] Calcite's engine always had the capability of "Cascades style optimization with aggressive search space pruning". That is achieved by the 'importance' concept, the sorted queue of rule matches, and the ability to stop optimization when the plan stops improving. But we didn't use the capability because no one ever tuned the 'importance' and 'when to stop' metrics. That empirical tuning is not a matter for the engine. > Always try to match and execute substitution rule first and remove rulematch > ordering > - > > Key: CALCITE-3753 > URL: https://issues.apache.org/jira/browse/CALCITE-3753 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Haisheng Yuan >Priority: Major > Attachments: image-2020-01-27-20-27-57-957.png > > > In VolcanoPlanner, some rules e.g. ProjectMergeRule, PruneEmptyRule can be > defined as SubstitutionRule, so that we can always try to match and execute > them first (without deferring rule call). All the other rulematches doesn't > need to be sorted and rules can be executed in any order they matched, since > we are going to execute all of them anyway, sooner or later. Computing and > comparing importances cause a lot of latency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026943#comment-17026943 ] Julian Hyde edited comment on CALCITE-3760 at 1/30/20 7:22 PM: --- SQL is a [strict language|https://en.wikipedia.org/wiki/Strict_programming_language] (with the exception of a few constructs such as CASE) but becomes non-strict when you add non-deterministic UDFs. As you point out, some of our rewrites assume strictness. It would be helpful if we had a 'let' construct, e.g. {{coalesce(e1, e2)}} becomes {{let x = e1 in case when x is not null then x else e2 end}}. It would ensure that expressions are evaluated the correct number of times. Without {{let}} or something similar I don't know how we could do these rewrites. Related issues are re-ordering of the branches of AND and OR conditions, and behavior when an expression throws. was (Author: julianhyde): SQL is a [strict language|https://en.wikipedia.org/wiki/Strict_programming_language] (with the exception of a few constructs such as CASE) but becomes non-strict when you add non-deterministic UDFs. As you point out, some of our rewrites assume strictness. It would be helpful if we had a 'let' construct, e.g. {{coalesce(e1, e2)}} becomes {{let x = e1 in case when x is not null then x else e2 end}}. It would ensure that expressions are evaluated the correct number of times. Without {{let}} or something similar I don't know how we could do these rewrites. > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. 
Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null udaf(col0) > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after written. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026943#comment-17026943 ] Julian Hyde commented on CALCITE-3760: -- SQL is a [strict language|https://en.wikipedia.org/wiki/Strict_programming_language] (with the exception of a few constructs such as CASE) but becomes non-strict when you add non-deterministic UDFs. As you point out, some of our rewrites assume strictness. It would be helpful if we had a 'let' construct, e.g. {{coalesce(e1, e2)}} becomes {{let x = e1 in case when x is not null then x else e2 end}}. It would ensure that expressions are evaluated the correct number of times. Without {{let}} or something similar I don't know how we could do these rewrites. > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null else udaf(col0) end > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after rewriting. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. 
[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026927#comment-17026927 ] Rui Wang commented on CALCITE-3760: --- It makes sense. Regarding to the UDF/UDAF, as it is user defined, usually we cannot control what users really write as code. Sometimes even if users tell us the UDF is deterministic, it might be just not. In this case, adding a parameter for users do not solve the problem from root. In production practice on my side, we usually just build a contract with users: say we expect your UDF satisfies A, B and C. If your UDF does not satisfy those properties, the query result will be unpredictable. > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null udaf(col0) > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after written. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3759) Class memory leak due to code generation
[ https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026911#comment-17026911 ] Rui Wang commented on CALCITE-3759: --- Ah I misunderstood what class leak meant in this Jira: I thought that meant Calcite releases some classes that does not belong to Calcite (e.g. class name not start from org.apache.calcite). But seems like the class leak talked about here is objects stay in memory and not GCed forever. Sorry I wasn't helpful at the beginning. > Class memory leak due to code generation > > > Key: CALCITE-3759 > URL: https://issues.apache.org/jira/browse/CALCITE-3759 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.21.0 >Reporter: Mike Villa >Priority: Major > Attachments: image-2020-01-28-15-55-43-215.png > > > Hi, I'm using calcite and I'm making unit test to see the perform, but with > visualvm or jconsole I have checked a class leak. Maybe It's my fault. > I would be grateful if someone helped me to find the error! > I have created a GitHub project to check this error. > https://github.com/mvillafuertem/calcite-error.git > > !image-2020-01-28-15-55-43-215.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3760) Rewriting non-deterministic function can break query semantics
[ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jin Xing updated CALCITE-3760: -- Summary: Rewriting non-deterministic function can break query semantics (was: Rewriting function without considering determinism can break query semantics) > Rewriting non-deterministic function can break query semantics > -- > > Key: CALCITE-3760 > URL: https://issues.apache.org/jira/browse/CALCITE-3760 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Jin Xing >Assignee: Jin Xing >Priority: Major > > Calcite rewrite some *SqlFunctions* during validation. But whether the > function is deterministic is not considered. For a non-deterministic > operator, the rewriting can break semantics. Additionally there's no > interface for user to specify the determinism for a UDF/UDAF. > Say I have non-deterministic UDF & UDAF and run sql like below > {code:java} > select coalesce(udf(col0), 100) from foo; > select nullif(udaf(col0), 1024) from foo;{code} > They will be rewritten as > {code:java} > select case when udf(col0) is not null then udf(col0) else 100 end > from foo; > select case when udaf(col0)=1024 then null udaf(col0) > from foo{code} > As we can see that non-deterministic UDF & UDAF are called multiple times > after written. Thus the condition in WHEN clause might NOT be held all the > time. > We need to provide an interface for user to specify the determinism in > UDF/UDAF and consider whether a SqlNode is deterministic when rewriting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3760) Rewriting function without considering determinism can break query semantics
Jin Xing created CALCITE-3760:
-

Summary: Rewriting function without considering determinism can break query semantics
Key: CALCITE-3760
URL: https://issues.apache.org/jira/browse/CALCITE-3760
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Jin Xing
Assignee: Jin Xing

Calcite rewrites some *SqlFunctions* during validation, but whether the function is deterministic is not considered. For a non-deterministic operator, the rewriting can break semantics. Additionally, there is no interface for users to specify the determinism of a UDF/UDAF.

Say I have a non-deterministic UDF and UDAF and run SQL like the following:

{code:java}
select coalesce(udf(col0), 100) from foo;
select nullif(udaf(col0), 1024) from foo;
{code}

They will be rewritten as

{code:java}
select case when udf(col0) is not null then udf(col0) else 100 end
from foo;
select case when udaf(col0)=1024 then null else udaf(col0) end
from foo
{code}

As we can see, the non-deterministic UDF and UDAF are called multiple times after rewriting. Thus the condition in the WHEN clause might not hold all the time.

We need to provide an interface for users to specify the determinism of a UDF/UDAF, and consider whether a SqlNode is deterministic when rewriting.
[jira] [Commented] (CALCITE-3759) Class memory leak due to code generation
[ https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026688#comment-17026688 ] Stamatis Zampetakis commented on CALCITE-3759: -- Hey [~mikevm], I played around with your example and I don't observe any leak. It is normal that class loading is increasing but there does not seem to be somebody who holds references to these classes. If you ask for gc (you can do this via VisualVM or another tool) you can see that the classes are unloaded directly. > Class memory leak due to code generation > > > Key: CALCITE-3759 > URL: https://issues.apache.org/jira/browse/CALCITE-3759 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.21.0 >Reporter: Mike Villa >Priority: Major > Attachments: image-2020-01-28-15-55-43-215.png > > > Hi, I'm using calcite and I'm making unit test to see the perform, but with > visualvm or jconsole I have checked a class leak. Maybe It's my fault. > I would be grateful if someone helped me to find the error! > I have created a GitHub project to check this error. > https://github.com/mvillafuertem/calcite-error.git > > !image-2020-01-28-15-55-43-215.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005)
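The check suggested in this comment (request a GC and see whether generated classes get unloaded) can also be done programmatically with the standard `java.lang.management.ClassLoadingMXBean`. The sketch below is only a generic probe, not part of the reporter's project; `ClassUnloadingProbe` is a hypothetical name, and `System.gc()` is merely a request that the JVM is free to ignore, so the numbers are indicative rather than conclusive.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassUnloadingProbe {

  /** Returns {currently loaded, total loaded since JVM start, total unloaded}. */
  static long[] snapshot() {
    ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
    return new long[] {
        bean.getLoadedClassCount(),
        bean.getTotalLoadedClassCount(),
        bean.getUnloadedClassCount()
    };
  }

  static void print(String label, long[] s) {
    System.out.printf("%s: loaded=%d totalLoaded=%d unloaded=%d%n", label, s[0], s[1], s[2]);
  }

  public static void main(String[] args) {
    print("before gc", snapshot());
    // In a real test, run the code-generating workload here before asking for a GC.
    System.gc(); // a hint only; the JVM decides whether and when to collect
    print("after gc", snapshot());
  }
}
```

If the unloaded count keeps growing after each GC while the currently loaded count stays roughly flat across iterations of the workload, that is consistent with the observation above that the classes are not being retained.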
[jira] [Updated] (CALCITE-3759) Class memory leak due to code generation
[ https://issues.apache.org/jira/browse/CALCITE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated CALCITE-3759: - Summary: Class memory leak due to code generation (was: class leak) > Class memory leak due to code generation > > > Key: CALCITE-3759 > URL: https://issues.apache.org/jira/browse/CALCITE-3759 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.21.0 >Reporter: Mike Villa >Priority: Major > Attachments: image-2020-01-28-15-55-43-215.png > > > Hi, I'm using calcite and I'm making unit test to see the perform, but with > visualvm or jconsole I have checked a class leak. Maybe It's my fault. > I would be grateful if someone helped me to find the error! > I have created a GitHub project to check this error. > https://github.com/mvillafuertem/calcite-error.git > > !image-2020-01-28-15-55-43-215.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CALCITE-3724) Implement PrestoSqlDialect
[ https://issues.apache.org/jira/browse/CALCITE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CALCITE-3724: Labels: pull-request-available (was: ) > Implement PrestoSqlDialect > -- > > Key: CALCITE-3724 > URL: https://issues.apache.org/jira/browse/CALCITE-3724 > Project: Calcite > Issue Type: Improvement >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > > Implement PrestoSqlDialect -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2885) SqlValidatorImpl fails when processing an InferTypes.FIRST_KNOWN function containing a function with a dynamic parameter as first operand
[ https://issues.apache.org/jira/browse/CALCITE-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026547#comment-17026547 ]

Ruben Q L commented on CALCITE-2885:

Thanks for your comment [~jinxing6...@126.com]. It's been a while since I logged this issue, so I need to remind myself of the specifics. I'll take a look at your suggestion.

> SqlValidatorImpl fails when processing an InferTypes.FIRST_KNOWN function
> containing a function with a dynamic parameter as first operand
>
> Key: CALCITE-2885
> URL: https://issues.apache.org/jira/browse/CALCITE-2885
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.18.0
> Reporter: Ruben Q L
> Priority: Major
>
> The problem can be reproduced by adding the following tests (e.g. to
> SqlValidatorDynamicTest.java):
> {code:java}
> @Test public void testDynamicParameter1() throws Exception {
>   final String sql = "select 4 = 2*?";
>   sql(sql).ok();
> }
> @Test public void testDynamicParameter2() throws Exception {
>   final String sql = "select 2*? = 4";
>   sql(sql).ok();
> }
> {code}
> The first test will run successfully, but the second one (which is the same
> query with the equality operands reversed) will fail with the exception:
> {code}
> org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '*' to
> arguments of type ' * '. Supported form(s): ' *
> ' ' * ' ' *
> '
> {code}