[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979151#comment-15979151 ] radu commented on CALCITE-1740: --- [~julianhyde] I have updated the PR with the new code implementation that addresses your remarks. Please have a look > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979003#comment-15979003 ] radu commented on CALCITE-1740: --- [~julianhyde] I have addressed the remarks except managing to create a test class in the SqlToRelConverterTest. I am following the model of the existing function but it seems that when the result is parsed for comparison (the result is correct - i manually checked and distinct is set correctly)..the checker does not find the plan. I did not manage to understand how to set and work with the objects that are verified. Failed tests: SqlToRelConverterTest.testSelectOverDistinct:452 plan expected:<[${plan}]> but was:<[ LogicalProject(SUMC=[CASE(>(COUNT($7) OVER (ROWS BETWEEN 10 PRECEDING AND CURRENT ROW), 0), CAST($SUM0($7) OVER (ROWS BETWEEN 10 PRECEDING AND CURRENT ROW)):INTEGER, null)]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) ]> let me know if you have any thoughts on this. Anyway..regardless of this we should keep the existing test because we need to ensure that also the distinct is preserved when the project to window rule is applied. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978239#comment-15978239 ] radu commented on CALCITE-1740: --- [~julianhyde] Thanks a lot for the feedback. I will address these remarks and update the code > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965502#comment-15965502 ] radu commented on CALCITE-1740: --- [~julianhyde] Great. I will proceed to add this modification. I know what you mean btw with going with the debugger through the steps...you indeed get a deep understanding of the internal mechanisms > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964720#comment-15964720 ] Julian Hyde commented on CALCITE-1740: -- Looks like {{RexOver}} will need a {{distinct}} field. Don't just "check" the tests in RelOptRulesTest. Write a new test (I won't accept a patch without one) and run it in the debugger. You'll learn a lot. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964086#comment-15964086 ] radu commented on CALCITE-1740: --- I have checked the tests in the RelOptRulesTest class, but i am not sure they help us with this issue. Indeed there are examples of creating Distinct Aggregates (as LogicalAggregates). However, when we would like to work with the aggregates in the over window the process happens in 2 steps First the aggregates are created in a project context and than the ProjectToWindowRule is applied wen the LogicalWindow is created and the aggregation functions are passed. However, the SQLBasicCall which had the distinct aggregate is lost before this point. Also the AggregateCall objects are not created when the logicalWindow is created (just when the isDistinct is called). Therefore if we do not carry the distinct flag in the AggFunction then we would need to add a field in one of the enclosing objects (e.g., in the project and in the logicalwindow) to carry this information until the AggFunction are created. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963471#comment-15963471 ] Julian Hyde commented on CALCITE-1740: -- I do not agree that the "distinct" flag should be part of the function. It is more like an argument to the function call, or an instruction to the "GROUP BY" operator for how to pass values to the aggregate function. If you have an aggregate function f (even a user-defined one) then the result of "f(distinct x)" is always the same as calling "f(x)" having eliminated the duplicate values of "x" first. If we went with your proposal we'd have to have two variants of each aggregate function, and an aggregate function would have to know how to find the distinct or non-distinct version of itself. The solution is to fix the rule that converts {{AggregateCall}}s to {{AggregateCall}}s, and make sure that it preserves the {{distinct}} field. Run {{RelOptRulesTest}} and put a break point in the {{AggregateCall}} constructor and it should become clearer. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963260#comment-15963260 ] radu commented on CALCITE-1740: --- [~julianhyde] I looked over the logic of passing the DISTINCT keyword. The main problem is that AggFunctions do not have a flag in which the distinct flag could be carry. I think this is a shortcoming and should be fixed by adding a distinct flag in the SQLAggFunction. Without this i do not see a clear way to carry the Distinct flag which is ported in SQLBasicCall towards the rel parsing into objects. Basically the LogicalWindow will carry only the AggFunctions. Then when the list of AggFunctions are called for, in the get new AggregateCall objects are created which always have the distinct flag set to false - which is understandable as the flag was lost. There are 2 options as far as i see: Small fix towards throwing an error: -In the sql parsing (e.g. in the SQLParserImpl) we can trow an error in case distinct is set and we have an agg ... however i see this more general -In the convertOver from SQLToRelConverter we can throw an error when we build RexNodes with the AggFunction. This is basically the last place where we still have the distinct flag. From this point on the flag is lost as the LogicalWindow will only carry further the AggFunction =>However none of these fixes would not solve the problem for the future Bigger fix -Add the distinct flag to the AggFunction. We can than set it when it is the case or for now simply throw an exception instead of setting it until an implementation is done. It would be great to get some feedback which version would be preferred for the community. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960463#comment-15960463 ] radu commented on CALCITE-1740: --- [~julianhyde] I can try to tackle this problem. My ambitious is to implement Distinct aggregates in Flink (as you know flink community is using calcite to parse SQL queries and translate them into stream typologies that would implement the analytic). Nevertheless, as i will dig on how to set the distinct flag i could of course also throw an error and push it to calcite. My plan is too look over this next week. I will let you know how it goes. Do you have any pointers of where to start looking? are the aggregates for the window object created somewhere in any of the methods from the LogicalWindow rule, or perhaps in RexWinAggCall? > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function
[ https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957845#comment-15957845 ] Julian Hyde commented on CALCITE-1740: -- We could solve in two ways: * Throw an error that DISTINCT is not supported into window aggregates * Implement DISTINCT in window aggregates Obviously the second is better, but it's more work. > Distinct aggregate flag in window function > -- > > Key: CALCITE-1740 > URL: https://issues.apache.org/jira/browse/CALCITE-1740 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: radu >Assignee: Julian Hyde > Labels: build, windows > > When parsing queries such as > SELECT B1, SUM(DISTINCT B2) OVER (ORDER BY B4 RANGE BETWEEN INTERVAL '10' > SECOND PRECEDING AND CURRENT ROW) FROM T > The aggregates in the LogicalWindow do not have any marker of being distinct. > isDistinct() flag is not set. Probably some rule(s) silently swallow the > DISTINCT keyword > The LogicalWindow object that result is > LogicalWindow(window#0=[window(partition {} order by [2] range between $3 > PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])]) -- This message was sent by Atlassian JIRA (v6.3.15#6346)