[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-21 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979151#comment-15979151
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde] I have updated the PR with the new code implementation that 
addresses your remarks. Please have a look

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-21 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979003#comment-15979003
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde]

I have addressed the remarks except managing to create a test class in the 
SqlToRelConverterTest. I am following the model of the existing function but it 
seems that when the result is parsed for comparison (the result is correct - i 
manually checked and distinct is set correctly)..the checker does not find the 
plan. I did not manage to understand how to set and work with the objects that 
are verified.

Failed tests: 
  SqlToRelConverterTest.testSelectOverDistinct:452 plan expected:<[${plan}]> 
but was:<[
LogicalProject(SUMC=[CASE(>(COUNT($7) OVER (ROWS BETWEEN 10 PRECEDING AND 
CURRENT ROW), 0), CAST($SUM0($7) OVER (ROWS BETWEEN 10 PRECEDING AND CURRENT 
ROW)):INTEGER, null)])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
]>

let me know if you have any thoughts on this.

Anyway..regardless of this we should keep the existing test because we need to 
ensure that also the distinct is preserved when the project to window rule is 
applied.

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-21 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978239#comment-15978239
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde] Thanks a lot for the feedback. I will address these remarks and 
update the code

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-12 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965502#comment-15965502
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde] Great. I will proceed to add this modification. 
I know what you mean btw with going with the debugger through the steps...you 
indeed get a deep understanding of the internal mechanisms

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-11 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964720#comment-15964720
 ] 

Julian Hyde commented on CALCITE-1740:
--

Looks like {{RexOver}} will need a {{distinct}} field.

Don't just "check" the tests in RelOptRulesTest. Write a new test (I won't 
accept a patch without one) and run it in the debugger. You'll learn a lot.

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-11 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964086#comment-15964086
 ] 

radu commented on CALCITE-1740:
---

I have checked the tests in the RelOptRulesTest class, but i am not sure they 
help us with this issue. Indeed there are examples of creating Distinct 
Aggregates (as LogicalAggregates). However, when we would like to work with the 
aggregates in the over window the process happens in 2 steps
First the aggregates are created in a project context and than the 
ProjectToWindowRule is applied wen the LogicalWindow is created and the 
aggregation functions are passed. However, the SQLBasicCall which had the 
distinct aggregate is lost before this point. Also the AggregateCall objects 
are not created when the logicalWindow is created (just when the isDistinct is 
called). 
Therefore if we do not carry the distinct flag in the AggFunction then we would 
need to add a field in one of the enclosing objects (e.g., in the project and 
in the logicalwindow) to carry this information until the AggFunction are 
created.

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-10 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963471#comment-15963471
 ] 

Julian Hyde commented on CALCITE-1740:
--

I do not agree that the "distinct" flag should be part of the function. It is 
more like an argument to the function call, or an instruction to the "GROUP BY" 
operator for how to pass values to the aggregate function. If you have an 
aggregate function f (even a user-defined one) then the result of "f(distinct 
x)" is always the same as calling "f(x)" having eliminated the duplicate values 
of "x" first.

If we went with your proposal we'd have to have two variants of each aggregate 
function, and an aggregate function would have to know how to find the distinct 
or non-distinct version of itself.

The solution is to fix the rule that converts {{AggregateCall}}s to 
{{AggregateCall}}s, and make sure that it preserves the {{distinct}} field. Run 
{{RelOptRulesTest}} and put a break point in the {{AggregateCall}} constructor 
and it should become clearer.

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-10 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963260#comment-15963260
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde]

I looked over the logic of passing the DISTINCT keyword. The main problem is 
that AggFunctions do not have a flag in which the distinct flag could be carry. 
I think this is a shortcoming and should be fixed by adding a distinct flag in 
the SQLAggFunction. Without this i do not see a clear way to carry the Distinct 
flag which is ported in SQLBasicCall towards the rel parsing into objects. 
Basically the LogicalWindow will carry only the AggFunctions. Then when the 
list of AggFunctions are called for, in the get new AggregateCall objects are 
created which always have the distinct flag set to false - which is 
understandable as the flag was lost.

There are 2 options as far as i see:
Small fix towards throwing an error:
-In the sql parsing (e.g. in the SQLParserImpl) we can trow an error in case 
distinct is set and we have an agg ... however i see this more general
-In the convertOver from SQLToRelConverter we can throw an error when we build 
RexNodes with the AggFunction. This is basically the last place where we still 
have the distinct flag. From this point on the flag is lost as the 
LogicalWindow will only carry further the AggFunction
=>However none of these fixes would not solve the problem for the future 

Bigger fix
-Add the distinct flag to the AggFunction. We can than set it when it is the 
case or for now simply throw an exception instead of setting it until an 
implementation is done.

It would be great to get some feedback which version would be preferred for the 
community.  

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-07 Thread radu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960463#comment-15960463
 ] 

radu commented on CALCITE-1740:
---

[~julianhyde]

I can try to tackle this problem. My ambitious is to implement Distinct 
aggregates in Flink (as you know flink community is using calcite to parse SQL 
queries and translate them into stream typologies that would implement the 
analytic). Nevertheless, as i will dig on how to set the distinct flag i could 
of course also throw an error and push it to calcite. My plan is too look over 
this next week. I will let you know how it goes.

Do you have any pointers of where to start looking? are the aggregates for the 
window object created somewhere in any of the methods from the LogicalWindow 
rule, or perhaps in RexWinAggCall?

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CALCITE-1740) Distinct aggregate flag in window function

2017-04-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957845#comment-15957845
 ] 

Julian Hyde commented on CALCITE-1740:
--

We could solve in two ways:
* Throw an error that DISTINCT is not supported into window aggregates
* Implement DISTINCT in window aggregates

Obviously the second is better, but it's more work.

> Distinct aggregate flag in window function
> --
>
> Key: CALCITE-1740
> URL: https://issues.apache.org/jira/browse/CALCITE-1740
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: radu
>Assignee: Julian Hyde
>  Labels: build, windows
>
> When parsing queries such as
> SELECT B1, SUM(DISTINCT B2) OVER  (ORDER BY  B4 RANGE BETWEEN INTERVAL '10' 
> SECOND PRECEDING AND CURRENT ROW) FROM T
> The aggregates in the LogicalWindow do not have any marker of being distinct. 
> isDistinct() flag is not set. Probably some rule(s) silently swallow the 
> DISTINCT keyword
> The LogicalWindow object that result is
> LogicalWindow(window#0=[window(partition {} order by [2] range between $3 
> PRECEDING and CURRENT ROW aggs [COUNT($1), $SUM0($1)])])



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)