[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-03-04 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385284#comment-16385284
 ] 

Zhong Yu commented on CALCITE-2195:
---

Thanks Julian. To make sure I understand the problem correctly, I made some 
formal analysis: [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . 
It turns out, as the code comments, we can do singleton() on a side regardless 
of whether the grouping columns are unique. I'll add another issue addressing 
this possibility. 

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.16.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-03-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383037#comment-16383037
 ] 

Julian Hyde commented on CALCITE-2195:
--

PR looks good; testing now and will merge shortly.

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.16.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-02-27 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378701#comment-16378701
 ] 

Zhong Yu commented on CALCITE-2195:
---

pull-request: https://github.com/apache/calcite/pull/637

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.15.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)