[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385284#comment-16385284 ]

Zhong Yu commented on CALCITE-2195:
---

Thanks Julian. To make sure I understand the problem correctly, I did some formal analysis: [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf]. It turns out that, as the code comments note, we can do singleton() on a side regardless of whether the grouping columns are unique. I'll add another issue addressing this possibility.

> AggregateJoinTransposeRule fails to aggregate over unique column
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.15.0
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.16.0
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
> causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side.
> In that case the rule does not aggregate on the A side, so
> `leftSubTotal==null`, causing `splitter.topSplit()` to sum only over the
> `count()` coming from the B side.
>
> A solution would be to introduce `splitter.singleton()` on the A side, so
> that it can be fed to topSplit() to be multiplied by the count.
>
> In the case that the `unique` side does not contain the column of an agg
> call, it seems that we should do `other_singleton()` on this side, and feed
> it to topSplit(). However, note that the `other()` expression is
> necessarily a `count()`, or a scalar function of `count()`, because it does
> not depend on any column values. In the same way, the proposed
> `other_singleton()` necessarily returns 1, or some constant value, and
> `topSplit()` would not have any need of that constant value. Therefore, in
> this case, we don't need a split on this side; we can just leave its
> subtotal as null.
>
> I'm working on a pull-request based on these understandings. BTW, is there a
> reference to the algorithm used in the code? I can only find some old papers
> that don't exactly cover the logic of the code. Thanks.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
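
As a rough illustration of the arithmetic in the issue description above (this is not Calcite code): when the A side is unique on the join key, sum(A.sal) over the join should equal the sum, per key, of the single A value times the number of matching B rows. The sketch below compares that pushed-down form with a direct evaluation, and also shows what happens if the A-side value is dropped and only the B-side counts are summed, which is how the description characterises the failing plan. The class name, method names, and sample data are all hypothetical; the tables are modelled as plain lists of sal values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; none of these names come from Calcite. */
final class AggregateJoinSketch {

  /** Reference result: join A and B on sal, then sum A.sal over the joined rows. */
  static long sumOverJoin(List<Integer> a, List<Integer> b) {
    long total = 0;
    for (int x : a) {
      for (int y : b) {
        if (x == y) {
          total += x;
        }
      }
    }
    return total;
  }

  /** B side of the push-down: count(*) grouped by the join key. */
  static Map<Integer, Long> countByKey(List<Integer> b) {
    Map<Integer, Long> counts = new HashMap<>();
    for (int y : b) {
      counts.merge(y, 1L, Long::sum);
    }
    return counts;
  }

  /**
   * Pushed-down form, assuming uniqueA has no duplicate values: each A row is
   * its own group, so its "singleton" subtotal is just the row's value, and
   * the top aggregate sums singleton * count.
   */
  static long sumPushedDown(List<Integer> uniqueA, List<Integer> b) {
    Map<Integer, Long> countB = countByKey(b);
    long total = 0;
    for (int x : uniqueA) {
      long singleton = x;
      long count = countB.getOrDefault(x, 0L);
      total += singleton * count;
    }
    return total;
  }

  /** The faulty shape: the A-side subtotal is missing, so only counts are summed. */
  static long sumOfCountsOnly(List<Integer> uniqueA, List<Integer> b) {
    Map<Integer, Long> countB = countByKey(b);
    long total = 0;
    for (int x : uniqueA) {
      total += countB.getOrDefault(x, 0L);
    }
    return total;
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(1000, 2000, 3000); // distinct sal values
    List<Integer> b = Arrays.asList(1000, 1000, 2000, 3000, 3000, 3000, 4000);
    System.out.println("direct:      " + sumOverJoin(a, b));
    System.out.println("pushed down: " + sumPushedDown(a, b));
    System.out.println("counts only: " + sumOfCountsOnly(a, b));
  }
}
```

With this sample data the direct and pushed-down results agree, while the counts-only variant computes a different quantity (a count rather than a sum of sal), which is consistent with the INTEGER versus BIGINT type mismatch reported in the assertion.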
[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383037#comment-16383037 ]

Julian Hyde commented on CALCITE-2195:
---

PR looks good; testing now and will merge shortly.
[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378701#comment-16378701 ]

Zhong Yu commented on CALCITE-2195:
---

pull-request: https://github.com/apache/calcite/pull/637

> AggregateJoinTransposeRule fails to aggregate over unique column
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.15.0
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.15.0