[
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhong Yu updated CALCITE-2195:
--
Description:
The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in
which case the rule does not aggregate on the A side, `leftSubTotal==null`,
causing `splitter.topSplit()` to only sum over `count()` coming from the B side.
A solution would be to introduce `splitter.singleton()` on the A side, so that
it can be fed to topSplit() to be multiplied by the count.
In the case that the `unique` side does not contain the column of an agg call,
it seems that we should do `other_singleton()` on this side, and feed it to
topSplit(). However, realize that the `other()` expression is necessarily a
`count()`, or a scalar function of `count()`, because it does not depend on any
column values. In the same way, the proposed `other_singleton()` necessarily
returns 1, or some constant value. `topSplit()` would not have any need of that
constant value.Therefore in this case, we don't need a split on this side, just
leave its subtotal as null.
I'm working on a pull-request based on these understandings. BTW, is there a
reference to the algorithm used in the code? I can only find some old papers
that don't exactly cover the logic of the code. Thanks.
was:
The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in
which case the rule does not aggregate on the A side, `leftSubTotal==null`,
causing `splitter.topSplit()` to only sum over `count(*)` coming from the B
side.
A solution would be to introduce `splitter.singleton()` on the A side, so that
it can be fed to topSplit() to be multiplied by the count.
In the case that the `unique` side does not contain the column of an agg call,
it seems that we should do `other_singleton()` on this side, and feed it to
topSplit(). However, realize that the `other()` expression is necessarily a
`count(*)`, or a scalar function of `count(*)`, because it does not depend on
any column values. In the same way, the proposed `other_singleton()`
necessarily returns 1, or some constant value. `topSplit()` would not have any
need of that constant value.Therefore in this case, we don't need a split on
this side, just leave its subtotal as null.
I'm working on a pull-request based on these understandings. BTW, is there a
reference to the algorithm used in the code? I can only find some old papers
that don't exactly cover the logic of the code. Thanks.
> AggregateJoinTransposeRule fails to aggregate over unique column
>
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
> Issue Type: Bug
> Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.15.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
> causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in
> which case the rule does not aggregate on the A side, `leftSubTotal==null`,
> causing `splitter.topSplit()` to only sum over `count()` coming from the B
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg
> call, it seems that we should do `other_singleton()` on this side, and feed
> it to topSplit(). However, realize that the `other()` expression is
> necessarily a `count()`, or a scalar function of `count()`, because it does
> not depend on any column values. In the same way, the proposed
> `other_singleton()` necessarily returns 1, or some constant value.
> `topSplit()` would not have any need of that constant value.Therefore in this
> case