[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-08 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289
 ] 

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:29 AM:
---

Everything is moot if I cannot prove my formula. But suppose it is correct:

I do think that COVAR_POP can be pushed down; it can be calculated from 
SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through 
table union and cross product, and therefore can be pushed down over a join.
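
As a rough illustration of the claim above, the following Python sketch keeps the four partial aggregates as a small state object, merges two states for a table union, and scales a state for a cross product with a table that supplies neither x nor y. The names `Partial`, `union`, `times` and `covar_pop_direct` are hypothetical and not part of Calcite; the sketch also assumes x and y are never NULL, so that COUNT(x,y) is simply the row count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Partial state for COVAR_POP: SUM(x*y), SUM(x), SUM(y), COUNT(x,y)."""
    sxy: float = 0
    sx: float = 0
    sy: float = 0
    n: int = 0

    @staticmethod
    def of(rows):
        """Build the state from an iterable of (x, y) pairs (no NULLs assumed)."""
        rows = list(rows)
        return Partial(
            sum(x * y for x, y in rows),
            sum(x for x, _ in rows),
            sum(y for _, y in rows),
            len(rows),
        )

    def union(self, other):
        """State of the union (UNION ALL) of two tables: every component adds."""
        return Partial(self.sxy + other.sxy, self.sx + other.sx,
                       self.sy + other.sy, self.n + other.n)

    def times(self, k):
        """State after a cross product with a k-row table that supplies
        neither x nor y: each row is repeated k times, so every component
        is multiplied by k."""
        return Partial(self.sxy * k, self.sx * k, self.sy * k, self.n * k)

    def covar_pop(self):
        """COVAR_POP = SUM(x*y)/n - (SUM(x)/n) * (SUM(y)/n)."""
        return self.sxy / self.n - (self.sx / self.n) * (self.sy / self.n)


def covar_pop_direct(rows):
    """Reference computation straight from the definition of covariance."""
    rows = list(rows)
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in rows) / n


# Two fragments of one table, aggregated separately and then merged.
part_a = [(1, 2), (2, 4)]
part_b = [(3, 7)]
merged = Partial.of(part_a).union(Partial.of(part_b))
print(merged.covar_pop(), covar_pop_direct(part_a + part_b))
```

The two printed values should agree up to floating-point rounding. Whether the same splitting carries through every join shape is exactly what the formula in question still has to establish; the sketch only checks the union and row-repetition cases.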

Producing more candidate plans may be bad for the CBO; but the extra rule (i.e. 
single-sided) can be opt-in for cases where metadata is missing and stats 
show that group columns are unique or nearly unique.


was (Author: zhong.j.yu):
Everything is moot if I cannot prove my formula. But suppose it is correct:

I do think that COVAR_POP can be pushed down; it can be calculated from 
SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through 
table union and cross product, and therefore can be pushed down over a join.

Producing more candidate plans may be bad for the CBO; but the extra rule (i.e. 
single-sided) can be opted into in some cases where metadata is missing, or 
group columns are nearly unique.

> Aggregate Join Push-down on a Single Side
> -
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: next
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed down on a single side (either side), 
> leaving the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis: 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf].
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost-based optimizer, so that the cheapest one can 
> be chosen based on stats.
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR.
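
To make the single-side idea in the description concrete, here is a small Python sketch for the simplest case only: SUM over an inner equi-join, grouped by the join key, with the aggregate pushed to the left input while the right input stays non-aggregated and has duplicate keys. The tables and the helper names (`join`, `sum_by_key`, `aggregate_after_join`, `aggregate_left_then_join`) are made up for illustration; this is not the general formula from the linked analysis and not Calcite code.

```python
from collections import defaultdict


def join(left, right):
    """Inner equi-join on the first field (the key) of each row.
    Output rows are (key, left_value, right_value)."""
    return [(k, v, w) for (k, v) in left for (k2, w) in right if k == k2]


def sum_by_key(rows, value_index):
    """SELECT key, SUM(value) ... GROUP BY key, where key is field 0."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[0]] += row[value_index]
    return dict(totals)


def aggregate_after_join(left, right):
    """Original plan: join first, then aggregate the left value."""
    return sum_by_key(join(left, right), 1)


def aggregate_left_then_join(left, right):
    """Single-side push-down: pre-aggregate the left input by key, join the
    result with the untouched right input, then aggregate again on top."""
    pre_aggregated = list(sum_by_key(left, 1).items())  # one row per key
    return sum_by_key(join(pre_aggregated, right), 1)


# The key is NOT unique on the right side (key 1 appears twice).
left = [(1, 10), (1, 20), (2, 5), (3, 7)]
right = [(1, "a"), (1, "b"), (2, "c"), (4, "d")]

print(aggregate_after_join(left, right))
print(aggregate_left_then_join(left, right))
```

Both plans should print the same grouped sums. The pushed-down plan joins fewer rows here because the left input has already been collapsed to one row per key, which is the kind of trade-off a cost-based optimizer would weigh using stats.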



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-08 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289
 ] 

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:27 AM:
---

Everything is moot if I cannot prove my formula. But suppose it is correct:

I do think that COVAR_POP can be pushed down; it can be calculated from 
SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through 
table union and cross product, and therefore can be pushed down over a join.

Producing more candidate plans may be bad for the CBO; but the extra rule (i.e. 
single-sided) can be opted into in some cases where metadata is missing, or 
group columns are nearly unique.


was (Author: zhong.j.yu):
Everything is moot if I cannot prove my formula. But suppose it is correct:

I do think that COVAR_POP can be pushed down; it can be calculated from 
SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through table 
union and cross product, and therefore can be pushed down over a join.

Producing more candidate plans may be bad for the CBO; but the extra rule (i.e. 
single-sided) can be opted into in some cases where metadata is missing, or 
group columns are nearly unique.



