[jira] [Commented] (CALCITE-2296) Extra logic to derive additional filters in the FilterJoinRule

2018-05-13 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473696#comment-16473696
 ] 

Zhong Yu commented on CALCITE-2296:
---

Hi Vitalii, I had some ideas on this exact issue, in case it helps – 
https://mail-archives.apache.org/mod_mbox/calcite-dev/201804.mbox/%3cCACuKZqHQhxZR09E8_AzTXrMju55RWN-ULZy-TbmYbCTjtVb-=w...@mail.gmail.com%3e

> Extra logic to derive additional filters in the FilterJoinRule
> --
>
> Key: CALCITE-2296
> URL: https://issues.apache.org/jira/browse/CALCITE-2296
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.16.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: next
>
>
> There is necessary to add logic to derive additional filters from for 
> FilterJoinRule in the case of using disjunction of expressions:
>  
> [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L155]
> The example query is:
> {code:sql} 
> select * from sales.emp e join 
> sales.dept d on e.job = d.name
> where (e.deptno = 5 AND d.deptno = 2)
> OR (e.deptno = 7 AND d.deptno = 3)
> {code}
> The filrer condition in DNF
> LogicalFilter(condition=[OR(AND(=($7, 5), =($9, 2)), AND(=($7, 7), =($9, 
> 3)))])
> can be transformed into CNF and then the ncessary filters can be pushed down:
> LogicalFilter(condition=[OR(=($7, 5), =($7, 7))]) and 
> LogicalFilter(condition=[OR(=($9, 2), =($9, 3))])
> Additional filter still will be present on top of join. 
> Finally deriving additional filters can allow to scan less information. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2287) StackOverflowError from FlatList equals()

2018-04-27 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457295#comment-16457295
 ] 

Zhong Yu commented on CALCITE-2287:
---

If {{o}} is some {{Plat3List}} that tries to do the same trick, the cycle 
continues :) {{Plat3List}} could as well be {{Flat3List}} from another class 
loader.

I'll be on vacation without my computer; I would appreciate if someone can take 
over this issue and conclude with a better solution.  

> StackOverflowError from FlatList equals()
> -
>
> Key: CALCITE-2287
> URL: https://issues.apache.org/jira/browse/CALCITE-2287
> Project: Calcite
>  Issue Type: Bug
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
>
> The equals() method of `Flat3List` and `Flat4List` may invoke each other in 
> an infinite loop.
> See also https://issues.apache.org/jira/browse/HIVE-18427
>  
> We found this problem when such lists are used as keys in 
> `RelMetadataQuery.map`; occasionally there are hashcode collisions between 
> `Flat3List` and `Flat4List`, hence their equals() methods will be invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2257) Combination of predicates can be proved to be always true

2018-04-27 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456082#comment-16456082
 ] 

Zhong Yu edited comment on CALCITE-2257 at 4/27/18 7:55 AM:


Wait a sec... these two are not always the opposite, right? SQL2003, 8.7 
{quote}NOTE 191 — For all R, “R IS NOT NULL” has the same result as “NOT R IS 
NULL” if and only if R is of degree 1.
{quote}
followed by a table showing the case when both can be FALSE


was (Author: zhong.j.yu):
Wait a sec... these two are not always the opposite, right? SQL2003, 8.7 
{quote}NOTE 191 — For all R, “R IS NOT NULL” has the same result as “NOT R IS 
NULL” if and only if R is of degree 1.
{quote}

> Combination of predicates can be proved to be always true
> -
>
> Key: CALCITE-2257
> URL: https://issues.apache.org/jira/browse/CALCITE-2257
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.16.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: filter
> Fix For: 1.17.0
>
>
> I have found the case, when Filter operator is not necessary since filter 
> condition is always true, but that is not detected by current version of 
> Calcite.
> {code}
> select SAL from EMPNULLABLES_20 where SAL IS NOT NULL OR SAL is null
> {code}
> {code}
> LogicalProject(SAL=[$5])
>   LogicalFilter(condition=[OR(IS NOT NULL($5), IS NULL($5))])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], SLACKER=[$8])
>   LogicalFilter(condition=[AND(=($7, 20), >($5, 1000))])
> LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
> {code}
> But filter condition _OR(IS NOT NULL($5), IS NULL($5))_ can be proved to be 
> always true.
> I have tried _ReduceExpressionsRule_, but it doesn't give effect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2287) StackOverflowError from FlatList equals()

2018-04-27 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456028#comment-16456028
 ] 

Zhong Yu commented on CALCITE-2287:
---

https://github.com/apache/calcite/pull/682

> StackOverflowError from FlatList equals()
> -
>
> Key: CALCITE-2287
> URL: https://issues.apache.org/jira/browse/CALCITE-2287
> Project: Calcite
>  Issue Type: Bug
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
>
> The equals() method of `Flat3List` and `Flat4List` may invoke each other in 
> an infinite loop.
> See also https://issues.apache.org/jira/browse/HIVE-18427
>  
> We found this problem when such lists are used as keys in 
> `RelMetadataQuery.map`; occasionally there are hashcode collisions between 
> `Flat3List` and `Flat4List`, hence their equals() methods will be invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2287) StackOverflowError from FlatList equals()

2018-04-27 Thread Zhong Yu (JIRA)
Zhong Yu created CALCITE-2287:
-

 Summary: StackOverflowError from FlatList equals()
 Key: CALCITE-2287
 URL: https://issues.apache.org/jira/browse/CALCITE-2287
 Project: Calcite
  Issue Type: Bug
Reporter: Zhong Yu
Assignee: Julian Hyde


The equals() method of `Flat3List` and `Flat4List` may invoke each other in an 
infinite loop.

See also https://issues.apache.org/jira/browse/HIVE-18427

 

We found this problem when such lists are used as keys in 
`RelMetadataQuery.map`; occasionally there are hashcode collisions between 
`Flat3List` and `Flat4List`, hence their equals() methods will be invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2273) misinterpreted

2018-04-21 Thread Zhong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yu updated CALCITE-2273:
--
Description: 
 

 

The following string literal is accepted by Calcite parser, but the result 
value in Java is incorrect
{code:java}
U&'\+01F600'{code}
 

Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit 
unicode escape value in the form of "\xyzw" . When given the 6-digit form 
"\+xyzwrs", it parses the four chars "+xyz" as a hexadecimal, which succeeds 
too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s]

  was:
 

 

The following string literal is accepted by Calcite parser, but the result 
value in Java is incorrect
{code:java}
U&'\+01F600'{code}
 

Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit 
unicode escape value in the form of \xyzw . When given the 6-digit form 
\+xyzwrs, it parses the first four chars "+xyz" as a hexadecimal, which 
succeeds too. Therefore the result value contains 4 characters: [\u0xyz, w, r, 
s]


>  misinterpreted 
> --
>
> Key: CALCITE-2273
> URL: https://issues.apache.org/jira/browse/CALCITE-2273
> Project: Calcite
>  Issue Type: Bug
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
>
>  
>  
> The following string literal is accepted by Calcite parser, but the result 
> value in Java is incorrect
> {code:java}
> U&'\+01F600'{code}
>  
> Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit 
> unicode escape value in the form of "\xyzw" . When given the 6-digit form 
> "\+xyzwrs", it parses the four chars "+xyz" as a hexadecimal, which succeeds 
> too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2273) misinterpreted

2018-04-21 Thread Zhong Yu (JIRA)
Zhong Yu created CALCITE-2273:
-

 Summary:  misinterpreted 
 Key: CALCITE-2273
 URL: https://issues.apache.org/jira/browse/CALCITE-2273
 Project: Calcite
  Issue Type: Bug
Reporter: Zhong Yu
Assignee: Julian Hyde


 

 

The following string literal is accepted by Calcite parser, but the result 
value in Java is incorrect
{code:java}
U&'\+01F600'{code}
 

Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit 
unicode escape value in the form of \xyzw . When given the 6-digit form 
\+xyzwrs, it parses the first four chars "+xyz" as a hexadecimal, which 
succeeds too. Therefore the result value contains 4 characters: [\u0xyz, w, r, 
s]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2267) AbstractRelNode.getId() may produce duplicate IDs

2018-04-19 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445113#comment-16445113
 ] 

Zhong Yu commented on CALCITE-2267:
---

When there are multiple threads writing to `nextId`, Java memory model is 
pretty loose and allows any value to be read -- in theory. Practically there is 
a non-trivial chance that another thread overwrites `nextId` with a value 
smaller than a thread last observed. 

> AbstractRelNode.getId() may produce duplicate IDs
> -
>
> Key: CALCITE-2267
> URL: https://issues.apache.org/jira/browse/CALCITE-2267
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
>
> RelOptNode.getId() is supposed to be unique per instance; however, the 
> generation of AbstractRelNode.id is not thread safe; in a multithreaded 
> environment, the behavior is undeterministic, and there is a non-trivial 
> possibility that duplicate IDs are generated for different nodes.
> This ID is used for comparing nodes in several places, therefore it may 
> affect program stability and correctness. For example, if we set id=0 for all 
> nodes, Calcite's own test suite will have multiple failures. 
> In our own test environment, the optimizers are expected to be deterministic, 
> yet occasionally they mysteriously produce different plans. While we cannot 
> be 100% sure that it is caused by AbstractRelNode.id, we can verify that 
> fudging with the ID will reliably change the output of our optimizers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2267) AbstractRelNode.getId() may produce duplicate IDs

2018-04-19 Thread Zhong Yu (JIRA)
Zhong Yu created CALCITE-2267:
-

 Summary: AbstractRelNode.getId() may produce duplicate IDs
 Key: CALCITE-2267
 URL: https://issues.apache.org/jira/browse/CALCITE-2267
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Zhong Yu
Assignee: Julian Hyde


RelOptNode.getId() is supposed to be unique per instance; however, the 
generation of AbstractRelNode.id is not thread safe; in a multithreaded 
environment, the behavior is undeterministic, and there is a non-trivial 
possibility that duplicate IDs are generated for different nodes.

This ID is used for comparing nodes in several places, therefore it may affect 
program stability and correctness. For example, if we set id=0 for all nodes, 
Calcite's own test suite will have multiple failures. 

In our own test environment, the optimizers are expected to be deterministic, 
yet occasionally they mysteriously produce different plans. While we cannot be 
100% sure that it is caused by AbstractRelNode.id, we can verify that fudging 
with the ID will reliably change the output of our optimizers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2234) Fix java.lang.AssertionError at org.apache.calcite.sql.fun.SqlCastFunction.unparse

2018-04-03 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423759#comment-16423759
 ] 

Zhong Yu commented on CALCITE-2234:
---

A follow up bug – in `StandardConvertletTable.expandAvg()`, the cast type is 
created with `new SqlDataTypeSpec(... avgType.getPrecision() ... )`. However, 
when avgType is DOUBLE for example, its getPrecision() returns 15, while 
SqlDataTypeSpec() expects none. This causes problem later when CAST is 
converted, SqlDataTypeSpec.deriveType() fails at 

{{    } else if (precision >= 0) {}}
{{    assert sqlTypeName.allowsPrecNoScale();}}

> Fix java.lang.AssertionError at 
> org.apache.calcite.sql.fun.SqlCastFunction.unparse 
> ---
>
> Key: CALCITE-2234
> URL: https://issues.apache.org/jira/browse/CALCITE-2234
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: jingzhang
>Assignee: Julian Hyde
>Priority: Major
> Attachments: q47.sql
>
>
> When run tpcDs query 47, 57, 63, 53, 89 sql, an AssertionError would be 
> thrown out. The stack is as following.
> {code:java}
> java.lang.AssertionError 
>  at 
> org.apache.calcite.sql.fun.SqlCastFunction.unparse(SqlCastFunction.java:172) 
>  at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) 
>  at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) 
>  at org.apache.calcite.sql.SqlUtil.unparseBinarySyntax(SqlUtil.java:323) 
>  at org.apache.calcite.sql.SqlSyntax$3.unparse(SqlSyntax.java:65) 
>  at org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:332) 
>  at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) 
>  at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) 
>  at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:152) 
>  at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:158) 
>  at org.apache.calcite.sql.SqlNode.toString(SqlNode.java:125) 
>  at java.lang.String.valueOf(String.java:2994) 
>  at java.lang.StringBuilder.append(StringBuilder.java:131) 
>  at 
> org.apache.calcite.sql2rel.ReflectiveConvertletTable$1.convertCall(ReflectiveConvertletTable.java:90)
>  
>  at 
> org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4684)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4020)
>  
>  at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:138) 
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4586)
>  
>  at 
> org.apache.calcite.sql2rel.StandardConvertletTable$AvgVarianceConvertlet.convertCall(StandardConvertletTable.java:1190)
>  at 
> org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1891)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.access$1800(SqlToRelConverter.java:215)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4579)
>  
>  at 
> org.apache.calcite.sql2rel.StandardConvertletTable$12.convertCall(StandardConvertletTable.java:267)
>  
>  at 
> org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4684)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4020)
>  
>  at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:138) 
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4586)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2862)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2710)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:662)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:624)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3084)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2073)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1965)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2288)
>  
>  at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1999)
>  
>  at 
> 

[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-25 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413095#comment-16413095
 ] 

Zhong Yu commented on CALCITE-2202:
---

Sorry about the delay, but I've finished with a better proof (I hope) - 
[http://zhong-j-yu.github.io/aggregate-join-pushdown-3.pdf] - which extends 
pushdown to all join types and join conditions.

Forget about this Jira issue, which is probably not important, and we may  
close it. On more important points --

Pushdown works on any theta condition. The focus on equality condition is a 
distraction; equality across columns is very complicated and does not help to 
simplify reasoning. I do not invoke equality of values anywhere in my proof; I 
only invoke not-distinct-ness of values in the same type.

Pushdown works on any theta condition because we keep key columns, therefore 
rows that associate with each other before still associate after, regardless of 
matching conditions.

Pushdown should also work on semi join and outer join, quite intuitively. 
However, in outer join, we need a property that aggregate on null-row returns 
null-row, which seems invalid for count(). But, we can tweak split functions to 
make it valid. For example, we can treat a null result from count() as 1 (it 
can be null if it comes from the right-side in a left-outer join). The generic 
trick is simply to permute the result of aggregate. 

group by() on empty table does cause problems; the proof reveals exactly where. 
Example, "select count() from L cross join R group by R.g"; it ought to produce 
zero rows if L is empty; it's a bug to transform it to contain "select count() 
from L group by()". There's a workaround; or, we may simply skip over such 
pathological joins.

My proof is quite lengthy, which I'm not proud of. I wish I could reason about 
it in plain language, but the problem itself seems quite expansive, requiring 
very careful handling to cover all cases. I tried my best to structure it 
nicely. If you could, please find some unlucky guy to review my proof. Thanks!

> Aggregate Join Push-down on a Single Side
> -
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: next
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed on on a single side (either side), 
> and leave the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis – 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost based optimizer, so that the cheapest one can 
> be chosen based on stats. 
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-08 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289
 ] 

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:29 AM:
---

Everything is moot if I can not prove my formula. But suppose it is correct –

I do think that COVAR_POP can be pushed down; it can be calculate from 
SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through 
table union and cross product, therefore can be pushed down over join.

Producing more candidate plans may be bad for CBO; but the extra rule (i.e. 
singled sided) can be opt-in in some cases where metadata is missing, and stats 
shows that group columns are unique or nearly unique.


was (Author: zhong.j.yu):
Everything is moot if I can not prove my formula. But suppose it is correct –

I do think that COVAR_POP can be pushed down; it can be calculate from 
SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through 
table union and cross product, therefore can be pushed down over join.

Producing more candidate plans may be bad for CBO; but the extra rule (i.e. 
singled sided) can be opted in some cases where metadata is missing, or group 
columns are nearly unique.

> Aggregate Join Push-down on a Single Side
> -
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: next
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed on on a single side (either side), 
> and leave the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis – 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost based optimizer, so that the cheapest one can 
> be chosen based on stats. 
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-08 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289
 ] 

Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:27 AM:
---

Everything is moot if I can not prove my formula. But suppose it is correct –

I do think that COVAR_POP can be pushed down; it can be calculate from 
SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through 
table union and cross product, therefore can be pushed down over join.

Producing more candidate plans may be bad for CBO; but the extra rule (i.e. 
singled sided) can be opted in some cases where metadata is missing, or group 
columns are nearly unique.


was (Author: zhong.j.yu):
Everything is moot if I can not prove my formula. But suppose it is correct --

I do think that COVAR_POP can be pushed down; it can be calculate from 
SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through table 
union and cross product, therefore can be pushed down over join.

Producing more candidate plans may be bad for CBO; but the extra rule (i.e. 
singled sided) can be opted in some cases where metadata is missing, or group 
columns are nearly unique.

> Aggregate Join Push-down on a Single Side
> -
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: next
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed on on a single side (either side), 
> and leave the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis – 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost based optimizer, so that the cheapest one can 
> be chosen based on stats. 
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-08 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391279#comment-16391279
 ] 

Zhong Yu commented on CALCITE-2202:
---

Thanks Julian. Your last example actually works in my favor. My argument is 
that the original query is equivalent to the following two, where aggregate is 
pushed down on only one side
{code:java}
select sum(e.s * d.c)
from (select deptno, (sal)as s from emp) as e
join (select deptno, count(*) as c from dept group by deptno) as d
on e.deptno = d.deptno
group by e.deptno

select sum(e.s * d.c)
from (select deptno, sum(sal) as s from emp group by deptno) as e
join (select deptno, (1)  as c from dept) as d
on e.deptno = d.deptno
group by e.deptno{code}
The "cross-multiplier" effect is still there because the join multiplies the 
side that doesn't do aggregate.

So, I'm probably on the right track. However the paper I wrote is full of 
holes, obviously done by an amateur:) Please ignore it. I'll try to write up a 
new one in the weekend. Your reminder of vacuous cases, and null handling 
(different in group-by and equijoin) are all good points. And I'll focus on a 
narrower proof that works only on inner equijoin. 

 

> Aggregate Join Push-down on a Single Side
> -
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: next
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed on on a single side (either side), 
> and leave the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis – 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost based optimizer, so that the cheapest one can 
> be chosen based on stats. 
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2202) Aggregate Join Push-down on a Single Side

2018-03-04 Thread Zhong Yu (JIRA)
Zhong Yu created CALCITE-2202:
-

 Summary: Aggregate Join Push-down on a Single Side
 Key: CALCITE-2202
 URL: https://issues.apache.org/jira/browse/CALCITE-2202
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: next
Reporter: Zhong Yu
Assignee: Julian Hyde
 Fix For: next


While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
apparent that aggregation can be pushed on on a single side (either side), and 
leave the other side non-aggregated, regardless of whether grouping columns are 
unique on the other side. My analysis – 
[http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .

This may be useful when the metadata is insufficient; in any case, we may try 
to provide all 3 possible transformations (aggregate on left only; right only; 
both sides) to the cost based optimizer, so that the cheapest one can be chosen 
based on stats. 

Does this make any sense, anybody? If it sounds good, I'll implement it and 
offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-03-04 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385284#comment-16385284
 ] 

Zhong Yu commented on CALCITE-2195:
---

Thanks Julian. To make sure I understand the problem correctly, I made some 
formal analysis: [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . 
It turns out, as the code comments, we can do singleton() on a side regardless 
of whether the grouping columns are unique. I'll add another issue addressing 
this possibility. 

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.16.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-02-27 Thread Zhong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378701#comment-16378701
 ] 

Zhong Yu commented on CALCITE-2195:
---

pull-request: https://github.com/apache/calcite/pull/637

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.15.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-02-27 Thread Zhong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yu updated CALCITE-2195:
--
Description: 
The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
 causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in 
which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
causing `splitter.topSplit()` to only sum over `count()` coming from the B side.

A solution would be to introduce `splitter.singleton()` on the A side, so that 
it can be fed to topSplit() to be multiplied by the count.

In the case that the `unique` side does not contain the column of an agg call, 
it seems that we should do `other_singleton()` on this side, and feed it to 
topSplit(). However, realize that the `other()` expression is necessarily a 
`count()`, or a scalar function of `count()`, because it does not depend on any 
column values. In the same way, the proposed `other_singleton()` necessarily 
returns 1, or some constant value. `topSplit()` would not have any need of that 
constant value.Therefore in this case, we don't need a split on this side, just 
leave its subtotal as null.

 

I'm working on a pull-request based on these understandings. BTW, is there a 
reference to the algorithm used in the code? I can only find some old papers 
that don't exactly cover the logic of the code. Thanks.

  was:
The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
 causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in 
which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
causing `splitter.topSplit()` to only sum over `count(*)` coming from the B 
side.

A solution would be to introduce `splitter.singleton()` on the A side, so that 
it can be fed to topSplit() to be multiplied by the count.

In the case that the `unique` side does not contain the column of an agg call, 
it seems that we should do `other_singleton()` on this side, and feed it to 
topSplit(). However, realize that the `other()` expression is necessarily a 
`count(*)`, or a scalar function of `count(*)`, because it does not depend on 
any column values. In the same way, the proposed `other_singleton()` 
necessarily returns 1, or some constant value. `topSplit()` would not have any 
need of that constant value.Therefore in this case, we don't need a split on 
this side, just leave its subtotal as null.

 

I'm working on a pull-request based on these understandings. BTW, is there a 
reference to the algorithm used in the code? I can only find some old papers 
that don't exactly cover the logic of the code. Thanks.


> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.15.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count()` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count()`, or a scalar function of `count()`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> 

[jira] [Updated] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-02-27 Thread Zhong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yu updated CALCITE-2195:
--
Fix Version/s: (was: 1.16.0)
   1.15.0

> AggregateJoinTransposeRule fails to aggregate over unique column
> 
>
> Key: CALCITE-2195
> URL: https://issues.apache.org/jira/browse/CALCITE-2195
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.15.0
>Reporter: Zhong Yu
>Assignee: Julian Hyde
>Priority: Major
> Fix For: 1.15.0
>
>
> The following query, in which "A.sal" is unique,
> {code:java}
> select sum(A.sal)
> from (select distinct sal from sales.emp) as A
> join sales.emp as B on A.sal=B.sal
> {code}
>  causes AggregateJoinTransposeRule to fail with message
> {code:java}
> java.lang.AssertionError: type mismatch:
> aggCall type:
> INTEGER
> inferred type:
> BIGINT
> {code}
> Apparently, this is a bug in the rule when `unique` is true on the A side, in 
> which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
> causing `splitter.topSplit()` to only sum over `count(*)` coming from the B 
> side.
> A solution would be to introduce `splitter.singleton()` on the A side, so 
> that it can be fed to topSplit() to be multiplied by the count.
> In the case that the `unique` side does not contain the column of an agg 
> call, it seems that we should do `other_singleton()` on this side, and feed 
> it to topSplit(). However, realize that the `other()` expression is 
> necessarily a `count(*)`, or a scalar function of `count(*)`, because it does 
> not depend on any column values. In the same way, the proposed 
> `other_singleton()` necessarily returns 1, or some constant value. 
> `topSplit()` would not have any need of that constant value.Therefore in this 
> case, we don't need a split on this side, just leave its subtotal as null.
>  
> I'm working on a pull-request based on these understandings. BTW, is there a 
> reference to the algorithm used in the code? I can only find some old papers 
> that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

2018-02-27 Thread Zhong Yu (JIRA)
Zhong Yu created CALCITE-2195:
-

 Summary: AggregateJoinTransposeRule fails to aggregate over unique 
column
 Key: CALCITE-2195
 URL: https://issues.apache.org/jira/browse/CALCITE-2195
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.15.0
Reporter: Zhong Yu
Assignee: Julian Hyde
 Fix For: 1.16.0


The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
 causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in 
which case the rule does not aggregate on the A side, `leftSubTotal==null`, 
causing `splitter.topSplit()` to only sum over `count(*)` coming from the B 
side.

A solution would be to introduce `splitter.singleton()` on the A side, so that 
it can be fed to topSplit() to be multiplied by the count.

In the case that the `unique` side does not contain the column of an agg call, 
it seems that we should do `other_singleton()` on this side, and feed it to 
topSplit(). However, realize that the `other()` expression is necessarily a 
`count(*)`, or a scalar function of `count(*)`, because it does not depend on 
any column values. In the same way, the proposed `other_singleton()` 
necessarily returns 1, or some constant value. `topSplit()` would not have any 
need of that constant value.Therefore in this case, we don't need a split on 
this side, just leave its subtotal as null.

 

I'm working on a pull-request based on these understandings. BTW, is there a 
reference to the algorithm used in the code? I can only find some old papers 
that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)