[jira] [Commented] (CALCITE-2296) Extra logic to derive additional filters in the FilterJoinRule
[ https://issues.apache.org/jira/browse/CALCITE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473696#comment-16473696 ] Zhong Yu commented on CALCITE-2296: --- Hi Vitalii, I had some ideas on this exact issue, in case it helps – https://mail-archives.apache.org/mod_mbox/calcite-dev/201804.mbox/%3cCACuKZqHQhxZR09E8_AzTXrMju55RWN-ULZy-TbmYbCTjtVb-=w...@mail.gmail.com%3e > Extra logic to derive additional filters in the FilterJoinRule > -- > > Key: CALCITE-2296 > URL: https://issues.apache.org/jira/browse/CALCITE-2296 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.16.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: next > > > There is necessary to add logic to derive additional filters from for > FilterJoinRule in the case of using disjunction of expressions: > > [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L155] > The example query is: > {code:sql} > select * from sales.emp e join > sales.dept d on e.job = d.name > where (e.deptno = 5 AND d.deptno = 2) > OR (e.deptno = 7 AND d.deptno = 3) > {code} > The filrer condition in DNF > LogicalFilter(condition=[OR(AND(=($7, 5), =($9, 2)), AND(=($7, 7), =($9, > 3)))]) > can be transformed into CNF and then the ncessary filters can be pushed down: > LogicalFilter(condition=[OR(=($7, 5), =($7, 7))]) and > LogicalFilter(condition=[OR(=($9, 2), =($9, 3))]) > Additional filter still will be present on top of join. > Finally deriving additional filters can allow to scan less information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2287) StackOverflowError from FlatList equals()
[ https://issues.apache.org/jira/browse/CALCITE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457295#comment-16457295 ] Zhong Yu commented on CALCITE-2287: --- If {{o}} is some {{Plat3List}} that tries to do the same trick, the cycle continues :) {{Plat3List}} could as well be {{Flat3List}} from another class loader. I'll be on vacation without my computer; I would appreciate if someone can take over this issue and conclude with a better solution. > StackOverflowError from FlatList equals() > - > > Key: CALCITE-2287 > URL: https://issues.apache.org/jira/browse/CALCITE-2287 > Project: Calcite > Issue Type: Bug >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > > The equals() method of `Flat3List` and `Flat4List` may invoke each other in > an infinite loop. > See also https://issues.apache.org/jira/browse/HIVE-18427 > > We found this problem when such lists are used as keys in > `RelMetadataQuery.map`; occasionally there are hashcode collisions between > `Flat3List` and `Flat4List`, hence their equals() methods will be invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2257) Combination of predicates can be proved to be always true
[ https://issues.apache.org/jira/browse/CALCITE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456082#comment-16456082 ] Zhong Yu edited comment on CALCITE-2257 at 4/27/18 7:55 AM: Wait a sec... these two are not always the opposite, right? SQL2003, 8.7 {quote}NOTE 191 — For all R, “R IS NOT NULL” has the same result as “NOT R IS NULL” if and only if R is of degree 1. {quote} followed by a table showing the case when both can be FALSE was (Author: zhong.j.yu): Wait a sec... these two are not always the opposite, right? SQL2003, 8.7 {quote}NOTE 191 — For all R, “R IS NOT NULL” has the same result as “NOT R IS NULL” if and only if R is of degree 1. {quote} > Combination of predicates can be proved to be always true > - > > Key: CALCITE-2257 > URL: https://issues.apache.org/jira/browse/CALCITE-2257 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.16.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Labels: filter > Fix For: 1.17.0 > > > I have found the case, when Filter operator is not necessary since filter > condition is always true, but that is not detected by current version of > Calcite. > {code} > select SAL from EMPNULLABLES_20 where SAL IS NOT NULL OR SAL is null > {code} > {code} > LogicalProject(SAL=[$5]) > LogicalFilter(condition=[OR(IS NOT NULL($5), IS NULL($5))]) > LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], > SAL=[$5], COMM=[$6], SLACKER=[$8]) > LogicalFilter(condition=[AND(=($7, 20), >($5, 1000))]) > LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]]) > {code} > But filter condition _OR(IS NOT NULL($5), IS NULL($5))_ can be proved to be > always true. > I have tried _ReduceExpressionsRule_, but it doesn't give effect. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2287) StackOverflowError from FlatList equals()
[ https://issues.apache.org/jira/browse/CALCITE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456028#comment-16456028 ] Zhong Yu commented on CALCITE-2287: --- https://github.com/apache/calcite/pull/682 > StackOverflowError from FlatList equals() > - > > Key: CALCITE-2287 > URL: https://issues.apache.org/jira/browse/CALCITE-2287 > Project: Calcite > Issue Type: Bug >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > > The equals() method of `Flat3List` and `Flat4List` may invoke each other in > an infinite loop. > See also https://issues.apache.org/jira/browse/HIVE-18427 > > We found this problem when such lists are used as keys in > `RelMetadataQuery.map`; occasionally there are hashcode collisions between > `Flat3List` and `Flat4List`, hence their equals() methods will be invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2287) StackOverflowError from FlatList equals()
Zhong Yu created CALCITE-2287: - Summary: StackOverflowError from FlatList equals() Key: CALCITE-2287 URL: https://issues.apache.org/jira/browse/CALCITE-2287 Project: Calcite Issue Type: Bug Reporter: Zhong Yu Assignee: Julian Hyde The equals() method of `Flat3List` and `Flat4List` may invoke each other in an infinite loop. See also https://issues.apache.org/jira/browse/HIVE-18427 We found this problem when such lists are used as keys in `RelMetadataQuery.map`; occasionally there are hashcode collisions between `Flat3List` and `Flat4List`, hence their equals() methods will be invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2273) misinterpreted
[ https://issues.apache.org/jira/browse/CALCITE-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yu updated CALCITE-2273: -- Description: The following string literal is accepted by Calcite parser, but the result value in Java is incorrect {code:java} U&'\+01F600'{code} Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit unicode escape value in the form of "\xyzw" . When given the 6-digit form "\+xyzwrs", it parses the four chars "+xyz" as a hexadecimal, which succeeds too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s] was: The following string literal is accepted by Calcite parser, but the result value in Java is incorrect {code:java} U&'\+01F600'{code} Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit unicode escape value in the form of \xyzw . When given the 6-digit form \+xyzwrs, it parses the first four chars "+xyz" as a hexadecimal, which succeeds too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s] > misinterpreted > -- > > Key: CALCITE-2273 > URL: https://issues.apache.org/jira/browse/CALCITE-2273 > Project: Calcite > Issue Type: Bug >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > > > > The following string literal is accepted by Calcite parser, but the result > value in Java is incorrect > {code:java} > U&'\+01F600'{code} > > Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit > unicode escape value in the form of "\xyzw" . When given the 6-digit form > "\+xyzwrs", it parses the four chars "+xyz" as a hexadecimal, which succeeds > too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2273) misinterpreted
Zhong Yu created CALCITE-2273: - Summary: misinterpreted Key: CALCITE-2273 URL: https://issues.apache.org/jira/browse/CALCITE-2273 Project: Calcite Issue Type: Bug Reporter: Zhong Yu Assignee: Julian Hyde The following string literal is accepted by Calcite parser, but the result value in Java is incorrect {code:java} U&'\+01F600'{code} Cause: currently, SqlLiteral.unescapeUnicode() only intends to handle 4-digit unicode escape value in the form of \xyzw . When given the 6-digit form \+xyzwrs, it parses the first four chars "+xyz" as a hexadecimal, which succeeds too. Therefore the result value contains 4 characters: [\u0xyz, w, r, s] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2267) AbstractRelNode.getId() may produce duplicate IDs
[ https://issues.apache.org/jira/browse/CALCITE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445113#comment-16445113 ] Zhong Yu commented on CALCITE-2267: --- When there are multiple threads writing to `nextId`, Java memory model is pretty loose and allows any value to be read -- in theory. Practically there is a non-trivial chance that another thread overwrites `nextId` with a value smaller than a thread last observed. > AbstractRelNode.getId() may produce duplicate IDs > - > > Key: CALCITE-2267 > URL: https://issues.apache.org/jira/browse/CALCITE-2267 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > > RelOptNode.getId() is supposed to be unique per instance; however, the > generation of AbstractRelNode.id is not thread safe; in a multithreaded > environment, the behavior is undeterministic, and there is a non-trivial > possibility that duplicate IDs are generated for different nodes. > This ID is used for comparing nodes in several places, therefore it may > affect program stability and correctness. For example, if we set id=0 for all > nodes, Calcite's own test suite will have multiple failures. > In our own test environment, the optimizers are expected to be deterministic, > yet occasionally they mysteriously produce different plans. While we cannot > be 100% sure that it is caused by AbstractRelNode.id, we can verify that > fudging with the ID will reliably change the output of our optimizers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2267) AbstractRelNode.getId() may produce duplicate IDs
Zhong Yu created CALCITE-2267: - Summary: AbstractRelNode.getId() may produce duplicate IDs Key: CALCITE-2267 URL: https://issues.apache.org/jira/browse/CALCITE-2267 Project: Calcite Issue Type: Bug Components: core Reporter: Zhong Yu Assignee: Julian Hyde RelOptNode.getId() is supposed to be unique per instance; however, the generation of AbstractRelNode.id is not thread safe; in a multithreaded environment, the behavior is undeterministic, and there is a non-trivial possibility that duplicate IDs are generated for different nodes. This ID is used for comparing nodes in several places, therefore it may affect program stability and correctness. For example, if we set id=0 for all nodes, Calcite's own test suite will have multiple failures. In our own test environment, the optimizers are expected to be deterministic, yet occasionally they mysteriously produce different plans. While we cannot be 100% sure that it is caused by AbstractRelNode.id, we can verify that fudging with the ID will reliably change the output of our optimizers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2234) Fix java.lang.AssertionError at org.apache.calcite.sql.fun.SqlCastFunction.unparse
[ https://issues.apache.org/jira/browse/CALCITE-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423759#comment-16423759 ] Zhong Yu commented on CALCITE-2234: --- A follow up bug – in `StandardConvertletTable.expandAvg()`, the cast type is created with `new SqlDataTypeSpec(... avgType.getPrecision() ... )`. However, when avgType is DOUBLE for example, its getPrecision() returns 15, while SqlDataTypeSpec() expects none. This causes problem later when CAST is converted, SqlDataTypeSpec.deriveType() fails at {{ } else if (precision >= 0) {}} {{ assert sqlTypeName.allowsPrecNoScale();}} > Fix java.lang.AssertionError at > org.apache.calcite.sql.fun.SqlCastFunction.unparse > --- > > Key: CALCITE-2234 > URL: https://issues.apache.org/jira/browse/CALCITE-2234 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: jingzhang >Assignee: Julian Hyde >Priority: Major > Attachments: q47.sql > > > When run tpcDs query 47, 57, 63, 53, 89 sql, an AssertionError would be > thrown out. The stack is as following. > {code:java} > java.lang.AssertionError > at > org.apache.calcite.sql.fun.SqlCastFunction.unparse(SqlCastFunction.java:172) > at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) > at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) > at org.apache.calcite.sql.SqlUtil.unparseBinarySyntax(SqlUtil.java:323) > at org.apache.calcite.sql.SqlSyntax$3.unparse(SqlSyntax.java:65) > at org.apache.calcite.sql.SqlOperator.unparse(SqlOperator.java:332) > at org.apache.calcite.sql.SqlDialect.unparseCall(SqlDialect.java:332) > at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103) > at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:152) > at org.apache.calcite.sql.SqlNode.toSqlString(SqlNode.java:158) > at org.apache.calcite.sql.SqlNode.toString(SqlNode.java:125) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.calcite.sql2rel.ReflectiveConvertletTable$1.convertCall(ReflectiveConvertletTable.java:90) > > at > org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63) > > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4684) > > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4020) > > at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:138) > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4586) > > at > org.apache.calcite.sql2rel.StandardConvertletTable$AvgVarianceConvertlet.convertCall(StandardConvertletTable.java:1190) > at > org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1891) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.access$1800(SqlToRelConverter.java:215) > > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4579) > > at > org.apache.calcite.sql2rel.StandardConvertletTable$12.convertCall(StandardConvertletTable.java:267) > > at > org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:63) > > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4684) > > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4020) > > at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:138) > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4586) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2862) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2710) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:662) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:624) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3084) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2073) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1965) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2288) > > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1999) > > at >
[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413095#comment-16413095 ] Zhong Yu commented on CALCITE-2202: --- Sorry about the delay, but I've finished with a better proof (I hope) - [http://zhong-j-yu.github.io/aggregate-join-pushdown-3.pdf] - which extends pushdown to all join types and join conditions. Forget about this Jira issue, which is probably not important, and we may close it. On more important points -- Pushdown works on any theta condition. The focus on equality condition is a distraction; equality across columns is very complicated and does not help to simplify reasoning. I do not invoke equality of values anywhere in my proof; I only invoke not-distinct-ness of values in the same type. Pushdown works on any theta condition because we keep key columns, therefore rows that associate with each other before still associate after, regardless of matching conditions. Pushdown should also work on semi join and outer join, quite intuitively. However, in outer join, we need a property that aggregate on null-row returns null-row, which seems invalid for count(). But, we can tweak split functions to make it valid. For example, we can treat a null result from count() as 1 (it can be null if it comes from the right-side in a left-outer join). The generic trick is simply to permute the result of aggregate. group by() on empty table does cause problems; the proof reveals exactly where. Example, "select count() from L cross join R group by R.g"; it ought to produce zero rows if L is empty; it's a bug to transform it to contain "select count() from L group by()". There's a workaround; or, we may simply skip over such pathological joins. My proof is quite lengthy, which I'm not proud of. I wish I could reason about it in plain language, but the problem itself seems quite expansive, requiring very careful handling to cover all cases. I tried my best to structure it nicely. If you could, please find some unlucky guy to review my proof. Thanks! > Aggregate Join Push-down on a Single Side > - > > Key: CALCITE-2202 > URL: https://issues.apache.org/jira/browse/CALCITE-2202 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: next >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: next > > > While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's > apparent that aggregation can be pushed on on a single side (either side), > and leave the other side non-aggregated, regardless of whether grouping > columns are unique on the other side. My analysis – > [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . > This may be useful when the metadata is insufficient; in any case, we may try > to provide all 3 possible transformations (aggregate on left only; right > only; both sides) to the cost based optimizer, so that the cheapest one can > be chosen based on stats. > Does this make any sense, anybody? If it sounds good, I'll implement it and > offer a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289 ] Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:29 AM: --- Everything is moot if I can not prove my formula. But suppose it is correct – I do think that COVAR_POP can be pushed down; it can be calculate from SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through table union and cross product, therefore can be pushed down over join. Producing more candidate plans may be bad for CBO; but the extra rule (i.e. singled sided) can be opt-in in some cases where metadata is missing, and stats shows that group columns are unique or nearly unique. was (Author: zhong.j.yu): Everything is moot if I can not prove my formula. But suppose it is correct – I do think that COVAR_POP can be pushed down; it can be calculate from SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through table union and cross product, therefore can be pushed down over join. Producing more candidate plans may be bad for CBO; but the extra rule (i.e. singled sided) can be opted in some cases where metadata is missing, or group columns are nearly unique. > Aggregate Join Push-down on a Single Side > - > > Key: CALCITE-2202 > URL: https://issues.apache.org/jira/browse/CALCITE-2202 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: next >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: next > > > While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's > apparent that aggregation can be pushed on on a single side (either side), > and leave the other side non-aggregated, regardless of whether grouping > columns are unique on the other side. My analysis – > [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . > This may be useful when the metadata is insufficient; in any case, we may try > to provide all 3 possible transformations (aggregate on left only; right > only; both sides) to the cost based optimizer, so that the cheapest one can > be chosen based on stats. > Does this make any sense, anybody? If it sounds good, I'll implement it and > offer a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392289#comment-16392289 ] Zhong Yu edited comment on CALCITE-2202 at 3/9/18 2:27 AM: --- Everything is moot if I can not prove my formula. But suppose it is correct – I do think that COVAR_POP can be pushed down; it can be calculate from SUM(x*y), SUM( x ), SUM( y ), COUNT(x,y), all of which can be split through table union and cross product, therefore can be pushed down over join. Producing more candidate plans may be bad for CBO; but the extra rule (i.e. singled sided) can be opted in some cases where metadata is missing, or group columns are nearly unique. was (Author: zhong.j.yu): Everything is moot if I can not prove my formula. But suppose it is correct -- I do think that COVAR_POP can be pushed down; it can be calculate from SUM(x*y), SUM(x), SUM(y), COUNT(x,y), all of which can be split through table union and cross product, therefore can be pushed down over join. Producing more candidate plans may be bad for CBO; but the extra rule (i.e. singled sided) can be opted in some cases where metadata is missing, or group columns are nearly unique. > Aggregate Join Push-down on a Single Side > - > > Key: CALCITE-2202 > URL: https://issues.apache.org/jira/browse/CALCITE-2202 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: next >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: next > > > While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's > apparent that aggregation can be pushed on on a single side (either side), > and leave the other side non-aggregated, regardless of whether grouping > columns are unique on the other side. My analysis – > [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . > This may be useful when the metadata is insufficient; in any case, we may try > to provide all 3 possible transformations (aggregate on left only; right > only; both sides) to the cost based optimizer, so that the cheapest one can > be chosen based on stats. > Does this make any sense, anybody? If it sounds good, I'll implement it and > offer a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side
[ https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391279#comment-16391279 ] Zhong Yu commented on CALCITE-2202: --- Thanks Julian. Your last example actually works in my favor. My argument is that the original query is equivalent to the following two, where aggregate is pushed down on only one side {code:java} select sum(e.s * d.c) from (select deptno, (sal)as s from emp) as e join (select deptno, count(*) as c from dept group by deptno) as d on e.deptno = d.deptno group by e.deptno select sum(e.s * d.c) from (select deptno, sum(sal) as s from emp group by deptno) as e join (select deptno, (1) as c from dept) as d on e.deptno = d.deptno group by e.deptno{code} The "cross-multiplier" effect is still there because the join multiplies the side that doesn't do aggregate. So, I'm probably on the right track. However the paper I wrote is full of holes, obviously done by an amateur:) Please ignore it. I'll try to write up a new one in the weekend. Your reminder of vacuous cases, and null handling (different in group-by and equijoin) are all good points. And I'll focus on a narrower proof that works only on inner equijoin. > Aggregate Join Push-down on a Single Side > - > > Key: CALCITE-2202 > URL: https://issues.apache.org/jira/browse/CALCITE-2202 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: next >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: next > > > While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's > apparent that aggregation can be pushed on on a single side (either side), > and leave the other side non-aggregated, regardless of whether grouping > columns are unique on the other side. My analysis – > [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . > This may be useful when the metadata is insufficient; in any case, we may try > to provide all 3 possible transformations (aggregate on left only; right > only; both sides) to the cost based optimizer, so that the cheapest one can > be chosen based on stats. > Does this make any sense, anybody? If it sounds good, I'll implement it and > offer a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2202) Aggregate Join Push-down on a Single Side
Zhong Yu created CALCITE-2202: - Summary: Aggregate Join Push-down on a Single Side Key: CALCITE-2202 URL: https://issues.apache.org/jira/browse/CALCITE-2202 Project: Calcite Issue Type: Improvement Components: core Affects Versions: next Reporter: Zhong Yu Assignee: Julian Hyde Fix For: next While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's apparent that aggregation can be pushed on on a single side (either side), and leave the other side non-aggregated, regardless of whether grouping columns are unique on the other side. My analysis – [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . This may be useful when the metadata is insufficient; in any case, we may try to provide all 3 possible transformations (aggregate on left only; right only; both sides) to the cost based optimizer, so that the cheapest one can be chosen based on stats. Does this make any sense, anybody? If it sounds good, I'll implement it and offer a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385284#comment-16385284 ] Zhong Yu commented on CALCITE-2195: --- Thanks Julian. To make sure I understand the problem correctly, I made some formal analysis: [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] . It turns out, as the code comments, we can do singleton() on a side regardless of whether the grouping columns are unique. I'll add another issue addressing this possibility. > AggregateJoinTransposeRule fails to aggregate over unique column > > > Key: CALCITE-2195 > URL: https://issues.apache.org/jira/browse/CALCITE-2195 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.15.0 >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: 1.16.0 > > > The following query, in which "A.sal" is unique, > {code:java} > select sum(A.sal) > from (select distinct sal from sales.emp) as A > join sales.emp as B on A.sal=B.sal > {code} > causes AggregateJoinTransposeRule to fail with message > {code:java} > java.lang.AssertionError: type mismatch: > aggCall type: > INTEGER > inferred type: > BIGINT > {code} > Apparently, this is a bug in the rule when `unique` is true on the A side, in > which case the rule does not aggregate on the A side, `leftSubTotal==null`, > causing `splitter.topSplit()` to only sum over `count()` coming from the B > side. > A solution would be to introduce `splitter.singleton()` on the A side, so > that it can be fed to topSplit() to be multiplied by the count. > In the case that the `unique` side does not contain the column of an agg > call, it seems that we should do `other_singleton()` on this side, and feed > it to topSplit(). However, realize that the `other()` expression is > necessarily a `count()`, or a scalar function of `count()`, because it does > not depend on any column values. In the same way, the proposed > `other_singleton()` necessarily returns 1, or some constant value. > `topSplit()` would not have any need of that constant value.Therefore in this > case, we don't need a split on this side, just leave its subtotal as null. > > I'm working on a pull-request based on these understandings. BTW, is there a > reference to the algorithm used in the code? I can only find some old papers > that don't exactly cover the logic of the code. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378701#comment-16378701 ] Zhong Yu commented on CALCITE-2195: --- pull-request: https://github.com/apache/calcite/pull/637 > AggregateJoinTransposeRule fails to aggregate over unique column > > > Key: CALCITE-2195 > URL: https://issues.apache.org/jira/browse/CALCITE-2195 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.15.0 >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: 1.15.0 > > > The following query, in which "A.sal" is unique, > {code:java} > select sum(A.sal) > from (select distinct sal from sales.emp) as A > join sales.emp as B on A.sal=B.sal > {code} > causes AggregateJoinTransposeRule to fail with message > {code:java} > java.lang.AssertionError: type mismatch: > aggCall type: > INTEGER > inferred type: > BIGINT > {code} > Apparently, this is a bug in the rule when `unique` is true on the A side, in > which case the rule does not aggregate on the A side, `leftSubTotal==null`, > causing `splitter.topSplit()` to only sum over `count()` coming from the B > side. > A solution would be to introduce `splitter.singleton()` on the A side, so > that it can be fed to topSplit() to be multiplied by the count. > In the case that the `unique` side does not contain the column of an agg > call, it seems that we should do `other_singleton()` on this side, and feed > it to topSplit(). However, realize that the `other()` expression is > necessarily a `count()`, or a scalar function of `count()`, because it does > not depend on any column values. In the same way, the proposed > `other_singleton()` necessarily returns 1, or some constant value. > `topSplit()` would not have any need of that constant value.Therefore in this > case, we don't need a split on this side, just leave its subtotal as null. > > I'm working on a pull-request based on these understandings. BTW, is there a > reference to the algorithm used in the code? I can only find some old papers > that don't exactly cover the logic of the code. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yu updated CALCITE-2195: -- Description: The following query, in which "A.sal" is unique, {code:java} select sum(A.sal) from (select distinct sal from sales.emp) as A join sales.emp as B on A.sal=B.sal {code} causes AggregateJoinTransposeRule to fail with message {code:java} java.lang.AssertionError: type mismatch: aggCall type: INTEGER inferred type: BIGINT {code} Apparently, this is a bug in the rule when `unique` is true on the A side, in which case the rule does not aggregate on the A side, `leftSubTotal==null`, causing `splitter.topSplit()` to only sum over `count()` coming from the B side. A solution would be to introduce `splitter.singleton()` on the A side, so that it can be fed to topSplit() to be multiplied by the count. In the case that the `unique` side does not contain the column of an agg call, it seems that we should do `other_singleton()` on this side, and feed it to topSplit(). However, realize that the `other()` expression is necessarily a `count()`, or a scalar function of `count()`, because it does not depend on any column values. In the same way, the proposed `other_singleton()` necessarily returns 1, or some constant value. `topSplit()` would not have any need of that constant value.Therefore in this case, we don't need a split on this side, just leave its subtotal as null. I'm working on a pull-request based on these understandings. BTW, is there a reference to the algorithm used in the code? I can only find some old papers that don't exactly cover the logic of the code. Thanks. was: The following query, in which "A.sal" is unique, {code:java} select sum(A.sal) from (select distinct sal from sales.emp) as A join sales.emp as B on A.sal=B.sal {code} causes AggregateJoinTransposeRule to fail with message {code:java} java.lang.AssertionError: type mismatch: aggCall type: INTEGER inferred type: BIGINT {code} Apparently, this is a bug in the rule when `unique` is true on the A side, in which case the rule does not aggregate on the A side, `leftSubTotal==null`, causing `splitter.topSplit()` to only sum over `count(*)` coming from the B side. A solution would be to introduce `splitter.singleton()` on the A side, so that it can be fed to topSplit() to be multiplied by the count. In the case that the `unique` side does not contain the column of an agg call, it seems that we should do `other_singleton()` on this side, and feed it to topSplit(). However, realize that the `other()` expression is necessarily a `count(*)`, or a scalar function of `count(*)`, because it does not depend on any column values. In the same way, the proposed `other_singleton()` necessarily returns 1, or some constant value. `topSplit()` would not have any need of that constant value.Therefore in this case, we don't need a split on this side, just leave its subtotal as null. I'm working on a pull-request based on these understandings. BTW, is there a reference to the algorithm used in the code? I can only find some old papers that don't exactly cover the logic of the code. Thanks. > AggregateJoinTransposeRule fails to aggregate over unique column > > > Key: CALCITE-2195 > URL: https://issues.apache.org/jira/browse/CALCITE-2195 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.15.0 >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: 1.15.0 > > > The following query, in which "A.sal" is unique, > {code:java} > select sum(A.sal) > from (select distinct sal from sales.emp) as A > join sales.emp as B on A.sal=B.sal > {code} > causes AggregateJoinTransposeRule to fail with message > {code:java} > java.lang.AssertionError: type mismatch: > aggCall type: > INTEGER > inferred type: > BIGINT > {code} > Apparently, this is a bug in the rule when `unique` is true on the A side, in > which case the rule does not aggregate on the A side, `leftSubTotal==null`, > causing `splitter.topSplit()` to only sum over `count()` coming from the B > side. > A solution would be to introduce `splitter.singleton()` on the A side, so > that it can be fed to topSplit() to be multiplied by the count. > In the case that the `unique` side does not contain the column of an agg > call, it seems that we should do `other_singleton()` on this side, and feed > it to topSplit(). However, realize that the `other()` expression is > necessarily a `count()`, or a scalar function of `count()`, because it does > not depend on any column values. In the same way, the proposed > `other_singleton()` necessarily returns 1, or some constant value. > `topSplit()` would not have any need of that constant value.Therefore in this >
[jira] [Updated] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
[ https://issues.apache.org/jira/browse/CALCITE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong Yu updated CALCITE-2195: -- Fix Version/s: (was: 1.16.0) 1.15.0 > AggregateJoinTransposeRule fails to aggregate over unique column > > > Key: CALCITE-2195 > URL: https://issues.apache.org/jira/browse/CALCITE-2195 > Project: Calcite > Issue Type: Bug > Components: core >Affects Versions: 1.15.0 >Reporter: Zhong Yu >Assignee: Julian Hyde >Priority: Major > Fix For: 1.15.0 > > > The following query, in which "A.sal" is unique, > {code:java} > select sum(A.sal) > from (select distinct sal from sales.emp) as A > join sales.emp as B on A.sal=B.sal > {code} > causes AggregateJoinTransposeRule to fail with message > {code:java} > java.lang.AssertionError: type mismatch: > aggCall type: > INTEGER > inferred type: > BIGINT > {code} > Apparently, this is a bug in the rule when `unique` is true on the A side, in > which case the rule does not aggregate on the A side, `leftSubTotal==null`, > causing `splitter.topSplit()` to only sum over `count(*)` coming from the B > side. > A solution would be to introduce `splitter.singleton()` on the A side, so > that it can be fed to topSplit() to be multiplied by the count. > In the case that the `unique` side does not contain the column of an agg > call, it seems that we should do `other_singleton()` on this side, and feed > it to topSplit(). However, realize that the `other()` expression is > necessarily a `count(*)`, or a scalar function of `count(*)`, because it does > not depend on any column values. In the same way, the proposed > `other_singleton()` necessarily returns 1, or some constant value. > `topSplit()` would not have any need of that constant value.Therefore in this > case, we don't need a split on this side, just leave its subtotal as null. > > I'm working on a pull-request based on these understandings. BTW, is there a > reference to the algorithm used in the code? I can only find some old papers > that don't exactly cover the logic of the code. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column
Zhong Yu created CALCITE-2195: - Summary: AggregateJoinTransposeRule fails to aggregate over unique column Key: CALCITE-2195 URL: https://issues.apache.org/jira/browse/CALCITE-2195 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.15.0 Reporter: Zhong Yu Assignee: Julian Hyde Fix For: 1.16.0 The following query, in which "A.sal" is unique, {code:java} select sum(A.sal) from (select distinct sal from sales.emp) as A join sales.emp as B on A.sal=B.sal {code} causes AggregateJoinTransposeRule to fail with message {code:java} java.lang.AssertionError: type mismatch: aggCall type: INTEGER inferred type: BIGINT {code} Apparently, this is a bug in the rule when `unique` is true on the A side, in which case the rule does not aggregate on the A side, `leftSubTotal==null`, causing `splitter.topSplit()` to only sum over `count(*)` coming from the B side. A solution would be to introduce `splitter.singleton()` on the A side, so that it can be fed to topSplit() to be multiplied by the count. In the case that the `unique` side does not contain the column of an agg call, it seems that we should do `other_singleton()` on this side, and feed it to topSplit(). However, realize that the `other()` expression is necessarily a `count(*)`, or a scalar function of `count(*)`, because it does not depend on any column values. In the same way, the proposed `other_singleton()` necessarily returns 1, or some constant value. `topSplit()` would not have any need of that constant value.Therefore in this case, we don't need a split on this side, just leave its subtotal as null. I'm working on a pull-request based on these understandings. BTW, is there a reference to the algorithm used in the code? I can only find some old papers that don't exactly cover the logic of the code. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)