[jira] [Commented] (CALCITE-3390) ITEM expression does not get pushed to the right input of left-outer-join
[ https://issues.apache.org/jira/browse/CALCITE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946953#comment-16946953 ]

Aman Sinha commented on CALCITE-3390:
-------------------------------------

Thanks [~julianhyde] and [~jinxing6...@126.com] for your suggestions. That's pretty much along the lines of what I discussed with [~volodymyr] and have a WIP branch here [1]. I am in the process of checking whether it breaks any of our tests in Drill before creating a PR.

[1] https://github.com/amansinha100/incubator-calcite/commit/08e74a48932a8458d72cd5550c7c0ea9e677f7f5

> ITEM expression does not get pushed to the right input of left-outer-join
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-3390
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3390
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>
> In the following query, the ITEM expression above the Left Outer Join does not get pushed to the right input (null-preserving input) of the join whereas it should since ITEM does not change the nullability.
> {noformat}
> explain plan without implementation for select tt7.columns[0], tt8.columns[0] as x from tt7 left outer join tt8 on tt7.columns[0] = tt8.columns[0];
>
> DrillScreenRel
>   DrillProjectRel(EXPR$0=[$1], x=[ITEM($2, 0)])
>     DrillJoinRel(condition=[=($0, $3)], joinType=[left])
>       DrillProjectRel($f2=[ITEM($0, 0)], ITEM=[ITEM($0, 0)])
>         DrillScanRel(table=[[dfs, tmp, tt7]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/tt7, numFiles=1, columns=[`columns`[0]], files=[file:/tmp/tt7/0_0_0.csv]]])
>       DrillProjectRel(columns=[$0], $f2=[ITEM($0, 0)])
>         DrillScanRel(table=[[dfs, tmp, tt8]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/tt8, numFiles=1, columns=[`columns`, `columns`[0]], files=[file:/tmp/tt8/0_0_0.csv]]])
> {noformat}
> From what I can tell, the change in behavior occurred with CALCITE-1753; before that the ITEM was pushed on both sides of the Left Outer Join.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
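The claim in the description, that ITEM can be evaluated below the right input of a left outer join because it does not change nullability, can be checked with a small toy model. The following is only a sketch under stated assumptions: it does not use Calcite or Drill, rows are modelled as lists of strings, and every name (`ItemPushdownSketch`, `item`, `joinThenItem`, `itemThenJoin`) is hypothetical. It assumes ITEM returns NULL for a NULL input, which is what makes the NULL-padded rows agree in both plans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ItemPushdownSketch {
    // Toy ITEM (assumption): null for a null array or an out-of-range index,
    // so a NULL input never becomes a non-NULL output.
    public static String item(List<String> array, int index) {
        if (array == null || index < 0 || index >= array.size()) {
            return null;
        }
        return array.get(index);
    }

    // Plan A: left outer join first, ITEM evaluated above the join.
    public static List<String> joinThenItem(List<List<String>> left,
                                            List<List<String>> right) {
        List<String> out = new ArrayList<>();
        for (List<String> l : left) {
            String leftKey = item(l, 0);
            boolean matched = false;
            for (List<String> r : right) {
                if (leftKey != null && leftKey.equals(item(r, 0))) {
                    matched = true;
                    out.add(leftKey + "|" + item(r, 0));
                }
            }
            if (!matched) {
                List<String> paddedRightColumn = null; // NULL-padded right side
                out.add(leftKey + "|" + item(paddedRightColumn, 0));
            }
        }
        return out;
    }

    // Plan B: ITEM projected on the right input, below the join.
    public static List<String> itemThenJoin(List<List<String>> left,
                                            List<List<String>> right) {
        List<String> rightItems = new ArrayList<>();
        for (List<String> r : right) {
            rightItems.add(item(r, 0));
        }
        List<String> out = new ArrayList<>();
        for (List<String> l : left) {
            String leftKey = item(l, 0);
            boolean matched = false;
            for (String rightItem : rightItems) {
                if (leftKey != null && leftKey.equals(rightItem)) {
                    matched = true;
                    out.add(leftKey + "|" + rightItem);
                }
            }
            if (!matched) {
                String paddedItem = null; // NULL-padded projected column
                out.add(leftKey + "|" + paddedItem);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> tt7 = Arrays.asList(Arrays.asList("a"), Arrays.asList("b"));
        List<List<String>> tt8 = Arrays.asList(Arrays.asList("a", "x"));
        System.out.println(joinThenItem(tt7, tt8));
        System.out.println(itemThenJoin(tt7, tt8));
    }
}
```

Both methods produce the same rows for the sample data, including the unmatched left row whose right-side value is NULL. The sketch says nothing about how the planner rule decides what to push; it only illustrates why the two plan shapes are equivalent for a NULL-propagating expression.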
[jira] [Created] (CALCITE-3390) ITEM expression does not get pushed to the right input of left-outer-join
Aman Sinha created CALCITE-3390:
-----------------------------------

             Summary: ITEM expression does not get pushed to the right input of left-outer-join
                 Key: CALCITE-3390
                 URL: https://issues.apache.org/jira/browse/CALCITE-3390
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.21.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha

In the following query, the ITEM expression above the Left Outer Join does not get pushed to the right input (null-preserving input) of the join whereas it should since ITEM does not change the nullability.
{noformat}
explain plan without implementation for select tt7.columns[0], tt8.columns[0] as x from tt7 left outer join tt8 on tt7.columns[0] = tt8.columns[0];

DrillScreenRel
  DrillProjectRel(EXPR$0=[$1], x=[ITEM($2, 0)])
    DrillJoinRel(condition=[=($0, $3)], joinType=[left])
      DrillProjectRel($f2=[ITEM($0, 0)], ITEM=[ITEM($0, 0)])
        DrillScanRel(table=[[dfs, tmp, tt7]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/tt7, numFiles=1, columns=[`columns`[0]], files=[file:/tmp/tt7/0_0_0.csv]]])
      DrillProjectRel(columns=[$0], $f2=[ITEM($0, 0)])
        DrillScanRel(table=[[dfs, tmp, tt8]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/tt8, numFiles=1, columns=[`columns`, `columns`[0]], files=[file:/tmp/tt8/0_0_0.csv]]])
{noformat}
From what I can tell, the change in behavior occurred with https://issues.apache.org/jira/browse/CALCITE-1753 ; before that the ITEM was pushed on both sides of the Left Outer Join.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2617) FilterProjectTransposeRule should allow filter conditions with correlated variables to be pushed down
[ https://issues.apache.org/jira/browse/CALCITE-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649073#comment-16649073 ]

Aman Sinha commented on CALCITE-2617:
-------------------------------------

[~julianhyde], [~zabetak] if the tests pass with these changes (by using a separate constructor and passing in a predicate for checking correlation) I don't have an issue with it. There are 2 somewhat competing requirements. Ideally, I believe the decorrelator needs to be run again after the Filter pushdown has occurred (via FPTRule).

> FilterProjectTransposeRule should allow filter conditions with correlated variables to be pushed down
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2617
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2617
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: 1.18.0
>
>
> The rule always forbids conditions with correlated variables to be pushed down (as of [CALCITE-769|https://issues.apache.org/jira/browse/CALCITE-769] to avoid certain problems in the decorrelation of the query). However, in the general context of query optimization, it is beneficial to push-down filters and the fact that there is a correlated variable is not a reason to skip this optimization.
> In order to avoid regressions, and at the same time enable correlated conditions to be pushed down we should make the pushing of correlated variables configurable.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2191) Drop support for Guava versions earlier than 19
[ https://issues.apache.org/jira/browse/CALCITE-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375951#comment-16375951 ]

Aman Sinha commented on CALCITE-2191:
-------------------------------------

[~julianhyde] currently Drill uses Guava 18. I think the change in Calcite's Guava version should not directly impact Drill. I will check and get back if that's not the case.

> Drop support for Guava versions earlier than 19
> -----------------------------------------------
>
>                 Key: CALCITE-2191
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2191
>             Project: Calcite
>          Issue Type: Task
>            Reporter: slim bouguerra
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Currently, Calcite 1.15.0 supports Guava versions from 23 to 14.
> Calcite-1.16.0-Snapshot is building against version 19.0.1.
> As far as I know, the only reason we support versions earlier than 19 is that the Hive project depended on Guava 14.0.1. This is not true anymore after https://issues.apache.org/jira/browse/HIVE-15393.
> The Druid project is still using Guava 16.0.1, but [some work|https://groups.google.com/forum/#!topic/druid-development/Dw2Qu1CWbuQ] is under review to make sure it is not using deprecated API.
> Thus I think it is time to drop support for versions earlier than 19.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2069) RexSimplify.removeNullabilityCast() always removes cast for operand with ANY type
[ https://issues.apache.org/jira/browse/CALCITE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271764#comment-16271764 ]

Aman Sinha commented on CALCITE-2069:
-------------------------------------

[~vvysotskyi] ok, makes sense. It's good that you are adding a unit test for this. Assuming you have run Drill regression tests with your change, I am good with this. +1.

> RexSimplify.removeNullabilityCast() always removes cast for operand with ANY type
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-2069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2069
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Julian Hyde
>
> When a field is received from a dynamic table, its type is left as {{ANY}}. If it is then used in a filter condition with a cast that should actually produce a physical cast (for example, casting varchar to boolean), {{RexSimplify.removeNullabilityCast()}} removes this cast and leaves only the field in the condition.
> This test helps to observe this issue:
> {code:java}
>   @Test public void testFilterCastAny() {
>     final RelBuilder builder = RelBuilder.create(config().build());
>     final RelDataType intType =
>         builder.getTypeFactory().createSqlType(SqlTypeName.ANY);
>     RelNode root =
>         builder.scan("EMP")
>             .filter(
>                 builder.cast(
>                     builder.patternField("varchar_field", intType, 0),
>                     SqlTypeName.BOOLEAN))
>             .build();
>     assertThat(str(root),
>         is("LogicalFilter(condition=[CAST(varchar_field.$0):BOOLEAN NOT NULL])\n"
>             + "  LogicalTableScan(table=[[scott, EMP]])\n"));
>   }
> {code}
> It happens because {{SqlTypeUtil.equalSansNullability()}} returns true if any of its arguments has {{ANY}} type.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
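The mechanism described in the issue (a cast is treated as a nullability-only cast when the source and target types compare equal ignoring nullability, and `ANY` compares equal to everything) can be modelled in a few lines. This is a simplified sketch, not Calcite's code: the `Type` class, the `TypeName` enum, and both comparison methods are hypothetical, and the stricter comparison is only one possible alternative, not necessarily the fix that was adopted.

```java
public class AnyCastSketch {
    public enum TypeName { ANY, VARCHAR, BOOLEAN }

    // Hypothetical stand-in for a SQL type: a name plus a nullability flag.
    public static final class Type {
        final TypeName name;
        final boolean nullable;

        public Type(TypeName name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    // Models the behaviour reported in the issue: ANY matches every type.
    public static boolean equalSansNullabilityLenient(Type a, Type b) {
        return a.name == TypeName.ANY || b.name == TypeName.ANY || a.name == b.name;
    }

    // A stricter comparison (assumption): only identical type names match.
    public static boolean equalSansNullabilityStrict(Type a, Type b) {
        return a.name == b.name;
    }

    // A cast is considered removable when it changes nothing but nullability.
    public static boolean castLooksRemovable(Type source, Type target, boolean lenient) {
        return lenient
            ? equalSansNullabilityLenient(source, target)
            : equalSansNullabilityStrict(source, target);
    }

    public static void main(String[] args) {
        Type anyField = new Type(TypeName.ANY, true);
        Type booleanNotNull = new Type(TypeName.BOOLEAN, false);
        // Lenient comparison wrongly treats CAST(ANY AS BOOLEAN) as removable.
        System.out.println(castLooksRemovable(anyField, booleanNotNull, true));
        // Strict comparison keeps the cast.
        System.out.println(castLooksRemovable(anyField, booleanNotNull, false));
    }
}
```

With the lenient comparison the cast from `ANY` to `BOOLEAN` looks like a pure nullability change and would be dropped; with the strict one it is kept, while a genuine nullability-only cast such as nullable `VARCHAR` to non-null `VARCHAR` is removable under both.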
[jira] [Commented] (CALCITE-2069) RexSimplify.removeNullabilityCast() always removes cast for operand with ANY type
[ https://issues.apache.org/jira/browse/CALCITE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271723#comment-16271723 ]

Aman Sinha commented on CALCITE-2069:
-------------------------------------

[~vvysotskyi] I didn't fully understand the motivation in the JIRA description. Suppose I have a table with 2 columns containing the strings 'true' and 'false'. These columns will show as ANY type in Drill. If I run the following query, I still see the CAST function; it is not dropped.
{noformat}
explain plan for select b from dfs.tmp.test2 where cast(b as boolean) is false;
...
Filter(condition=[IS FALSE(CAST($0):BOOLEAN)]) : rowType = RecordType(ANY b):
...
{noformat}
(Note: I am working with the older Calcite version, so it is possible this behavior has changed.)

> RexSimplify.removeNullabilityCast() always removes cast for operand with ANY type
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-2069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2069
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Julian Hyde
>
> When a field is received from a dynamic table, its type is left as {{ANY}}. If it is then used in a filter condition with a cast that should actually produce a physical cast (for example, casting varchar to boolean), {{RexSimplify.removeNullabilityCast()}} removes this cast and leaves only the field in the condition.
> This test helps to observe this issue:
> {code:java}
>   @Test public void testFilterCastAny() {
>     final RelBuilder builder = RelBuilder.create(config().build());
>     final RelDataType intType =
>         builder.getTypeFactory().createSqlType(SqlTypeName.ANY);
>     RelNode root =
>         builder.scan("EMP")
>             .filter(
>                 builder.cast(
>                     builder.patternField("varchar_field", intType, 0),
>                     SqlTypeName.BOOLEAN))
>             .build();
>     assertThat(str(root),
>         is("LogicalFilter(condition=[CAST(varchar_field.$0):BOOLEAN NOT NULL])\n"
>             + "  LogicalTableScan(table=[[scott, EMP]])\n"));
>   }
> {code}
> It happens because {{SqlTypeUtil.equalSansNullability()}} returns true if any of its arguments has {{ANY}} type.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CALCITE-1048) Make metadata more robust
[ https://issues.apache.org/jira/browse/CALCITE-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246869#comment-16246869 ]

Aman Sinha commented on CALCITE-1048:
-------------------------------------

The comments in RelMdMaxRowCount point to this JIRA:
{noformat}
  public Double getMaxRowCount(RelSubset rel, RelMetadataQuery mq) {
    // FIXME This is a short-term fix for [CALCITE-1018]. A complete
    // solution will come with [CALCITE-1048].
    Util.discard(Bug.CALCITE_1048_FIXED);
    ...
  }
{noformat}
It seems the goal of this JIRA is much broader and I am not sure of its status. For the RelSubset's max row count, should we consider adding specific implementations similar to what was done for CALCITE-1018 (for sort with limit)? For example, if the RelSubset contains Aggregate with no group-by it will have a max rowcount of 1.

> Make metadata more robust
> -------------------------
>
>                 Key: CALCITE-1048
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1048
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> Following CALCITE-794, make metadata more robust and performant, so we can safely derive metadata from a large RelNode graph.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
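The suggestion in the comment above (derive a RelSubset's max row count from specific operators, such as a sort with limit or an aggregate with no group-by) could look roughly like the toy model below. This is only a sketch under stated assumptions: it does not use Calcite's metadata API, the `Rel`, `Scan`, `Limit`, and `Aggregate` types are hypothetical stand-ins, and taking the minimum bound over the members of a subset is an assumption that relies on all members being equivalent expressions.

```java
import java.util.Arrays;
import java.util.List;

public class MaxRowCountSketch {
    // Hypothetical stand-in for a relational operator.
    public interface Rel {
        double maxRowCount();
    }

    // A scan has no known upper bound.
    public static final class Scan implements Rel {
        public double maxRowCount() {
            return Double.POSITIVE_INFINITY;
        }
    }

    // Sort with limit: bounded by the fetch count and by its input.
    public static final class Limit implements Rel {
        final Rel input;
        final long fetch;

        public Limit(Rel input, long fetch) {
            this.input = input;
            this.fetch = fetch;
        }

        public double maxRowCount() {
            return Math.min(fetch, input.maxRowCount());
        }
    }

    // Aggregate: with no group keys it returns at most one row.
    public static final class Aggregate implements Rel {
        final Rel input;
        final int groupKeyCount;

        public Aggregate(Rel input, int groupKeyCount) {
            this.input = input;
            this.groupKeyCount = groupKeyCount;
        }

        public double maxRowCount() {
            return groupKeyCount == 0 ? 1d : input.maxRowCount();
        }
    }

    // Assumption: every member of a subset is equivalent, so the tightest
    // bound found on any member is valid for the whole subset.
    public static double subsetMaxRowCount(List<Rel> equivalentRels) {
        double best = Double.POSITIVE_INFINITY;
        for (Rel rel : equivalentRels) {
            best = Math.min(best, rel.maxRowCount());
        }
        return best;
    }

    public static void main(String[] args) {
        List<Rel> subset = Arrays.<Rel>asList(new Aggregate(new Scan(), 0));
        System.out.println(subsetMaxRowCount(subset));
    }
}
```

An aggregate with group keys falls back to its input's bound here, which is deliberately conservative.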
[jira] [Commented] (CALCITE-1503) Infinite loop occurs during query planning
[ https://issues.apache.org/jira/browse/CALCITE-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687620#comment-15687620 ]

Aman Sinha commented on CALCITE-1503:
-------------------------------------

[~migueltaoliveira] can you attach the jstack output? It will help narrow down the issue.

> Infinite loop occurs during query planning
> ------------------------------------------
>
>                 Key: CALCITE-1503
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1503
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.10.0
>            Reporter: Miguel Oliveira
>            Assignee: Julian Hyde
>
> The following query:
> {code}
> SELECT count(*) FROM (
>   SELECT count(v1.`region_id`) `Count Region`, v6.`fullname` `Customer (Name)`
>   FROM `foodmart`.`region` v1
>   JOIN `foodmart`.`store` v3 ON v1.`region_id` = v3.`region_id`
>   JOIN `foodmart`.`customer` v6 ON v1.`region_id` = v6.`customer_region_id`
>   JOIN `foodmart`.`sales_fact_1998` v15 ON v3.`store_id` = v15.`store_id` AND v6.`customer_id` = v15.`customer_id`
>   WHERE v3.`store_name` LIKE '%Grocery%'
>   GROUP BY v6.`customer_region_id`,v6.`fullname`) a
> {code}
> causes an infinite loop during query planning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CALCITE-872) Add support for aborting the query optimization process
[ https://issues.apache.org/jira/browse/CALCITE-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359486#comment-15359486 ]

Aman Sinha commented on CALCITE-872:
------------------------------------

[~julianhyde] the proposal for the cancelFlag sounds reasonable to me. It wasn't obvious to me why CALCITE-1227 (streaming CSV reader) would be related, but I see that it adds functionality for cancel there. I would have thought some additional change would be needed in the VolcanoPlanner and HepPlanner to check for the cancel flag to interrupt planning for the 3 things mentioned in this JIRA description. Any thoughts?

> Add support for aborting the query optimization process
> -------------------------------------------------------
>
>                 Key: CALCITE-872
>                 URL: https://issues.apache.org/jira/browse/CALCITE-872
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.4.0-incubating
>            Reporter: Aman Sinha
>            Assignee: Julian Hyde
>
> We should have the facility to abort the query optimization process. There are several motivations for having this:
> 1. The optimizer's join planning may take too long (order of minutes) when working with larger number of tables.
> 2. Certain sequence of rule applications may cause a cycle.
> 3. Operations related to metadata could potentially introduce a dependency cycle.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
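The idea raised in the comment above, that the planner's main loop would need to check a cancel flag between rule firings, can be sketched as follows. This is a minimal illustration and not VolcanoPlanner or HepPlanner code: it uses a plain `AtomicBoolean` as a stand-in for the cancel flag, models each rule firing as a `Runnable`, and the names `CancellablePlannerSketch`, `fireRules`, and `PlanningCancelledException` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellablePlannerSketch {
    // Hypothetical exception carrying how far planning got before the abort.
    public static final class PlanningCancelledException extends RuntimeException {
        public final int rulesFired;

        public PlanningCancelledException(int rulesFired) {
            super("planning cancelled after " + rulesFired + " rule firings");
            this.rulesFired = rulesFired;
        }
    }

    // Fires each rule in turn, checking the flag before every firing so that a
    // long join search or a rule cycle can be interrupted between steps.
    public static int fireRules(List<Runnable> ruleFirings, AtomicBoolean cancelFlag) {
        int fired = 0;
        for (Runnable firing : ruleFirings) {
            if (cancelFlag.get()) {
                throw new PlanningCancelledException(fired);
            }
            firing.run();
            fired++;
        }
        return fired;
    }

    // Demo: the second "rule" requests cancellation, so the third never runs.
    public static int demo() {
        AtomicBoolean cancelFlag = new AtomicBoolean(false);
        List<Runnable> firings = new ArrayList<>();
        firings.add(() -> { });
        firings.add(() -> cancelFlag.set(true));
        firings.add(() -> { });
        try {
            return fireRules(firings, cancelFlag);
        } catch (PlanningCancelledException e) {
            return e.rulesFired;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In practice the flag would be set from another thread (for example by a timeout or a user cancel request); the check-per-iteration shape stays the same. A check at this granularity cannot interrupt a single rule or metadata call that itself never returns, so the third motivation in the description may need checks inside the metadata code as well.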
[jira] [Commented] (CALCITE-1288) Avoid doing the same join twice if count(distinct) exists
[ https://issues.apache.org/jira/browse/CALCITE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325730#comment-15325730 ]

Aman Sinha commented on CALCITE-1288:
-------------------------------------

[~julianhyde], for systems that don't support Grouping Sets (which is the enhancement you implemented in CALCITE-732), it would be useful to have this rewrite. Drill currently does not have GS but I would imagine some other systems may also benefit, even though this rewrite is specific to a single agg(distinct) combined with other non-distinct aggregates. What do you think?

> Avoid doing the same join twice if count(distinct) exists
> ---------------------------------------------------------
>
>                 Key: CALCITE-1288
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1288
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Gautam Kumar Parai
>            Assignee: Gautam Kumar Parai
>
> When the query has one distinct aggregate and one or more non-distinct aggregates, the planner does not need to produce the join-based plan. We can generate multi-phase aggregates instead.
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno)
> from sales.emp emp inner join sales.dept dept
> on emp.deptno = dept.deptno
> group by emp.empno
>
> LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
>     LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
>       LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>         LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>           LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>             LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> The more efficient form should look like
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno)
> from sales.emp emp inner join sales.dept dept
> on emp.deptno = dept.deptno
> group by emp.empno
>
> LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
>     LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>       LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
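The equivalence behind the proposed multi-phase plan can be checked on a small in-memory data set. The sketch below is not Calcite code; it assumes the joined input has already been reduced to (empno, deptno) pairs, uses doubles for AVG for simplicity, and all names (`TwoPhaseDistinctSketch`, `singlePhase`, `twoPhase`, `sameResults`) are hypothetical. Phase 1 groups by (empno, deptno) and counts rows, which removes duplicates for the distinct aggregate; phase 2 groups by empno, sums the partial counts to recover COUNT(*), and averages the now-distinct deptno values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TwoPhaseDistinctSketch {
    // Reference: per empno, {COUNT(*), AVG(DISTINCT deptno)} computed directly.
    public static Map<Integer, double[]> singlePhase(int[][] rows) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Map<Integer, Set<Integer>> distinct = new TreeMap<>();
        for (int[] row : rows) {
            counts.merge(row[0], 1, Integer::sum);
            distinct.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
        }
        Map<Integer, double[]> result = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            long sum = 0;
            Set<Integer> values = distinct.get(e.getKey());
            for (int v : values) {
                sum += v;
            }
            result.put(e.getKey(), new double[] {e.getValue(), (double) sum / values.size()});
        }
        return result;
    }

    // Two-phase form: GROUP BY (empno, deptno) with COUNT(*), then
    // GROUP BY empno with SUM(cnt) and AVG(deptno).
    public static Map<Integer, double[]> twoPhase(int[][] rows) {
        Map<List<Integer>, Integer> phase1 = new LinkedHashMap<>();
        for (int[] row : rows) {
            phase1.merge(Arrays.asList(row[0], row[1]), 1, Integer::sum);
        }
        Map<Integer, long[]> acc = new TreeMap<>(); // {sum of counts, sum of deptno, n}
        for (Map.Entry<List<Integer>, Integer> e : phase1.entrySet()) {
            long[] a = acc.computeIfAbsent(e.getKey().get(0), k -> new long[3]);
            a[0] += e.getValue();
            a[1] += e.getKey().get(1);
            a[2] += 1;
        }
        Map<Integer, double[]> result = new TreeMap<>();
        for (Map.Entry<Integer, long[]> e : acc.entrySet()) {
            long[] a = e.getValue();
            result.put(e.getKey(), new double[] {a[0], (double) a[1] / a[2]});
        }
        return result;
    }

    public static boolean sameResults(Map<Integer, double[]> x, Map<Integer, double[]> y) {
        if (!x.keySet().equals(y.keySet())) {
            return false;
        }
        for (Integer key : x.keySet()) {
            double[] a = x.get(key);
            double[] b = y.get(key);
            if (Math.abs(a[0] - b[0]) > 1e-9 || Math.abs(a[1] - b[1]) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 10}, {1, 10}, {1, 20}, {2, 30}};
        System.out.println(sameResults(singlePhase(rows), twoPhase(rows)));
    }
}
```

The rewrite works for COUNT(*) because partial counts can be summed; other non-distinct aggregates would need their own split into a partial and a final function, which this sketch does not cover.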
[jira] [Commented] (CALCITE-777) IS NOT NULL filter is incorrectly dropped for aggregates and window functions
[ https://issues.apache.org/jira/browse/CALCITE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114823#comment-15114823 ]

Aman Sinha commented on CALCITE-777:
------------------------------------

I'll assign to myself for further investigation.

> IS NOT NULL filter is incorrectly dropped for aggregates and window functions
> -----------------------------------------------------------------------------
>
>                 Key: CALCITE-777
>                 URL: https://issues.apache.org/jira/browse/CALCITE-777
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.3.0-incubating
>            Reporter: Aman Sinha
>            Assignee: Julian Hyde
>
> The plans below show that the IS NOT NULL filter is incorrectly dropped.
> {code}
> select wsum from (select sum(sal) over (partition by deptno) as wsum from emp) where wsum is not null;
>
> LogicalProject(WSUM=[$0])
>   LogicalProject(WSUM=[SUM($5) OVER (PARTITION BY $7 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> {code}
> select wsum from (select sum(sal) as wsum from emp group by deptno) where wsum is not null;
>
> LogicalProject(WSUM=[$0])
>   LogicalProject(WSUM=[$1])
>     LogicalAggregate(group=[{0}], WSUM=[SUM($1)])
>       LogicalProject(DEPTNO=[$7], SAL=[$5])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
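Why dropping the filter changes results can be seen from SQL's SUM semantics: SUM ignores NULL inputs and yields NULL when a group or partition has no non-NULL input, so `wsum is not null` can reject rows. The following sketch models only that semantics in plain Java; it is not Calcite code, and `NullableSumSketch` and `sqlSum` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class NullableSumSketch {
    // SQL-style SUM: skips NULL inputs and returns NULL (Java null) when the
    // group contains no non-NULL value.
    public static Integer sqlSum(List<Integer> values) {
        Integer sum = null;
        for (Integer v : values) {
            if (v != null) {
                sum = (sum == null) ? v : sum + v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // A department whose salaries are all NULL produces a NULL sum,
        // which the IS NOT NULL filter is supposed to remove.
        System.out.println(sqlSum(Arrays.<Integer>asList(null, null)));
        System.out.println(sqlSum(Arrays.asList(100, null, 200)));
    }
}
```

So the filter is removable only when the planner can prove the summed column is never NULL within every group, which does not hold for a nullable `sal` column.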
[jira] [Assigned] (CALCITE-777) IS NOT NULL filter is incorrectly dropped for aggregates and window functions
[ https://issues.apache.org/jira/browse/CALCITE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha reassigned CALCITE-777:
----------------------------------

    Assignee: Aman Sinha  (was: Julian Hyde)

> IS NOT NULL filter is incorrectly dropped for aggregates and window functions
> -----------------------------------------------------------------------------
>
>                 Key: CALCITE-777
>                 URL: https://issues.apache.org/jira/browse/CALCITE-777
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.3.0-incubating
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> The plans below show that the IS NOT NULL filter is incorrectly dropped.
> {code}
> select wsum from (select sum(sal) over (partition by deptno) as wsum from emp) where wsum is not null;
>
> LogicalProject(WSUM=[$0])
>   LogicalProject(WSUM=[SUM($5) OVER (PARTITION BY $7 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> {code}
> select wsum from (select sum(sal) as wsum from emp group by deptno) where wsum is not null;
>
> LogicalProject(WSUM=[$0])
>   LogicalProject(WSUM=[$1])
>     LogicalAggregate(group=[{0}], WSUM=[SUM($1)])
>       LogicalProject(DEPTNO=[$7], SAL=[$5])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)