[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492558#comment-16492558 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-392500411

@vvysotskyi The problems I mentioned above do not appear on master. I will close this PR.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
> ---
>
>                 Key: DRILL-5913
>                 URL: https://issues.apache.org/jira/browse/DRILL-5913
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0, 1.11.0
>            Reporter: weijie.tong
>            Priority: Major
>
> sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as int)) as col2 from cp.`employee.json`
> {code}
> error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
>     org.apache.drill.exec.work.foreman.Foreman.run():294
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
>     org.apache.calcite.util.Util.newInternal():792
>     org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
>     org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that, on the first match of DrillReduceAggregatesRule, stddev_samp(cast(employee_id as int)) is reduced to sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to sum0($0).
> The second match reduces stddev_samp's sum($0) to sum0($0) as well. But this sum0($0) has a different data type from the first sum0($0): one is integer, the other is bigint.
> Calcite's addAggCall method treats them as the same call because it ignores their data type. As a result, the bigint sum0($0) is replaced by the integer sum0($0).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
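The reuse problem described in the issue can be illustrated with a toy cache. This is only a sketch under stated assumptions: `ToyAggCall` and this `addAggCall` are hypothetical stand-ins written for illustration, not Calcite's or Drill's real classes, and the equality rule simply mirrors what the issue text reports (function name and input ref are compared, the data type is not).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy model of an aggregate-call cache; hypothetical, not Calcite's real API. */
class AggCacheSketch {

  /** Hypothetical aggregate call: function name, input ref, and result type. */
  static final class ToyAggCall {
    final String function; // e.g. "$SUM0"
    final int inputRef;    // e.g. 0 for $0
    final String type;     // e.g. "INTEGER" or "BIGINT"

    ToyAggCall(String function, int inputRef, String type) {
      this.function = function;
      this.inputRef = inputRef;
      this.type = type;
    }

    // Equality deliberately ignores 'type', as the issue describes.
    @Override public boolean equals(Object o) {
      if (!(o instanceof ToyAggCall)) {
        return false;
      }
      ToyAggCall other = (ToyAggCall) o;
      return function.equals(other.function) && inputRef == other.inputRef;
    }

    @Override public int hashCode() {
      return Objects.hash(function, inputRef);
    }
  }

  /** Returns the output position of the call, reusing an earlier equal call if present. */
  static int addAggCall(ToyAggCall call, List<ToyAggCall> calls,
      Map<ToyAggCall, Integer> mapping) {
    Integer existing = mapping.get(call);
    if (existing != null) {
      return existing; // cache hit: the requested type is never checked
    }
    int index = calls.size();
    calls.add(call);
    mapping.put(call, index);
    return index;
  }

  public static void main(String[] args) {
    List<ToyAggCall> calls = new ArrayList<>();
    Map<ToyAggCall, Integer> mapping = new HashMap<>();
    int first = addAggCall(new ToyAggCall("$SUM0", 0, "INTEGER"), calls, mapping);
    int second = addAggCall(new ToyAggCall("$SUM0", 0, "BIGINT"), calls, mapping);
    System.out.println(first == second);        // both requests share one slot
    System.out.println(calls.get(second).type); // the slot keeps the first type
  }
}
```

In this toy model the BIGINT request is silently answered with the INTEGER entry, which is the same shape of mismatch as the `$f3` column in the two row types above.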
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481727#comment-16481727 ]

ASF GitHub Bot commented on DRILL-5913:
---

vvysotskyi commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-390423923

@weijietong, yes, they should never happen. I think the bug is that `sum0` was used instead of the `sum` aggregate function, or vice versa. `sum0` has a non-nullable return type, but the return type of `sum` is nullable. Could you please review `DrillReduceAggregatesRule` and check for this problem?
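The nullability difference between `sum` and `sum0` mentioned in this comment can be sketched in plain Java. This is a minimal model, assuming the usual SQL semantics (SUM over no non-null rows yields NULL, while `$SUM0` yields 0); the class and method names are hypothetical and are not part of Drill or Calcite.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;

/** Toy model of SUM versus $SUM0 semantics; names are hypothetical. */
class SumVsSum0Sketch {

  /** SUM: empty result (standing in for SQL NULL) when there is no non-null input. */
  static OptionalLong sum(List<Long> values) {
    boolean seen = false;
    long total = 0L;
    for (Long v : values) {
      if (v != null) { // SQL aggregates skip NULL inputs
        seen = true;
        total += v;
      }
    }
    return seen ? OptionalLong.of(total) : OptionalLong.empty();
  }

  /** $SUM0: like SUM, but returns 0 instead of NULL, so the result is never null. */
  static long sum0(List<Long> values) {
    return sum(values).orElse(0L);
  }

  public static void main(String[] args) {
    List<Long> empty = Collections.emptyList();
    System.out.println(sum(empty).isPresent()); // SUM over no rows is "NULL"
    System.out.println(sum0(empty));            // $SUM0 over no rows is 0
    System.out.println(sum0(Arrays.asList(1L, null, 2L)));
  }
}
```

Because one function can produce NULL and the other cannot, swapping them changes the nullability of the output column, which is the kind of `BIGINT` versus `BIGINT NOT NULL` difference a row-type check would flag.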
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468197#comment-16468197 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387589324

@vvysotskyi are you sure that two aggregate calls with the same name and the same input ref but different data types should never happen here?
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468168#comment-16468168 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387584681

I have looked at the return type inference code, and the return type is right. With the current implementation, if there are two agg calls with the same inputs but genuinely different data types, it will definitely choose the wrong agg call for reuse and cause potential errors.
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467536#comment-16467536 ]

ASF GitHub Bot commented on DRILL-5913:
---

vvysotskyi commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387434799

@weijietong, this error may have a different root cause. Drill has its own rules for determining the return type of aggregate functions, and in most cases they differ from Calcite's rules. I suppose Calcite's rules were used in some places where Drill's rules should have been used.

Regarding creating new calls for the same agg call with the same inputs: I suppose it will be ineffective to create new calls only because the return types differ. The problem is that the return type was chosen incorrectly.
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467360#comment-16467360 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387387738

@vvysotskyi I have tested the SQL from the Jira issue on current master and it passed. But another new test case:
```
@Test
public void testDRILL5913_t() throws Exception {
  test("select n_nationkey, stddev((case when ( bigint_col ) >0 then 1 else 0 end)) * 1.0 as col1, avg((case when ( bigint_col) >0 then 1 else 0 end)) * 1.0 as col2 from "
      + "( select n_name,n_nationkey, sum( n_regionkey) as bigint_col from cp.`tpch/nation.parquet` group by n_name,n_nationkey ) t group by n_nationkey");
}
```
throws a different exception on Drill 1.13 with Calcite 1.15, although it passes on current master. The exception message is:
```
Caused by: java.lang.AssertionError: Type mismatch:
rel rowtype:
RecordType(ANY n_nationkey, BIGINT $f1, BIGINT $f2, BIGINT NOT NULL $f3, BIGINT $f4) NOT NULL
equivRel rowtype:
RecordType(ANY n_nationkey, BIGINT $f1, BIGINT $f2, BIGINT NOT NULL $f3, BIGINT NOT NULL $f4) NOT NULL
```
The main reason in both cases is that DrillReduceAggregatesRule.reduceAgg invokes the RexBuilder.addAggCall method, whose aggCallMapping parameter acts as an AggCall cache. The aggCallMapping cache only cares about the call name, not the data type. Calcite's current master has not changed this part since I found the bug. I don't think I can exhaustively cover all test cases to prove that our current master implementation is right. But it seems safer to add my change (validating the AggCall cache against the data type) to master to prevent possible future issues.
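The idea proposed in this comment, making the cache distinguish calls by data type as well, could look roughly like the sketch below. It is an illustration only: `TypedKey` and this `addAggCall` are hypothetical names, not code from the PR or from Calcite, and elsewhere in the thread the reviewer argues that the root cause is an incorrectly chosen return type rather than the cache itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy cache whose key includes the result type; hypothetical, not the PR's code. */
class TypedAggCacheSketch {

  /** Hypothetical cache key: function name, input ref, and full type with nullability. */
  static final class TypedKey {
    final String function;
    final int inputRef;
    final String type; // e.g. "BIGINT" or "BIGINT NOT NULL"

    TypedKey(String function, int inputRef, String type) {
      this.function = function;
      this.inputRef = inputRef;
      this.type = type;
    }

    // Unlike a name-only key, equality also compares the type.
    @Override public boolean equals(Object o) {
      if (!(o instanceof TypedKey)) {
        return false;
      }
      TypedKey other = (TypedKey) o;
      return function.equals(other.function)
          && inputRef == other.inputRef
          && type.equals(other.type);
    }

    @Override public int hashCode() {
      return Objects.hash(function, inputRef, type);
    }
  }

  /** Reuses a slot only when function, input ref, and type all match. */
  static int addAggCall(TypedKey key, List<TypedKey> calls, Map<TypedKey, Integer> mapping) {
    return mapping.computeIfAbsent(key, k -> {
      calls.add(k);
      return calls.size() - 1;
    });
  }

  public static void main(String[] args) {
    List<TypedKey> calls = new ArrayList<>();
    Map<TypedKey, Integer> mapping = new HashMap<>();
    int nullable = addAggCall(new TypedKey("SUM", 0, "BIGINT"), calls, mapping);
    int notNull = addAggCall(new TypedKey("SUM", 0, "BIGINT NOT NULL"), calls, mapping);
    int again = addAggCall(new TypedKey("SUM", 0, "BIGINT"), calls, mapping);
    System.out.println(nullable != notNull); // different types get separate slots
    System.out.println(nullable == again);   // identical calls are still reused
  }
}
```

The trade-off is the one raised in the review: a type-aware key avoids handing back a call of the wrong type, but it can also register two calls that differ only because one of the types was inferred incorrectly.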
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466861#comment-16466861 ]

ASF GitHub Bot commented on DRILL-5913:
---

vvysotskyi commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387284020

@weijietong, could you please check whether this bug still reproduces on current master? I tried the query from the Jira description and it finished successfully. I suppose it was fixed as part of the Calcite upgrade.
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466813#comment-16466813 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387274177

@KulykRoman, it seems you are familiar with this part of the code. Could you also take a look at this?
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466802#comment-16466802 ]

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387271505

@vvysotskyi @amansinha100, could you take a look at this PR? I have been in contact with @julianhyde. Since Calcite keeps the original data type of the input parameter for stddev and stddev_samp, no cast happens in its `AggregateReduceFunctionsRule` implementation, so this error will not happen in Calcite. This PR therefore changes Drill's own `DrillReduceAggregatesRule` implementation.
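For readers unfamiliar with the reduction these rules perform: the issue description states that stddev_samp is rewritten into sum and count aggregates. The algebraic identity that makes such a rewrite possible is stddev_samp(x) = sqrt((sum(x*x) - sum(x)^2 / n) / (n - 1)). The sketch below checks it numerically. The exact expression that `DrillReduceAggregatesRule` emits, including any casts, is not shown in this thread, so the class and method names here are hypothetical.

```java
/** Numerical check of the sum/count form of the sample standard deviation. */
class StddevReductionSketch {

  /** Computes stddev_samp from three simple aggregates: sum(x), sum(x*x), count(x). */
  static double stddevSampFromAggregates(double sumX, double sumXSquared, long count) {
    if (count < 2) {
      return Double.NaN; // SQL would return NULL here; NaN stands in for it
    }
    double numerator = sumXSquared - (sumX * sumX) / count;
    // Guard against a tiny negative value caused by floating-point rounding.
    return Math.sqrt(Math.max(0.0, numerator) / (count - 1));
  }

  /** Textbook definition, used only as a reference for comparison. */
  static double stddevSampDirect(double[] xs) {
    double mean = 0.0;
    for (double x : xs) {
      mean += x;
    }
    mean /= xs.length;
    double squaredDeviations = 0.0;
    for (double x : xs) {
      squaredDeviations += (x - mean) * (x - mean);
    }
    return Math.sqrt(squaredDeviations / (xs.length - 1));
  }

  public static void main(String[] args) {
    double[] xs = {2, 4, 4, 4, 5, 5, 7, 9};
    double sumX = 0.0;
    double sumXSquared = 0.0;
    for (double x : xs) {
      sumX += x;
      sumXSquared += x * x;
    }
    System.out.println(stddevSampFromAggregates(sumX, sumXSquared, xs.length));
    System.out.println(stddevSampDirect(xs));
  }
}
```

Since the rewritten form depends on sum(x), it can share that aggregate with an explicit sum(x) elsewhere in the same query, which is where the reuse of aggregate calls discussed in this thread comes into play.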
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249111#comment-16249111 ]

ASF GitHub Bot commented on DRILL-5913:
---

Github user weijietong commented on the issue:

    https://github.com/apache/drill/pull/1016

@amansinha100 maybe you are familiar with this part of the code. Could you review it? Anyone else is welcome to review as well.
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223467#comment-16223467 ]

ASF GitHub Bot commented on DRILL-5913:
---

GitHub user weijietong opened a pull request:

    https://github.com/apache/drill/pull/1016

    DRILL-5913:solve the mixed processing of same functions with same inputRefs but di…

    `DrillReduceAggregatesRule` mixes up the processing of identical functions that have the same inputRefs but different dataTypes. The error info and a reproducible sample SQL query are [here](https://issues.apache.org/jira/browse/DRILL-5913).

    I will also contact the Calcite devs to find out whether they agree to make `RexBuilder.addAggCall` distinguish between `AggregateCall`s that have the same inputRefs but different dataTypes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weijietong/drill DRILL-5913

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1016

commit 6a5cee4e8f2c5955f88c20ace182b829a1ecd51e
Author: weijie.tong
Date:   2017-10-28T12:14:55Z

    solve the mixed processing of same functions of same inputRefs but different dataTypes
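
The pull request asks whether `RexBuilder.addAggCall` should distinguish calls that differ only in data type. The following standalone Java sketch shows the difference in behaviour. It does not use Calcite's API: `AggCall`, `register` and `registerAll` are hypothetical stand-ins, and which of the two calls carries INTEGER and which BIGINT is chosen arbitrarily for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AggCallDedupSketch {

    /** Hypothetical stand-in for an aggregate call: function, input column, result type. */
    static final class AggCall {
        final String function;
        final int inputRef;
        final String dataType;

        AggCall(String function, int inputRef, String dataType) {
            this.function = function;
            this.inputRef = inputRef;
            this.dataType = dataType;
        }
    }

    /**
     * Adds the call to the list unless an equivalent one is already registered,
     * and returns the ordinal of the call that callers should reference.
     */
    static int register(AggCall call, List<AggCall> calls,
                        Map<String, Integer> seen, boolean keyIncludesType) {
        String key = call.function + "($" + call.inputRef + ")";
        if (keyIncludesType) {
            key += ":" + call.dataType;
        }
        Integer existing = seen.get(key);
        if (existing != null) {
            return existing; // reuse the earlier call, whatever its data type was
        }
        calls.add(call);
        seen.put(key, calls.size() - 1);
        return calls.size() - 1;
    }

    /** Registers two $SUM0($0) calls that differ only in data type. */
    static List<AggCall> registerAll(boolean keyIncludesType) {
        List<AggCall> calls = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        register(new AggCall("$SUM0", 0, "INTEGER"), calls, seen, keyIncludesType);
        register(new AggCall("$SUM0", 0, "BIGINT"), calls, seen, keyIncludesType);
        return calls;
    }

    public static void main(String[] args) {
        // Key ignores the type: the second call is merged into the first.
        System.out.println(registerAll(false).size());
        // Key includes the type: both calls survive.
        System.out.println(registerAll(true).size());
    }
}
```

With the type left out of the key, the BIGINT call silently resolves to the INTEGER one, which is the kind of row-type mismatch the assertion in the error info reports. Including the type in the key keeps both calls, at the cost of an extra aggregate in the plan.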