[jira] [Commented] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-10-23 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219496#comment-17219496
 ] 

Nemon Lou commented on HIVE-24165:
--

I am not able to reproduce this on the master branch.

After upgrading Calcite from 1.16.0 to 1.17.0, the bug also disappears on 
branch3 with the multiple distinct rewrite enabled.

It may have been fixed by CALCITE-2232.

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24165.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at 

[jira] [Commented] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-15 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195976#comment-17195976
 ] 

Panagiotis Garefalakis commented on HIVE-24165:
---

Hey [~nemon], would you mind opening a PR for this?

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Priority: Major
> Attachments: HIVE-24165.patch
>
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> 

[jira] [Commented] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-14 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195807#comment-17195807
 ] 

Nemon Lou commented on HIVE-24165:
--

In fact, I reproduced this issue by backporting HIVE-22448 to Hive branch 3.1.2. 
The master branch should have the same issue.

AggregateProjectPullUpConstantsRule expects the groupSet in an Aggregate to be 
ordered and to start at 0, like \{0,1,2}. After the multiple distinct rewrite, 
however, the groupSet is \{3,4,5}.
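
To make the mismatch concrete, here is a minimal sketch that uses java.util.BitSet as a stand-in for the Aggregate's groupSet. The class and method names are hypothetical and this is not the rule's actual code. It only illustrates that a group key's input ordinal equals its position in the Aggregate output when the groupSet is \{0,1,2}, but not when it is \{3,4,5}.

```java
import java.util.BitSet;

public class GroupSetCheck {
    /** Builds a bit set from input column ordinals (stand-in for a groupSet). */
    public static BitSet of(int... ordinals) {
        BitSet bits = new BitSet();
        for (int ordinal : ordinals) {
            bits.set(ordinal);
        }
        return bits;
    }

    /** True when the set is exactly {0, 1, ..., n-1}. */
    public static boolean isZeroBasedPrefix(BitSet groupSet) {
        // The first clear bit of a dense prefix sits right after its last member.
        return groupSet.nextClearBit(0) == groupSet.cardinality();
    }

    /**
     * Position of a group key in the aggregate output, where group keys come
     * first in ascending ordinal order. Returns -1 if the ordinal is not a key.
     */
    public static int outputPosition(BitSet groupSet, int inputOrdinal) {
        if (!groupSet.get(inputOrdinal)) {
            return -1;
        }
        // Count the group keys that precede this ordinal.
        return groupSet.get(0, inputOrdinal).cardinality();
    }

    public static void main(String[] args) {
        BitSet dense = of(0, 1, 2);
        BitSet shifted = of(3, 4, 5);
        System.out.println(isZeroBasedPrefix(dense));    // true
        System.out.println(isZeroBasedPrefix(shifted));  // false
        System.out.println(outputPosition(dense, 1));    // 1: ordinal and position agree
        System.out.println(outputPosition(shifted, 4));  // 1: ordinal 4 maps to position 1
    }
}
```

Code that treats the input ordinal as the output position only works in the first case, which would be consistent with the type mismatch in the error log.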

 

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Nemon Lou
> Priority: Major