[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721484#comment-15721484 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841701/HIVE-15239.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10761 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2412/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2412/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2412/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12841701 - PreCommit-HIVE-Build

> hive on spark combine equivalentwork get wrong result because of tablescan operation compare
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15239
>                 URL: https://issues.apache.org/jira/browse/HIVE-15239
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.0, 2.1.0
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15239.1.patch, HIVE-15239.2.patch, HIVE-15239.3.patch, HIVE-15239.4.patch
>
>
> env: hive on spark engine
> reproduce step:
> {code}
> create table a1(KEHHAO string, START_DT string) partitioned by (END_DT string);
> create table a2(KEHHAO string, START_DT string) partitioned by (END_DT string);
> alter table a1 add partition(END_DT='20161020');
> alter table a1 add partition(END_DT='20161021');
> insert into table a1 partition(END_DT='20161020') values('2000721360','20161001');
> SELECT T1.KEHHAO,COUNT(1) FROM (
> SELECT KEHHAO FROM a1 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
> UNION ALL
> SELECT KEHHAO FROM a2 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1
> ) T1
> GROUP BY T1.KEHHAO
> HAVING COUNT(1)>1;
> +-------------+------+--+
> | t1.kehhao   | _c1  |
> +-------------+------+--+
> | 2000721360  | 2    |
> +-------------+------+--+
> {code}
> the result should be none record

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
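
The arithmetic behind "the result should be none record" can be checked with a small in-memory model. This is only a sketch, not Hive code: the `Row` class and the method names are hypothetical, and it assumes the BETWEEN predicate ends up being compared numerically. Only a1 holds a matching row, so the union yields one row and `HAVING COUNT(1)>1` should drop the group. The count of 2 in the report is consistent with the a2 branch being served by a1's scan.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory model of the reproduction query; this is not Hive code. */
class UnionCountSketch {

    /** One row of a1 or a2: the two data columns plus the END_DT partition value. */
    static final class Row {
        final String kehhao;
        final String startDt;
        final String endDt;

        Row(String kehhao, String startDt, String endDt) {
            this.kehhao = kehhao;
            this.startDt = startDt;
            this.endDt = endDt;
        }
    }

    /** WHERE clause shared by both UNION ALL branches (assumes numeric comparison). */
    static boolean matches(Row r) {
        long probe = 20161018L;
        return "2000721360".equals(r.kehhao)
                && Long.parseLong(r.startDt) <= probe
                && probe <= Long.parseLong(r.endDt) - 1;
    }

    /** Number of rows one branch contributes to the union. */
    static long branchCount(List<Row> table) {
        return table.stream().filter(UnionCountSketch::matches).count();
    }

    public static void main(String[] args) {
        List<Row> a1 = Arrays.asList(new Row("2000721360", "20161001", "20161020"));
        List<Row> a2 = Collections.emptyList(); // nothing was ever inserted into a2

        long expected = branchCount(a1) + branchCount(a2); // each branch scans its own table
        long ifMerged = branchCount(a1) + branchCount(a1); // a2's branch wrongly reuses a1's scan

        // 1 row in the group, so HAVING COUNT(1) > 1 filters it out
        System.out.println("expected COUNT(1) = " + expected);
        // 2 rows in the group, matching the value shown in the report
        System.out.println("merged   COUNT(1) = " + ifMerged);
    }
}
```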
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721233#comment-15721233 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841693/HIVE-15239.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10731 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=143)
	[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_interval_mapjoin.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2410/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2410/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2410/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12841693 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713840#comment-15713840 ]

Xuefu Zhang commented on HIVE-15239:

+1
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712934#comment-15712934 ]

Xuefu Zhang commented on HIVE-15239:

+1 with minor comment on RB.
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712003#comment-15712003 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841281/HIVE-15239.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10753 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=132)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2359/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2359/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2359/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12841281 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711469#comment-15711469 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841244/HIVE-15239.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10738 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=94)
	[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,stats16.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2357/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2357/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12841244 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710944#comment-15710944 ]

Xuefu Zhang commented on HIVE-15239:

[~lirui] do you mind creating an RB for this? Thanks.
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710435#comment-15710435 ]

Xuefu Zhang commented on HIVE-15239:

Sorry for the delay. Re: my point #1, I was referring to this:
{code}
Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
if (firstRootOperators.size() != secondRootOperators.size()) {
  return false;
}

// need to check paths and partition desc for MapWorks
if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
  return false;
}
{code}
I think it's better to write it as follows, in order to keep each logical unit of code together.
{code}
// need to check paths and partition desc for MapWorks
if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
  return false;
}

Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
if (firstRootOperators.size() != secondRootOperators.size()) {
  return false;
}
{code}
As to the exhaustive check, your fix will solve the problem described here. I would even believe it is possible to have two MapWorks that work on different partitions of the same table, such as in the case of a union. Overall, I feel more testing is needed for this feature. Of course this goes beyond the scope of this JIRA.
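
The suggested ordering can be tried out in isolation with stand-in types. The sketch below is not the patch: `Work`, `MapWorkLike`, the string operator names and the warehouse paths are all hypothetical, and `compareMapWork` is reduced to a path comparison purely for illustration. It only shows the MapWork-specific check running first, followed by the root-operator checks kept together as one unit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Simplified stand-ins for BaseWork/MapWork; every type here is hypothetical. */
class CompareWorkSketch {

    static class Work {
        final Set<String> rootOperators;

        Work(String... ops) {
            this.rootOperators = new HashSet<>(Arrays.asList(ops));
        }
    }

    static class MapWorkLike extends Work {
        final Set<String> inputPaths;

        MapWorkLike(Set<String> inputPaths, String... ops) {
            super(ops);
            this.inputPaths = inputPaths;
        }
    }

    /** Stand-in check: two map works match only if they read the same paths. */
    static boolean compareMapWork(MapWorkLike first, MapWorkLike second) {
        return first.inputPaths.equals(second.inputPaths);
    }

    static boolean compareWork(Work first, Work second) {
        // Works of different kinds are never equivalent; this also makes the casts below safe.
        if (first.getClass() != second.getClass()) {
            return false;
        }
        // MapWork-specific check comes first and stands on its own.
        if (first instanceof MapWorkLike
                && !compareMapWork((MapWorkLike) first, (MapWorkLike) second)) {
            return false;
        }
        // Root-operator checks stay together as one logical unit.
        if (first.rootOperators.size() != second.rootOperators.size()) {
            return false;
        }
        return first.rootOperators.equals(second.rootOperators);
    }

    public static void main(String[] args) {
        Work scanA1 = new MapWorkLike(Collections.singleton("/warehouse/a1/end_dt=20161020"), "TS");
        Work scanA2 = new MapWorkLike(Collections.singleton("/warehouse/a2"), "TS");
        Work scanA1Again = new MapWorkLike(Collections.singleton("/warehouse/a1/end_dt=20161020"), "TS");

        System.out.println(compareWork(scanA1, scanA2));      // false: same operators, different paths
        System.out.println(compareWork(scanA1, scanA1Again)); // true: same paths, same operators
    }
}
```

Either ordering returns the same result; placing the MapWork check first only keeps the operator-related lines adjacent to the code that consumes them.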
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707619#comment-15707619 ]

Rui Li commented on HIVE-15239:
-------------------------------

Pinging [~xuefuz]
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700706#comment-15700706 ]

Rui Li commented on HIVE-15239:
-------------------------------

Thanks [~csun] for the explanations. I think these tests have been ignored for a while because some of them fail when I explicitly run with TestMiniSparkOnYarnCliDriver. I'll fix and re-enable them in a separate JIRA.
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696544#comment-15696544 ]

Chao Sun commented on HIVE-15239:
---------------------------------

[~lirui] yes, I added this. It's for the test cases that should ONLY run under HoS, like HoS dynamic partition pruning. Part of the change is in the test node's config file, which is not part of the git repository. I guess much has changed in the qtest framework since then, so this needs to be changed as well.
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15694624#comment-15694624 ]

Rui Li commented on HIVE-15239:
-------------------------------

The latest failures are not related. And I guess the tests listed in {{spark.only.query.files}} are not automatically picked up. We can enable them in a follow-on. [~xuefuz] please have another look. Thanks.
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693097#comment-15693097 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840399/HIVE-15239.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10733 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=92)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable (batchId=218)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2282/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2282/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2282/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12840399 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15692721#comment-15692721 ]

Rui Li commented on HIVE-15239:
---

Thanks [~xuefuz] for the suggestions.

1. I'm not sure I follow your point. The code is in the method {{compareWork}}, which checks whether two works are equivalent. The patch adds a special check for MapWork; if that check fails, we don't need to compare the operators.
2. OK, I'll move the null checks into the compare methods. Some of them need to stay in the caller though, otherwise we'll get NPEs.
3. I thought about overriding the equals method of each corresponding class, but I'm not sure how to override the hashCode methods accordingly. The fields used in the comparison are the same as those used in the clone method of each class, so I think the comparison is exhaustive.

Actually, I'm not sure it's necessary to go this far in the comparison. We can simply compare the paths to solve the example problem in this JIRA: different paths mean the MapWorks are for different tables/partitions. I don't know whether it's ever possible for two MapWorks to point to the same path but have different PartitionDesc.
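The path-first idea in the comment above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: {{SketchMapWork}}, {{SketchWorkMatcher}}, the string-valued partition descriptions, and the warehouse paths used in the usage note are all hypothetical stand-ins, and Hive's real MapWork and PartitionDesc classes carry many more fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a map-side work unit; NOT Hive's MapWork class.
final class SketchMapWork {
    // Input path -> partition description (simplified to a String here).
    final Map<String, String> pathToPartition = new LinkedHashMap<>();
    // Operator pipeline, simplified to a descriptive String.
    final String operatorPlan;

    SketchMapWork(String operatorPlan) {
        this.operatorPlan = operatorPlan;
    }

    SketchMapWork addPath(String path, String partitionDesc) {
        pathToPartition.put(path, partitionDesc);
        return this;
    }
}

// Hypothetical matcher; the method names only echo those mentioned in the thread.
final class SketchWorkMatcher {
    // Null-safe comparison of input paths and their partition descriptions.
    static boolean compareMapWork(SketchMapWork first, SketchMapWork second) {
        if (first == null || second == null) {
            return first == second;
        }
        return first.pathToPartition.equals(second.pathToPartition);
    }

    // Paths are checked first; operators are compared only when the paths match.
    static boolean compareWork(SketchMapWork first, SketchMapWork second) {
        if (!compareMapWork(first, second)) {
            return false;
        }
        if (first == null) {
            return true; // both are null at this point
        }
        return Objects.equals(first.operatorPlan, second.operatorPlan);
    }
}
```

With this sketch, two works that share the same operator plan but read from, say, {{/warehouse/a1/end_dt=20161020}} and {{/warehouse/a2}} are reported as not equivalent, which is the behaviour the reproduction in this issue needs.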
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691764#comment-15691764 ]

Xuefu Zhang commented on HIVE-15239:

Patch looks good. A few minor comments:

1. The following code seems to have been inserted in the middle of another code block.
{code}
// need to check paths and partition desc for MapWorks
if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
  return false;
}
{code}
2. By convention, the null check and the null-equality check are better placed in the compare method itself rather than left to the caller. This applies to the few private methods introduced, but it's no big deal.
3. I wonder whether it makes sense to put these compare() methods in the corresponding classes. Otherwise, these comparisons can easily be broken.

One concern I have is whether the comparisons are exhaustive, that is, whether the condition check is sufficient. With so many noisy fields in the compared classes, it's hard to see which are important and which are not. Thoughts?
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690856#comment-15690856 ]

Hive QA commented on HIVE-15239:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840261/HIVE-15239.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 10703 tests executed

*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=128)
	[union_remove_15.q,bucket_map_join_tez1.q,groupby7_noskew.q,bucketmapjoin1.q,subquery_multiinsert.q,auto_join8.q,auto_join6.q,groupby2_map_skew.q,lateral_view_explode2.q,join28.q,load_dyn_part1.q,skewjoinopt17.q,skewjoin_union_remove_1.q,union_remove_20.q,bucketmapjoin5.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=94)
	[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,groupby3.q,bucketsortoptimize_insert_8.q,union20.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=43)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_part] (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=123)
org.apache.hive.hcatalog.api.TestHCatClient.testBasicDDLCommands (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testCreateTableLike (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testDatabaseLocation (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testDropPartitionsWithPartialSpec (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testDropTableException (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testEmptyTableInstantiation (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testGetMessageBusTopicName (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testGetPartitionsWithPartialSpec (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testObjectNotFoundException (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testOtherFailure (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSchema (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testRenameTable (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testReplicationTaskIter (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=166)
org.apache.hive.hcatalog.api.TestHCatClient.testUpdateTableSchema (batchId=166)
org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2262/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2262/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2262/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840261 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676230#comment-15676230 ]

wangwenli commented on HIVE-15239:
--

[~xuefu.w...@kodak.com] A different problem? What is the problem?

Coming back to this issue: in
{code}
org.apache.hadoop.hive.ql.optimizer.spark.CombineEquivalentWorkResolver.EquivalentWorkMatcher.compareWork()
{code}
the code checks whether the operators are the same. Here the two TableScan operators are considered the same by the current implementation of TableScanOperatorComparator, but they scan different tables and should not be treated as equal. Maybe we can add one more check: whether the table is the same.
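To illustrate why a scan comparison that ignores the table can merge unrelated works, here is a small sketch. It is an assumption-laden toy: {{SketchScan}} and {{SketchScanComparator}} are hypothetical, and the fields compared (alias and needed columns) are chosen for illustration only; they are not claimed to match what Hive's TableScanOperatorComparator actually inspects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified description of a table scan; not a Hive class.
final class SketchScan {
    final String tableName;
    final String alias;
    final List<String> neededColumns;

    SketchScan(String tableName, String alias, String... neededColumns) {
        this.tableName = tableName;
        this.alias = alias;
        this.neededColumns = Arrays.asList(neededColumns);
    }
}

// Hypothetical comparator with and without a table identity check.
final class SketchScanComparator {
    // Looks only at alias and columns, so it cannot tell table a1 from table a2.
    static boolean equalsIgnoringTable(SketchScan x, SketchScan y) {
        return Objects.equals(x.alias, y.alias)
            && Objects.equals(x.neededColumns, y.neededColumns);
    }

    // Adds the extra check suggested in the comment: the scanned table must match too.
    static boolean equalsWithTable(SketchScan x, SketchScan y) {
        return Objects.equals(x.tableName, y.tableName)
            && equalsIgnoringTable(x, y);
    }
}
```

In the reproduction query both subqueries use the alias {{T}} and read the same column names, so a comparison like {{equalsIgnoringTable}} would treat the scans of {{a1}} and {{a2}} as identical, while {{equalsWithTable}} keeps them apart.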
[jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
[ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675878#comment-15675878 ]

Xuefu Zhang commented on HIVE-15239:

cc: [~lirui]. I ran into a different problem with CDH 5.7. I'm currently having trouble setting up Hive on Spark with the latest code.