[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query
[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847891#comment-15847891 ]

Sergey Shelukhin edited comment on HIVE-15680 at 2/1/17 2:26 AM:
-----------------------------------------------------------------

Submitted an addendum patch .04 there. You can try your patch with that patch to see if that makes tests pass here.
Not sure why it triggers that path though... hopefully it doesn't somehow break projection. Although in this case it's a text table.

was (Author: sershe):
Submitted a patch there. You can try your patch with that patch to see if that makes tests pass here.
Not sure why it triggers that path though... hopefully it doesn't somehow break projection. Although in this case it's a text table.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is
> referenced twice in query
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch,
> HIVE-15680.3.patch, HIVE-15680.4.patch, HIVE-15680.5.patch, HIVE-15680.6.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
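
For readers who want to see why "only the last predicate is being pushed down" drops a row, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Hive's implementation: it assumes the pushed-down predicate is stored under one shared configuration key that every table scan in the same task writes to, and that the reader uses that predicate together with per-file min/max statistics to skip files. The names `OrcFile`, `Scan`, `FILTER_KEY`, and `run_union_all` are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical key for illustration only; not a real Hive property name.
FILTER_KEY = "pushed.filter"


@dataclass
class OrcFile:
    """Toy stand-in for a one-stripe ORC file: just the values of `number`."""
    rows: List[int]


@dataclass
class Scan:
    """Toy table scan whose predicate is `number = value`."""
    alias: str
    value: int


def may_contain(f: OrcFile, value: int) -> bool:
    # Min/max check: a file can be skipped if `value` lies outside its range.
    return min(f.rows) <= value <= max(f.rows)


def run_union_all(files: List[OrcFile], scans: List[Scan], shared_conf: bool) -> List[int]:
    if shared_conf:
        # Assumed buggy behaviour: all scans write to one conf,
        # so the last scan's predicate overwrites the earlier ones.
        conf: Dict[str, int] = {}
        for s in scans:
            conf[FILTER_KEY] = s.value
        confs = {s.alias: conf for s in scans}
    else:
        # Contrast case: every scan keeps its own pushed predicate.
        confs = {s.alias: {FILTER_KEY: s.value} for s in scans}

    out: List[int] = []
    for s in scans:
        pushed = confs[s.alias][FILTER_KEY]
        for f in files:
            if not may_contain(f, pushed):
                continue  # file skipped based on the pushed predicate
            # The scan's own Filter Operator still applies its real predicate.
            out.extend(r for r in f.rows if r == s.value)
    return out


files = [OrcFile([1]), OrcFile([2])]          # two inserts, two files
scans = [Scan("x", 1), Scan("y", 2)]          # the two UNION ALL branches

print(run_union_all(files, scans, shared_conf=True))   # [2]
print(run_union_all(files, scans, shared_conf=False))  # [1, 2]
```

In this model the shared case skips the file holding `1` for both branches, because both see `number = 2` as the pushed predicate. The branch for `number = 1` then reads only the file holding `2` and its own filter rejects it, which matches the single row `2` reported in this issue.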
[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query
[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832786#comment-15832786 ]

Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-------------------------------------------------------------

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}
Here's the explain plan, which does show a single mapper processing two table scans:
{noformat}
hive (default)> explain
              > select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: x
            filterExpr: (number = 1) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: y
            filterExpr: (number = 2) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 2) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}

was (Author: erwaman):
Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken:
[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query
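[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832171#comment-15832171 ]

Gopal V edited comment on HIVE-15680 at 1/20/17 5:51 PM:
---------------------------------------------------------

[~erwaman]: is this only happening for MRv2?

{code}
hive> -- This should return 2 records but only returns 1 record
hive> select * from test_table where number = 1
    > union all
    > select * from test_table where number = 2;
Query ID = gopal_20170120125021_ea181e13-828c-42e7-8070-6a09a715b694
Total jobs = 1
Launching Job 1 out of 1
--------------------------------------------------------------------------------------
VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------------
Map 1 ..      llap     SUCCEEDED      1          1        0        0       0       0
Map 3 ..      llap     SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------------
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 0.16 s
--------------------------------------------------------------------------------------
Status: DAG finished successfully in 0.16 seconds

Query Execution Summary
--------------------------------------------------------------------------------------
OPERATION                            DURATION
--------------------------------------------------------------------------------------
Compile Query                           0.30s
Prepare Plan                            0.21s
Submit Plan                             0.27s
Start DAG                               0.00s
Run DAG                                 0.15s
--------------------------------------------------------------------------------------

Task Execution Summary
--------------------------------------------------------------------------------------
VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
--------------------------------------------------------------------------------------
Map 1             0.00             0            0              1               0
Map 3             0.00             0            0              1               0
--------------------------------------------------------------------------------------

LLAP IO Summary
--------------------------------------------------------------------------------------
VERTICES  ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION  USED  TOTAL_IO
--------------------------------------------------------------------------------------
Map 1             1         0          2        0B         6B    262.14KB    3B     0.03s
Map 3             1         0          2        0B         6B    262.14KB    3B     0.03s
--------------------------------------------------------------------------------------

FileSystem Counters Summary

Scheme: HDFS
--------------------------------------------------------------------------------------
VERTICES  BYTES_READ  READ_OPS  LARGE_READ_OPS  BYTES_WRITTEN  WRITE_OPS
--------------------------------------------------------------------------------------
Map 1           257B         6               0           101B          2
Map 3           257B         6               0           101B          2
--------------------------------------------------------------------------------------

Scheme: FILE
--------------------------------------------------------------------------------------
VERTICES  BYTES_READ  READ_OPS  LARGE_READ_OPS  BYTES_WRITTEN  WRITE_OPS
--------------------------------------------------------------------------------------
Map 1             0B         0               0             0B          0
Map 3             0B         0               0             0B          0
--------------------------------------------------------------------------------------

OK
1
2
Time taken: 1.038 seconds, Fetched: 2 row(s)
{code}

was (Author: gopalv):
[~erwaman]: is this only happening for MRv2?

> Incorrect results when hive.optimize.index.filter=true and same ORC table is
> referenced twice in query
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>
>
> To repro:
> {noformat}
> set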