[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-31 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847891#comment-15847891 ]

Sergey Shelukhin edited comment on HIVE-15680 at 2/1/17 2:26 AM:
-

Submitted addendum patch .04 there. You can try your patch together with that one 
to see whether the tests pass here. Not sure why this case triggers that code path, 
though; hopefully it doesn't somehow break projection. Then again, in this case 
it's a text table.


was (Author: sershe):
Submitted a patch there. You can try your patch together with that one to see 
whether the tests pass here. Not sure why this case triggers that code path, though; 
hopefully it doesn't somehow break projection. Then again, in this case it's a text table.

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-15680.1.patch, HIVE-15680.2.patch, HIVE-15680.3.patch, 
>                      HIVE-15680.4.patch, HIVE-15680.5.patch, HIVE-15680.6.patch
>
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is that only the last predicate is being pushed down.
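The model below makes the "only the last predicate is pushed down" mechanism concrete. It is a toy Python model, not Hive code, and it rests on three assumptions. First, both UNION ALL branches' TableScans run in one mapper. Second, each scan stores its pushed-down predicate under a single shared config key, here the hypothetical `pushed.filter`. Third, the ORC reader skips a stripe when no row in it can satisfy that predicate. Each branch's own Filter Operator still runs on whatever rows survive.

```python
# Toy model of the HIVE-15680 symptom (hypothetical names, not Hive's API):
# two TableScans in one mapper share one config slot for the pushed-down
# predicate, so the last scan's predicate overwrites the earlier one.
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]

# The repro's two INSERTs produce two files with one single-row stripe each.
STRIPES: List[List[int]] = [[1], [2]]


def push_down_shared(scans: List[Predicate]) -> Predicate:
    """Every scan writes the same hypothetical key; the last write wins."""
    conf: Dict[str, Predicate] = {}
    for pred in scans:
        conf["pushed.filter"] = pred
    return conf["pushed.filter"]


def read_pruned(pushed: Predicate) -> List[int]:
    """Skip a stripe unless at least one of its rows satisfies the pushed predicate."""
    rows: List[int] = []
    for stripe in STRIPES:
        if any(pushed(v) for v in stripe):
            rows.extend(stripe)
    return rows


def union_all(scans: List[Predicate], pushed: Predicate) -> List[int]:
    """Each UNION ALL branch re-applies its own Filter Operator to the rows read."""
    rows = read_pruned(pushed)
    out: List[int] = []
    for pred in scans:
        out.extend(r for r in rows if pred(r))
    return out


scans: List[Predicate] = [lambda n: n == 1, lambda n: n == 2]
result = union_all(scans, push_down_shared(scans))
print(result)  # [2]: the stripe holding 1 was skipped by the "number = 2" filter
```

With the repro's data the model returns only `2`. That matches the single row the MapReduce run reported in this thread.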



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Anthony Hsu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832786#comment-15832786 ]

Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

Here's the explain plan, which does show a single mapper processing two table 
scans:
{noformat}
hive (default)> explain
  > select * from test_table x where number = 1
  > union all
  > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: x
            filterExpr: (number = 1) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: y
            filterExpr: (number = 2) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 2) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}
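The plan shows both TableScans (aliases `x` and `y`) in the same Map Operator Tree. The sketch below shows one possible remedy under the same toy assumptions: push down the OR of every predicate on that table instead of a single one. This is an assumption for illustration, not necessarily what the attached patches do, and `push_down_disjunction` is a hypothetical name. Pruning with the disjunction never drops a stripe that either branch needs, and each branch's Filter Operator still applies its exact predicate.

```python
# Toy sketch of one remedy (hypothetical names, not necessarily the committed
# HIVE-15680 fix): when several TableScans in one mapper read the same table,
# push down the OR of their predicates instead of whichever was set last.
from typing import Callable, List

Predicate = Callable[[int], bool]

# Two files with one single-row stripe each, as in the repro.
STRIPES: List[List[int]] = [[1], [2]]


def push_down_disjunction(scans: List[Predicate]) -> Predicate:
    """Keep a row (and its stripe) if ANY scan sharing this mapper might need it."""
    return lambda n: any(pred(n) for pred in scans)


def read_pruned(pushed: Predicate) -> List[int]:
    """Skip a stripe unless at least one of its rows satisfies the pushed predicate."""
    rows: List[int] = []
    for stripe in STRIPES:
        if any(pushed(v) for v in stripe):
            rows.extend(stripe)
    return rows


def union_all(scans: List[Predicate], pushed: Predicate) -> List[int]:
    """Each branch's Filter Operator still applies its own exact predicate."""
    rows = read_pruned(pushed)
    out: List[int] = []
    for pred in scans:
        out.extend(r for r in rows if pred(r))
    return out


scans: List[Predicate] = [lambda n: n == 1, lambda n: n == 2]
result = union_all(scans, push_down_disjunction(scans))
print(result)  # [1, 2]
```

The disjunction is weaker than either predicate alone, so it prunes less. Because the exact filters still run downstream, it cannot lose rows.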



[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

2017-01-20 Thread Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832171#comment-15832171 ]

Gopal V edited comment on HIVE-15680 at 1/20/17 5:51 PM:
-

[~erwaman]: is this only happening for MRv2?

{code}
hive> -- This should return 2 records but only returns 1 record
hive> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
Query ID = gopal_20170120125021_ea181e13-828c-42e7-8070-6a09a715b694
Total jobs = 1
Launching Job 1 out of 1


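------------------------------------------------------------------------------------------
VERTICES          MODE     STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
------------------------------------------------------------------------------------------
Map 1 ..          llap  SUCCEEDED      1          1        0        0       0       0
Map 3 ..          llap  SUCCEEDED      1          1        0        0       0       0
------------------------------------------------------------------------------------------
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 0.16 s
------------------------------------------------------------------------------------------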
Status: DAG finished successfully in 0.16 seconds

Query Execution Summary
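------------------------------------------------------------------------------------------
OPERATION        DURATION
------------------------------------------------------------------------------------------
Compile Query       0.30s
Prepare Plan        0.21s
Submit Plan         0.27s
Start DAG           0.00s
Run DAG             0.15s
------------------------------------------------------------------------------------------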

Task Execution Summary
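------------------------------------------------------------------------------------------
VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
------------------------------------------------------------------------------------------
   Map 1          0.00             0            0              1               0
   Map 3          0.00             0            0              1               0
------------------------------------------------------------------------------------------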

LLAP IO Summary
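------------------------------------------------------------------------------------------
VERTICES  ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION  USED  TOTAL_IO
------------------------------------------------------------------------------------------
   Map 1          1         0          2        0B         6B    262.14KB    3B     0.03s
   Map 3          1         0          2        0B         6B    262.14KB    3B     0.03s
------------------------------------------------------------------------------------------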

FileSystem Counters Summary

Scheme: HDFS
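------------------------------------------------------------------------------------------
VERTICES  BYTES_READ  READ_OPS  LARGE_READ_OPS  BYTES_WRITTEN  WRITE_OPS
------------------------------------------------------------------------------------------
   Map 1        257B         6               0           101B          2
   Map 3        257B         6               0           101B          2
------------------------------------------------------------------------------------------

Scheme: FILE
------------------------------------------------------------------------------------------
VERTICES  BYTES_READ  READ_OPS  LARGE_READ_OPS  BYTES_WRITTEN  WRITE_OPS
------------------------------------------------------------------------------------------
   Map 1          0B         0               0             0B          0
   Map 3          0B         0               0             0B          0
------------------------------------------------------------------------------------------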

OK
1
2
Time taken: 1.038 seconds, Fetched: 2 row(s)
{code}
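In this Tez/LLAP run each UNION ALL branch is its own vertex (Map 1, Map 3), and both rows come back. The toy model below shows why that layout would avoid the bug, assuming (not verified here) that each vertex carries a private copy of the configuration. Under that assumption each scan prunes with only its own predicate, which is consistent with each vertex reporting INPUT_RECORDS of 1. `run_vertex` and the `pushed.filter` key are hypothetical.

```python
# Toy model (hypothetical names): each vertex gets its own config, so the
# pushed-down predicate of one UNION ALL branch cannot overwrite the other's.
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]

# Two files with one single-row stripe each, as in the repro.
STRIPES: List[List[int]] = [[1], [2]]


def read_pruned(pushed: Predicate) -> List[int]:
    """Skip a stripe unless at least one of its rows satisfies the pushed predicate."""
    rows: List[int] = []
    for stripe in STRIPES:
        if any(pushed(v) for v in stripe):
            rows.extend(stripe)
    return rows


def run_vertex(pred: Predicate) -> List[int]:
    """One branch in its own vertex: private config, own pruning, own filter."""
    conf: Dict[str, Predicate] = {"pushed.filter": pred}  # not shared with other vertices
    rows = read_pruned(conf["pushed.filter"])
    return [r for r in rows if pred(r)]


scans: List[Predicate] = [lambda n: n == 1, lambda n: n == 2]
result: List[int] = []
for pred in scans:
    result.extend(run_vertex(pred))
print(result)  # [1, 2]
```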



> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> --
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set