[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-03 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851535#comment-15851535
 ] 

Yongzhi Chen commented on HIVE-15782:
-

Agree with, we'd better make the value right first.
The patch looks good.  +1

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-02 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850323#comment-15850323
 ] 

Aihua Xu commented on HIVE-15782:
-

We will not filter the data when there are nonsupported data types and 
currently Hive is returning incorrect result. I will investigate if we can 
support decimal, date and timestamp in the following jiras.

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-02 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850325#comment-15850325
 ] 

Aihua Xu commented on HIVE-15782:
-

The test failures are not related.

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850283#comment-15850283
 ] 

Hive QA commented on HIVE-15782:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850645/HIVE-15782.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11009 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=106)

[bucketsortoptimize_insert_4.q,multi_insert_mixed.q,vectorization_10.q,auto_join18_multi_distinct.q,join_cond_pushdown_3.q,custom_input_output_format.q,skewjoinopt5.q,vectorization_part_project.q,vector_count_distinct.q,skewjoinopt4.q,count.q,parallel.q,union33.q,union_lateralview.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=137)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3330/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3330/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3330/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850645 - PreCommit-HIVE-Build

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-02 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850076#comment-15850076
 ] 

Yongzhi Chen commented on HIVE-15782:
-

The source code change makes sense. But it may have performance issue for some 
query. Should you treat "or" statement , "and" statement differently?

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849002#comment-15849002
 ] 

Hive QA commented on HIVE-15782:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850463/HIVE-15782.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes
 (batchId=253)
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes2
 (batchId=253)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3309/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3309/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3309/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850463 - PreCommit-HIVE-Build

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true

2017-02-01 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848830#comment-15848830
 ] 

Aihua Xu commented on HIVE-15782:
-

patch-1: decimal, date and timestamp currently are supported for filtering for 
parquet. Currently if there is such type in the filter expression, such 
subexpression with that type is incorrectly ignored. 

With the patch, if we can't convert search argument into filter expression, 
then no filtering will be applied on parquet files.

> query on parquet table returns incorrect result when 
> hive.optimize.index.filter is set to true 
> ---
>
> Key: HIVE-15782
> URL: https://issues.apache.org/jira/browse/HIVE-15782
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15782.1.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered 
> using the parquet column index. 
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)