[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851535#comment-15851535 ]

Yongzhi Chen commented on HIVE-15782:
-------------------------------------

Agreed, we'd better make the value right first. The patch looks good. +1

> query on parquet table returns incorrect result when
> hive.optimize.index.filter is set to true
> ---
>
>                 Key: HIVE-15782
>                 URL: https://issues.apache.org/jira/browse/HIVE-15782
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 2.2.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-15782.1.patch, HIVE-15782.2.patch
>
>
> When hive.optimize.index.filter is set to true, the parquet table is filtered
> using the parquet column index.
> {noformat}
> set hive.optimize.index.filter=true;
> CREATE TABLE t1 (
>   name string,
>   dec decimal(5,0)
> ) stored as parquet;
> insert into table t1 values('Jim', 3);
> insert into table t1 values('Tom', 5);
> select * from t1 where (name = 'Jim' or dec = 5);
> {noformat}
> Only one row {{Jim, 3}} is returned, but both should be returned.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
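
The reproduction case in the quoted description can be mimicked in a few lines of Python. This is only a sketch: it assumes, as the patch-1 comment in this thread describes, that the decimal comparison is silently dropped from the pushed-down filter, and it models filtering per row rather than per Parquet row group. The function names are hypothetical and are not Hive APIs.

```python
# Rows inserted by the reproduction case in the issue description.
rows = [("Jim", 3), ("Tom", 5)]


def full_predicate(name, dec):
    # The query's WHERE clause: (name = 'Jim' OR dec = 5).
    return name == "Jim" or dec == 5


def buggy_pushed_filter(name, dec):
    # Assumption: the decimal leaf was dropped, leaving only name = 'Jim'.
    return name == "Jim"


# What the query should return versus what the dropped leaf produces.
expected = [r for r in rows if full_predicate(*r)]
observed = [r for r in rows if buggy_pushed_filter(*r)]

print("expected:", expected)
print("observed:", observed)
```

Under these assumptions the reduced filter is stricter than the original predicate, which is why the {{Tom, 5}} row disappears.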
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850323#comment-15850323 ]

Aihua Xu commented on HIVE-15782:
---------------------------------

We will not filter the data when there are unsupported data types; currently Hive returns an incorrect result in that case. I will investigate whether we can support decimal, date, and timestamp in follow-up JIRAs.
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850325#comment-15850325 ]

Aihua Xu commented on HIVE-15782:
---------------------------------

The test failures are not related.
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850283#comment-15850283 ]

Hive QA commented on HIVE-15782:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850645/HIVE-15782.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11009 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=106)
	[bucketsortoptimize_insert_4.q,multi_insert_mixed.q,vectorization_10.q,auto_join18_multi_distinct.q,join_cond_pushdown_3.q,custom_input_output_format.q,skewjoinopt5.q,vectorization_part_project.q,vector_count_distinct.q,skewjoinopt4.q,count.q,parallel.q,union33.q,union_lateralview.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=137)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3330/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3330/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3330/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850645 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850076#comment-15850076 ]

Yongzhi Chen commented on HIVE-15782:
-------------------------------------

The source code change makes sense, but it may cause a performance issue for some queries. Should you treat "or" statements and "and" statements differently?
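
One way to read the "or" versus "and" question is sketched below in Python. It is a hypothetical illustration, not the code in either attached patch: the `Leaf`, `And`, `Or`, and `to_filter` names are invented here, and negation is deliberately left out. The idea is that dropping an unsupported leaf from an "and" only makes the pushed-down filter less selective, whereas dropping one from an "or" makes it more selective and can discard matching rows.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Types this thread names as not supported for Parquet filtering.
UNSUPPORTED = {"decimal", "date", "timestamp"}


@dataclass(frozen=True)
class Leaf:
    column: str
    col_type: str
    value: object


@dataclass(frozen=True)
class And:
    children: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Expr", ...]


Expr = Union[Leaf, And, Or]


def to_filter(expr: Expr) -> Optional[Expr]:
    """Return a filter that never rejects a matching row, or None for 'no filter'."""
    if isinstance(expr, Leaf):
        return None if expr.col_type in UNSUPPORTED else expr
    converted = [to_filter(child) for child in expr.children]
    if isinstance(expr, And):
        # Safe: the remaining conjuncts still accept every matching row.
        kept = tuple(c for c in converted if c is not None)
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else And(kept)
    # Or: one unconvertible branch means the whole disjunction cannot be pushed down.
    if any(c is None for c in converted):
        return None
    return Or(tuple(converted))


name_leaf = Leaf("name", "string", "Jim")
dec_leaf = Leaf("dec", "decimal", 5)
print(to_filter(Or((name_leaf, dec_leaf))))   # no filter at all
print(to_filter(And((name_leaf, dec_leaf))))  # only the name leaf survives
```

With this split, a query such as `name = 'Jim' and dec = 5` would still benefit from filtering on `name`, which addresses the performance concern, while the "or" form from the issue falls back to reading everything.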
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849002#comment-15849002 ]

Hive QA commented on HIVE-15782:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850463/HIVE-15782.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11023 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes (batchId=253)
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes2 (batchId=253)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3309/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3309/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3309/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850463 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15782) query on parquet table returns incorrect result when hive.optimize.index.filter is set to true
    [ https://issues.apache.org/jira/browse/HIVE-15782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848830#comment-15848830 ]

Aihua Xu commented on HIVE-15782:
---------------------------------

patch-1: decimal, date and timestamp are currently not supported for filtering on parquet. Currently, if the filter expression contains such a type, the subexpression with that type is incorrectly ignored. With the patch, if we can't convert the search argument into a filter expression, then no filtering will be applied on parquet files.
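
The all-or-nothing behaviour described for patch-1 can be sketched as follows in Python. This is an illustrative model only: the tuple encoding of the search argument and the `convert_or_skip` name are assumptions made for this example and do not correspond to Hive's actual classes or to the contents of the attached patch.

```python
# Types this comment names for Parquet filtering.
UNSUPPORTED_TYPES = {"decimal", "date", "timestamp"}


def convert_or_skip(expr):
    """Convert a search argument into a filter, or return None to skip filtering.

    expr is either ("leaf", column, col_type, value) or (op, [children])
    with op being "and" or "or" (a hypothetical encoding for this sketch).
    """
    if expr[0] == "leaf":
        _, _column, col_type, _value = expr
        return None if col_type in UNSUPPORTED_TYPES else expr
    op, children = expr
    converted = [convert_or_skip(child) for child in children]
    # Any unconvertible subexpression disables filtering for the whole query,
    # rather than silently dropping that subexpression.
    if any(c is None for c in converted):
        return None
    return (op, converted)


sarg = ("or", [("leaf", "name", "string", "Jim"), ("leaf", "dec", "decimal", 5)])
print(convert_or_skip(sarg))
```

Returning `None` here stands for "apply no Parquet filter", so every row is read and the regular query predicate decides the result.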