Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
On July 30, 2015, 9:59 p.m., Aihua Xu wrote: Thanks for your comments Aihua. Unfortunatly I did not see this feedback after I committed the patch. The good thing is that your test case works correctly. I will add more tests later for predicate push down, so I will make sure to add this test. - Sergio --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93649 --- On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
On July 31, 2015, 5:35 a.m., cheng xu wrote: Looks good to me. Just one minor question. cheng xu wrote: Also I am wondering why this search argument works fine for ORC. ORC ignores the partitioned columns later in the code. But Parquet does not, and it fails. On July 31, 2015, 5:35 a.m., cheng xu wrote: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java, lines 103-104 https://reviews.apache.org/r/36942/diff/2/?file=1025266#file1025266line103 Why we need to create the leaf when columns is null? I left this for compatibility tests used on TestConvertAstToSearchArg.java and TestParquetRecordReaderWrapper.java. - Sergio --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93697 --- On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93711 --- Looks good to me. - Dong Chen On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- Review request for hive and Aihua Xu. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93651 --- This looks good to me. - Reuben Kuhnert On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Changes --- Thanks Reuben for your feedback. This new patch includes fixes for your comments and the failured tests that appear on Jira. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93649 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 142) https://reviews.apache.org/r/36942/#comment148060 nit: typo - should be schema. ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q (line 7) https://reviews.apache.org/r/36942/#comment148066 Can you add a new partition p2 with data so that we can show only the data from p1 are returned? Not completely follow the logic, but I'm worrying that p='p1' gets removed. - Aihua Xu On July 30, 2015, 9:22 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 9:22 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
On July 31, 2015, 1:35 p.m., cheng xu wrote: Looks good to me. Just one minor question. Also I am wondering why this search argument works fine for ORC. - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93697 --- On July 31, 2015, 5:22 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 31, 2015, 5:22 a.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93697 --- Looks good to me. Just one minor question. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (lines 103 - 104) https://reviews.apache.org/r/36942/#comment148112 Why we need to create the leaf when columns is null? - cheng xu On July 31, 2015, 5:22 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 31, 2015, 5:22 a.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena
Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/#review93587 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 54) https://reviews.apache.org/r/36942/#comment147977 If the goal here is to get just the top-level fields, can we do something like: ``` for (Type field : schema.getFields()) { columns.add(field.getName()); } ``` This might be a little bit clearer. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 64) https://reviews.apache.org/r/36942/#comment147969 Minor nit: Since we have the opportunity to fix it, can we change 'leafs' to 'leaves'. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java (line 102) https://reviews.apache.org/r/36942/#comment147978 ListT has O(N) lookup time. Can we store this in a SetT (O(1)) instead? - Reuben Kuhnert On July 30, 2015, 3:43 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36942/ --- (Updated July 30, 2015, 3:43 p.m.) Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho. Bugs: HIVE-11401 https://issues.apache.org/jira/browse/HIVE-11401 Repository: hive-git Description --- The following patch reviews the predicate created by Hive, and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION Diff: https://reviews.apache.org/r/36942/diff/ Testing --- Unit tests: TestParquetFilterPredicate.java Integration tests: parquet_predicate_pushdown.q Thanks, Sergio Pena