Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 )
Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types ...................................................................... Patch Set 25: Code-Review+1 (5 comments) Thanks Daniel, the change LGTM. Tamas said he's planning to take another look so I'm only giving +1 for now. http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7 PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common types > I think it can only apply to top level columns as I check whether the type I see. Is there a plan to extend it to non top level fields? Btw, does Parquet-MR generate bloom filters for them as well? http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887 PS24, Line 887: > I've added the query option and put this code and the call to CreateColIdx2 Yeah, adding a test is never a bad idea and it shouldn't be hard to add such a test in a .test file. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483 PS24, Line 1483: > I've considered that but ParquetPageIndex::ReadAll() Oh, I see. Thanks for taking a look. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716 PS24, Line 1716: bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size())); > I included the row group idx, column idx and filename in the log message. I I think it's OK as it is. Based on the column idx and query text I think we should be able to identify the conjunct. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37 PS24, Line 37: . > I do not want to guarantee in which cases '*storage' will be used as the im I see, thanks for extending the comment. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 25 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 12 Apr 2021 15:55:19 +0000 Gerrit-HasComments: Yes
