Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
......................................................................


Patch Set 25: Code-Review+1

(5 comments)

Thanks Daniel, the change LGTM. Tamas said he's planning to take another look 
so I'm only giving +1 for now.

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7
PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common 
types
> I think it can only apply to top level columns as I check whether the type
I see. Is there a plan to extend it to non top level fields? Btw, does 
Parquet-MR generate bloom filters for them as well?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887
PS24, Line 887:
> I've added the query option and put this code and the call to CreateColIdx2
Yeah, adding a test is never a bad idea and it shouldn't be hard to add such a 
test in a .test file.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483
PS24, Line 1483:
> I've considered that but ParquetPageIndex::ReadAll()
Oh, I see. Thanks for taking a look.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716
PS24, Line 1716:           
bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size()));
> I included the row group idx, column idx and filename in the log message. I
I think it's OK as it is. Based on the column idx and query text I think we 
should be able to identify the conjunct.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37
PS24, Line 37: .
> I do not want to guarantee in which cases '*storage' will be used as the im
I see, thanks for extending the comment.



--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 25
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:55:19 +0000
Gerrit-HasComments: Yes

Reply via email to