Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
......................................................................


Patch Set 26:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7
PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common 
types
> I see. Is there a plan to extend it to non top level fields? Btw, does Parq
No, I don't think we are planning to add filtering for complex types.
Parquet MR only cares about leaf values, not complex values and also it is not 
clear how we could interpret operations (<>=) on these complex types.
Concerning non-top-level values inside complex types, I'm not sure whether it 
is possible in Parquet-MR. I don't think we should do it now, we could have a 
look at it if it turns out to be important.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887
PS24, Line 887:
> Yeah, adding a test is never a bad idea and it shouldn't be hard to add suc
Done.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483
PS24, Line 1483:
> Oh, I see. Thanks for taking a look.
Done


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716
PS24, Line 1716:           
bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size()));
> I think it's OK as it is. Based on the column idx and query text I think we
Done


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37
PS24, Line 37: ', to the given Pa
> I see, thanks for extending the comment.
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h@17
PS23, Line 17: #pragma once
> nit: missing include guard
Added #pragma once.


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc
File be/src/exec/parquet/parquet-bloom-filter-util.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc@18
PS23, Line 18: #include "exec/parquet/parquet-bloom-filter-util.h"
             :
             : #include <sstream>
             :
             : #include "exprs/scalar-expr-evaluator.h"
             : #include "util/parquet-bloom-filter.h"
> nit: include order, usually it is standard include directories first then u
Isn't the first include in a .cc file usually the corresponding .h file? See 
for example be/src/exec/parquet/hdfs-parquet-scanner.cc.


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc
File be/src/util/impala-bloom-filter-buffer-allocator.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc@19
PS23, Line 19: #include "runtime/exec-env.h"
> nit: empty line
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc
File be/src/util/parquet-bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc@18
PS23, Line 18: #include <cmath>
             : #include <unordered_set>
             : #include <vector>
             :
             : #include "common/names.h"
             : #include "testutil/gtest-util.h"
             : #include "util/parquet-bloom-filter.h"
             :
> nit: include order, usually it is standard include directories first then u
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc
File be/src/util/parquet-bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc@18
PS23, Line 18: #include "parquet-bloom-filter.h"
             :
             : #include <cmath>
             : #include <cstdint>
             :
             : #include "kudu/util/slice.h"
             : #include "kudu/util/status.h"
             : #include "util/kudu-status-util.h"
             :
             : #include "thirdparty/xxhash/xxhash.h"
> nit: include order, usually it is standard include directories first then u
Isn't the first include in a .cc file usually the corresponding .h file? See 
for example be/src/exec/parquet/hdfs-parquet-scanner.cc.


http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py@20
PS23, Line 20:
> nit: empty line
Done



--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 26
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 15 Apr 2021 12:42:04 +0000
Gerrit-HasComments: Yes

Reply via email to