Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 )
Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types ...................................................................... Patch Set 26: (11 comments) http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7 PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common types > I see. Is there a plan to extend it to non top level fields? Btw, does Parq No, I don't think we are planning to add filtering for complex types. Parquet MR only cares about leaf values, not complex values and also it is not clear how we could interpret operations (<>=) on these complex types. Concerning non-top-level values inside complex types, I'm not sure whether it is possible in Parquet-MR. I don't think we should do it now, we could have a look at it if it turns out to be important. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887 PS24, Line 887: > Yeah, adding a test is never a bad idea and it shouldn't be hard to add suc Done. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483 PS24, Line 1483: > Oh, I see. Thanks for taking a look. Done http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716 PS24, Line 1716: bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size())); > I think it's OK as it is. Based on the column idx and query text I think we Done http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37 PS24, Line 37: ', to the given Pa > I see, thanks for extending the comment. Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h@17 PS23, Line 17: #pragma once > nit: missing include guard Added #pragma once. http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc File be/src/exec/parquet/parquet-bloom-filter-util.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc@18 PS23, Line 18: #include "exec/parquet/parquet-bloom-filter-util.h" : : #include <sstream> : : #include "exprs/scalar-expr-evaluator.h" : #include "util/parquet-bloom-filter.h" > nit: include order, usually it is standard include directories first then u Isn't the first include in a .cc file usually the corresponding .h file? See for example be/src/exec/parquet/hdfs-parquet-scanner.cc. http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc File be/src/util/impala-bloom-filter-buffer-allocator.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc@19 PS23, Line 19: #include "runtime/exec-env.h" > nit: empty line Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc File be/src/util/parquet-bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc@18 PS23, Line 18: #include <cmath> : #include <unordered_set> : #include <vector> : : #include "common/names.h" : #include "testutil/gtest-util.h" : #include "util/parquet-bloom-filter.h" : > nit: include order, usually it is standard include directories first then u Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc File be/src/util/parquet-bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc@18 PS23, Line 18: #include "parquet-bloom-filter.h" : : #include <cmath> : #include <cstdint> : : #include "kudu/util/slice.h" : #include "kudu/util/status.h" : #include "util/kudu-status-util.h" : : #include "thirdparty/xxhash/xxhash.h" > nit: include order, usually it is standard include directories first then u Isn't the first include in a .cc file usually the corresponding .h file? See for example be/src/exec/parquet/hdfs-parquet-scanner.cc. http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py File tests/query_test/test_parquet_bloom_filter.py: http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py@20 PS23, Line 20: > nit: empty line Done -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 26 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 15 Apr 2021 12:42:04 +0000 Gerrit-HasComments: Yes
