[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7078/ -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 16 Apr 2021 00:06:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@285 PS21, Line 285: set minmax_filter_threshold=0.5; : select straight_join count(*) from : lineitem_orderkey_only a join [SHUFFLE] tpch_parquet.orders b : where a.l_orderkey = b.o_orderkey; It looks like this test behave differently if filter arrived on time vs arrived late. The jenkins failure seemingly because of both filters arrived late. It is curious that late filter arrival lead to less row from lineitem_orderkey_only scan, not more. In my local machine, if I add query option "set runtime_filter_wait_time_ms=5000;", this test pass. But without setting wait time, an AlwaysFalse minmax filter seems to be broadcasted. This is a section of query profile where this test failed: lv-desktop:27000: Filter 1 arrival: 1s367ms Filter 0 arrival: 1s371ms ... Runtime filters: Not all filters arrived (arrived: [], missing [1, 0]), waited for 785ms. Arrival delay: 1s000ms. ... Filter 1 (0): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - InactiveTotalTime: 0.000ns - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 1 (1) - Rows processed: 1.81M (1810767) - Rows rejected: 1.81M (1810767) - Rows total: 2.14M (2142543) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) - TotalTime: 0.000ns -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 21:07:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 6: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8590/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 15 Apr 2021 18:37:38 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17170 ) Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7078/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 15 Apr 2021 18:18:08 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17170 to look at the new patch set (#6). Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 .. WIP IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Vihang). The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Testing: - ran Impyla's test suite with Python 2 and 3 Other TODOs: - remove preexisting extra logic needed to use 0.11.0 for python - the thrift compilation was changed to generated template code - this was an easy solution to avoid a compilation issue after merging IMPALA-10600, but I should check its effect on compilation time Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 --- M be/src/benchmarks/network-perf-benchmark.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-service-client-wrapper.h M be/src/catalog/catalogd-main.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/hs2-http-test.cc M be/src/rpc/thrift-client.h M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/rpc/thrift-util.cc M be/src/rpc/thrift-util.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-service-client-wrapper.h M be/src/statestore/statestore-subscriber-client-wrapper.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/testutil/in-process-servers.h M be/src/transport/THttpServer.cpp M be/src/transport/THttpServer.h M be/src/transport/THttpTransport.cpp M be/src/transport/THttpTransport.h M be/src/transport/TSaslClientTransport.cpp M be/src/transport/TSaslClientTransport.h M be/src/transport/TSaslServerTransport.cpp M be/src/transport/TSaslServerTransport.h M be/src/transport/TSaslTransport.cpp M be/src/transport/TSaslTransport.h M be/src/util/parquet-reader.cc M bin/impala-config.sh M common/thrift/CMakeLists.txt M infra/python/deps/requirements.txt M java/pom.xml M shell/ext-py/thrift_sasl-0.4.2/setup.py M tests/beeswax/impala_beeswax.py 44 files changed, 194 insertions(+), 191 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/6 -- To view, visit http://gerrit.cloudera.org:8080/17170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Gerrit-Change-Number: 17170 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8589/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 16 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Thu, 15 Apr 2021 14:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. IMPALA-10445: Adjust NDV's scale with query option This is a new way to control NDV's scale. Since IMPALA-2658, we can trade memory for more accurate estimation by setting larger `scale` in SQL function NDV(, ). However the use of larger NDV scale requires the modification of SQL queries which may not be practical in certain applications: - Firstly, SQL writers are reluctant to lower that scale. They prone to fill up the scale, which will make the cluster unstable. Especially when there are `group by`s with high cardinalities. So it is wiser to let cluster admin other than sql writer choose appropriate scale. - Secondly, In some application scenarios, queries are stored in DBs. In a BI system, for example, rewriting thousands of SQLs is risky. In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE` with the following semantics: 1. The allowed value is in the range [1..10]; 2. Previously, the scale used in NDV() functions was fixed at 2. Now the scale is provided by the newly added query options. 3. It does not influence the NDV scale for SQL function NDV(, ) in which the NDV scale is provided by the 2nd argument . We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT can work with this query option. After this, cluster admins can substitute `count(distinct )` with `ndv(, scale)`. Implementation details: - The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default ndv behavior. - We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to `ExprRewriter`, making it compatible with further rewrite rules. - The newly added rewrite rule `DefaultNdvScaleRule` is applied after `CountDistinctToNdvRule`. Usage: To set a default ndv scale: ``` SET DEFAULT_NDV_SCALE = 10; ``` To unset: ``` SET DEFAULT_NDV_SCALE = 2; ``` Here are test results of a typical workload (cardinality=40,090,650): ++ | Metric| Count Distinct |NDV2|NDV5|NDV10 | ++ | Memory(GB) | 3.83 |1.84|1.85| 1.89 | | Duration(s) | 182.89| 30.22|29.72 | 29.24 | | ErrorRate |0% |1.8%|1.17% | 0.06% | ++ Testing: 1) Added 3 unit test cases in `ExprRewriteRulesTest`. 2) Added 5 unit test cases in `ExprRewriterTest`. 3) Ran all front-end unit test, passed. 4) Added a new query-option test. Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java 11 files changed, 314 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/16 -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 16 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8588/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 15 Apr 2021 13:18:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17306 ) Change subject: IMPALA-10445: Adjust NDV's scale with query option .. Patch Set 15: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181 PS15, Line 181: nit: redudant space http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@184 PS15, Line 184: should nit: "callers should" http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@188 PS15, Line 188:* expStr: origin expr :* rules: list of rewrite rules :* expectedExprStr: expected expr :* return: Expr nit: our code style use param comments like these: /** * * * @param exprStr: original expr * @param rules: list of rewrite rules * @param expectedExprStr: expected expr * @return the rewritten Expr */ http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@794 PS15, Line 794: session.options().setAppx_count_distinct(true); : RewritesOk("count(distinct bool_col)", rules, "ndv(bool_col)"); : } : : @Test : public void testDefaultNdvScaleRule() throws ImpalaException { : List rules = Lists.newArrayList( : org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE : ); : session.options().setDefault_ndv_scale(10); : RewritesOk("ndv(bool_col)", rules, "ndv(bool_col, 10)"); : } : : @Test : public void testDefaultNdvScaleRuleNotSet() throws ImpalaException { : List rules = Lists.newArrayList( : org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE : ); : RewritesOk("ndv(bool_col)", rules, null); : } : : @Test : public void testDefaultNdvScaleRuleSetDefault() throws ImpalaException { : List rules = Lists.newArrayList( : org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE : ); : session.options().setDefault_ndv_scale(2); : RewritesOk("ndv(bool_col)", rules, null); nit: Just like other tests in this file, you can merge these into one test method since they share the same rule. http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@822 PS15, Line 822: nit: redundant blank line -- To view, visit http://gerrit.cloudera.org:8080/17306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50 Gerrit-Change-Number: 17306 Gerrit-PatchSet: 15 Gerrit-Owner: fifteencai Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: fifteencai Gerrit-Comment-Date: Thu, 15 Apr 2021 13:08:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8587/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 26 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 15 Apr 2021 13:01:04 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17262 ) Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types .. WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types This change adds support for writing Parquet Bloom filters for the types for which read support was added in IMPALA-10640. Writing of Parquet Bloom filters can be controlled by the 'parquet_bloom_filter_write' query option which has the following possible values: NEVER - never write Parquet Bloom filters TBL_PROPS - write Parquet Bloom filters as set in table properties IF_NO_DICT - write Parquet Bloom filters if the row group is not fully dictionary encoded ALWAYS - always write Parquet Bloom filters, even if the row group is fully dictionary encoded TODO: Implement table properties involving Parquet Bloom filters. TODO: Decide size of Parquet Bloom filter based on NDV heuristics or configuration. Testing: TODO Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 --- M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/exec/parquet/hdfs-parquet-table-writer.h M be/src/exec/parquet/parquet-bloom-filter-util.cc M be/src/exec/parquet/parquet-bloom-filter-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M be/src/util/dict-encoding.h M be/src/util/parquet-bloom-filter.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift 12 files changed, 321 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/4 -- To view, visit http://gerrit.cloudera.org:8080/17262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792 Gerrit-Change-Number: 17262 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 26: (183 comments) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h File be/src/thirdparty/xxhash/xxhash.h: http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@70 PS26, Line 70: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735 line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@92 PS26, Line 92: * https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@113 PS26, Line 113: # elif defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@243 PS26, Line 243: # define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_64bits_reset_withSecret) line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@253 PS26, Line 253: # define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSeed) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@254 PS26, Line 254: # define XXH3_128bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret) line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@270 PS26, Line 270: #define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE) line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@429 PS26, Line 429: * @param statePtr A pointer to an @ref XXH32_state_t allocated with @ref XXH32_createState(). line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@441 PS26, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, const XXH32_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@476 PS26, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@628 PS26, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@631 PS26, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@700 PS26, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@724 PS26, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (120 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@743 PS26, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, const XXH3_state_t* src_state); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@756 PS26, Line 756: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@766 PS26, Line 766: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t secretSize); line too long (121 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@768 PS26, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* statePtr, const void* input, size_t length); line too long (107 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@791 PS26, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@792 PS26, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (122 > 90)
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. Patch Set 26: (11 comments) http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7 PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common types > I see. Is there a plan to extend it to non top level fields? Btw, does Parq No, I don't think we are planning to add filtering for complex types. Parquet MR only cares about leaf values, not complex values and also it is not clear how we could interpret operations (<>=) on these complex types. Concerning non-top-level values inside complex types, I'm not sure whether it is possible in Parquet-MR. I don't think we should do it now, we could have a look at it if it turns out to be important. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887 PS24, Line 887: > Yeah, adding a test is never a bad idea and it shouldn't be hard to add suc Done. http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483 PS24, Line 1483: > Oh, I see. Thanks for taking a look. Done http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716 PS24, Line 1716: bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size())); > I think it's OK as it is. Based on the column idx and query text I think we Done http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37 PS24, Line 37: ', to the given Pa > I see, thanks for extending the comment. Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h File be/src/exec/parquet/parquet-bloom-filter-util.h: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h@17 PS23, Line 17: #pragma once > nit: missing include guard Added #pragma once. http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc File be/src/exec/parquet/parquet-bloom-filter-util.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc@18 PS23, Line 18: #include "exec/parquet/parquet-bloom-filter-util.h" : : #include : : #include "exprs/scalar-expr-evaluator.h" : #include "util/parquet-bloom-filter.h" > nit: include order, usually it is standard include directories first then u Isn't the first include in a .cc file usually the corresponding .h file? See for example be/src/exec/parquet/hdfs-parquet-scanner.cc. http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc File be/src/util/impala-bloom-filter-buffer-allocator.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc@19 PS23, Line 19: #include "runtime/exec-env.h" > nit: empty line Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc File be/src/util/parquet-bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc@18 PS23, Line 18: #include : #include : #include : : #include "common/names.h" : #include "testutil/gtest-util.h" : #include "util/parquet-bloom-filter.h" : > nit: include order, usually it is standard include directories first then u Done http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc File be/src/util/parquet-bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc@18 PS23, Line 18: #include "parquet-bloom-filter.h" : : #include : #include : : #include "kudu/util/slice.h" : #include "kudu/util/status.h" : #include "util/kudu-status-util.h" : : #include "thirdparty/xxhash/xxhash.h" > nit: include order, usually it is standard include directories first then u Isn't the first include in a .cc file usually the corresponding .h file? See for example be/src/exec/parquet/hdfs-parquet-scanner.cc. http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py File tests/query_test/test_parquet_bloom_filter.py:
[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types
Daniel Becker has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most common types .. IMPALA-10640: Support reading Parquet Bloom filters - most common types This change adds read support for Parquet Bloom filters for types that can reasonably be supported in Impala. Other types, such as CHAR(N), would be very difficult to support because the length may be different in Parquet and in Impala which results in truncation or padding, and that changes the hash which makes using the Bloom filter impossible. Write support will be added in a later change. The supported Parquet type - Impala type pairs are the following: --- |Parquet type | Impala type| |---| |INT32| TINYINT, SMALLINT, INT | |INT64| BIGINT | |FLOAT| FLOAT | |DOUBLE | DOUBLE | |BYTE_ARRAY | STRING | --- The following types are not supported for the given reasons: |Impala type | Problem | || |VARCHAR(N) | truncation can change hash| |CHAR(N) | padding / truncation can change hash | |DECIMAL | multiple encodings supported | |TIMESTAMP | multiple encodings supported, timezone conversion | |DATE| not considered yet| Support may be added for these types later, see IMPALA-10641. If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used as the dictionary can give exact results in filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py that tests whether Parquet Bloom filtering works for the supported types and that we do not incorrectly discard row groups for the unsupported type VARCHAR. The Parquet file used in the test was generated with an external tool. - Added unit tests for ParquetBloomFilter in file be/src/util/parquet-bloom-filter-test.cc - A minor, unrelated change was done in be/src/util/bloom-filter-test.cc: the MakeRandom() function had return type uint64_t, the documentation claimed it returned a 64 bit random number, but the actual number of random bits is 32, which is what is intended in the tests. The return type and documentation have been corrected to use 32 bits. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 --- M LICENSE.txt M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h A be/src/exec/parquet/parquet-bloom-filter-util.cc A be/src/exec/parquet/parquet-bloom-filter-util.h M be/src/exprs/expr-value.h M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/kudu/util/block_bloom_filter.cc M be/src/kudu/util/block_bloom_filter.h M be/src/kudu/util/block_bloom_filter_avx2.cc M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/service/query-options.cc M be/src/service/query-options.h A be/src/thirdparty/xxhash/README.md A be/src/thirdparty/xxhash/xxhash.h M be/src/util/CMakeLists.txt M be/src/util/bloom-filter-test.cc M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/impala-bloom-filter-buffer-allocator.cc A be/src/util/impala-bloom-filter-buffer-allocator.h A be/src/util/parquet-bloom-filter-test.cc A be/src/util/parquet-bloom-filter.cc A be/src/util/parquet-bloom-filter.h M bin/jenkins/critique-gerrit-review.py M bin/rat_exclude_files.txt M bin/run_clang_tidy.sh M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M common/thrift/parquet.thrift M testdata/data/README A testdata/data/parquet-bloom-filtering.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test A tests/query_test/test_parquet_bloom_filter.py 39 files changed, 7,264 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/26 -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 26 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 6 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 09:44:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17289 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky This change disables the overlap min/max filter test for hdfs in erasure coding, due to the query plan change (from 3-node scan to 2-node scan) which splits the row groups among scan nodes differently. The SkipIfEC class in test harness skip.py is enhanced with a new skip reason 'different_scan_split' to facilitate this action. Testing: 1. Ran unit tests; 2. Ran core tests. Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Reviewed-on: http://gerrit.cloudera.org:8080/17289 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/common/skip.py M tests/query_test/test_runtime_filters.py 2 files changed, 4 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f Gerrit-Change-Number: 17289 Gerrit-PatchSet: 7 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17284 ) Change subject: IMPALA-10645: Log catalogd HMS API metrics .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7076/ -- To view, visit http://gerrit.cloudera.org:8080/17284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287 Gerrit-Change-Number: 17284 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 15 Apr 2021 06:37:48 + Gerrit-HasComments: No