[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7078/


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 16 Apr 2021 00:06:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@285
PS21, Line 285: set minmax_filter_threshold=0.5;
  : select straight_join count(*) from
  : lineitem_orderkey_only a join [SHUFFLE] tpch_parquet.orders b
  : where a.l_orderkey = b.o_orderkey;
It looks like this test behave differently if filter arrived on time vs arrived 
late.
The jenkins failure seemingly because of both filters arrived late.
It is curious that late filter arrival lead to less row from 
lineitem_orderkey_only scan, not more.

In my local machine, if I add query option "set 
runtime_filter_wait_time_ms=5000;", this test pass.
But without setting wait time, an AlwaysFalse minmax filter seems to be 
broadcasted. This is a section of query profile where this test failed:


  lv-desktop:27000:
Filter 1 arrival: 1s367ms
Filter 0 arrival: 1s371ms
...
  Runtime filters: Not all filters arrived (arrived: [], missing [1, 
0]), waited for 785ms. Arrival delay: 1s000ms.
...
  Filter 1 (0):
 - Files processed: 0 (0)
 - Files rejected: 0 (0)
 - Files total: 0 (0)
 - InactiveTotalTime: 0.000ns
 - RowGroups processed: 0 (0)
 - RowGroups rejected: 0 (0)
 - RowGroups total: 1 (1)
 - Rows processed: 1.81M (1810767)
 - Rows rejected: 1.81M (1810767)
 - Rows total: 2.14M (2142543)
 - Splits processed: 0 (0)
 - Splits rejected: 0 (0)
 - Splits total: 0 (0)
 - TotalTime: 0.000ns



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 21:07:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 6:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8590/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 15 Apr 2021 18:37:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17170 )

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7078/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 15 Apr 2021 18:18:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

2021-04-15 Thread Csaba Ringhofer (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17170

to look at the new patch set (#6).

Change subject: WIP IMPALA-7825: Upgrade Thrift version to 0.11.0
..

WIP IMPALA-7825: Upgrade Thrift version to 0.11.0

Before this patch Impala mainly used Thrift 0.9.3, but it was
possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0
Thrift lib was already included in toolchain.

Most of the changes are related to replacing boost:: with std::
shared_ptr-s in cpp code (this is a continuation of patch by Vihang).

The Thrift upgrade also needs an Impyla relase with Thrift 0.11.0, as
Impala's test framework relies on Impyla. A thrift_sasl release is also
needed, because it currently pins Thrift version to 0.9.3 for Python 2.

The current patch uses alpha releases from Impyla and thrift_sasl that
use thrift 0.11.0.

Testing:
- ran Impyla's test suite with Python 2 and 3

Other TODOs:
- remove preexisting extra logic needed to use 0.11.0 for python
- the thrift compilation was changed to generated template code -
  this was an easy solution to avoid a compilation issue after
  merging IMPALA-10600, but I should check its effect on compilation
  time

Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/hs2-http-test.cc
M be/src/rpc/thrift-client.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/rpc/thrift-util.cc
M be/src/rpc/thrift-util.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/testutil/in-process-servers.h
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M be/src/transport/THttpTransport.cpp
M be/src/transport/THttpTransport.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
M be/src/util/parquet-reader.cc
M bin/impala-config.sh
M common/thrift/CMakeLists.txt
M infra/python/deps/requirements.txt
M java/pom.xml
M shell/ext-py/thrift_sasl-0.4.2/setup.py
M tests/beeswax/impala_beeswax.py
44 files changed, 194 insertions(+), 191 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/17170/6
--
To view, visit http://gerrit.cloudera.org:8080/17170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6
Gerrit-Change-Number: 17170
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8589/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Thu, 15 Apr 2021 14:46:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..

IMPALA-10445: Adjust NDV's scale with query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting larger `scale` in SQL function
NDV(, ). However the use of larger NDV scale requires
the modification of SQL queries which may not be practical in certain
applications:

- Firstly, SQL writers are reluctant to lower that scale. They prone
to fill up the scale, which will make the cluster unstable. Especially
when there are `group by`s with high cardinalities. So it is wiser to
let cluster admin other than sql writer choose appropriate scale.

- Secondly, In some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQLs is risky.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`
with the following semantics:

1. The allowed value is in the range [1..10];
2. Previously, the scale used in NDV() functions was fixed at 2.
Now the scale is provided by the newly added query options.
3. It does not influence the NDV scale for SQL function
NDV(, ) in which the NDV scale is provided by the 2nd
argument .

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
can work with this query option. After this, cluster admins can
substitute `count(distinct )` with `ndv(, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 314 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/16
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 16
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8588/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 15 Apr 2021 13:18:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

2021-04-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
..


Patch Set 15: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@181
PS15, Line 181:
nit: redudant space


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@184
PS15, Line 184: should
nit: "callers should"


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@188
PS15, Line 188:* expStr: origin expr
  :* rules: list of rewrite rules
  :* expectedExprStr: expected expr
  :* return: Expr
nit: our code style use param comments like these:

  /**
   * 
   *
   * @param exprStr: original expr
   * @param rules: list of rewrite rules
   * @param expectedExprStr: expected expr
   * @return the rewritten Expr
   */


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@794
PS15, Line 794: session.options().setAppx_count_distinct(true);
  : RewritesOk("count(distinct bool_col)", rules, 
"ndv(bool_col)");
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRule() throws ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : session.options().setDefault_ndv_scale(10);
  : RewritesOk("ndv(bool_col)", rules, "ndv(bool_col, 10)");
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRuleNotSet() throws 
ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : RewritesOk("ndv(bool_col)", rules, null);
  :   }
  :
  :   @Test
  :   public void testDefaultNdvScaleRuleSetDefault() throws 
ImpalaException {
  : List rules = Lists.newArrayList(
  : 
org.apache.impala.rewrite.DefaultNdvScaleRule.INSTANCE
  : );
  : session.options().setDefault_ndv_scale(2);
  : RewritesOk("ndv(bool_col)", rules, null);
nit: Just like other tests in this file, you can merge these into one test 
method since they share the same rule.


http://gerrit.cloudera.org:8080/#/c/17306/15/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@822
PS15, Line 822:
nit: redundant blank line



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 15
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: fifteencai 
Gerrit-Comment-Date: Thu, 15 Apr 2021 13:08:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8587/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 26
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 15 Apr 2021 13:01:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..

WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

This change adds support for writing Parquet Bloom filters for the types
for which read support was added in IMPALA-10640.

Writing of Parquet Bloom filters can be controlled by the
'parquet_bloom_filter_write' query option which has the following
possible values:
  NEVER  - never write Parquet Bloom filters
  TBL_PROPS  - write Parquet Bloom filters as set in table properties
  IF_NO_DICT - write Parquet Bloom filters if the row group is not
   fully dictionary encoded
  ALWAYS - always write Parquet Bloom filters, even if the row
   group is fully dictionary encoded

TODO: Implement table properties involving Parquet Bloom filters.

TODO: Decide size of Parquet Bloom filter based on NDV heuristics or
configuration.

Testing:
  TODO

Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/dict-encoding.h
M be/src/util/parquet-bloom-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
12 files changed, 321 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/4
--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 26:

(183 comments)

http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@70
PS26, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@92
PS26, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@113
PS26, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@243
PS26, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@253
PS26, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@254
PS26, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@270
PS26, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@429
PS26, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@441
PS26, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@476
PS26, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@628
PS26, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@631
PS26, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@700
PS26, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@724
PS26, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@743
PS26, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@756
PS26, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@766
PS26, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@768
PS26, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* 
statePtr, const void* input, size_t length);
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@791
PS26, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/26/be/src/thirdparty/xxhash/xxhash.h@792
PS26, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const 
void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-15 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 26:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7
PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common 
types
> I see. Is there a plan to extend it to non top level fields? Btw, does Parq
No, I don't think we are planning to add filtering for complex types.
Parquet MR only cares about leaf values, not complex values and also it is not 
clear how we could interpret operations (<>=) on these complex types.
Concerning non-top-level values inside complex types, I'm not sure whether it 
is possible in Parquet-MR. I don't think we should do it now, we could have a 
look at it if it turns out to be important.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887
PS24, Line 887:
> Yeah, adding a test is never a bad idea and it shouldn't be hard to add suc
Done.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483
PS24, Line 1483:
> Oh, I see. Thanks for taking a look.
Done


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716
PS24, Line 1716:   
bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size()));
> I think it's OK as it is. Based on the column idx and query text I think we
Done


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37
PS24, Line 37: ', to the given Pa
> I see, thanks for extending the comment.
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.h@17
PS23, Line 17: #pragma once
> nit: missing include guard
Added #pragma once.


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc
File be/src/exec/parquet/parquet-bloom-filter-util.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/exec/parquet/parquet-bloom-filter-util.cc@18
PS23, Line 18: #include "exec/parquet/parquet-bloom-filter-util.h"
 :
 : #include 
 :
 : #include "exprs/scalar-expr-evaluator.h"
 : #include "util/parquet-bloom-filter.h"
> nit: include order, usually it is standard include directories first then u
Isn't the first include in a .cc file usually the corresponding .h file? See 
for example be/src/exec/parquet/hdfs-parquet-scanner.cc.


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc
File be/src/util/impala-bloom-filter-buffer-allocator.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/impala-bloom-filter-buffer-allocator.cc@19
PS23, Line 19: #include "runtime/exec-env.h"
> nit: empty line
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc
File be/src/util/parquet-bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter-test.cc@18
PS23, Line 18: #include 
 : #include 
 : #include 
 :
 : #include "common/names.h"
 : #include "testutil/gtest-util.h"
 : #include "util/parquet-bloom-filter.h"
 :
> nit: include order, usually it is standard include directories first then u
Done


http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc
File be/src/util/parquet-bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/17026/23/be/src/util/parquet-bloom-filter.cc@18
PS23, Line 18: #include "parquet-bloom-filter.h"
 :
 : #include 
 : #include 
 :
 : #include "kudu/util/slice.h"
 : #include "kudu/util/status.h"
 : #include "util/kudu-status-util.h"
 :
 : #include "thirdparty/xxhash/xxhash.h"
> nit: include order, usually it is standard include directories first then u
Isn't the first include in a .cc file usually the corresponding .h file? See 
for example be/src/exec/parquet/hdfs-parquet-scanner.cc.


http://gerrit.cloudera.org:8080/#/c/17026/23/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-15 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..

IMPALA-10640: Support reading Parquet Bloom filters - most common types

This change adds read support for Parquet Bloom filters for types that
can reasonably be supported in Impala. Other types, such as CHAR(N),
would be very difficult to support because the length may be different
in Parquet and in Impala which results in truncation or padding, and
that changes the hash which makes using the Bloom filter impossible.
Write support will be added in a later change.
The supported Parquet type - Impala type pairs are the following:

 ---
|Parquet type |  Impala type|
|---|
|INT32|  TINYINT, SMALLINT, INT |
|INT64|  BIGINT |
|FLOAT|  FLOAT  |
|DOUBLE   |  DOUBLE |
|BYTE_ARRAY   |  STRING |
 ---

The following types are not supported for the given reasons:

 
|Impala type |  Problem  |
||
|VARCHAR(N)  | truncation can change hash|
|CHAR(N) | padding / truncation can change hash  |
|DECIMAL | multiple encodings supported  |
|TIMESTAMP   | multiple encodings supported, timezone conversion |
|DATE| not considered yet|
 

Support may be added for these types later, see IMPALA-10641.

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests
whether Parquet Bloom filtering works for the supported types and
that we do not incorrectly discard row groups for the unsupported
type VARCHAR. The Parquet file used in the test was generated with
an external tool.
  - Added unit tests for ParquetBloomFilter in file
be/src/util/parquet-bloom-filter-test.cc
  - A minor, unrelated change was done in
be/src/util/bloom-filter-test.cc: the MakeRandom() function had
return type uint64_t, the documentation claimed it returned a 64 bit
random number, but the actual number of random bits is 32, which is
what is intended in the tests. The return type and documentation
have been corrected to use 32 bits.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A be/src/exec/parquet/parquet-bloom-filter-util.cc
A be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/kudu/util/block_bloom_filter_avx2.cc
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter-test.cc
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/jenkins/critique-gerrit-review.py
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
39 files changed, 7,264 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/26
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 26
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 15 Apr 2021 09:44:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..

IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky

This change disables the overlap min/max filter test for hdfs in
erasure coding, due to the query plan change (from 3-node scan to
2-node scan) which splits the row groups among scan nodes differently.

The SkipIfEC class in test harness skip.py is enhanced with a new
skip reason 'different_scan_split' to facilitate this action.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Reviewed-on: http://gerrit.cloudera.org:8080/17289
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/common/skip.py
M tests/query_test/test_runtime_filters.py
2 files changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10645: Log catalogd HMS API metrics

2021-04-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17284 )

Change subject: IMPALA-10645: Log catalogd HMS API metrics
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7076/


--
To view, visit http://gerrit.cloudera.org:8080/17284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id41afe89bbe3395c158919bddd09f302c6752287
Gerrit-Change-Number: 17284
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Apr 2021 06:37:48 +
Gerrit-HasComments: No