[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16898 )

Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7900/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8
Gerrit-Change-Number: 16898
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Dec 2020 05:36:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores

2020-12-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16898 )

Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores
..


Patch Set 1:

I noticed the JIRA from Sahil and figured it was an easy fix to speed up table 
loading.


--
To view, visit http://gerrit.cloudera.org:8080/16898

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8
Gerrit-Change-Number: 16898
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Dec 2020 05:15:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10117: Skip calls to FsPermissionCache for blob stores

2020-12-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16898 )


Change subject: IMPALA-10117: Skip calls to FsPermissionCache for blob stores
..

IMPALA-10117: Skip calls to FsPermissionCache for blob stores

This change skips the call to precacheChildrenOf() in the cases
where getPermissions() is never called, so that values which
would never be used are not cached.

There is some opportunity to clean up this permission
checking further, but I decided to keep this fix limited
in scope.
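
For illustration only (this is not the HdfsTable.java change itself), the
idea can be sketched in Python. The scheme list, the PermissionCache class
and the load_partitions() helper are all hypothetical names; the assumption
is simply that permissions are never consulted for blob-store locations, so
pre-caching them is wasted work.

```python
from urllib.parse import urlparse

# Assumed, illustrative list of blob-store URI schemes.
BLOB_STORE_SCHEMES = {"s3a", "abfs", "abfss", "adl", "gs"}


def needs_permission_check(location: str) -> bool:
    """Return False when permissions would never be read for this location."""
    return urlparse(location).scheme not in BLOB_STORE_SCHEMES


class PermissionCache:
    """Hypothetical stand-in for a permission cache; it only counts calls."""

    def __init__(self) -> None:
        self.precache_calls = 0

    def precache_children_of(self, location: str) -> None:
        # A real cache would list the directory and store child permissions.
        self.precache_calls += 1


def load_partitions(cache: PermissionCache, table_location: str) -> None:
    # Only pay for the pre-cache listing when the values will be used later.
    if needs_permission_check(table_location):
        cache.precache_children_of(table_location)
```

With this guard, loading a table under an s3a:// location never touches the
cache, while an hdfs:// location still pre-caches as before.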

Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 27 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/16898/1
--
To view, visit http://gerrit.cloudera.org:8080/16898

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2034695a956307309f656d56aa57aa07ae5163d8
Gerrit-Change-Number: 16898
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6803/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16884

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 23 Dec 2020 05:11:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 5:

Patch set 5 is a rebase on top of IMPALA-9550 fix.
The flakiness should be gone now.


--
To view, visit http://gerrit.cloudera.org:8080/16884

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 23 Dec 2020 01:33:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16895 )

Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..

IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

TestResultSpoolingFetchSize.test_fetch has been flaky in the
ubuntu-16.04-dockerised environment because it sometimes fails to reach
the finished state within 10 seconds. This patch increases the test
timeout to 30 seconds.
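
A minimal sketch of the kind of wait loop such a timeout governs, assuming a
polling helper. wait_for() and its parameters are hypothetical names, not the
helper used in test_result_spooling.py; only the 30-second default mirrors
the new timeout.

```python
import time


def wait_for(predicate, timeout_s=30.0, interval_s=0.1,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A longer timeout does not slow down the passing case, because the loop
returns as soon as the predicate holds.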

Testing:
- Looped the test locally.

Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Reviewed-on: http://gerrit.cloudera.org:8080/16895
Reviewed-by: Bikramjeet Vig 
Tested-by: Impala Public Jenkins 
---
M tests/query_test/test_result_spooling.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Bikramjeet Vig: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16895

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16895 )

Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16895

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 23 Dec 2020 00:54:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 40:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7899/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Dec 2020 00:30:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 40:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@757
PS40, Line 757: options.setDisable_overlap_filter(true); // Required so 
that output doesn't vary by whether parquet tables are used or not.
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@787
PS40, Line 787: options.setDisable_overlap_filter(true); // Required so 
that output doesn't vary by the format of the table used.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16720/40/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@787
PS40, Line 787: options.setDisable_overlap_filter(true); // Required so 
that output doesn't vary by the format of the table used.
line too long (118 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/40/tests/run-tests.py
File tests/run-tests.py:

http://gerrit.cloudera.org:8080/#/c/16720/40/tests/run-tests.py@219
PS40, Line 219: %
flake8: E131 continuation line unaligned for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 23 Dec 2020 00:08:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#40). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds a new class of predicates, called overlap predicates,
that determine whether a Parquet row group or page overlaps with a
range computed from an equi hash join. If it does not, the entire row
group or page is skipped. An overlap predicate is represented as a
min/max filter.

For the following query, the min and max in such a min/max filter are
computed from the values of the join column of table 'b' and become
fully available once the entire hash table is built. To evaluate the
overlap predicate, these two values are compared against the min/max
of each row group or page at the scan node for 'a'.

  select straight_join count(*)
  from lineitem_sorted_l_shipdate a join [SHUFFLE]
   lineitem_sorted_l_shipdate b
  where a.l_shipdate = b.l_receiptdate
  and b.l_commitdate = "1992-01-31";
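
The range comparison described above can be sketched as follows. This is an
illustrative Python model under simple assumptions, not the backend's
MinMaxFilter code: Range, overlaps() and row_groups_to_scan() are
hypothetical names, bounds are treated as inclusive, and the dates are made
up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: object  # inclusive minimum
    hi: object  # inclusive maximum


def overlaps(filter_range: Range, stats_range: Range) -> bool:
    """Two closed ranges overlap unless one ends before the other starts."""
    return (filter_range.lo <= stats_range.hi
            and stats_range.lo <= filter_range.hi)


def row_groups_to_scan(filter_range: Range, row_group_stats: list) -> list:
    """Return indexes of row groups whose min/max overlaps the filter."""
    return [i for i, stats in enumerate(row_group_stats)
            if overlaps(filter_range, stats)]


# Min/max gathered from the join column on the build side (made-up values).
build_side = Range("1992-02-01", "1992-03-01")

# Per-row-group min/max of the scan column, sorted on the join column.
stats = [
    Range("1992-01-02", "1992-01-31"),
    Range("1992-02-01", "1992-02-29"),
    Range("1992-03-01", "1992-03-31"),
    Range("1992-04-01", "1992-04-30"),
]

to_scan = row_groups_to_scan(build_side, stats)
```

Here only row groups 1 and 2 need to be read; the first and last are skipped
because their ranges lie entirely outside the build-side range. The same
check applies per page when page-level statistics are available.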

An overlap predicate associated with the column type J (in hash table)
and scan column type S will be formed when one of the following is true:
   Both J and S are booleans
   Both J and S are integers (tinyint, smallint, int, or bigint)
   Both J and S are approximate numeric (float or double)
   Both J and S are decimals with the same precision and scale
   Both J and S are strings (STRING, CHAR or VARCHAR)
   Both J and S are date
   Both J and S are timestamp

As with existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option
does not apply to min/max filters created for overlap predicates.
Overlap predicates are always evaluated, after the min/max
conjuncts (if any).

Two new runtime profile counters report the number of row groups and
pages, respectively, filtered out by the overlap predicates:
  1. NumRuntimeFilteredRowGroups
  2. NumRuntimeFilteredPages

Testing:
1. Unit tested on various column types with TPCH and TPCDS tables.
   Benefits were significant when the join column on the outer table
   is sorted, or when the min/max boundary values of the pages or row
   groups are monotonic;
2. Added new tests in min_max_filters.test to demonstrate the number
   of filtered out pages and row groups with the two new profile counters;
3. Added new tests in runtime-filter-propagation.test to demonstrate
   that the overlap predicates work with different column types;
4. Added data type specific overlap method tests in
   min-max-filter-test.cc;
5. Core testing.

TBD in this patch:
1. Performance measurement.

To do in follow-up JIRAs:
1. Apply the overlap predicate on partition columns;
2. Apply the overlap predicate on each row;
3. IR code-gen for various MinMaxFilter::EvalOverlap methods.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test
M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
A testdata/workloads/functional-planner/queries/PlannerTest/disable-runtime-overlap-filter.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 

[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16895 )

Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6801/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16895

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 19:17:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 39:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7898/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 39
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 22 Dec 2020 19:16:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16895 )

Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16895

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 19:16:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..


Patch Set 39:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16720/39/tests/run-tests.py
File tests/run-tests.py:

http://gerrit.cloudera.org:8080/#/c/16720/39/tests/run-tests.py@219
PS39, Line 219: %
flake8: E131 continuation line unaligned for hanging indent



--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 39
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 22 Dec 2020 18:55:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-22 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#39). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325: Parquet scan should use min/max statistics to 
skip pages based on equi-join predicate
..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds a new class of predicates, called overlap predicates,
that determine whether a Parquet row group or page overlaps with a
range computed from an equi hash join. If it does not, the entire row
group or page is skipped. An overlap predicate is represented as a
min/max filter.

For the following query, the min and max in such a min/max filter are
computed from the values of the join column of table 'b' and become
fully available once the entire hash table is built. To evaluate the
overlap predicate, these two values are compared against the min/max
of each row group or page at the scan node for 'a'.

  select straight_join count(*)
  from lineitem_sorted_l_shipdate a join [SHUFFLE]
   lineitem_sorted_l_shipdate b
  where a.l_shipdate = b.l_receiptdate
  and b.l_commitdate = "1992-01-31";

An overlap predicate associated with the column type J (in hash table)
and scan column type S will be formed when one of the following is true:
   Both J and S are booleans
   Both J and S are integers (tinyint, smallint, int, or bigint)
   Both J and S are approximate numeric (float or double)
   Both J and S are decimals with the same precision and scale
   Both J and S are strings (STRING, CHAR or VARCHAR)
   Both J and S are date
   Both J and S are timestamp
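
The eligibility rules listed above can be restated as a small predicate. This
is an illustrative Python sketch, not the frontend's planner code; ColType
and can_form_overlap_predicate() are hypothetical names, and types are
identified by upper-case name only.

```python
from dataclasses import dataclass
from typing import Optional

INTEGER_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT"}
APPROX_NUMERIC_TYPES = {"FLOAT", "DOUBLE"}
STRING_TYPES = {"STRING", "CHAR", "VARCHAR"}
EXACT_MATCH_TYPES = {"BOOLEAN", "DATE", "TIMESTAMP"}


@dataclass(frozen=True)
class ColType:
    name: str
    precision: Optional[int] = None  # only meaningful for DECIMAL
    scale: Optional[int] = None      # only meaningful for DECIMAL


def can_form_overlap_predicate(j: ColType, s: ColType) -> bool:
    """j is the hash-table column type, s is the scan column type."""
    # Families in which any member may be paired with any other member.
    for family in (INTEGER_TYPES, APPROX_NUMERIC_TYPES, STRING_TYPES):
        if j.name in family and s.name in family:
            return True
    # Decimals must agree on both precision and scale.
    if j.name == "DECIMAL" and s.name == "DECIMAL":
        return (j.precision, j.scale) == (s.precision, s.scale)
    # Booleans, dates and timestamps only pair with the same type.
    return j.name == s.name and j.name in EXACT_MATCH_TYPES
```

For example, a BIGINT join column paired with an INT scan column qualifies,
while DECIMAL(10,2) paired with DECIMAL(12,2) does not.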

As with existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option
does not apply to min/max filters created for overlap predicates.
Overlap predicates are always evaluated, after the min/max
conjuncts (if any).

Two new runtime profile counters report the number of row groups and
pages, respectively, filtered out by the overlap predicates:
  1. NumRuntimeFilteredRowGroups
  2. NumRuntimeFilteredPages

Testing:
1. Unit tested on various column types with TPCH and TPCDS tables.
   Benefits were significant when the join column on the outer table
   is sorted, or when the min/max boundary values of the pages or row
   groups are monotonic;
2. Added new tests in min_max_filters.test to demonstrate the number
   of filtered out pages and row groups with the two new profile counters;
3. Added new tests in runtime-filter-propagation.test to demonstrate
   that the overlap predicates work with different column types;
4. Added data type specific overlap method tests in
   min-max-filter-test.cc;
5. Core testing.

TBD in this patch:
1. Performance measurement.

To do in follow-up JIRAs:
1. Apply the overlap predicate on partition columns;
2. IR code-gen for various MinMaxFilter::EvalOverlap methods.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/disable-runtime-overlap-filter.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M tests/run-tests.py
36 files changed, 1,869 insertions(+), 252 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/39
--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: 

[Impala-ASF-CR] IMPALA-9975 (part 2): Introduce new admission control daemon

2020-12-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16891 )

Change subject: IMPALA-9975 (part 2): Introduce new admission control daemon
..


Patch Set 1:

(3 comments)

Nice refactoring. I had a couple of high-level questions.

http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/runtime/exec-env.cc@289
PS1, Line 289:   if (FLAGS_is_coordinator && 
FLAGS_admission_control_service_addr.empty()) {
IIRC the executors do actually need an AdmissionController to report the memory 
usage from the pool MemTrackers. Looking at the code there's a lot of 
indirection, but basically UpdateMemTrackerStats() gets the memory usage from 
the pool mem trackers and that gets added to the topic update.

I think this is not as relevant as it once was, but it does play a role when 
we're running without memory limits or where there's significant untracked 
memory. So we want to preserve this behaviour at least in the standard config.

We could consider disabling this when the admission control daemon is in use - 
it's less relevant when we don't need to deal with distributed admission 
control.


http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/scheduling/admissiond-main.cc
File be/src/scheduling/admissiond-main.cc:

http://gerrit.cloudera.org:8080/#/c/16891/1/be/src/scheduling/admissiond-main.cc@44
PS1, Line 44:   ABORT_IF_ERROR(daemon_env.Init(/* init_jvm */ false));
Does the admission controller need to load the llama-site.xml and 
fair-scheduler.xml configs? Wondering if it might actually need a JVM for that 
purpose.


http://gerrit.cloudera.org:8080/#/c/16891/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/16891/1/bin/start-impala-cluster.py@413
PS1, Line 413: 127.0.0.1
Maybe should be INTERNAL_LISTEN_HOST. Not sure it matters though.



--
To view, visit http://gerrit.cloudera.org:8080/16891

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Gerrit-Change-Number: 16891
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 22 Dec 2020 18:32:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16893 )

Change subject: IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7897/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16893

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I90942022edcd5d96c720a1bde53879e50394660a
Gerrit-Change-Number: 16893
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 22 Dec 2020 17:52:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages

2020-12-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16893 )

Change subject: IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages
..

IMPALA-6434: Add support to decode RLE_DICTIONARY encoded pages

This adds support for using this enum value in place of the
old PLAIN/PLAIN_DICTIONARY values.

A hidden option, -use_new_parquet_dictionary_encodings, is
added to enable writing with it as well, for test purposes only.
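
As a rough illustration of what accepting the new enum value means for a
reader, here is a Python sketch. It is not Impala's scanner code: the
function names are hypothetical, and the dictionary indexes are assumed to
have already been unpacked from the page's RLE/bit-packed form. The enum
numbers follow the Encoding enum in the Parquet format's thrift definition.

```python
from enum import IntEnum


class Encoding(IntEnum):
    # Subset of the Parquet Encoding enum.
    PLAIN = 0
    PLAIN_DICTIONARY = 2  # older, deprecated name for dictionary encoding
    RLE = 3
    RLE_DICTIONARY = 8    # newer name for dictionary-encoded data pages


def is_dictionary_encoded(encoding: Encoding) -> bool:
    """Treat the old and the new enum value as dictionary-encoded data."""
    return encoding in (Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY)


def decode_data_page(encoding: Encoding, dictionary: list,
                     indexes: list) -> list:
    """Map already-unpacked dictionary indexes back to their values."""
    if not is_dictionary_encoded(encoding):
        raise ValueError(f"not a dictionary encoding: {encoding!r}")
    return [dictionary[i] for i in indexes]
```

Either enum value leads to the same index lookup, which is why a reader can
accept both without a separate decode path in this simplified model.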

Testing:
* Added an automated test using a pregenerated test file.
* Ran core tests.
* Manually tested by writing out TPC-H lineitem with the new encoding
  and reading back in Impala and Hive.

Parquet-tools output for the generated test file:
$ hadoop jar 
~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta 
/test-warehouse/att/824de2afebad009f-6f460ade0003_643159826_data.0.parq
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with 
parallelism: 5
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/21 20:28:36 INFO hadoop.ParquetFileReader: Initiating action with 
parallelism: 5
file:
hdfs://localhost:20500/test-warehouse/att/824de2afebad009f-6f460ade0003_643159826_data.0.parq
creator: impala version 4.0.0-SNAPSHOT (build 
7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)

file schema: schema

id:  OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bool_col:OPTIONAL BOOLEAN R:0 D:1
tinyint_col: OPTIONAL INT32 L:INTEGER(8,true) R:0 D:1
smallint_col:OPTIONAL INT32 L:INTEGER(16,true) R:0 D:1
int_col: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
bigint_col:  OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
float_col:   OPTIONAL FLOAT R:0 D:1
double_col:  OPTIONAL DOUBLE R:0 D:1
date_string_col: OPTIONAL BINARY R:0 D:1
string_col:  OPTIONAL BINARY R:0 D:1
timestamp_col:   OPTIONAL INT96 R:0 D:1
year:OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
month:   OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1

row group 1: RC:8 TS:754 OFFSET:4

id:              INT32 SNAPPY DO:4 FPO:48 SZ:74/73/0.99 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 7, num_nulls: 0]
bool_col:        BOOLEAN SNAPPY DO:0 FPO:141 SZ:26/24/0.92 VC:8 ENC:RLE,PLAIN ST:[min: false, max: true, num_nulls: 0]
tinyint_col:     INT32 SNAPPY DO:220 FPO:243 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
smallint_col:    INT32 SNAPPY DO:343 FPO:366 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
int_col:         INT32 SNAPPY DO:467 FPO:490 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 1, num_nulls: 0]
bigint_col:      INT64 SNAPPY DO:586 FPO:617 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0, max: 10, num_nulls: 0]
float_col:       FLOAT SNAPPY DO:724 FPO:747 SZ:51/47/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 1.1, num_nulls: 0]
double_col:      DOUBLE SNAPPY DO:845 FPO:876 SZ:59/55/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: -0.0, max: 10.1, num_nulls: 0]
date_string_col: BINARY SNAPPY DO:983 FPO:1028 SZ:74/88/1.19 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30312F30312F3039, max: 0x30342F30312F3039, num_nulls: 0]
string_col:      BINARY SNAPPY DO:1143 FPO:1168 SZ:53/49/0.92 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 0x30, max: 0x31, num_nulls: 0]
timestamp_col:   INT96 SNAPPY DO:1261 FPO:1329 SZ:98/138/1.41 VC:8 ENC:RLE,RLE_DICTIONARY ST:[num_nulls: 0, min/max not defined]
year:            INT32 SNAPPY DO:1451 FPO:1470 SZ:47/43/0.91 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 2009, max: 2009, num_nulls: 0]
month:           INT32 SNAPPY DO:1563 FPO:1594 SZ:60/56/0.93 VC:8 ENC:RLE,RLE_DICTIONARY ST:[min: 1, max: 4, num_nulls: 0]

Parquet-tools output for one of the lineitem files:
$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/li2/4b4d9143c575dd71-3f69d3cf0001_1879643220_data.0.parq
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: reading another 1 footers
20/12/22 09:39:56 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
file: hdfs://localhost:20500/test-warehouse/li2/4b4d9143c575dd71-3f69d3cf0001_1879643220_data.0.parq
creator: impala version 4.0.0-SNAPSHOT (build 7b691c5d4249f0cb1ced8ddf01033fbbe10511d9)

file schema: schema

l_orderkey:   OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_partkey:    OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_suppkey:    OPTIONAL INT64 L:INTEGER(64,true) R:0 D:1
l_linenumber: OPTIONAL INT32 L:INTEGER(32,true) R:0 D:1
l_quantity:  

[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16895 )

Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7896/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16895
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 22 Dec 2020 17:20:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16895 )


Change subject: IMPALA-9550: Fix flakiness in 
TestResultSpoolingFetchSize.test_fetch
..

IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch

TestResultSpoolingFetchSize.test_fetch has been flaky in the
ubuntu-16.04-dockerised environment because it does not always reach
the finished state within 10 seconds. This patch increases the test
timeout to 30 seconds.

Testing:
- Looped the test locally.

Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
---
M tests/query_test/test_result_spooling.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16895/1
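
For readers following the fix, a minimal sketch of the poll-with-timeout pattern that this kind of test relies on. The helper name wait_for_state, its parameters, and the state strings are hypothetical and are not taken from tests/query_test/test_result_spooling.py; the patch itself only changes the timeout value from 10 to 30 seconds.

```python
import time


def wait_for_state(get_state, expected, timeout_s, poll_interval_s=0.1):
    """Poll get_state() until it returns `expected` or `timeout_s` elapses.

    Hypothetical helper for illustration only. Returns True if the expected
    state was observed before the deadline, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Simulated query that needs a few polls before it reports FINISHED.
states = iter(["RUNNING", "RUNNING", "FINISHED"])
reached = wait_for_state(lambda: next(states), "FINISHED", timeout_s=30,
                         poll_interval_s=0.01)
print(reached)
```

A longer timeout does not slow down the passing case, since the loop returns as soon as the state is reached; it only delays the failure report on a slow machine.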
--
To view, visit http://gerrit.cloudera.org:8080/16895

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5
Gerrit-Change-Number: 16895
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 4:

> Patch Set 4: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/

Hit by IMPALA-9550.
Ironically, I just closed IMPALA-9550 yesterday as not reproducible. I will 
reopen the JIRA.


--
To view, visit http://gerrit.cloudera.org:8080/16884

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 15:46:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString

2020-12-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16884 )

Change subject: IMPALA-10374: Limit iteration at 
BufferedTupleStream::DebugString
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/


--
To view, visit http://gerrit.cloudera.org:8080/16884

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d
Gerrit-Change-Number: 16884
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 22 Dec 2020 10:45:30 +
Gerrit-HasComments: No