[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
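
A minimal sketch of the selectivity step described above, in Java. It assumes the
generic join cardinality is the textbook equi-join estimate
|probe| * |build| / max(NDV(probe key), NDV(build key)); whether
JoinNode.computeGenericJoinCardinality() matches this exactly is an assumption.
The class and method names are hypothetical and are not part of the patch.

```java
// Hypothetical sketch; names are illustrative, not Impala's actual API.
final class RuntimeFilterSelectivitySketch {

  // Assumed textbook equi-join estimate:
  // |probe| * |build| / max(NDV(probe key), NDV(build key)).
  static double genericJoinCardinality(
      long probeCard, long buildCard, long probeNdv, long buildNdv) {
    // Guard against a zero NDV so the division is always defined.
    double maxNdv = Math.max(1L, Math.max(probeNdv, buildNdv));
    return (double) probeCard * (double) buildCard / maxNdv;
  }

  // A runtime filter can only remove rows, so the estimate is capped at the
  // unfiltered scan cardinality.
  static long filteredScanCardinality(
      long scanCard, long buildCard, long scanNdv, long buildNdv) {
    double joinCard = genericJoinCardinality(scanCard, buildCard, scanNdv, buildNdv);
    return Math.min(scanCard, Math.round(joinCard));
  }

  public static void main(String[] args) {
    // Reducing join: 1,000,000 probe rows against a 100-row build side.
    System.out.println(filteredScanCardinality(1_000_000L, 100L, 1_000L, 100L));
    // Non-reducing join: the estimate exceeds the scan, so nothing is reduced.
    System.out.println(filteredScanCardinality(1_000_000L, 5_000L, 1_000L, 1_000L));
  }
}
```

With these made-up inputs the first call yields 100000 and the second leaves the
scan at 1000000, which is why only reducing joins are worth considering.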

The reduced cardinality is stored in new fields 'filteredCardinality_'
and 'filteredInputCardinality_', separate from existing fields
'cardinality_' and 'inputCardinality_'. Future work should merge the new
cardinality fields with the old cardinality fields after we can validate
that the cardinality reduction does not regress memory estimation.
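
The separation of fields can be pictured with the small Java sketch below. It
assumes, for illustration only, that memory estimation keeps reading the
original field while cost computation reads the filtered one. The class, the
per-row constants, and the two estimate methods are hypothetical; only the field
names 'cardinality_' and 'filteredCardinality_' come from the message above.

```java
// Hypothetical sketch; not the actual PlanNode/ScanNode code.
final class FilteredCardinalitySketch {

  static final class ScanNodeSketch {
    private final long cardinality_;          // original estimate, never overwritten
    private long filteredCardinality_ = -1L;  // -1 means "no reduction applied"

    ScanNodeSketch(long cardinality) { cardinality_ = cardinality; }

    // Store the reduced estimate without touching cardinality_.
    void setFilteredCardinality(long value) {
      filteredCardinality_ = Math.min(cardinality_, Math.max(0L, value));
    }

    long getCardinality() { return cardinality_; }

    // Falls back to the original estimate when no reduction was applied.
    long getFilteredCardinality() {
      return filteredCardinality_ >= 0 ? filteredCardinality_ : cardinality_;
    }

    // Assumed: memory estimation stays conservative by using cardinality_.
    long estimateMemoryBytes(long bytesPerRow) { return cardinality_ * bytesPerRow; }

    // Assumed: cost computation benefits from the reduced estimate.
    long estimateProcessingCost(long costPerRow) {
      return getFilteredCardinality() * costPerRow;
    }
  }

  public static void main(String[] args) {
    ScanNodeSketch scan = new ScanNodeSketch(1_000_000L);
    scan.setFilteredCardinality(100_000L);
    System.out.println("memory bytes: " + scan.estimateMemoryBytes(16L));
    System.out.println("processing cost: " + scan.estimateProcessingCost(2L));
  }
}
```

Keeping both values side by side means the reduction can be dropped or merged
later without changing what the memory estimate sees today.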

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to lower ProcessingCost, lower scan fragment parallelism, lower CpuAsk,
and a higher chance of the query being assigned to a smaller executor
group set. Other execution modes will see no change in their execution
parallelism or memory estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, with a range of [0.0..1.0],
that controls the scale of cardinality reduction from runtime filter
analysis. It helps with benchmarking and allows disabling cardinality
reduction if needed (by setting it to 0.0). It defaults to 1.0.
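
One way such a scale could behave is sketched below in Java. The message only
states the range, the default, and that 0.0 disables the reduction; the linear
interpolation between the original and the fully reduced estimate is an
assumption made for this sketch, and the class and method names are
hypothetical.

```java
// Hypothetical sketch; the interpolation rule is an assumption, not the patch's code.
final class ReductionScaleSketch {

  // Reject values outside the documented [0.0..1.0] range.
  static double checkScale(double scale) {
    if (Double.isNaN(scale) || scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be within [0.0..1.0]: " + scale);
    }
    return scale;
  }

  // scale = 0.0 keeps the original estimate (reduction disabled);
  // scale = 1.0 applies the full reduction.
  static long applyScale(long originalCard, long reducedCard, double scale) {
    checkScale(scale);
    long reduction = Math.max(0L, originalCard - reducedCard);
    return originalCard - Math.round(reduction * scale);
  }

  public static void main(String[] args) {
    System.out.println(applyScale(1_000_000L, 100_000L, 0.0));
    System.out.println(applyScale(1_000_000L, 100_000L, 0.5));
    System.out.println(applyScale(1_000_000L, 100_000L, 1.0));
  }
}
```

Under this assumption the three calls give 1000000, 550000, and 100000, so
intermediate values let a benchmark dial the reduction in gradually.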

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Ran the full TPC-DS 3TB benchmark and saw no regression due to
  query plan changes.
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Reviewed-on: http://gerrit.cloudera.org:8080/20498
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 20: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 19 Dec 2023 04:27:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 19: Code-Review+2


Gerrit-Comment-Date: Mon, 18 Dec 2023 23:43:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#19).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10073/ 
DRY_RUN=false


Gerrit-Comment-Date: Mon, 18 Dec 2023 23:46:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 20: Code-Review+2


Gerrit-Comment-Date: Mon, 18 Dec 2023 23:46:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14755/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Sat, 16 Dec 2023 04:49:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 18:

ps18 is a rebase to resolve a merge conflict.


Gerrit-Comment-Date: Sat, 16 Dec 2023 04:28:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#18).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14744/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Fri, 15 Dec 2023 18:36:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG@12
PS16, Line 12: than
> nit: than
Done


http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@306
PS16, Line 306: getCardinality()
> I found a potential issue in resource estimation of EXCHANGE node.
ps17 stores the reduced cardinality in separate fields instead of replacing the 
cardinality_ and inputCardinality_ fields.
Therefore, memory estimation remains the same before and after the patch, even if 
this feature is enabled by default.



Gerrit-Comment-Date: Fri, 15 Dec 2023 18:14:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-15 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#17).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/16//COMMIT_MSG@12
PS16, Line 12: that
nit: than


http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/16/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@306
PS16, Line 306: getCardinality()
I found a potential issue in the resource estimation of the EXCHANGE node.
If getCardinality() is abnormally low here (and in 
estimateDeferredRPCQueueSize()) after runtime filter reduction, 
estimatedTotalQueueByteSize may be severely underestimated.

SCAN and JOIN nodes are not impacted.
The SCAN node estimate is based on the scan range count before runtime filter 
reduction.
The JOIN node estimate uses cardinality, but that of the build side.
This patch operates on the probe pipeline, so there is no impact on the JOIN 
memory estimate.

I think this patch should be modified to use the reduced cardinality for 
ProcessingCost only. Resource estimation should keep using the original 
cardinality from before runtime filtering so that the memory estimate stays 
conservative.



Gerrit-Comment-Date: Thu, 14 Dec 2023 23:14:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14651/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Tue, 12 Dec 2023 01:17:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 16:

(1 comment)

ps16 is a rebase.

http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java@100
PS15, Line 100: consume
> Nit: consumes
Done



Gerrit-Comment-Date: Tue, 12 Dec 2023 00:51:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-11 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#16).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
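
The reduction step described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: the selectivity is taken
here as the ratio of a join's estimated output cardinality to its probe
input cardinality, and the class and method names are hypothetical rather
than the planner's actual code.

```java
/** Hypothetical sketch; not the code in RuntimeFilterGenerator or ScanNode. */
final class FilterReductionSketch {
  /**
   * Assumed selectivity of a runtime filter: join output rows over probe
   * input rows, capped at 1.0. Unknown inputs (negative or zero) yield 1.0,
   * which means no reduction.
   */
  static double selectivity(long joinCardinality, long probeCardinality) {
    if (joinCardinality < 0 || probeCardinality <= 0) return 1.0;
    return Math.min(1.0, (double) joinCardinality / probeCardinality);
  }

  /**
   * Applies a selectivity to a scan cardinality. Unknown (-1) or empty
   * cardinalities are returned unchanged; otherwise at least 1 row is kept.
   */
  static long applySelectivity(long scanCardinality, double selectivity) {
    if (scanCardinality <= 0) return scanCardinality;
    double clamped = Math.max(0.0, Math.min(1.0, selectivity));
    return Math.max(1L, (long) Math.ceil(scanCardinality * clamped));
  }
}
```

For example, a join estimated to output 250 rows from a 1000-row probe side
gives a selectivity of 0.25, which would lower a 1000-row scan estimate to
250 rows.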

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism,
but might see lower resource estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a value in the range
[0.0, 1.0] that controls the cardinality reduction scale from runtime
filter analysis to help with benchmarking and to disable cardinality
reduction if needed (by setting it to 0.0). It defaults to 1.0.
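
One way to picture the scale option is as a linear interpolation between the
original and the fully reduced estimate. That interpretation is an
assumption made for illustration; it matches the two documented endpoints
(0.0 disables the reduction, 1.0 applies it in full) but the patch may
compute intermediate values differently. The names below are hypothetical.

```java
/** Hypothetical sketch of applying a reduction scale in [0.0, 1.0]. */
final class ReductionScaleSketch {
  /**
   * Returns 'original' when scale is 0.0 and 'fullyReduced' when scale is
   * 1.0. Intermediate scales remove a proportional share of the reduction.
   * Assumes 0 <= fullyReduced <= original.
   */
  static long scaledCardinality(long original, long fullyReduced, double scale) {
    if (scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be in [0.0, 1.0]: " + scale);
    }
    long reduction = original - fullyReduced;
    // Round the removed share down so the result stays conservative.
    return original - (long) Math.floor(reduction * scale);
  }
}
```

Under this assumption, an estimate of 1000 rows that the filter would reduce
to 100 rows becomes 550 rows at a scale of 0.5.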

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-11 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 15: Code-Review+1

(1 comment)

Just a nit, otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/15/fe/src/main/java/org/apache/impala/planner/ScanNode.java@100
PS15, Line 100: consume
Nit: consumes



Gerrit-Comment-Date: Mon, 11 Dec 2023 14:28:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14605/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Wed, 06 Dec 2023 20:58:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-06 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2003
PS14, Line 2003: estScanRangeAfterRuntimeFilter(), 
getEffectiveNumScanRanges()));
> Spelling this out to number of scan range might be better. That way, user c
Done


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@812
PS14, Line 812: its
> Nit: its.
Done


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@569
PS14, Line 569: long scanCardinalityA
> Learning from IMPALA-12510, think this should be capped to estimate that at
Done


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578
PS14, Line 578:
> This should be ceil.
Done



Gerrit-Comment-Date: Wed, 06 Dec 2023 20:32:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-06 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#15).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism,
but might see lower resource estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a value in the range
[0.0, 1.0] that controls the cardinality reduction scale from runtime
filter analysis to help with benchmarking and to disable cardinality
reduction if needed (by setting it to 0.0). It defaults to 1.0.

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-06 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java@498
PS14, Line 498: planCtx
> Is there a reason you pass a PlannerContext instead of the reduction scale
I'd like to follow the precedent set by Planner.computeProcessingCost() and 
Planner.computeResourceReqs().
Both take a PlannerContext as a parameter and unpack the query options they 
need inside the method.
Frontend.createExecRequest() also reads more cleanly this way.


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@569
PS14, Line 569: scanRangeSelectivity_
Learning from IMPALA-12510, I think this should be capped so that the estimate 
allows for at least 1 scan range being read after filtering.


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578
PS14, Line 578: Math.round
This should be ceil.
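
A minimal sketch of the two points above (a floor of one scan range, and 
rounding up rather than to nearest). The class and method names are 
hypothetical and are not the actual ScanNode code.

```java
/** Hypothetical sketch; not the code at ScanNode.java lines 569-578. */
final class ScanRangeEstimateSketch {
  /**
   * Estimates how many scan ranges remain after runtime filtering.
   * Rounds up, and keeps at least 1 scan range whenever there is
   * anything to scan at all.
   */
  static long estScanRangesAfterFilter(long numScanRanges, double scanRangeSelectivity) {
    if (numScanRanges <= 0) return 0L;
    double clamped = Math.max(0.0, Math.min(1.0, scanRangeSelectivity));
    long estimate = (long) Math.ceil(numScanRanges * clamped);
    // Cap from below at 1 and from above at the unfiltered count.
    return Math.max(1L, Math.min(numScanRanges, estimate));
  }
}
```

With 10 scan ranges and a selectivity of 0.04, Math.round would give 0 scan 
ranges, while rounding up gives 1. A selectivity of 0.0 also yields 1 because 
of the lower cap.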



Gerrit-Comment-Date: Wed, 06 Dec 2023 19:52:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-06 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315
PS12, Line 315:   TQueryOptions options;
> The MAKE_OPTIONDEF macro errors out when I move options later.
Ok, then MAKE_OPTIONDEF somehow uses 'options'.


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/Planner.java@498
PS14, Line 498: planCtx
Is there a reason you pass a PlannerContext instead of the reduction scale as 
before? If only the reduction scale is needed, I think it's cleaner to only 
pass that.


http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@812
PS14, Line 812: it's
Nit: its.



Gerrit-Comment-Date: Wed, 06 Dec 2023 13:58:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-12-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2003
PS14, Line 2003: String.format(" scan-range-selectivity=%.3f", 
scanRangeSelectivity_)
Spelling this out as the number of scan ranges might be better. That way, users 
can directly compare this against the 'ScanRangesComplete' counter in the profile.
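
A rough sketch of what that could look like. The label text and the class and 
method names are hypothetical; the actual explain output format chosen in the 
patch may differ.

```java
/** Hypothetical sketch of an explain-string fragment; the label is made up. */
final class ScanRangeExplainSketch {
  /**
   * Prints an absolute scan range count next to the unfiltered count,
   * instead of a fractional selectivity.
   */
  static String formatScanRanges(long estAfterFilter, long totalScanRanges) {
    // Locale.ROOT keeps the digits stable regardless of the JVM default locale.
    return String.format(java.util.Locale.ROOT,
        " est-scan-range=%d(filtered from %d)", estAfterFilter, totalScanRanges);
  }
}
```

A count such as 3 out of 24 can then be checked directly against the number 
of scan ranges the query profile reports as completed.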



Gerrit-Comment-Date: Mon, 04 Dec 2023 18:32:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14523/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Comment-Date: Mon, 27 Nov 2023 23:32:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315
PS12, Line 315:   TQueryOptions options;
> Done
The MAKE_OPTIONDEF macro errors out when I move options later.
ps14 moves options back to the beginning of the test.



Gerrit-Comment-Date: Mon, 27 Nov 2023 23:07:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#14).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism,
but might see lower resource estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a value in the range
[0.0, 1.0] that controls the cardinality reduction scale from runtime
filter analysis to help with benchmarking and to disable cardinality
reduction if needed (by setting it to 0.0). It defaults to 1.0.

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 13:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14522/ : Initial code 
review checks failed. See linked job for details on the failure.


Gerrit-Comment-Date: Mon, 27 Nov 2023 23:05:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 13:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31
PS12, Line 31: mode
> Nit: modes.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35
PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE
> Could you elaborate on how it can be used (what values are valid, what they
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315
PS12, Line 315:   // List of pairs of Key and boolean flag on whether the 
option is inclusive of 0 and 1.
> I'd put it right before the loop.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316
PS12, Line 316:   pair, bool> case_set[]{
> This is not used.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170
PS12, Line 1170: ;
> No need for the 'float' literal, especially as it is a double.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837
PS12, Line 837: exceed
> Nit: exceeds.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849
PS12, Line 849: Default
> Nit: Defaults.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474
PS12, Line 474: only applies at HDFS columnar file
> Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also
Yes. Mentioned Kudu in comment.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485
PS12, Line 485: reduc
> Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lowe
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486
PS12, Line 486: at
> Nit: is?
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS12, Line 488: below
> It can be a bit confusing that this join node is at the bottom of the stack
Added "in node tree".


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS12, Line 514: this, reducedCardinality, partitionSelectivities);
> I think that it would be more readable if the the body of the loop would be
Added RuntimeFilter.reducedCardinalityForScanNode().


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531
PS12, Line 531: dinality, scaledPartSel);
  :   }
> I am not sure if this is correct - all max selectivities per column will be
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561
PS12, Line 561:
> Can it be multiple join nodes or just one?
There can be multiple join nodes in nodeStack, and they are connected to each 
other along the probe pipeline.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565
PS12, Line 565:
> Could you also mention 'reductionScale' here?
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578
PS12, Line 578: dinality_ * scanRangeSelectivity_
> Could extract this as 'currentCardinality'.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591
PS12, Line 591: // with the least
> Shouldn't it be 'partitionSelectivity'? Or is it the same thing?
It is the same thing. I named it scanRangeSelectivity_ because, from the 
ScanNode perspective, it is applied to reduce the estimated number of scan 
ranges that survive the file-level filter and are actually read.

A scan range might be one range per file or multiple ranges per file, depending 
on the target file system.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609
PS12, Line 609:
> Why do we leave out the last one?
Clarified in the new comment below this.


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-27 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#13).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
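
For readers unfamiliar with that formula, here is a minimal Python sketch of
the textbook equi-join estimate, |L| * |R| / max(NDV(L.c), NDV(R.d)), and of
one plausible way to turn it into a reduced scan estimate. The function names,
the clamping to the unfiltered estimate, and the numbers are assumptions made
for illustration; this is not the code in ScanNode.java or JoinNode.java.

```python
import math


def generic_join_cardinality(lhs_card, rhs_card, lhs_ndv, rhs_ndv):
    """Textbook equi-join estimate: |L| * |R| / max(NDV(L.c), NDV(R.d))."""
    if min(lhs_card, rhs_card, lhs_ndv, rhs_ndv) <= 0:
        raise ValueError("cardinalities and NDVs must be positive")
    return lhs_card * rhs_card / max(lhs_ndv, rhs_ndv)


def filtered_scan_cardinality(scan_card, scan_key_ndv, build_card, build_key_ndv):
    """Hypothetical helper: rows expected to survive one runtime filter."""
    join_card = generic_join_cardinality(
        scan_card, build_card, scan_key_ndv, build_key_ndv)
    # A filter can only remove rows, so never exceed the unfiltered estimate.
    return min(scan_card, math.ceil(join_card))


# Made-up statistics, not taken from any test in this change.
print(filtered_scan_cardinality(1_000_000, 1_000, 50, 50))
```

With these made-up inputs, a 1,000,000-row probe-side scan whose key has 1,000
distinct values, filtered by a 50-row build side with 50 distinct keys, is
estimated at 50,000 rows.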

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to a smaller executor group set.
Other execution modes will see no change in their execution parallelism,
but might see lower resource estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE, a value in the range
[0.0..1.0] that controls the cardinality reduction scale from runtime
filter analysis. It helps with benchmarking and can disable cardinality
reduction if needed (by setting it to 0.0). Defaults to 1.0.
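
One way to picture such a scale is as a linear interpolation between the
original estimate and the fully reduced one. The Python sketch below assumes
exactly that; the function name and the interpolation rule are illustrative
guesses, not the logic in this patch. Only the [0.0..1.0] range, the "0.0
disables" behavior, and the 1.0 default come from the text above.

```python
def apply_reduction_scale(original, reduced, scale=1.0):
    """Hypothetical helper: blend the original and reduced estimates.

    scale=0.0 keeps the original estimate (reduction disabled);
    scale=1.0 applies the full reduction.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be within [0.0, 1.0]")
    if reduced > original:
        raise ValueError("reduced estimate must not exceed the original")
    # Assumption: remove only a 'scale' fraction of the estimated reduction.
    return original - int((original - reduced) * scale)


for scale in (0.0, 0.5, 1.0):
    print(scale, apply_reduction_scale(1000, 100, scale))
```

Under this assumed rule, an original estimate of 1000 rows and a reduced
estimate of 100 rows give 1000, 550, and 100 rows for scales 0.0, 0.5, and 1.0.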

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-21 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 12:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31
PS12, Line 31: mode
Nit: modes.


http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35
PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE
Could you elaborate on how it can be used (what values are valid, what they 
mean) and what the default value is?


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315
PS12, Line 315:   TQueryOptions options;
I'd put it right before the loop.


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316
PS12, Line 316:   QueryConstants qc;
This is not used.


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170
PS12, Line 1170: f
No need for the 'float' literal, especially as it is a double.


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837
PS12, Line 837: exceed
Nit: exceeds.


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849
PS12, Line 849: Default
Nit: Defaults.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474
PS12, Line 474: only applies at HDFS columnar file
Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also 
included, isn't it?


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485
PS12, Line 485: lower
Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lower 
estimate", i.e. a lower bound.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486
PS12, Line 486: in
Nit: is?


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS12, Line 488: below
It can be a bit confusing that this join node is at the bottom of the stack but 
there are nodes below it. I know other nodes are below it in the node tree, not 
in the stack, but this could be made explicit.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561
PS12, Line 561: join nodes
Can it be multiple join nodes or just one?


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565
PS12, Line 565:* the simplest join cardinality formula from 
JoinNode.computeGenericJoinCardinality().
Could you also mention 'reductionScale' here?


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578
PS12, Line 578: nodeStack.get(i).getCardinality()
Could extract this as 'currentCardinality'.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591
PS12, Line 591: scanRangeSelectivity_
Shouldn't it be 'partitionSelectivity'? Or is it the same thing?


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609
PS12, Line 609: > 1
Why do we leave out the last one?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 21 Nov 2023 10:18:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS12, Line 514: long buildKeyNdv = filter.getNdvEstimate();
I think that it would be more readable if the body of the loop were a 
separate function. If I understand correctly, it has a pretty clear role of 
computing a cardinality per filter and updating the partitionSelectivities. It 
may also make sense to move the function to RuntimeFilter, e.g. 
CalculateSelectivityForScanNode().


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531
PS12, Line 531: If column name is unknown, compare the selectivity
  :   // against filters from other unknown column (colName 
== "").
I am not sure if this is correct - all max selectivities per column will be 
multiplied later, and it is possible that the "unknown" selectivity is related 
to a column already in the list, just wrapped in some expression. It may be 
safer to ignore these partition filters for now.
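
To illustrate the concern, here is a small Python sketch. It assumes that "max
selectivity per column" means keeping the most selective filter (the smallest
surviving fraction) per column and then multiplying across columns; the helper
name, the skip_unknown flag, and the numbers are made up for illustration and
are not the code under review.

```python
def combine_partition_selectivities(filters, skip_unknown=False):
    """Hypothetical helper. filters: iterable of (column_name, selectivity).

    Keeps the most selective filter per column, then multiplies the
    per-column values as if the columns were independent.
    """
    per_column = {}
    for col, sel in filters:
        if skip_unknown and col == "":
            continue  # ignore filters whose target column is unknown
        per_column[col] = min(sel, per_column.get(col, 1.0))
    product = 1.0
    for sel in per_column.values():
        product *= sel
    return product


# The second filter targets the same column wrapped in an expression,
# so its column name is recorded as unknown ("").
filters = [("ss_sold_date_sk", 0.1), ("", 0.1)]
print(combine_partition_selectivities(filters))
print(combine_partition_selectivities(filters, skip_unknown=True))
```

In this made-up case the unknown bucket is multiplied in as if it were an
independent column, giving roughly 0.01 instead of 0.1. Ignoring the unknown
filters avoids that double counting at the cost of a more conservative
estimate.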


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@11
PS12, Line 11: ss_sold_date_sk = d_date_sk
Can you also add a test where there is an expression in the key, to test the 
relevant code path? Also, a similar test could be added with a partition key.


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@91
PS12, Line 91: partitions=1824/182
Is it possible to see the effect of runtime filters on partition selectivity in 
some test?


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@299
PS12, Line 299: left
Can you also add a test for left semi join?
Also, it seems strange that left anti join is supported, as it is not included 
in isSelectiveAndReducingJoin.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 20 Nov 2023 18:13:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14474/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 Nov 2023 20:29:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 12:

(4 comments)

Patch set 12 changes the workload of testRuntimeFilterCardinalityReduction to 
tpcds_parquet to include row-level runtime filters in cardinality reduction.

http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@26
PS11, Line 26: ss_sold_date_sk = d_date_sk
> Can you add a test with more than 1 equi join predicates?
Added a test with an extra predicate (sr_returned_date_sk = d_date_sk).


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@40
PS11, Line 40: |--01:SCAN HDFS [tpcds_parquet.store_returns]
 : | HDFS partitions=1/1 files=1 size=15.43MB
 : | row-size=16B cardinality=287.51K
 : |
> Would the planner also reduce build side scan node cardinality if there was
Added a test that also joins against time_dim for this.
The build side should see no cardinality reduction if it is only a scan node.


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@46
PS11, Line 46:  RF001 -> ss_sold_date_sk
> Can you add an example with more than 1 runtime filters consumed by the sca
In the new test that also joins against time_dim, the cardinality estimate is 
reduced further to 61.


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@48
PS11, Line 48:
> Are the parallel/distributed plans useful in the tests? At the first glance
Removed DISTRIBUTEDPLAN and PARALLELPLANS.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 Nov 2023 20:07:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-17 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#12).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to a smaller executor group set.
Other execution modes will see no change in their execution parallelism,
but might see lower resource estimates.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025
PS7, Line 1025: isLeftOuterJoin
> Left outer join may still eligible to be included in nodeStack if it is sel
Another join type: shouldn't this be applicable to semi join?


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS7, Line 488:  the least output cardinality
> Added testRuntimeFilterCardinalityReduction.
Thanks, it is much more understandable for me now!


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@26
PS11, Line 26: ss_sold_date_sk = d_date_sk
Can you add a test with more than one equi-join predicate?


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@40
PS11, Line 40: |--01:SCAN HDFS [tpcds.store_returns]
 : | HDFS partitions=1/1 files=1 size=31.19MB
 : | row-size=16B cardinality=287.51K
 : |
Would the planner also reduce build side scan node cardinality if there was a 
bloom filter consumed there? Can you add a test for this?


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@46
PS11, Line 46:  RF000 -> ss_sold_date_sk
Can you add an example with more than one runtime filter consumed by the scanner?


http://gerrit.cloudera.org:8080/#/c/20498/11/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@48
PS11, Line 48: DISTRIBUTEDPLAN
Are the parallel/distributed plans useful in the tests? At first glance, 
they are just adding noise.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 Nov 2023 16:21:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14467/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 Nov 2023 00:40:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-16 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 11:

Ran exhaustive tests overnight and fixed 2 more tests in patch set 11.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 Nov 2023 00:18:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-16 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#11).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
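
As a rough illustration of the paragraph above, the sketch below shows how a
join cardinality estimate could be used to cap a scan estimate. It assumes the
textbook equi-join formula |probe| * |build| / max(NDV); the class and method
names are hypothetical and are not taken from the patch, and the real
JoinNode.computeGenericJoinCardinality() may differ in its details.

```java
/** Hypothetical helper for illustration; not the actual Impala planner code. */
final class FilterSelectivitySketch {

  /**
   * Textbook equi-join estimate: |probe| * |build| / max(NDV(probe key), NDV(build key)).
   * ASSUMPTION: this is the shape of the "generic" formula; the real method may differ.
   */
  static long genericJoinCardinality(
      long probeCard, long buildCard, long probeKeyNdv, long buildKeyNdv) {
    // Guard against a zero NDV so the division is always defined.
    long maxNdv = Math.max(1L, Math.max(probeKeyNdv, buildKeyNdv));
    return (long) Math.ceil((double) probeCard * buildCard / maxNdv);
  }

  /** Caps the scan estimate at the estimated join output; never raises it. */
  static long reduceScanCardinality(
      long scanCard, long buildCard, long scanKeyNdv, long buildKeyNdv) {
    long joinCard = genericJoinCardinality(scanCard, buildCard, scanKeyNdv, buildKeyNdv);
    return Math.min(scanCard, joinCard);
  }

  public static void main(String[] args) {
    // 1,000,000 probe rows with 10,000 distinct keys, joined to 100 build rows
    // with 100 distinct keys: 1,000,000 * 100 / 10,000 = 10,000.
    System.out.println(reduceScanCardinality(1_000_000L, 100L, 10_000L, 100L)); // prints 10000
  }
}
```

Taking the minimum keeps the sketch conservative: an expanding join leaves the
scan estimate unchanged rather than inflating it.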

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
toward ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).
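
The commit message only states that setting the option to 0.0 disables the
reduction. The sketch below shows one plausible way such a scale could work,
assuming (this is an assumption, not something confirmed in this thread) that
the value lies in [0.0, 1.0] and linearly interpolates between the original
and the fully reduced estimate. All names are hypothetical.

```java
/** Hypothetical illustration of a reduction scale; not the actual Impala code. */
final class ReductionScaleSketch {

  /**
   * ASSUMPTION: scale 0.0 keeps the original estimate, 1.0 applies the full
   * reduction, and values in between interpolate linearly.
   */
  static long applyScale(long originalCard, long reducedCard, double scale) {
    if (scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be within [0.0, 1.0]");
    }
    double scaled = originalCard - (originalCard - reducedCard) * scale;
    return Math.round(scaled);
  }

  public static void main(String[] args) {
    System.out.println(applyScale(1_000L, 100L, 0.0)); // reduction disabled
    System.out.println(applyScale(1_000L, 100L, 0.5)); // halfway between both estimates
    System.out.println(applyScale(1_000L, 100L, 1.0)); // full reduction
  }
}
```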

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14458/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 16 Nov 2023 01:20:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14456/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 16 Nov 2023 01:04:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 10:

Patch set 10 adds an exception about inputCardinalityEst for Kudu.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 16 Nov 2023 00:54:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#10).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
toward ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461
PS7, Line 461:  columnar sc
> Changed this variable name to evalAtRowLevel to better represent the intent
Patch set 9 includes the Kudu scanner as well. I tested it myself and saw that,
most of the time, the reduced cardinality is still above the actual number of
rows returned by Kudu.


http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429
PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 
03:HASH JOIN into
 : # an expanding join and makes probe pipeline ineligible for 
cardinality reduction.
> Still unsure about this testcase. It looks like reducing cardinality of 00:
I stand by the current algorithm: nodeStack must not be empty, and scan node
cardinality reduction should only involve runtime filters coming from join
nodes in nodeStack. It might miss some reduction opportunities, but it is safer
to reason about the propagation.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 16 Nov 2023 00:43:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#9).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
toward ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).

Testing:
- Add fe test testRuntimeFilterCardinalityReduction and
  testRuntimeFilterCardinalityReductionOnKudu
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction-on-kudu.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 8:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025
PS7, Line 1025: isLeftOuterJoin
> Can you explain why is it supported for left outer join? We shouldn't have
A left outer join may still be eligible for inclusion in nodeStack if it is
selective and reducing (output cardinality < input cardinality), even though
it does not produce a runtime filter for the leftmost scan.

Note that this method exists to clear the nodeStack state if the traversal
arrives at an expanding join.
runtime-filter-cardinality-reduction.test shows an example.
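
The stack-clearing behavior described in this reply can be sketched roughly as
follows. This is a simplified model under stated assumptions: the Join type,
its fields, and collectNodeStack() are hypothetical, exchanges are omitted, and
"reducing" is taken to mean output cardinality < probe input cardinality as
stated above. It is not the actual planner code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the probe-pipeline traversal; not the actual Impala code. */
final class ProbePipelineSketch {

  /** Simplified stand-in for a join node on the probe pipeline. */
  static final class Join {
    final String label;
    final long probeInputCard;
    final long outputCard;

    Join(String label, long probeInputCard, long outputCard) {
      this.label = label;
      this.probeInputCard = probeInputCard;
      this.outputCard = outputCard;
    }

    /** Reducing means the join emits fewer rows than its probe side feeds it. */
    boolean isReducing() {
      return outputCard < probeInputCard;
    }
  }

  /**
   * Walks the joins from the top of the pipeline down toward the scan. A
   * reducing join is pushed; an expanding join clears everything collected so
   * far, so only the contiguous run of reducing joins nearest the scan survives.
   */
  static Deque<Join> collectNodeStack(List<Join> joinsTopDown) {
    Deque<Join> nodeStack = new ArrayDeque<>();
    for (Join join : joinsTopDown) {
      if (join.isReducing()) {
        nodeStack.push(join);
      } else {
        nodeStack.clear();
      }
    }
    return nodeStack;
  }

  public static void main(String[] args) {
    // 03 sits directly above the scan and expands, so 04 above it is dropped too.
    List<Join> joins = Arrays.asList(
        new Join("04:HASH JOIN", 5_000L, 50L),
        new Join("03:HASH JOIN", 1_000L, 5_000L));
    System.out.println(collectNodeStack(joins).isEmpty()); // prints true
  }
}
```

An empty result corresponds to the case discussed in this thread where the
probe pipeline is treated as ineligible for cardinality reduction.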


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461
PS7, Line 461: lter call in
> I think that cardinality reduction should be also applied to Kudu scanners.
Changed this variable name to evalAtRowLevel to better represent the intention.
The Kudu scanner does not evaluate filters at row level (there is no call to
the EvalRuntimeFilter function), but I do remember that Impala can push runtime
filters down to Kudu, and I'm not sure what the behavior will be in that case.
Please let me know what you think.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468
PS7, Line 468:
> Can you add comments on why does isAllColumnarScanner matter? What happens
No row-group or row-level filtering happens when scanning text files. I have
not confirmed whether non-partition filters are still generated for an all-text
scan. I think it is correct to generate non-partition runtime filters for a
mixed file format scan, but we should not use them to reason about cardinality
reduction since they do not apply to all files.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480
PS7, Line 480:
 :   /**
 :* Given a contiguous probe pipeline 'nodeStac
> Do we guarantee this somehow? If yes, then there could  be a Precondition i
This is guaranteed by JoinNode.isSelectiveAndReducingJoin(). nodeStack will be
cleared if the traversal arrives at an expanding join. Added a sanity check in
patch set 8.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS7, Line 488: ed output cardinality and par
> I am still trying to wrap my head around this algorithm.
Added testRuntimeFilterCardinalityReduction.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513
PS7, Line 513: e(node instanceof Join
> Does this case handle the case when the key is result of an expression and
Replaced this with filter.getTargetExpr(id_).getNumDistinctValues().
For the key, it is handled already through filter.getNdvEstimate() and 
filter.getBuildKeyNumCardinality().


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS7, Line 514:   JoinNode join = (JoinNode) node;
> Should we be at this point if probe side key ndv is unknown? It may be safe
Done


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522
PS7, Line 522: ts();
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544
PS7, Line 544:colName = ta
> The name is a bit misleading, as it should be actually the lowestJoinCard.
Done


http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429
PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 
03:HASH JOIN into
 : # an expanding join and makes probe pipeline ineligible for 
cardinality reduction.
Still unsure about this test case. It looks like reducing the cardinality of
00:SCAN by RF000 is still valid, even though 04:HASH JOIN is not part of
nodeStack because 03:HASH JOIN is expanding.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14455/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 15 Nov 2023 20:59:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#8).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
toward ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).

Testing:
- Add PlannerTest.testRuntimeFilterCardinalityReduction
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add be test QueryOptions.SetFractionalOptions
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025
PS7, Line 1025: isLeftOuterJoin
Can you explain why it is supported for left outer joins? We shouldn't have any 
runtime filters pushed down to the probe side in that case.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461
PS7, Line 461: HdfsScanNode
I think that cardinality reduction should also be applied to Kudu scanners.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468
PS7, Line 468: isAllColumnarScanner
Can you add comments on why isAllColumnarScanner matters? What happens if 
it is a text file? Do we generate runtime filters at all in that case if it is 
not a partition filter?


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480
PS7, Line 480: he probe pipeline
 :* 'nodeStack' must have original cardinality estimate that continues decreasing from
 :* scan node up towards the highest join node.
Do we guarantee this somehow? If so, there could be a Precondition check in 
the loop. Generally this seems true for inner joins unless there are duplicates 
on the build side.
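
The invariant questioned above can be sketched as a standalone check. This is only an illustration, not code from the patch: the class and method names are hypothetical, the cardinalities are passed as a plain array ordered from the scan node up to the highest join node (the patch's own 'nodeStack' may be ordered differently), and a plain IllegalStateException stands in for whatever Precondition helper the planner would actually use.

```java
/** Hypothetical sketch; not part of the Impala code base. */
final class ProbePipelineCheckSketch {
  private ProbePipelineCheckSketch() {}

  /**
   * Returns true if the cardinality estimates never increase when walking
   * from the scan node (index 0) up to the highest join node (last index).
   */
  static boolean isNonIncreasing(long[] cardinalities) {
    for (int i = 1; i < cardinalities.length; i++) {
      if (cardinalities[i] > cardinalities[i - 1]) return false;
    }
    return true;
  }

  /** Throws if the probe pipeline violates the assumed invariant. */
  static void checkNonIncreasing(long[] cardinalities) {
    if (!isNonIncreasing(cardinalities)) {
      throw new IllegalStateException(
          "Cardinality estimate increases along the probe pipeline");
    }
  }
}
```

A join with duplicates on the build side, as mentioned in the comment, is exactly the case where such a check would fail, so the pipeline selection would have to stop before that join.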


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS7, Line 488: getReducedCardinalityByFilter
I am still trying to wrap my head around this algorithm.

I think it would be very helpful to have one or two tests that provide more 
information, so that readers can track how cardinalities are reduced, e.g. a 
single-join and a multi-join query with the NDVs in comments. Adding a summary 
in a comment could also be useful to show that the cardinality reductions are 
realistic compared to the cardinalities during query execution.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513
PS7, Line 513: targetSlot == null ? -
Does this handle the case when the key is the result of an expression and is 
not simply a slotRef? Every Expr has getNumDistinctValues(), which is filled in 
based on its child expressions, so we could use that as an NDV estimate.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS7, Line 514: if (scanColumnNdv < 0) scanColumnNdv = cardinality_; // fallback
Should we reach this point if the probe-side key NDV is unknown? It may be 
safer not to do any cardinality reduction if we don't know the NDV. This is a 
bit different from doing the estimation in join nodes, where we really need to 
"guess" something even if there are no stats.

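The alternative suggested in the comment on line 514 (skip the reduction when the NDV is unknown instead of falling back to the cardinality) could look roughly like the sketch below. It is an assumption-laden illustration, not the patch's code: the class and method names are hypothetical, a negative value is taken to mean "unknown", and the estimate reuses the textbook join formula |probe| * |build| / max(NDV(probe key), NDV(build key)), capped at the original scan cardinality.

```java
import java.util.OptionalLong;

/** Hypothetical sketch; not part of the Impala code base. */
final class NdvGuardSketch {
  private NdvGuardSketch() {}

  /**
   * Returns the reduced scan cardinality, or an empty OptionalLong when any
   * statistic is unknown (negative or zero NDV, negative cardinality), in
   * which case the caller keeps the original estimate.
   */
  static OptionalLong reducedScanCardinality(
      long scanCard, long scanColumnNdv, long buildCard, long buildNdv) {
    if (scanCard < 0 || buildCard < 0 || scanColumnNdv <= 0 || buildNdv <= 0) {
      return OptionalLong.empty(); // no stats: do not guess
    }
    // Use double arithmetic to avoid long overflow in the product.
    double est = (double) scanCard * buildCard / Math.max(scanColumnNdv, buildNdv);
    // A filter can only remove rows, so never exceed the original estimate.
    return OptionalLong.of(Math.min(scanCard, Math.round(est)));
  }
}
```

Returning an empty value rather than a sentinel makes the "no reduction" path explicit at the call site.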

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522
PS7, Line 522: colum
typo


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544
PS7, Line 544: highestJoinCard
The name is a bit misleading, as it should be actually the lowestJoinCard.



--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 15 Nov 2023 16:06:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14443/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 14 Nov 2023 22:05:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1816
PS6, Line 1816: TQueryExecRequest result = new TQueryExecRequest();
  : if (options.runtime_filter_cardinality_reduction_scale > 0) {
  :   Planner.reduceCardinalityByRuntimeFilter(
  :   planRoots, options.runtime_filter_cardinality_reduction_scale);
> This cardinality reduction happen before cost and resource requirement calc
Added the RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE query option.
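
One plausible way for such a scale to act is linear interpolation between the original and the fully reduced estimate. The sketch below shows only that idea; the thread does not spell out how the option is applied, so the formula, the assumed valid range [0.0, 1.0], and all names here are hypothetical. The one behavior taken from the thread is that a scale of 0 leaves the estimate unchanged.

```java
/** Hypothetical sketch; not part of the Impala code base. */
final class ReductionScaleSketch {
  private ReductionScaleSketch() {}

  /**
   * Interpolates between the original and the reduced cardinality.
   * scale = 0.0 keeps the original estimate, scale = 1.0 applies the
   * full reduction. The [0.0, 1.0] range is an assumption of this sketch.
   */
  static long applyScale(long originalCard, long reducedCard, double scale) {
    if (scale < 0.0 || scale > 1.0) {
      throw new IllegalArgumentException("scale must be in [0.0, 1.0]: " + scale);
    }
    if (reducedCard >= originalCard) return originalCard; // nothing to reduce
    long fullReduction = originalCard - reducedCard;
    return originalCard - Math.round(fullReduction * scale);
  }
}
```

With an original estimate of 1000 rows and a reduced estimate of 200 rows, a scale of 0.5 in this sketch yields 600 rows.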



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 14 Nov 2023 21:38:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-14 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#7).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
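
As a rough illustration of the "simplest join cardinality formula" mentioned above, the sketch below uses the textbook equi-join estimate |L| * |R| / max(NDV(L.key), NDV(R.key)). That JoinNode.computeGenericJoinCardinality() follows exactly this shape is an assumption here; the class name, method name, and the -1 "unknown" convention are hypothetical and chosen only to keep the example self-contained.

```java
/** Hypothetical sketch; not part of the Impala code base. */
final class GenericJoinCardinalitySketch {
  private GenericJoinCardinalitySketch() {}

  /**
   * Textbook equi-join estimate:
   *   |probe| * |build| / max(NDV(probe key), NDV(build key)).
   * Returns -1 if any input is unknown (negative cardinality or
   * non-positive NDV).
   */
  static long estimate(long probeCard, long buildCard, long probeNdv, long buildNdv) {
    if (probeCard < 0 || buildCard < 0 || probeNdv <= 0 || buildNdv <= 0) return -1;
    // Use double arithmetic to avoid long overflow in the product.
    double est = (double) probeCard * buildCard / Math.max(probeNdv, buildNdv);
    return Math.round(est);
  }
}
```

For example, a probe side of 1,000,000 rows with 1,000 distinct keys joined to a build side of 100 rows with 100 distinct keys gives 1,000,000 * 100 / 1,000 = 100,000 rows under this formula, which is the kind of figure the patch would then use as the reduced scan estimate.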

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to a smaller executor group set.
Other execution modes will see no change in their execution parallelism.

This patch also adds a development query option named
RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE to help with benchmarking and
to disable cardinality reduction if needed (by setting it to 0.0).

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Add BE test QueryOptions.SetFractionalOptions.
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-14 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/20498/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1816
PS6, Line 1816: Planner.reduceCardinalityByRuntimeFilter(planRoots);
  : Planner.computeProcessingCost(planRoots, result, planner.getPlannerCtx());
  : Planner.computeResourceReqs(planRoots, queryCtx, result,
  : planner.getPlannerCtx(), planner.getAnalysisResult().isQueryStmt());
This cardinality reduction happens before the cost and resource requirement 
calculations. A lower cardinality may therefore result in lower resource 
estimates, although only for ScanNode, JoinNode, and ExchangeNode.
Having a query option to disable or scale the cardinality reduction might be 
desirable so that users can revert to the old behavior.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 14 Nov 2023 16:01:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14399/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 09 Nov 2023 18:55:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-09 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@18
PS5, Line 18: contiguou
> Nit: contiguous
Done


http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26
PS5, Line 26:  exe
> Nit: modes.
Done


http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26
PS5, Line 26: reduction is
> "... reduction is present/available in all ..."
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@454
PS5, Line 454: tility m
> Now that it's public, is it still internal?
Removed this.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@475
PS5, Line 475: computeGenericJoinCardina
> This method was renamed.
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1180
PS5, Line 1180: ScanNode, ExchangeNode
> Aren't also ExchangeNodes included?
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@710
PS5, Line 710:   || buildSlotRef.getDesc().getParent() == null
> Is it possible that buildSlotRef.getDesc().getParent() is null?
Thanks! I added that check.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@22
PS5, Line 22: import java.util.List;
> Seems to be unused.
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@25
PS5, Line 25:
> Seems to be unused.
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@102
PS5, Line 102: tivity_ = 1.0;
> It is not set there, and I can't find any other place where it's set.
Thanks for catching this! This should be assigned at 
ScanNode.reduceCardinalityByRuntimeFilter().


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@456
PS5, Line 456:* join node id to list of runtime filters from it.
> Could you describe the algorithm also here?
Added comments.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@470
PS5, Line 470: // Row-level runtime filtering, however, only applies at columnar file format.
> Could extract this loop into a function.
Moved it to groupFiltersForCardinalityReduction().


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474
PS5, Line 474:
> Nit: applies
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@475
PS5, Line 475:
> Nit: applies (present tense) would be better.
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@487
PS5, Line 487:*/
> Could extract this loop into a function.
Moved it to getReducedCardinalityByFilter().


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@510
PS5, Line 510: ter.get
> Nit: apply
Done


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@517
PS5, Line 517: cardOnThisJoin = Math.min(cardOnThisJoin, estCardAfterFilter);
> If 'colName' isn't updated here for more than one column, those columns wil
Yes, that is the intention. More distinct columns will lead to lower 
partitionSelectivity. If the columns cannot be distinguished, pick the one with 
the lowest selectivity among them.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@528
PS5, Line 528: colName = targetSlot.getDesc().getColumn().getName();
> I don't understand why we choose the max here, we've been reducing 'reduced
The algorithm relies on the original cardinality estimate and assumes that the 
scan will return at least as many rows as the highest join node. I updated the 
code and comment to highlight this.
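
The two steps discussed in the replies above (taking the minimum over the per-filter estimates, then taking the maximum with the highest join node's estimate) can be sketched together as follows. This is a simplified illustration under stated assumptions, not the patch's code: the class and method names are hypothetical, the per-filter estimates are passed in as a flat array, and a negative estimate is treated as "unknown" and ignored. Only the variable names 'estCardAfterFilter' and 'highestJoinCard' are borrowed from the quoted lines.

```java
/** Hypothetical sketch; not part of the Impala code base. */
final class ScanReductionClampSketch {
  private ScanReductionClampSketch() {}

  /**
   * Reduces a scan cardinality estimate using per-filter estimates, then
   * clamps it so the scan is never estimated to return fewer rows than the
   * highest join node in the probe pipeline.
   */
  static long reduce(long scanCard, long[] estCardAfterFilter, long highestJoinCard) {
    long reduced = scanCard;
    for (long est : estCardAfterFilter) {
      if (est >= 0) reduced = Math.min(reduced, est); // most reducing filter wins
    }
    // Lower bound: rows reaching the highest join must have come from the scan.
    reduced = Math.max(reduced, highestJoinCard);
    // Upper bound: a filter can only remove rows from the original estimate.
    return Math.min(reduced, scanCard);
  }
}
```

The lower bound is what the Math.max in the quoted line 528 expresses: even an aggressive filter estimate should not push the scan below the number of rows the original plan expects to survive all joins above it.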



--
To view, 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-09 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#6).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
to ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to a smaller executor group set.
Other execution modes will see no change in their execution parallelism.

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 5:

(18 comments)

Thanks Riza

http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@18
PS5, Line 18: contigous
Nit: contiguous


http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26
PS5, Line 26: mode
Nit: modes.


http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26
PS5, Line 26: reduction in
"... reduction is present/available in all ..."


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@454
PS5, Line 454: internal
Now that it's public, is it still internal?


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@475
PS5, Line 475: getGenericJoinCardinality
This method was renamed.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1180
PS5, Line 1180: ScanNode and JoinNodes
Aren't also ExchangeNodes included?


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@710
PS5, Line 710:   || buildSlotRef.getDesc().getParent().getTable() == null
Is it possible that buildSlotRef.getDesc().getParent() is null?


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@22
PS5, Line 22: import java.util.HashSet;
Seems to be unused.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@25
PS5, Line 25: import java.util.Set;
Seems to be unused.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@102
PS5, Line 102: Set in Planner.applyRuntimeFilterSelectivity()
It is not set there, and I can't find any other place where it's set.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@456
PS5, Line 456:   protected void reduceCardinalityByRuntimeFilter(Stack nodeStack) {
Could you describe the algorithm also here?


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@470
PS5, Line 470: for (RuntimeFilterGenerator.RuntimeFilter filter : getRuntimeFilters()) {
Could extract this loop into a function.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474
PS5, Line 474: aplied
Nit: applies


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@475
PS5, Line 475: applied
Nit: applies (present tense) would be better.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@487
PS5, Line 487: for (int i = nodeStack.size() - 1; i >= 0; i--) {
Could extract this loop into a function.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@510
PS5, Line 510: applies
Nit: apply


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@517
PS5, Line 517: colName = targetSlot.getDesc().getColumn().getName();
If 'colName' isn't updated here for more than one column, those columns will be 
handled as if they were the same.


http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@528
PS5, Line 528: reducedCardinality = Math.max(reducedCardinality, highestJoinCard);
I don't understand why we choose the max here: we've been reducing 
'reducedCardinality' so far, and 'nodeStack.get(0).getCardinality()' has not 
yet been updated. Could you explain? Thanks.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-11-07 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 5:

Patch set 5 is a rebase to catch up with the latest toolchain.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 07 Nov 2023 19:41:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14288/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 31 Oct 2023 16:56:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-31 Thread Riza Suminto (Code Review)
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#4).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate. The runtime filter
selectivity is calculated with the simplest join cardinality
formula (JoinNode.computeGenericJoinCardinality()).
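
As a rough illustration of the paragraph above, the textbook equi-join
estimate divides the product of the two input cardinalities by the larger
NDV of the join keys, and the fraction of probe rows expected to survive is
that estimate divided by the probe cardinality. The sketch below is a
standalone approximation under that assumption. The class and method names
are hypothetical, and this is not the code in JoinNode.

```java
/** Hypothetical sketch; names are illustrative and not Impala's planner code. */
final class JoinSelectivitySketch {
  /** Textbook equi-join estimate: |L| * |R| / max(ndv(L.key), ndv(R.key)). */
  static long genericJoinCardinality(
      long lhsCard, long rhsCard, long lhsNdv, long rhsNdv) {
    // Treat missing stats as "unknown" (-1) instead of guessing.
    if (lhsCard < 0 || rhsCard < 0 || lhsNdv <= 0 || rhsNdv <= 0) return -1;
    double estimate = (double) lhsCard * rhsCard / Math.max(lhsNdv, rhsNdv);
    return Math.round(estimate);
  }

  /** Fraction of probe rows expected to survive; 1.0 when nothing is known. */
  static double probeSelectivity(long probeCard, long joinCard) {
    if (probeCard <= 0 || joinCard < 0) return 1.0;
    return Math.min(1.0, (double) joinCard / probeCard);
  }

  /** Applies a selectivity to a scan estimate, leaving unknown (-1) as is. */
  static long reduceCardinality(long scanCard, double selectivity) {
    if (scanCard < 0) return scanCard;
    return Math.round(scanCard * selectivity);
  }
}
```

With a 1,000,000-row probe side (key NDV 10,000) joined to a 100-row build
side (key NDV 100), the join estimate is 10,000 rows, so the selectivity
applied to the scan is 0.01.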

While this cardinality reduction is present in all execution
modes (MT_DOP=0, MT_DOP>0, and COMPUTE_PROCESSING_COST=1), cost-based
planning mode will be the primary beneficiary of this patch. It can lead
toward ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
Other execution modes will see no change in their execution parallelism.

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.
- Pass core tests.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-dist-method.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M 

[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-24 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@15
PS3, Line 15: This patch applies runtime filter selectivity to lower cardinality
: estimates of scan nodes and certain join nodes above them after runtime
: filter generation and before resource requirement computation.
> I think that ideally we shouldn't use join node selectivity, but the NDV of
I think I understand your concern. In conclusion, is it correct that I should
make a similar implementation to JoinNode.getGenericJoinCardinality(), but
against incoming runtime filters instead of join conjuncts?
https://github.com/apache/impala/blob/c244aadcf367360e52807a84e7fba8b6237651fd/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L404-L411


http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@34
PS3, Line 34: Testing:
> It would be nice to have some targeted tests for edge cases, e.g. missing s
Maybe I should consider enabling this in all situations, rather than
exclusively under COMPUTE_PROCESSING_COST=1, to ease testing. I will look around.


http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java@538
PS3, Line 538: CardinalityRefinerVisitor
> I don't have a clear plan, but couldn't this be turned into a recursive fun
The last time I tried to implement recursive calls over the PlanNode tree, I
was hit by a StackOverflow error.
It could be a flaw in my implementation, though. I'll try again to make it
recursive.
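
For context, a stack overflow from deep recursion is commonly avoided by
walking the tree with an explicit stack on the heap. The sketch below shows
a non-recursive post-order walk (children before parent) over a minimal
tree. The Node class and all names are hypothetical stand-ins and are not
the PlanNode API or the visitor in this patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; Node is a stand-in, not Impala's PlanNode. */
final class PlanTreeWalkSketch {
  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) { this.name = name; }

    /** Adds a child and returns this node so calls can be chained. */
    Node add(Node child) {
      children.add(child);
      return this;
    }
  }

  /** Returns node names in post-order without using recursion. */
  static List<String> postOrder(Node root) {
    List<String> out = new ArrayList<>();
    if (root == null) return out;
    Deque<Node> pending = new ArrayDeque<>();
    Deque<Node> reversed = new ArrayDeque<>();
    pending.push(root);
    // First pass: visit parent, then children right to left.
    while (!pending.isEmpty()) {
      Node node = pending.pop();
      reversed.push(node);
      for (Node child : node.children) pending.push(child);
    }
    // Second pass: popping reverses that order into post-order.
    while (!reversed.isEmpty()) out.add(reversed.pop().name);
    return out;
  }
}
```

Because the pending nodes live in heap-allocated deques, the depth of the
tree is limited by heap size rather than by the thread's call stack.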



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 24 Oct 2023 21:03:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-24 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 3:

(3 comments)

The implementation looks good to me, but I have some high-level questions.

http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@15
PS3, Line 15: This patch applies runtime filter selectivity to lower cardinality
: estimates of scan nodes and certain join nodes above them after runtime
: filter generation and before resource requirement computation.
I think that ideally we shouldn't use join node selectivity, but the NDV of the
key on the build side. Generally we assume that selectivity on the join node
will also reduce the NDV of the key, so the two become interchangeable, but I
would prefer that this logic stay within the join node/runtime filter code and
not spread to other nodes.

This is how I see the steps that lead to cardinality reduction in the scan node:
1. ndv(key) is calculated in the join builder based on build-side column stats
and estimated selectivity
2. a runtime filter is created for the key, which also has an fpp based on
ndv(key) and bloom filter size
3. the scan node can calculate bloom filter selectivity based on its own ndv
(after applying other predicates), the ndv from the bloom filter, and the fpp
of the bloom filter

This probably gives the same results as the current solution, but makes it 
easier to think about the effect of other predicates / runtime filters / 
duplicate keys on build side.

A good example where this could help is a join with a huge number of duplicates
plus a very selective filter on the build side. We could assume that the
build-side ndv(key) is reduced due to the selective predicate, but
getJoinNodeSelectivity() would still be > 0 due to the duplicate matches. So we
wouldn't reduce scanner cardinality, even though we could assume that many rows
are dropped based on the bloom filter.
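
One possible reading of the three steps above, written as a standalone
sketch: assume every distinct build-side key also appears on the probe side,
so the fraction of probe keys that truly match is ndv(build) / ndv(probe),
and the remaining keys pass the filter only as false positives with
probability fpp. Both the containment assumption and the names below are
hypothetical; this is not what the patch implements.

```java
/** Hypothetical sketch of an NDV-based bloom filter selectivity estimate. */
final class BloomFilterSelectivitySketch {
  /**
   * Estimated fraction of probe rows that pass the filter.
   * Assumes probe rows are spread evenly across the probe-side keys and that
   * build-side keys are a subset of probe-side keys.
   */
  static double selectivity(long probeNdv, long buildNdv, double fpp) {
    // Without usable stats, assume the filter removes nothing.
    if (probeNdv <= 0 || buildNdv < 0) return 1.0;
    double matching = Math.min(1.0, (double) buildNdv / probeNdv);
    double clampedFpp = Math.max(0.0, Math.min(1.0, fpp));
    // True matches always pass; non-matching keys pass with probability fpp.
    return matching + (1.0 - matching) * clampedFpp;
  }
}
```

Under these assumptions the estimate depends only on key NDVs and the fpp,
so duplicate rows on the build side do not inflate it.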


http://gerrit.cloudera.org:8080/#/c/20498/3//COMMIT_MSG@34
PS3, Line 34: Testing:
It would be nice to have some targeted tests for edge cases, e.g. missing 
stats, duplicate matches in join, effect of other predicates.


http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/3/fe/src/main/java/org/apache/impala/planner/Planner.java@538
PS3, Line 538: CardinalityRefinerVisitor
I don't have a clear plan, but couldn't this be turned into a recursive 
function in PlanNode?



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 24 Oct 2023 09:17:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-19 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG@31
PS2, Line 31: mode
> Nit: modes
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 19 Oct 2023 23:10:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-19 Thread Riza Suminto (Code Review)
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#3).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate.

This cardinality reduction is currently only applied in cost-based
planning mode (COMPUTE_PROCESSING_COST option is True) to avoid
potential regression in regular planning mode. Cost-based planning mode
can benefit the most from reduced scan cardinality. It can lead toward
ProcessingCost reduction, lower scan fragment parallelism, and an
increased chance of query assignment to the smaller executor group set.
We can consider enabling this cardinality reduction technique for all
planning modes after a more thorough performance evaluation (which
requires more planner test changes).

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 465 insertions(+), 253 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 2:

(4 comments)

Thanks Riza.

http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/2//COMMIT_MSG@31
PS2, Line 31: mode
Nit: modes


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132
PS1, Line 132: d la
> I use Long here in case the original value cardinality_ itself is unknown (
Ok, I understand now.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386
PS1, Line 386:   if (originalCardinality_ != null) {
> Added "changed from". Please check the output in tpcds-processing-cost.test
I think it's ok like this.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658
PS1, Line 658:
> I chose to clarify with comment instead. Calling getEstFpp() on other kind
It's ok if all other filters are deterministic, i.e. false positives are never 
returned.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 19 Oct 2023 15:00:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14195/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 18 Oct 2023 00:44:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@18
PS1, Line 18: select
> Nit: selects
Done


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@20
PS1, Line 20: produce
> Nit: produces.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25
PS1, Line 25: option is True)
> It's not clear to me how it explains that the cardinality reduction is only
Clarified the intention in the commit message. I plan to evaluate this in a
smaller scope first before enabling it in all planning modes.


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25
PS1, Line 25:  mode (CO
> Typo: PROCESSING
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@128
PS1, Line 128:   // invalid: -1
> Mention in the comment that it can also be replaced in replaceCardinality()
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132
PS1, Line 132: d la
> Does it need to be a Long? Can't we use -1 as the value indicating an inval
I use Long here in case the original value of cardinality_ is itself unknown (-1).
Therefore, after replaceCardinality() is called, (cardinality_=10,
originalCardinality_=null) can be differentiated from (cardinality_=10,
originalCardinality_=-1).
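
The distinction can be sketched in isolation as follows. The field names
follow the discussion, but the class itself, the rule that only the first
replacement records the original value, and the exact explain text are
assumptions made for illustration.

```java
/** Hypothetical sketch; not the PlanNode class from the patch. */
final class CardinalityHolderSketch {
  // Estimated row count; -1 means unknown.
  private long cardinality_;
  // Value of cardinality_ before replacement; null means never replaced.
  private Long originalCardinality_ = null;

  CardinalityHolderSketch(long cardinality) { cardinality_ = cardinality; }

  /** Replaces the estimate, remembering the first original value (assumed). */
  void replaceCardinality(long newCardinality) {
    if (originalCardinality_ == null) originalCardinality_ = cardinality_;
    cardinality_ = newCardinality;
  }

  boolean wasReplaced() { return originalCardinality_ != null; }

  /** Renders the estimate, noting the original value only if it changed. */
  String explain() {
    StringBuilder sb = new StringBuilder("cardinality=").append(cardinality_);
    if (originalCardinality_ != null) {
      sb.append(" (changed from ").append(originalCardinality_).append(")");
    }
    return sb.toString();
  }
}
```

A primitive long with -1 as the sentinel could not tell "never replaced"
apart from "replaced, and the original was unknown", which is the case the
nullable Long covers.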


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386
PS1, Line 386:   if (originalCardinality_ != null) {
> Shouldn't we add some explanation here about what the value in parentheses
Added "changed from". Please check the output in tpcds-processing-cost.test and 
let me know if this is OK or too verbose.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@534
PS1, Line 534: refine
> Nit: refines.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@547
PS1, Line 547: side
> Nit: do we need 'hand' here?
Removed.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@585
PS1, Line 585: ace("refineCardin
> Could add a precondition check that 'nodeStack_' is not empty.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592
PS1, Line 592: Filter filter : scan.ge
> Could extract it into a variable.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592
PS1, Line 592: ator.Runtim
> Could use Map::computeIfAbsent() here.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@621
PS1, Line 621: consider
> Nit: considers.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@658
PS1, Line 658: require
> Nit: requires.
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@640
PS1, Line 640:   for (RuntimeFilterTarget target : targets_) {
> I think it would be cleaner if we inverted the if and returned from there -
Done


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658
PS1, Line 658:
> Can we check if it is a Bloom filter? We could add a Precondition check.
I chose to clarify with a comment instead. Calling getEstFpp() on other kinds
of filters is still legal, but will return 0.
Please let me know whether this is OK.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 18 Oct 2023 00:31:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-17 Thread Riza Suminto (Code Review)
Hello Daniel Becker, Abhishek Rawat, David Rorke, Csaba Ringhofer, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20498

to look at the new patch set (#2).

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently Impala creates a plan first and looks for runtime filters
based on the complete plan. This means cardinality estimate in the query
plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate.

This cardinality reduction is currently only applied in cost-based
planning mode (COMPUTE_PROCESSING_COST option is True) to avoid
potential regression in regular planning mode. Cost-based planning mode
can benefit the most from reduced scan cardinality. It can lead towards
ProcessingCost reduction, lower scan fragment parallelism, and increased
chance of query assignment to the smaller executor group set. We can
consider enabling this cardinality reduction technique for all planning
mode after more thorough performance evaluation (which require more
planner test changes).

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 465 insertions(+), 253 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-10-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 1:

(16 comments)

I can't say I understand the change, but here are some comments.

http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@18
PS1, Line 18: select
Nit: selects


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@20
PS1, Line 20: produce
Nit: produces.


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25
PS1, Line 25: POCESSING
Typo: PROCESSING


http://gerrit.cloudera.org:8080/#/c/20498/1//COMMIT_MSG@25
PS1, Line 25: This is because
It's not clear to me how it explains that the cardinality reduction is only 
applied if the query option is on (as opposed to always being applied).


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@128
PS1, Line 128:   protected long cardinality_;
Mention in the comment that it can also be replaced in replaceCardinality().


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@132
PS1, Line 132: Long
Does it need to be a Long? Can't we use -1 as the value indicating an 
invalid/unset state, like in the case of 'cardinality_'?


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java@386
PS1, Line 386: expBuilder.append("(")
Shouldn't we add some explanation here about what the value in parentheses 
means? Or would it be too much here?


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@534
PS1, Line 534: refine
Nit: refines.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@547
PS1, Line 547: hand
Nit: do we need 'hand' here?


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@585
PS1, Line 585: nodeStack_.get(0)
Could add a precondition check that 'nodeStack_' is not empty.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592
PS1, Line 592: filter.getSrc().getId()
Could extract it into a variable.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@592
PS1, Line 592: containsKey
Could use Map::computeIfAbsent() here.
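
The two suggestions for line 592 can be illustrated with a generic sketch.
The grouping task, the use of plain ints and strings, and all names are
hypothetical; only Map.computeIfAbsent() itself is the real JDK method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the code under review in Planner.java. */
final class GroupBySourceSketch {
  /** Groups filter names by the id of the node that produces each filter. */
  static Map<Integer, List<String>> group(int[] srcIds, String[] filterNames) {
    Map<Integer, List<String>> bySrc = new HashMap<>();
    for (int i = 0; i < srcIds.length; i++) {
      // Extract the key once instead of repeating the lookup expression.
      int srcId = srcIds[i];
      // Replaces the pattern: if (!map.containsKey(k)) map.put(k, new list).
      bySrc.computeIfAbsent(srcId, k -> new ArrayList<>()).add(filterNames[i]);
    }
    return bySrc;
  }
}
```

computeIfAbsent() returns the existing or newly created list, so the
lookup, the conditional insert, and the append collapse into one statement.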


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@621
PS1, Line 621: consider
Nit: considers.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/Planner.java@658
PS1, Line 658: require
Nit: requires.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@640
PS1, Line 640:   for (RuntimeFilterTarget target : targets_) {
I think it would be cleaner if we inverted the if and returned from there - 
'continue' would not be needed.


http://gerrit.cloudera.org:8080/#/c/20498/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@658
PS1, Line 658: only valid for bloom filter.
Can we check if it is a Bloom filter? We could add a Precondition check.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 17 Oct 2023 09:27:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14042/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 20 Sep 2023 23:21:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12018: Consider runtime filter for cardinality reduction

2023-09-20 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20498 )


Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
..

IMPALA-12018: Consider runtime filter for cardinality reduction

Currently, Impala creates a plan first and looks for runtime filters
based on the complete plan. This means the cardinality estimate in the
query plan does not incorporate runtime filter selectivity. Actual scan
cardinality from runtime execution is often much lower than the
cardinality estimate due to the existence of runtime filters.

This patch applies runtime filter selectivity to lower cardinality
estimates of scan nodes and certain join nodes above them after runtime
filter generation and before resource requirement computation. The
algorithm selects a contiguous probe pipeline consisting of a scan node,
exchanges, and reducing join nodes. Depending on whether the join node
produces a runtime filter and the type of that runtime filter, it then
applies the runtime filter selectivity to the scan node to reduce its
cardinality and input cardinality estimate.
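
The idea in the paragraph above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm in the patch: the `PipelineJoin` class, the definition of a reducing join as one whose output cardinality is below its probe input cardinality, the use of the most selective filter only, and the floor of 1 row are all hypothetical choices made for the example. Exchanges are omitted because they do not change cardinality in this sketch.

```java
import java.util.List;

// Hypothetical summary of a join node on the probe pipeline above a scan.
final class PipelineJoin {
  final long probeInputCardinality;
  final long outputCardinality;
  final boolean producesRuntimeFilter;

  PipelineJoin(long probeIn, long out, boolean producesRuntimeFilter) {
    this.probeInputCardinality = probeIn;
    this.outputCardinality = out;
    this.producesRuntimeFilter = producesRuntimeFilter;
  }

  // Assumption: "reducing" means the join outputs fewer rows than it probes.
  boolean isReducing() {
    return outputCardinality < probeInputCardinality;
  }

  // Only called for reducing joins, so the divisor is always positive.
  double selectivity() {
    return (double) outputCardinality / probeInputCardinality;
  }
}

final class FilteredCardinalitySketch {
  // Walks the joins from the scan upward and stops at the first
  // non-reducing join, since the pipeline is no longer contiguous there.
  static long reduceScanCardinality(long scanCardinality,
      List<PipelineJoin> joinsAboveScan) {
    double best = 1.0;
    for (PipelineJoin join : joinsAboveScan) {
      if (!join.isReducing()) break;
      if (join.producesRuntimeFilter) {
        best = Math.min(best, join.selectivity());
      }
    }
    // Assumption: never estimate fewer than one row.
    return Math.max(1L, Math.round(scanCardinality * best));
  }
}
```

For example, a scan estimated at 1000 rows under a filter-producing join that keeps 100 of 1000 probe rows would be reduced to 100 rows in this sketch, while a join above a non-reducing join would not contribute.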

This cardinality reduction is currently applied only if the
COMPUTE_PROCESSING_COST option is true. This is because a setup with
multiple executor group sets benefits the most from reduced scan
cardinality. It can lead to ProcessingCost reduction, lower scan
fragment parallelism, and an increased chance of query assignment to a
smaller executor group set. We can consider enabling this for all cases
after a more thorough performance evaluation (which requires more
planner test changes).

Testing:
- Pass test_executor_groups.py.
- Pass PlannerTest#testProcessingCost.

Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
---
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
5 files changed, 456 insertions(+), 252 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/20498/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto