Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
......................................................................


Patch Set 13:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@31
PS12, Line 31: mode
> Nit: modes.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12//COMMIT_MSG@35
PS12, Line 35: RUNTIME_FILTER_CARDINALITY_REDUCTION_SCALE
> Could you elaborate on how it can be used (what values are valid, what they
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@315
PS12, Line 315:   // List of pairs of Key and boolean flag on whether the 
option is inclusive of 0 and 1.
> I'd put it right before the loop.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options-test.cc@316
PS12, Line 316:   pair<OptionDef<double>, bool> case_set[]{
> This is not used.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/20498/12/be/src/service/query-options.cc@1170
PS12, Line 1170: ;
> No need for the 'float' literal, especially as it is a double.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@837
PS12, Line 837: exceed
> Nit: exceeds.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/common/thrift/ImpalaService.thrift@849
PS12, Line 849: Default
> Nit: Defaults.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474
PS12, Line 474: only applies at HDFS columnar file
> Now 'isAllColumnarScanner' is replaced with 'evalAtRowLevel', Kudu is also
Yes. Mentioned Kudu in comment.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@485
PS12, Line 485: reduc
> Maybe 'reduced' would be better, "lower o. c. estimate" may suggest a "lowe
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@486
PS12, Line 486: at
> Nit: is?
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS12, Line 488: below
> It can be a bit confusing that this join node is at the bottom of the stack
Added "in node tree".


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS12, Line 514:             this, reducedCardinality, partitionSelectivities);
> I think that it would be more readable if the the body of the loop would be
Added RuntimeFilter.reducedCardinalityForScanNode().


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@531
PS12, Line 531: dinality, scaledPartSel);
              :   }
> I am not sure if this is correct - all max selectivities per column will be
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@561
PS12, Line 561:
> Can it be multiple join nodes or just one?
There can be multiple join nodes in nodeStack and they connect with each other 
at probe pipeline.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@565
PS12, Line 565:
> Could you also mention 'reductionScale' here?
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@578
PS12, Line 578: dinality_ * scanRangeSelectivity_
> Could extract this as 'currentCardinality'.
Done


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@591
PS12, Line 591:     // with the least
> Shouldn't it be 'partitionSelectivity'? Or is it the same thing?
It is the same thing. I name it scanRangeSelectivity_ because from ScanNode 
perspective, it will be applied to reduce the estimated number of scan range 
that survive file-level filter and actually being read.

Scan range might be 1 range per file or  multiple range per file, depending on 
target file system.


http://gerrit.cloudera.org:8080/#/c/20498/12/fe/src/main/java/org/apache/impala/planner/ScanNode.java@609
PS12, Line 609:
> Why do we leave out the last one?
Clarified in the new comment below this.


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@11
PS12, Line 11: ss_sold_date_sk = d_date_sk
> Can you also add a test when there is an expression in the key to test rele
Added as Case 07 and Case 08 in patch set 13.


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@91
PS12, Line 91: _ticket_number is n
> Is it possible to see the effect of runtime filters on partition selectivit
Added scan-range-selectivity next to max-scan-range-rows in query plan.


http://gerrit.cloudera.org:8080/#/c/20498/12/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@299
PS12, Line 299: itio
> Can you also add a test for semi left join?
Left anti join is a variant of left semi join, per this code in 
JoinOperator.java

  public boolean isLeftSemiJoin() {
    return this == JoinOperator.LEFT_SEMI_JOIN || this == 
JoinOperator.LEFT_ANTI_JOIN
        || this == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN
        || this == JoinOperator.ICEBERG_DELETE_JOIN;
  }



--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Mon, 27 Nov 2023 22:55:11 +0000
Gerrit-HasComments: Yes

Reply via email to