Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
......................................................................


Patch Set 8:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025
PS7, Line 1025: isLeftOuterJoin
> Can you explain why is it supported for left outer join? We shouldn't have
Left outer join may still eligible to be included in nodeStack if it is 
selective and reducing (output cardinality < input cardinality),  even though 
it is not producing runtime filter for the leftmost scan.

Note that this method is exist to clear noteStack state if traversal arrives at 
an expanding join.
runtime-filter-cardinality-reduction.test shows an example.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461
PS7, Line 461: lter call in
> I think that cardinality reduction should be also applied to Kudu scanners.
Changed this variable name to evalAtRowLevel to better represent the intention.
Kudu scanner does not eval filter on row level (no call to EvalRuntimeFilter 
function), but I do remember that Impala can push runtime filter down to Kudu, 
which I'm not sure what the behavior will be. Please let me know what you think.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468
PS7, Line 468:
> Can you add comments on why does isAllColumnarScanner matter? What happens
No row-group or row level filtering happen if scanning against text file. I 
have not confirm if non-partition filter is still generated for all-text scan, 
but I think it is correct to generate non-partition runtime filter for mixed 
file format scan, but we should not use that to reason about cardinality 
reduction since it does not applies for all files.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480
PS7, Line 480:
             :   /**
             :    * Given a contiguous probe pipeline 'nodeStac
> Do we guarantee this somehow? If yes, then there could  be a Precondition i
This is guaranteed by JoinNode.isSelectiveAndReducingJoin(). nodeStack will be 
cleared if traversal arrived at expanding join. Added sanity check at ps 8.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS7, Line 488: ed output cardinality and par
> I am still trying to wrap my head around this algorithm.
Added testRuntimeFilterCardinalityReduction.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513
PS7, Line 513: e(node instanceof Join
> Does this case handle the case when the key is result of an expression and
Replaced this with filter.getTargetExpr(id_).getNumDistinctValues().
For the key, it is handled already through filter.getNdvEstimate() and 
filter.getBuildKeyNumCardinality().


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS7, Line 514:       JoinNode join = (JoinNode) node;
> Should we be at this point if probe side key ndv is unknown? It may be safe
Done


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522
PS7, Line 522: ts();
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544
PS7, Line 544:    colName = ta
> The name is a bit misleading, as it should be actually the lowestJoinCard.
Done


http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test:

http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429
PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 
03:HASH JOIN into
             : # an expanding join and makes probe pipeline ineligible for 
cardinality reduction.
Still unsure about this testcase. It looks like reducing cardinality of 00:SCAN 
by RF000 is still valid, even though 04:HASH JOIN is not part of stackNode 
because 03:HASH JOIN is expanding.



--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 15 Nov 2023 22:21:08 +0000
Gerrit-HasComments: Yes

Reply via email to