Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20498 )

Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025
PS7, Line 1025: isLeftOuterJoin
Can you explain why is it supported for left outer join? We shouldn't have any 
runtime filter pushed down to probe side in that case.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461
PS7, Line 461: HdfsScanNode
I think that cardinality reduction should be also applied to Kudu scanners.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468
PS7, Line 468: isAllColumnarScanner
Can you add comments on why does isAllColumnarScanner matter? What happens if 
it is a text file? Do we generate runtime filters at all in that case if it is 
not a partition filter?


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480
PS7, Line 480: he probe pipeline
             :    * 'nodeStack' must have original cardinality estimate that 
continues decreasing from
             :    * scan node up towards the highest join node.
Do we guarantee this somehow? If yes, then there could  be a Precondition in 
the loop. Generally this seems true for inner joins unless there are duplicates 
on the build side.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488
PS7, Line 488: getReducedCardinalityByFilter
I am still trying to wrap my head around this algorithm.

I think that it would be very helpful to have a 1-2 tests where more 
information would be provided to be able track how cardinalities are reduced, 
e.g. a single join and a multi join query with the NDVs in comment. Adding a 
summary in comment could be also useful to show that the cardinality reductions 
are realistic compared to the cardinalites during query execution.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513
PS7, Line 513: targetSlot == null ? -
Does this case handle the case when the key is result of an expression and is 
not simply a slotRef? All Expr has getNumDistinctValues(), which is filled 
based on children expressions, so we could use that as an NDV estimate.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514
PS7, Line 514:         if (scanColumnNdv < 0) scanColumnNdv = cardinality_; // 
fallback
Should we be at this point if probe side key ndv is unknown? It may be safer to 
not do any cardinality reduction if we don't know the ndv. This is a bit 
different than doing the estimation in join nodes, where we really need to 
"guess" something even if there are no stats.


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522
PS7, Line 522: colum
typo


http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544
PS7, Line 544: highestJoinCard
The name is a bit misleading, as it should be actually the lowestJoinCard.



--
To view, visit http://gerrit.cloudera.org:8080/20498
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1
Gerrit-Change-Number: 20498
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 15 Nov 2023 16:06:50 +0000
Gerrit-HasComments: Yes

Reply via email to