Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )
Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction ...................................................................... Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025 PS7, Line 1025: isLeftOuterJoin Can you explain why is it supported for left outer join? We shouldn't have any runtime filter pushed down to probe side in that case. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461 PS7, Line 461: HdfsScanNode I think that cardinality reduction should be also applied to Kudu scanners. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468 PS7, Line 468: isAllColumnarScanner Can you add comments on why does isAllColumnarScanner matter? What happens if it is a text file? Do we generate runtime filters at all in that case if it is not a partition filter? http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480 PS7, Line 480: he probe pipeline : * 'nodeStack' must have original cardinality estimate that continues decreasing from : * scan node up towards the highest join node. Do we guarantee this somehow? If yes, then there could be a Precondition in the loop. Generally this seems true for inner joins unless there are duplicates on the build side. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS7, Line 488: getReducedCardinalityByFilter I am still trying to wrap my head around this algorithm. I think that it would be very helpful to have a 1-2 tests where more information would be provided to be able track how cardinalities are reduced, e.g. a single join and a multi join query with the NDVs in comment. Adding a summary in comment could be also useful to show that the cardinality reductions are realistic compared to the cardinalites during query execution. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513 PS7, Line 513: targetSlot == null ? - Does this case handle the case when the key is result of an expression and is not simply a slotRef? All Expr has getNumDistinctValues(), which is filled based on children expressions, so we could use that as an NDV estimate. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS7, Line 514: if (scanColumnNdv < 0) scanColumnNdv = cardinality_; // fallback Should we be at this point if probe side key ndv is unknown? It may be safer to not do any cardinality reduction if we don't know the ndv. This is a bit different than doing the estimation in join nodes, where we really need to "guess" something even if there are no stats. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522 PS7, Line 522: colum typo http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544 PS7, Line 544: highestJoinCard The name is a bit misleading, as it should be actually the lowestJoinCard. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 15 Nov 2023 16:06:50 +0000 Gerrit-HasComments: Yes
