Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )
Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction ...................................................................... Patch Set 8: (10 comments) http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/JoinNode.java@1025 PS7, Line 1025: isLeftOuterJoin > Can you explain why is it supported for left outer join? We shouldn't have Left outer join may still eligible to be included in nodeStack if it is selective and reducing (output cardinality < input cardinality), even though it is not producing runtime filter for the leftmost scan. Note that this method is exist to clear noteStack state if traversal arrives at an expanding join. runtime-filter-cardinality-reduction.test shows an example. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@461 PS7, Line 461: lter call in > I think that cardinality reduction should be also applied to Kudu scanners. Changed this variable name to evalAtRowLevel to better represent the intention. Kudu scanner does not eval filter on row level (no call to EvalRuntimeFilter function), but I do remember that Impala can push runtime filter down to Kudu, which I'm not sure what the behavior will be. Please let me know what you think. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@468 PS7, Line 468: > Can you add comments on why does isAllColumnarScanner matter? What happens No row-group or row level filtering happen if scanning against text file. I have not confirm if non-partition filter is still generated for all-text scan, but I think it is correct to generate non-partition runtime filter for mixed file format scan, but we should not use that to reason about cardinality reduction since it does not applies for all files. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@480 PS7, Line 480: : /** : * Given a contiguous probe pipeline 'nodeStac > Do we guarantee this somehow? If yes, then there could be a Precondition i This is guaranteed by JoinNode.isSelectiveAndReducingJoin(). nodeStack will be cleared if traversal arrived at expanding join. Added sanity check at ps 8. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@488 PS7, Line 488: ed output cardinality and par > I am still trying to wrap my head around this algorithm. Added testRuntimeFilterCardinalityReduction. http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@513 PS7, Line 513: e(node instanceof Join > Does this case handle the case when the key is result of an expression and Replaced this with filter.getTargetExpr(id_).getNumDistinctValues(). For the key, it is handled already through filter.getNdvEstimate() and filter.getBuildKeyNumCardinality(). http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@514 PS7, Line 514: JoinNode join = (JoinNode) node; > Should we be at this point if probe side key ndv is unknown? It may be safe Done http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@522 PS7, Line 522: ts(); > typo Done http://gerrit.cloudera.org:8080/#/c/20498/7/fe/src/main/java/org/apache/impala/planner/ScanNode.java@544 PS7, Line 544: colName = ta > The name is a bit misleading, as it should be actually the lowestJoinCard. Done http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test: http://gerrit.cloudera.org:8080/#/c/20498/8/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test@429 PS8, Line 429: # Removing sr_ticket_number=ss_ticket_number predicate will turn 03:HASH JOIN into : # an expanding join and makes probe pipeline ineligible for cardinality reduction. Still unsure about this testcase. It looks like reducing cardinality of 00:SCAN by RF000 is still valid, even though 04:HASH JOIN is not part of stackNode because 03:HASH JOIN is expanding. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 15 Nov 2023 22:21:08 +0000 Gerrit-HasComments: Yes
