Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20498 )
Change subject: IMPALA-12018: Consider runtime filter for cardinality reduction ...................................................................... Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@18 PS5, Line 18: contiguou > Nit: contiguous Done http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: exe > Nit: modes. Done http://gerrit.cloudera.org:8080/#/c/20498/5//COMMIT_MSG@26 PS5, Line 26: reduction is > "... reduction is present/available in all ..." Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@454 PS5, Line 454: tility m > Now that it's public, is it still internal? Removed this. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/JoinNode.java@475 PS5, Line 475: computeGenericJoinCardina > This method was renamed. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1180 PS5, Line 1180: ScanNode, ExchangeNode > Aren't also ExchangeNodes included? Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@710 PS5, Line 710: || buildSlotRef.getDesc().getParent() == null > Is it possible that buildSlotRef.getDesc().getParent() is null? Thanks! I added that check. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@22 PS5, Line 22: import java.util.List; > Seems to be unused. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@25 PS5, Line 25: > Seems to be unused. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@102 PS5, Line 102: tivity_ = 1.0; > It is not set there, and I can't find any other place where it's set. Thanks for catching this! This should be assigned at ScanNode.reduceCardinalityByRuntimeFilter(). http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@456 PS5, Line 456: * join node id to list of runtime filters from it. > Could you describe the algorithm also here? Added comments. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@470 PS5, Line 470: // Row-level runtime filtering, however, only applies at columnar file format. > Could extract this loop into a function. Move it to groupFiltersForCardinalityReduction(). http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@474 PS5, Line 474: > Nit: applies Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@475 PS5, Line 475: > Nit: applies (present tense) would be better. Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@487 PS5, Line 487: */ > Could extract this loop into a function. Move it to getReducedCardinalityByFilter(). http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@510 PS5, Line 510: ter.get > Nit: apply Done http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@517 PS5, Line 517: cardOnThisJoin = Math.min(cardOnThisJoin, estCardAfterFilter); > If 'colName' isn't updated here for more than one column, those columns wil Yes, that is the intention. More distinct column will lead to lower parititonSelectivity. If columns can not be distinguished, pick the least selectivity one among them. http://gerrit.cloudera.org:8080/#/c/20498/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java@528 PS5, Line 528: colName = targetSlot.getDesc().getColumn().getName(); > I don't understand why we choose the max here, we've been reducing 'reduced The algorithm relies on the original cardinality estimate and assume scan will return at least the same number of rows as the highest join node. Updated the code and comment to highlight this. -- To view, visit http://gerrit.cloudera.org:8080/20498 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I033789c9b63a8188484e3afde8e646563918b3e1 Gerrit-Change-Number: 20498 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Thu, 09 Nov 2023 18:39:48 +0000 Gerrit-HasComments: Yes
