Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 )
Change subject: IMPALA-10252: fix invalid runtime filters for outer joins ...................................................................... Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13 PS4, Line 13: is > nit: if Done http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. > I wonder if the root cause is that the null rows from the inner do not > participate in the filtering during run-time, The NULL rows from the inner are only created when there is an unmatched row from the outer (they're just represented as a NULL pointer). That happens after we've already built the runtime filter, so the rows don't really exist in a way that they could be included in the filter. The algorithm for building the filters is basically: for row in inner: add row to hash table for filter in produced filters: evaluate filter expression and add to filter I tried an alternative solution which was to add an extra step to the end: if join is left outer or full outer: construct NULL row evaluate filter expression over NULL row and add to filter The solution works in theory, but it turns out that constructing the NULL row is not trivial if the RowDescriptor for the inner has non-nullable slots/tuples, so we'd need to generate a new RowDescriptor and fix up expressions to reference it. I think you're touching on something else, which is that NULL values are not included in the filter. That isn't the root cause here, I believe we handle that correctly. Specifically, we can't use runtime filters for IS NOT DISTINCT FROM equi-join predicates, and those are rejected by SingleNodePlanner.getNormalizedEqPred(). http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292 PS4, Line 2292: slots > nit: with all NULL slots into a literal. Done http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293 PS4, Line 2293: the expression > nit: the literal. Done http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373 PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() || : filterSrcNode.getJoinOp().isFullOuterJoin()) > I wonder if we need to add the similar guard for the right outer join too. This is done towards the end of distributed planning when the join order and distribution modes are already fixed. Right outer joins can be present here, but don't need any special handling because the filters are always sent from the inner to the outer and it's valid to filter any rows from the outer that don't match the inner. http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@388 PS4, Line 388: and and > nit: duplication. Done -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 26 Oct 2020 17:44:09 +0000 Gerrit-HasComments: Yes
