Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13
PS4, Line 13: is
> nit: if
Done


http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> I wonder if the root cause is that the null rows from the inner do not 
> participate in the filtering during run-time,

The NULL rows from the inner are only created when there is an unmatched row 
from the outer (they're just represented as a NULL pointer). That happens after 
we've already built the runtime filter, so the rows don't really exist in a way 
that they could be included in the filter.

The algorithm for building the filters is basically:

  for row in inner:
    add row to hash table
    for filter in produced filters:
      evaluate filter expression and add to filter

I tried an alternative solution which was to add an extra step to the end:

  if join is left outer or full outer:
    construct NULL row
    evaluate filter expression over NULL row and add to filter

The solution works in theory, but it turns out that constructing the NULL row 
is not trivial if the RowDescriptor for the inner has non-nullable 
slots/tuples, so we'd need to generate a new RowDescriptor and fix up 
expressions to reference it.

I think you're touching on something else, which is that NULL values are not 
included in the filter. That isn't the root cause here, I believe we handle 
that correctly. Specifically, we can't use runtime filters for IS NOT DISTINCT 
FROM equi-join predicates, and those are rejected by 
SingleNodePlanner.getNormalizedEqPred().


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292
PS4, Line 2292: slots
> nit: with all NULL slots into a literal.
Done


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293
PS4, Line 2293: the expression
> nit: the literal.
Done


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373
PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() ||
             :               filterSrcNode.getJoinOp().isFullOuterJoin())
> I wonder if we need to add the similar guard for the right outer join too.
This is done towards the end of distributed planning when the join order and 
distribution modes are already fixed. Right outer joins can be present here, 
but don't need any special handling because the filters are always sent from 
the inner to the outer and it's valid to filter any rows from the outer that 
don't match the inner.


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@388
PS4, Line 388: and and
> nit: duplication.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 26 Oct 2020 17:44:09 +0000
Gerrit-HasComments: Yes

Reply via email to