Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
......................................................................


Patch Set 6: Code-Review+1

> Patch Set 5:
>
> Updated the commit message as requested.
>
> Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at 
> least this case correctly - it does call isTrueWithNullSlots() on the 
> expression. I guess it's possible that it might handle more complex 
> expressions incorrectly, e.g. if the expression has slots from both sides of 
> the join and is false when all slots are null but true if a subset of slots 
> is null.
>
>
>
>   [localhost.EXAMPLE.COM:21050] default> set 
> ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1;
>   ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1
>   [localhost.EXAMPLE.COM:21050] default> explain select * from 
> functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
> t2.id where zeroifnull(t2.int_col) = 0;
>   Query: explain select * from functional.alltypes t1 left outer join 
> functional.alltypestiny t2 on  t1.id = t2.id where zeroifnull(t2.int_col) = 0
>   +------------------------------------------------------------+
>   | Explain String                                             |
>   +------------------------------------------------------------+
>   | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB                  |
>   | Codegen disabled by planner                                |
>   |                                                            |
>   | PLAN-ROOT SINK                                             |
>   | |                                                          |
>   | 04:EXCHANGE [UNPARTITIONED]                                |
>   | |                                                          |
>   | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                  |
>   | |  hash predicates: t1.id = t2.id                          |
>   | |  other predicates: zeroifnull(t2.int_col) = 0            |
>   | |  row-size=178B cardinality=7.30K                         |
>   | |                                                          |
>   | |--03:EXCHANGE [BROADCAST]                                 |
>   | |  |                                                       |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]               |
>   | |     HDFS partitions=4/4 files=4 size=460B                |
>   | |     row-size=89B cardinality=8                           |
>   | |                                                          |
>   | 00:SCAN HDFS [functional.alltypes t1]                      |
>   |    HDFS partitions=24/24 files=24 size=478.45KB            |
>   |    row-size=89B cardinality=7.30K                          |
>   +------------------------------------------------------------+
>   Fetched 22 row(s) in 0.05s
>   [localhost.EXAMPLE.COM:21050] default> explain select * from 
> functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
> t2.id where t2.int_col = 0;
>   Query: explain select * from functional.alltypes t1 left outer join 
> functional.alltypestiny t2 on  t1.id = t2.id where t2.int_col = 0
>   +------------------------------------------------------------+
>   | Explain String                                             |
>   +------------------------------------------------------------+
>   | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB                  |
>   | Codegen disabled by planner                                |
>   |                                                            |
>   | PLAN-ROOT SINK                                             |
>   | |                                                          |
>   | 04:EXCHANGE [UNPARTITIONED]                                |
>   | |                                                          |
>   | 02:HASH JOIN [INNER JOIN, BROADCAST]                       |
>   | |  hash predicates: t1.id = t2.id                          |
>   | |  runtime filters: RF000 <- t2.id                         |
>   | |  row-size=178B cardinality=4                             |
>   | |                                                          |
>   | |--03:EXCHANGE [BROADCAST]                                 |
>   | |  |                                                       |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]               |
>   | |     HDFS partitions=4/4 files=4 size=460B                |
>   | |     predicates: t2.int_col = 0                           |
>   | |     row-size=89B cardinality=4                           |
>   | |                                                          |
>   | 00:SCAN HDFS [functional.alltypes t1]                      |
>   |    HDFS partitions=24/24 files=24 size=478.45KB            |
>   |    runtime filters: RF000 -> t1.id                         |
>   |    row-size=89B cardinality=7.30K                          |
>   +------------------------------------------------------------+
>   Fetched 24 row(s) in 0.05s

Yeah exactly I was thinking something like:

t1 join t2 join t3 join t3 with a coalesce(t1, t2, t3, t4) I'll try to mock up 
an example, but it isn't relevant for this fix.

Nice job!


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 07 Dec 2020 19:48:11 +0000
Gerrit-HasComments: No

Reply via email to