[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Dec 2020 03:15:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false if y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Examples of expressions that may be affected by this change are COALESCE and ISNULL. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Reviewed-on: http://gerrit.cloudera.org:8080/16622 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/query_test/test_queries.py 6 files changed, 144 insertions(+), 9 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6733/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 21:33:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Code-Review+1 > Patch Set 5: > > Updated the commit message as requested. > > Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at > least this case correctly - it does call isTrueWithNullSlots() on the > expression. I guess it's possible that it might handle more complex > expressions incorrectly, e.g. if the expression has slots from both sides of > the join and is false when all slots are null but true if a subset of slots > is null. > > > > [localhost.EXAMPLE.COM:21050] default> set > ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1; > ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1 > [localhost.EXAMPLE.COM:21050] default> explain select * from > functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = > t2.id where zeroifnull(t2.int_col) = 0; > Query: explain select * from functional.alltypes t1 left outer join > functional.alltypestiny t2 on t1.id = t2.id where zeroifnull(t2.int_col) = 0 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 | > | Per-Host Resource Estimates: Memory=163MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 04:EXCHANGE [UNPARTITIONED]| > | | | > | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | > | | hash predicates: t1.id = t2.id | > | | other predicates: zeroifnull(t2.int_col) = 0| > | | row-size=178B cardinality=7.30K | > | | | > | |--03:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | HDFS partitions=4/4 files=4 size=460B| > | | row-size=89B cardinality=8 | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > |HDFS partitions=24/24 files=24 size=478.45KB| > |row-size=89B cardinality=7.30K | > ++ > Fetched 22 row(s) in 0.05s > [localhost.EXAMPLE.COM:21050] default> explain select * from > functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = > t2.id where t2.int_col = 0; > Query: explain select * from functional.alltypes t1 left outer join > functional.alltypestiny t2 on t1.id = t2.id where t2.int_col = 0 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 | > | Per-Host Resource Estimates: Memory=163MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 04:EXCHANGE [UNPARTITIONED]| > | | | > | 02:HASH JOIN [INNER JOIN, BROADCAST] | > | | hash predicates: t1.id = t2.id | > | | runtime filters: RF000 <- t2.id | > | | row-size=178B cardinality=4 | > | | | > | |--03:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | HDFS partitions=4/4 files=4 size=460B| > | | predicates: t2.int_col = 0 | > | | row-size=89B cardinality=4 | > | | | > | 00:SCAN HDFS [functional.alltypes t1] | > |HDFS partitions=24/24 files=24 size=478.45KB| > |run
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 18 Nov 2020 17:40:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 6: Ping on this review -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Nov 2020 17:33:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 5: Updated the commit message as requested. Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at least this case correctly - it does call isTrueWithNullSlots() on the expression. I guess it's possible that it might handle more complex expressions incorrectly, e.g. if the expression has slots from both sides of the join and is false when all slots are null but true if a subset of slots is null. [localhost.EXAMPLE.COM:21050] default> set ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1; ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1 [localhost.EXAMPLE.COM:21050] default> explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = t2.id where zeroifnull(t2.int_col) = 0; Query: explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = t2.id where zeroifnull(t2.int_col) = 0 ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 | | Per-Host Resource Estimates: Memory=163MB | | Codegen disabled by planner| || | PLAN-ROOT SINK | | | | | 04:EXCHANGE [UNPARTITIONED]| | | | | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | | | hash predicates: t1.id = t2.id | | | other predicates: zeroifnull(t2.int_col) = 0| | | row-size=178B cardinality=7.30K | | | | | |--03:EXCHANGE [BROADCAST] | | | | | | | 01:SCAN HDFS [functional.alltypestiny t2] | | | HDFS partitions=4/4 files=4 size=460B| | | row-size=89B cardinality=8 | | | | | 00:SCAN HDFS [functional.alltypes t1] | |HDFS partitions=24/24 files=24 size=478.45KB| |row-size=89B cardinality=7.30K | ++ Fetched 22 row(s) in 0.05s [localhost.EXAMPLE.COM:21050] default> explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = t2.id where t2.int_col = 0; Query: explain select * from functional.alltypes t1 left outer join functional.alltypestiny t2 on t1.id = t2.id where t2.int_col = 0 ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 | | Per-Host Resource Estimates: Memory=163MB | | Codegen disabled by planner| || | PLAN-ROOT SINK | | | | | 04:EXCHANGE [UNPARTITIONED]| | | | | 02:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: t1.id = t2.id | | | runtime filters: RF000 <- t2.id | | | row-size=178B cardinality=4 | | | | | |--03:EXCHANGE [BROADCAST] | | | | | | | 01:SCAN HDFS [functional.alltypestiny t2] | | | HDFS partitions=4/4 files=4 size=460B| | | predicates: t2.int_col = 0 | | | row-size=89B cardinality=4 | | | | | 00:SCAN HDFS [functional.alltypes t1] | |HDFS partitions=24/24 files=24 size=478.45KB| |runtime filters: RF000 -> t1.id | |row-size=89B cardinality=7.30K | ++
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Hello Aman Sinha, Qifan Chen, Shant Hovsepian, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16622 to look at the new patch set (#6). Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false if y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Examples of expressions that may be affected by this change are COALESCE and ISNULL. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/query_test/test_queries.py 6 files changed, 144 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/6 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 5: Code-Review+1 Good catch! Made me realize we likely have a bug with the LOJ -> INNER rewrite that is similar. nit: Mention coalesce() in the commit message as it is likely the most used null preserving function and would help when doing a search in case there is a change in behavior. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 31 Oct 2020 01:26:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. > https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches descr Okay. The tuple level nullability indicator in the plan is very helpful. I like it a lot. For this special case when prepare the filter for a join predicate non null-rejecting, I guess we need an extra step to compute the filter value for a null row from the inner due to the outer join. This is almost like we need to an expression on top of 4 to expand tuple 1 to 1N, where N is for the null row described above. The expression probably can take care of the descriptor clone. Anyway, I am very fine with the current fix. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Oct 2020 17:46:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. > Okay. Thanks a lot for trying the NULL row (to the filter) method. Yes, my https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches describes the physical layout of the row batches. So each row is a composition of pointers to tuples. A join produces rows with a new layout which is the concatenation of the tuples of the input rows (except semi-joins, which retain the row layout from the outer). You can see that in the plan I included below, where a left join produces a row composed of tuples 1 and 2 from the inputs: tuple-ids=0,1N An unmatched row from an outer join is represented by a NULL tuple pointer. Nullability is represented in the row descriptor as an extra flag (that's the N in 1N above). Most tuples are non-nullable - that's only introduced by outer joins and aggregations with multiple agg classes IIRC. The runtime filter expressions in this example would be evaluated over the input row produced by operator 04, which has a single non-nullable tuple 1. The problem is that each expression tree in the planner is specific to a particular row layout, so to evaluate a runtime filter expression over a row with layout 1N instead of layout 1, we'd need to generate a new RowDescriptor, then clone the runtime filter expression and fixed it up by replacing all the SlotRefs with SlotRefs referencing the new row descriptor. I think we would also need to add some TupleIsNull() predicates in order to correctly handle the nullability (since SlotRef only handles the slot-level nullability, I think). Then we'd have to plumb the expression through to the backend so it can be evaluated. > explain select count(*) from functional.alltypes t1 left join functional.alltypestiny t2 on t1.id = t2.id; Query: explain select count(*) from functional.alltypes t1 left join functional.alltypestiny t2 on t1.id = t2.id +-+ | Explain String | +-+ | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 | | Per-Host Resource Estimates: Memory=214MB | | Codegen disabled by planner | | Analyzed query: SELECT count(*) FROM functional.alltypes t1 LEFT OUTER JOIN | | functional.alltypestiny t2 ON t1.id = t2.id | | | | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | | Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B thread-reservation=1 | | PLAN-ROOT SINK | | | output exprs: count(*) | | | mem-estimate=0B mem-reservation=0B thread-reservation=0 | | | | | 06:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=2 row-size=8B cardinality=1 | | | in pipelines: 06(GETNEXT), 03(OPEN) | | | | | 05:EXCHANGE [UNPARTITIONED] | | | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 | | | tuple-ids=2 row-size=8B cardinality=1 | | | in pipelines: 03(GETNEXT) | | | | | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 | | Per-Host Resources: mem-estimate=171.94MB mem-reservation=1.97MB thread-reservation=2 | | 03
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 5: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. > > I wonder if the root cause is that the null rows from the inner do not pa Okay. Thanks a lot for trying the NULL row (to the filter) method. Yes, my question was motivated from the current fix by evaluating the expression with all nulls. Can you please explain the non-nullable slots/tuples in the inner? I thought for outer joins, the inner row produced must be nullable. Done. http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373 PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() || : filterSrcNode.getJoinOp().isFullOuterJoin()) > This is done towards the end of distributed planning when the join order an Good point. Thanks. Done. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 27 Oct 2020 15:42:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7559/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 26 Oct 2020 17:56:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13 PS4, Line 13: is > nit: if Done http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. > I wonder if the root cause is that the null rows from the inner do not > participate in the filtering during run-time, The NULL rows from the inner are only created when there is an unmatched row from the outer (they're just represented as a NULL pointer). That happens after we've already built the runtime filter, so the rows don't really exist in a way that they could be included in the filter. The algorithm for building the filters is basically: for row in inner: add row to hash table for filter in produced filters: evaluate filter expression and add to filter I tried an alternative solution which was to add an extra step to the end: if join is left outer or full outer: construct NULL row evaluate filter expression over NULL row and add to filter The solution works in theory, but it turns out that constructing the NULL row is not trivial if the RowDescriptor for the inner has non-nullable slots/tuples, so we'd need to generate a new RowDescriptor and fix up expressions to reference it. I think you're touching on something else, which is that NULL values are not included in the filter. That isn't the root cause here, I believe we handle that correctly. Specifically, we can't use runtime filters for IS NOT DISTINCT FROM equi-join predicates, and those are rejected by SingleNodePlanner.getNormalizedEqPred(). http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292 PS4, Line 2292: slots > nit: with all NULL slots into a literal. Done http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293 PS4, Line 2293: the expression > nit: the literal. Done http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373 PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() || : filterSrcNode.getJoinOp().isFullOuterJoin()) > I wonder if we need to add the similar guard for the right outer join too. This is done towards the end of distributed planning when the join order and distribution modes are already fixed. Right outer joins can be present here, but don't need any special handling because the filters are always sent from the inner to the outer and it's valid to filter any rows from the outer that don't match the inner. http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@388 PS4, Line 388: and and > nit: duplication. Done -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 26 Oct 2020 17:44:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16622 to look at the new patch set (#5). Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false if y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/query_test/test_queries.py 6 files changed, 144 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/5 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 4: (6 comments) Looks good to me! http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13 PS4, Line 13: is nit: if http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17 PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL. I wonder if the root cause is that the null rows from the inner do not participate in the filtering during run-time, as otherwise, the result should be correct. In the test below, the values to be sent to the outer is isnull(id, 1). Query: explain SELECT s.id, s.int_col % 2, v.id, zeroifnull(v.id + 1) FROM functional_parquet.alltypessmall s LEFT OUTER JOIN ( SELECT id, int_col FROM functional_parquet.alltypestiny t) v ON s.id = v.id WHERE s.int_col = isnull(v.id, 1) ORDER BY s.id ASC LIMIT 10 ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=3.96MB Threads=6 | | Per-Host Resource Estimates: Memory=52MB | | WARNING: The following tables are missing relevant table and/or column statistics. | | functional_parquet.alltypessmall, functional_parquet.alltypestiny | | | | PLAN-ROOT SINK | | | | | 06:MERGING-EXCHANGE [UNPARTITIONED] | | | order by: id ASC | | | limit: 10 | | | | | 03:TOP-N [LIMIT=10] | | | order by: id ASC | | | row-size=12B cardinality=10 | | | | | 02:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | | | hash predicates: s.id = id | | | other predicates: s.int_col = isnull(id, 1) | | | runtime filters: RF000 <- isnull(id, 1) | | | row-size=12B cardinality=939 | | | | | |--05:EXCHANGE [HASH(id)] | | | | | | | 01:SCAN HDFS [functional_parquet.alltypestiny t] | | | HDFS partitions=4/4 files=4 size=11.92KB | | | row-size=4B cardinality=758 | | | | | 04:EXCHANGE [HASH(s.id)] | | | | | 00:SCAN HDFS [functional_parquet.alltypessmall s] | |HDFS partitions=4/4 files=4 size=14.76KB | |runtime filters: RF000 -> s.int_col | |row-size=8B cardinality=939 | ++ Fetched 33 row(s) in 0.02s http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292 PS4, Line 2292: slots nit: with all NULL slots into a literal. http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293 PS4, Line 2293: the expression nit: the literal. http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/or
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7525/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 22 Oct 2020 19:25:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7524/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 22 Oct 2020 19:24:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7523/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 22 Oct 2020 19:20:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16622 to look at the new patch set (#4). Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false is y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/query_test/test_queries.py 6 files changed, 144 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/4 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379 PS2, Line 379: // input then it is not safe to generate a runtime filter from a LEFT OUTER JOIN line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@382 PS2, Line 382: // E.g. If the source expression is zeroifnull(y), column y has values [1, 2, 3], line too long (93 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 22 Oct 2020 19:02:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16622 to look at the new patch set (#3). Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false is y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 5 files changed, 144 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/3 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins
Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16622 ) Change subject: IMPALA-10252: fix invalid runtime filters for outer joins .. IMPALA-10252: fix invalid runtime filters for outer joins The planner generates runtime filters for non-join conjuncts assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is correct in many cases where NULLs stemming from unmatched rows would result in the predicate evaluating to false. E.g. x = y is always false is y is NULL. However, it is incorrect if the NULL returned from the unmatched row can result in the predicate evaluating to true. E.g. x = isnull(y, 1) can return true even if y is NULL. The fix is to detect cases when the source expression from the left input of the join returns non-NULL for null inputs and then skip generating the filter. Testing: Added regression tests: * Planner tests for LEFT OUTER and FULL OUTER where the runtime filter was incorrectly generated before this patch. * Enabled end-to-end test that was previously failing. * Added a new runtime filter test that will execute on both Parquet and Kudu (which are subtly different because of nullability of slots). Ran exhaustive tests. Change-Id: I507af1cc8df15bca21e0d8555019997812087261 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 5 files changed, 144 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/2 -- To view, visit http://gerrit.cloudera.org:8080/16622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261 Gerrit-Change-Number: 16622 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong