[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Dec 2020 03:15:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Examples of expressions that may be affected by this change are
COALESCE and ISNULL.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.

Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Reviewed-on: http://gerrit.cloudera.org:8080/16622
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M tests/query_test/test_queries.py
6 files changed, 144 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6733/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Dec 2020 21:34:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Dec 2020 21:33:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-12-07 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 6: Code-Review+1

> Patch Set 5:
>
> Updated the commit message as requested.
>
> Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at 
> least this case correctly - it does call isTrueWithNullSlots() on the 
> expression. I guess it's possible that it might handle more complex 
> expressions incorrectly, e.g. if the expression has slots from both sides of 
> the join and is false when all slots are null but true if a subset of slots 
> is null.
>
>
>
>   [localhost.EXAMPLE.COM:21050] default> set 
> ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1;
>   ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1
>   [localhost.EXAMPLE.COM:21050] default> explain select * from 
> functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
> t2.id where zeroifnull(t2.int_col) = 0;
>   Query: explain select * from functional.alltypes t1 left outer join 
> functional.alltypestiny t2 on  t1.id = t2.id where zeroifnull(t2.int_col) = 0
>   ++
>   | Explain String |
>   ++
>   | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB  |
>   | Codegen disabled by planner|
>   ||
>   | PLAN-ROOT SINK |
>   | |  |
>   | 04:EXCHANGE [UNPARTITIONED]|
>   | |  |
>   | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]  |
>   | |  hash predicates: t1.id = t2.id  |
>   | |  other predicates: zeroifnull(t2.int_col) = 0|
>   | |  row-size=178B cardinality=7.30K |
>   | |  |
>   | |--03:EXCHANGE [BROADCAST] |
>   | |  |   |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]   |
>   | | HDFS partitions=4/4 files=4 size=460B|
>   | | row-size=89B cardinality=8   |
>   | |  |
>   | 00:SCAN HDFS [functional.alltypes t1]  |
>   |HDFS partitions=24/24 files=24 size=478.45KB|
>   |row-size=89B cardinality=7.30K  |
>   ++
>   Fetched 22 row(s) in 0.05s
>   [localhost.EXAMPLE.COM:21050] default> explain select * from 
> functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
> t2.id where t2.int_col = 0;
>   Query: explain select * from functional.alltypes t1 left outer join 
> functional.alltypestiny t2 on  t1.id = t2.id where t2.int_col = 0
>   ++
>   | Explain String |
>   ++
>   | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 |
>   | Per-Host Resource Estimates: Memory=163MB  |
>   | Codegen disabled by planner|
>   ||
>   | PLAN-ROOT SINK |
>   | |  |
>   | 04:EXCHANGE [UNPARTITIONED]|
>   | |  |
>   | 02:HASH JOIN [INNER JOIN, BROADCAST]   |
>   | |  hash predicates: t1.id = t2.id  |
>   | |  runtime filters: RF000 <- t2.id |
>   | |  row-size=178B cardinality=4 |
>   | |  |
>   | |--03:EXCHANGE [BROADCAST] |
>   | |  |   |
>   | |  01:SCAN HDFS [functional.alltypestiny t2]   |
>   | | HDFS partitions=4/4 files=4 size=460B|
>   | | predicates: t2.int_col = 0   |
>   | | row-size=89B cardinality=4   |
>   | |  |
>   | 00:SCAN HDFS [functional.alltypes t1]  |
>   |HDFS partitions=24/24 files=24 size=478.45KB|
>   |run

[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-11-18 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Nov 2020 17:40:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 6:

Ping on this review


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 17 Nov 2020 17:33:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-11-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 5:

Updated the commit message as requested.

Shant, I think hasNullRejectingConjucts (sp) in Analyzer.java handles at least 
this case correctly - it does call isTrueWithNullSlots() on the expression. I 
guess it's possible that it might handle more complex expressions incorrectly, 
e.g. if the expression has slots from both sides of the join and is false when 
all slots are null but true if a subset of slots is null.



  [localhost.EXAMPLE.COM:21050] default> set 
ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION=1;
  ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION set to 1
  [localhost.EXAMPLE.COM:21050] default> explain select * from 
functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
t2.id where zeroifnull(t2.int_col) = 0;
  Query: explain select * from functional.alltypes t1 left outer join 
functional.alltypestiny t2 on  t1.id = t2.id where zeroifnull(t2.int_col) = 0
  ++
  | Explain String |
  ++
  | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5 |
  | Per-Host Resource Estimates: Memory=163MB  |
  | Codegen disabled by planner|
  ||
  | PLAN-ROOT SINK |
  | |  |
  | 04:EXCHANGE [UNPARTITIONED]|
  | |  |
  | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]  |
  | |  hash predicates: t1.id = t2.id  |
  | |  other predicates: zeroifnull(t2.int_col) = 0|
  | |  row-size=178B cardinality=7.30K |
  | |  |
  | |--03:EXCHANGE [BROADCAST] |
  | |  |   |
  | |  01:SCAN HDFS [functional.alltypestiny t2]   |
  | | HDFS partitions=4/4 files=4 size=460B|
  | | row-size=89B cardinality=8   |
  | |  |
  | 00:SCAN HDFS [functional.alltypes t1]  |
  |HDFS partitions=24/24 files=24 size=478.45KB|
  |row-size=89B cardinality=7.30K  |
  ++
  Fetched 22 row(s) in 0.05s
  [localhost.EXAMPLE.COM:21050] default> explain select * from 
functional.alltypes t1 left outer join functional.alltypestiny t2 on  t1.id = 
t2.id where t2.int_col = 0;
  Query: explain select * from functional.alltypes t1 left outer join 
functional.alltypestiny t2 on  t1.id = t2.id where t2.int_col = 0
  ++
  | Explain String |
  ++
  | Max Per-Host Resource Reservation: Memory=2.98MB Threads=5 |
  | Per-Host Resource Estimates: Memory=163MB  |
  | Codegen disabled by planner|
  ||
  | PLAN-ROOT SINK |
  | |  |
  | 04:EXCHANGE [UNPARTITIONED]|
  | |  |
  | 02:HASH JOIN [INNER JOIN, BROADCAST]   |
  | |  hash predicates: t1.id = t2.id  |
  | |  runtime filters: RF000 <- t2.id |
  | |  row-size=178B cardinality=4 |
  | |  |
  | |--03:EXCHANGE [BROADCAST] |
  | |  |   |
  | |  01:SCAN HDFS [functional.alltypestiny t2]   |
  | | HDFS partitions=4/4 files=4 size=460B|
  | | predicates: t2.int_col = 0   |
  | | row-size=89B cardinality=4   |
  | |  |
  | 00:SCAN HDFS [functional.alltypes t1]  |
  |HDFS partitions=24/24 files=24 size=478.45KB|
  |runtime filters: RF000 -> t1.id |
  |row-size=89B cardinality=7.30K  |
  ++

[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-11-02 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Qifan Chen, Shant Hovsepian, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16622

to look at the new patch set (#6).

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Examples of expressions that may be affected by this change are
COALESCE and ISNULL.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.

Change-Id: I507af1cc8df15bca21e0d8555019997812087261
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M tests/query_test/test_queries.py
6 files changed, 144 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/6
--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-30 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 5: Code-Review+1

Good catch! Made me realize we likely have a bug with the LOJ -> INNER rewrite 
that is similar.

nit: Mention coalesce() in the commit message as it is likely the most used 
null preserving function and would help when doing a search in case there is a 
change in behavior.


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 31 Oct 2020 01:26:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-27 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches descr
Okay. The tuple level nullability indicator in the plan is very helpful. I like 
it a lot.

For this special case when prepare the filter for a join predicate non 
null-rejecting, I guess we need an extra step to compute the filter value for a 
null row from the inner due to the outer join.

This is almost like we need to an expression on top of 4 to expand tuple 1 to 
1N, where N is for the null row described above. The expression probably can 
take care of the descriptor clone.

Anyway, I am very fine with the current fix.



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Oct 2020 17:46:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> Okay. Thanks a lot for trying the NULL row (to the filter) method. Yes, my
https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches describes 
the physical layout of the row batches.

So each row is a composition of pointers to tuples. A join produces rows with a 
new layout which is the concatenation of the tuples of the input rows (except 
semi-joins, which retain the row layout from the outer). You can see that in 
the plan I included below, where a left join produces a row composed of tuples 
1 and 2 from the inputs: tuple-ids=0,1N

An unmatched row from an outer join is represented by a NULL tuple pointer. 
Nullability is represented in the row descriptor as an extra flag (that's the N 
in 1N above). Most tuples are non-nullable - that's only introduced by outer 
joins and aggregations with multiple agg classes IIRC.

The runtime filter expressions in this example would be evaluated over the 
input row produced by operator 04, which has a single non-nullable tuple 1.

The problem is that each expression tree in the planner is specific to a 
particular row layout, so to evaluate a runtime filter expression over a row 
with layout 1N instead of layout 1, we'd need to generate a new RowDescriptor, 
then clone the runtime filter expression and fixed it up by replacing all the 
SlotRefs with SlotRefs referencing the new row descriptor. I think we would 
also need to add some TupleIsNull() predicates in order to correctly handle the 
nullability (since SlotRef only handles the slot-level nullability, I think). 
Then we'd have to plumb the expression through to the backend so it can be 
evaluated.

  > explain select count(*) from functional.alltypes t1 left join 
functional.alltypestiny t2 on t1.id = t2.id;
  Query: explain select count(*) from functional.alltypes t1 left join 
functional.alltypestiny t2 on t1.id = t2.id
  
+-+
  | Explain String  
|
  
+-+
  | Max Per-Host Resource Reservation: Memory=1.98MB Threads=5  
|
  | Per-Host Resource Estimates: Memory=214MB   
|
  | Codegen disabled by planner 
|
  | Analyzed query: SELECT count(*) FROM functional.alltypes t1 LEFT OUTER JOIN 
|
  | functional.alltypestiny t2 ON t1.id = t2.id 
|
  | 
|
  | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
|
  | |  Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B 
thread-reservation=1 |
  | PLAN-ROOT SINK  
|
  | |  output exprs: count(*)   
|
  | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
|
  | |   
|
  | 06:AGGREGATE [FINALIZE] 
|
  | |  output: count:merge(*)   
|
  | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
thread-reservation=0 |
  | |  tuple-ids=2 row-size=8B cardinality=1
|
  | |  in pipelines: 06(GETNEXT), 03(OPEN)  
|
  | |   
|
  | 05:EXCHANGE [UNPARTITIONED] 
|
  | |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 
|
  | |  tuple-ids=2 row-size=8B cardinality=1
|
  | |  in pipelines: 03(GETNEXT)
|
  | |   
|
  | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3  
|
  | Per-Host Resources: mem-estimate=171.94MB mem-reservation=1.97MB 
thread-reservation=2   |
  | 03

[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-27 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 5: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> > I wonder if the root cause is that the null rows from the inner do not pa
Okay. Thanks a lot for trying the NULL row (to the filter) method. Yes, my 
question was motivated from the current fix by evaluating the expression with 
all nulls. Can you please explain the non-nullable slots/tuples in the inner?  
I thought for outer joins, the inner row produced must be nullable.

Done.


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373
PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() ||
 :   filterSrcNode.getJoinOp().isFullOuterJoin())
> This is done towards the end of distributed planning when the join order an
Good point. Thanks.

Done.



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 27 Oct 2020 15:42:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7559/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 26 Oct 2020 17:56:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13
PS4, Line 13: is
> nit: if
Done


http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
> I wonder if the root cause is that the null rows from the inner do not 
> participate in the filtering during run-time,

The NULL rows from the inner are only created when there is an unmatched row 
from the outer (they're just represented as a NULL pointer). That happens after 
we've already built the runtime filter, so the rows don't really exist in a way 
that they could be included in the filter.

The algorithm for building the filters is basically:

  for row in inner:
add row to hash table
for filter in produced filters:
  evaluate filter expression and add to filter

I tried an alternative solution which was to add an extra step to the end:

  if join is left outer or full outer:
construct NULL row
evaluate filter expression over NULL row and add to filter

The solution works in theory, but it turns out that constructing the NULL row 
is not trivial if the RowDescriptor for the inner has non-nullable 
slots/tuples, so we'd need to generate a new RowDescriptor and fix up 
expressions to reference it.

I think you're touching on something else, which is that NULL values are not 
included in the filter. That isn't the root cause here, I believe we handle 
that correctly. Specifically, we can't use runtime filters for IS NOT DISTINCT 
FROM equi-join predicates, and those are rejected by 
SingleNodePlanner.getNormalizedEqPred().


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292
PS4, Line 2292: slots
> nit: with all NULL slots into a literal.
Done


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293
PS4, Line 2293: the expression
> nit: the literal.
Done


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@373
PS4, Line 373: filterSrcNode.getJoinOp().isLeftOuterJoin() ||
 :   filterSrcNode.getJoinOp().isFullOuterJoin())
> I wonder if we need to add the similar guard for the right outer join too.
This is done towards the end of distributed planning when the join order and 
distribution modes are already fixed. Right outer joins can be present here, 
but don't need any special handling because the filters are always sent from 
the inner to the outer and it's valid to filter any rows from the outer that 
don't match the inner.


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@388
PS4, Line 388: and and
> nit: duplication.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 26 Oct 2020 17:44:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-26 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16622

to look at the new patch set (#5).

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false if y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.

Change-Id: I507af1cc8df15bca21e0d8555019997812087261
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M tests/query_test/test_queries.py
6 files changed, 144 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/5
--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-26 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 4:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@13
PS4, Line 13: is
nit: if


http://gerrit.cloudera.org:8080/#/c/16622/4//COMMIT_MSG@17
PS4, Line 17: x = isnull(y, 1) can return true even if y is NULL.
I wonder if the root cause is that the null rows from the inner do not 
participate in the filtering during run-time, as otherwise, the result should 
be correct.

In the test below, the values to be sent to the outer is isnull(id, 1).

Query: explain SELECT s.id, s.int_col % 2, v.id, zeroifnull(v.id + 1)
FROM functional_parquet.alltypessmall s LEFT OUTER JOIN (
SELECT id, int_col
FROM functional_parquet.alltypestiny t) v ON s.id = v.id
WHERE
s.int_col  = isnull(v.id, 1)

ORDER BY s.id ASC
LIMIT 10
++
| Explain String
 |
++
| Max Per-Host Resource Reservation: Memory=3.96MB Threads=6
 |
| Per-Host Resource Estimates: Memory=52MB  
 |
| WARNING: The following tables are missing relevant table and/or column 
statistics. |
| functional_parquet.alltypessmall, functional_parquet.alltypestiny 
 |
|   
 |
| PLAN-ROOT SINK
 |
| | 
 |
| 06:MERGING-EXCHANGE [UNPARTITIONED]   
 |
| |  order by: id ASC   
 |
| |  limit: 10  
 |
| | 
 |
| 03:TOP-N [LIMIT=10]   
 |
| |  order by: id ASC   
 |
| |  row-size=12B cardinality=10
 |
| | 
 |
| 02:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]   
 |
| |  hash predicates: s.id = id 
 |
| |  other predicates: s.int_col = isnull(id, 1)
 |
| |  runtime filters: RF000 <- isnull(id, 1)
 |
| |  row-size=12B cardinality=939   
 |
| | 
 |
| |--05:EXCHANGE [HASH(id)] 
 |
| |  |  
 |
| |  01:SCAN HDFS [functional_parquet.alltypestiny t]   
 |
| | HDFS partitions=4/4 files=4 size=11.92KB
 |
| | row-size=4B cardinality=758 
 |
| | 
 |
| 04:EXCHANGE [HASH(s.id)]  
 |
| | 
 |
| 00:SCAN HDFS [functional_parquet.alltypessmall s] 
 |
|HDFS partitions=4/4 files=4 size=14.76KB   
 |
|runtime filters: RF000 -> s.int_col
 |
|row-size=8B cardinality=939
 |
++
Fetched 33 row(s) in 0.02s


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2292
PS4, Line 2292: slots
nit: with all NULL slots into a literal.


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2293
PS4, Line 2293: the expression
nit: the literal.


http://gerrit.cloudera.org:8080/#/c/16622/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/or

[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7525/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 22 Oct 2020 19:25:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7524/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 22 Oct 2020 19:24:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7523/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 22 Oct 2020 19:20:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16622

to look at the new patch set (#4).

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false is y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.
Change-Id: I507af1cc8df15bca21e0d8555019997812087261
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M tests/query_test/test_queries.py
6 files changed, 144 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/4
--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379
PS2, Line 379: // input then it is not safe to generate a runtime 
filter from a LEFT OUTER JOIN
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16622/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@382
PS2, Line 382: // E.g. If the source expression is zeroifnull(y), 
column y has values [1, 2, 3],
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 22 Oct 2020 19:02:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16622

to look at the new patch set (#3).

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false is y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.
Change-Id: I507af1cc8df15bca21e0d8555019997812087261
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
5 files changed, 144 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/3
--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10252: fix invalid runtime filters for outer joins

2020-10-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16622 )

Change subject: IMPALA-10252: fix invalid runtime filters for outer joins
..

IMPALA-10252: fix invalid runtime filters for outer joins

The planner generates runtime filters for non-join conjuncts
assigned to LEFT OUTER and FULL OUTER JOIN nodes. This is
correct in many cases where NULLs stemming from unmatched rows
would result in the predicate evaluating to false. E.g.
x = y is always false is y is NULL.

However, it is incorrect if the NULL returned from the unmatched
row can result in the predicate evaluating to true. E.g.
x = isnull(y, 1) can return true even if y is NULL.

The fix is to detect cases when the source expression from the
left input of the join returns non-NULL for null inputs and then
skip generating the filter.

Testing:
Added regression tests:
* Planner tests for LEFT OUTER and FULL OUTER where the runtime
  filter was incorrectly generated before this patch.
* Enabled end-to-end test that was previously failing.
* Added a new runtime filter test that will execute on both
  Parquet and Kudu (which are subtly different because of nullability of
  slots).

Ran exhaustive tests.
Change-Id: I507af1cc8df15bca21e0d8555019997812087261
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
5 files changed, 144 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16622/2
--
To view, visit http://gerrit.cloudera.org:8080/16622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I507af1cc8df15bca21e0d8555019997812087261
Gerrit-Change-Number: 16622
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong