manuzhang opened a new pull request, #37074:
URL: https://github.com/apache/spark/pull/37074
### What changes were proposed in this pull request?
Don't remove the Project before Filter in `ColumnPruning` when the Filter
expression contains an IN or correlated EXISTS subquery. Otherwise,
`AnalysisException: Found conflicting attributes` would be thrown in
`RewritePredicateSubquery`.
### Why are the changes needed?
This is a legitimate self-join query and should not throw an exception when
attributes in the subquery and the outer values are de-duplicated.
```sql
select *
from (
  select v1.a, v1.b, v2.c
  from v1
  inner join v2
    on v1.a = v2.a
) t3
where not exists (
  select 1
  from v2
  where t3.a = v2.a and t3.b = v2.b and t3.c = v2.c
)
```
Here's what happens with the current code. The above query is analyzed into the
following `LogicalPlan` before `ColumnPruning` runs.
```
Project [a#250, b#251, c#268]
+- Filter NOT exists#272 [(a#250 = a#266) && (b#251 = b#267) && (c#268 = c#268#277)]
   :  +- Project [1 AS 1#273, _1#259 AS a#266, _2#260 AS b#267, _3#261 AS c#268#277]
   :     +- LocalRelation [_1#259, _2#260, _3#261]
   +- Project [a#250, b#251, c#268]
      +- Join Inner, (a#250 = a#266)
         :- Project [a#250, b#251]
         :  +- Project [_1#243 AS a#250, _2#244 AS b#251]
         :     +- LocalRelation [_1#243, _2#244, _3#245]
         +- Project [a#266, c#268]
            +- Project [_1#259 AS a#266, _3#261 AS c#268]
               +- LocalRelation [_1#259, _2#260, _3#261]
```
Then, in `ColumnPruning`, the Project before the Filter (`Project [a#250, b#251,
c#268]`) is removed. This changes the `outputSet` of the Filter's child: the
child is now the Join, whose output also includes `a#266`, an attribute that the
subquery plan produces as well. Later, when `RewritePredicateSubquery`
de-duplicates conflicting attributes, it fails with `Found conflicting
attributes a#266 in the condition joining outer plan`.
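
The effect on the attribute sets can be illustrated with a small toy model. This
is only a sketch in plain Python, not Spark code: attributes are modelled as
strings copied from the plan above, and `conflicting_attributes` is a
hypothetical helper that simply intersects two sets. The real check in
`RewritePredicateSubquery` is more involved.

```python
# Toy model only: attributes are plain strings such as "a#266".
# None of the names below are part of Spark's API.

# Output of the Filter's child while the Project is still in place.
project_output = {"a#250", "b#251", "c#268"}

# Output of the Join, which becomes the Filter's child once the Project
# is removed (left side: a#250, b#251; right side: a#266, c#268).
join_output = {"a#250", "b#251", "a#266", "c#268"}

# Output of the subquery plan (the Project under exists#272).
subquery_output = {"1#273", "a#266", "b#267", "c#268#277"}


def conflicting_attributes(outer_output, sub_output):
    """Return the attributes produced by both the outer plan and the subquery."""
    return sorted(outer_output & sub_output)


before = conflicting_attributes(project_output, subquery_output)
after = conflicting_attributes(join_output, subquery_output)

print(before)  # []
print(after)   # ['a#266']
```

With the Project in place the two sets are disjoint; once the Join is exposed
directly, `a#266` shows up on both sides, which matches the attribute named in
the exception message.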
Hence, this PR proposes not removing the Project before the Filter when the
Filter condition contains an IN or correlated EXISTS subquery.
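
The condition described above can be sketched as a simple predicate. This is a
toy model under stated assumptions, not the code in this PR: `SubqueryExpr` and
`keep_project_before_filter` are hypothetical names, and the model only records
the kind of each subquery in the Filter condition and whether it is correlated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubqueryExpr:
    """Hypothetical stand-in for a subquery expression in a Filter condition."""
    kind: str         # "IN", "EXISTS", or "SCALAR"
    correlated: bool  # True if it references attributes of the outer plan


def keep_project_before_filter(subqueries):
    """Return True if the Project under the Filter should be left in place.

    Mirrors the rule stated in the description: keep the Project when the
    condition contains an IN subquery or a correlated EXISTS subquery.
    """
    return any(
        s.kind == "IN" or (s.kind == "EXISTS" and s.correlated)
        for s in subqueries
    )


# The NOT EXISTS subquery in the example references t3.a, t3.b and t3.c,
# so it is modelled here as a correlated EXISTS.
example = [SubqueryExpr(kind="EXISTS", correlated=True)]

print(keep_project_before_filter(example))                          # True
print(keep_project_before_filter([SubqueryExpr("EXISTS", False)]))  # False
print(keep_project_before_filter([]))                               # False
```

Filters without such subqueries are unaffected in this model, so the Project can
still be pruned in the common case.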
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a unit test.