allisonwang-db opened a new pull request #32179:
URL: https://github.com/apache/spark/pull/32179
### What changes were proposed in this pull request?
This PR updates the `foundNonEqualCorrelatedPred` logic for correlated
subqueries in `CheckAnalysis` so that it only allows correlated equality
predicates that guarantee a one-to-one mapping between inner and outer
attributes, instead of allowing all equality predicates.
### Why are the changes needed?
To fix correctness bugs. Before this fix, Spark could return wrong results for
certain correlated subqueries that pass `CheckAnalysis`:
Example 1:
```sql
create or replace view t1(c) as values ('a'), ('b');
create or replace view t2(c) as values ('ab'), ('abc'), ('bc');
select c, (select count(*) from t2 where t1.c = substring(t2.c, 1, 1)) from t1;
```
Correct results: [(a, 2), (b, 1)]
Spark results:
```
+---+-----------------+
|c  |scalarsubquery(c)|
+---+-----------------+
|a  |1                |
|a  |1                |
|b  |1                |
+---+-----------------+
```
Example 2:
```sql
create or replace view t1(a, b) as values (0, 6), (1, 5), (2, 4), (3, 3);
create or replace view t2(c) as values (6);
select c, (select count(*) from t1 where a + b = c) from t2;
```
Correct results: [(6, 4)]
Spark results:
```
+---+-----------------+
|c  |scalarsubquery(c)|
+---+-----------------+
|6  |1                |
|6  |1                |
|6  |1                |
|6  |1                |
+---+-----------------+
```
### Does this PR introduce _any_ user-facing change?
Yes. Users will not be able to run queries that contain unsupported
correlated equality predicates.
### How was this patch tested?
Added unit tests.