wangyum opened a new pull request, #42180:
URL: https://github.com/apache/spark/pull/42180
### What changes were proposed in this pull request?
This PR adds the `OptimizeOneRowRelationSubquery` rule to the `Subquery` optimizer batch.
### Why are the changes needed?
To further optimize queries that contain such subqueries. Currently,
`OptimizeOneRowRelationSubquery` cannot rewrite a subquery that still contains
a filter the optimizer could eliminate. For example:
```sql
CREATE TEMPORARY VIEW v1
AS
SELECT id, 'foo' AS kind FROM (SELECT 1 AS id) t;
CREATE TEMPORARY VIEW v2
AS
SELECT * FROM v1 WHERE kind = (SELECT kind FROM v1 WHERE kind = 'foo');
EXPLAIN EXTENDED SELECT * FROM v1 JOIN v2 ON v1.id = v2.id;
```
Before this PR:
```
== Optimized Logical Plan ==
Join Inner
:- Project [1 AS id#18, foo AS kind#19]
:  +- OneRowRelation
+- Project [1 AS id#21, foo AS kind#22]
   +- Filter (foo = scalar-subquery#20 [])
      :  +- Project [foo AS kind#30]
      :     +- OneRowRelation
      +- OneRowRelation
```
After this PR:
```
== Optimized Logical Plan ==
Join Inner
:- Project [1 AS id#253, foo AS kind#254]
:  +- OneRowRelation
+- Project [1 AS id#256, foo AS kind#257]
   +- OneRowRelation
```
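The difference between the two plans can be illustrated with a toy model. The sketch below is not Spark code: every class and function name in it (`OneRow`, `Project`, `Filter`, `fold_filter`, `inline_one_row_subquery`, `optimize`) is a hypothetical stand-in, and the pass ordering is an assumption inferred from the plans above. It only shows why an inlining rule that ran before the subquery's own filter was removed needs a second chance once the subquery plan has been simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OneRow:
    """Toy stand-in for a relation that yields exactly one row."""


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Project:
    columns: tuple  # ((name, literal_value), ...)
    child: object


@dataclass(frozen=True)
class ScalarSubquery:
    plan: object


@dataclass(frozen=True)
class Filter:
    left: object   # a literal, a Col, or a ScalarSubquery
    right: object
    child: object


def is_literal(expr):
    return not isinstance(expr, (Col, ScalarSubquery))


def fold_filter(plan):
    """Resolve columns of an equality filter over a one-row projection and
    drop the filter when both sides are equal literals."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, Project)
            and isinstance(plan.child.child, OneRow)):
        return plan
    values = dict(plan.child.columns)
    left, right = (values[e.name] if isinstance(e, Col) else e
                   for e in (plan.left, plan.right))
    if is_literal(left) and is_literal(right) and left == right:
        return plan.child
    return Filter(left, right, plan.child)


def inline_one_row_subquery(plan):
    """Toy analogue of the rule: inline a scalar subquery only when its plan
    is already a single-column projection over one row."""
    if isinstance(plan, Filter) and isinstance(plan.right, ScalarSubquery):
        sub = plan.right.plan
        if (isinstance(sub, Project) and isinstance(sub.child, OneRow)
                and len(sub.columns) == 1):
            return Filter(plan.left, sub.columns[0][1], plan.child)
    return plan


def optimize_subquery_plans(plan):
    """Simplify the plan nested inside a scalar subquery."""
    if isinstance(plan, Filter) and isinstance(plan.right, ScalarSubquery):
        simplified = ScalarSubquery(fold_filter(plan.right.plan))
        return Filter(plan.left, simplified, plan.child)
    return plan


def optimize(plan, rerun_in_subquery_batch):
    # Main pass: the subquery still contains a filter, so inlining is a no-op.
    plan = fold_filter(inline_one_row_subquery(plan))
    # Subquery pass: the filter inside the subquery is folded away.
    plan = optimize_subquery_plans(plan)
    if rerun_in_subquery_batch:
        # Assumed effect of this PR: the inlining rule gets another chance.
        plan = fold_filter(inline_one_row_subquery(plan))
    return plan


v1 = Project((("id", 1), ("kind", "foo")), OneRow())
subquery = ScalarSubquery(
    Filter(Col("kind"), "foo", Project((("kind", "foo"),), OneRow())))
v2 = Filter(Col("kind"), subquery, v1)

before = optimize(v2, rerun_in_subquery_batch=False)
after = optimize(v2, rerun_in_subquery_batch=True)
print(before)
print(after)
```

In this model, `before` keeps a `Filter` whose right side is a scalar subquery over a bare projection, mirroring the `Filter (foo = scalar-subquery#20 [])` node in the first plan, while `after` collapses to the plain projection of `v1`.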
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.