peter-toth opened a new pull request #32298:
URL: https://github.com/apache/spark/pull/32298
### What changes were proposed in this pull request?
This PR:
- Adds a new subquery type, `MultiScalarSubquery` / `MultiScalarSubqueryExec`, that computes multiple scalar values at the same time.
- Adds a new optimizer rule, `MergeScalarSubqueries`, that merges similar non-correlated scalar subqueries into multi-column scalar subqueries and replaces each original scalar subquery expression with `GetStructField(MultiScalarSubquery(...))`.
- Lets the `ReuseSubquery` / `ReuseAdaptiveSubquery` rules replace multiple instances of the same `MultiScalarSubquery` with reuse references, so that a `MultiScalarSubquery` runs only once.
E.g. the following query:
```
SELECT
(SELECT avg(a) FROM t GROUP BY b),
(SELECT sum(b) FROM t GROUP BY b)
```
is optimized from:
```
Project [scalar-subquery#231 [] AS scalarsubquery()#241, scalar-subquery#232 [] AS scalarsubquery()#242L]
:  :- Aggregate [b#234], [avg(a#233) AS avg(a)#236]
:  :  +- Relation default.t[a#233,b#234] parquet
:  +- Aggregate [b#240], [sum(b#240) AS sum(b)#238L]
:     +- Project [b#240]
:        +- Relation default.t[a#239,b#240] parquet
+- OneRowRelation
```
to:
```
Project [multi-scalar-subquery#231.avg(a) AS scalarsubquery()#241, multi-scalar-subquery#232.sum(b) AS scalarsubquery()#242L]
:  :- Aggregate [b#234], [avg(a#233) AS avg(a)#236, sum(b#234) AS sum(b)#238L]
:  :  +- Project [a#233, b#234]
:  :     +- Relation default.t[a#233,b#234] parquet
:  +- Aggregate [b#234], [avg(a#233) AS avg(a)#236, sum(b#234) AS sum(b)#238L]
:     +- Project [a#233, b#234]
:        +- Relation default.t[a#233,b#234] parquet
+- OneRowRelation
```
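The rewrite above can be illustrated with a small toy model. This is only a sketch in plain Python, not the Catalyst implementation: the classes reuse the names `MultiScalarSubquery` and `GetStructField` from this PR, but their fields, the `ScalarSubquery` class and the `merge_scalar_subqueries` function are hypothetical, and "similar" is simplified to "same table and same grouping".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ScalarSubquery:
    """Toy stand-in for a non-correlated scalar subquery with one aggregate."""
    table: str
    group_by: Tuple[str, ...]
    aggregate: str  # e.g. "avg(a)"


@dataclass(frozen=True)
class MultiScalarSubquery:
    """Toy stand-in for a merged subquery computing several aggregates at once."""
    table: str
    group_by: Tuple[str, ...]
    aggregates: Tuple[str, ...]


@dataclass(frozen=True)
class GetStructField:
    """Reference to one named field of a merged subquery's result."""
    child: MultiScalarSubquery
    name: str


def merge_scalar_subqueries(subqueries: List[ScalarSubquery]) -> List[GetStructField]:
    """Replace each subquery with a field reference into a merged subquery.

    Assumption: two subqueries are mergeable when they read the same table
    with the same grouping. The real rule's compatibility check is not modelled.
    """
    grouped: Dict[Tuple[str, Tuple[str, ...]], List[str]] = {}
    for sq in subqueries:
        aggs = grouped.setdefault((sq.table, sq.group_by), [])
        if sq.aggregate not in aggs:
            aggs.append(sq.aggregate)

    merged = {
        key: MultiScalarSubquery(key[0], key[1], tuple(aggs))
        for key, aggs in grouped.items()
    }
    return [
        GetStructField(merged[(sq.table, sq.group_by)], sq.aggregate)
        for sq in subqueries
    ]


original = [
    ScalarSubquery("t", ("b",), "avg(a)"),
    ScalarSubquery("t", ("b",), "sum(b)"),
]
rewritten = merge_scalar_subqueries(original)

# Both references point at an equal merged plan, so a reuse step
# only needs to execute one distinct subquery.
distinct_plans = {ref.child for ref in rewritten}
print(len(distinct_plans))
```

In this model both rewritten expressions carry an equal `MultiScalarSubquery`, which mirrors why the optimized plan shows the same merged `Aggregate` twice and why a reuse rule can then run it once.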
### Why are the changes needed?
Performance improvement. On TPCDS q9, enabling the merge reduces the best time from 45892 ms to 16769 ms (2.7X):
```
TPCDS Snappy:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
q9 - spark.sql.scalarSubqueyMerge.enabled=false           45892          47172        1220         0.0      Infinity       1.0X
q9 - spark.sql.scalarSubqueyMerge.enabled=true            16769          16863         124         0.0      Infinity       2.7X
```
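As a quick sanity check of the `Relative` column, the snippet below recomputes the speedup from the reported best times. It assumes that `Relative` is the ratio of the baseline's best time to each row's best time. The variable names are illustrative only.

```python
# Best times (ms) copied from the benchmark output above.
best_ms_disabled = 45892
best_ms_enabled = 16769

# Assumed definition: baseline best time divided by this row's best time.
relative = best_ms_disabled / best_ms_enabled
print(f"{relative:.1f}X")  # prints "2.7X"
```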
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UTs. I will add new ones later...