wangyum commented on pull request #33603:
URL: https://github.com/apache/spark/pull/33603#issuecomment-892524996
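It seems this is the last rule that needs to take `Project` into account.
The rules `RemoveRedundantAggregates`, `CombineFilters`, and
`PushPredicateThroughJoin` may also see a `Project` in the pattern they match,
but those cases are already handled by other rules first: `CollapseProject`
removes the `Project`, or `PushDownPredicates` pushes the `Filter` below it.
For example: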
```sql
create table t1(a int, b int, c int) using parquet;
create table t2(x int, y int, z int) using parquet;
```
```
select x, y from (select a as x, b as y, c as z from t1 group by a, b, c) t
group by x, y
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CollapseProject ===
 Aggregate [x#5, y#6], [x#5, y#6]                            Aggregate [x#5, y#6], [x#5, y#6]
!+- Project [x#5, y#6]                                       +- Aggregate [a#0, b#1, c#2], [a#0 AS x#5, b#1 AS y#6]
!   +- Aggregate [a#0, b#1, c#2], [a#0 AS x#5, b#1 AS y#6]      +- Relation default.t1[a#0,b#1,c#2] parquet
!      +- Relation default.t1[a#0,b#1,c#2] parquet
```
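To make the `CollapseProject` step concrete, here is a minimal toy model in Python. The classes `Relation`, `Aggregate`, `Project` and the functions `collapse_project` and `has_adjacent_aggregates` are hypothetical and are not Catalyst's API; expressions are reduced to plain column names. The sketch folds a `Project` of plain column references into the `Aggregate` below it, after which the two `Aggregate`s are adjacent. That adjacent shape is what `RemoveRedundantAggregates` looks for, so that rule does not need to look through a `Project` itself.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union

# Toy logical plan nodes (hypothetical; expressions reduced to column names).
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]
    outputs: Tuple[Tuple[str, str], ...]  # (input column, output alias)
    child: "Plan"

@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # plain references to the child's output names
    child: "Plan"

Plan = Union[Relation, Aggregate, Project]

def collapse_project(plan: Plan) -> Plan:
    """Fold a Project over an Aggregate into the Aggregate, bottom-up."""
    if isinstance(plan, Relation):
        return plan
    child = collapse_project(plan.child)
    if isinstance(plan, Project) and isinstance(child, Aggregate):
        by_alias = {alias: (src, alias) for src, alias in child.outputs}
        if all(c in by_alias for c in plan.columns):
            # Keep only the projected outputs, in the Project's order.
            return replace(child, outputs=tuple(by_alias[c] for c in plan.columns))
    return replace(plan, child=child)

def has_adjacent_aggregates(plan: Plan) -> bool:
    """True if an Aggregate sits directly on another Aggregate."""
    if isinstance(plan, Relation):
        return False
    if isinstance(plan, Aggregate) and isinstance(plan.child, Aggregate):
        return True
    return has_adjacent_aggregates(plan.child)

# The plan from the first log, before CollapseProject.
inner = Aggregate(("a", "b", "c"), (("a", "x"), ("b", "y")), Relation("t1"))
before = Aggregate(("x", "y"), (("x", "x"), ("y", "y")), Project(("x", "y"), inner))
after = collapse_project(before)
print(has_adjacent_aggregates(before), has_adjacent_aggregates(after))  # False True
```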
```
select x, y from (select a as x, b as y, c as z from t1 where a > 1) t
where y > 1
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
 Project [x#0, y#1]                                     Project [x#0, y#1]
!+- Filter (y#1 > 1)                                    +- Project [a#3 AS x#0, b#4 AS y#1, c#5 AS z#2]
!   +- Project [a#3 AS x#0, b#4 AS y#1, c#5 AS z#2]        +- Filter ((a#3 > 1) AND (b#4 > 1))
!      +- Filter (a#3 > 1)                                    +- Relation default.t1[a#3,b#4,c#5] parquet
!         +- Relation default.t1[a#3,b#4,c#5] parquet
```
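A similar toy sketch for the second log, again with hypothetical names and predicates written as `(column, operator, literal)` tuples. It assumes Spark 3.x behavior, where `PushDownPredicates` applies `CombineFilters`, `PushPredicateThroughNonJoin`, and `PushPredicateThroughJoin` within one transform. The `Filter` is moved below the `Project` by substituting the alias (`y` becomes `b`) and then merged with the `Filter` already there, so `CombineFilters` never has to match a `Project` between two `Filter`s.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union

Pred = Tuple[str, str, int]  # (column, operator, literal), e.g. ("y", ">", 1)

# Toy logical plan nodes (hypothetical, not Catalyst classes).
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Project:
    aliases: Tuple[Tuple[str, str], ...]  # (input column, output alias)
    child: "Plan"

@dataclass(frozen=True)
class Filter:
    conjuncts: Tuple[Pred, ...]  # ANDed together
    child: "Plan"

Plan = Union[Relation, Project, Filter]

def push_down_predicates(plan: Plan) -> Plan:
    """Move the first Filter found over a Project below it, merging filters."""
    if isinstance(plan, Relation):
        return plan
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        proj = plan.child
        # Rewrite each predicate in terms of the Project's input columns.
        to_input = {alias: src for src, alias in proj.aliases}
        rewritten = tuple((to_input[c], op, v) for c, op, v in plan.conjuncts)
        below = proj.child
        if isinstance(below, Filter):
            # CombineFilters-style merge: existing conjuncts come first.
            pushed = Filter(below.conjuncts + rewritten, below.child)
        else:
            pushed = Filter(rewritten, below)
        return replace(proj, child=pushed)
    return replace(plan, child=push_down_predicates(plan.child))

# The plan from the second log, before PushDownPredicates.
t1 = Relation("t1")
before = Project(
    (("x", "x"), ("y", "y")),
    Filter((("y", ">", 1),),
           Project((("a", "x"), ("b", "y"), ("c", "z")),
                   Filter((("a", ">", 1),), t1))))
after = push_down_predicates(before)
print(after.child.child.conjuncts)  # (('a', '>', 1), ('b', '>', 1))
```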
```
select a2 from (select a as a2, x as x2 from t1 join t2 on t1.a = t2.x) t
where a2 = 1
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
 Project [a2#8]                                            Project [a2#8]
!+- Filter (a2#8 = 1)                                      +- Project [a#3 AS a2#8, x#10 AS x2#9]
!   +- Project [a#3 AS a2#8, x#10 AS x2#9]                    +- Join Inner, (a#3 = x#10)
!      +- Join Inner, (a#3 = x#10)                               :- Filter (a#3 = 1)
!         :- Relation default.t1[a#3,b#4,c#5] parquet            :  +- Relation default.t1[a#3,b#4,c#5] parquet
!         +- Relation default.t2[x#10,y#11,z#12] parquet         +- Relation default.t2[x#10,y#11,z#12] parquet
```
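The third log can be modeled the same way. This is a hypothetical toy, not Catalyst code: after the alias `a2` is rewritten to `a`, the predicate sits directly on the `Join` and is placed on the side that produces `a`, which is the `PushPredicateThroughJoin` part of `PushDownPredicates`. Like the log, the sketch only filters `t1`.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple, Union

Pred = Tuple[str, str, int]  # (column, operator, literal), e.g. ("a", "=", 1)

# Toy logical plan nodes (hypothetical, not Catalyst classes).
@dataclass(frozen=True)
class Relation:
    name: str
    columns: Tuple[str, ...]

@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    on: Tuple[str, str]  # inner equi-join: left column = right column

@dataclass(frozen=True)
class Project:
    aliases: Tuple[Tuple[str, str], ...]  # (input column, output alias)
    child: "Plan"

@dataclass(frozen=True)
class Filter:
    conjuncts: Tuple[Pred, ...]  # ANDed together
    child: "Plan"

Plan = Union[Relation, Join, Project, Filter]

def output(plan: Plan) -> FrozenSet[str]:
    """Column names produced by a node."""
    if isinstance(plan, Relation):
        return frozenset(plan.columns)
    if isinstance(plan, Join):
        return output(plan.left) | output(plan.right)
    if isinstance(plan, Project):
        return frozenset(alias for _, alias in plan.aliases)
    return output(plan.child)

def push_into_join(conjuncts: Tuple[Pred, ...], join: Join) -> Plan:
    """Put each conjunct on the join side that produces its column."""
    left = tuple(p for p in conjuncts if p[0] in output(join.left))
    right = tuple(p for p in conjuncts if p not in left and p[0] in output(join.right))
    kept = tuple(p for p in conjuncts if p not in left and p not in right)
    new_join = replace(
        join,
        left=Filter(left, join.left) if left else join.left,
        right=Filter(right, join.right) if right else join.right)
    return Filter(kept, new_join) if kept else new_join

def push_down_predicates(plan: Plan) -> Plan:
    """Move a Filter below a Project, then into the matching Join side."""
    if isinstance(plan, (Relation, Join)):
        return plan  # this sketch does not rewrite inside joins or leaves
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        proj = plan.child
        to_input = {alias: src for src, alias in proj.aliases}
        rewritten = tuple((to_input[c], op, v) for c, op, v in plan.conjuncts)
        below = proj.child
        if isinstance(below, Join):
            return replace(proj, child=push_into_join(rewritten, below))
        return replace(proj, child=Filter(rewritten, below))
    return replace(plan, child=push_down_predicates(plan.child))

# The plan from the third log, before PushDownPredicates.
t1 = Relation("t1", ("a", "b", "c"))
t2 = Relation("t2", ("x", "y", "z"))
before = Project(
    (("a2", "a2"),),
    Filter((("a2", "=", 1),),
           Project((("a", "a2"), ("x", "x2")), Join(t1, t2, ("a", "x")))))
after = push_down_predicates(before)
print(after.child.child.left.conjuncts)  # (('a', '=', 1),)
```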
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]