AngersZhuuuu commented on a change in pull request #30278:
URL: https://github.com/apache/spark/pull/30278#discussion_r519659900
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1269,6 +1269,7 @@ object PushPredicateThroughNonJoin extends
Rule[LogicalPlan] with PredicateHelpe
case _: Sort => true
case _: BatchEvalPython => true
case _: ArrowEvalPython => true
+ case _: Expand => true
Review comment:
> > seems current master fix this case
>
> What do you mean by "fix this case"?
I have found the pr #29673
Before this pr, SQL
```
SELECT
years,
appversion,
SUM(uusers) AS users
FROM (SELECT
Date_trunc('year', dt) AS years,
CASE
WHEN h.pid = 3 THEN 'iOS'
WHEN h.pid = 4 THEN 'Android'
ELSE 'Other'
END AS viewport,
h.vs AS appversion,
Count(DISTINCT u.uid) AS uusers
,Count(DISTINCT u.suid) AS srcusers
FROM t1 u
join t2 h
ON h.uid = u.uid
GROUP BY 1,
2,
3) AS a
WHERE viewport = 'iOS'
GROUP BY 1,
2
```
Optimized plan is
```
== Optimized Logical Plan ==
Aggregate [years#0, appversion#2], [years#0, appversion#2, sum(uusers#3L) AS
users#5L]
+- Aggregate [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN
(h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25,
vs#17], [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24 AS years#0, vs#17 AS
appversion#2, count(if ((gid#23 = 1)) u.`uid`#26 else null) AS uusers#3L]
+- Aggregate [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN
(h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25,
vs#17, u.`uid`#26, u.`suid`#27, gid#23], [date_trunc('year', CAST(u.`dt` AS
TIMESTAMP))#24, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN
'Android' ELSE 'Other' END#25, vs#17, u.`uid`#26, gid#23]
+- Filter (CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN
'Android' ELSE 'Other' END#25 = iOS)
+- Expand [ArrayBuffer(date_trunc(year, cast(dt#14 as timestamp),
Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN
Android ELSE Other END, vs#17, uid#12, null, 1), ArrayBuffer(date_trunc(year,
cast(dt#14 as timestamp), Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS
WHEN (pid#16 = 4) THEN Android ELSE Other END, vs#17, null, suid#15, 2)],
[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN (h.`pid` = 3) THEN
'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25, vs#17, u.`uid`#26,
u.`suid`#27, gid#23]
+- Project [uid#12, dt#14, suid#15, pid#16, vs#17]
+- Join Inner, (uid#18 = uid#12)
:- Project [uid#12, dt#14, suid#15]
: +- Filter isnotnull(uid#12)
: +- Relation[pid#11,uid#12,sid#13,dt#14,suid#15]
parquet
+- Project [pid#16, vs#17, uid#18]
+- Filter isnotnull(uid#18)
+- Relation[pid#16,vs#17,uid#18,csid#19] parquet
```
After that pr, Optimized plan is
```
== Optimized Logical Plan ==
Aggregate [years#0, appversion#2], [years#0, appversion#2, sum(uusers#3L) AS
users#5L]
+- Aggregate [date_trunc(year, cast(dt#14 as timestamp),
Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN
Android ELSE Other END, vs#17], [date_trunc(year, cast(dt#14 as timestamp),
Some(Asia/Shanghai)) AS years#0, vs#17 AS appversion#2, count(distinct uid#12)
AS uusers#3L]
+- Project [uid#12, dt#14, pid#16, vs#17]
+- Join Inner, (uid#18 = uid#12)
:- Project [uid#12, dt#14]
: +- Filter isnotnull(uid#12)
: +- Relation[pid#11,uid#12,sid#13,dt#14,suid#15] parquet
+- Project [pid#16, vs#17, uid#18]
+- Filter ((CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4)
THEN Android ELSE Other END = iOS) AND isnotnull(uid#18))
+- Relation[pid#16,vs#17,uid#18,csid#19] parquet
```
`Filter((CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN Android ELSE
Other END = iOS))` is pushed down and won't generate `Expand`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]