AngersZhuuuu commented on a change in pull request #30278:
URL: https://github.com/apache/spark/pull/30278#discussion_r519659900



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1269,6 +1269,7 @@ object PushPredicateThroughNonJoin extends 
Rule[LogicalPlan] with PredicateHelpe
     case _: Sort => true
     case _: BatchEvalPython => true
     case _: ArrowEvalPython => true
+    case _: Expand => true

Review comment:
       > > seems current master fix this case
   > 
   > What do you mean by "fix this case"?
   
   I have found the pr #29673  
   Before this pr, SQL
   ```
   SELECT
          years,
          appversion,                                               
          SUM(uusers) AS users                                      
   FROM   (SELECT
                  Date_trunc('year', dt)          AS years,
                  CASE                                              
                    WHEN h.pid = 3 THEN 'iOS'           
                    WHEN h.pid = 4 THEN 'Android'       
                    ELSE 'Other'                                    
                  END                             AS viewport,      
                  h.vs                            AS appversion,
                  Count(DISTINCT u.uid)           AS uusers
                  ,Count(DISTINCT u.suid)         AS srcusers
           FROM   t1 u                                   
                  join t2 h                              
                    ON h.uid = u.uid            
           GROUP  BY 1,                                             
                     2,                                             
                     3) AS a
   WHERE  viewport = 'iOS'                                          
   GROUP  BY 1,                                                     
             2
   ```
   
   Optimized plan is  
   ```
   == Optimized Logical Plan ==
   Aggregate [years#0, appversion#2], [years#0, appversion#2, sum(uusers#3L) AS 
users#5L]
   +- Aggregate [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN 
(h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25, 
vs#17], [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24 AS years#0, vs#17 AS 
appversion#2, count(if ((gid#23 = 1)) u.`uid`#26 else null) AS uusers#3L]
      +- Aggregate [date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN 
(h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25, 
vs#17, u.`uid`#26, u.`suid`#27, gid#23], [date_trunc('year', CAST(u.`dt` AS 
TIMESTAMP))#24, CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 
'Android' ELSE 'Other' END#25, vs#17, u.`uid`#26, gid#23]
         +- Filter (CASE WHEN (h.`pid` = 3) THEN 'iOS' WHEN (h.`pid` = 4) THEN 
'Android' ELSE 'Other' END#25 = iOS)
            +- Expand [ArrayBuffer(date_trunc(year, cast(dt#14 as timestamp), 
Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN 
Android ELSE Other END, vs#17, uid#12, null, 1), ArrayBuffer(date_trunc(year, 
cast(dt#14 as timestamp), Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS 
WHEN (pid#16 = 4) THEN Android ELSE Other END, vs#17, null, suid#15, 2)], 
[date_trunc('year', CAST(u.`dt` AS TIMESTAMP))#24, CASE WHEN (h.`pid` = 3) THEN 
'iOS' WHEN (h.`pid` = 4) THEN 'Android' ELSE 'Other' END#25, vs#17, u.`uid`#26, 
u.`suid`#27, gid#23]
               +- Project [uid#12, dt#14, suid#15, pid#16, vs#17]
                  +- Join Inner, (uid#18 = uid#12)
                     :- Project [uid#12, dt#14, suid#15]
                     :  +- Filter isnotnull(uid#12)
                     :     +- Relation[pid#11,uid#12,sid#13,dt#14,suid#15] 
parquet
                     +- Project [pid#16, vs#17, uid#18]
                        +- Filter isnotnull(uid#18)
                           +- Relation[pid#16,vs#17,uid#18,csid#19] parquet
   ```
   
   After that pr, Optimized plan is 
   ```
   == Optimized Logical Plan ==
   Aggregate [years#0, appversion#2], [years#0, appversion#2, sum(uusers#3L) AS 
users#5L]
   +- Aggregate [date_trunc(year, cast(dt#14 as timestamp), 
Some(Asia/Shanghai)), CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN 
Android ELSE Other END, vs#17], [date_trunc(year, cast(dt#14 as timestamp), 
Some(Asia/Shanghai)) AS years#0, vs#17 AS appversion#2, count(distinct uid#12) 
AS uusers#3L]
      +- Project [uid#12, dt#14, pid#16, vs#17]
         +- Join Inner, (uid#18 = uid#12)
            :- Project [uid#12, dt#14]
            :  +- Filter isnotnull(uid#12)
            :     +- Relation[pid#11,uid#12,sid#13,dt#14,suid#15] parquet
            +- Project [pid#16, vs#17, uid#18]
               +- Filter ((CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) 
THEN Android ELSE Other END = iOS) AND isnotnull(uid#18))
                  +- Relation[pid#16,vs#17,uid#18,csid#19] parquet
   ```
   
   `Filter((CASE WHEN (pid#16 = 3) THEN iOS WHEN (pid#16 = 4) THEN Android ELSE 
Other END = iOS))` is pushed down and won't generate  `Expand`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to