maropu commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r490318779



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -2575,13 +2580,13 @@ class Analyzer(
         case ne: NamedExpression =>
           // If a named expression is not in regularExpressions, add it to
           // extractedExprBuffer and replace it with an AttributeReference.
+          val attr = ne.toAttribute
           val missingExpr =
-            AttributeSet(Seq(expr)) -- (regularExpressions ++ extractedExprBuffer)
+            AttributeSet(Seq(attr)) -- (regularExpressions ++ extractedExprBuffer)
           if (missingExpr.nonEmpty) {
             extractedExprBuffer += ne
           }
-          // alias will be cleaned in the rule CleanupAliases
-          ne
+          attr

Review comment:
       Ah, I noticed that we don't need this update any more, because I made the `checkIfSameExprIdNotReused` integrity check looser in [this commit](https://github.com/apache/spark/pull/29585/commits/9a875c6eb948981da7dce965d97b37a4b8ca7220) (that change came from this comment: https://github.com/apache/spark/pull/29585#discussion_r484751614).
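   
   As a side note, here is a minimal sketch (the attribute names are hypothetical, made up for illustration) of the semantic difference the diff above relies on: for an `Alias`, `AttributeSet(Seq(ne))` collects the attributes the alias *references* (its child), whereas `ne.toAttribute` is the attribute the alias itself *produces*:
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference, AttributeSet}
   import org.apache.spark.sql.types.IntegerType

   val v = AttributeReference("v", IntegerType)()   // hypothetical input column
   val ne = Alias(v, "sum(v)")()                    // a named expression over it

   // An AttributeSet built from the alias contains the child's attribute...
   AttributeSet(Seq(ne)).contains(v)                // true
   // ...while the alias's own output attribute has a different ExprId.
   AttributeSet(Seq(ne.toAttribute)).contains(v)    // false
   ```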
   
   Why the previous, stricter check needed this update is as follows:
   ```
   // I used the current master:
   sql("SET spark.sql.planChangeLog.level=WARN")
   scala> val df = Seq(("a", 1)).toDF("k", "v")
   scala> val aggDf = df.cube("k").agg(sum("v"), 
rank().over(Window.partitionBy(grouping_id("k")).orderBy(sum("v"))))
   
   20/09/17 23:32:58 WARN PlanChangeLogger:
   === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions ===

   Before:
   'Aggregate [k#191, spark_grouping_id#190L], [k#191, sum(cast(v#178 as bigint)) AS sum(v)#187L, rank(sum(cast(v#178 as bigint))) windowspecdefinition(spark_grouping_id#190L AS grouping_id(k)#192L, sum(cast(v#178 as bigint)) ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY grouping_id(k) ORDER BY sum(v) ASC NULLS FIRST unspecifiedframe$())#188]
   +- Expand [List(k#177, v#178, k#189, 0), List(k#177, v#178, null, 1)], [k#177, v#178, k#191, spark_grouping_id#190L]
      +- Project [k#177, v#178, k#177 AS k#189]
         +- Project [_1#172 AS k#177, _2#173 AS v#178]
            +- LocalRelation [_1#172, _2#173]

   After:
   Project [k#191, sum(v)#187L, RANK() OVER (PARTITION BY grouping_id(k) ORDER BY sum(v) ASC NULLS FIRST unspecifiedframe$())#188]
   +- Project [k#191, sum(v)#187L, _w0#196L, grouping_id(k)#192L, _w2#200L, spark_grouping_id#190L, RANK() OVER (PARTITION BY grouping_id(k) ORDER BY sum(v) ASC NULLS FIRST unspecifiedframe$())#188, RANK() OVER (PARTITION BY grouping_id(k) ORDER BY sum(v) ASC NULLS FIRST unspecifiedframe$())#188]
      +- Window [rank(_w0#196L) windowspecdefinition(spark_grouping_id#190L AS grouping_id(k)#192L, _w2#200L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY grouping_id(k) ORDER BY sum(v) ASC NULLS FIRST unspecifiedframe$())#188], [spark_grouping_id#190L AS grouping_id(k)#192L], [_w2#200L ASC NULLS FIRST]
         +- Aggregate [k#191, spark_grouping_id#190L], [k#191, sum(cast(v#178 as bigint)) AS sum(v)#187L, sum(cast(v#178 as bigint)) AS _w0#196L, spark_grouping_id#190L AS grouping_id(k)#192L, sum(cast(v#178 as bigint)) AS _w2#200L, spark_grouping_id#190L]
            +- Expand [List(k#177, v#178, k#189, 0), List(k#177, v#178, null, 1)], [k#177, v#178, k#191, spark_grouping_id#190L]
               +- Project [k#177, v#178, k#177 AS k#189]
                  +- Project [_1#172 AS k#177, _2#173 AS v#178]
                     +- LocalRelation [_1#172, _2#173]
   ```
   `ExtractWindowExpressions` replicates the alias `spark_grouping_id#190L AS grouping_id(k)#192L` (that is, `Alias(AttributeReference(spark_grouping_id#190L), "grouping_id(k)")#192L`) in both the `Window` node and its child `Aggregate`. The parent `Window` therefore has `exprId=192L` in its `inputSet`, so [the previous integrity check](https://github.com/apache/spark/pull/29585/commits/9a875c6eb948981da7dce965d97b37a4b8ca7220) fails.
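   
   To illustrate the failure mode, here is a minimal, hypothetical sketch (names and types are stand-ins for `spark_grouping_id#190L` and `grouping_id(k)#192L`) showing that replicating an `Alias` into two plan nodes shares a single `ExprId`, which is exactly what a strict "an `ExprId` is never reused across nodes" check rejects:
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference}
   import org.apache.spark.sql.types.LongType

   // Hypothetical stand-ins for the attributes in the plan above.
   val groupingId = AttributeReference("spark_grouping_id", LongType)()
   val alias = Alias(groupingId, "grouping_id(k)")()

   // ExtractWindowExpressions puts the alias into the child Aggregate's
   // output and also refers to it from the parent Window, so both nodes
   // carry the alias's single ExprId (192L in the plan above).
   val inAggregate = alias             // appears in the Aggregate's output
   val inWindow = alias.toAttribute    // referenced from the Window's inputSet
   assert(inAggregate.exprId == inWindow.exprId)  // same ExprId in both nodes
   ```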
   
   Anyway, I will revert this change.



