maropu commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r484610944



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -2575,13 +2580,13 @@ class Analyzer(
         case ne: NamedExpression =>
           // If a named expression is not in regularExpressions, add it to
           // extractedExprBuffer and replace it with an AttributeReference.
+          val attr = ne.toAttribute
           val missingExpr =
-            AttributeSet(Seq(expr)) -- (regularExpressions ++ extractedExprBuffer)
+            AttributeSet(Seq(attr)) -- (regularExpressions ++ extractedExprBuffer)
           if (missingExpr.nonEmpty) {
             extractedExprBuffer += ne
           }
-          // alias will be cleaned in the rule CleanupAliases
-          ne
+          attr

Review comment:
      I updated this code to fix the test failure below:
   ```
   [info] - grouping/grouping_id inside window function *** FAILED *** (75 milliseconds)
   [info]   org.apache.spark.sql.catalyst.errors.package$TreeNodeException: After applying rule org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions in batch Resolution, the structural integrity of the plan is broken., tree:
   [info] Project [course#427, year#428, sum(earnings)#421, grouping_id(course, year)#422L, RANK() OVER (PARTITION BY grouping_id(course, year) ORDER BY sum(earnings) ASC NULLS FIRST unspecifiedframe$())#423]
   [info] +- Project [course#427, year#428, sum(earnings)#421, grouping_id(course, year)#422L, _w0#434, grouping_id(course, year)#430L, _w2#438, spark_grouping_id#426L, RANK() OVER (PARTITION BY grouping_id(course, year) ORDER BY sum(earnings) ASC NULLS FIRST unspecifiedframe$())#423, RANK() OVER (PARTITION BY grouping_id(course, year) ORDER BY sum(earnings) ASC NULLS FIRST unspecifiedframe$())#423]
   [info]    +- Window [rank(_w0#434) windowspecdefinition(spark_grouping_id#426L AS grouping_id(course, year)#430L, _w2#438 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY grouping_id(course, year) ORDER BY sum(earnings) ASC NULLS FIRST unspecifiedframe$())#423], [spark_grouping_id#426L AS grouping_id(course, year)#430L], [_w2#438 ASC NULLS FIRST]
   [info]       +- Aggregate [course#427, year#428, spark_grouping_id#426L], [course#427, year#428, sum(earnings#258) AS sum(earnings)#421, spark_grouping_id#426L AS grouping_id(course, year)#429L AS grouping_id(course, year)#422L, sum(earnings#258) AS _w0#434, spark_grouping_id#426L AS grouping_id(course, year)#430L, sum(earnings#258) AS _w2#438, spark_grouping_id#426L]
   [info]          +- Expand [List(course#256, year#257, earnings#258, course#424, year#425, 0), List(course#256, year#257, earnings#258, course#424, null, 1), List(course#256, year#257, earnings#258, null, year#425, 2), List(course#256, year#257, earnings#258, null, null, 3)], [course#256, year#257, earnings#258, course#427, year#428, spark_grouping_id#426L]
   [info]             +- Project [course#256, year#257, earnings#258, course#256 AS course#424, year#257 AS year#425]
   [info]                +- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$CourseSales, true])).course, true, false) AS course#256, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$CourseSales, true])).year AS year#257, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$CourseSales, true])).earnings AS earnings#258]
   [info]                   +- ExternalRDD [obj#255]
   [info]   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:235)
   [info]   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   ```
    The named expression `spark_grouping_id#426L AS grouping_id(course, year)#430L` was duplicated across the `Window` and `Aggregate` nodes; returning the attribute instead of the alias leaves a single definition of it in the `Aggregate` node.
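
    The pattern behind the fix can be sketched with a minimal, self-contained model. The `Attr`/`Alias` case classes and the `extract` helper below are hypothetical stand-ins for Catalyst's `AttributeReference`/`Alias` and the extraction logic in `ExtractWindowExpressions`, not actual Spark code; the point is that once an alias has been moved into the buffer, every later reference should be the bare attribute, so the alias is defined exactly once in the child node:
    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical stand-ins for Catalyst's AttributeReference and Alias.
    case class Attr(name: String, exprId: Long)
    case class Alias(child: String, name: String, exprId: Long) {
      def toAttribute: Attr = Attr(name, exprId)
    }

    val regularExpressions = Set(Attr("course", 427), Attr("year", 428))
    val extractedExprBuffer = ArrayBuffer.empty[Alias]

    // Mirror of the fix: check membership via the attribute, extract the
    // alias at most once, and always hand the parent node the attribute.
    def extract(ne: Alias): Attr = {
      val attr = ne.toAttribute
      val missing = !regularExpressions.contains(attr) &&
        !extractedExprBuffer.exists(_.toAttribute == attr)
      if (missing) extractedExprBuffer += ne
      attr
    }

    val gid = Alias("spark_grouping_id#426L", "grouping_id(course, year)", 430)
    val first  = extract(gid) // alias lands in the buffer
    val second = extract(gid) // second reference reuses the attribute
    assert(extractedExprBuffer.size == 1) // defined once in the child node
    assert(first == second)               // parents share one attribute
    ```
    Returning `ne` here instead of `attr` would put the alias both in the buffer (i.e. the child `Aggregate`/`Project`) and in the parent's expression list, which is the duplication the analyzer's structural-integrity check rejected above.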




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


