imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
URL: https://github.com/apache/spark/pull/26441#discussion_r344516803
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##########
 @@ -949,14 +949,19 @@ class Analyzer(
             if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
           (oldVersion, oldVersion.copy(serializer = oldVersion.serializer.map(_.newInstance())))
 
-        // Handle projects that create conflicting aliases.
         case oldVersion @ Project(projectList, _)
-            if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-          (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
+            if hasConflict(projectList, conflictingAttributes) =>
+          (oldVersion,
+            oldVersion.copy(
+              projectList =
+                newNamedExpression(projectList, conflictingAttributes)))
 
         case oldVersion @ Aggregate(_, aggregateExpressions, _)
 
 Review comment:
   > Could we fix this issue in an easier way than the current fix?
   
   I don't think the suggested fix is robust enough. For example, the following test fails with it:
   ```
   [info] - [SPARK-6231] join - self join auto resolve ambiguity *** FAILED *** (251 milliseconds)
   [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: Resolved attribute(s) key#4619 missing from key#4518,value#4519 in operator !Aggregate [key#4619], [key#4619, sum(cast(key#4619 as bigint)) AS sum(key)#4620L]. Attribute(s) with the same name appear in the operation: key. Please check if the right attribute(s) are used.;;
   [info]   Join Inner, (key#4518 = key#4518)
   [info]   :- Aggregate [key#4518], [key#4518, count(1) AS count(1)#4610L]
   [info]   :  +- Project [_1#4513 AS key#4518, _2#4514 AS value#4519]
   [info]   :     +- LocalRelation [_1#4513, _2#4514]
   [info]   +- !Aggregate [key#4619], [key#4619, sum(cast(key#4619 as bigint)) AS sum(key)#4620L]
   [info]      +- Project [_1#4513 AS key#4518, _2#4514 AS value#4519]
   [info]         +- LocalRelation [_1#4513, _2#4514]
   [info]   
   ```
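
To make the failure mode in the log concrete, here is a minimal sketch of the shape of the problem. It is a toy model, not Catalyst: `Attr`, `Plan`, `Relation`, `Agg`, and `missingInput` are all hypothetical names invented for illustration. The only thing taken from the log is that the right-hand `Aggregate` refers to `key#4619` while its child still produces `key#4518` and `value#4519`.

```scala
// Toy model only. None of these types are Spark's Catalyst classes.
object ConflictSketch {
  // An attribute is identified by its id, not by its name.
  final case class Attr(name: String, id: Int) {
    override def toString: String = s"$name#$id"
  }

  sealed trait Plan {
    def output: Set[Attr]      // attributes this node produces
    def references: Set[Attr]  // attributes this node reads from its children
    def children: Seq[Plan]
  }

  final case class Relation(output: Set[Attr]) extends Plan {
    val references: Set[Attr] = Set.empty
    val children: Seq[Plan] = Nil
  }

  final case class Agg(grouping: Seq[Attr], child: Plan) extends Plan {
    val output: Set[Attr] = grouping.toSet
    val references: Set[Attr] = grouping.toSet
    val children: Seq[Plan] = Seq(child)
  }

  // Attributes a node reads that none of its children produce.
  def missingInput(p: Plan): Set[Attr] =
    p.references -- p.children.flatMap(_.output).toSet

  val key: Attr = Attr("key", 4518)
  val value: Attr = Attr("value", 4519)
  val child: Plan = Relation(Set(key, value))

  // Consistent: the aggregate reads what its child produces.
  val original: Plan = Agg(Seq(key), child)

  // Re-id the reference inside the aggregate but leave the child untouched.
  // This mirrors the `!Aggregate [key#4619]` over `key#4518` in the log.
  val renamedOnlyHere: Plan = Agg(Seq(key.copy(id = 4619)), child)
}
```

In this sketch `ConflictSketch.missingInput(ConflictSketch.original)` is empty, while `ConflictSketch.missingInput(ConflictSketch.renamedOnlyHere)` contains `key#4619`, which is the same kind of dangling reference the analyzer reports above.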
