imback82 commented on a change in pull request #26441: [SPARK-29682][SQL]
Resolve conflicting references in aggregate expressions
URL: https://github.com/apache/spark/pull/26441#discussion_r344470317
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -951,12 +951,15 @@ class Analyzer(
// Handle projects that create conflicting aliases.
case oldVersion @ Project(projectList, _)
- if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
+ if hasConflictInAlias(projectList, conflictingAttributes) =>
Review comment:
Thanks @viirya for the suggestion. I made `Project` also use `hasConflict`
and the generated plan is a bit different (for the better I think).
As an example, for the following plan with `#225` as a conflicting attribute:
```
Project [i#225]
+- Project [_1#220 AS i#225, _2#221 AS j#226]
   +- LocalRelation [_1#220, _2#221]
```
it used to be resolved as:
```
+- Project [i#230]
   +- Project [_1#220 AS i#230, _2#221 AS j#231]
      +- LocalRelation [_1#220, _2#221]
```
Notice that the alias of `_2#221` was also changed (from `j#226` to `j#231`),
because `newAliases` would replace all aliases with new ones.
But now with `newNamedExpression`, it produces the following plan:
```
Project [i#230]
+- Project [_1#220 AS i#230, _2#221 AS j#226]
   +- LocalRelation [_1#220, _2#221]
```
Notice that the alias of `_2#221` is not changed (it stays `j#226`), since
`#226` is not a conflicting attribute.
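
To make the difference concrete, here is a minimal toy sketch of the two behaviours. It does not use Catalyst's classes: `Attr`, `Alias`, `renewAll`, and `renewConflicting` are hypothetical names made up for illustration, expression IDs are plain `Int`s, and fresh IDs come from an iterator instead of Spark's ID generator.

```scala
// Toy model only; these are NOT Spark's Catalyst types or helpers.
object AliasRenewalSketch {
  // A source attribute such as _1#220.
  final case class Attr(name: String, id: Int)
  // An alias such as `_1#220 AS i#225`.
  final case class Alias(child: Attr, name: String, id: Int)

  // Old behaviour (in the spirit of `newAliases`): every alias in the
  // project list gets a fresh ID, whether or not it conflicts.
  def renewAll(projectList: Seq[Alias], nextId: Iterator[Int]): Seq[Alias] =
    projectList.map(a => a.copy(id = nextId.next()))

  // New behaviour (in the spirit of `newNamedExpression`): only aliases
  // whose ID is in the conflicting set get a fresh ID; the rest are kept.
  def renewConflicting(
      projectList: Seq[Alias],
      conflicting: Set[Int],
      nextId: Iterator[Int]): Seq[Alias] =
    projectList.map { a =>
      if (conflicting.contains(a.id)) a.copy(id = nextId.next()) else a
    }
}
```

With the project list `[_1#220 AS i#225, _2#221 AS j#226]`, conflicting set `{225}`, and fresh IDs starting at 230, `renewAll` yields IDs `230, 231` while `renewConflicting` yields `230, 226`, matching the two plans above.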