Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21184#discussion_r202334300
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -284,6 +288,80 @@ class Analyzer(
}
}
+ /**
+ * Replaces [[Alias]] with the same exprId but different references with
[[Alias]] having
+ * different exprIds. This is a rare situation which can cause incorrect
results.
+ */
+ object DeduplicateAliases extends Rule[LogicalPlan] {
--- End diff --
Yes, that is also true. But in many places in the codebase we just compare
attributes using `semanticEquals` or in some other cases, even `equals`. Well,
if we admit that different attributes can have the same `exprId`, all these
places should be checked in order to be sure that the same problem cannot
happen there too. Moreover (this is more a nit), the `semanticEquals` or
`sameRef` method itself would be wrong according to its semantic, as it may
return `true` even when two attributes don't have the same reference. This is
the reason why I opted for this solution, which seems to me cleaner as it
solves the root cause of the problem. What do you think?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]