Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21184#discussion_r201961786
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -284,6 +288,80 @@ class Analyzer(
}
}
+ /**
+ * Replaces [[Alias]] with the same exprId but different references with
[[Alias]] having
+ * different exprIds. This is a rare situation which can cause incorrect
results.
+ */
+ object DeduplicateAliases extends Rule[LogicalPlan] {
--- End diff --
The main problem which causes the added UT to fail is that
`FoldablePropagation` replaces all foldable aliases which are considered to be
the same. If the same alias with same exprId is located in different part of
the plan (referencing actually different things, despite they have the same
id...) this can cause wrong replacement to happen. So in the added UT, the plan
is:
```
== Analyzed Logical Plan ==
a: int, b: int, n: bigint
Union
:- Project [a#5, b#17, n#19L]
: +- Project [a#5, b#17, n#19L, n#19L]
: +- Window [count(1)
windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(),
unboundedfollowing$())) AS n#19L]
: +- Project [a#5, b#6 AS b#17]
: +- Project [_1#2 AS a#5, _2#3 AS b#6]
: +- LocalRelation [_1#2, _2#3]
+- Project [a#12, b#17, n#19L]
+- Project [a#12, b#17, n#19L, n#19L]
+- Window [count(1)
windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(),
unboundedfollowing$())) AS n#19L]
+- Project [a#12, b#14 AS b#17]
+- Project [a#12, 0 AS b#14]
+- Project [value#10 AS a#12]
+- LocalRelation [value#10]
```
Please note that in both the branches of the union we have the same `b#17`
attribute, but they are referencing different things. As the lower one is a
foldable value which evaluates to 0, all the `b#17` are replace with a literal
0, causing a wrong result.
Despite we might fix this specific problem in the related Optimizer rule, I
think that in general we assume that items with the same id are the same. So I
proposed this solution in order to fix all the possible issues which may arise
due to this situation which is not expected.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]