Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19178#discussion_r137970319
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1115,6 +1115,8 @@ class Analyzer(
g.copy(join = true, child = addMissingAttr(g.child, missing))
case d: Distinct =>
throw new AnalysisException(s"Can't add $missingAttrs to $d")
+ case u: Union =>
+   u.withNewChildren(u.children.map(addMissingAttr(_, missingAttrs)))
--- End diff --
This is not an issue only in `Union`; I think binary operators have the
same issue, e.g.,
```
scala> df3.join(df4).filter("grouping_id()=0").show()
org.apache.spark.sql.AnalysisException: cannot resolve '`spark_grouping_id`' given input columns: [a, sum(b), a, sum(b)];;
'Filter ('spark_grouping_id = 0)
+- Join Inner
   :- Aggregate [a#27, spark_grouping_id#25], [a#27, sum(cast(b#6 as bigint)) AS sum(b)#24L]
   :  +- Expand [List(a#5, b#6, a#26, 0), List(a#5, b#6, null, 1)], [a#5, b#6, a#27, spark_grouping_id#25]
   :     +- Project [a#5, b#6, a#5 AS a#26]
   :        +- Project [_1#0 AS a#5, _2#1 AS b#6]
   :           +- LocalRelation [_1#0, _2#1]
   +- Aggregate [a#38, spark_grouping_id#36], [a#38, sum(cast(b#16 as bigint)) AS sum(b)#35L]
      +- Expand [List(a#15, b#16, a#37, 0), List(a#15, b#16, null, 1)], [a#15, b#16, a#38, spark_grouping_id#36]
         +- Project [a#15, b#16, a#15 AS a#37]
            +- Project [_1#10 AS a#15, _2#11 AS b#16]
               +- LocalRelation [_1#10, _2#11]
```
So, we need a more general solution for this case, I think.
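To sketch what I mean by "more general": instead of special-casing `Union`, the recursion could go through *all* children of any multi-child node via `withNewChildren`, which would cover `Join` and other binary operators too. Below is a self-contained toy model (hypothetical classes, not Spark's actual `LogicalPlan` hierarchy) just to illustrate the shape of that recursion:

```scala
// Toy plan hierarchy (hypothetical, for illustration only): each node knows
// its children and can be rebuilt with new children, mirroring the
// withNewChildren API on Spark's TreeNode.
object MissingAttrDemo {
  sealed trait Plan {
    def children: Seq[Plan]
    def withNewChildren(newChildren: Seq[Plan]): Plan
  }
  case class Relation(attrs: Set[String]) extends Plan {
    def children: Seq[Plan] = Nil
    def withNewChildren(newChildren: Seq[Plan]): Plan = this
  }
  case class Union(children: Seq[Plan]) extends Plan {
    def withNewChildren(newChildren: Seq[Plan]): Plan = Union(newChildren)
  }
  case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withNewChildren(newChildren: Seq[Plan]): Plan =
      Join(newChildren(0), newChildren(1))
  }

  // Simplified analogue of addMissingAttr: push the missing attributes down
  // to every leaf, recursing uniformly through all children rather than
  // handling Union (n-ary) and Join (binary) as separate cases.
  def addMissingAttr(plan: Plan, missing: Set[String]): Plan = plan match {
    case Relation(attrs) => Relation(attrs ++ missing)
    case other =>
      other.withNewChildren(other.children.map(addMissingAttr(_, missing)))
  }
}
```

With this shape, a `Join` on top of a `Union` gets the missing attribute propagated into every branch by the single generic case; the real fix in `Analyzer.addMissingAttr` would of course need to respect each operator's output semantics rather than blindly recursing.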
---