Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19178#discussion_r137970319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1115,6 +1115,8 @@ class Analyzer(
               g.copy(join = true, child = addMissingAttr(g.child, missing))
             case d: Distinct =>
               throw new AnalysisException(s"Can't add $missingAttrs to $d")
    +        case u: Union =>
    +          u.withNewChildren(u.children.map(addMissingAttr(_, missingAttrs)))
    --- End diff ---
    
    This issue is not limited to `Union`; I think binary operators have the 
same issue, e.g., 
    ```
    scala> df3.join(df4).filter("grouping_id()=0").show()
    org.apache.spark.sql.AnalysisException: cannot resolve '`spark_grouping_id`' given input columns: [a, sum(b), a, sum(b)];;
    'Filter ('spark_grouping_id = 0)
    +- Join Inner
       :- Aggregate [a#27, spark_grouping_id#25], [a#27, sum(cast(b#6 as bigint)) AS sum(b)#24L]
       :  +- Expand [List(a#5, b#6, a#26, 0), List(a#5, b#6, null, 1)], [a#5, b#6, a#27, spark_grouping_id#25]
       :     +- Project [a#5, b#6, a#5 AS a#26]
       :        +- Project [_1#0 AS a#5, _2#1 AS b#6]
       :           +- LocalRelation [_1#0, _2#1]
       +- Aggregate [a#38, spark_grouping_id#36], [a#38, sum(cast(b#16 as bigint)) AS sum(b)#35L]
          +- Expand [List(a#15, b#16, a#37, 0), List(a#15, b#16, null, 1)], [a#15, b#16, a#38, spark_grouping_id#36]
             +- Project [a#15, b#16, a#15 AS a#37]
                +- Project [_1#10 AS a#15, _2#11 AS b#16]
                   +- LocalRelation [_1#10, _2#11]
    ```
    So I think we need a more general solution for this case.
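
    To illustrate one direction (a sketch only, not the actual patch): rather than adding a dedicated case per operator, a single case could match any node with more than one child, covering `Union` as well as binary operators such as `Join`. This assumes `addMissingAttr` keeps its current shape, taking a plan and the missing attribute set:
    ```scala
    // Hypothetical generalization (sketch, assumes addMissingAttr(plan, attrs)):
    // handle Union and every binary operator in one case by pushing the
    // missing attributes down into each child branch.
    case p if p.children.length > 1 =>
      p.withNewChildren(p.children.map(addMissingAttr(_, missingAttrs)))
    ```
    Whether matching on child arity is safe for every multi-child operator would still need to be checked case by case.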

