chasingegg commented on pull request #35760:
URL: https://github.com/apache/spark/pull/35760#issuecomment-1061512536
> ### What changes were proposed in this pull request?
> Fixes a correctness bug in `Union` in the case that there are duplicate
output columns. Previously, duplicate columns on one side of the union would
result in a duplicate column being output on the other side of the union.
>
> To do so, we go through the union’s child’s output and find the
duplicates. For each duplicate set, there is a first duplicate: this one is
left alone. All following duplicates are aliased and given a tag; this tag is
used to remove ambiguity during resolution.
>
> As the first duplicate is left alone, the user can still select it,
avoiding a breaking change. As the later duplicates are given new expression
IDs, this fixes the correctness bug.
>
> ### Why are the changes needed?
> Output of union with duplicate columns in the children was incorrect
>
> ### Does this PR introduce _any_ user-facing change?
> Example query:
>
> ```
> SELECT a, a FROM VALUES (1, 1), (1, 2) AS t1(a, b)
> UNION ALL SELECT c, d FROM VALUES (2, 2), (2, 3) AS t2(c, d)
> ```
>
> Result before:
>
> ```
> a | a
> _ | _
> 1 | 1
> 1 | 1
> 2 | 2
> 2 | 2
> ```
>
> Result after:
>
> ```
> a | a
> _ | _
> 1 | 1
> 1 | 2
> 2 | 2
> 2 | 3
> ```
>
> ### How was this patch tested?
> Unit tests
Result After should be
```
a | a
_ | _
1 | 1
1 | 1
2 | 2
2 | 3
```
?
And why we should support `select a from (SELECT a, a FROM VALUES (1, 1),
(1, 2) AS t1(a, b)
UNION ALL SELECT c, d FROM VALUES (2, 2), (2, 3) AS t2(c, d))`? The result
after this fix is
a
_
1
1
2
2
I think it should be broken because it is ambiguous, instead of choosing the
first column.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]