GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/19240
[SPARK-22018][SQL]Preserve top-level alias metadata when collapsing projects
## What changes were proposed in this pull request?
Consider two nested Project operators like the following.
```
Project [a_with_metadata#27 AS b#26]
+- Project [a#0 AS a_with_metadata#27]
   +- LocalRelation <empty>, [a#0, b#1]
```
The child Project has an output column that carries metadata, and the parent
Project has an alias that implicitly forwards that metadata, so the metadata is
visible to higher operators. After the CollapseProject optimizer rule is
applied, the metadata is not preserved.
```
Project [a#0 AS b#26]
+- LocalRelation <empty>, [a#0, b#1]
```
This is incorrect: downstream operators that rely on column metadata to
identify certain fields (e.g. the watermark in Structured Streaming) will fail
to find them. This PR fixes it by preserving the metadata of top-level aliases.
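The effect can be illustrated with a small toy model. This is only a sketch in plain Python, not Catalyst code: the `Attr`, `Alias`, and `collapse` names, the `explicit_metadata` field, and the `example_key` entry are all hypothetical and chosen for illustration. It assumes an alias without explicit metadata inherits the metadata of its child, which is the implicit forwarding described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Attr:
    """Toy column reference (e.g. a#0); metadata travels with it."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Alias:
    """Toy alias: renames `child`, optionally with its own metadata."""
    child: Union["Attr", "Alias"]
    name: str
    explicit_metadata: Optional[Dict[str, str]] = None

    @property
    def metadata(self) -> Dict[str, str]:
        # Without explicit metadata, the alias forwards its child's metadata.
        if self.explicit_metadata is not None:
            return self.explicit_metadata
        return self.child.metadata

    def to_attr(self) -> Attr:
        # The output column that a parent operator would reference.
        return Attr(self.name, dict(self.metadata))


def collapse(upper: List[Alias], lower: List[Alias],
             preserve_metadata: bool) -> List[Alias]:
    """Merge two project lists by inlining the lower aliases into the upper ones."""
    lower_by_name = {a.name: a for a in lower}
    result = []
    for top in upper:
        inner = lower_by_name.get(top.child.name)
        if inner is None:
            result.append(top)  # nothing to inline for this column
            continue
        # Capture the metadata the top-level alias exposed BEFORE its child
        # is replaced; the naive variant keeps only explicit metadata.
        meta = dict(top.metadata) if preserve_metadata else top.explicit_metadata
        result.append(Alias(inner.child, top.name, meta))
    return result


# Project [a AS a_with_metadata] with metadata attached to the alias.
lower = [Alias(Attr("a"), "a_with_metadata", {"example_key": "example_value"})]
# Project [a_with_metadata AS b], which forwards the metadata implicitly.
upper = [Alias(lower[0].to_attr(), "b")]

naive = collapse(upper, lower, preserve_metadata=False)
fixed = collapse(upper, lower, preserve_metadata=True)

print(upper[0].metadata)  # metadata visible before collapsing
print(naive[0].metadata)  # metadata after a collapse that ignores it
print(fixed[0].metadata)  # metadata after a collapse that preserves it
```

In this model the naive collapse rewrites `b` to alias `a` directly, so `b` now inherits the (empty) metadata of `a`; recording the top-level alias's metadata before rewriting its child keeps the original value.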
## How was this patch tested?
New unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-22018
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19240.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19240
----
commit b3e41a7603e8bf917e9b596bdeb6afa51a32a695
Author: Tathagata Das <[email protected]>
Date: 2017-09-15T00:29:56Z
Peserver top-level alias metadata
----