GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/19240

    [SPARK-22018][SQL]Preserve top-level alias metadata when collapsing projects

    ## What changes were proposed in this pull request?
    Consider two nested Projects like the following:
    ```
    Project [a_with_metadata#27 AS b#26]
    +- Project [a#0 AS a_with_metadata#27]
       +- LocalRelation <empty>, [a#0, b#1]
    ```
    The child Project has an output column with metadata attached, and the parent
    Project has an alias that implicitly forwards that metadata, so the metadata is
    visible to operators higher in the plan. After the CollapseProject optimizer rule
    is applied, however, the metadata is no longer preserved:
    ```
    Project [a#0 AS b#26]
    +- LocalRelation <empty>, [a#0, b#1]
    ```
    This is incorrect, because downstream operators that rely on specific metadata
    (e.g. the watermark in Structured Streaming) to identify certain fields will fail
    to do so. This PR fixes the issue by preserving the metadata of top-level aliases.
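    To make the failure mode concrete, here is a minimal, self-contained Scala sketch. It
    does not use Spark: `Expr`, `Attr`, `Alias`, `collapse`, and the `watermarkDelay`
    metadata key are hypothetical, simplified stand-ins for Catalyst expressions and the
    CollapseProject rule, not Spark's actual code. The `preserveMetadata` flag switches
    between the buggy behaviour (the collapsed alias derives its metadata from its new
    child `a#0`, which has none) and the approach described above (the collapsed
    top-level alias pins the metadata it exposed before collapsing as explicit metadata).

    ```scala
    object CollapseProjectSketch {
      // Hypothetical, simplified stand-ins for Catalyst expressions (NOT Spark's classes).
      type Metadata = Map[String, String]

      sealed trait Expr { def metadata: Metadata }

      final case class Attr(name: String, id: Int, metadata: Metadata = Map.empty) extends Expr

      final case class Alias(child: Expr, name: String, id: Int,
                             explicitMetadata: Option[Metadata] = None) extends Expr {
        // Explicit metadata wins; otherwise the child's metadata is forwarded implicitly.
        def metadata: Metadata = explicitMetadata.getOrElse(child.metadata)
        // The attribute this alias produces, as seen by the operator above it.
        def toAttr: Attr = Attr(name, id, metadata)
      }

      // Replace references to the child's aliases with the expressions they alias.
      private def substitute(e: Expr, aliases: Map[Int, Alias]): Expr = e match {
        case a: Attr   => aliases.get(a.id).map(_.child).getOrElse(a)
        case al: Alias => al.copy(child = substitute(al.child, aliases))
      }

      // Merge a parent project list into its child project list (hypothetical helper).
      // preserveMetadata = false reproduces the bug; true keeps top-level alias metadata.
      def collapse(parent: Seq[Alias], child: Seq[Alias], preserveMetadata: Boolean): Seq[Alias] = {
        val aliases = child.map(a => a.id -> a).toMap
        parent.map { top =>
          val rewritten = top.copy(child = substitute(top.child, aliases))
          // Capture the metadata the alias exposed BEFORE its child was replaced.
          if (preserveMetadata) rewritten.copy(explicitMetadata = Some(top.metadata))
          else rewritten
        }
      }

      def main(args: Array[String]): Unit = {
        val md: Metadata = Map("watermarkDelay" -> "10 seconds") // hypothetical key
        // Project [a#0 AS a_with_metadata#27]  (explicit metadata on the alias)
        val childProj = Seq(Alias(Attr("a", 0), "a_with_metadata", 27, Some(md)))
        // Project [a_with_metadata#27 AS b#26]  (metadata forwarded implicitly)
        val parentProj = Seq(Alias(childProj.head.toAttr, "b", 26))
        println(collapse(parentProj, childProj, preserveMetadata = false).head.metadata)
        println(collapse(parentProj, childProj, preserveMetadata = true).head.metadata)
      }
    }
    ```

    In both cases the collapsed alias is `a#0 AS b#26`; only the variant that pins the
    metadata explicitly still exposes `watermarkDelay` to operators above it. Recomputing
    the metadata from the substituted child cannot work, because the child that carried
    the metadata (`a_with_metadata#27`) no longer exists after the collapse.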
    
    ## How was this patch tested?
    Added a new unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-22018

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19240.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19240
    
----
commit b3e41a7603e8bf917e9b596bdeb6afa51a32a695
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2017-09-15T00:29:56Z

    Preserve top-level alias metadata

----

