GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/21539

    [SPARK-24500][SQL] Make sure streams are materialized during Tree transforms.

    ## What changes were proposed in this pull request?
    If you construct Catalyst trees using `scala.collection.immutable.Stream`,
    you can run into situations where valid transformations do not seem to have
    any effect. There are two causes for this behavior:
    - `Stream` is evaluated lazily. Note that the default implementation will
      generally only evaluate a function for the first element (this makes
      testing a bit tricky).
    - `TreeNode` and `QueryPlan` use side effects to detect whether a tree has
      changed. Mapping over a stream is lazy and does not necessarily trigger
      this side effect. If that happens, the node will incorrectly assume that
      it did not change and return itself instead of the newly created node
      (returning the original node when nothing changed is done for GC reasons).
    
    This PR fixes this issue by forcing materialization on streams in
    `TreeNode` and `QueryPlan`.
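
The interaction between lazy `Stream` mapping and side-effect-based change detection can be illustrated with a minimal, self-contained Scala sketch. This is not the Spark code: `StreamChangeDemo`, `transformChildren`, `forceStreams`, and `rule` are hypothetical names chosen for illustration, and plain `Int` values stand in for tree nodes. It assumes the `Stream` semantics of Scala 2.12 (in Scala 2.13 `Stream` still exists but is deprecated).

```scala
import scala.collection.immutable.Stream

object StreamChangeDemo {
  // Hypothetical stand-in for a tree transform: applies `rule` to every child
  // and uses a side effect (`changed`) to decide whether anything was rewritten.
  def transformChildren(children: Seq[Int], forceStreams: Boolean)(rule: Int => Int): Seq[Int] = {
    var changed = false
    val mapped = children.map { child =>
      val newChild = rule(child)
      if (newChild != child) changed = true // side effect used for change detection
      newChild
    }
    if (forceStreams) {
      mapped match {
        // Materialize the whole stream so the side effect runs for every element.
        case s: Stream[_] => s.force; ()
        case _ => ()
      }
    }
    // Return the original children when no change was observed.
    if (changed) mapped else children
  }
}
```

With `Stream(1, 2, 3)` as children and a rule that rewrites only `2`, the unforced variant evaluates the rule for the first element only, observes no change, and hands back the original sequence; the forced variant sees the rewrite of the second element and returns the new sequence.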
    
    ## How was this patch tested?
    Unit tests were added to `TreeNodeSuite` and `LogicalPlanSuite`. An
    integration test was added to the `PlannerSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-24500

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21539
    
----
commit d5832f4b50d4c8f84feb462291d8da37e87b192f
Author: Herman van Hovell <hvanhovell@...>
Date:   2018-06-12T09:51:05Z

    Make sure streams are materialized during Tree transforms.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
