GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/21539
[SPARK-24500][SQL] Make sure streams are materialized during Tree transforms.
## What changes were proposed in this pull request?
If you construct Catalyst trees using `scala.collection.immutable.Stream`,
you can run into situations where valid transformations do not seem to have any
effect. There are two causes for this behavior:
- `Stream` is evaluated lazily. Note that the default implementation will
generally only evaluate a function for the first element (this makes testing a
bit tricky).
- `TreeNode` and `QueryPlan` use side effects to detect if a tree has
changed. Mapping over a stream is lazy and does not necessarily trigger this side
effect. If this happens, the node will incorrectly assume that it did not change
and return itself instead of the newly created node (it does this to reduce GC
pressure).
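A minimal sketch of the laziness described in the first bullet, assuming Scala 2.12 or 2.13, where `Stream` evaluates its head eagerly and its tail lazily. `StreamLaziness` and `mapCallCounts` are hypothetical names used only for illustration; the code counts how often the mapping function runs right after `map` and again after the result has been fully traversed.

```scala
object StreamLaziness {
  /** Hypothetical helper: returns (calls right after map, calls after traversal). */
  def mapCallCounts(xs: Seq[Int]): (Int, Int) = {
    var calls = 0
    // For a Stream, only the head is mapped eagerly; the tail is deferred.
    val mapped = xs.map { x => calls += 1; x * 2 }
    val beforeTraversal = calls
    // Traversing every element forces the deferred tail of a Stream.
    mapped.foreach(_ => ())
    (beforeTraversal, calls)
  }
}
```

For `Stream(1, 2, 3)` this gives `(1, 3)`, while a strict `List(1, 2, 3)` gives `(3, 3)`: any side effect in the mapping function only fires for the tail once something traverses the result.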
This PR fixes this issue by forcing materialization of streams in
`TreeNode` and `QueryPlan`.
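The effect of forcing can be illustrated with a simplified model of side-effect-based change detection. This is a sketch under stated assumptions, not the patch's code: children are plain `String`s instead of tree nodes, and `StreamChangeDetection`, `rule` and `transformChildren` are hypothetical names, not Spark APIs. Without forcing, only the head of a `Stream` passes through the rule, so a change in a later child is missed and the original children are returned.

```scala
object StreamChangeDetection {
  // Hypothetical rewrite rule: renames "a" to "b" and returns every other
  // child unchanged (the very same instance).
  def rule(child: String): String = if (child == "a") "b" else child

  // Hypothetical stand-in for a mapChildren-style transform. Change detection
  // relies on a side effect (setting `changed`) inside the mapping function.
  // Returns (resulting children, whether a change was detected).
  def transformChildren(children: Seq[String], materialize: Boolean): (Seq[String], Boolean) = {
    var changed = false
    def step(child: String): String = {
      val newChild = rule(child)
      if (!(newChild eq child)) changed = true
      newChild
    }
    val mapped: Seq[String] = children match {
      // force evaluates step for every element, so `changed` is accurate.
      case s: Stream[String @unchecked] if materialize => s.map(step).force
      // For a Stream this only evaluates step for the head element.
      case other => other.map(step)
    }
    // Like the nodes described above, return the original children when no
    // change was detected.
    (if (changed) mapped else children, changed)
  }
}
```

With `Stream("x", "y", "a")` and `materialize = false`, the rename of `"a"` is silently lost; with `materialize = true`, the change is detected and the result contains `"b"`.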
## How was this patch tested?
Unit tests were added to `TreeNodeSuite` and `LogicalPlanSuite`. An
integration test was added to `PlannerSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-24500
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21539.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21539
----
commit d5832f4b50d4c8f84feb462291d8da37e87b192f
Author: Herman van Hovell <hvanhovell@...>
Date: 2018-06-12T09:51:05Z
Make sure streams are materialized during Tree transforms.
----