[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21539 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21539#discussion_r194902636 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -679,6 +679,13 @@ class PlannerSuite extends SharedSQLContext { } assert(rangeExecInZeroPartition.head.outputPartitioning == UnknownPartitioning(0)) } + + test("SPARK-24500: create union with stream of children") { +val df = Union(Stream( + Range(1, 1, 1, 1), + Range(1, 2, 1, 1))) +df.queryExecution.executedPlan.execute() --- End diff -- Yeah, it would throw an `UnsupportedOperationException` before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21539#discussion_r194902354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def mapChildren(f: BaseType => BaseType): BaseType = { if (children.nonEmpty) { var changed = false + def mapChild(child: Any): Any = child match { +case arg: TreeNode[_] if containsChild(arg) => --- End diff -- Yeah, so was about to do that but then I noticed that they handle different cases, `mapChild` handles `TreeNode` and `(TreeNode, TreeNode)`, whereas L326-L349 handles `TreeNode`, `Option[TreeNode]` and `Map[_, TreeNode]`. I am not sure if combining them is useful, and if it is then I'd rather do it in a different PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21539#discussion_r194836411 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -679,6 +679,13 @@ class PlannerSuite extends SharedSQLContext { } assert(rangeExecInZeroPartition.head.outputPartitioning == UnknownPartitioning(0)) } + + test("SPARK-24500: create union with stream of children") { +val df = Union(Stream( + Range(1, 1, 1, 1), + Range(1, 2, 1, 1))) +df.queryExecution.executedPlan.execute() --- End diff -- does it throw exception before? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21539#discussion_r194835459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def mapChildren(f: BaseType => BaseType): BaseType = { if (children.nonEmpty) { var changed = false + def mapChild(child: Any): Any = child match { +case arg: TreeNode[_] if containsChild(arg) => --- End diff -- shall we reuse these code in L326? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21539#discussion_r194834520 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def mapChildren(f: BaseType => BaseType): BaseType = { if (children.nonEmpty) { var changed = false + def mapChild(child: Any): Any = child match { +case arg: TreeNode[_] if containsChild(arg) => + val newChild = f(arg.asInstanceOf[BaseType]) --- End diff -- These code are duplicated in L326, shall we reuse them? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/21539 [SPARK-24500][SQL] Make sure streams are materialized during Tree transforms. ## What changes were proposed in this pull request? If you construct catalyst trees using `scala.collection.immutable.Stream` you can run into situations where valid transformations do not seem to have any effect. There are two causes for this behavior: - `Stream` is evaluated lazily. Note that default implementation will generally only evaluate a function for the first element (this makes testing a bit tricky). - `TreeNode` and `QueryPlan` use side effects to detect if a tree has changed. Mapping over a stream is lazy and does not need to trigger this side effect. If this happens the node will invalidly assume that it did not change and return itself instead if the newly created node (this is for GC reasons). This PR fixes this issue by forcing materialization on streams in `TreeNode` and `QueryPlan`. ## How was this patch tested? Unit tests were added to `TreeNodeSuite` and `LogicalPlanSuite`. An integration test was added to the `PlannerSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-24500 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21539 commit d5832f4b50d4c8f84feb462291d8da37e87b192f Author: Herman van Hovell Date: 2018-06-12T09:51:05Z Make sure streams are materialized during Tree transforms. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org