[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21539


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21539#discussion_r194902636
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -679,6 +679,13 @@ class PlannerSuite extends SharedSQLContext {
     }
     assert(rangeExecInZeroPartition.head.outputPartitioning == UnknownPartitioning(0))
   }
+
+  test("SPARK-24500: create union with stream of children") {
+    val df = Union(Stream(
+      Range(1, 1, 1, 1),
+      Range(1, 2, 1, 1)))
+    df.queryExecution.executedPlan.execute()
--- End diff --

Yeah, it would throw an `UnsupportedOperationException` before.


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21539#discussion_r194902354
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   def mapChildren(f: BaseType => BaseType): BaseType = {
     if (children.nonEmpty) {
       var changed = false
+      def mapChild(child: Any): Any = child match {
+        case arg: TreeNode[_] if containsChild(arg) =>
--- End diff --

Yeah, so I was about to do that, but then I noticed that they handle different 
cases: `mapChild` handles `TreeNode` and `(TreeNode, TreeNode)`, whereas 
L326-L349 handles `TreeNode`, `Option[TreeNode]`, and `Map[_, TreeNode]`. I am 
not sure combining them is useful, and if it is, I'd rather do it in a 
different PR.
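
Purely for reference, here is a rough, self-contained sketch of the two sets of shapes being contrasted; `Node` is a hypothetical stand-in for `TreeNode`, and the real code additionally threads `containsChild`, `fastEquals`, and the `changed` flag through both paths:

```scala
case class Node(name: String)

object ShapeSketch extends App {
  val f: Node => Node = n => n.copy(name = n.name.toUpperCase)

  // Shapes handled by the new `mapChild` helper: a plain child or a pair of children.
  def mapChild(child: Any): Any = child match {
    case n: Node            => f(n)
    case (a: Node, b: Node) => (f(a), f(b))
    case other              => other
  }

  // Shapes handled by the existing block (roughly L326-L349): a plain child,
  // an optional child, and a map whose values are children.
  def mapArg(arg: Any): Any = arg match {
    case n: Node       => f(n)
    case Some(n: Node) => Some(f(n))
    case m: Map[_, _]  => m.mapValues {
      case n: Node => f(n)
      case other   => other
    }
    case other => other
  }

  println(mapChild((Node("a"), Node("b")))) // (Node(A),Node(B))
  println(mapArg(Some(Node("c"))))          // Some(Node(C))
}
```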


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21539#discussion_r194836411
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -679,6 +679,13 @@ class PlannerSuite extends SharedSQLContext {
     }
     assert(rangeExecInZeroPartition.head.outputPartitioning == UnknownPartitioning(0))
   }
+
+  test("SPARK-24500: create union with stream of children") {
+    val df = Union(Stream(
+      Range(1, 1, 1, 1),
+      Range(1, 2, 1, 1)))
+    df.queryExecution.executedPlan.execute()
--- End diff --

Does it throw an exception before?


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21539#discussion_r194835459
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   def mapChildren(f: BaseType => BaseType): BaseType = {
     if (children.nonEmpty) {
       var changed = false
+      def mapChild(child: Any): Any = child match {
+        case arg: TreeNode[_] if containsChild(arg) =>
--- End diff --

Shall we reuse this code at L326?


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21539#discussion_r194834520
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ---
@@ -301,6 +290,37 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   def mapChildren(f: BaseType => BaseType): BaseType = {
     if (children.nonEmpty) {
       var changed = false
+      def mapChild(child: Any): Any = child match {
+        case arg: TreeNode[_] if containsChild(arg) =>
+          val newChild = f(arg.asInstanceOf[BaseType])
--- End diff --

This code is duplicated at L326; shall we reuse it?


---




[GitHub] spark pull request #21539: [SPARK-24500][SQL] Make sure streams are material...

2018-06-12 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/21539

[SPARK-24500][SQL] Make sure streams are materialized during Tree 
transforms.

## What changes were proposed in this pull request?
If you construct Catalyst trees using `scala.collection.immutable.Stream` 
you can run into situations where valid transformations do not seem to have any 
effect. There are two causes for this behavior:
- `Stream` is evaluated lazily. Note that the default implementation of `map` 
generally only evaluates the function for the first element (this makes testing a 
bit tricky).
- `TreeNode` and `QueryPlan` use side effects to detect whether a tree has 
changed. Mapping over a stream is lazy and does not necessarily trigger this side 
effect. When that happens, the node incorrectly assumes that it did not change 
and returns itself instead of the newly created node (this is done for GC reasons).

This PR fixes this issue by forcing materialization on streams in 
`TreeNode` and `QueryPlan`.
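
To make the failure mode concrete, here is a minimal, self-contained sketch. It uses a toy `Node` class rather than the real `TreeNode`/`QueryPlan`, and `eq` in place of `fastEquals`, so it only illustrates the mechanism: lazy `Stream` evaluation defeats side-effect-based change detection, and forcing materialization restores it.

```scala
case class Node(name: String, children: Seq[Node]) {

  // Pre-fix behavior: change detection is a side effect inside `map`. A Stream
  // only evaluates `map` eagerly for its head, so the flag can stay false and
  // the original node is returned, silently dropping the rewrite.
  def mapChildrenLazy(f: Node => Node): Node = {
    var changed = false
    val newChildren = children.map { c =>
      val newChild = f(c)
      if (!(newChild eq c)) changed = true
      newChild
    }
    if (changed) copy(children = newChildren) else this
  }

  // Post-fix behavior (conceptually): force materialization of streams so the
  // side effect runs for every child before `changed` is inspected.
  def mapChildrenForced(f: Node => Node): Node = {
    var changed = false
    def mapChild(c: Node): Node = {
      val newChild = f(c)
      if (!(newChild eq c)) changed = true
      newChild
    }
    val newChildren = children match {
      case s: Stream[Node @unchecked] => s.map(mapChild).force
      case other                      => other.map(mapChild)
    }
    if (changed) copy(children = newChildren) else this
  }
}

object Spark24500Sketch extends App {
  // Rewrite only the second child, so the eagerly evaluated head never flips the flag.
  val rewrite: Node => Node = {
    case n @ Node("b", _) => n.copy(name = "B")
    case n                => n
  }

  val root = Node("root", Stream(Node("a", Nil), Node("b", Nil)))

  println(root.mapChildrenLazy(rewrite).children.map(_.name).toList)   // List(a, b): rewrite lost
  println(root.mapChildrenForced(rewrite).children.map(_.name).toList) // List(a, B): rewrite applied
}
```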

## How was this patch tested?
Unit tests were added to `TreeNodeSuite` and `LogicalPlanSuite`. An 
integration test was added to `PlannerSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-24500

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21539


commit d5832f4b50d4c8f84feb462291d8da37e87b192f
Author: Herman van Hovell 
Date:   2018-06-12T09:51:05Z

Make sure streams are materialized during Tree transforms.




---
