[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-09-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r80187168
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --


>
+ this patch's changes to ObjectExpressionsSuite.scala
+ this patch's changes to ExpressionEvalHelper.scala (this is also critical)
- this patch's changes to objects.scala

Under this case, the test failed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-09-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r80182811
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

@lw-lin without this patch's changes to ExpressionEvalHelper.scala, this 
test still passes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-29 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76579917
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

@viirya maybe test against the following?

- + this patch's changes to ObjectExpressionsSuite.scala
- + this patch's changes to ExpressionEvalHelper.scala (this is also 
critical)
- - this patch's changes to objects.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76576622
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

But looks like this change doesn't reflect in the test? Without this 
change, the added test is passed too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14698


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76177037
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

yea.

`GenerateUnsafeProjection.generate` here was not able to use unsafe-backed 
data structure because it's `Create*Struct`.
`UnsafeProjection.create`, however, does use unsafe-backed data structure 
(UnsafeRow, UnsafeArrayData, ...) so that this test is valid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76176582
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
--- End diff --

we're calling `toString` on a `UTF8String`, so maybe there's no need to 
clone `UTF8String`s?

```java
...
/* 072 */value8 = MapObjects_loopValue2.getUTF8String(0);
...
/* 082 */funcResult = value8.toString();
...
/* 086 */value7 = (java.lang.String) funcResult;
...
/* 128 */convertedArray[loopIndex] = ...;
```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76176311
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
+  case StructType(_) =>
+s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? 
" +
--- End diff --

done.
indeed we should narrow down to `UnsafeRow`, since other `MutableRow`s' 
backed data are not shared even though they are mutable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-24 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76176321
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
+  case StructType(_) =>
+s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? 
" +
+  s"${genFunction.value}.copy() : ${genFunction.value})"
+  case ArrayType(_, _) =>
+s"(${genFunction.value} instanceof 
${classOf[UnsafeArrayData].getName}? " +
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r75762666
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different 
instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

This is to prevent issues with `Create*Struct` right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r75762228
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
--- End diff --

Should we also copy UTF8String? This can be backed by a reused buffer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r75762055
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
+  case StructType(_) =>
+s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? 
" +
+  s"${genFunction.value}.copy() : ${genFunction.value})"
+  case ArrayType(_, _) =>
+s"(${genFunction.value} instanceof 
${classOf[UnsafeArrayData].getName}? " +
--- End diff --

Can you generalize the code generation a little bit here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r75761961
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -474,6 +474,20 @@ case class MapObjects private(
   s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)"
 }
 
+// Make a copy of the data if it's unsafe-backed
+val genFunctionValue = lambdaFunction.dataType match {
+  case StructType(_) =>
+s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? 
" +
--- End diff --

UnsafeRow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-18 Thread lw-lin
GitHub user lw-lin opened a pull request:

https://github.com/apache/spark/pull/14698

[SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of 
unsafe-backed data

## What changes were proposed in this pull request?

Currently `MapObjects` does not make copies of unsafe-backed data, leading 
to problems like 
[SPARK-17061](https://issues.apache.org/jira/browse/SPARK-17061) 
[SPARK-17093](https://issues.apache.org/jira/browse/SPARK-17093).

This patch makes `MapObjects` make copies of unsafe-backed data.

## How was this patch tested?

Add a new test case which would fail without this patch.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark mapobjects-copy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14698


commit 4fc1ec51a938762a70bd3a50111b0b3a00e94955
Author: Liwei Lin 
Date:   2016-08-18T02:53:30Z

MapObjects should copy unsafe-backed data




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org