[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r80187168 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- > + this patch's changes to ObjectExpressionsSuite.scala + this patch's changes to ExpressionEvalHelper.scala (this is also critical) - this patch's changes to objects.scala Under this case, the test failed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r80182811 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- @lw-lin without this patch's changes to ExpressionEvalHelper.scala, this test still passes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76579917 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- @viirya maybe test against the following? - + this patch's changes to ObjectExpressionsSuite.scala - + this patch's changes to ExpressionEvalHelper.scala (this is also critical) - - this patch's changes to objects.scala --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76576622 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- But looks like this change doesn't reflect in the test? Without this change, the added test is passed too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14698 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76177037 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- yea. `GenerateUnsafeProjection.generate` here was not able to use unsafe-backed data structure because it's `Create*Struct`. `UnsafeProjection.create`, however, does use unsafe-backed data structure (UnsafeRow, UnsafeArrayData, ...) so that this test is valid. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176582 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { --- End diff -- we're calling `toString` on a `UTF8String`, so maybe there's no need to clone `UTF8String`s? ```java ... /* 072 */value8 = MapObjects_loopValue2.getUTF8String(0); ... /* 082 */funcResult = value8.toString(); ... /* 086 */value7 = (java.lang.String) funcResult; ... /* 128 */convertedArray[loopIndex] = ...; ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176311 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { + case StructType(_) => +s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? " + --- End diff -- done. indeed we should narrow down to `UnsafeRow`, since other `MutableRow`s' backed data are not shared even though they are mutable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r76176321 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { + case StructType(_) => +s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? " + + s"${genFunction.value}.copy() : ${genFunction.value})" + case ArrayType(_, _) => +s"(${genFunction.value} instanceof ${classOf[UnsafeArrayData].getName}? " + --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r75762666 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- This is to prevent issues with `Create*Struct` right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r75762228 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { --- End diff -- Should we also copy UTF8String? This can be backed by a reused buffer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r75762055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { + case StructType(_) => +s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? " + + s"${genFunction.value}.copy() : ${genFunction.value})" + case ArrayType(_, _) => +s"(${genFunction.value} instanceof ${classOf[UnsafeArrayData].getName}? " + --- End diff -- Can you generalize the code generation a little bit here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r75761961 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -474,6 +474,20 @@ case class MapObjects private( s"$seq == null ? $array[$loopIndex] : $seq.apply($loopIndex)" } +// Make a copy of the data if it's unsafe-backed +val genFunctionValue = lambdaFunction.dataType match { + case StructType(_) => +s"(${genFunction.value} instanceof ${classOf[MutableRow].getName}? " + --- End diff -- UnsafeRow? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14698 [SPARK-17061][SPARK-17093][SQL] `MapObjects` should make copies of unsafe-backed data ## What changes were proposed in this pull request? Currently `MapObjects` does not make copies of unsafe-backed data, leading to problems like [SPARK-17061](https://issues.apache.org/jira/browse/SPARK-17061) [SPARK-17093](https://issues.apache.org/jira/browse/SPARK-17093). This patch makes `MapObjects` make copies of unsafe-backed data. ## How was this patch tested? Add a new test case which would fail without this patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lw-lin/spark mapobjects-copy Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14698.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14698 commit 4fc1ec51a938762a70bd3a50111b0b3a00e94955 Author: Liwei LinDate: 2016-08-18T02:53:30Z MapObjects should copy unsafe-backed data --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org