[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99646/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99645/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99645/testReport)** for PR 23088 at commit [`f8cfb54`](https://github.com/apache/spark/commit/f8cfb544a805ecb5d1056f703dde4e7705ad1810). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99646/testReport)** for PR 23213 at commit [`2ced0ca`](https://github.com/apache/spark/commit/2ced0cab0c16ee7a2400035a5a7794033eae3ed9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 It's easy to track `numKeyLookups` at `HashedRelation`, but it's hard to track `numProbes`. One idea is, we pass a `MutableInt` to `LongToUnsafeRowMap.getValue` as a parameter, and in the method we set the actual `numProbes` of this look up to the `MutableInt` parameter. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I might know the root cause: `LongToUnsafeRowMap` is acutally accessed by multiple threads. For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread problem, via `HashedRelation.asReadOnlyCopy`. However, this is a shallow copy, the `LongToUnsafeRowMap` is not copied and likely shared by multiple `HashedRelation`s. The metrics is per-task, so I think a better fix is to track the hash probe metrics per `HashedRelation`, instead of per `LongToUnsafeRowMap`. It's too costly to copy the `LongToUnsafeRowMap`, we should think about how to do it efficiently. cc @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99653/testReport)** for PR 23088 at commit [`41cfe80`](https://github.com/apache/spark/commit/41cfe8084a73e13336ab753a46fdc4c950583478). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22899 @gatorsmile Thanks a lot. I completely agree that we should try and combine these two. I will continue to think about it :-) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5709/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99652/testReport)** for PR 23194 at commit [`ac2b004`](https://github.com/apache/spark/commit/ac2b0048f13acbce8f9b82f2b1c5f5cd268c63d4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5708/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99649/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23207 cc @cloud-fan @gatorsmile @rxin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99649/testReport)** for PR 23194 at commit [`877751b`](https://github.com/apache/spark/commit/877751bdab5310dae4c5d8a1f2b7c3bbd761b718). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23214: [SPARK-26155] Optimizing the performance of LongT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23214#discussion_r238549645 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -398,8 +399,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private var numKeys = 0L // Tracking average number of probes per key lookup. - private var numKeyLookups = 0L - private var numProbes = 0L + private var numKeyLookups = new LongAdder + private var numProbes = new LongAdder --- End diff -- I'm surprised. I think `LongToUnsafeRowMap` is used in a single thread environment and multi-thread contend should not be an issue here. Do you have any insights about how this fixes the perf issue? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5707/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99648/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99648/testReport)** for PR 23169 at commit [`f0f75c2`](https://github.com/apache/spark/commit/f0f75c25b95010d63ecdf83bb9f280687361d154). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99643/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit [`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99643/testReport)** for PR 23207 at commit [`7c8e516`](https://github.com/apache/spark/commit/7c8e5161904f1fd0fa4d99e6c497ef1be3542bdb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22899: [SPARK-25573] Combine resolveExpression and resol...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22899 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22899 LGTM Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22899 To be honest, we might still need to revisit it since it is still very confusing to the developer which one they should use, top-down? or bottom-up? The current use case for top-down is majorly for resolving the higher order functions. This PR at least improves the description. We might need to combine them in the future. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 Basically logically there are only two expressions: In which handles arbitrary expressions, and InSet which handles expressions with literals. Both could work: (1) we provide two separate expressions for InSet, one using switch, and one using hashset, or (2) we just provide one InSet and internally in InSet have two implementations ... The downside with creating different expressions for the same logical expression is that potentially the downstream optimization rules would need to match more. On Mon, Dec 03, 2018 at 10:52 PM, DB Tsai < notificati...@github.com > wrote: > > > > @ rxin ( https://github.com/rxin ) switch in Java is still significantly > faster than hash set even without boxing / unboxing problems when the > number of elements are small. We were thinking about to have two > implementations in InSet , and pick up switch if the number of elements are > small, or otherwise pick up hash set one. But this is the same complexity > as having two implements in In as this PR. > > > > @ cloud-fan ( https://github.com/cloud-fan ) do you suggest to create an OptimizeIn > which has switch and hash set implementations based on the length of the > elements and remove InSet ? Basically, what we were thinking above. > > > > â > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub ( > https://github.com/apache/spark/pull/23171#issuecomment-443991336 ) , or mute > the thread ( > https://github.com/notifications/unsubscribe-auth/AATvPKtGyx5jWxgtO1y5WsiXYDAQqRQ4ks5u1hvJgaJpZM4Y4P4J > ). > > > --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99644/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23171 @rxin `switch` in Java is still significantly faster than hash set even without boxing / unboxing problems when the number of elements are small. We were thinking about to have two implementations in `InSet`, and pick up `switch` if the number of elements are small, or otherwise pick up hash set one. But this is the same complexity as having two implements in `In` as this PR. @cloud-fan do you suggest to create an `OptimizeIn` which has `switch` and hash set implementations based on the length of the elements and remove `InSet`? Basically, what we were thinking above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/testReport)** for PR 23194 at commit [`867ee5e`](https://github.com/apache/spark/commit/867ee5e5ee73d00ea492655aabaa407089ea0f91). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @adrian-wang ok~ I will add some comments to explain the reason --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @JkSelf thx~ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23214 @LuciferYang the patch is fine in my test environment. @adrian-wang I will run all the tpcds queries in spark2.3 and spark2.3 with this patch later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #99650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99650/testReport)** for PR 22468 at commit [`fbfbbff`](https://github.com/apache/spark/commit/fbfbbff55d900ae1101ceb4f7823a9298464cb07). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238543369 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( +StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4) +val mapExpectedRow = mapRow.toSeq(fields4) +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, --- End diff -- fixed code to use `ExpressionEvalHelper.checkResult`. I don't remember correctly though, we might have some historical reasons about that; `ArrayBasedMapData` has no `hashCode` and `equals`. Probably, somebody might know this... cc: @hvanhovell @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5706/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan @viirya #23214 maybe reslove this problem and we needn't revert this patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23214 maybe add some detailed test result in description and explain the reason for this in code comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ping @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @cloud-fan , help to review this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @JkSelf help to check this patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23214: [SPARK-26155] Optimizing the performance of LongT...
GitHub user LuciferYang opened a pull request: https://github.com/apache/spark/pull/23214 [SPARK-26155] Optimizing the performance of LongToUnsafeRowMap ## What changes were proposed in this pull request? To slove @JkSelf report problem at [SPARK-26155](https://issues.apache.org/jira/browse/SPARK-26155), use LongAdder instead of Long of `numKeyLookups` and `numProbes` to reduce add operation times. @JkSelf test this patch in Intel performance testing environment and run TPCDS sqls after this patch with Spark-2.3 and master no longer slower than Spark-2.1. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/LuciferYang/spark spark-26155 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23214 commit 34decea2f59d9fb3930f4d2ca1ad8a625d0409ea Author: yangjie01 Date: 2018-12-04T05:49:19Z use LongAdder instead of Long --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22899 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99642/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22899 **[Test build #99642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99642/testReport)** for PR 22899 at commit [`45e314a`](https://github.com/apache/spark/commit/45e314aa33fb0c108f185519ff58db691c34d58c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5705/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99649/testReport)** for PR 23194 at commit [`877751b`](https://github.com/apache/spark/commit/877751bdab5310dae4c5d8a1f2b7c3bbd761b718). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 How about, we create an `OptimizedIn`, and convert `In` to `OptimizedIn` if the list is all literals? `OptimizedIn` will pick `switch` or hash set based on the length of the list. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99648/testReport)** for PR 23169 at commit [`f0f75c2`](https://github.com/apache/spark/commit/f0f75c25b95010d63ecdf83bb9f280687361d154). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/23169 @DaveDeCaprio You might miss to roll back change in test. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99632/testReport/org.apache.spark.sql.catalyst.trees/TreeNodeSuite/treeString_limits_plan_length/ I also think you need to add a new test with setting configuration to some value and see whether it works properly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238534101 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( +StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4) +val mapExpectedRow = mapRow.toSeq(fields4) +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, --- End diff -- Or we can use `ExpressionEvalHelper.checkResult` here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/23169 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238533700 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( +StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4) +val mapExpectedRow = mapRow.toSeq(fields4) +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, --- End diff -- maybe we should implement `equals` and `hashCode` in `ArrayBasedMapData` and `UnsafeMapData`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238530264 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala --- @@ -535,4 +535,100 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB assert(unsafeRow.getSizeInBytes == 8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes)) } + + testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") { +def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = { + val unsafeProj = UnsafeProjection.create(fields) + val unsafeRow = unsafeProj(inputRow) + val safeProj = SafeProjection.create(fields) + safeProj(unsafeRow) +} + +// Simple tests +val inputRow = InternalRow.fromSeq(Seq( + false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"), + Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2) +)) +val fields1 = Array( + BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, + DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType, + BinaryType) + +assert(convertBackToInternalRow(inputRow, fields1) === inputRow) + +// Array tests +val arrayRow = InternalRow.fromSeq(Seq( + createArray(1, 2, 3), + createArray( +createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*), +createArray(Seq("d").map(UTF8String.fromString): _*)) +)) +val fields2 = Array[DataType]( + ArrayType(IntegerType), + ArrayType(ArrayType(StringType))) + +assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow) + +// Struct tests +val structRow = InternalRow.fromSeq(Seq( + InternalRow.fromSeq(Seq[Any](1, 4.0)), + InternalRow.fromSeq(Seq( +UTF8String.fromString("test"), +InternalRow.fromSeq(Seq( + 1, + createArray(Seq("2", "3").map(UTF8String.fromString): _*) +)) + )) +)) +val fields3 = Array[DataType]( + StructType( +StructField("c0", IntegerType) :: +StructField("c1", DoubleType) :: +Nil), + StructType( +StructField("c2", StringType) :: +StructField("c3", StructType( + StructField("c4", IntegerType) :: + StructField("c5", ArrayType(StringType)) :: + Nil)) :: +Nil)) + +assert(convertBackToInternalRow(structRow, fields3) === structRow) + +// Map tests +val mapRow = InternalRow.fromSeq(Seq( + createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2), + createMap( +createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*), +createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*) + )( +createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort), +createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort) + ))) +val fields4 = Array[DataType]( + MapType(StringType, IntegerType), + MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType))) + +// Since `ArrayBasedMapData` does not override `equals` and `hashCode`, +// we need to take care of it to compare rows. +def toComparable(d: Any): Any = d match { --- End diff -- Since we cannot compare `ArrayBasedMapData`s directly (that is, `assert(mapResultRow === mapExpectedRow)` fails), I just converted them into the `Seq`s of keys/values by this method. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #99647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99647/testReport)** for PR 22468 at commit [`7ef5f86`](https://github.com/apache/spark/commit/7ef5f866eb02f6638a5be00a602de6c6810ae2a3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5704/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 I thought InSwitch logically is the same as InSet, in which all the child expressions are literals? On Mon, Dec 03, 2018 at 8:38 PM, Wenchen Fan < notificati...@github.com > wrote: > > > > I think InSet is not an optimized version of In , but just a way to > separate the implementation for different conditions (the length of the > list). Maybe we should do the same thing here, create a InSwitch and > convert In to it when meeting some conditions. One problem is, In and InSwitch > is same in the interpreted version, maybe we should create a base class > for them. > > > > â > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub ( > https://github.com/apache/spark/pull/23171#issuecomment-443968486 ) , or mute > the thread ( > https://github.com/notifications/unsubscribe-auth/AATvPDTQic0Ii5UD40m_Uj5kMVy4pNExks5u1fxPgaJpZM4Y4P4J > ). > > > --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5703/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5702/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 I think `InSet` is not an optimized version of `In`, but just a way to separate the implementation for different conditions (the length of the list). Maybe we should do the same thing here, create a `InSwitch` and convert `In` to it when meeting some conditions. One problem is, `In` and `InSwitch` is same in the interpreted version, maybe we should create a base class for them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99646/testReport)** for PR 23213 at commit [`2ced0ca`](https://github.com/apache/spark/commit/2ced0cab0c16ee7a2400035a5a7794033eae3ed9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/23213 [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE_CODEGEN_ENABLED=false ## What changes were proposed in this pull request? For better test coverage, this pr set `false` at `WHOLESTAGE_CODEGEN_ENABLED` for interpreter execution tests when running `SQLQueryTestSuite`. This pr moved the existing tests into `SQLQuerySuite` because explain output results are different between codegen-only and interpreter modes. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark InterpreterModeTest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23213.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23213 commit 2ced0cab0c16ee7a2400035a5a7794033eae3ed9 Author: Takeshi Yamamuro Date: 2018-12-04T04:17:48Z Fix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99645/testReport)** for PR 23088 at commit [`f8cfb54`](https://github.com/apache/spark/commit/f8cfb544a805ecb5d1056f703dde4e7705ad1810). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23194 LGTM except for minor comments --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238526892 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,41 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) + protected def withEmptyDirInTablePath(dirName: String)(f: File => Unit): Unit = { --- End diff -- private? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99637/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #99637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99637/testReport)** for PR 22514 at commit [`e04812d`](https://github.com/apache/spark/commit/e04812d6aff0270ab4c0e6c3b2e204a682739e23). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23171 That probably means we should just optimize InSet to have the switch version though? Rather than do it in In? On Mon, Dec 03, 2018 at 8:20 PM, Wenchen Fan < notificati...@github.com > wrote: > > > > @ rxin ( https://github.com/rxin ) I proposed the same thing before, but > one problem is that, we only convert In to InSet when the length of list > reaches the threshold. If the switch way is faster than hash set when the > list is small, it seems still worth to optimize In using switch. > > > > â > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub ( > https://github.com/apache/spark/pull/23171#issuecomment-443965616 ) , or mute > the thread ( > https://github.com/notifications/unsubscribe-auth/AATvPEkrUFJuT4FI167cCI9b0nfv16V4ks5u1fgNgaJpZM4Y4P4J > ). > > > --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23171 @rxin I proposed the same thing before, but one problem is that, we only convert `In` to `InSet` when the length of list reaches the threshold. If the `switch` way is faster than hash set when the list is small, it seems still worth to optimize `In` using `switch`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty con...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23212 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23212 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99640/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23212 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23212 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23212: [SPARK-25498][SQL][FOLLOW-UP] Return an empty config set...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23212 **[Test build #99640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99640/testReport)** for PR 23212 at commit [`ca09bb8`](https://github.com/apache/spark/commit/ca09bb82d49d29a5d6f088d2250e980bab64a8b7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r238524973 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import org.apache.spark.sql.types.StructType; /** - * A mix-in interface for {@link DataSourceV2}. Data sources can implement this interface to + * A mix-in interface for {@link Table}. Data sources can implement this interface to * provide data writing ability for batch processing. * * This interface is used to create {@link BatchWriteSupport} instances when end users run --- End diff -- I don't have a better naming in mind, so I leave it as `WriteSupport` for now. Better naming is welcome to match `Scan`! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238524763 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { assert(requiredChildDistributions.length == children.length) assert(requiredChildOrderings.length == children.length) +val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect { + case a: Alias => (a.toAttribute, a) +})) + // Ensure that the operator's children satisfy their output distribution requirements. children = children.zip(requiredChildDistributions).map { - case (child, distribution) if child.outputPartitioning.satisfies(distribution) => + case (child, distribution) if child.outputPartitioning.satisfies( + distribution.mapExpressions(replaceAlias(_, aliasMap))) => --- End diff -- As an example, `ProjectExec.outputPartitioning` can be wrong, as it doesn't consider the aliases in the project list. I think it's clearer to adjust the `outputPartitioning` there, instead of dealing with it in a rule. What if we have more rules need to check `outputPartitioning` and `requiredChildDistribution`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99638/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for resolu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238524452 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -186,6 +186,54 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { } } + protected def testORCTableLocation(isConvertMetastore: Boolean): Unit = { +val tableName1 = "spark_orc1" +val tableName2 = "spark_orc2" + +withTempDir { dir => + val someDF1 = Seq((1, 1, "orc1"), (2, 2, "orc2")).toDF("c1", "c2", "c3").repartition(1) + withTable(tableName1, tableName2) { +val dataDir = s"${dir.getCanonicalPath}/dir1/" +val parentDir = s"${dir.getCanonicalPath}/" +val wildCardDir = new File(s"${dir}/*").toURI +someDF1.write.orc(dataDir) +val parentDirStatement = + s""" + |CREATE EXTERNAL TABLE $tableName1( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '${parentDir}'""".stripMargin +sql(parentDirStatement) +val parentDirSqlStatement = s"select * from ${tableName1}" +if (isConvertMetastore) { + checkAnswer(sql(parentDirSqlStatement), Nil) +} else { + checkAnswer(sql(parentDirSqlStatement), + (1 to 2).map(i => Row(i, i, s"orc$i"))) +} + +val wildCardStatement = + s""" + |CREATE EXTERNAL TABLE $tableName2( + | c1 int, + | c2 int, + | c3 string) + |STORED AS orc + |LOCATION '$wildCardDir'""".stripMargin --- End diff -- Thanks, @kevinyu98 . Also, please update the PR title ``` - [Spark-25993][SQL][TEST]Add test cases for resolution of ORC table location + [SPARK-25993][SQL][TEST] Add test cases for CREATE EXTERNAL TABLE with subdirectories ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #99638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99638/testReport)** for PR 20433 at commit [`7cb8f87`](https://github.com/apache/spark/commit/7cb8f87ba83fb04e891e069b8339dcef6d835ed9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99635/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org