[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 As @cloud-fan said `the hash join metrics is wrongly implemented`, we will partial revert # SPARK-21052, no longer need this patch, close it ~ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 On the other hand, if is only a `multi-thread problem`, may not affect performance because there is no synchronized code part ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I think there is a problem, but no one found out because it's only about metrics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ``` For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread problem, via HashedRelation.asReadOnlyCopy. However, this is a shallow copy, the LongToUnsafeRowMap is not copied and likely shared by multiple HashedRelations. ``` Was there no problems of data correctness in the past use unthread-safe Long type? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23214 Thanks for doing this. I think we are more close to the root cause. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99651/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit [`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 It's easy to track `numKeyLookups` at `HashedRelation`, but it's hard to track `numProbes`. One idea is, we pass a `MutableInt` to `LongToUnsafeRowMap.getValue` as a parameter, and in the method we set the actual `numProbes` of this look up to the `MutableInt` parameter. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I might know the root cause: `LongToUnsafeRowMap` is acutally accessed by multiple threads. For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread problem, via `HashedRelation.asReadOnlyCopy`. However, this is a shallow copy, the `LongToUnsafeRowMap` is not copied and likely shared by multiple `HashedRelation`s. The metrics is per-task, so I think a better fix is to track the hash probe metrics per `HashedRelation`, instead of per `LongToUnsafeRowMap`. It's too costly to copy the `LongToUnsafeRowMap`, we should think about how to do it efficiently. cc @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5707/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit [`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @adrian-wang ok~ I will add some comments to explain the reason --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @JkSelf thx~ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23214 @LuciferYang the patch is fine in my test environment. @adrian-wang I will run all the tpcds queries in spark2.3 and spark2.3 with this patch later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23214 maybe add some detailed test result in description and explain the reason for this in code comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ping @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @cloud-fan , help to review this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @JkSelf help to check this patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org