[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-09 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
As @cloud-fan said `the hash join metrics is wrongly implemented`,  we will 
partial revert # SPARK-21052, no longer need this patch, close it ~


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
On the other hand, if is only a `multi-thread problem`, may not affect 
performance because there is no  synchronized code part ...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23214
  
I think there is a problem, but no one found out because it's only about 
metrics.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
```
For broadcast hash join, we will copy the broadcasted hash relation to 
avoid multi-thread problem, via HashedRelation.asReadOnlyCopy. However, this is 
a shallow copy, the LongToUnsafeRowMap is not copied and likely shared by 
multiple HashedRelations.
```
Was there no problems of data correctness in the past use unthread-safe 
Long type?  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/23214
  
Thanks for doing this. I think we are more close to the root cause.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99651/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23214
  
**[Test build #99651 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)**
 for PR 23214 at commit 
[`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23214
  
It's easy to track `numKeyLookups` at `HashedRelation`, but it's hard to 
track `numProbes`. One idea is, we pass a `MutableInt` to 
`LongToUnsafeRowMap.getValue` as a parameter, and in the method we set the 
actual `numProbes` of this look up to the `MutableInt` parameter.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23214
  
I might know the root cause: `LongToUnsafeRowMap` is acutally accessed by 
multiple threads.

For broadcast hash join, we will copy the broadcasted hash relation to 
avoid multi-thread problem, via `HashedRelation.asReadOnlyCopy`. However, this 
is a shallow copy, the `LongToUnsafeRowMap` is not copied and likely shared by 
multiple `HashedRelation`s.

The metrics is per-task, so I think a better fix is to track the hash probe 
metrics per `HashedRelation`, instead of per `LongToUnsafeRowMap`. It's too 
costly to copy the `LongToUnsafeRowMap`, we should think about how to do it 
efficiently. cc @hvanhovell 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5707/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23214
  
**[Test build #99651 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)**
 for PR 23214 at commit 
[`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23214
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
@adrian-wang ok~ I will add some comments to explain the reason


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
@JkSelf thx~ 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread JkSelf
Github user JkSelf commented on the issue:

https://github.com/apache/spark/pull/23214
  
@LuciferYang  the patch is fine in my test environment. 
@adrian-wang  I will run all the tpcds queries in spark2.3 and spark2.3 
with this patch later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread adrian-wang
Github user adrian-wang commented on the issue:

https://github.com/apache/spark/pull/23214
  
maybe add some detailed test result in description and explain the reason 
for this in code comment?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
ping @viirya 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
cc @cloud-fan ,  help to review this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue:

https://github.com/apache/spark/pull/23214
  
cc @JkSelf  help to check this patch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23214
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org