Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19135
Sorry, I'm not very familiar with this part, but from the test results it seems
that the performance improved only a little. I suspect that the way you generate
the RDD, `0 until Integer.MAX_VALUE`, might take most of the time (since a large
integer array needs to be serialized with the tasks and shipped to the executors).
Also, I see you used 1 executor with 20 cores for the test. In normal usage
we would not allocate so many cores to 1 executor. Can you please test with
2-4 cores per executor? I guess that with fewer cores the contention on the
MemoryManager lock should be alleviated, and the performance might be close.
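
For reference, a rough sketch of the kind of rerun I have in mind. Everything here is an assumption on my side, not taken from the PR: the app name, the choice of 5 executors with 4 cores each (to keep 20 cores in total), and the cache-then-count workload are placeholders for whatever the original benchmark does. It uses `sc.range` so the elements are generated on the executors instead of being built from a driver-side collection.

```scala
import org.apache.spark.sql.SparkSession

object FewerCoresBenchmark {
  def main(args: Array[String]): Unit = {
    // Assumed layout: 5 executors x 4 cores instead of 1 executor x 20 cores.
    // spark.executor.instances only takes effect on cluster managers that
    // honor it (e.g. YARN) and with dynamic allocation disabled.
    val spark = SparkSession.builder()
      .appName("memory-manager-contention-check") // hypothetical name
      .config("spark.executor.instances", "5")
      .config("spark.executor.cores", "4")
      .getOrCreate()
    val sc = spark.sparkContext

    // sc.range produces each partition's elements on the executor side.
    val rdd = sc.range(0L, Int.MaxValue.toLong, numSlices = 20)

    // Placeholder workload: cache the RDD and force evaluation with count().
    // Replace this with the operation the benchmark actually measures.
    val start = System.nanoTime()
    val n = rdd.cache().count()
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"counted $n elements in $elapsedMs ms")

    spark.stop()
  }
}
```

Running the same job with both layouts, before and after the patch, would show whether the improvement still holds when fewer tasks share one executor.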