GitHub user ConeyLiu opened a pull request:
https://github.com/apache/spark/pull/19135
[SPARK-21923][CORE] Avoid calling reserveUnrollMemoryForThisTask for every record
## What changes were proposed in this pull request?
When Spark persists data to Unsafe (off-heap) memory, it calls
`MemoryStore.putIteratorAsValues`, which synchronizes on the `memoryManager`
for every record written. This per-record synchronization is unnecessary: we can
reserve a larger amount of memory at a time and so reduce lock contention.
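The idea of reserving memory in larger chunks can be sketched as follows. This is a minimal illustration, not Spark's actual code: `MockMemoryManager`, `reserveUnrollMemory`, and the chunk size are all hypothetical names and values chosen for the example; the point is only that a local budget amortizes the synchronized calls.

```scala
// Hypothetical sketch: amortizing synchronized memory reservations.
// MockMemoryManager and reserveUnrollMemory are illustrative stand-ins,
// not Spark's real API.
object ChunkedReservationSketch {

  // Simulated memory manager that counts how often its lock is taken.
  class MockMemoryManager {
    private var reserved = 0L
    var calls = 0
    def reserveUnrollMemory(bytes: Long): Boolean = synchronized {
      calls += 1
      reserved += bytes
      true // always succeeds in this sketch
    }
  }

  // Per-record strategy: one synchronized call for every record.
  def perRecord(mm: MockMemoryManager, records: Seq[Array[Byte]]): Unit =
    records.foreach(r => mm.reserveUnrollMemory(r.length))

  // Chunked strategy: reserve a larger block up front and only go back
  // to the manager when the locally tracked budget runs out.
  def chunked(mm: MockMemoryManager,
              records: Seq[Array[Byte]],
              chunk: Long): Unit = {
    var budget = 0L
    records.foreach { r =>
      if (budget < r.length) {
        mm.reserveUnrollMemory(chunk)
        budget += chunk
      }
      budget -= r.length
    }
  }

  def main(args: Array[String]): Unit = {
    val records = Seq.fill(1000)(new Array[Byte](16))
    val a = new MockMemoryManager
    perRecord(a, records)          // 1000 synchronized calls
    val b = new MockMemoryManager
    chunked(b, records, chunk = 1024) // ~16 synchronized calls
    println(s"per-record calls: ${a.calls}, chunked calls: ${b.calls}")
  }
}
```

With 1000 records of 16 bytes and a 1024-byte chunk, the per-record path takes the lock 1000 times while the chunked path takes it only 16 times, which is the kind of contention reduction this patch targets.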
## How was this patch tested?
Test case (with 1 executor and 20 cores):
```scala
val start = System.currentTimeMillis()
val count = sc.parallelize(0 until Integer.MAX_VALUE, 100)
  .persist(StorageLevel.OFF_HEAP)
  .count()
println(System.currentTimeMillis() - start)
```
Test results (wall-clock time in ms, five runs each):

|        | run 1 | run 2 | run 3 | run 4 | run 5 |
|--------|-------|-------|-------|-------|-------|
| before | 27647 | 29108 | 28591 | 28264 | 27232 |
| after  | 26868 | 26358 | 27767 | 26653 | 26693 |
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ConeyLiu/spark memorystore
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19135.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19135
----
commit d20fa4312b811d10ba993dcbb7a9bebc24d5a56c
Author: Xianyang Liu <[email protected]>
Date: 2017-09-03T06:43:57Z
avoid call reserveUnrollMemoryForThisTask every record
----
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]