GitHub user ConeyLiu opened a pull request: https://github.com/apache/spark/pull/19135
[SPARK-21923][CORE] Avoid calling reserveUnrollMemoryForThisTask for every record

## What changes were proposed in this pull request?

When Spark persists data to unsafe (off-heap) memory, it calls `MemoryStore.putIteratorAsValues`, which synchronizes on the `memoryManager` for every record written. This per-record synchronization is unnecessary: we can reserve a larger chunk of unroll memory at a time, reducing contention on the memory manager.

## How was this patch tested?

Test case (with 1 executor, 20 cores):

```scala
val start = System.currentTimeMillis()
val data = sc.parallelize(0 until Integer.MAX_VALUE, 100)
  .persist(StorageLevel.OFF_HEAP)
  .count()
println(System.currentTimeMillis() - start)
```

Test result (elapsed time in ms, five runs each):

| | run 1 | run 2 | run 3 | run 4 | run 5 |
|---|---|---|---|---|---|
| before | 27647 | 29108 | 28591 | 28264 | 27232 |
| after | 26868 | 26358 | 27767 | 26653 | 26693 |

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ConeyLiu/spark memorystore

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19135.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19135

----

commit d20fa4312b811d10ba993dcbb7a9bebc24d5a56c
Author: Xianyang Liu <xianyang....@intel.com>
Date: 2017-09-03T06:43:57Z

    avoid call reserveUnrollMemoryForThisTask every record

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
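
The optimization the PR describes, reserving memory in larger chunks instead of once per record, can be sketched in isolation. The following is a minimal toy model, not Spark's actual `MemoryStore` code: `ToyMemoryManager`, `unrollPerRecord`, and `unrollChunked` are all illustrative names invented here to show why batched reservation cuts down the number of synchronized calls on the memory manager.

```scala
import java.util.concurrent.locks.ReentrantLock

// Toy stand-in for a memory manager that must be synchronized on
// every reservation. Counts how often the lock is taken.
class ToyMemoryManager(val capacity: Long) {
  private val lock = new ReentrantLock()
  private var used = 0L
  var reservationCalls = 0

  def reserve(bytes: Long): Boolean = {
    lock.lock()
    try {
      reservationCalls += 1
      if (used + bytes <= capacity) { used += bytes; true } else false
    } finally lock.unlock()
  }
}

// Per-record pattern: one synchronized reservation per record.
def unrollPerRecord(mm: ToyMemoryManager, records: Seq[Array[Byte]]): Unit =
  records.foreach(r => mm.reserve(r.length))

// Chunked pattern: reserve a larger block up front and only return to
// the manager when the local allowance runs out.
def unrollChunked(mm: ToyMemoryManager, records: Seq[Array[Byte]], chunk: Long): Unit = {
  var allowance = 0L
  records.foreach { r =>
    if (allowance < r.length && mm.reserve(chunk)) {
      allowance += chunk
    }
    allowance -= r.length
  }
}
```

With 1000 records of 8 bytes and a 4 KB chunk, the per-record version takes the lock 1000 times while the chunked version takes it only a handful of times, which is the contention reduction the benchmark above is measuring.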