GitHub user kiszk reopened a pull request: https://github.com/apache/spark/pull/19222
[SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several types of memory block ## What changes were proposed in this pull request? This PR allows us to use one of several types of `MemoryBlock`, such as byte array, int array, long array, or `java.nio.DirectByteBuffer`. To use `java.nio.DirectByteBuffer` allows to have off heap memory which is automatically deallocated by JVM. `MemoryBlock` class has primitive accessors like `Platform.getInt()`, `Platform.putint()`, or `Platform.copyMemory()`. This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and other places. This PR can improve performance of operations involving memory accesses (e.g. `UTF8String.trim`) by 1.8x. For now, this PR does not use `MemoryBlock` for `BufferHolder` based on @cloud-fan's [suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290). Since this PR is a successor of #11494, close #11494. Many codes were ported from #11494. Many efforts were put here. **I think this PR should credit to @yzotov.** This PR can achieve **1.1-1.4x performance improvements** for operations in `UTF8String` or `Murmur3_x86_32`. Other operations are almost comparable performances. Without this PR ``` OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz Hash byte arrays with length 268435487: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ Murmur3_x86_32 526 / 536 0.0 131399881.5 1.0X UTF8String benchmark: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ hashCode 525 / 552 1022.6 1.0 1.0X substring 414 / 423 1298.0 0.8 1.3X ``` With this PR ``` OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz Hash byte arrays with length 268435487: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ Murmur3_x86_32 474 / 488 0.0 118552232.0 1.0X UTF8String benchmark: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ hashCode 476 / 480 1127.3 0.9 1.0X substring 287 / 291 1869.9 0.5 1.7X ``` Benchmark program ``` test("benchmark Murmur3_x86_32") { val length = 8192 * 32768 + 31 val seed = 42L val iters = 1 << 2 val random = new Random(seed) val arrays = Array.fill[MemoryBlock](numArrays) { val bytes = new Array[Byte](length) random.nextBytes(bytes) new ByteArrayMemoryBlock(bytes, Platform.BYTE_ARRAY_OFFSET, length) } val benchmark = new Benchmark("Hash byte arrays with length " + length, iters * numArrays, minNumIters = 20) benchmark.addCase("HiveHasher") { _: Int => var sum = 0L for (_ <- 0L until iters) { sum += HiveHasher.hashUnsafeBytesBlock( arrays(i), Platform.BYTE_ARRAY_OFFSET, length) } } benchmark.run() } test("benchmark UTF8String") { val N = 512 * 1024 * 1024 val iters = 2 val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20) val str0 = new java.io.StringWriter() { { for (i <- 0 until N) { write(" ") } } }.toString val s0 = UTF8String.fromString(str0) benchmark.addCase("hashCode") { _: Int => var h: Int = 0 for (_ <- 0L until iters) { h += s0.hashCode } } benchmark.addCase("substring") { _: Int => var s: UTF8String = null for (_ <- 0L until iters) { s = s0.substring(N / 2 - 5, N / 2 + 5) } } benchmark.run() } ``` I run [this benchmark program](https://gist.github.com/kiszk/94f75b506c93a663bbbc372ffe8f05de) using [the commit](https://github.com/apache/spark/pull/19222/commits/ee5a79861c18725fb1cd9b518cdfd2489c05b81d6). I got the following results: ``` OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12 on Linux 4.4.0-66-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz Memory access benchmarks: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ ByteArrayMemoryBlock get/putInt() 220 / 221 609.3 1.6 1.0X Platform get/putInt(byte[]) 220 / 236 610.9 1.6 1.0X Platform get/putInt(Object) 492 / 494 272.8 3.7 0.4X OnHeapMemoryBlock get/putLong() 322 / 323 416.5 2.4 0.7X long[] 221 / 221 608.0 1.6 1.0X Platform get/putLong(long[]) 321 / 321 418.7 2.4 0.7X Platform get/putLong(Object) 561 / 563 239.2 4.2 0.4X ``` I also run [this benchmark program](https://gist.github.com/kiszk/5fdb4e03733a5d110421177e289d1fb5) for comparing performance of `Platform.copyMemory()`. ``` OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12 on Linux 4.4.0-66-generic Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz Platform copyMemory: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ Object to Object 1961 / 1967 8.6 116.9 1.0X System.arraycopy Object to Object 1917 / 1921 8.8 114.3 1.0X byte array to byte array 1961 / 1968 8.6 116.9 1.0X System.arraycopy byte array to byte array 1909 / 1937 8.8 113.8 1.0X int array to int array 1921 / 1990 8.7 114.5 1.0X double array to double array 1918 / 1923 8.7 114.3 1.0X Object to byte array 1961 / 1967 8.6 116.9 1.0X Object to short array 1965 / 1972 8.5 117.1 1.0X Object to int array 1910 / 1915 8.8 113.9 1.0X Object to float array 1971 / 1978 8.5 117.5 1.0X Object to double array 1919 / 1944 8.7 114.4 1.0X byte array to Object 1959 / 1967 8.6 116.8 1.0X int array to Object 1961 / 1970 8.6 116.9 1.0X double array to Object 1917 / 1924 8.8 114.3 1.0X ``` These results show three facts: 1. According to the second/third or sixth/seventh results in the first experiment, if we use `Platform.get/putInt(Object)`, we achieve more than 2x worse performance than `Platform.get/putInt(byte[])` with concrete type (i.e. `byte[]`). 2. According to the second/third or fourth/fifth/sixth results in the first experiment, the fastest way to access an array element on Java heap is `array[]`. **Cons of `array[]` is that it is not possible to support unaligned-8byte access.** 3. According to the first/second/third or fourth/sixth/seventh results in the first experiment, `getInt()/putInt() or getLong()/putLong()` in subclasses of `MemoryBlock` can achieve comparable performance to `Platform.get/putInt()` or `Platform.get/putLong()` with concrete type (second or sixth result). There is no overhead regarding virtual call. 4. According to results in the second experiment, for `Platform.copy()`, to pass `Object` can achieve the same performance as to pass any type of primitive array as source or destination. 5. According to second/fourth results in the second experiment, `Platform.copy()` can achieve the same performance as `System.arrayCopy`. **It would be good to use `Platform.copy()` since `Platform.copy()` can take any types for src and dst.** We are incrementally replace `Platform.get/putXXX` with `MemoryBlock.get/putXXX`. This is because we have two advantages. 1) Achieve better performance due to having a concrete type for an array. 2) Use simple OO design instead of passing `Object` It is easy to use `MemoryBlock` in `InternalRow`, `BufferHolder`, `TaskMemoryManager`, and others that are already abstracted. It is not easy to use `MemoryBlock` in utility classes related to hashing or others. Other candidates are - UnsafeRow, UnsafeArrayData, UnsafeMapData, SpecificUnsafeRowJoiner - UTF8StringBuffer - BufferHolder - TaskMemoryManager - OnHeapColumnVector - BytesToBytesMap - CachedBatch - classes for hash - others. ## How was this patch tested? Added `UnsafeMemoryAllocator` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-10399 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19222.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19222 ---- commit f427ca2a0f30d70f42b4755d0d72318cfa0e4c77 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-13T10:16:19Z introduce ByteArrayMemoryBlock, IntArrayMemoryBlock, LongArrayMemoryBlock, and OffheaMemoryBlock commit 5d7ccdb0e845afcaa430bac3d21b519d35d1e6f4 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-13T17:15:25Z OffHeapColumnVector uses UnsafeMemoryAllocator commit 251fa09d4421085a6c97f56bb38108980a79b5c8 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-13T17:27:09Z UTF8String uses UnsafeMemoryAllocator commit 790bbe7f3ac52b2b1e4375684998737d6127a552 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-13T17:36:57Z Platform.copymemory() in UsafeInMemorySorter uses new MemoryBlock commit 93a792e7729f213fcef66f0ed9b33f45259d1ec3 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-14T15:34:48Z address review comments commit 0beab0308cdac57e85daf7794170a0ca899ab568 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-14T17:53:33Z fix test failures (e.g. String in UnsafeArrayData) commit fcf764c1aebdc847675f710c17ec8477d6022a40 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-18T16:13:13Z fix failures commit d2d2e50f8a2baf41d5b85127bf888da6f8bca343 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-21T18:45:55Z minor update of UTF8String constructor commit f5e10bb52c33856ddd3e1b1f8483b170e0167c53 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T11:00:12Z rename method name commit 1905e8ca4b3b8200fa56f5fc91899bb420a07628 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T11:01:13Z remove unused code commit 7778e586e94130749cec3f54a60b6fb24514647a Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T11:02:30Z update arrayEquals commit 4f96c82b151b78641bcfc92a65913048b055cfee Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T13:18:01Z rebase master commit d1d6ae90589c0fae6c64af0cb95696e270446228 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T14:51:40Z make more methods final commit 914dcd11d0d5ef014284868f2794cb4e5baa0958 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-22T15:39:57Z make fill method final in MemoryBlock commit 336e4b7bfd7edcb861edeac3ca115dead785b68a Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-23T11:35:15Z fix test failures commit 5be9ccb163832e1895b045730a06e79eb3b171cf Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-24T14:00:40Z add testsuite commit 43e6b572bd893bd42e58df930a34b7e31549a49a Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-24T18:10:49Z pass concrete type to the first argument of Platform.get*/put* to get better performance commit 05f024e566f828e9c3f836430c9c7b34da5e954b Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-28T01:51:38Z rename methods related to hash commit 9071cf6449123400f3d774664e1709337b05c555 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-28T01:52:48Z added methods for MemoryBlock commit 37ee9fa07a8f8faaeb097164422bb15958fa4b1c Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-28T03:44:30Z rebase with master commit d0b5d59bb31fe2845477ee243008992686e2f2a2 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-09-28T04:37:25Z fix scala style error commit 5cdad44717ccb510d4114d14cfff304bef9f5bb4 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-10-14T07:29:14Z use MemoryBlock in Murmur3 for performance reason commit 91028fa2ae34bb3ae667692112b8455d4394cbbd Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-10-14T07:29:30Z fix typo in comment commit 0210bd1e5f46f81617a35493d2cd0b737b4cf85d Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-10-29T12:32:38Z address review comment commit df6dad3762f4e918d503df75ae8fce052af8bf43 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2017-11-28T06:08:52Z rebase with master commit 1fa47a8c291ab54a8c1e386737769f467a76a672 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-02-20T12:17:26Z fix failures commit 01f9c8e8146ff8d21d18feba59c0c2ad83299e2a Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-02-20T17:30:11Z fix failures in ArrowColumnVectorSuite and FeatureHasherSuite commit 2ed8f82f13288e00e6352bdccbac0dd890c91715 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-02-21T17:48:41Z address review comments rename LongArrayMemoryBlock to OnHeapMemoryBlock remove intArrayMemoryBlock separate copyMemory to copyFrom and writeTo and make them non-static add assertion for range check commit 5e3afd11a2dc76d2cd23264b4052abbf3f5d7e9d Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-02-23T16:20:21Z address review comments commit 95fbdee04e9137938cdc76f7f4573116720357f5 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-02-24T00:22:44Z fix test failures ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org