GitHub user kiszk reopened a pull request:

    https://github.com/apache/spark/pull/19222

    [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several 
types of memory block

    ## What changes were proposed in this pull request?
    
    This PR allows us to use one of several types of `MemoryBlock`, such as 
byte array, int array, long array, or `java.nio.DirectByteBuffer`. To use 
`java.nio.DirectByteBuffer` allows to have off heap memory which is 
automatically deallocated by JVM. `MemoryBlock`  class has primitive accessors 
like `Platform.getInt()`, `Platform.putint()`, or `Platform.copyMemory()`. 
    
    This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and 
other places. This PR can improve performance of operations involving memory 
accesses (e.g. `UTF8String.trim`) by 1.8x.
    
    For now, this PR does not use `MemoryBlock` for `BufferHolder` based on 
@cloud-fan's 
[suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290).
    
    Since this PR is a successor of #11494, close #11494. Many codes were 
ported from #11494. Many efforts were put here. **I think this PR should credit 
to @yzotov.**
    
    
    This PR can achieve **1.1-1.4x performance improvements** for  operations 
in `UTF8String` or `Murmur3_x86_32`. Other operations are almost comparable 
performances.
    
    Without this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 
4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 
4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    Hash byte arrays with length 268435487:  Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    Murmur3_x86_32                                 526 /  536          0.0   
131399881.5       1.0X
    
    UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    hashCode                                       525 /  552       1022.6      
     1.0       1.0X
    substring                                      414 /  423       1298.0      
     0.8       1.3X
    ```
    
    With this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 
4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    Hash byte arrays with length 268435487:  Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    Murmur3_x86_32                                 474 /  488          0.0   
118552232.0       1.0X
    
    UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    hashCode                                       476 /  480       1127.3      
     0.9       1.0X
    substring                                      287 /  291       1869.9      
     0.5       1.7X
    ```
    
    Benchmark program
    ```
    test("benchmark Murmur3_x86_32") {
      val length = 8192 * 32768 + 31
      val seed = 42L
      val iters = 1 << 2
      val random = new Random(seed)
      val arrays = Array.fill[MemoryBlock](numArrays) {
        val bytes = new Array[Byte](length)
        random.nextBytes(bytes)
        new ByteArrayMemoryBlock(bytes, Platform.BYTE_ARRAY_OFFSET, length)
      }
                                                                    
      val benchmark = new Benchmark("Hash byte arrays with length " + length,
        iters * numArrays, minNumIters = 20) 
      benchmark.addCase("HiveHasher") { _: Int =>
        var sum = 0L
        for (_ <- 0L until iters) {
          sum += HiveHasher.hashUnsafeBytesBlock(
            arrays(i), Platform.BYTE_ARRAY_OFFSET, length)
        }
      }
      benchmark.run()   
    }
    
    test("benchmark UTF8String") {
      val N = 512 * 1024 * 1024
      val iters = 2
      val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20)
      val str0 = new java.io.StringWriter() { { for (i <- 0 until N) { write(" 
") } } }.toString
      val s0 = UTF8String.fromString(str0)
      benchmark.addCase("hashCode") { _: Int =>
        var h: Int = 0
        for (_ <- 0L until iters) { h += s0.hashCode }
      }
      benchmark.addCase("substring") { _: Int =>
        var s: UTF8String = null
        for (_ <- 0L until iters) { s = s0.substring(N / 2 - 5, N / 2 + 5) }
      }
      benchmark.run()
    }
    ```
    
      
      
    I run [this benchmark 
program](https://gist.github.com/kiszk/94f75b506c93a663bbbc372ffe8f05de) using 
[the 
commit](https://github.com/apache/spark/pull/19222/commits/ee5a79861c18725fb1cd9b518cdfd2489c05b81d6).
 I got the following results:
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12 on Linux 
4.4.0-66-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    Memory access benchmarks:                Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    ByteArrayMemoryBlock get/putInt()              220 /  221        609.3      
     1.6       1.0X
    Platform get/putInt(byte[])                    220 /  236        610.9      
     1.6       1.0X
    Platform get/putInt(Object)                    492 /  494        272.8      
     3.7       0.4X
    OnHeapMemoryBlock get/putLong()                322 /  323        416.5      
     2.4       0.7X
    long[]                                         221 /  221        608.0      
     1.6       1.0X
    Platform get/putLong(long[])                   321 /  321        418.7      
     2.4       0.7X
    Platform get/putLong(Object)                   561 /  563        239.2      
     4.2       0.4X
    ```
    
    I also run [this benchmark 
program](https://gist.github.com/kiszk/5fdb4e03733a5d110421177e289d1fb5) for 
comparing performance of `Platform.copyMemory()`.
    ```
    OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12 on Linux 
4.4.0-66-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    Platform copyMemory:                     Best/Avg Time(ms)    Rate(M/s)   
Per Row(ns)   Relative
    
------------------------------------------------------------------------------------------------
    Object to Object                              1961 / 1967          8.6      
   116.9       1.0X
    System.arraycopy Object to Object             1917 / 1921          8.8      
   114.3       1.0X
    byte array to byte array                      1961 / 1968          8.6      
   116.9       1.0X
    System.arraycopy byte array to byte array      1909 / 1937          8.8     
    113.8       1.0X
    int array to int array                        1921 / 1990          8.7      
   114.5       1.0X
    double array to double array                  1918 / 1923          8.7      
   114.3       1.0X
    Object to byte array                          1961 / 1967          8.6      
   116.9       1.0X
    Object to short array                         1965 / 1972          8.5      
   117.1       1.0X
    Object to int array                           1910 / 1915          8.8      
   113.9       1.0X
    Object to float array                         1971 / 1978          8.5      
   117.5       1.0X
    Object to double array                        1919 / 1944          8.7      
   114.4       1.0X
    byte array to Object                          1959 / 1967          8.6      
   116.8       1.0X
    int array to Object                           1961 / 1970          8.6      
   116.9       1.0X
    double array to Object                        1917 / 1924          8.8      
   114.3       1.0X
    ```
    
    These results show three facts:
    1. According to the second/third or sixth/seventh results in the first 
experiment, if we use `Platform.get/putInt(Object)`, we achieve more than 2x 
worse performance than `Platform.get/putInt(byte[])` with concrete type (i.e. 
`byte[]`).
    2. According to the second/third or fourth/fifth/sixth results in the first 
experiment, the fastest way to access an array element on Java heap is 
`array[]`. **Cons of `array[]` is that it is not possible to support 
unaligned-8byte access.**
    3. According to the first/second/third or fourth/sixth/seventh results in 
the first experiment, `getInt()/putInt() or getLong()/putLong()` in subclasses 
of `MemoryBlock` can achieve comparable performance to `Platform.get/putInt()` 
or `Platform.get/putLong()` with concrete type (second or sixth result). There 
is no overhead regarding virtual call.
    4. According to results in the second experiment, for `Platform.copy()`, to 
pass `Object` can achieve the same performance as to pass any type of primitive 
array as source or destination.
    5. According to second/fourth results in the second experiment, 
`Platform.copy()` can achieve the same performance as `System.arrayCopy`. **It 
would be good to use `Platform.copy()` since `Platform.copy()` can take any 
types for src and dst.**
    
    We are incrementally replace `Platform.get/putXXX` with 
`MemoryBlock.get/putXXX`. This is because we have two advantages.
    1) Achieve better performance due to having a concrete type for an array.
    2) Use simple OO design instead of passing `Object`
    It is easy to use `MemoryBlock` in `InternalRow`, `BufferHolder`, 
`TaskMemoryManager`, and others that are already abstracted. It is not easy to 
use `MemoryBlock` in utility classes related to hashing or others.
    
    Other candidates are 
    - UnsafeRow, UnsafeArrayData, UnsafeMapData, SpecificUnsafeRowJoiner
    - UTF8StringBuffer
    - BufferHolder
    - TaskMemoryManager
    - OnHeapColumnVector
    - BytesToBytesMap
    - CachedBatch
    - classes for hash
    - others.
    
    
    ## How was this patch tested?
    
    Added `UnsafeMemoryAllocator`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-10399

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19222
    
----
commit f427ca2a0f30d70f42b4755d0d72318cfa0e4c77
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-13T10:16:19Z

    introduce ByteArrayMemoryBlock, IntArrayMemoryBlock, LongArrayMemoryBlock, 
and OffheaMemoryBlock

commit 5d7ccdb0e845afcaa430bac3d21b519d35d1e6f4
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-13T17:15:25Z

    OffHeapColumnVector uses UnsafeMemoryAllocator

commit 251fa09d4421085a6c97f56bb38108980a79b5c8
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-13T17:27:09Z

    UTF8String uses UnsafeMemoryAllocator

commit 790bbe7f3ac52b2b1e4375684998737d6127a552
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-13T17:36:57Z

    Platform.copymemory() in UsafeInMemorySorter uses new MemoryBlock

commit 93a792e7729f213fcef66f0ed9b33f45259d1ec3
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-14T15:34:48Z

    address review comments

commit 0beab0308cdac57e85daf7794170a0ca899ab568
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-14T17:53:33Z

    fix test failures (e.g. String in UnsafeArrayData)

commit fcf764c1aebdc847675f710c17ec8477d6022a40
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-18T16:13:13Z

    fix failures

commit d2d2e50f8a2baf41d5b85127bf888da6f8bca343
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-21T18:45:55Z

    minor update of UTF8String constructor

commit f5e10bb52c33856ddd3e1b1f8483b170e0167c53
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T11:00:12Z

    rename method name

commit 1905e8ca4b3b8200fa56f5fc91899bb420a07628
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T11:01:13Z

    remove unused code

commit 7778e586e94130749cec3f54a60b6fb24514647a
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T11:02:30Z

    update arrayEquals

commit 4f96c82b151b78641bcfc92a65913048b055cfee
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T13:18:01Z

    rebase master

commit d1d6ae90589c0fae6c64af0cb95696e270446228
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T14:51:40Z

    make more methods final

commit 914dcd11d0d5ef014284868f2794cb4e5baa0958
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-22T15:39:57Z

    make fill method final in MemoryBlock

commit 336e4b7bfd7edcb861edeac3ca115dead785b68a
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-23T11:35:15Z

    fix test failures

commit 5be9ccb163832e1895b045730a06e79eb3b171cf
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-24T14:00:40Z

    add testsuite

commit 43e6b572bd893bd42e58df930a34b7e31549a49a
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-24T18:10:49Z

    pass concrete type to the first argument of Platform.get*/put* to get 
better performance

commit 05f024e566f828e9c3f836430c9c7b34da5e954b
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-28T01:51:38Z

    rename methods related to hash

commit 9071cf6449123400f3d774664e1709337b05c555
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-28T01:52:48Z

    added methods for MemoryBlock

commit 37ee9fa07a8f8faaeb097164422bb15958fa4b1c
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-28T03:44:30Z

    rebase with master

commit d0b5d59bb31fe2845477ee243008992686e2f2a2
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-09-28T04:37:25Z

    fix scala style error

commit 5cdad44717ccb510d4114d14cfff304bef9f5bb4
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-10-14T07:29:14Z

    use MemoryBlock in Murmur3 for performance reason

commit 91028fa2ae34bb3ae667692112b8455d4394cbbd
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-10-14T07:29:30Z

    fix typo in comment

commit 0210bd1e5f46f81617a35493d2cd0b737b4cf85d
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-10-29T12:32:38Z

    address review comment

commit df6dad3762f4e918d503df75ae8fce052af8bf43
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2017-11-28T06:08:52Z

    rebase with master

commit 1fa47a8c291ab54a8c1e386737769f467a76a672
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-02-20T12:17:26Z

    fix failures

commit 01f9c8e8146ff8d21d18feba59c0c2ad83299e2a
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-02-20T17:30:11Z

    fix failures in ArrowColumnVectorSuite and FeatureHasherSuite

commit 2ed8f82f13288e00e6352bdccbac0dd890c91715
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-02-21T17:48:41Z

    address review comments
    
    rename LongArrayMemoryBlock to OnHeapMemoryBlock
    
    remove intArrayMemoryBlock
    
    separate copyMemory to copyFrom and writeTo and make them non-static
    
    add assertion for range check

commit 5e3afd11a2dc76d2cd23264b4052abbf3f5d7e9d
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-02-23T16:20:21Z

    address review comments

commit 95fbdee04e9137938cdc76f7f4573116720357f5
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-02-24T00:22:44Z

    fix test failures

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to