Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19222
  
    @hvanhovell @rednaxelafx
    After running a benchmark program, I took a polymorphic approach (i.e. each subclass of `MemoryBlock` has its own `getInt()`/`putInt()` methods). With it, I got better performance than with a monomorphic approach (i.e. only the `MemoryBlock` class has `final` `getInt()`/`putInt()` methods).
    **The root cause of the better performance is passing a concrete type (e.g. `byte[]`), rather than `Object`, as the first argument of `Platform.getInt()/putInt()`; it is not the virtual call itself.**
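    
    A minimal sketch of the two designs, to make the distinction concrete. It is not the PR's code: `SketchPlatform` is a hypothetical stand-in for Spark's `Platform` (a thin wrapper over `sun.misc.Unsafe`), and `ObjectMemoryBlock` is a hypothetical name for the monomorphic variant. The only point it illustrates is the static type of the first argument each `getInt()`/`putInt()` passes down: `byte[]`/`int[]` in the polymorphic subclasses versus `Object` in the monomorphic class.
    
    ```java
    import java.lang.reflect.Field;
    import sun.misc.Unsafe;
    
    // Hypothetical stand-in for Spark's Platform: a thin wrapper over sun.misc.Unsafe.
    final class SketchPlatform {
      static final Unsafe UNSAFE = loadUnsafe();
      static final long BYTE_ARRAY_OFFSET = Unsafe.ARRAY_BYTE_BASE_OFFSET;
      static final long INT_ARRAY_OFFSET = Unsafe.ARRAY_INT_BASE_OFFSET;
    
      private static Unsafe loadUnsafe() {
        try {
          Field f = Unsafe.class.getDeclaredField("theUnsafe");
          f.setAccessible(true);
          return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException(e);
        }
      }
    
      // As in Platform.getInt()/putInt(), the base parameter is declared as Object.
      static int getInt(Object base, long offset) { return UNSAFE.getInt(base, offset); }
      static void putInt(Object base, long offset, int value) { UNSAFE.putInt(base, offset, value); }
    }
    
    // Polymorphic approach: each subclass implements getInt()/putInt() and passes
    // a field whose static type is a concrete array type.
    // No bounds checking: byteOffset + 4 must stay within the backing array.
    abstract class MemoryBlock {
      abstract int getInt(long byteOffset);
      abstract void putInt(long byteOffset, int value);
    }
    
    final class ByteArrayMemoryBlock extends MemoryBlock {
      private final byte[] array;
      ByteArrayMemoryBlock(byte[] array) { this.array = array; }
      @Override int getInt(long byteOffset) {
        return SketchPlatform.getInt(array, SketchPlatform.BYTE_ARRAY_OFFSET + byteOffset);
      }
      @Override void putInt(long byteOffset, int value) {
        SketchPlatform.putInt(array, SketchPlatform.BYTE_ARRAY_OFFSET + byteOffset, value);
      }
    }
    
    final class IntArrayMemoryBlock extends MemoryBlock {
      private final int[] array;
      IntArrayMemoryBlock(int[] array) { this.array = array; }
      @Override int getInt(long byteOffset) {
        return SketchPlatform.getInt(array, SketchPlatform.INT_ARRAY_OFFSET + byteOffset);
      }
      @Override void putInt(long byteOffset, int value) {
        SketchPlatform.putInt(array, SketchPlatform.INT_ARRAY_OFFSET + byteOffset, value);
      }
    }
    
    // Monomorphic approach: one class with final methods; the base is only known as Object.
    class ObjectMemoryBlock {
      private final Object base;
      private final long baseOffset;
      ObjectMemoryBlock(Object base, long baseOffset) {
        this.base = base;
        this.baseOffset = baseOffset;
      }
      final int getInt(long byteOffset) {
        return SketchPlatform.getInt(base, baseOffset + byteOffset);
      }
      final void putInt(long byteOffset, int value) {
        SketchPlatform.putInt(base, baseOffset + byteOffset, value);
      }
    }
    ```
    
    The sketch only shows the shape of each design and makes no performance claim by itself; the measured numbers come from the linked benchmark program.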
    
    I ran [this benchmark program](https://gist.github.com/kiszk/94f75b506c93a663bbbc372ffe8f05de) using [the commit](https://github.com/apache/spark/commit/0714ddcab6d83a489e791536775630e75e8fe5c6) and got the following results:
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    Memory access benchmarks:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    IntArrayMemoryBlock                            423 /  445        634.1           1.6       1.0X
    ByteArrayMemoryBlock                           433 /  443        620.3           1.6       1.0X
    Platform                                       431 /  436        622.7           1.6       1.0X
    Platform Object                               1004 / 1055        267.4           3.7       0.4X
    Platform copyMemory                             45 /   48       5903.9           0.2       9.3X
    Platform copyMemory Object                      45 /   47       6004.0           0.2       9.5X
    ```
    
    These results show three facts:
    1. According to the first three results, having `getInt()/putInt()` in subclasses of `MemoryBlock` achieves performance comparable to the current implementation (`Platform` in the table).
    2. According to the third and fourth results, even when we use `Platform.getInt()/putInt()`, performance is more than 2x worse (`Platform Object` in the table) when we pass `Object` as the first argument instead of a concrete type (i.e. `byte[]`). For example, `byte[] b; Platform.getInt(b, 0);` performs better than `Object o; Platform.getInt(o, 0);`.
    3. According to the fifth and sixth results, for `Platform.copyMemory()`, passing `Object` achieves the same performance as passing `byte[]`.
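    
    A minimal sketch of the two call-site shapes in fact 2, assuming (as in Spark) that `Platform.getInt()` delegates to `sun.misc.Unsafe.getInt(Object, long)`; the sketch calls `Unsafe` directly, and `CallSiteShapes` with its two methods is hypothetical. Both methods compute the same sum; only the declared type of `base` differs. Measuring the gap reliably needs a proper harness (e.g. JMH), so this code only checks that the two shapes agree.
    
    ```java
    import java.lang.reflect.Field;
    import sun.misc.Unsafe;
    
    // Hypothetical helper contrasting a concrete-typed base with an Object-typed base.
    final class CallSiteShapes {
      private static final Unsafe UNSAFE = loadUnsafe();
      private static final long BYTE_ARRAY_OFFSET = Unsafe.ARRAY_BYTE_BASE_OFFSET;
    
      private static Unsafe loadUnsafe() {
        try {
          Field f = Unsafe.class.getDeclaredField("theUnsafe");
          f.setAccessible(true);
          return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException(e);
        }
      }
    
      // Shape of `byte[] b; Platform.getInt(b, 0);`: base is statically a byte[].
      static long sumIntsConcrete(byte[] base, int numInts) {
        long sum = 0;
        for (int i = 0; i < numInts; i++) {
          sum += UNSAFE.getInt(base, BYTE_ARRAY_OFFSET + 4L * i);
        }
        return sum;
      }
    
      // Shape of `Object o; Platform.getInt(o, 0);`: base is only known as Object.
      static long sumIntsObject(Object base, int numInts) {
        long sum = 0;
        for (int i = 0; i < numInts; i++) {
          sum += UNSAFE.getInt(base, BYTE_ARRAY_OFFSET + 4L * i);
        }
        return sum;
      }
    }
    ```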
    
    Based on fact 2, I used the polymorphic approach so that each subclass of `MemoryBlock` passes its concrete type. As a result, wherever the current Spark already passes a concrete type as the first argument of `Platform.getInt()/putInt()`, this PR achieves the same performance.
    Wherever the current Spark passes `Object` (e.g. [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java#L61)), this PR can achieve better performance.
    
    @rednaxelafx can probably explain this very well :)


