GitHub user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19222
  
    @hvanhovell cc: @rednaxelafx
    Surprisingly, this PR speeds up memory accesses in `UTF8String` by about 1.8x, because the concrete type is now passed to `Platform.getByte()`/`putByte()`.
    
    Without this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    trim                                          1260 / 1290        426.2           2.3       1.0X
    ```
    
    With this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    trim                                           684 /  729        785.3           1.3       1.0X
    ```
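
    For reference, the "about 1.8x" figure follows directly from the two `trim` rows above. A minimal sketch of the arithmetic, where the object and value names are only illustrative and the numbers are copied from the benchmark output:
    ```
    object TrimSpeedup {
      // Best and average times in ms, copied from the two benchmark runs above.
      val bestWithout = 1260.0
      val avgWithout = 1290.0
      val bestWith = 684.0
      val avgWith = 729.0

      // Speedup = time without the PR / time with the PR.
      val bestSpeedup: Double = bestWithout / bestWith // roughly 1.84
      val avgSpeedup: Double = avgWithout / avgWith    // roughly 1.77

      def main(args: Array[String]): Unit = {
        println(s"best speedup: $bestSpeedup, avg speedup: $avgSpeedup")
      }
    }
    ```
    Both ratios round to roughly 1.8x, which also matches the change in rate (426.2 M/s to 785.3 M/s).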
    
    Benchmark program
    ```
      test("benchmark UTF8String") {
        // N is both the input length in bytes and the value count reported to Benchmark.
        val N = 512 * 1024 * 1024
        val iters = 2
        val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20)

        // Build a string of N spaces so that trim() has to read through the whole input.
        val s = new java.io.StringWriter() { { for (i <- 0 until N) { write(" ") } } }.toString
        val str = UTF8String.fromString(s)

        benchmark.addCase("trim") { _: Int =>
          var trimmed: UTF8String = null
          for (_ <- 0L until iters) {
            trimmed = str.trim()
          }
        }
        benchmark.run()
      }
    ```

