Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19222
@hvanhovell cc: @rednaxelafx
Surprisingly, this PR makes memory accesses in `UTF8String` about 1.8x faster,
because it passes the concrete type to `Platform.getByte()`/`putByte()`. In the
`trim` benchmark, the best time drops from 1260 ms to 684 ms.
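To make "passing the concrete type" concrete, here is a minimal sketch, not the PR's code. It assumes `Platform.getByte(Object, long)` is a thin wrapper over `sun.misc.Unsafe.getByte(Object, long)`; the object and method names (`ConcreteTypeSketch`, `getByteUntyped`, `getByteTyped`) are hypothetical. Both accessors read the same byte. The only difference is the static type of the base object at the call site, which is the kind of information the JIT may be able to exploit.

```scala
import java.nio.charset.StandardCharsets
import sun.misc.Unsafe

object ConcreteTypeSketch {
  // Obtain the Unsafe singleton reflectively (it has no public constructor).
  private val unsafe: Unsafe = {
    val field = classOf[Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[Unsafe]
  }

  // Offset of element 0 inside a byte[] object.
  val byteArrayOffset: Long = unsafe.arrayBaseOffset(classOf[Array[Byte]]).toLong

  // Hypothetical "untyped" accessor: the base is only known to be some object.
  def getByteUntyped(base: AnyRef, offset: Long): Byte =
    unsafe.getByte(base, offset)

  // Hypothetical "typed" accessor: the base is statically known to be a byte[].
  def getByteTyped(base: Array[Byte], index: Int): Byte =
    unsafe.getByte(base, byteArrayOffset + index)

  def main(args: Array[String]): Unit = {
    val bytes = "  spark  ".getBytes(StandardCharsets.UTF_8)
    val viaUntyped = getByteUntyped(bytes, byteArrayOffset + 2)
    val viaTyped = getByteTyped(bytes, 2)
    assert(viaUntyped == viaTyped) // same memory location, same value
    println(viaTyped.toChar)       // prints "s"
  }
}
```

Whether the typed variant is actually faster depends on the JVM and on how the surrounding code gets inlined, so the sketch only illustrates the shape of the change, not its measured effect.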
Without this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
UTF8String benchmark:                    Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
trim                                           1260 / 1290       426.2           2.3       1.0X
```
With this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
UTF8String benchmark:                    Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
trim                                             684 / 729       785.3           1.3       1.0X
```
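Each table reports `Relative` as 1.0X because the two runs are separate, so the speedup has to be computed across tables. A small sketch of that arithmetic, using only the best times reported above and the row count `N` from the benchmark program (the object name `SpeedupCheck` is just for illustration):

```scala
object SpeedupCheck {
  // Row count used by the benchmark: 512 * 1024 * 1024.
  val n: Long = 512L * 1024 * 1024

  // Best times in milliseconds, copied from the two tables.
  val bestWithoutPrMs = 1260.0
  val bestWithPrMs = 684.0

  // Nanoseconds spent per row for a given total time in milliseconds.
  def nsPerRow(totalMs: Double): Double = totalMs * 1e6 / n

  // Ratio of the two best times.
  def speedup: Double = bestWithoutPrMs / bestWithPrMs

  def main(args: Array[String]): Unit = {
    println(f"per row without PR: ${nsPerRow(bestWithoutPrMs)}%.1f ns")
    println(f"per row with PR:    ${nsPerRow(bestWithPrMs)}%.1f ns")
    println(f"speedup:            ${speedup}%.2fx")
  }
}
```

The ratio of best times comes out a little above 1.8, which matches the "about 1.8x" figure, and the per-row values round to the 2.3 ns and 1.3 ns shown in the tables.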
Benchmark program
```
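import org.apache.spark.unsafe.types.UTF8String
import org.apache.spark.util.Benchmark

test("benchmark UTF8String") {
  val N = 512 * 1024 * 1024
  val iters = 2
  val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20)

  // Build a string of N spaces, so trim() has to scan the whole buffer.
  val s = new java.io.StringWriter() { { for (i <- 0 until N) { write(" ") } } }.toString
  val str = UTF8String.fromString(s)

  benchmark.addCase("trim") { _: Int =>
    // Keep the result in a variable so the call is not trivially dead code.
    var trimmed: UTF8String = null
    for (_ <- 0L until iters) {
      trimmed = str.trim()
    }
  }
  benchmark.run()
}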
```