zhztheplayer commented on PR #52817:
URL: https://github.com/apache/spark/pull/52817#issuecomment-3513268573
@HyukjinKwon Thank you for the quick response.
I benchmarked using the existing HashedRelationMetricsBenchmark.
### 500K Rows
Before:
```
OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
18:40:31.460 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
LongToUnsafeRowMap metrics:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
LongToUnsafeRowMap                                  55             63           4          9.1         109.8       1.0X
```
After:
```
OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
18:39:53.863 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
LongToUnsafeRowMap metrics:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
LongToUnsafeRowMap                                  63            105          38          8.0         125.5       1.0X
```
### 10M Rows
Before:
```
OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
18:53:43.292 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
LongToUnsafeRowMap metrics:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
LongToUnsafeRowMap                                2955           3121         235          3.4         295.5       1.0X
```
After:
```
OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
18:54:10.447 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1:
Unknown processor
LongToUnsafeRowMap metrics:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
LongToUnsafeRowMap                                3048           3336         408          3.3         304.8       1.0X
```
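To compare the before/after runs at a glance, the relative change in best and average time can be computed directly from the numbers printed above. This is only an illustrative helper: the names `results` and `slowdown_pct` are hypothetical and not part of Spark or of HashedRelationMetricsBenchmark, and the inputs are the single-run figures from the outputs above.

```python
# Best/avg times in ms, copied from the HashedRelationMetricsBenchmark outputs above.
# `results` and `slowdown_pct` are hypothetical names used only for this illustration.
results = {
    "500K rows": {"before": {"best": 55, "avg": 63}, "after": {"best": 63, "avg": 105}},
    "10M rows": {"before": {"best": 2955, "avg": 3121}, "after": {"best": 3048, "avg": 3336}},
}


def slowdown_pct(before_ms: float, after_ms: float) -> float:
    """Percentage increase in run time from before to after (positive means slower)."""
    return (after_ms - before_ms) / before_ms * 100.0


for size, r in results.items():
    for metric in ("best", "avg"):
        b, a = r["before"][metric], r["after"][metric]
        print(f"{size} {metric}: {b} -> {a} ms ({slowdown_pct(b, a):+.1f}%)")

# Prints:
# 500K rows best: 55 -> 63 ms (+14.5%)
# 500K rows avg: 63 -> 105 ms (+66.7%)
# 10M rows best: 2955 -> 3048 ms (+3.1%)
# 10M rows avg: 3121 -> 3336 ms (+6.9%)
```

On these single runs the best-time regression is roughly 15% at 500K rows and about 3% at 10M rows. The large average-time gap at 500K rows coincides with a much higher stdev (38 ms vs. 4 ms), so it may be partly run-to-run noise; repeating the runs would help separate the two.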
-------
> e.g., sometimes it appears JIT to be quite smarter than using direct off heap memory
Yes. As for the benchmark results, I suspect the new approach will inevitably be at least a bit slower. Do you have any suggestions?