zhztheplayer commented on PR #52817:
URL: https://github.com/apache/spark/pull/52817#issuecomment-3513268573

   @HyukjinKwon Thank you for the quick response.
   
   I benchmarked the change using the existing `HashedRelationMetricsBenchmark`.
   
   ### 500K Rows
   
   Before:
   
   ```
   OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
   18:40:31.460 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: 
   
   Unknown processor
   LongToUnsafeRowMap metrics:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   LongToUnsafeRowMap                                   55             63           4          9.1         109.8       1.0X
   ```
   
   After:
   
   ```
   OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
   18:39:53.863 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: 
   
   Unknown processor
   LongToUnsafeRowMap metrics:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   LongToUnsafeRowMap                                   63            105          38          8.0         125.5       1.0X
   ```
   
   ### 10M Rows
   
   Before: 
   
   ```
   OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
   18:53:43.292 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: 
   
   Unknown processor
   LongToUnsafeRowMap metrics:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   LongToUnsafeRowMap                                 2955           3121         235          3.4         295.5       1.0X
   ```
   
   After:
   
   ```
   OpenJDK 64-Bit Server VM 17.0.16+8-Ubuntu-0ubuntu124.04.1 on Linux 6.14.0-33-generic
   18:54:10.447 ERROR org.apache.spark.util.Utils: Process List(/usr/bin/grep, -m, 1, model name, /proc/cpuinfo) exited with code 1: 
   
   Unknown processor
   LongToUnsafeRowMap metrics:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   LongToUnsafeRowMap                                 3048           3336         408          3.3         304.8       1.0X
   ```
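
For a rough reading of the four runs above, the relative change can be computed from the Best Time and Per Row columns. This is only a minimal sketch and not part of the benchmark itself: the `results` dictionary and the `slowdown_pct` helper are illustrative names, and the figures are copied by hand from the output above. Best time is used because the Stdev of the After runs is large, so the averages are noisy.

```python
# (best time in ms, per-row time in ns), copied by hand from the output above.
results = {
    "500K rows": {"before": (55, 109.8), "after": (63, 125.5)},
    "10M rows": {"before": (2955, 295.5), "after": (3048, 304.8)},
}


def slowdown_pct(before: float, after: float) -> float:
    """Percentage increase of `after` relative to `before`."""
    return (after - before) / before * 100.0


for label, run in results.items():
    best = slowdown_pct(run["before"][0], run["after"][0])
    per_row = slowdown_pct(run["before"][1], run["after"][1])
    print(f"{label}: best time +{best:.1f}%, per row +{per_row:.1f}%")
```

On these figures the After runs come out roughly 14% slower at 500K rows and roughly 3% slower at 10M rows. Given the variance and the single machine, this is indicative rather than conclusive.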
   
   -------
   
   > e.g., sometimes it appears JIT to be quite smarter than using direct off 
heap memory
   
   Yes. Looking at the benchmark results as well, I do think the new approach
   is inevitably at least slightly slower. Do you have any suggestions?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

