LuciferYang commented on PR #38427:
URL: https://github.com/apache/spark/pull/38427#issuecomment-1296152282

   
https://github.com/LuciferYang/spark/blob/size-bench/core/src/test/scala/org/apache/spark/SizeBenchmark.scala
   
   I wrote a simple benchmark to test the following two scenarios:
   
   1. Different `ArrayBuffer[Int]` size + call `.size` once
   2. Same `ArrayBuffer[Int]` size + call `.size` more than once
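
   The two scenarios can be sketched roughly as follows. This is a minimal standalone sketch, not the linked `SizeBenchmark.scala`: the object name `SizeSketch`, the helper `timeNs`, and the iteration count are assumptions made for illustration, and a plain `System.nanoTime` loop is far less rigorous than Spark's benchmark harness.

   ```scala
   import scala.collection.mutable.ArrayBuffer

   object SizeSketch {
     // "Without the PR" path: convert with toSeq, then call .size `calls` times.
     def sizeViaToSeq(buf: ArrayBuffer[Int], calls: Int): Int = {
       val seq = buf.toSeq
       var total = 0
       var i = 0
       while (i < calls) { total += seq.size; i += 1 }
       total
     }

     // "With the PR" path: convert with toIndexedSeq, then call .size `calls` times.
     def sizeViaToIndexedSeq(buf: ArrayBuffer[Int], calls: Int): Int = {
       val seq = buf.toIndexedSeq
       var total = 0
       var i = 0
       while (i < calls) { total += seq.size; i += 1 }
       total
     }

     // Hypothetical timing helper: elapsed nanoseconds for `iters` runs of `body`.
     def timeNs(iters: Int)(body: => Int): Long = {
       var sink = 0L
       val start = System.nanoTime()
       var i = 0
       while (i < iters) { sink += body; i += 1 }
       val elapsed = System.nanoTime() - start
       // Touch `sink` so the accumulated results stay live.
       if (sink == Long.MinValue) println(sink)
       elapsed
     }

     def main(args: Array[String]): Unit = {
       val iters = 100000 // assumed iteration count, not taken from the benchmark
       // Scenario 1: different buffer sizes, one .size call.
       for (size <- Seq(1, 10, 100, 1000)) {
         val buf = ArrayBuffer.range(0, size)
         println(s"size=$size toSeq:        ${timeNs(iters)(sizeViaToSeq(buf, 1))} ns")
         println(s"size=$size toIndexedSeq: ${timeNs(iters)(sizeViaToIndexedSeq(buf, 1))} ns")
       }
       // Scenario 2: fixed buffer size, several .size calls.
       val fixed = ArrayBuffer.range(0, 1000)
       for (calls <- 2 to 5) {
         println(s"calls=$calls toSeq:        ${timeNs(iters)(sizeViaToSeq(fixed, calls))} ns")
         println(s"calls=$calls toIndexedSeq: ${timeNs(iters)(sizeViaToIndexedSeq(fixed, calls))} ns")
       }
     }
   }
   ```

   Both helpers include the conversion in the timed work, since the conversion cost is part of what the comparison is about.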
   
   The local test results are as follows (`toSeq + Size` is the behavior without this PR, `toIndexedSeq + Size` is the behavior with this PR):
   
   **Scala 2.13**
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   --------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              3              4           0         34.3          29.2       1.0X
   toIndexedSeq + Size                                                       4              4           0         25.3          39.5       0.7X
   
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 10 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              11             12           1          8.9         112.1       1.0X
   toIndexedSeq + Size                                                        5              5           0         21.3          47.0       2.4X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 100 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               92             94           4          1.1         919.3       1.0X
   toIndexedSeq + Size                                                        24             25           0          4.1         241.2       3.8X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               920            924           7          0.1        9200.1       1.0X
   toIndexedSeq + Size                                                        214            215           1          0.5        2142.2       4.3X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 10000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               9005           9218         300          0.0       90053.2       1.0X
   toIndexedSeq + Size                                                        2960           2962           2          0.0       29604.2       3.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 100000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              100712         102251        2177          0.0     1007118.0       1.0X
   toIndexedSeq + Size                                                        29682          30167         687          0.0      296816.2       3.4X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 2 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               774            780           7          0.1        7744.3       1.0X
   toIndexedSeq + Size                                                        295            296           2          0.3        2947.9       2.6X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 3 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               938            941           3          0.1        9381.0       1.0X
   toIndexedSeq + Size                                                        297            302           4          0.3        2969.4       3.2X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 4 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              1102           1111          13          0.1       11020.2       1.0X
   toIndexedSeq + Size                                                        299            301           2          0.3        2992.8       3.7X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 5 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              1281           1290          12          0.1       12812.8       1.0X
   toIndexedSeq + Size                                                        301            313          17          0.3        3007.9       4.3X
   ```
   
   **Scala 2.12**
   
   ```
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   --------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                              1              1           0        187.5           5.3       1.0X
   toIndexedSeq + Size                                                       4              4           0         25.0          40.0       0.1X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 10 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ---------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                               1              1           0         81.0          12.4       1.0X
   toIndexedSeq + Size                                                        5              5           0         21.6          46.3       0.3X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 100 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                1              1           0         81.1          12.3       1.0X
   toIndexedSeq + Size                                                        29             29           0          3.5         289.5       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                 1              1           0         80.7          12.4       1.0X
   toIndexedSeq + Size                                                        247            249           3          0.4        2468.2       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 10000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                  1              1           0         80.7          12.4       1.0X
   toIndexedSeq + Size                                                        2432           2434           3          0.0       24318.6       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 100000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -------------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                   1              1           0         80.7          12.4       1.0X
   toIndexedSeq + Size                                                        26121          26133          17          0.0      261209.8       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 2 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                 1              1           0        141.9           7.0       1.0X
   toIndexedSeq + Size                                                        263            267           4          0.4        2629.3       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 3 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                 2              2           0         65.7          15.2       1.0X
   toIndexedSeq + Size                                                        264            267           2          0.4        2636.7       0.0X
   
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 4 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                 2              2           0         60.7          16.5       1.0X
   toIndexedSeq + Size                                                        264            267           2          0.4        2641.9       0.0X
   
   OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
   Apple M1
   Test size of Seq with buffer size 1000 and call .size 5 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   -----------------------------------------------------------------------------------------------------------------------------------------------
   toSeq + Size                                                                 2              2           0         55.6          18.0       1.0X
   toIndexedSeq + Size                                                        263            266           4          0.4        2629.1       0.0X
   ```
   
   From the test results, changing to `toIndexedSeq` improves performance on Scala 2.13 by roughly 2.4 to 4.3 times once the buffer holds 10 or more elements (only the size-1 case is slower, at 0.7X), so the change to `toIndexedSeq` is necessary in the long term. However, it also causes a performance degradation on Scala 2.12 in every case measured, exceeding 10 times for buffers of 100 or more elements.
   
   The performance impact depends on the size of the `ArrayBuffer` and the number of calls to `.size`.
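
   One way to check where the difference between the two Scala versions comes from is to print what each conversion actually returns. My understanding, which the benchmark above does not itself verify, is that on Scala 2.12 `ArrayBuffer.toSeq` returns the buffer itself without copying, while on Scala 2.13 `toSeq` has to build an immutable collection whose `size` may be linear. The sketch below (the object name `SeqKinds` and its helper are hypothetical) prints the runtime classes so this can be confirmed on each version rather than assumed.

   ```scala
   import scala.collection.mutable.ArrayBuffer

   object SeqKinds {
     // Runtime class names of the collections produced by toSeq and toIndexedSeq.
     def describe(buf: ArrayBuffer[Int]): (String, String) =
       (buf.toSeq.getClass.getName, buf.toIndexedSeq.getClass.getName)

     def main(args: Array[String]): Unit = {
       val buf = ArrayBuffer(1, 2, 3)
       val (seqClass, indexedClass) = describe(buf)
       println(s"toSeq        -> $seqClass")
       println(s"toIndexedSeq -> $indexedClass")
       // True only if toSeq hands back the buffer itself instead of a copy.
       println(s"toSeq returns the same instance: ${buf.toSeq eq buf}")
     }
   }
   ```

   Running this once per Scala version shows whether `toSeq` copies and which concrete collection each conversion yields.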
   
   I am also running this scenario on GA (GitHub Actions, x86) and will update the conclusion later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

