LuciferYang commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1296152282
https://github.com/LuciferYang/spark/blob/size-bench/core/src/test/scala/org/apache/spark/SizeBenchmark.scala

I wrote a simple benchmark to test the following two scenarios:

1. Different `ArrayBuffer[Int]` sizes, calling `.size` once
2. The same `ArrayBuffer[Int]` size, calling `.size` more than once

The local test results are as follows (`toSeq + Size` represents the behavior without this PR, `toIndexedSeq + Size` represents the behavior with this PR):

**Scala 2.13**

```
OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               3              4           0         34.3          29.2       1.0X
toIndexedSeq + Size                                                        4              4           0         25.3          39.5       0.7X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 10 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               11             12           1          8.9         112.1       1.0X
toIndexedSeq + Size                                                         5              5           0         21.3          47.0       2.4X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 100 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                92             94           4          1.1         919.3       1.0X
toIndexedSeq + Size                                                         24             25           0          4.1         241.2       3.8X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                920            924           7          0.1        9200.1       1.0X
toIndexedSeq + Size                                                         214            215           1          0.5        2142.2       4.3X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 10000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                9005           9218         300          0.0       90053.2       1.0X
toIndexedSeq + Size                                                         2960           2962           2          0.0       29604.2       3.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 100000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               100712         102251        2177          0.0     1007118.0       1.0X
toIndexedSeq + Size                                                         29682          30167         687          0.0      296816.2       3.4X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 2 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                774            780           7          0.1        7744.3       1.0X
toIndexedSeq + Size                                                         295            296           2          0.3        2947.9       2.6X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 3 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                938            941           3          0.1        9381.0       1.0X
toIndexedSeq + Size                                                         297            302           4          0.3        2969.4       3.2X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 4 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               1102           1111          13          0.1       11020.2       1.0X
toIndexedSeq + Size                                                         299            301           2          0.3        2992.8       3.7X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 5 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               1281           1290          12          0.1       12812.8       1.0X
toIndexedSeq + Size                                                         301            313          17          0.3        3007.9       4.3X
```

**Scala 2.12**

```
OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                               1              1           0        187.5           5.3       1.0X
toIndexedSeq + Size                                                        4              4           0         25.0          40.0       0.1X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 10 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                1              1           0         81.0          12.4       1.0X
toIndexedSeq + Size                                                         5              5           0         21.6          46.3       0.3X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 100 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                 1              1           0         81.1          12.3       1.0X
toIndexedSeq + Size                                                         29             29           0          3.5         289.5       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                  1              1           0         80.7          12.4       1.0X
toIndexedSeq + Size                                                         247            249           3          0.4        2468.2       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 10000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                   1              1           0         80.7          12.4       1.0X
toIndexedSeq + Size                                                         2432           2434           3          0.0       24318.6       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 100000 and call .size 1 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                    1              1           0         80.7          12.4       1.0X
toIndexedSeq + Size                                                         26121          26133          17          0.0      261209.8       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 2 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                  1              1           0        141.9           7.0       1.0X
toIndexedSeq + Size                                                         263            267           4          0.4        2629.3       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 3 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                  2              2           0         65.7          15.2       1.0X
toIndexedSeq + Size                                                         264            267           2          0.4        2636.7       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 4 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                  2              2           0         60.7          16.5       1.0X
toIndexedSeq + Size                                                         264            267           2          0.4        2641.9       0.0X

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Mac OS X 11.4
Apple M1
Test size of Seq with buffer size 1000 and call .size 5 time(s):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------------------
toSeq + Size                                                                  2              2           0         55.6          18.0       1.0X
toIndexedSeq + Size                                                         263            266           4          0.4        2629.1       0.0X
```

The results show that changing to `toIndexedSeq` can make this path about 3 to 4 times faster on Scala 2.13 once the buffer holds 100 or more elements (so the change to `toIndexedSeq` is necessary in the long term), but it can also make it more than 10 times slower on Scala 2.12. The size of the impact depends on the size of the `ArrayBuffer` and on the number of `.size` calls.

I am also running this scenario on GitHub Actions (x86) and will update the conclusion later.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
