fitermay commented on issue #23986: [SPARK-27070] Fix performance bug in DefaultPartitionCoalescer
URL: https://github.com/apache/spark/pull/23986#issuecomment-470916722

Benchmark with 100K blocks instead of several million. A single host (Num Hosts: 1) is clearly the worst case.

Before patch:

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_112-b15 on Windows 10 10.0
Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
Coalesced RDD:                                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------------
Coalesce Num Partitions: 100 Num Hosts: 1                492           520         33        0.2       4919.9      1.0X
Coalesce Num Partitions: 100 Num Hosts: 5                310           328         22        0.3       3103.2      1.6X
Coalesce Num Partitions: 100 Num Hosts: 10               247           267         19        0.4       2468.2      2.0X
Coalesce Num Partitions: 100 Num Hosts: 20               240           252         15        0.4       2399.7      2.1X
Coalesce Num Partitions: 100 Num Hosts: 40               229           244         16        0.4       2290.8      2.1X
Coalesce Num Partitions: 100 Num Hosts: 80               212           225         13        0.5       2123.6      2.3X
Coalesce Num Partitions: 500 Num Hosts: 1               1149          1177         26        0.1      11492.7      0.4X
Coalesce Num Partitions: 500 Num Hosts: 5                464           500         34        0.2       4643.8      1.1X
Coalesce Num Partitions: 500 Num Hosts: 10               386           397         19        0.3       3862.2      1.3X
Coalesce Num Partitions: 500 Num Hosts: 20               336           340          7        0.3       3358.1      1.5X
Coalesce Num Partitions: 500 Num Hosts: 40               269           283         17        0.4       2686.0      1.8X
Coalesce Num Partitions: 500 Num Hosts: 80               239           245          9        0.4       2391.0      2.1X
Coalesce Num Partitions: 1000 Num Hosts: 1              2213          2258         39        0.0      22131.2      0.2X
Coalesce Num Partitions: 1000 Num Hosts: 5               645           650          9        0.2       6448.8      0.8X
Coalesce Num Partitions: 1000 Num Hosts: 10              467           473          7        0.2       4673.8      1.1X
Coalesce Num Partitions: 1000 Num Hosts: 20              413           425         17        0.2       4133.7      1.2X
Coalesce Num Partitions: 1000 Num Hosts: 40              341           347         10        0.3       3412.4      1.4X
Coalesce Num Partitions: 1000 Num Hosts: 80              269           276         11        0.4       2688.8      1.8X
Coalesce Num Partitions: 5000 Num Hosts: 1             11048         11100         46        0.0     110484.2      0.0X
Coalesce Num Partitions: 5000 Num Hosts: 5              2396          2457         55        0.0      23959.0      0.2X
Coalesce Num Partitions: 5000 Num Hosts: 10             1390          1397          9        0.1      13899.1      0.4X
Coalesce Num Partitions: 5000 Num Hosts: 20              852           858          6        0.1       8516.9      0.6X
Coalesce Num Partitions: 5000 Num Hosts: 40              569           586         21        0.2       5692.7      0.9X
Coalesce Num Partitions: 5000 Num Hosts: 80              432           440          9        0.2       4322.7      1.1X
Coalesce Num Partitions: 10000 Num Hosts: 1            19685         19779         83        0.0     196853.8      0.0X
Coalesce Num Partitions: 10000 Num Hosts: 5             4044          4144         87        0.0      40437.9      0.1X
Coalesce Num Partitions: 10000 Num Hosts: 10            2393          2483         88        0.0      23931.6      0.2X
Coalesce Num Partitions: 10000 Num Hosts: 20            1242          1338         84        0.1      12419.6      0.4X
Coalesce Num Partitions: 10000 Num Hosts: 40             816           821          9        0.1       8158.7      0.6X
Coalesce Num Partitions: 10000 Num Hosts: 80             555           571         23        0.2       5554.2      0.9X
```

After patch:

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_112-b15 on Windows 10 10.0
Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
Coalesced RDD:                                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------------
Coalesce Num Partitions: 100 Num Hosts: 1                394           433         37        0.3       3941.6      1.0X
Coalesce Num Partitions: 100 Num Hosts: 5                275           279          7        0.4       2748.4      1.4X
Coalesce Num Partitions: 100 Num Hosts: 10               236           241          9        0.4       2355.8      1.7X
Coalesce Num Partitions: 100 Num Hosts: 20               226           239         12        0.4       2259.1      1.7X
Coalesce Num Partitions: 100 Num Hosts: 40               220           233         14        0.5       2199.3      1.8X
Coalesce Num Partitions: 100 Num Hosts: 80               212           227         14        0.5       2120.3      1.9X
Coalesce Num Partitions: 500 Num Hosts: 1                961           976         24        0.1       9606.9      0.4X
Coalesce Num Partitions: 500 Num Hosts: 5                358           367         10        0.3       3580.5      1.1X
Coalesce Num Partitions: 500 Num Hosts: 10               288           299         19        0.3       2877.5      1.4X
Coalesce Num Partitions: 500 Num Hosts: 20               251           257          9        0.4       2508.5      1.6X
Coalesce Num Partitions: 500 Num Hosts: 40               248           252          4        0.4       2478.1      1.6X
Coalesce Num Partitions: 500 Num Hosts: 80               225           234         13        0.4       2247.3      1.8X
Coalesce Num Partitions: 1000 Num Hosts: 1              1575          1581          9        0.1      15747.8      0.3X
Coalesce Num Partitions: 1000 Num Hosts: 5               515           524         10        0.2       5154.8      0.8X
Coalesce Num Partitions: 1000 Num Hosts: 10              363           384         20        0.3       3633.5      1.1X
Coalesce Num Partitions: 1000 Num Hosts: 20              294           300          6        0.3       2943.6      1.3X
Coalesce Num Partitions: 1000 Num Hosts: 40              255           259          4        0.4       2549.3      1.5X
Coalesce Num Partitions: 1000 Num Hosts: 80              240           252         11        0.4       2398.7      1.6X
Coalesce Num Partitions: 5000 Num Hosts: 1              6904          6948         64        0.0      69038.0      0.1X
Coalesce Num Partitions: 5000 Num Hosts: 5              2070          2109         33        0.0      20702.0      0.2X
Coalesce Num Partitions: 5000 Num Hosts: 10             1136          1153         27        0.1      11362.4      0.3X
Coalesce Num Partitions: 5000 Num Hosts: 20              696           752         49        0.1       6964.3      0.6X
Coalesce Num Partitions: 5000 Num Hosts: 40              456           483         39        0.2       4555.8      0.9X
Coalesce Num Partitions: 5000 Num Hosts: 80              334           353         17        0.3       3340.8      1.2X
Coalesce Num Partitions: 10000 Num Hosts: 1            12789         12875        123        0.0     127889.3      0.0X
Coalesce Num Partitions: 10000 Num Hosts: 5             4040          4117         67        0.0      40402.9      0.1X
Coalesce Num Partitions: 10000 Num Hosts: 10            2141          2185         61        0.0      21414.0      0.2X
Coalesce Num Partitions: 10000 Num Hosts: 20            1152          1153          2        0.1      11516.1      0.3X
Coalesce Num Partitions: 10000 Num Hosts: 40             687           695         10        0.1       6869.5      0.6X
Coalesce Num Partitions: 10000 Num Hosts: 80             451           458          7        0.2       4505.0      0.9X
```
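To put a number on the before/after difference in the single-host worst case, the ratio of Best Time(ms) before the patch to Best Time(ms) after it can be computed directly from the two tables above. This is a minimal sketch; the dictionary names and the `speedups` helper are illustrative only and are not part of Spark or its benchmark suite.

```python
# Best Time(ms) for the "Num Hosts: 1" rows, copied from the two benchmark tables.
# Keys are the target number of coalesced partitions.
before_ms = {100: 492, 500: 1149, 1000: 2213, 5000: 11048, 10000: 19685}
after_ms = {100: 394, 500: 961, 1000: 1575, 5000: 6904, 10000: 12789}


def speedups(before, after):
    """Return {num_partitions: before / after}, rounded to two decimals."""
    return {n: round(before[n] / after[n], 2) for n in before}


if __name__ == "__main__":
    for n, s in speedups(before_ms, after_ms).items():
        print(f"{n:>6} partitions, 1 host: {s}x faster after the patch")
```

On these numbers the single-host case improves by roughly 1.2x to 1.6x, with the largest gains at 5000 and 10000 partitions, although it remains far slower than the 80-host configurations (12789 ms vs 451 ms at 10000 partitions after the patch).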
