LuciferYang commented on PR #37609:
URL: https://github.com/apache/spark/pull/37609#issuecomment-1245286689
```
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 150:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                    1407           1408           1          0.1       14071.2       1.0X
Use zip + collection.breakOut                      1327           1328           2          0.1       13270.1       1.1X
Use Manual builder                                 1282           1282           0          0.1       12815.3       1.1X
Use Manual map                                     1734           1735           1          0.1       17339.8       0.8X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 200:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                    1769           1769           1          0.1       17688.7       1.0X
Use zip + collection.breakOut                      1595           1598           5          0.1       15949.0       1.1X
Use Manual builder                                 1544           1546           3          0.1       15440.0       1.1X
Use Manual map                                     2416           2418           2          0.0       24161.1       0.7X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 300:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                    2701           2705           6          0.0       27007.0       1.0X
Use zip + collection.breakOut                      2472           2475           4          0.0       24719.1       1.1X
Use Manual builder                                 2379           2384           8          0.0       23787.5       1.1X
Use Manual map                                     3803           3807           5          0.0       38031.9       0.7X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 400:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                    3757           3758           2          0.0       37565.1       1.0X
Use zip + collection.breakOut                      3446           3447           3          0.0       34455.1       1.1X
Use Manual builder                                 3314           3318           5          0.0       33139.8       1.1X
Use Manual map                                     5283           5287           5          0.0       52832.3       0.7X
```
Added results for input sizes 150, 200, 300, and 400.
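When reading the tables, note that the Relative column is consistent with the first case's best time divided by each case's best time. For example, 3757 / 3314 ≈ 1.13 for Manual builder at collectionSize = 400, which prints as 1.1X. Below is a small Scala sketch of that arithmetic. The Best Time(ms) values are copied from the collectionSize = 400 table, and the names `RelativeSpeedup` and `relative` are made up for illustration:

```scala
object RelativeSpeedup {
  // Relative = baseline best time / candidate best time, so a value above 1.0
  // means the candidate ran faster than the baseline ("Use zip + toMap") case.
  def relative(baselineMs: Double, candidateMs: Double): Double =
    baselineMs / candidateMs

  def main(args: Array[String]): Unit = {
    // Best Time(ms) values copied from the collectionSize = 400 table.
    val bestMs = Seq(
      "Use zip + toMap" -> 3757.0,
      "Use zip + collection.breakOut" -> 3446.0,
      "Use Manual builder" -> 3314.0,
      "Use Manual map" -> 5283.0
    )
    val baseline = bestMs.head._2
    bestMs.foreach { case (name, ms) =>
      // Two decimals instead of the table's one, to show the unrounded gap.
      println(f"$name%-30s ${relative(baseline, ms)}%.2fX")
    }
  }
}
```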
@cloud-fan, from the benchmark results:
- If the input size is < 500, `zip + collection.breakOut` and a manual `while` loop that builds the map with a map builder perform about the same, and both are roughly 10% faster than `zip(...).toMap` (1.1X in the Relative column).
- If the input size is >= 500, there is no significant performance gap.