LuciferYang commented on PR #37609:
URL: https://github.com/apache/spark/pull/37609#issuecomment-1244867478
> The idea is still valid: we can write a while loop manually to build the map, instead of `zip(...).toMap`, if this code path is proven to be performance critical.
@cloud-fan Do you mean something like the following?
```scala
// Builds the map with an immutable.Map builder, walking both sequences in lockstep.
// Like zip, it stops at the end of the shorter sequence.
private def zipToMapUseMapBuilder[A, B, K, V](keys: Seq[A], values: Seq[B]): Map[K, V] = {
  import scala.collection.immutable
  val builder = immutable.Map.newBuilder[K, V]
  val keyIter = keys.iterator
  val valueIter = values.iterator
  while (keyIter.hasNext && valueIter.hasNext) {
    builder += ((keyIter.next(), valueIter.next()).asInstanceOf[(K, V)])
  }
  builder.result()
}

// Builds the map by repeatedly adding pairs to an immutable Map held in a var.
private def zipToMapUseMap[A, B, K, V](keys: Seq[A], values: Seq[B]): Map[K, V] = {
  var elems: Map[K, V] = Map.empty[K, V]
  val keyIter = keys.iterator
  val valueIter = values.iterator
  while (keyIter.hasNext && valueIter.hasNext) {
    elems += (keyIter.next().asInstanceOf[K] -> valueIter.next().asInstanceOf[V])
  }
  elems
}
```
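To make sure both helpers are drop-in replacements for `zip(...).toMap`, their outputs can be compared directly, including duplicate keys (the later pair wins) and sequences of different lengths (extra elements are dropped, as with `zip`). This is a self-contained sketch that assumes simplified `[K, V]` type parameters instead of the `[A, B, K, V]` signature plus casts used above; `ZipToMapCheck` and `agreesWithZipToMap` are hypothetical names for illustration only.

```scala
import scala.collection.immutable

object ZipToMapCheck {
  // Same loop as zipToMapUseMapBuilder above, typed as [K, V] so no casts are needed.
  def zipToMapUseMapBuilder[K, V](keys: Seq[K], values: Seq[V]): Map[K, V] = {
    val builder = immutable.Map.newBuilder[K, V]
    val keyIter = keys.iterator
    val valueIter = values.iterator
    while (keyIter.hasNext && valueIter.hasNext) {
      builder += (keyIter.next() -> valueIter.next())
    }
    builder.result()
  }

  // Same loop as zipToMapUseMap above, typed as [K, V].
  def zipToMapUseMap[K, V](keys: Seq[K], values: Seq[V]): Map[K, V] = {
    var elems: Map[K, V] = Map.empty[K, V]
    val keyIter = keys.iterator
    val valueIter = values.iterator
    while (keyIter.hasNext && valueIter.hasNext) {
      elems += (keyIter.next() -> valueIter.next())
    }
    elems
  }

  // True when both helpers produce the same map as zip(...).toMap for this input.
  def agreesWithZipToMap[K, V](keys: Seq[K], values: Seq[V]): Boolean = {
    val expected = keys.zip(values).toMap
    zipToMapUseMapBuilder(keys, values) == expected &&
      zipToMapUseMap(keys, values) == expected
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 20).toSeq
    println(agreesWithZipToMap(data, data))                       // distinct keys
    println(agreesWithZipToMap(Seq(1, 2, 1), Seq("a", "b", "c")))  // duplicate key: last value wins
    println(agreesWithZipToMap(Seq(1, 2, 3), Seq("a", "b")))       // shorter values: extra key dropped
  }
}
```

Running `main` should print `true` three times, i.e. the manual loops keep the same semantics as `zip(...).toMap` for these cases.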
I wrote a microbenchmark to compare `data.zip(data).toMap`, `data.zip(data)(collection.breakOut)`, and the two methods above. The results are as follows:
```
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 1:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                      22             22           1          4.6         217.6       1.0X
Use zip + collection.breakOut                         8              9           1         11.9          84.4       2.6X
Use Manual builder                                    3              3           0         32.1          31.2       7.0X
Use Manual map                                        3              3           0         36.5          27.4       7.9X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 5:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     100            100           1          1.0         998.8       1.0X
Use zip + collection.breakOut                        11             11           1          9.1         110.5       9.0X
Use Manual builder                                   76             76           1          1.3         755.6       1.3X
Use Manual map                                       47             47           1          2.1         468.1       2.1X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 10:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     123            123           1          0.8        1226.3       1.0X
Use zip + collection.breakOut                        16             16           1          6.2         160.9       7.6X
Use Manual builder                                   95             95           1          1.1         947.2       1.3X
Use Manual map                                       92             94           1          1.1         922.5       1.3X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 20:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     162            162           1          0.6        1615.7       1.0X
Use zip + collection.breakOut                        26             27           1          3.8         261.3       6.2X
Use Manual builder                                  132            133           1          0.8        1321.4       1.2X
Use Manual map                                      185            186           1          0.5        1846.3       0.9X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 50:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     604            606           2          0.2        6042.7       1.0X
Use zip + collection.breakOut                        76             77           2          1.3         759.9       8.0X
Use Manual builder                                  534            537           2          0.2        5335.6       1.1X
Use Manual map                                      510            513           2          0.2        5102.1       1.2X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 100:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     1087           1087           0          0.1       10865.5       1.0X
Use zip + collection.breakOut                        134            135           1          0.7        1336.2       8.1X
Use Manual builder                                  1000           1002           3          0.1        9996.8       1.1X
Use Manual map                                      1081           1083           2          0.1       10813.0       1.0X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 500:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     4536           4544          10          0.0       45364.9       1.0X
Use zip + collection.breakOut                        778            784           5          0.1        7783.8       5.8X
Use Manual builder                                  4347           4347           0          0.0       43470.2       1.0X
Use Manual map                                      6775           6785          15          0.0       67745.2       0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 1000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     11813          11822          13          0.0      118125.2       1.0X
Use zip + collection.breakOut                        1590           1601          15          0.1       15898.3       7.4X
Use Manual builder                                  11431          11450          27          0.0      114312.3       1.0X
Use Manual map                                      14801          14812          16          0.0      148005.2       0.8X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 5000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     64917          65007         127          0.0      649172.0       1.0X
Use zip + collection.breakOut                        8127           8130           5          0.0       81265.8       8.0X
Use Manual builder                                  63836          63959         174          0.0      638356.4       1.0X
Use Manual map                                      88139          88308         239          0.0      881392.4       0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 10000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     130985         131331         489          0.0     1309847.7       1.0X
Use zip + collection.breakOut                        16133          16142          13          0.0      161325.8       8.1X
Use Manual builder                                  136655         136916         369          0.0     1366553.8       1.0X
Use Manual map                                      190525         190794         380          0.0     1905252.4       0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Test zip to map with collectionSize = 20000:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
Use zip + toMap                                     306207         306628         595          0.0     3062071.9       1.0X
Use zip + collection.breakOut                        32482          32498          23          0.0      324818.3       9.4X
Use Manual builder                                  336547         337705        1637          0.0     3365473.0       0.9X
Use Manual map                                      410734         411271         758          0.0     4107344.5       0.7X
```
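From these results, manually building the map with a while loop is not fast enough. Beyond collectionSize = 1, the manual builder is at most 1.3X faster than `zip + toMap` and the manual `Map` version at most 2.1X, and from collectionSize = 500 upward the manual `Map` version is actually slower (0.7X-0.8X), whereas `zip + collection.breakOut` is 5.8X-9.4X faster at every size above 1. Is there a problem with my test code?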
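As a rough cross-check that does not depend on the benchmark framework, a standalone timing loop can compare `zip + toMap` with the manual builder on identical input. This is a hypothetical sketch: `ZipToMapTiming`, `timeMs`, the input size and the iteration counts are assumptions, not the harness that produced the numbers above. Every case returns the same `Map[Int, Int]` type, and the map sizes are summed so the JIT has a use for each result; `collection.breakOut` is left out because it exists only in Scala 2.12. Numbers from a loop like this are noisy, so JMH or Spark's own `Benchmark` utility is still the better tool for final figures.

```scala
import scala.collection.immutable

object ZipToMapTiming {
  // Manual while-loop builder (same loop as above, typed as [K, V]).
  def manualBuilder[K, V](keys: Seq[K], values: Seq[V]): Map[K, V] = {
    val builder = immutable.Map.newBuilder[K, V]
    val keyIter = keys.iterator
    val valueIter = values.iterator
    while (keyIter.hasNext && valueIter.hasNext) {
      builder += (keyIter.next() -> valueIter.next())
    }
    builder.result()
  }

  // Evaluates `body` `iters` times; returns (elapsed milliseconds, sum of map sizes).
  def timeMs(iters: Int)(body: => Map[Int, Int]): (Double, Long) = {
    var checksum = 0L
    val start = System.nanoTime()
    var i = 0
    while (i < iters) {
      checksum += body.size
      i += 1
    }
    ((System.nanoTime() - start) / 1e6, checksum)
  }

  def main(args: Array[String]): Unit = {
    val data: Seq[Int] = 0 until 100 // hypothetical collectionSize = 100
    val iters = 10000
    // Warm up both cases so the JIT compiles them before measuring.
    timeMs(iters)(data.zip(data).toMap)
    timeMs(iters)(manualBuilder(data, data))
    val (zipMs, zipSum) = timeMs(iters)(data.zip(data).toMap)
    val (manualMs, manualSum) = timeMs(iters)(manualBuilder(data, data))
    println(f"zip + toMap:    $zipMs%.1f ms")
    println(f"manual builder: $manualMs%.1f ms")
    println(s"same total size: ${zipSum == manualSum}")
  }
}
```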