LuciferYang commented on PR #37940:
URL: https://github.com/apache/spark/pull/37940#issuecomment-1251869079
Tested the following code with input sizes
`1,5,10,20,50,100,150,200,300,400,500,1000,5000,10000,20000`:
```
def testZipWithIndexToMap(valuesPerIteration: Int, collectionSize: Int): Unit = {
  val benchmark = new Benchmark(
    s"Test zip with index to map with collectionSize = $collectionSize",
    valuesPerIteration,
    output = output)
  val data = 0 until collectionSize

  benchmark.addCase("Use zipWithIndex + toMap") { _: Int =>
    for (_ <- 0L until valuesPerIteration) {
      val map: Map[Int, Int] = data.zipWithIndex.toMap
    }
  }

  benchmark.addCase("Use zipWithIndex + collection.breakOut") { _: Int =>
    for (_ <- 0L until valuesPerIteration) {
      val map: Map[Int, Int] =
        data.zipWithIndex(collection.breakOut[IndexedSeq[Int], (Int, Int), Map[Int, Int]])
    }
  }

  benchmark.addCase("Use Manual builder") { _: Int =>
    for (_ <- 0L until valuesPerIteration) {
      val map: Map[Int, Int] = zipToMapUseMapBuilder[Int](data)
    }
  }

  benchmark.addCase("Use Manual map") { _: Int =>
    for (_ <- 0L until valuesPerIteration) {
      val map: Map[Int, Int] = zipWithIndexToMapUseMap[Int](data)
    }
  }

  benchmark.run()
}

private def zipToMapUseMapBuilder[K](keys: Iterable[K]): Map[K, Int] = {
  import scala.collection.immutable
  val builder = immutable.Map.newBuilder[K, Int]
  val keyIter = keys.iterator
  var idx = 0
  while (keyIter.hasNext) {
    builder += (keyIter.next(), idx).asInstanceOf[(K, Int)]
    idx = idx + 1
  }
  builder.result()
}

private def zipWithIndexToMapUseMap[K](keys: Iterable[K]): Map[K, Int] = {
  var elems: Map[K, Int] = Map.empty[K, Int]
  val keyIter = keys.iterator
  var idx = 0
  while (keyIter.hasNext) {
    elems += (keyIter.next().asInstanceOf[K] -> idx)
    idx = idx + 1
  }
  elems
}
```
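As a sanity check that the benchmarked variants build the same map, here is a minimal sketch that compares the builder-based helper with `zipWithIndex.toMap` on a small input. It is not part of the PR: the object name `ZipWithIndexCheck` is hypothetical, the redundant cast is dropped, and the `collection.breakOut` variant is omitted because it only exists in Scala 2.12 and earlier.

```scala
import scala.collection.immutable

// Hypothetical standalone check; not taken from the PR.
object ZipWithIndexCheck {
  // Same logic as zipToMapUseMapBuilder above, without the asInstanceOf cast.
  def zipToMapUseMapBuilder[K](keys: Iterable[K]): Map[K, Int] = {
    val builder = immutable.Map.newBuilder[K, Int]
    val keyIter = keys.iterator
    var idx = 0
    while (keyIter.hasNext) {
      // Double parentheses make the tuple argument explicit.
      builder += ((keyIter.next(), idx))
      idx += 1
    }
    builder.result()
  }

  def main(args: Array[String]): Unit = {
    val data = 0 until 20
    val expected = data.zipWithIndex.toMap
    val actual = zipToMapUseMapBuilder(data)
    // Both approaches should map each element to its position.
    assert(actual == expected)
    println(s"maps equal: ${actual == expected}, size = ${actual.size}")
  }
}
```

The check only establishes that the results are equal; it says nothing about relative performance, which is what the benchmark below measures.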
The results are as follows:
**Java 8**
```
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 1:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                           41            43          3        2.5        406.8      1.0X
Use zipWithIndex + collection.breakOut                              4             4          0       23.6         42.4      9.6X
Use Manual builder                                                  4             4          0       27.8         35.9     11.3X
Use Manual map                                                      3             3          0       37.4         26.8     15.2X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 5:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                          142           143          2        0.7       1421.2      1.0X
Use zipWithIndex + collection.breakOut                            101           102          1        1.0       1011.0      1.4X
Use Manual builder                                                 99           101          2        1.0        994.0      1.4X
Use Manual map                                                     49            49          1        2.1        485.6      2.9X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 10:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                          166           170          5        0.6       1660.0      1.0X
Use zipWithIndex + collection.breakOut                            123           128          5        0.8       1226.3      1.4X
Use Manual builder                                                121           123          3        0.8       1207.9      1.4X
Use Manual map                                                    102           104          3        1.0       1024.0      1.6X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 20:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                          215           227         10        0.5       2151.1      1.0X
Use zipWithIndex + collection.breakOut                            167           173          6        0.6       1667.0      1.3X
Use Manual builder                                                161           167          6        0.6       1614.5      1.3X
Use Manual map                                                    208           218         10        0.5       2082.3      1.0X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 50:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                          755           756          1        0.1       7553.8      1.0X
Use zipWithIndex + collection.breakOut                            652           654          2        0.2       6521.1      1.2X
Use Manual builder                                                642           667         30        0.2       6420.7      1.2X
Use Manual map                                                    597           604         12        0.2       5966.6      1.3X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 100:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         1380          1381          2        0.1      13799.3      1.0X
Use zipWithIndex + collection.breakOut                           1237          1263         37        0.1      12365.3      1.1X
Use Manual builder                                               1213          1226         19        0.1      12126.3      1.1X
Use Manual map                                                   1283          1290         10        0.1      12833.9      1.1X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 150:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         1882          1905         33        0.1      18816.7      1.0X
Use zipWithIndex + collection.breakOut                           1716          1725         13        0.1      17155.8      1.1X
Use Manual builder                                               1731          1733          4        0.1      17307.2      1.1X
Use Manual map                                                   2121          2138         24        0.0      21211.1      0.9X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 200:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         2271          2293         31        0.0      22707.3      1.0X
Use zipWithIndex + collection.breakOut                           2124          2135         16        0.0      21238.1      1.1X
Use Manual builder                                               2051          2055          5        0.0      20509.8      1.1X
Use Manual map                                                   2859          2892         46        0.0      28592.6      0.8X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 300:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         3441          3475         49        0.0      34406.0      1.0X
Use zipWithIndex + collection.breakOut                           3271          3302         44        0.0      32711.7      1.1X
Use Manual builder                                               3098          3115         23        0.0      30981.3      1.1X
Use Manual map                                                   4620          4643         32        0.0      46200.8      0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 400:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         4734          4752         26        0.0      47340.5      1.0X
Use zipWithIndex + collection.breakOut                           4519          4554         50        0.0      45187.5      1.0X
Use Manual builder                                               4299          4321         30        0.0      42993.4      1.1X
Use Manual map                                                   6030          6075         63        0.0      60301.8      0.8X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 500:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                         5720          5784         91        0.0      57197.4      1.0X
Use zipWithIndex + collection.breakOut                           5763          5764          2        0.0      57626.8      1.0X
Use Manual builder                                               5242          5292         72        0.0      52417.1      1.1X
Use Manual map                                                   7913          7943         43        0.0      79125.3      0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 1000:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                        15654         15654          1        0.0     156536.8      1.0X
Use zipWithIndex + collection.breakOut                          15384         15384          0        0.0     153838.5      1.0X
Use Manual builder                                              14604         14680        108        0.0     146038.0      1.1X
Use Manual map                                                  17196         17206         15        0.0     171955.2      0.9X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 5000:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                        82036         82173        194        0.0     820362.9      1.0X
Use zipWithIndex + collection.breakOut                          82824         83256        610        0.0     828240.2      1.0X
Use Manual builder                                              78756         78791         50        0.0     787561.0      1.0X
Use Manual map                                                 101324        101637        443        0.0    1013241.3      0.8X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 10000: Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                       164053        164987       1322        0.0    1640526.0      1.0X
Use zipWithIndex + collection.breakOut                         171380        171931        778        0.0    1713804.3      1.0X
Use Manual builder                                             161528        161667        196        0.0    1615280.2      1.0X
Use Manual map                                                 219308        219999        977        0.0    2193079.7      0.7X
OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1019-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Test zip with index to map with collectionSize = 20000: Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------
Use zipWithIndex + toMap                                       378370        379247       1241        0.0    3783699.2      1.0X
Use zipWithIndex + collection.breakOut                         412945        413050        147        0.0    4129454.8      0.9X
Use Manual builder                                             392057        393046       1400        0.0    3920566.0      1.0X
Use Manual map                                                 471860        471867         11        0.0    4718596.0      0.8X
```
From the benchmark results:
- For input sizes <= 1000, building the map manually with a `while` loop and a map builder ("Use Manual builder") is roughly 10% or more faster than `zipWithIndex.toMap`.
- For input sizes >= 5000, there is no significant performance gap.
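To make the "10%" figure concrete, here is a small sketch that recomputes the speedup of "Use Manual builder" over "Use zipWithIndex + toMap" from the `Per Row(ns)` columns above. It assumes the `Relative` column is the baseline's per-row time divided by the case's per-row time, which matches the reported values (for example 406.8 / 42.4 ≈ 9.6). The object name `SpeedupFromPerRow` and the choice of sample sizes are illustrative only.

```scala
// Hypothetical helper; the numbers are copied from the Java 8 tables above.
object SpeedupFromPerRow {
  // (collectionSize, toMap Per Row(ns), Manual builder Per Row(ns))
  val perRowNs: Seq[(Int, Double, Double)] = Seq(
    (100, 13799.3, 12126.3),
    (500, 57197.4, 52417.1),
    (1000, 156536.8, 146038.0),
    (5000, 820362.9, 787561.0),
    (20000, 3783699.2, 3920566.0)
  )

  // Values above 1.0 mean the candidate is faster than the baseline.
  def relative(baselineNs: Double, candidateNs: Double): Double =
    baselineNs / candidateNs

  def main(args: Array[String]): Unit = {
    perRowNs.foreach { case (size, toMapNs, builderNs) =>
      println(f"size=$size%6d  relative=${relative(toMapNs, builderNs)}%.2fX")
    }
  }
}
```

Because the `Relative` column is rounded to one decimal place, recomputing the ratio this way shows more precisely where the builder's advantage shrinks toward parity.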