panbingkun commented on PR #48237:
URL: https://github.com/apache/spark/pull/48237#issuecomment-2373403603
## size(map_from_arrays(...))
### Benchmark code:
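```scala
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object SizeBenchmark extends SqlBasedBenchmark {
  // 1,000,000 rows (consistent with the Rate(M/s) and Per Row(ns) figures below).
  private val N = 1_000_000
  // Iterations per benchmark case.
  private val M = 100
  // Local scratch directory on the author's machine; adjust as needed.
  private val path =
    "/Users/panbingkun/Developer/spark/spark-community/SizeBenchmark"

  // Six int columns: id (cast from range's bigint) and id + 1 .. id + 5.
  private val df = spark.range(N).to(new StructType().add("id", "int"))
    .withColumn("id1", col("id") + 1)
    .withColumn("id2", col("id") + 2)
    .withColumn("id3", col("id") + 3)
    .withColumn("id4", col("id") + 4)
    .withColumn("id5", col("id") + 5)
  // Overwrite so the benchmark can be re-run against the same path.
  df.write.mode("overwrite").parquet(path)
  private val table = spark.read.parquet(path)

  private def doBenchmark(): Unit = {
    // Build a 3-entry map per row and take its size; noop() (from
    // SqlBasedBenchmark) writes to the no-op sink to force evaluation.
    table.selectExpr(
      "size(map_from_arrays(array(id, id1, id2), array(id3, id4, id5)))").noop()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("size") {
      val benchmark = new Benchmark("size", N, output = output)
      benchmark.addCase("optimize", M) { _ =>
        doBenchmark()
      }
      benchmark.run()
    }
  }
}
```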
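Each `benchmark.addCase("optimize", M)` call above times `M = 100` iterations, and the Best, Avg and Stdev columns in the results summarize those per-iteration times. The following is a minimal plain-Scala sketch of such a summary, not Spark's `Benchmark` implementation: `IterationSummary`, `Summary` and `summarize` are hypothetical names, it assumes a sample (n - 1) standard deviation, and the iteration times in `main` are made up for illustration.

```scala
// Hypothetical helper: summarizes per-iteration times into Best, Avg and
// Stdev. Assumes a sample (n - 1) standard deviation.
object IterationSummary {
  final case class Summary(bestMs: Double, avgMs: Double, stdevMs: Double)

  def summarize(timesMs: Seq[Double]): Summary = {
    require(timesMs.nonEmpty, "need at least one iteration")
    val avg = timesMs.sum / timesMs.size
    // Sample standard deviation; defined as 0 when there is only one iteration.
    val stdev =
      if (timesMs.size > 1)
        math.sqrt(timesMs.map(t => (t - avg) * (t - avg)).sum / (timesMs.size - 1))
      else 0.0
    Summary(bestMs = timesMs.min, avgMs = avg, stdevMs = stdev)
  }

  def main(args: Array[String]): Unit = {
    // Made-up iteration times in milliseconds, not from the runs in this comment.
    println(summarize(Seq(140.0, 150.0, 160.0)))
    // prints Summary(140.0,150.0,10.0)
  }
}
```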
### Results
Each block below shows three separate runs of the benchmark, 100 iterations per run.
#### Before
```shell
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 15653 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            142            157           8          7.0         142.0       1.0X
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 17672 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            160            177          25          6.3         159.9       1.0X
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 15140 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                            141            151          13          7.1         140.6       1.0X
```
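The Rate(M/s) and Per Row(ns) columns are both derived from the best iteration time and the number of rows processed per iteration (`N`). A minimal sketch, assuming `N = 1,000,000` rows as in the benchmark and that both columns use the unrounded best time (which would explain why the second run shows a best time of 160 ms but 159.9 ns per row); `BenchmarkColumns` and its methods are hypothetical names:

```scala
// Hypothetical helpers relating Best Time(ms) to the Rate(M/s) and
// Per Row(ns) columns, assuming N rows are processed per iteration.
object BenchmarkColumns {
  val N: Long = 1_000_000L

  // Millions of rows per second at the best iteration time.
  def rateMPerSec(bestMs: Double): Double = N / (bestMs / 1000.0) / 1e6

  // Nanoseconds per row at the best iteration time.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / N

  def main(args: Array[String]): Unit = {
    // First "Before" run: best 142 ms, i.e. about 7.0 M/s and 142.0 ns per row.
    println(f"${rateMPerSec(142.0)}%.1f M/s, ${perRowNs(142.0)}%.1f ns/row")
  }
}
```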
#### After
```shell
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 3923 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             24             39          13         42.4          23.6       1.0X
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 3778 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             31             38           7         32.1          31.2       1.0X
Running benchmark: size
Running case: optimize
Stopped after 100 iterations, 3040 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 15.0
Apple M2
size:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
optimize                                             23             30           7         42.8          23.4       1.0X
```
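Comparing the Best Time(ms) column across the three runs in each block, this query runs roughly 5 to 6 times faster after the change on this Apple M2 machine: the mean best time falls from about 147.7 ms to 26.0 ms. The arithmetic as a small plain-Scala sketch (`SpeedupSummary` is a hypothetical name; pairing runs by the order they were reported is arbitrary, since the runs are independent):

```scala
// Speedup computed from the Best Time(ms) values in the Before/After results.
object SpeedupSummary {
  val beforeBestMs: Seq[Double] = Seq(142.0, 160.0, 141.0)
  val afterBestMs: Seq[Double] = Seq(24.0, 31.0, 23.0)

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Ratio of mean best times: before / after.
  def meanSpeedup: Double = mean(beforeBestMs) / mean(afterBestMs)

  // Per-run ratios, pairing runs in the order they were reported.
  def perRunSpeedups: Seq[Double] =
    beforeBestMs.zip(afterBestMs).map { case (b, a) => b / a }

  def main(args: Array[String]): Unit = {
    println(f"mean speedup: $meanSpeedup%.2fx")
    perRunSpeedups.foreach(s => println(f"per-run speedup: $s%.2fx"))
  }
}
```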