Kimahriman commented on PR #53468:
URL: https://github.com/apache/spark/pull/53468#issuecomment-3649449843
I created a simple benchmark to test:
```scala
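// Imports assumed from the Spark source tree; omitted in the original comment
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
import org.apache.spark.sql.functions.{array_union, lit}

object ArraySetLikeBenchmark extends SqlBasedBenchmark {
  private val N = 1000L
  private val arrayElements = 100000

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val benchmark = new Benchmark("Array Set Like", N, output = output)
    // 100,000 inner arrays, each holding the pair (x, x)
    val arr = (1 to arrayElements).map(x => Array(x, x)).toArray

    benchmark.addCase("array_union", 1) { _ =>
      // Union the literal array with itself for each of the N rows;
      // the noop sink discards the output so only evaluation is timed
      spark.range(N)
        .select(array_union(lit(arr), lit(arr)).alias("arr"))
        .write
        .format("noop")
        .mode("append")
        .save()
    }
    benchmark.run()
  }
}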
```
Before:
```
[info] Array Set Like:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] array_distinct             56198          56198           0         0.0    56197860.3       1.0X
```
After:
```
[info] Array Set Like:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] array_distinct              3113           3113           0         0.0     3112680.3       1.0X
```
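For reference, the improvement implied by the two tables above can be computed directly from the reported best times. This is only a small arithmetic sketch: the object and method names are made up for illustration, and the two constants are copied from the "Before" and "After" results. It works out to roughly an 18x speedup.

```scala
object SpeedupFromResults {
  // Best times in milliseconds, copied from the "Before" and "After" tables.
  val beforeMs: Double = 56198.0
  val afterMs: Double = 3113.0

  // Speedup factor: how many times faster the "After" run is than the "Before" run.
  def speedup(before: Double, after: Double): Double = before / after

  // Per-row cost in milliseconds, given the total time and the row count (N = 1000 in the benchmark).
  def perRowMs(totalMs: Double, rows: Long): Double = totalMs / rows

  def main(args: Array[String]): Unit = {
    println(s"speedup: ${speedup(beforeMs, afterMs)}")
    println(s"per-row before (ms): ${perRowMs(beforeMs, 1000L)}")
    println(s"per-row after (ms): ${perRowMs(afterMs, 1000L)}")
  }
}
```

The per-row figures match the `Per Row(ns)` column once converted from nanoseconds to milliseconds (about 56.2 ms before and 3.1 ms after).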