GitHub user wangyum commented on the issue:
https://github.com/apache/spark/pull/22420
@hellodengfei Could you retarget this PR against the `master` branch? The change
LGTM. I ran a benchmark comparing lookups in a `Set` and an `Array`:
```scala
import scala.util.Random

// Returns the wall-clock time, in milliseconds, taken to run `func`.
def benchmark(func: () => Unit): Long = {
  val start = System.currentTimeMillis()
  func()
  val end = System.currentTimeMillis()
  end - start
}

val range = Range(1, 1000000)
val set = range.toSet
val array = range.toArray

// Five rounds of 500 random lookups against each collection.
for (_ <- 0 until 5) {
  val setExecutionTime =
    benchmark(() => for (_ <- 0 until 500) {
      set.contains(Random.nextInt())
    })
  val arrayExecutionTime =
    benchmark(() => for (_ <- 0 until 500) {
      array.contains(Random.nextInt())
    })
  println(s"set execution time: $setExecutionTime, array execution time: $arrayExecutionTime")
}
```
Benchmark results (execution times in milliseconds):
```
set execution time: 4, array execution time: 2760
set execution time: 1, array execution time: 1911
set execution time: 3, array execution time: 2043
set execution time: 12, array execution time: 2214
set execution time: 6, array execution time: 1770
```
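
For anyone who wants to reproduce this, here is a slightly tighter variant as a rough sketch, not a rigorous JVM benchmark (there is no warm-up, and a harness such as JMH would be more reliable). It pre-generates the keys from a fixed seed so both collections see identical lookups and the random-number generation stays outside the timed region. The names `LookupSketch`, `Result`, `timeNanos`, and `compare`, and the seed `42`, are made up for illustration. Since `nextInt()` covers the full `Int` range, nearly all keys miss, so each array lookup scans the whole array while the set does a hash lookup.

```scala
import scala.util.Random

object LookupSketch {
  // Hypothetical result holder: hit counts and elapsed nanoseconds per collection.
  final case class Result(setHits: Int, arrayHits: Int, setNanos: Long, arrayNanos: Long)

  // Runs `body` once and returns its value together with the elapsed nanoseconds.
  def timeNanos[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val value = body
    (value, System.nanoTime() - start)
  }

  // Looks up the same pre-generated keys in a Set and an Array holding 1 until `size`.
  def compare(size: Int, lookups: Int, seed: Long): Result = {
    val array = (1 until size).toArray
    val set = array.toSet
    val rng = new Random(seed)
    val keys = Array.fill(lookups)(rng.nextInt())

    val (setHits, setNanos) = timeNanos(keys.count(k => set.contains(k)))
    val (arrayHits, arrayNanos) = timeNanos(keys.count(k => array.contains(k)))
    Result(setHits, arrayHits, setNanos, arrayNanos)
  }

  def main(args: Array[String]): Unit = {
    val r = compare(size = 1000000, lookups = 500, seed = 42L)
    println(s"set: ${r.setNanos / 1000000} ms, array: ${r.arrayNanos / 1000000} ms, hits: ${r.setHits}")
  }
}
```

Both collections should report the same hit count, which is a cheap sanity check that the two timed blocks did equivalent work.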