GitHub user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19860
@kiszk @viirya I ran the following performance test:
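```
import spark.implicits._
// Simple timing helper: returns the wall-clock time (in seconds) taken to evaluate `f`
def time[T](f: => T): Double = {
  val start = System.nanoTime()
  f
  (System.nanoTime() - start) / 1e9
}
val a = (1 to 100000).map(x => 1).toDS
val filtered = a.where($"value".isin((1 to 100000): _*))
(1 to 20).map(_ => time(filtered.count)).sum / 20 // average over 20 trials
```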
Before the PR, the average execution time over the 20 trials was 3.428 s;
after the PR, it is 3.121 s (on OS X, 2.8 GHz Intel Core i7). That is about
a 9% reduction in execution time (a speedup of roughly 1.1x), i.e. about a
10% improvement in overall performance in this case.
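As a quick sanity check of the "about 10%" figure, here is a minimal sketch of the arithmetic, assuming the commas in the reported timings are decimal separators (i.e. 3.428 s and 3.121 s). The object name `ImprovementCheck` is hypothetical and only used for illustration.

```scala
object ImprovementCheck {
  // Average timings reported above, in seconds (20 trials each).
  // Assumption: "3,428 s" and "3,121 s" use a decimal comma.
  val before = 3.428
  val after = 3.121

  // Relative reduction in execution time: (before - after) / before
  val reduction: Double = (before - after) / before
  // Speedup factor: how many times faster the run after the PR is
  val speedup: Double = before / after

  def main(args: Array[String]): Unit =
    // prints "reduction = 9.0%, speedup = 1.098x"
    println(f"reduction = ${reduction * 100}%.1f%%, speedup = $speedup%.3fx")
}
```

The two measures differ slightly: the time saved is about 9% of the original runtime, while the throughput gain (speedup) is about 10%.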