GitHub user viirya commented on the issue:
https://github.com/apache/spark/pull/20806
@cloud-fan @WeichenXu123 OK. I've set up a 5-node Spark cluster for
the benchmark.
The data used:
```
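import scala.util.Random
import org.apache.spark.ml.linalg.Vectors  // assumed package; swap for mllib.linalg if needed
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val r = new Random
// 10001 rows (0 to 10000 inclusive), each a dense 10000-dim vector of random 0.0/1.0 values
val ds = (0 to 10000).map { _ =>
  val a = Array.fill(10000)(if (r.nextDouble() > 0.5) 1.0 else 0.0)
  Tuple1(Vectors.dense(a))
}.toDS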
```
The two versions of `treeAggregate` perform very similarly, so using
`RDD.treeAggregate` directly is the much simpler option.
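
For readers following along, here is a rough sketch of what calling `RDD.treeAggregate` directly on vector data like the above could look like. It is not the code from the PR or the benchmark: the object name `TreeAggregateSketch`, the helper `sumVectors`, the element-wise sum as the aggregation, and the `ml.linalg` vector type are all assumptions made for illustration.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper, for illustration only.
object TreeAggregateSketch {
  // Element-wise sum of `dim`-dimensional vectors using RDD.treeAggregate.
  // `depth` controls how many levels of partial merging happen before
  // the final result reaches the driver (Spark's default is 2).
  def sumVectors(rdd: RDD[Vector], dim: Int, depth: Int = 2): Array[Double] =
    rdd.treeAggregate(new Array[Double](dim))(
      // seqOp: fold one vector into the per-partition accumulator.
      // Spark allows seqOp/combOp to modify and return their first argument.
      seqOp = (acc: Array[Double], v: Vector) => {
        val values = v.toArray
        var i = 0
        while (i < values.length) { acc(i) += values(i); i += 1 }
        acc
      },
      // combOp: merge two partial accumulators.
      combOp = (a: Array[Double], b: Array[Double]) => {
        var i = 0
        while (i < a.length) { a(i) += b(i); i += 1 }
        a
      },
      depth = depth
    )
}
```

With the dataset above, something like `TreeAggregateSketch.sumVectors(ds.rdd.map(_._1), 10000)` would yield the per-dimension counts of 1.0 values.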
---