[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...

viirya Mon, 19 Mar 2018 01:07:58 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20806
  
    @cloud-fan @WeichenXu123 OK. I've set up a 5-node Spark cluster for the benchmark.
    
    The data used:
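    ```scala
    import scala.util.Random
    import org.apache.spark.ml.linalg.Vectors  // assumed: the ml (not mllib) Vectors
    import spark.implicits._                   // for .toDS; assumes a SparkSession named `spark`

    val r = new Random
    // 10,001 rows, each a 10,000-element dense vector of random 0.0/1.0 entries
    val ds = (0 to 10000).map { _ =>
      val a = Array.fill(10000)(if (r.nextDouble() > 0.5) 1.0 else 0.0)
      Tuple1(Vectors.dense(a))
    }.toDS
    ```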
    
    The two versions of `treeAggregate` (the Dataset-based one and `RDD.treeAggregate`) perform very similarly. Given that, calling `RDD.treeAggregate` directly would be much simpler.
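
    A minimal, Spark-free sketch of the tree-aggregation pattern that `RDD.treeAggregate` implements may help put the comparison in context. It is an illustration under assumptions, not Spark's code: partitions are modelled as plain Scala `Seq`s, and `TreeAggregateSketch` / `treeAggregateLocal` are hypothetical names. Each partition is folded with `seqOp`, then the partial results are merged level by level with `combOp` instead of all at once; the per-level fan-in loosely follows Spark's `scale` heuristic. On the Dataset above, the real Spark call would take the shape `ds.rdd.treeAggregate(zero)(seqOp, combOp, depth)`.

    ```scala
    // Local sketch of the tree-aggregation pattern behind RDD.treeAggregate.
    // Hypothetical helper, not Spark's implementation: "partitions" are plain Seqs.
    object TreeAggregateSketch {

      def treeAggregateLocal[T, U](partitions: Seq[Seq[T]], zero: U, depth: Int = 2)(
          seqOp: (U, T) => U,
          combOp: (U, U) => U): U = {
        require(depth >= 1, s"depth must be >= 1 but got $depth")
        // Level 0: one partial result per partition (the work executors would do).
        var partials: Seq[U] = partitions.map(_.foldLeft(zero)(seqOp))
        if (partials.isEmpty) zero
        else {
          // Fan-in per level, loosely following Spark: ceil(n^(1/depth)), at least 2.
          val scale = math.max(math.ceil(math.pow(partials.size.toDouble, 1.0 / depth)).toInt, 2)
          // Add intermediate levels only while they shrink the final merge noticeably.
          while (partials.size > scale + math.ceil(partials.size.toDouble / scale).toInt) {
            val numGroups = partials.size / scale
            partials = partials.zipWithIndex
              .groupBy { case (_, i) => i % numGroups }
              .toSeq
              .sortBy(_._1)
              .map { case (_, group) => group.map(_._1).reduce(combOp) }
          }
          // Final merge (done on the driver in Spark's case).
          partials.reduce(combOp)
        }
      }

      def main(args: Array[String]): Unit = {
        // Element-wise sum of small dense vectors, split across two "partitions".
        val parts = Seq(Seq(Array(1.0, 0.0), Array(0.0, 1.0)), Seq(Array(1.0, 1.0)))
        val add = (a: Array[Double], b: Array[Double]) => a.zip(b).map { case (x, y) => x + y }
        val sum = treeAggregateLocal(parts, Array(0.0, 0.0))(add, add)
        println(sum.mkString(", "))
      }
    }
    ```

    Running `main` prints `2.0, 2.0`. With a larger `depth`, more intermediate levels are added, so fewer partial results reach the final merge; with `depth = 1` everything is merged in a single step.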


