For the reduce/aggregate question: yes, the driver collects the results
sequentially. We now use tree aggregation in MLlib to reduce the
driver's load:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala#L89

It is faster than a plain aggregate when there are many partitions and
the data being aggregated is not small.
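As an illustration only (a minimal sketch against the Spark 1.x MLlib
API; the master URL, RDD, and sum example are made up for the demo),
treeAggregate is a drop-in replacement for aggregate with an optional
depth parameter:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.rdd.RDDFunctions._  // implicit conversion adds treeAggregate to RDDs

val sc = new SparkContext("local[4]", "tree-aggregate-demo")
val data = sc.parallelize(1L to 1000000L, 100)  // 100 partitions

// Plain aggregate: all 100 partial sums are sent straight to the
// driver, which merges them one by one.
val flatSum = data.aggregate(0L)(_ + _, _ + _)

// Tree aggregate: partials are first combined on the executors in
// log-shaped stages, so the driver only merges a few final values.
val treeSum = data.treeAggregate(0L)(_ + _, _ + _, depth = 2)

assert(flatSum == treeSum)
sc.stop()

With depth = 2 the partials go through one executor-side combine level
before reaching the driver; a larger depth trades extra stages for even
less work on the driver.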

-Xiangrui

On Fri, Aug 8, 2014 at 1:44 PM, makevnin <makev...@lanl.gov> wrote:
> Hey, I was reading the Berkeley paper "Spark: Cluster Computing with Working
> Sets" and came across a sentence that is bothering me. Currently I am
> trying to run a Python script on Spark that executes a parallel k-means
> ... my problem is ...
> after the algorithm finishes working with the dataset (ca. 50 s), it seems
> that Spark needs the rest of the time (ca. 7 min) to collect all the data.
> The paper from Berkeley mentions that Spark does not support parallel
> collection. Is that really the case?
>
> If I can make something run faster in Spark, please tell me how, since I
> have another problem: Spark is not really responding to my configuration
> changes. I ran over 25 tests with configurations of executor.memory and
> task.cpus or akka.threads, but nothing changed (conf from 2-62 GB RAM,
> 4-912 cpus, and 4-912 threads).
>
> I also read that you cannot run more than one executor per node while
> Spark is running in standalone mode. Do I really need to run Spark on YARN
> to get more than one executor on a node? If so, does anyone have a
> tutorial on how to install YARN and Spark on top of it?
>
> Thank you for your help
>
> best
>
> makevnin

