For the reduce/aggregate question: yes, the driver collects results sequentially. We now use tree aggregation in MLlib to reduce the load on the driver:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala#L89

It is faster than a plain aggregate when there are many partitions and the data size is not small.
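For illustration, a minimal sketch of the two code paths (the data set, partition count, and depth below are made up for the example; treeAggregate is the method defined at the link above, added to RDDs by the implicit conversion in RDDFunctions):

import org.apache.spark.{SparkConf, SparkContext}
// Implicit conversion that adds treeAggregate/treeReduce to any RDD:
import org.apache.spark.mllib.rdd.RDDFunctions._

object TreeAggregateSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tree-aggregate-sketch"))
    val data = sc.parallelize(1 to 1000000, numSlices = 200)

    // Plain aggregate: all 200 partition results are shipped straight to
    // the driver, which combines them one after another.
    val flatSum = data.aggregate(0L)((acc, x) => acc + x, _ + _)

    // treeAggregate: partial results are first combined on the executors in
    // a tree of `depth` levels, so far fewer results reach the driver.
    val treeSum = data.treeAggregate(0L)((acc, x) => acc + x, _ + _, depth = 2)

    assert(flatSum == treeSum) // same answer, different communication pattern
    sc.stop()
  }
}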
-Xiangrui

On Fri, Aug 8, 2014 at 1:44 PM, makevnin <makev...@lanl.gov> wrote:
> Hey, I was reading the Berkeley paper "Spark: Cluster Computing with
> Working Sets" and came across a sentence that is bothering me. Currently I
> am trying to run a Python script on Spark which executes a parallel
> k-means ... my problem is ... after the algorithm finishes working with
> the dataset (ca. 50 s), it seems that Spark needs the rest of the time
> (ca. 7 min) to collect all the data. The Berkeley paper mentions that
> Spark does not support parallel collection. Is that really the case?
>
> If I can make something run faster in Spark, please tell me how, since I
> have another problem: Spark is not really responding to my configuration
> changes. I ran over 25 tests with configurations of executor.memory and
> task.cpus or akka.threads, but nothing changed (conf from 2-62g RAM, 4-912
> cpus and 4-912 threads).
>
> I also read that you cannot run more than one executor per node while
> Spark is running in standalone mode. Do I really need to run Spark on
> YARN to get more than one executor on a node? If so, does anyone have a
> tutorial on how to install YARN and Spark on top of it?
>
> Thank you for your help
>
> best
>
> makevnin
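On the configuration question in the quoted message, one common cause of the symptom described: properties such as spark.executor.memory and spark.task.cpus only take effect if they are set before the SparkContext is created; values set afterwards are ignored, which can make runs look insensitive to configuration changes. A minimal sketch of setting them programmatically (the values are placeholders, not tuning advice):

import org.apache.spark.{SparkConf, SparkContext}

object ConfigSketch {
  def main(args: Array[String]): Unit = {
    // These properties must be in place before the context is constructed;
    // setting them on an already-running context has no effect.
    val conf = new SparkConf()
      .setAppName("config-sketch")
      .set("spark.executor.memory", "8g") // placeholder value
      .set("spark.task.cpus", "1")        // placeholder value
    val sc = new SparkContext(conf)
    // ... job ...
    sc.stop()
  }
}

The same keys can also be passed on the command line with spark-submit --conf key=value.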