foreach is slightly more efficient because Spark doesn't try to collect results from each task, since it knows there is no return value. I think the difference is very marginal, though; it's mostly stylistic. Typically you use foreach for something that is intended to produce a side effect, and map for something that returns a new dataset.
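For what it's worth, here is a rough sketch of that distinction in plain Scala. It is not Spark code: a standard-library Iterator stands in for an RDD, since Iterator.map is lazy in the same way RDD.map is, and toList plays the role of an action such as collect(). The names ForeachVsMapSketch and run are made up for the illustration, and process is only a stand-in for the side-effecting function from the original question.

```scala
import scala.collection.mutable.ArrayBuffer

object ForeachVsMapSketch {
  // Returns how many events had been processed after each of three steps.
  def run(): (Int, Int, Int) = {
    // Records every event seen, so the side effect can be observed.
    val seen = ArrayBuffer.empty[String]

    // Hypothetical stand-in for process(event) from the thread.
    def process(event: String): Unit = { seen += event; () }

    val events = List("a", "b", "c")

    // Like RDD.map, Iterator.map is lazy: this only describes the work.
    val mapped: Iterator[Unit] = events.iterator.map(e => process(e))
    val afterMap = seen.size // nothing has run yet

    // Consuming the iterator plays the role of an action such as collect():
    // the side effects run, and one (useless) result per element is built.
    val results: List[Unit] = mapped.toList
    val afterConsume = seen.size

    // foreach runs eagerly and returns Unit, so no result list is built.
    events.iterator.foreach(e => process(e))
    val afterForeach = seen.size

    require(results.size == events.size)
    (afterMap, afterConsume, afterForeach)
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints (0,3,6)
}
```

The map version only does anything once its result is consumed, and it then hands back a list of Unit values that nobody wants; foreach expresses the same side effect without producing that result.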

On Wed, Dec 17, 2014 at 5:43 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
> Patrick,
>
> I was wondering why one would choose rdd.map vs rdd.foreach to execute a
> side-effecting function on an RDD.
>
> -kr, Gerard.
>
> On Sat, Dec 6, 2014 at 12:57 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>> The second choice is better. Once you call collect() you are pulling
>> all of the data onto a single node; you want to do most of the
>> processing in parallel on the cluster, which is what map() will do.
>> Ideally you'd try to summarize the data or reduce it before calling
>> collect().
>>
>> On Fri, Dec 5, 2014 at 5:26 AM, david <david...@free.fr> wrote:
>> > hi,
>> >
>> > What is the best way to process a batch window in Spark Streaming:
>> >
>> > kafkaStream.foreachRDD(rdd => {
>> >   rdd.collect().foreach(event => {
>> >     // process the event
>> >     process(event)
>> >   })
>> > })
>> >
>> > Or
>> >
>> > kafkaStream.foreachRDD(rdd => {
>> >   rdd.map(event => {
>> >     // process the event
>> >     process(event)
>> >   }).collect()
>> > })
>> >
>> > thanks
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafa-best-practices-tp20470.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org