foreach is slightly more efficient because Spark doesn't try to collect
results from each task; it knows there is no return value. I think the
difference is very marginal, though. It's mostly stylistic: typically
you use foreach for something that is intended to produce a side effect
and map for something that will return a new dataset.
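
To make the distinction concrete, here is a minimal sketch. It is not taken
from the original exchange: the object name, the helper names viaForeach and
viaMap, the local[2] master, and the body of process are all assumptions made
for illustration. It assumes Spark is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ForeachVsMap {
  // Hypothetical stand-in for the side-effecting process(event) in the thread.
  def process(event: String): Unit = println(s"processed $event")

  // foreach is an action: it runs process on the executors and returns Unit,
  // so no per-element results are sent back to the driver.
  def viaForeach(events: RDD[String]): Unit =
    events.foreach(process)

  // map is a lazy transformation: nothing runs until an action such as
  // collect() is called, and collect() then ships one (useless) Unit value
  // per element back to the driver.
  def viaMap(events: RDD[String]): Int = {
    val results: Array[Unit] = events.map(process).collect()
    results.length
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("foreach-vs-map").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val events = sc.parallelize(Seq("a", "b", "c"))
      viaForeach(events)
      println(s"collect() returned ${viaMap(events)} values")
    } finally {
      sc.stop()
    }
  }
}
```

Both variants run process in parallel on the executors. The only difference
in this sketch is whether an array of results is built on the driver
afterwards.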

On Wed, Dec 17, 2014 at 5:43 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
> Patrick,
>
> I was wondering why one would choose rdd.map over rdd.foreach to execute a
> side-effecting function on an RDD.
>
> -kr, Gerard.
>
> On Sat, Dec 6, 2014 at 12:57 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>> The second choice is better. Once you call collect() you are pulling
>> all of the data onto a single node. You want to do most of the
>> processing in parallel on the cluster, which is what map() will do.
>> Ideally you'd try to summarize the data or reduce it before calling
>> collect().
>>
>> On Fri, Dec 5, 2014 at 5:26 AM, david <david...@free.fr> wrote:
>> > hi,
>> >
>> >   What is the best way to process a batch window in Spark Streaming:
>> >
>> >     kafkaStream.foreachRDD(rdd => {
>> >       rdd.collect().foreach(event => {
>> >         // process the event
>> >         process(event)
>> >       })
>> >     })
>> >
>> >
>> > Or
>> >
>> >     kafkaStream.foreachRDD(rdd => {
>> >       rdd.map(event => {
>> >         // process the event
>> >         process(event)
>> >       }).collect()
>> >     })
>> >
>> >
>> > thanks
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafa-best-practices-tp20470.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>>
>
