Hi,

On Thu, Dec 18, 2014 at 3:08 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>
> On Wed, Dec 17, 2014 at 5:43 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
> > I was wondering why one would choose for rdd.map vs rdd.foreach to execute a
> > side-effecting function on an RDD.
>
Personally, I like to get the count of processed items, so I do something like

  rdd.map(item => processItem(item)).count()

instead of

  rdd.foreach(item => processItem(item))

but I would be happy to learn about a better way.

Tobias
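
For reference, a minimal sketch of the two variants side by side. It assumes a local Spark 2.x or later setup; the object name, the sample data, and the body of processItem are hypothetical stand-ins, since the thread does not show them. The accumulator variant is not something proposed in this thread. It is only one possible way to keep rdd.foreach and still obtain a count.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountProcessed {
  // Hypothetical stand-in for the side-effecting function from the thread.
  def processItem(item: String): Unit = println(s"processed $item")

  // Returns (count via map + count, count via foreach + accumulator).
  def run(): (Long, Long) = {
    // Assumption: local mode with two threads, for illustration only.
    val conf = new SparkConf().setAppName("count-processed").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))

      // Variant 1 (from the thread): map is lazy, so count() is the action
      // that triggers the side effect and also returns the number of items.
      val viaMap: Long = rdd.map(item => processItem(item)).count()

      // Variant 2 (an assumption, not from the thread): foreach returns Unit,
      // so the count is collected in an accumulator instead.
      val processed = sc.longAccumulator("processed")
      rdd.foreach { item =>
        processItem(item)
        processed.add(1L)
      }
      val viaForeach: Long = processed.value

      (viaMap, viaForeach)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaMap, viaForeach) = run()
    println(s"map+count: $viaMap, foreach+accumulator: $viaForeach")
  }
}
```

In both variants processItem itself may run more than once for an item if a task is retried, so the side effect should tolerate repeats.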