I don't think this is guaranteed, and I wouldn't rely on it. Ideally the functions you pass in aren't stateful at all, because they can be reinstantiated and/or re-executed many times due to, say, failures. Staying stateless sidesteps most thread-safety issues. If you're keeping state because you have some expensive shared resource, and you're mapping, consider mapPartitions instead, and set up the resource once at the start of each partition.
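A rough sketch of that mapPartitions pattern (ExpensiveResource and its methods are hypothetical stand-ins for whatever thread-unsafe resource you have; the shape is the point):

```scala
// Create the expensive, thread-unsafe resource once per partition,
// not once per element, and don't share it across tasks.
val results = rdd.mapPartitions { iter =>
  val resource = ExpensiveResource.open()  // hypothetical setup call
  iter.map { record =>
    resource.process(record)               // hypothetical per-record work
  }
  // Note: mapPartitions is lazy, so if the resource needs closing,
  // do it from a task-completion callback or by wrapping the iterator,
  // not immediately after this map.
}
```

Each task gets its own instance, so the resource never needs to be thread-safe, and you still pay the setup cost only once per partition rather than once per record.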
On Wed, Oct 5, 2016 at 5:23 PM Matthew Dailey <matthew.dail...@gmail.com> wrote:
> Looking at the programming guide
> <http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>
> for Spark 1.6.1, it states:
>
> > Prior to execution, Spark computes the task's closure. The closure is
> > those variables and methods which must be visible for the executor to
> > perform its computations on the RDD
> >
> > The variables within the closure sent to each executor are now copies
>
> So my question is, will an executor access a single copy of the closure
> with more than one thread? I ask because I want to know if I can ignore
> thread-safety in a function I write. Take a look at this gist as a
> simplified example with a thread-unsafe operation being passed to map():
> https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc
>
> This is for Spark Streaming, but I suspect the answer is the same between
> batch and streaming.
>
> Thanks for any help,
> Matt