Looking at the programming guide for Spark 1.6.1 <http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>, it states:

> Prior to execution, Spark computes the task’s closure. The closure is those variables and methods which must be visible for the executor to perform its computations on the RDD

> The variables within the closure sent to each executor are now copies

So my question is: can more than one thread within an executor access the same copy of the closure? I ask because I want to know whether I can ignore thread-safety in a function I write.

Take a look at this gist for a simplified example in which a thread-unsafe operation is passed to map():

https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc

This is for Spark Streaming, but I suspect the answer is the same for batch and streaming.

Thanks for any help,
Matt
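
P.S. To make the concern concrete without opening the gist, here is a minimal sketch of the kind of function I mean. It is plain Java with no Spark dependency, it is not the code from the gist, and the names (`ClosureCopies`, `UnsafeReverser`, `PerThreadReverser`, `mapOnThreads`) are made up for illustration. The first function reuses one mutable buffer across calls, so it is only correct if a single thread ever calls a given instance. The second is the defensive variant I would write if one copy can be shared by several threads. A real Spark function would also need to be serializable, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ClosureCopies {

    // Hypothetical thread-unsafe function: one mutable buffer is reused by
    // every call, so concurrent calls on the same instance could interleave.
    static class UnsafeReverser implements Function<String, String> {
        private final StringBuilder scratch = new StringBuilder();

        @Override
        public String apply(String s) {
            scratch.setLength(0);
            scratch.append(s);
            return scratch.reverse().toString();
        }
    }

    // Defensive variant: each calling thread lazily gets its own buffer,
    // so sharing one instance between threads is safe.
    static class PerThreadReverser implements Function<String, String> {
        private final ThreadLocal<StringBuilder> scratch =
                ThreadLocal.withInitial(StringBuilder::new);

        @Override
        public String apply(String s) {
            StringBuilder sb = scratch.get();
            sb.setLength(0);
            sb.append(s);
            return sb.reverse().toString();
        }
    }

    // Stand-in for "several task threads calling the same function object".
    // This is an assumption for illustration, not how Spark schedules tasks.
    static List<String> mapOnThreads(Function<String, String> f,
                                     List<String> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<String> task = () -> f.apply(input);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results stay in input order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("abc", "spark", "closure");
        // With a single thread, the unsafe function behaves correctly.
        System.out.println(mapOnThreads(new UnsafeReverser(), inputs, 1));
        // With one instance shared by four threads, only the per-thread variant is safe.
        System.out.println(mapOnThreads(new PerThreadReverser(), inputs, 4));
    }
}
```

If each task thread is guaranteed its own copy of the closure, the first form would be enough; if not, I would need something like the second form (or to build the mutable state inside the function call).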