The programming guide for Spark 1.6.1
<http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>
states:
> Prior to execution, Spark computes the task’s closure. The closure is
> those variables and methods which must be visible for the executor to
> perform its computations on the RDD
>
> The variables within the closure sent to each executor are now copies
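
To make the "copies" point concrete without a cluster, here is a minimal Scala sketch. It assumes plain Java serialization as a stand-in for the way a closure is shipped from the driver to an executor; `CountingFn` and `roundTrip` are hypothetical names, not Spark APIs, and the sketch says nothing about how many threads use a given copy.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical user function carrying unsynchronized mutable state.
class CountingFn extends (Int => Int) with java.io.Serializable {
  private var calls = 0 // not thread-safe: plain read-modify-write
  def apply(x: Int): Int = { calls += 1; x * 10 }
  def callCount: Int = calls
}

object ClosureCopyDemo {
  // Round-trips an object through Java serialization, standing in for
  // "serialize on the driver, deserialize on the executor".
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Each deserialized copy starts with its own `calls` field, so calls made through one copy are invisible to another copy and to the original object.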

So my question is: will an executor access a single copy of the closure
from more than one thread? I ask because I want to know whether I can ignore
thread safety in a function I write. This gist is a simplified example in
which a thread-unsafe operation is passed to map():
https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc
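
For illustration, here is a hypothetical function of the same kind, sketched in Scala. The class names are illustrative only, not the gist's code or a Spark API. `UnsafeFormatFn` reuses one `java.text.SimpleDateFormat`, which the JDK documents as not thread-safe; `ThreadSafeFormatFn` is one defensive option that keeps a formatter per thread.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical thread-unsafe operation: a single SimpleDateFormat, which keeps
// mutable internal state, is reused on every call.
class UnsafeFormatFn extends (Long => String) with java.io.Serializable {
  private val fmt = new SimpleDateFormat("yyyy-MM-dd")
  fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
  def apply(millis: Long): String = fmt.format(new Date(millis))
}

// One defensive option: each thread gets its own formatter. The ThreadLocal is
// @transient and lazy because ThreadLocal is not Serializable; it is rebuilt
// on first use after deserialization.
class ThreadSafeFormatFn extends (Long => String) with java.io.Serializable {
  @transient private lazy val fmt: ThreadLocal[SimpleDateFormat] =
    ThreadLocal.withInitial[SimpleDateFormat](() => {
      val f = new SimpleDateFormat("yyyy-MM-dd")
      f.setTimeZone(TimeZone.getTimeZone("UTC"))
      f
    })
  def apply(millis: Long): String = fmt.get().format(new Date(millis))
}
```

In a job, either class would be passed as, for example, `rdd.map(new UnsafeFormatFn)`, where `rdd` is a hypothetical `RDD[Long]`. The per-thread version costs one formatter per thread, so it only pays off if one instance can be reached by several threads at once.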

This is for Spark Streaming, but I suspect the answer is the same for
batch and streaming.

Thanks for any help,
Matt
