In short: you cannot ignore thread safety if you use more than one core per executor. A year ago I hit a race condition with SimpleDateFormat, and I solved it using ThreadLocal.
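The ThreadLocal fix mentioned above can be sketched like this (a minimal sketch; the class name and date pattern are illustrative assumptions, not from the original thread). SimpleDateFormat keeps mutable parse state internally, so giving each executor thread its own instance avoids the race:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeDateFormat {
    // SimpleDateFormat is not thread-safe; ThreadLocal gives each thread
    // (e.g. each task thread on a multi-core executor) its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        // Epoch instant, rendered in the JVM's default time zone.
        System.out.println(format(new Date(0L)));
    }
}
```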
On Oct 5, 2016 at 20:12, "Sean Owen" <so...@cloudera.com> wrote:
> I don't think this is guaranteed and don't think I'd rely on it. Ideally
> your functions here aren't even stateful, because they could be
> reinstantiated and/or re-executed many times due to, say, failures. Not
> being stateful dodges a lot of thread-safety issues. If you're doing this
> because you have some expensive shared resource, and you're mapping,
> consider mapPartitions, and setting up the resource at the start.
>
> On Wed, Oct 5, 2016 at 5:23 PM Matthew Dailey <matthew.dail...@gmail.com>
> wrote:
>
>> Looking at the programming guide
>> <http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>
>> for Spark 1.6.1, it states:
>>
>> > Prior to execution, Spark computes the task’s closure. The closure is
>> > those variables and methods which must be visible for the executor to
>> > perform its computations on the RDD
>>
>> > The variables within the closure sent to each executor are now copies
>>
>> So my question is: will an executor access a single copy of the closure
>> with more than one thread? I ask because I want to know whether I can
>> ignore thread safety in a function I write. Take a look at this gist as a
>> simplified example with a thread-unsafe operation being passed to map():
>> https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc
>>
>> This is for Spark Streaming, but I suspect the answer is the same between
>> batch and streaming.
>>
>> Thanks for any help,
>> Matt
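Sean's mapPartitions suggestion amounts to creating the expensive, non-thread-safe resource once per partition and reusing it for every record in that partition, which is iterated single-threaded. A minimal sketch in plain Java (the class name, method name, and date-parsing logic are hypothetical; in real Spark code this body would be the function passed to `rdd.mapPartitions(...)`):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionSetup {
    // Mimics the body of a mapPartitions function: the resource is set up
    // once at the start of the partition, then reused for each record.
    static Iterator<Long> parsePartition(Iterator<String> records) {
        // One SimpleDateFormat per partition, not per record and not shared
        // across threads, so its lack of thread safety is not a problem here.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        List<Long> out = new ArrayList<>();
        while (records.hasNext()) {
            try {
                out.add(fmt.parse(records.next()).getTime());
            } catch (ParseException e) {
                throw new RuntimeException(e);
            }
        }
        return out.iterator();
    }

    public static void main(String[] args) {
        Iterator<Long> parsed =
            parsePartition(Arrays.asList("2016-10-05", "2016-10-06").iterator());
        while (parsed.hasNext()) {
            System.out.println(parsed.next());
        }
    }
}
```

Compared with creating the formatter inside a per-record map function, this pays the setup cost once per partition; compared with a shared instance in the closure, it never exposes the mutable formatter to more than one task at a time.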