In a few words: you cannot ignore thread safety if you use more than one core
per executor. A year ago I hit a race condition with
SimpleDateFormat, and I solved it using ThreadLocal.
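For illustration, here is a minimal sketch of that ThreadLocal fix (the class name and date pattern are my own, not from the original code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeDateFormat {
    // SimpleDateFormat is not thread-safe: concurrent parse/format calls
    // on a shared instance can silently corrupt results. ThreadLocal
    // gives each thread its own private instance instead.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date d) {
        return FORMAT.get().format(d);
    }

    public static Date parse(String s) {
        try {
            return FORMAT.get().parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + s, e);
        }
    }
}
```

Each executor thread that calls format() or parse() lazily gets its own SimpleDateFormat, so no synchronization is needed.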

On Oct 5, 2016, 20:12, "Sean Owen" <so...@cloudera.com> wrote:

> I don't think this is guaranteed and don't think I'd rely on it. Ideally
> your functions here aren't even stateful, because they could be
> reinstantiated and/or re-executed many times due to, say, failures. Not
> being stateful dodges a lot of thread-safety issues. If you're doing this
> because you have some expensive shared resource, and you're mapping,
> consider mapPartitions, and setting up the resource at the start.
>
> On Wed, Oct 5, 2016 at 5:23 PM Matthew Dailey <matthew.dail...@gmail.com>
> wrote:
>
>> Looking at the programming guide
>> <http://spark.apache.org/docs/1.6.1/programming-guide.html#local-vs-cluster-modes>
>> for Spark 1.6.1, it states
>> > Prior to execution, Spark computes the task’s closure. The closure is
>> those variables and methods which must be visible for the executor to
>> perform its computations on the RDD
>> > The variables within the closure sent to each executor are now copies
>>
>> So my question is, will an executor access a single copy of the closure
>> with more than one thread?  I ask because I want to know if I can ignore
>> thread-safety in a function I write.  Take a look at this gist as a
>> simplified example with a thread-unsafe operation being passed to map():
>> https://gist.github.com/matthew-dailey/4e1ab0aac580151dcfd7fbe6beab84dc
>>
>> This is for Spark Streaming, but I suspect the answer is the same between
>> batch and streaming.
>>
>> Thanks for any help,
>> Matt
>>
>
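To make Sean's mapPartitions suggestion concrete, here is a sketch of the per-partition-setup pattern in plain Java. The helper below is hypothetical: it has the same shape as a function you would pass to Spark's mapPartitions (whole partition in as an Iterator, results out as an Iterator), so the expensive resource is built once per partition rather than once per record:

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

public class PerPartitionSetup {
    // Hypothetical per-partition function: receives all records of one
    // partition as an Iterator, so the SimpleDateFormat (standing in for
    // any expensive, non-thread-safe resource) is created exactly once
    // per partition and never shared across threads.
    public static Iterator<Date> parsePartition(Iterator<String> records) {
        java.text.SimpleDateFormat fmt =
            new java.text.SimpleDateFormat("yyyy-MM-dd"); // one per partition
        List<Date> out = new ArrayList<>();
        while (records.hasNext()) {
            String record = records.next();
            try {
                out.add(fmt.parse(record));
            } catch (java.text.ParseException e) {
                throw new IllegalArgumentException("unparseable date: " + record, e);
            }
        }
        return out.iterator();
    }
}
```

In actual Spark code this would be wired up as something like rdd.mapPartitions(...) with this logic inside the passed function; the exact functional-interface signature varies by Spark version, so treat the above as the pattern rather than a drop-in snippet.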
