No, I did not mean that. What I meant was something simpler. Let's say
the ScriptEngine maintains some internal state and the function
ScriptEngine.eval(...) is not thread-safe. That is, calling
ScriptEngine.eval simultaneously from multiple threads would cause race
conditions in the internal state, and eval() could give incorrect answers.
That would be a problem if you use a ScriptEngine in a map function, because
multiple threads in a worker JVM may be running the map function
simultaneously. This is something you should be aware of when using static,
stateful objects within Spark.
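
To make that concrete, here is a minimal sketch of one way around it: give
each thread its own instance instead of sharing one static object. The
StatefulEvaluator class is a hypothetical stand-in for ScriptEngine (it is
not the real javax.script API), and the names safeEval and countMismatches
are made up for illustration. It only assumes that eval() mutates internal
state, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadEvalDemo {
    // Hypothetical stand-in for a stateful engine such as ScriptEngine:
    // eval() writes to an internal field, so a shared instance is not thread-safe.
    static final class StatefulEvaluator {
        private int scratch;

        int eval(int x) {
            scratch = x;        // internal state is written here...
            Thread.yield();     // (widens the window for another thread to interfere)
            return scratch * 2; // ...and read back here
        }
    }

    // One evaluator per thread instead of one static instance shared by all threads.
    private static final ThreadLocal<StatefulEvaluator> EVALUATOR =
            ThreadLocal.withInitial(StatefulEvaluator::new);

    // What the body of a map function would call.
    static int safeEval(int x) {
        return EVALUATOR.get().eval(x);
    }

    // Calls safeEval from several threads at once and counts wrong answers.
    static int countMismatches(int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int base = t * callsPerThread;
                results.add(pool.submit(() -> {
                    int wrong = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        if (safeEval(base + i) != 2 * (base + i)) wrong++;
                    }
                    return wrong;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(4, 1000));
    }
}
```

Because every thread only ever touches its own evaluator, the mismatch count
stays at zero. If EVALUATOR were a single shared static instance, concurrent
calls could overwrite each other's scratch value. Synchronizing around eval()
would also work, at the cost of serializing the calls.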

TD


On Sun, Dec 29, 2013 at 7:32 PM, Bao <[email protected]> wrote:

> Thanks guys, that's interesting. Though it looks like the singleton object
> is defined at the driver, Spark will actually serialize the closure and send
> it to the workers. The interesting thing is that ScriptEngine is NOT
> serializable, but as long as it hasn't been initialized Spark can serialize
> the closure fine. If I force it to initialize first, then Spark throws
> NotSerializableException.
>
> Anyway, following Christopher's suggestion to avoid references to anything
> outside the closure is better.
>
> TD, do you mean that Executors share the same SerializerInstance and there
> is a case where more than one thread calls the same closure instance?
>
> -Bao.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Stateful-RDD-tp71p97.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
