No, I did not mean that. What I meant was something simpler. Let's say the ScriptEngine maintains some internal state and the function ScriptEngine.eval(...) is not thread-safe. That is, calling ScriptEngine.eval() simultaneously from multiple threads would cause race conditions on the internal state, and eval() would give incorrect answers. That would be a problem if you use a ScriptEngine in a map function, because multiple threads in a worker JVM may be running the map function simultaneously. This is something you should be aware of when using static stateful objects within Spark.
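To make the concern concrete, here is a minimal sketch in plain Java (no Spark involved) of one common way around it: give each thread its own engine instance through a ThreadLocal. FakeEngine is a hypothetical stand-in for a stateful, non-thread-safe engine like ScriptEngine, and the names PerThreadEngine, safeEval and countWrong are made up for illustration. The thread pool only imitates several tasks running in one worker JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadEngine {

    // Hypothetical stand-in for a stateful, non-thread-safe engine.
    static final class FakeEngine {
        private int scratch; // internal state mutated on every eval

        int eval(int x) {
            scratch = x;        // write the input into internal state
            Thread.yield();     // widen the window in which a race could occur
            return scratch * 2; // read the state back; wrong if another thread overwrote it
        }
    }

    // One engine per thread, created lazily on first use by that thread.
    private static final ThreadLocal<FakeEngine> ENGINE =
            ThreadLocal.withInitial(FakeEngine::new);

    // What a map function would call instead of touching a shared static engine.
    static int safeEval(int x) {
        return ENGINE.get().eval(x);
    }

    // Runs safeEval from several threads at once and counts incorrect answers.
    static int countWrong(int threads, int perThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            futures.add(pool.submit(() -> {
                int wrong = 0;
                for (int i = 0; i < perThread; i++) {
                    int x = base + i;
                    if (safeEval(x) != 2 * x) {
                        wrong++;
                    }
                }
                return wrong;
            }));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get();
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wrong results: " + countWrong(4, 10000));
    }
}
```

Because no FakeEngine is ever shared between threads, the internal state cannot be overwritten mid-call. If all threads shared one static FakeEngine instead, eval() could occasionally return a value computed from another thread's input. In Spark, a similar effect can probably be had by creating the engine inside mapPartitions, so that each task builds its own instance.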
TD

On Sun, Dec 29, 2013 at 7:32 PM, Bao <[email protected]> wrote:

> Thanks guys, that's interesting. Though it looks like singleton object is
> defined at driver, spark actually will serialize closure and send to
> workers. The interesting thing is that ScriptEngine is NOT serializable, but
> till it hasn't been initialized spark can serialize the closure well. But if
> I force it initialize first then spark throws NotSerializeableException.
>
> Anyway, following Christopher's suggestion to avoid reference to outside
> closure is better.
>
> TD, do you mean that Executors share the same SerializerInstance and there
> is a case that more than 1 thread call the same closure instance?
>
> -Bao.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Stateful-RDD-tp71p97.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
