Just to add to Christopher's suggestion, do make sure that ScriptEngine.eval is thread-safe. If it is not, you can use ThreadLocal <http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html> to make sure there is one instance per execution thread.
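For illustration, a minimal sketch of that pattern (the Engines wrapper object and the "JavaScript" engine name are placeholders, not something from this thread):

    import javax.script.{ScriptEngine, ScriptEngineManager}

    object Engines {
      // One engine per thread, created lazily the first time that thread calls eval.
      private val local = new ThreadLocal[ScriptEngine] {
        override def initialValue(): ScriptEngine =
          new ScriptEngineManager().getEngineByName("JavaScript")
      }

      def eval(script: String): AnyRef = local.get().eval(script)
    }

With that in place the map stays as before, e.g. scripts.map(s => Engines.eval(s)), and each executor thread reuses its own engine instance.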
TD

On Fri, Dec 27, 2013 at 8:12 PM, Christopher Nguyen <[email protected]> wrote:

> Bao, as described, your use case doesn't need to invoke anything like
> custom RDDs or DStreams.
>
> In a call like
>
>     val resultRdd = scripts.map(s => ScriptEngine.eval(s))
>
> Spark will do its best to serialize/deserialize ScriptEngine to each of
> the workers---if ScriptEngine is Serializable.
>
> Now, if it makes no difference to you, consider instantiating ScriptEngine
> within the closure itself, thus obviating the need for serdes of things
> outside the closure.
>
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <http://adatao.com>
> linkedin.com/in/ctnguyen
>
>
> On Fri, Dec 27, 2013 at 7:56 PM, Bao <[email protected]> wrote:
>
>> It looks like I need to use DStream instead...
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Stateful-RDD-tp71p85.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
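As a rough sketch of the closure-local approach Christopher describes above (assuming scripts is an RDD[String]; the mapPartitions variant is my own addition to avoid creating an engine per record):

    import javax.script.ScriptEngineManager

    // The engine is built on the worker, inside the closure, so nothing has to be
    // serialized from the driver. mapPartitions gives one engine per partition
    // instead of one per element.
    val resultRdd = scripts.mapPartitions { iter =>
      val engine = new ScriptEngineManager().getEngineByName("JavaScript")
      iter.map(s => engine.eval(s))
    }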
