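You can wrap it in a holder object, e.g. object MyCoreNLP { @transient lazy val coreNLP = new StanfordCoreNLP() }, and then refer to MyCoreNLP.coreNLP from inside your map/reduce/mapPartitions functions. That should be fine (presuming it's thread safe): the object is never serialized with the task, and the lazy val is only initialized once per classloader per JVM.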
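
Here is a minimal, dependency-free sketch of that holder-object pattern, in case it helps to see the "initialized once" behaviour in isolation. Everything in it is made up for illustration: SlowAnnotator is a hypothetical stand-in for an expensive, non-serializable class such as StanfordCoreNLP, AnnotatorHolder plays the role of MyCoreNLP, and annotatePartition is just shaped like the function you would hand to mapPartitions. No Spark or CoreNLP calls are used, so treat it as a sketch under those assumptions rather than the actual setup.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive, non-serializable object
// (e.g. StanfordCoreNLP). It records how many times it is constructed.
class SlowAnnotator {
  SlowAnnotator.constructed.incrementAndGet()

  // Toy "annotation": count whitespace-separated tokens.
  def annotate(text: String): Int = text.split("\\s+").count(_.nonEmpty)
}

object SlowAnnotator {
  val constructed = new AtomicInteger(0)
}

// Holder object: the lazy val is built on first access inside each JVM,
// so the instance itself never has to travel inside a serialized closure.
object AnnotatorHolder {
  @transient lazy val annotator: SlowAnnotator = new SlowAnnotator
}

object Example {
  // Same shape as a function passed to mapPartitions:
  // Iterator in, Iterator out, annotator looked up on the worker side.
  def annotatePartition(docs: Iterator[String]): Iterator[Int] = {
    val annotator = AnnotatorHolder.annotator
    docs.map(annotator.annotate)
  }

  def main(args: Array[String]): Unit = {
    // Two pretend partitions processed in the same JVM.
    val partitions = Seq(Seq("a b c", "d e"), Seq("f"))
    val counts = partitions.flatMap(p => annotatePartition(p.iterator).toList)
    println(counts)
    println(SlowAnnotator.constructed.get)
  }
}
```

With a real RDD of strings, the equivalent call would look something like rdd.mapPartitions(Example.annotatePartition), with the holder wrapping the real pipeline instead of the stand-in. Note that if an executor runs several tasks concurrently they will share the one instance, which is why the thread-safety caveat matters.
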
On Mon, Nov 24, 2014 at 7:58 AM, Evan Sparks <evan.spa...@gmail.com> wrote:

> We have gotten this to work, but it requires instantiating the CoreNLP
> object on the worker side. Because of the initialization time, it makes a
> lot of sense to do this inside a .mapPartitions instead of a .map, for
> example.
>
> As an aside, if you're using it from Scala, have a look at sistanlp, which
> provides a nicer, Scala-friendly interface to CoreNLP.
>
> On Nov 24, 2014, at 7:46 AM, tvas <theodoros.vasilou...@gmail.com> wrote:
> >
> > Hello,
> >
> > I was wondering if anyone has gotten the Stanford CoreNLP Java library to
> > work with Spark.
> >
> > My attempts to use the parser/annotator fail with task serialization
> > errors, since the class StanfordCoreNLP cannot be serialized.
> >
> > I've tried the remedies of registering StanfordCoreNLP through Kryo, as
> > well as using chill.MeatLocker, but these still produce serialization
> > errors. Passing the StanfordCoreNLP object as transient leads to a
> > NullPointerException instead.
> >
> > Has anybody managed to get this to work?
> >
> > Regards,
> > Theodore
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Stanford-CoreNLP-tp19654.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org