Re: Spark and Stanford CoreNLP

2015-05-27 Thread mathewvinoj
Evan, could you please look into this post? The link is below. Any thoughts or suggestions are really appreciated. http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-issue-with-Stanford-NLP-td23048.html

Re: Spark and Stanford CoreNLP

2014-11-25 Thread Evan R. Sparks
Chris, thanks for stopping by! Here's a simple example. Imagine I've got a corpus of data, which is an RDD[String], and I want to do some POS tagging on it. In naive Spark, that might look like this:

    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos")
    val proc = new StanfordCoreNLP(props)
    val data =

Re: Spark and Stanford CoreNLP

2014-11-25 Thread Christopher Manning
I’m not (yet!) an active Spark user, but saw this thread on Twitter … and am involved with Stanford CoreNLP. Could someone explain how things need to be to work better with Spark, since that would be a useful goal? That is, while Stanford CoreNLP is not quite uniform (being developed by vario

Re: Spark and Stanford CoreNLP

2014-11-25 Thread Evan Sparks
If you only mark it as transient, then the object won't be serialized, and on the worker the field will be null. When the worker goes to use it, you get an NPE. Marking it lazy defers initialization to first use. If that use happens to be after serialization time (e.g. on the worker), then the
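
The difference described above can be reproduced without Spark or CoreNLP. The following is a minimal sketch, assuming plain Java serialization behaves like shipping a closure to a worker; HeavyModel, EagerHolder, LazyHolder and roundTrip are hypothetical names introduced only for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an expensive, non-serializable object such as a CoreNLP pipeline.
class HeavyModel {
  def tag(s: String): String = s.toUpperCase
}

// @transient only: the field is skipped during serialization and is null afterwards.
class EagerHolder extends Serializable {
  @transient val model: HeavyModel = new HeavyModel
}

// @transient lazy: the field is skipped, then rebuilt on first access after deserialization.
class LazyHolder extends Serializable {
  @transient lazy val model: HeavyModel = new HeavyModel
}

object TransientLazyDemo {
  // Round-trips a value through Java serialization (assumed here to mimic driver-to-worker shipping).
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val eager = roundTrip(new EagerHolder)
    val deferred = roundTrip(new LazyHolder)
    println(eager.model == null)        // the transient field was not restored
    println(deferred.model.tag("spark")) // the lazy field is initialized here, on first use
  }
}
```

Calling eager.model.tag(...) on the deserialized EagerHolder would throw the NullPointerException mentioned above, while the lazy variant simply builds a fresh HeavyModel on the receiving side.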

Re: Spark and Stanford CoreNLP

2014-11-25 Thread Theodore Vasiloudis
Great, Ian's approach seems to work fine. Can anyone provide an explanation as to why this works, but passing the CoreNLP object itself as transient does not?

Re: Spark and Stanford CoreNLP

2014-11-24 Thread Evan R. Sparks
Neat hack! This is cute and actually seems to work. The fact that it works is a little surprising and somewhat unintuitive.

On Mon, Nov 24, 2014 at 8:08 AM, Ian O'Connell wrote:
> object MyCoreNLP {
>   @transient lazy val coreNLP = new coreNLP()
> }
>
> and then refer to it from your map/redu

Re: Spark and Stanford CoreNLP

2014-11-24 Thread Evan R. Sparks
This is probably not the right venue for general questions on CoreNLP - the project website (http://nlp.stanford.edu/software/corenlp.shtml) provides documentation and links to mailing lists/stack overflow topics. On Mon, Nov 24, 2014 at 9:08 AM, Madabhattula Rajesh Kumar < mrajaf...@gmail.com> wr

Re: Spark and Stanford CoreNLP

2014-11-24 Thread Madabhattula Rajesh Kumar
Hello, I'm new to Stanford CoreNLP. Could anyone share good training material and examples (Java or Scala) on NLP?

Regards, Rajesh

On Mon, Nov 24, 2014 at 9:38 PM, Ian O'Connell wrote:
> object MyCoreNLP {
>   @transient lazy val coreNLP = new coreNLP()
> }
>
> and then refer to it from your

Re: Spark and Stanford CoreNLP

2014-11-24 Thread Ian O'Connell
    object MyCoreNLP {
      @transient lazy val coreNLP = new coreNLP()
    }

and then refer to it from your map/reduce/mapPartitions calls and it should be fine (presuming it's thread-safe); it will only be initialized once per classloader per JVM.

On Mon, Nov 24, 2014 at 7:58 AM, Evan Sparks wrote:
> We ha
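
The "initialized once per JVM" behaviour of the holder object can be checked in isolation. This is only a sketch: ExpensivePipeline, InitCounter and PipelineHolder are hypothetical stand-ins rather than CoreNLP classes, and plain threads play the role of concurrent tasks inside one executor JVM.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Counts how many times the hypothetical pipeline is constructed.
object InitCounter {
  val count = new AtomicInteger(0)
}

// Hypothetical stand-in for a pipeline that is expensive to build.
class ExpensivePipeline {
  InitCounter.count.incrementAndGet()
  def annotate(text: String): Int = text.split("\\s+").length
}

// Same shape as the holder object in the message above.
object PipelineHolder {
  @transient lazy val pipeline = new ExpensivePipeline
}

object SingletonDemo {
  def main(args: Array[String]): Unit = {
    // Four threads all touch the holder; only the first access triggers construction.
    val threads = (1 to 4).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          PipelineHolder.pipeline.annotate(s"task number $i")
          ()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(InitCounter.count.get()) // number of constructions across all threads
  }
}
```

Scala guards the initialization of a lazy val, so the constructor runs once here. That says nothing about whether the shared object is safe to call concurrently afterwards, which is the "presuming it's thread-safe" caveat in the message.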

Re: Spark and Stanford CoreNLP

2014-11-24 Thread Evan Sparks
We have gotten this to work, but it requires instantiating the CoreNLP object on the worker side. Because of the initialization time it makes a lot of sense to do this inside of a .mapPartitions instead of a .map, for example. As an aside, if you're using it from Scala, have a look at sistanlp,
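
The per-partition instantiation suggested above can be sketched without a cluster. SlowTagger and processPartition are hypothetical names, and local collections stand in for RDD partitions; with a real RDD[String] named data, the same function could be passed as data.mapPartitions(processPartition).

```scala
// Hypothetical stand-in for a pipeline that is slow to construct.
class SlowTagger {
  SlowTagger.constructed += 1
  def tokens(sentence: String): Seq[String] = sentence.split("\\s+").toSeq
}

object SlowTagger {
  // Tracks constructions so the cost per partition is visible.
  var constructed = 0
}

object PartitionDemo {
  // Has the shape mapPartitions expects: Iterator[String] => Iterator[U].
  def processPartition(lines: Iterator[String]): Iterator[Seq[String]] = {
    val tagger = new SlowTagger // built once per partition, not once per record
    lines.map(line => tagger.tokens(line))
  }

  def main(args: Array[String]): Unit = {
    // Two local sequences stand in for two partitions of an RDD[String].
    val partitions = Seq(Seq("a b", "c d e"), Seq("f"))
    val results = partitions.flatMap(p => processPartition(p.iterator).toList)
    println(results.map(_.length))  // token counts per input line
    println(SlowTagger.constructed) // one construction per partition
  }
}
```

With a plain .map the constructor would sit inside the per-record function and run once per line, which is the cost the message warns about.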