Evan,
could you please look into this post? The link is below. Any thoughts or
suggestions would be really appreciated.
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-issue-with-Stanford-NLP-td23048.html
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabb
Chris,
Thanks for stopping by! Here's a simple example. Imagine I've got a corpus
of data, which is an RDD[String], and I want to do some POS tagging on it.
In naive Spark, that might look like this:
val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, pos") // pos needs tokenize and ssplit
val proc = new StanfordCoreNLP(props)
val data =
I’m not (yet!) an active Spark user, but saw this thread on Twitter … and am
involved with Stanford CoreNLP.
Could someone explain what would need to change for CoreNLP to work better
with Spark? That would be a useful goal.
That is, while Stanford CoreNLP is not quite uniform (being developed by
vario
If you only mark it as transient, then the object won't be serialized, and on
the worker the field will be null. When the worker goes to use it, you get an
NPE.
Marking it lazy defers initialization to first use. If that use happens to be
after serialization time (e.g. on the worker), then the
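The difference between the two markings can be reproduced without Spark. The following is only a minimal sketch using plain Java serialization; `Heavy`, `EagerHolder`, `LazyHolder` and `RoundTrip` are hypothetical names made up for illustration, with `Heavy` standing in for a non-serializable object such as the CoreNLP pipeline.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an expensive, non-serializable object.
class Heavy { def tag(s: String): String = s.toUpperCase }

class EagerHolder extends Serializable {
  // Skipped during serialization and never rebuilt: null after the round trip.
  @transient val heavy: Heavy = new Heavy
}

class LazyHolder extends Serializable {
  // Skipped during serialization, then rebuilt on first access after the round trip.
  @transient lazy val heavy: Heavy = new Heavy
}

object RoundTrip {
  // In-memory serialize/deserialize, roughly what happens when an object is shipped to a worker.
  def copy[T <: AnyRef](x: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(x)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[T]
  }
}
```

After `RoundTrip.copy`, the `heavy` field of an `EagerHolder` is null, so calling a method on it throws an NPE, while a `LazyHolder` builds a fresh `Heavy` the first time the field is touched.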
Great, Ian's approach seems to work fine.
Can anyone provide an explanation as to why this works, but passing the
CoreNLP object itself
as transient does not?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Stanford-CoreNLP-tp19654p19739.html
Sent
Neat hack! This is cute and actually seems to work. The fact that it works
is a little surprising and somewhat unintuitive.
On Mon, Nov 24, 2014 at 8:08 AM, Ian O'Connell wrote:
>
> object MyCoreNLP {
> @transient lazy val coreNLP = new coreNLP()
> }
>
> and then refer to it from your map/redu
This is probably not the right venue for general questions on CoreNLP - the
project website (http://nlp.stanford.edu/software/corenlp.shtml) provides
documentation and links to mailing lists/Stack Overflow topics.
On Mon, Nov 24, 2014 at 9:08 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wr
Hello,
I'm new to Stanford CoreNLP. Could anyone share good training material and
examples (Java or Scala) on NLP?
Regards,
Rajesh
On Mon, Nov 24, 2014 at 9:38 PM, Ian O'Connell wrote:
>
> object MyCoreNLP {
> @transient lazy val coreNLP = new coreNLP()
> }
>
> and then refer to it from your
object MyCoreNLP {
@transient lazy val coreNLP = new coreNLP()
}
and then refer to it from your map/reduce/mapPartitions or the like, and it
should be fine (presuming it's thread safe); it will only be initialized once
per classloader per JVM.
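To make the "once per classloader per JVM" point concrete, here is a rough sketch in plain Scala with no Spark or CoreNLP dependency. `FakePipeline`, `PipelineHolder`, `inits` and `Demo` are hypothetical names; the counter exists only to show how many times the lazy val is built.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the expensive CoreNLP pipeline.
class FakePipeline {
  def annotate(text: String): Int = text.split("\\s+").length // token count
}

object PipelineHolder {
  // Counts constructions, for illustration only.
  val inits = new AtomicInteger(0)

  // Built on first access, then reused by every later caller in this JVM.
  @transient lazy val pipeline: FakePipeline = {
    inits.incrementAndGet()
    new FakePipeline
  }
}

object Demo {
  // Stand-in for the function body you would pass to a map over an RDD[String].
  def countTokens(docs: Seq[String]): Seq[Int] =
    docs.map(d => PipelineHolder.pipeline.annotate(d))
}
```

However many records go through `Demo.countTokens`, `PipelineHolder.inits` stays at 1 within a single JVM. On a cluster each executor JVM would do its own one-time initialization.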
On Mon, Nov 24, 2014 at 7:58 AM, Evan Sparks wrote:
> We ha
We have gotten this to work, but it requires instantiating the CoreNLP object
on the worker side. Because of the initialization time it makes a lot of sense
to do this inside of a .mapPartitions instead of a .map, for example.
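A sketch of the per-partition idea, again in plain Scala so it runs without Spark: `SlowPipeline`, `annotatePartition` and `constructions` are hypothetical names, and `annotatePartition` simply has the `Iterator[T] => Iterator[U]` shape that `mapPartitions` takes.

```scala
import java.util.concurrent.atomic.AtomicInteger

object PartitionSketch {
  // Counts how many pipelines get built, for illustration only.
  val constructions = new AtomicInteger(0)

  // Hypothetical stand-in for a slow-to-build CoreNLP pipeline.
  class SlowPipeline {
    constructions.incrementAndGet()
    def annotate(text: String): Int = text.length
  }

  // Same shape as the function mapPartitions expects: Iterator[T] => Iterator[U].
  def annotatePartition(docs: Iterator[String]): Iterator[Int] = {
    val pipeline = new SlowPipeline // built once per partition, on the worker side
    docs.map(pipeline.annotate)     // reused for every record in the partition
  }
}
```

With a real RDD this would presumably be passed as `rdd.mapPartitions(PartitionSketch.annotatePartition)`, so the construction cost is paid once per partition. Building the pipeline inside a `.map` would pay it once per record.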
As an aside, if you're using it from Scala, have a look at sistanlp,