Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-24 Thread Timur Shenkao
Hello, everybody!

Maybe it's not the cause of your problem, but I noticed this line in your
messages:
*java version "1.8.0_51"*

It's strongly advised to use Java 1.8.0_66 or later.
I even use Java 1.8.0_101 myself.
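
A quick way to check which Java actually runs the job, on the driver host
and on each executor host:

java -version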


On Tue, Sep 20, 2016 at 1:09 AM, janardhan shetty 
wrote:

> Yes Sujit, I have tried that option as well.
> I also tried sbt assembly but am hitting the issue below:
>
> http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly
>
> Just wondering if there is any clean approach to including the
> StanfordCoreNLP classes in Spark ML?
>
>
> On Mon, Sep 19, 2016 at 1:41 PM, Sujit Pal  wrote:
>
>> Hi Janardhan,
>>
>> You need the classifier "models" attribute on the second entry for
>> stanford-corenlp to indicate that you want the models JAR, as shown below.
>> Right now you are declaring the same stanford-corenlp JAR twice.
>>
>> libraryDependencies ++= {
>>   val sparkVersion = "2.0.0"
>>   Seq(
>> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>   )
>> }
>>
>> -sujit
>>
>>
>> On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty wrote:
>>
>>> Hi Sujit,
>>>
>>> Tried that option but same error:
>>>
>>> java version "1.8.0_51"
>>>
>>>
>>> libraryDependencies ++= {
>>>   val sparkVersion = "2.0.0"
>>>   Seq(
>>> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>>> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>>> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>>> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>   )
>>> }
>>>
>>> Error:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> edu/stanford/nlp/pipeline/StanfordCoreNLP
>>> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
>>> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
>>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
>>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
>>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
>>> at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
>>> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
>>> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>> at scala.collection.immutable.List.foreach(List.scala:381)
>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>>
>>>
>>>
>>> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal 
>>> wrote:
>>>
 Hi Janardhan,

 Maybe try removing the string "test" from this line in your build.sbt?
 IIRC, the "test" scope restricts the models JAR to the test classpath,
 so it is not available when the application runs.

 "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
 classifier "models",

 -sujit


 On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty <
 janardhan...@gmail.com> wrote:

> Hi,
>
> I am trying to use lemmatization as a transformer and added the following to
> the build.sbt
>
>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
> classifier "models",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>
>
> Error:
> *Exception in thread "main" java.lang.NoClassDefFoundError:
> edu/stanford/nlp/pipeline/StanfordCoreNLP*
>
> I have tried other versions of this spark package.
>
> Any help is appreciated..
>


>>>
>>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread janardhan shetty
Yes Sujit, I have tried that option as well.
I also tried sbt assembly but am hitting the issue below:

http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly

Just wondering if there is any clean approach to including the
StanfordCoreNLP classes in Spark ML?
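
A common remedy for that sbt-assembly OutOfMemoryError (a sketch in line
with the linked thread, not a verified fix for this project) is to give
sbt's JVM more headroom before building:

export SBT_OPTS="-Xmx4G -XX:MaxMetaspaceSize=1G"   # sizes are illustrative
sbt assembly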


On Mon, Sep 19, 2016 at 1:41 PM, Sujit Pal  wrote:

> Hi Janardhan,
>
> You need the classifier "models" attribute on the second entry for
> stanford-corenlp to indicate that you want the models JAR, as shown below.
> Right now you are declaring the same stanford-corenlp JAR twice.
>
> libraryDependencies ++= {
>   val sparkVersion = "2.0.0"
>   Seq(
> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>   )
> }
>
> -sujit
>
>
> On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty 
> wrote:
>
>> Hi Sujit,
>>
>> Tried that option but same error:
>>
>> java version "1.8.0_51"
>>
>>
>> libraryDependencies ++= {
>>   val sparkVersion = "2.0.0"
>>   Seq(
>> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>   )
>> }
>>
>> Error:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> edu/stanford/nlp/pipeline/StanfordCoreNLP
>> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
>> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
>> at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
>> at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
>> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
>> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>> at scala.collection.immutable.List.foreach(List.scala:381)
>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>
>>
>>
>> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal 
>> wrote:
>>
>>> Hi Janardhan,
>>>
>>> Maybe try removing the string "test" from this line in your build.sbt?
>>> IIRC, the "test" scope restricts the models JAR to the test classpath,
>>> so it is not available when the application runs.
>>>
>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
>>> classifier "models",
>>>
>>> -sujit
>>>
>>>
>>> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty <
>>> janardhan...@gmail.com> wrote:
>>>
 Hi,

 I am trying to use lemmatization as a transformer and added the following to
 the build.sbt

  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
 "com.google.protobuf" % "protobuf-java" % "2.6.1",
 "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
 classifier "models",
 "org.scalatest" %% "scalatest" % "2.2.6" % "test"


 Error:
 *Exception in thread "main" java.lang.NoClassDefFoundError:
 edu/stanford/nlp/pipeline/StanfordCoreNLP*

 I have tried other versions of this spark package.

 Any help is appreciated..

>>>
>>>
>>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread Sujit Pal
Hi Janardhan,

You need the classifier "models" attribute on the second entry for
stanford-corenlp to indicate that you want the models JAR, as shown below.
Right now you are declaring the same stanford-corenlp JAR twice.

libraryDependencies ++= {
  val sparkVersion = "2.0.0"
  Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
"com.google.protobuf" % "protobuf-java" % "2.6.1",
"edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
"org.scalatest" %% "scalatest" % "2.2.6" % "test"
  )
}

-sujit
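
One way to double-check that both artifacts resolve (sbt 0.13 syntax; a
sketch, not part of the original reply):

sbt "show compile:dependencyClasspath" | grep stanford-corenlp

Both stanford-corenlp-3.6.0.jar and stanford-corenlp-3.6.0-models.jar
should show up in the output.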


On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty 
wrote:

> Hi Sujit,
>
> Tried that option but same error:
>
> java version "1.8.0_51"
>
>
> libraryDependencies ++= {
>   val sparkVersion = "2.0.0"
>   Seq(
> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>   )
> }
>
> Error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> edu/stanford/nlp/pipeline/StanfordCoreNLP
> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
> at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>
>
>
> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal  wrote:
>
>> Hi Janardhan,
>>
>> Maybe try removing the string "test" from this line in your build.sbt?
>> IIRC, the "test" scope restricts the models JAR to the test classpath,
>> so it is not available when the application runs.
>>
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
>> "models",
>>
>> -sujit
>>
>>
>> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty <
>> janardhan...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to use lemmatization as a transformer and added the following to the
>>> build.sbt
>>>
>>>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
>>> classifier "models",
>>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>
>>>
>>> Error:
>>> *Exception in thread "main" java.lang.NoClassDefFoundError:
>>> edu/stanford/nlp/pipeline/StanfordCoreNLP*
>>>
>>> I have tried other versions of this spark package.
>>>
>>> Any help is appreciated..
>>>
>>
>>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread Jacek Laskowski
Hi Janardhan,

What's the command to build the project (sbt package or sbt assembly)?
What's the command you execute to run the application?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
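
For context on why the build command matters: "sbt package" bundles only
the project's own classes, so stanford-corenlp must then be supplied at
launch time, whereas "sbt assembly" produces a fat JAR that already
contains it. A sketch of the launch-time route (the main class and
application JAR names are placeholders):

spark-submit \
  --class com.example.LemmatizerApp \
  --packages edu.stanford.nlp:stanford-corenlp:3.6.0 \
  --jars stanford-corenlp-3.6.0-models.jar \
  target/scala-2.11/myapp_2.11-0.1.jar

The models JAR is passed via --jars because --packages coordinates cannot
express the "models" classifier.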


On Mon, Sep 19, 2016 at 2:12 AM, janardhan shetty
 wrote:
> Hi Sujit,
>
> Tried that option but same error:
>
> java version "1.8.0_51"
>
>
> libraryDependencies ++= {
>   val sparkVersion = "2.0.0"
>   Seq(
> "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
> "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>   )
> }
>
> Error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> edu/stanford/nlp/pipeline/StanfordCoreNLP
> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
> at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
> at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
> at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
> at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>
>
>
> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal  wrote:
>>
>> Hi Janardhan,
>>
>> Maybe try removing the string "test" from this line in your build.sbt?
>> IIRC, the "test" scope restricts the models JAR to the test classpath,
>> so it is not available when the application runs.
>>
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
>> "models",
>>
>> -sujit
>>
>>
>> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty
>>  wrote:
>>>
>>> Hi,
>>>
>>> I am trying to use lemmatization as a transformer and added the following to the
>>> build.sbt
>>>
>>>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
>>> "models",
>>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>
>>>
>>> Error:
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> edu/stanford/nlp/pipeline/StanfordCoreNLP
>>>
>>> I have tried other versions of this spark package.
>>>
>>> Any help is appreciated..
>>
>>
>




Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi Sujit,

Tried that option but same error:

java version "1.8.0_51"


libraryDependencies ++= {
  val sparkVersion = "2.0.0"
  Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
"com.google.protobuf" % "protobuf-java" % "2.6.1",
"edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
"org.scalatest" %% "scalatest" % "2.2.6" % "test"
  )
}

Error:

Exception in thread "main" java.lang.NoClassDefFoundError:
edu/stanford/nlp/pipeline/StanfordCoreNLP
at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
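
For reference, the trace points at a custom UnaryTransformer. A minimal
sketch of what such a Stanford-based lemmatizer could look like (this is
an assumption about Lemmatizer.scala, not the thread author's actual
code):

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}
import edu.stanford.nlp.ling.CoreAnnotations.{LemmaAnnotation, TokensAnnotation}
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}

class Lemmatizer(override val uid: String)
    extends UnaryTransformer[String, Seq[String], Lemmatizer] {

  def this() = this(Identifiable.randomUID("lemmatizer"))

  // Lazy and @transient: each executor JVM builds its own pipeline
  // instead of trying to serialize it from the driver.
  @transient private lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma")
    new StanfordCoreNLP(props)
  }

  override protected def createTransformFunc: String => Seq[String] = { text =>
    val doc = new Annotation(text)
    pipeline.annotate(doc)
    // Collect the lemma of every token in the document.
    doc.get(classOf[TokensAnnotation]).asScala
      .map(_.get(classOf[LemmaAnnotation]))
      .toSeq
  }

  override protected def outputDataType: DataType = ArrayType(StringType)
}

The NoClassDefFoundError surfaces inside createTransformFunc because that
closure is the first code on the executor to touch edu.stanford.nlp
classes, so it fails if the corenlp JAR never reached the executor
classpath.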



On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal  wrote:

> Hi Janardhan,
>
> Maybe try removing the string "test" from this line in your build.sbt?
> IIRC, the "test" scope restricts the models JAR to the test classpath,
> so it is not available when the application runs.
>
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
> "models",
>
> -sujit
>
>
> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty wrote:
>
>> Hi,
>>
>> I am trying to use lemmatization as a transformer and added the following to the
>> build.sbt
>>
>>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>> "com.google.protobuf" % "protobuf-java" % "2.6.1",
>> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
>> "models",
>> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>
>>
>> Error:
>> *Exception in thread "main" java.lang.NoClassDefFoundError:
>> edu/stanford/nlp/pipeline/StanfordCoreNLP*
>>
>> I have tried other versions of this spark package.
>>
>> Any help is appreciated..
>>
>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread Sujit Pal
Hi Janardhan,

Maybe try removing the string "test" from this line in your build.sbt?
IIRC, the "test" scope restricts the models JAR to the test classpath, so
it is not available when the application runs.

"edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
"models",

-sujit


On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty 
wrote:

> Hi,
>
> I am trying to use lemmatization as a transformer and added the following to the
> build.sbt
>
>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
> "models",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>
>
> Error:
> *Exception in thread "main" java.lang.NoClassDefFoundError:
> edu/stanford/nlp/pipeline/StanfordCoreNLP*
>
> I have tried other versions of this spark package.
>
> Any help is appreciated..
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
I also sometimes hit this error when spark-shell is used:

Caused by: edu.stanford.nlp.io.RuntimeIOException: Error while loading a
tagger model (probably missing model file)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:770)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
  at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:97)
  at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:77)
  at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:59)
  at edu.stanford.nlp.pipeline.AnnotatorFactories$4.create(AnnotatorFactories.java:290)
  ... 114 more
Caused by: java.io.IOException: Unable to open
"edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger"
as class path, filename or URL
  at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765)
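
That "Unable to open ... english-left3words-distsim.tagger" failure is the
classic symptom of the models JAR missing from the runtime classpath. One
way to hand it to spark-shell explicitly (a sketch; the local path is a
placeholder):

spark-shell \
  --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 \
  --jars /path/to/stanford-corenlp-3.6.0-models.jar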



On Sun, Sep 18, 2016 at 12:27 PM, janardhan shetty 
wrote:

> Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11
>
> On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty wrote:
>
>> Hi Jacek,
>>
>> Thanks for your response. This is the code I am trying to execute
>>
>> import org.apache.spark.sql.functions._
>> import com.databricks.spark.corenlp.functions._
>>
>> val inputd = Seq(
>>   (1, "Stanford University is located in California. ")
>> ).toDF("id", "text")
>>
>> val output = 
>> inputd.select(cleanxml(col("text"))).withColumnRenamed("UDF(text)",
>> "text")
>>
>> val out = output.select(lemma(col("text"))).withColumnRenamed("UDF(text)",
>> "text")
>>
>> output.show() works
>>
>> Error happens when I execute *out.show()*
>>
>>
>>
>> On Sun, Sep 18, 2016 at 11:58 AM, Jacek Laskowski 
>> wrote:
>>
>>> Hi Janardhan,
>>>
>>> Can you share the code that you execute? What's the command? Mind
>>> sharing the complete project on github?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty
>>>  wrote:
>>> > Hi,
>>> >
>>> > I am trying to use lemmatization as a transformer and added the
>>> > following to the build.sbt
>>> >
>>> >  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>> > "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>> > "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
>>> classifier
>>> > "models",
>>> > "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>> >
>>> >
>>> > Error:
>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> > edu/stanford/nlp/pipeline/StanfordCoreNLP
>>> >
>>> > I have tried other versions of this spark package.
>>> >
>>> > Any help is appreciated..
>>>
>>
>>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11

On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty 
wrote:

> Hi Jacek,
>
> Thanks for your response. This is the code I am trying to execute
>
> import org.apache.spark.sql.functions._
> import com.databricks.spark.corenlp.functions._
>
> val inputd = Seq(
>   (1, "Stanford University is located in California. ")
> ).toDF("id", "text")
>
> val output = 
> inputd.select(cleanxml(col("text"))).withColumnRenamed("UDF(text)",
> "text")
>
> val out = output.select(lemma(col("text"))).withColumnRenamed("UDF(text)",
> "text")
>
> output.show() works
>
> Error happens when I execute *out.show()*
>
>
>
> On Sun, Sep 18, 2016 at 11:58 AM, Jacek Laskowski  wrote:
>
>> Hi Janardhan,
>>
>> Can you share the code that you execute? What's the command? Mind
>> sharing the complete project on github?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty
>>  wrote:
>> > Hi,
>> >
>> > I am trying to use lemmatization as a transformer and added the following to the
>> > build.sbt
>> >
>> >  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>> > "com.google.protobuf" % "protobuf-java" % "2.6.1",
>> > "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test"
>> classifier
>> > "models",
>> > "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>> >
>> >
>> > Error:
>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> > edu/stanford/nlp/pipeline/StanfordCoreNLP
>> >
>> > I have tried other versions of this spark package.
>> >
>> > Any help is appreciated..
>>
>
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi Jacek,

Thanks for your response. This is the code I am trying to execute

import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

val inputd = Seq(
  (1, "Stanford University is located in California. ")
).toDF("id", "text")

val output =
inputd.select(cleanxml(col("text"))).withColumnRenamed("UDF(text)", "text")

val out = output.select(lemma(col("text"))).withColumnRenamed("UDF(text)",
"text")

output.show() works

Error happens when I execute *out.show()*
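
(A plausible explanation for that split behavior, based on which CoreNLP
annotators need model files: cleanxml relies only on the rule-based
tokenize, ssplit, and cleanxml annotators, while lemma needs the POS
tagger, whose model ships in the separate "models" artifact. That would
explain output.show() succeeding while out.show() fails.)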



On Sun, Sep 18, 2016 at 11:58 AM, Jacek Laskowski  wrote:

> Hi Janardhan,
>
> Can you share the code that you execute? What's the command? Mind
> sharing the complete project on github?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty
>  wrote:
> > Hi,
> >
> > I am trying to use lemmatization as a transformer and added the following to the
> > build.sbt
> >
> >  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> > "com.google.protobuf" % "protobuf-java" % "2.6.1",
> > "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
> > "models",
> > "org.scalatest" %% "scalatest" % "2.2.6" % "test"
> >
> >
> > Error:
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > edu/stanford/nlp/pipeline/StanfordCoreNLP
> >
> > I have tried other versions of this spark package.
> >
> > Any help is appreciated..
>


Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread Jacek Laskowski
Hi Janardhan,

Can you share the code that you execute? What's the command? Mind
sharing the complete project on github?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty
 wrote:
> Hi,
>
> I am trying to use lemmatization as a transformer and added the following to the
> build.sbt
>
>  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
> "com.google.protobuf" % "protobuf-java" % "2.6.1",
> "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier
> "models",
> "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>
>
> Error:
> Exception in thread "main" java.lang.NoClassDefFoundError:
> edu/stanford/nlp/pipeline/StanfordCoreNLP
>
> I have tried other versions of this spark package.
>
> Any help is appreciated..
