Re: Lemmatization using StanfordNLP in ML 2.0
Hello, everybody! Maybe this isn't the cause of your problem, but I noticed this line in your messages:

    java version "1.8.0_51"

It is strongly advised to use Java 1.8.0_66 or later. I use Java 1.8.0_101 myself.

On Tue, Sep 20, 2016 at 1:09 AM, janardhan shetty wrote:
> Yes Sujit, I have tried that option as well. Also tried sbt assembly but
> am hitting the issue below:
>
> http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly
>
> Just wondering if there is any clean approach to include the
> StanfordCoreNLP classes in Spark ML?
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Yes Sujit, I have tried that option as well. Also tried sbt assembly but am hitting the issue below:

http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly

Just wondering if there is any clean approach to include the StanfordCoreNLP classes in Spark ML?

On Mon, Sep 19, 2016 at 1:41 PM, Sujit Pal wrote:
> Hi Janardhan,
>
> You need the classifier "models" attribute on the second entry for
> stanford-corenlp to indicate that you want the models JAR, as shown below.
> Right now you are importing two instances of the stanford-corenlp JAR.
>
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
>
> -sujit
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Janardhan,

You need the classifier "models" attribute on the second entry for stanford-corenlp to indicate that you want the models JAR, as shown below. Right now you are importing two instances of the stanford-corenlp JAR.

    libraryDependencies ++= {
      val sparkVersion = "2.0.0"
      Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
        "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
        "com.google.protobuf" % "protobuf-java" % "2.6.1",
        "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
        "org.scalatest" %% "scalatest" % "2.2.6" % "test"
      )
    }

-sujit

On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty wrote:
> Hi Sujit,
>
> Tried that option but same error:
>
>     java version "1.8.0_51"
>
> [...]
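[Editor's sketch, not from the thread: a quick way to confirm that both the corenlp code JAR and its "models" artifact made it onto the runtime classpath is to build a CoreNLP pipeline directly, outside Spark. The annotator list below is an assumption chosen for lemmatization; it requires stanford-corenlp 3.6.0 plus its models JAR on the classpath.]

```scala
import java.util.Properties
import edu.stanford.nlp.pipeline.StanfordCoreNLP

// Fails with NoClassDefFoundError if the corenlp code JAR is missing,
// and with RuntimeIOException ("missing model file") if the code JAR is
// present but the "models" classifier JAR is not.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma")
    val pipeline = new StanfordCoreNLP(props) // loads tagger model from the models JAR
    println("CoreNLP pipeline created; models JAR is on the classpath")
  }
}
```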
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Janardhan,

What's the command to build the project (sbt package or sbt assembly)? What's the command you execute to run the application?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Sep 19, 2016 at 2:12 AM, janardhan shetty wrote:
> Hi Sujit,
>
> Tried that option but same error:
>
>     java version "1.8.0_51"
>
> [...]

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Sujit,

Tried that option but same error:

    java version "1.8.0_51"

    libraryDependencies ++= {
      val sparkVersion = "2.0.0"
      Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
        "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
        "com.google.protobuf" % "protobuf-java" % "2.6.1",
        "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
        "org.scalatest" %% "scalatest" % "2.2.6" % "test"
      )
    }

Error:

    Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/pipeline/StanfordCoreNLP
        at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
        at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
        at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
        at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
        at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)

On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal wrote:
> Hi Janardhan,
>
> Maybe try removing the string "test" from this line in your build.sbt?
> IIRC, this restricts the models JAR to the test classpath only.
>
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>
> -sujit
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Janardhan,

Maybe try removing the string "test" from this line in your build.sbt? IIRC, that scope restricts the models JAR to the test classpath, so it is not available when the application runs.

    "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",

-sujit

On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty wrote:
> Hi,
>
> I am trying to use lemmatization as a transformer and added the below to
> the build.sbt
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Also sometimes hitting this error when spark-shell is used:

    Caused by: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:770)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:97)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:77)
        at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:59)
        at edu.stanford.nlp.pipeline.AnnotatorFactories$4.create(AnnotatorFactories.java:290)
        ... 114 more
    Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
        at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765)

On Sun, Sep 18, 2016 at 12:27 PM, janardhan shetty wrote:
> Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11

On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty wrote:
> Hi Jacek,
>
> Thanks for your response. This is the code I am trying to execute:
>
> [...]
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Jacek,

Thanks for your response. This is the code I am trying to execute:

    import org.apache.spark.sql.functions._
    import com.databricks.spark.corenlp.functions._

    val inputd = Seq(
      (1, "Stanford University is located in California.")
    ).toDF("id", "text")

    val output = inputd.select(cleanxml(col("text"))).withColumnRenamed("UDF(text)", "text")

    val out = output.select(lemma(col("text"))).withColumnRenamed("UDF(text)", "text")

output.show() works. The error happens when I execute out.show().

On Sun, Sep 18, 2016 at 11:58 AM, Jacek Laskowski wrote:
> Hi Janardhan,
>
> Can you share the code that you execute? What's the command? Mind
> sharing the complete project on GitHub?
>
> [...]
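[Editor's sketch, not from the thread: a side note on the snippet above. Relying on the generated column name "UDF(text)" and renaming it afterwards is fragile, since the generated name can change between Spark versions; aliasing the expression at selection time is more robust. Assumes the same spark-corenlp functions and the `inputd` DataFrame defined above.]

```scala
import org.apache.spark.sql.functions.col
import com.databricks.spark.corenlp.functions._

// Alias each derived column directly instead of renaming "UDF(text)".
val output = inputd.select(cleanxml(col("text")).as("text"))
val out    = output.select(lemma(col("text")).as("lemmas"))
```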
Re: Lemmatization using StanfordNLP in ML 2.0
Hi Janardhan,

Can you share the code that you execute? What's the command? Mind sharing the complete project on GitHub?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty wrote:
> Hi,
>
> I am trying to use lemmatization as a transformer and added the below to
> the build.sbt
>
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>
> Error:
>
>     Exception in thread "main" java.lang.NoClassDefFoundError:
>     edu/stanford/nlp/pipeline/StanfordCoreNLP
>
> I have tried other versions of this Spark package.
>
> Any help is appreciated.