Re: Upgrade the scala code using the most updated Spark version
Hi, Thanks everybody to help me to solve my problem :) As Zhu said, I had to use mapPartitionsWithIndex in my code. Thanks, Have a nice day, Anahita On Wed, Mar 29, 2017 at 2:51 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com > wrote: > mapPartitionsWithSplit was removed in Spark 2.0.0. You can > use mapPartitionsWithIndex instead. > > On Tue, Mar 28, 2017 at 3:52 PM, Anahita Talebi <anahita.t.am...@gmail.com > > wrote: > >> Thanks. >> I tried this one, as well. Unfortunately I still get the same error. >> >> >> On Wednesday, March 29, 2017, Marco Mistroni <mmistr...@gmail.com> wrote: >> >>> 1.7.5 >>> >>> On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.am...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> Thanks for your answer. >>>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file? >>>> I think the problem might come from this part. >>>> >>>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com> >>>> wrote: >>>> >>>>> Hello >>>>> uhm ihave a project whose build,sbt is closest to yours, where i am >>>>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it >>>>> works fine >>>>> in my projects though i don thave any of the following libraries that >>>>> you mention >>>>> - breeze >>>>> - netlib,all >>>>> - scoopt >>>>> >>>>> hth >>>>> >>>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi < >>>>> anahita.t.am...@gmail.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Thanks for your answer. >>>>>> >>>>>> I first changed the scala version to 2.11.8 and kept the spark >>>>>> version 1.5.2 (old version). Then I changed the scalatest version into >>>>>> "3.0.1". With this configuration, I could run the code and compile it and >>>>>> generate the .jar file. >>>>>> >>>>>> When I changed the spark version into 2.1.0, I get the same error as >>>>>> before. So I imagine the problem should be somehow related to the version >>>>>> of spark. >>>>>> >>>>>> Cheers, >>>>>> Anahita >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> import AssemblyKeys._ >>>>>> >>>>>> assemblySettings >>>>>> >>>>>> name := "proxcocoa" >>>>>> >>>>>> version := "0.1" >>>>>> >>>>>> organization := "edu.berkeley.cs.amplab" >>>>>> >>>>>> scalaVersion := "2.11.8" >>>>>> >>>>>> parallelExecution in Test := false >>>>>> >>>>>> { >>>>>> val excludeHadoop = ExclusionRule(organization = >>>>>> "org.apache.hadoop") >>>>>> libraryDependencies ++= Seq( >>>>>> "org.slf4j" % "slf4j-api" % "1.7.2", >>>>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2", >>>>>> "org.scalatest" %% "scalatest" % "3.0.1" % "test", >>>>>> "org.apache.spark" %% "spark-core" % "2.1.0" >>>>>> excludeAll(excludeHadoop), >>>>>> "org.apache.spark" %% "spark-mllib" % "2.1.0" >>>>>> excludeAll(excludeHadoop), >>>>>> "org.apache.spark" %% "spark-sql" % "2.1.0" >>>>>> excludeAll(excludeHadoop), >>>>>> "org.apache.commons" % "commons-compress" % "1.7", >>>>>> "commons-io" % "commons-io" % "2.4", >>>>>> "org.scalanlp" % "breeze_2.11" % "0.11.2", >>>>>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), >>>>>&
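For readers hitting the same compile error reported further down this thread: mapPartitionsWithSplit was removed in Spark 2.0.0, and mapPartitionsWithIndex takes the same (partition index, iterator) pair, so the change is essentially a rename. A minimal, self-contained sketch of the migration — the input path and the function body are illustrative assumptions, not the actual proxcocoa code — might look like:

import org.apache.spark.sql.SparkSession

object MapPartitionsMigrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("migration-sketch").master("local[*]").getOrCreate()
    val data = spark.sparkContext.textFile("data/small_train.dat") // hypothetical input path

    // Spark 1.x (removed in 2.0.0):
    //   val sizes = data.mapPartitionsWithSplit { case (i, lines) => ... }
    // Spark 2.x -- same (partition index, iterator) contract, only the method name changes:
    val sizes = data.mapPartitionsWithIndex { case (i, lines) =>
      Iterator((i, lines.size)) // illustrative body: number of lines held by partition i
    }.collect()

    sizes.foreach { case (i, n) => println(s"partition $i: $n lines") }
    spark.stop()
  }
}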
Re: Upgrade the scala code using the most updated Spark version
Thanks. I tried this one, as well. Unfortunately I still get the same error. On Wednesday, March 29, 2017, Marco Mistroni <mmistr...@gmail.com> wrote: > 1.7.5 > > On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.am...@gmail.com > <javascript:_e(%7B%7D,'cvml','anahita.t.am...@gmail.com');>> wrote: > >> Hi, >> >> Thanks for your answer. >> What is the version of "org.slf4j" % "slf4j-api" in your sbt file? >> I think the problem might come from this part. >> >> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com >> <javascript:_e(%7B%7D,'cvml','mmistr...@gmail.com');>> wrote: >> >>> Hello >>> uhm ihave a project whose build,sbt is closest to yours, where i am >>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it >>> works fine >>> in my projects though i don thave any of the following libraries that >>> you mention >>> - breeze >>> - netlib,all >>> - scoopt >>> >>> hth >>> >>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi < >>> anahita.t.am...@gmail.com >>> <javascript:_e(%7B%7D,'cvml','anahita.t.am...@gmail.com');>> wrote: >>> >>>> Hi, >>>> >>>> Thanks for your answer. >>>> >>>> I first changed the scala version to 2.11.8 and kept the spark version >>>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1". >>>> With this configuration, I could run the code and compile it and generate >>>> the .jar file. >>>> >>>> When I changed the spark version into 2.1.0, I get the same error as >>>> before. So I imagine the problem should be somehow related to the version >>>> of spark. >>>> >>>> Cheers, >>>> Anahita >>>> >>>> >>>> >>>> >>>> import AssemblyKeys._ >>>> >>>> assemblySettings >>>> >>>> name := "proxcocoa" >>>> >>>> version := "0.1" >>>> >>>> organization := "edu.berkeley.cs.amplab" >>>> >>>> scalaVersion := "2.11.8" >>>> >>>> parallelExecution in Test := false >>>> >>>> { >>>> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") >>>> libraryDependencies ++= Seq( >>>> "org.slf4j" % "slf4j-api" % "1.7.2", >>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2", >>>> "org.scalatest" %% "scalatest" % "3.0.1" % "test", >>>> "org.apache.spark" %% "spark-core" % "2.1.0" >>>> excludeAll(excludeHadoop), >>>> "org.apache.spark" %% "spark-mllib" % "2.1.0" >>>> excludeAll(excludeHadoop), >>>> "org.apache.spark" %% "spark-sql" % "2.1.0" >>>> excludeAll(excludeHadoop), >>>> "org.apache.commons" % "commons-compress" % "1.7", >>>> "commons-io" % "commons-io" % "2.4", >>>> "org.scalanlp" % "breeze_2.11" % "0.11.2", >>>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), >>>> "com.github.scopt" %% "scopt" % "3.3.0" >>>> ) >>>> } >>>> >>>> { >>>> val defaultHadoopVersion = "1.0.4" >>>> val hadoopVersion = >>>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", >>>> defaultHadoopVersion) >>>> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % >>>> hadoopVersion >>>> } >>>> >>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" >>>> >>>> resolvers ++= Seq( >>>> "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + >>>> ".m2/repository", >>>> "Typesafe" at "http://repo.typesafe.com/typesafe/releases;, >&
Re: Upgrade the scala code using the most updated Spark version
Hello again, I just tried to change the version to 3.0.0 and remove the libraries breeze, netlib and scoopt but I still get the same error. On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hello > uhm ihave a project whose build,sbt is closest to yours, where i am using > spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine > in my projects though i don thave any of the following libraries that you > mention > - breeze > - netlib,all > - scoopt > > hth > > On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com > > wrote: > >> Hi, >> >> Thanks for your answer. >> >> I first changed the scala version to 2.11.8 and kept the spark version >> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1". >> With this configuration, I could run the code and compile it and generate >> the .jar file. >> >> When I changed the spark version into 2.1.0, I get the same error as >> before. So I imagine the problem should be somehow related to the version >> of spark. >> >> Cheers, >> Anahita >> >> >> >> >> import AssemblyKeys._ >> >> assemblySettings >> >> name := "proxcocoa" >> >> version := "0.1" >> >> organization := "edu.berkeley.cs.amplab" >> >> scalaVersion := "2.11.8" >> >> parallelExecution in Test := false >> >> { >> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") >> libraryDependencies ++= Seq( >> "org.slf4j" % "slf4j-api" % "1.7.2", >> "org.slf4j" % "slf4j-log4j12" % "1.7.2", >> "org.scalatest" %% "scalatest" % "3.0.1" % "test", >> "org.apache.spark" %% "spark-core" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" %% "spark-mllib" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop), >> "org.apache.commons" % "commons-compress" % "1.7", >> "commons-io" % "commons-io" % "2.4", >> "org.scalanlp" % "breeze_2.11" % "0.11.2", >> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), >> "com.github.scopt" %% "scopt" % "3.3.0" >> ) >> } >> >> { >> val defaultHadoopVersion = "1.0.4" >> val hadoopVersion = >> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", >> defaultHadoopVersion) >> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % >> hadoopVersion >> } >> >> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" >> >> resolvers ++= Seq( >> "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + >> ".m2/repository", >> "Typesafe" at "http://repo.typesafe.com/typesafe/releases;, >> "Spray" at "http://repo.spray.cc; >> ) >> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => >> { >> case PathList("javax", "servlet", xs @ _*) => >> MergeStrategy.first >> case PathList(ps @ _*) if ps.last endsWith ".html" => >> MergeStrategy.first >> case "application.conf" => >> MergeStrategy.concat >> case "reference.conf"=> >> MergeStrategy.concat >> case "log4j.properties" => >> MergeStrategy.discard >> case m if m.toLowerCase.endsWith("manifest.mf") => >> MergeStrategy.discard >> case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => >> MergeStrategy.discard >> case _ => MergeStrategy.first >> } >> } >> >> test in assembly := {} >> >> >> >
Re: Upgrade the scala code using the most updated Spark version
Hi, Thanks for your answer. What is the version of "org.slf4j" % "slf4j-api" in your sbt file? I think the problem might come from this part. On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hello > uhm ihave a project whose build,sbt is closest to yours, where i am using > spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine > in my projects though i don thave any of the following libraries that you > mention > - breeze > - netlib,all > - scoopt > > hth > > On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com > > wrote: > >> Hi, >> >> Thanks for your answer. >> >> I first changed the scala version to 2.11.8 and kept the spark version >> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1". >> With this configuration, I could run the code and compile it and generate >> the .jar file. >> >> When I changed the spark version into 2.1.0, I get the same error as >> before. So I imagine the problem should be somehow related to the version >> of spark. >> >> Cheers, >> Anahita >> >> >> >> >> import AssemblyKeys._ >> >> assemblySettings >> >> name := "proxcocoa" >> >> version := "0.1" >> >> organization := "edu.berkeley.cs.amplab" >> >> scalaVersion := "2.11.8" >> >> parallelExecution in Test := false >> >> { >> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") >> libraryDependencies ++= Seq( >> "org.slf4j" % "slf4j-api" % "1.7.2", >> "org.slf4j" % "slf4j-log4j12" % "1.7.2", >> "org.scalatest" %% "scalatest" % "3.0.1" % "test", >> "org.apache.spark" %% "spark-core" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" %% "spark-mllib" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop), >> "org.apache.commons" % "commons-compress" % "1.7", >> "commons-io" % "commons-io" % "2.4", >> "org.scalanlp" % "breeze_2.11" % "0.11.2", >> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), >> "com.github.scopt" %% "scopt" % "3.3.0" >> ) >> } >> >> { >> val defaultHadoopVersion = "1.0.4" >> val hadoopVersion = >> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", >> defaultHadoopVersion) >> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % >> hadoopVersion >> } >> >> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" >> >> resolvers ++= Seq( >> "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + >> ".m2/repository", >> "Typesafe" at "http://repo.typesafe.com/typesafe/releases;, >> "Spray" at "http://repo.spray.cc; >> ) >> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => >> { >> case PathList("javax", "servlet", xs @ _*) => >> MergeStrategy.first >> case PathList(ps @ _*) if ps.last endsWith ".html" => >> MergeStrategy.first >> case "application.conf" => >> MergeStrategy.concat >> case "reference.conf"=> >> MergeStrategy.concat >> case "log4j.properties" => >> MergeStrategy.discard >> case m if m.toLowerCase.endsWith("manifest.mf") => >> MergeStrategy.discard >> case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => >> MergeStrategy.discard >> case _ => MergeStrategy.first >> } >> } >> >> test in assembly := {} >> >> >>
Re: Upgrade the scala code using the most updated Spark version
Hi, Thanks for your answer. I first changed the scala version to 2.11.8 and kept the spark version 1.5.2 (old version). Then I changed the scalatest version into "3.0.1". With this configuration, I could run the code and compile it and generate the .jar file. When I changed the spark version into 2.1.0, I get the same error as before. So I imagine the problem should be somehow related to the version of spark. Cheers, Anahita import AssemblyKeys._ assemblySettings name := "proxcocoa" version := "0.1" organization := "edu.berkeley.cs.amplab" scalaVersion := "2.11.8" parallelExecution in Test := false { val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") libraryDependencies ++= Seq( "org.slf4j" % "slf4j-api" % "1.7.2", "org.slf4j" % "slf4j-log4j12" % "1.7.2", "org.scalatest" %% "scalatest" % "3.0.1" % "test", "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop), "org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop), "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop), "org.apache.commons" % "commons-compress" % "1.7", "commons-io" % "commons-io" % "2.4", "org.scalanlp" % "breeze_2.11" % "0.11.2", "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), "com.github.scopt" %% "scopt" % "3.3.0" ) } { val defaultHadoopVersion = "1.0.4" val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion) libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion } libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0" resolvers ++= Seq( "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository", "Typesafe" at "http://repo.typesafe.com/typesafe/releases;, "Spray" at "http://repo.spray.cc; ) mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => { case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first case "application.conf" => MergeStrategy.concat case "reference.conf"=> MergeStrategy.concat case "log4j.properties" => MergeStrategy.discard case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard case _ => MergeStrategy.first } } test in assembly := {} On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hello > that looks to me like there's something dodgy withyour Scala installation > Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i > suggest you change one thing at a time in your sbt > First Spark version. run it and see if it works > Then amend the scala version > > hth > marco > > On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com > > wrote: > >> Hello, >> >> Thanks you all for your informative answers. >> I actually changed the scala version to the 2.11.8 and spark version into >> 2.1.0 in the build.sbt >> >> Except for these two guys (scala and spark version), I kept the same >> values for the rest in the build.sbt file. >> >> --- >> import AssemblyKeys._ >> >> assemblySettings >> >> name := "proxcocoa" >> >> version := "0.1" >> >> scalaVersion := "2.11.8" >> >> parallelExecution in Test := false >> >> { >> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") >> libraryDependencies ++= Seq( >> "org.slf4j" % "slf4j-api" % "1.7.2", >> "org.slf4j" % "slf4j-l
Re: Upgrade the scala code using the most updated Spark version
Hi, Thanks for your answer. I just changes the sbt file and set the scala version to 2.10.4 But I still get the same error [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.10/classes... [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String] [error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) => [error] ^ Thanks, Anahita On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hello > that looks to me like there's something dodgy withyour Scala installation > Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i > suggest you change one thing at a time in your sbt > First Spark version. run it and see if it works > Then amend the scala version > > hth > marco > > On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com > > wrote: > >> Hello, >> >> Thanks you all for your informative answers. >> I actually changed the scala version to the 2.11.8 and spark version into >> 2.1.0 in the build.sbt >> >> Except for these two guys (scala and spark version), I kept the same >> values for the rest in the build.sbt file. >> >> --- >> import AssemblyKeys._ >> >> assemblySettings >> >> name := "proxcocoa" >> >> version := "0.1" >> >> scalaVersion := "2.11.8" >> >> parallelExecution in Test := false >> >> { >> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") >> libraryDependencies ++= Seq( >> "org.slf4j" % "slf4j-api" % "1.7.2", >> "org.slf4j" % "slf4j-log4j12" % "1.7.2", >> "org.scalatest" %% "scalatest" % "1.9.1" % "test", >> "org.apache.spark" % "spark-core_2.11" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.spark" % "spark-sql_2.11" % "2.1.0" >> excludeAll(excludeHadoop), >> "org.apache.commons" % "commons-compress" % "1.7", >> "commons-io" % "commons-io" % "2.4", >> "org.scalanlp" % "breeze_2.11" % "0.11.2", >> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(), >> "com.github.scopt" %% "scopt" % "3.3.0" >> ) >> } >> >> { >> val defaultHadoopVersion = "1.0.4" >> val hadoopVersion = >> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", >> defaultHadoopVersion) >> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % >> hadoopVersion >> } >> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % >> "2.1.0" >> >> resolvers ++= Seq( >> "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + >> ".m2/repository", >> "Typesafe" at "http://repo.typesafe.com/typesafe/releases;, >> "Spray" at "http://repo.spray.cc; >> ) >> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => >> { >> case PathList("javax", "servlet", xs @ _*) => >> MergeStrategy.first >> case PathList(ps @ _*) if ps.last endsWith ".html" => >> MergeStrategy.first >> case "application.conf" => >> MergeStrategy.concat >> case "reference.conf"=> >> MergeStrategy.concat >> case "log4j.properties" => >> MergeStrategy.discard >> case m if m.toLowerCase.endsWith("manifest.mf") => >> MergeStrategy.discard >> case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => >> MergeStrategy.discard >> case _ => MergeStrategy.first >> } >> } >> >> test in assembly := {} >> >> >> When I compile the code, I get the following error: >>
Re: Upgrade the scala code using the most updated Spark version
are compiled against a certain Scala version. Java > libraries are unaffected (have nothing to do with Scala), e.g. for > `slf4j` one only uses single `%`s: > > "org.slf4j" % "slf4j-api" % "1.7.2" > > Cheers, > Dinko > > On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > > check these versions > > > > function create_build_sbt_file { > > BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt > > [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE} > > cat >> $BUILD_SBT_FILE << ! > > lazy val root = (project in file(".")). > > settings( > > name := "${APPLICATION}", > > version := "1.0", > > scalaVersion := "2.11.8", > > mainClass in Compile := Some("myPackage.${APPLICATION}") > > ) > > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % > > "provided" > > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % > > "provided" > > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % > > "provided" > > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" > % > > "provided" > > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % > > "1.6.1" % "provided" > > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2" > > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % > > "4.6.0-HBase-1.0" > > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3" > > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3" > > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3" > > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3" > > // META-INF discarding > > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => > >{ > > case PathList("META-INF", xs @ _*) => MergeStrategy.discard > > case x => MergeStrategy.first > >} > > } > > ! > > } > > > > HTH > > > > Dr Mich Talebzadeh > > > > > > > > LinkedIn > > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd > OABUrV8Pw > > > > > > > > http://talebzadehmich.wordpress.com > > > > > > Disclaimer: Use it at your own risk. Any and all responsibility for any > > loss, damage or destruction of data or any other property which may arise > > from relying on this email's technical content is explicitly disclaimed. > The > > author will in no case be liable for any monetary damages arising from > such > > loss, damage or destruction. > > > > > > > > > > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote: > >> > >> Usually you define the dependencies to the Spark library as provided. > You > >> also seem to mix different Spark versions which should be avoided. > >> The Hadoop library seems to be outdated and should also only be > provided. > >> > >> The other dependencies you could assemble in a fat jar. > >> > >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com> > >> wrote: > >> > >> Hi friends, > >> > >> I have a code which is written in Scala. The scala version 2.10.4 and > >> Spark version 1.5.2 are used to run the code. > >> > >> I would like to upgrade the code to the most updated version of spark, > >> meaning 2.1.0. 
> >> > >> Here is the build.sbt: > >> > >> import AssemblyKeys._ > >> > >> assemblySettings > >> > >> name := "proxcocoa" > >> > >> version := "0.1" > >> > >> scalaVersion := "2.10.4" > >> > >> parallelExecution in Test := false > >> > >> { > >> val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop") > >> libraryDependencies ++= Seq( > >> "org.slf4j" % "slf4j-api" % "1.7.2", > >> "org.slf4j&quo
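To make both pieces of advice in this sub-thread concrete — Dinko's point that %% appends the Scala binary suffix while plain % is for Java libraries, and Jörn's suggestion to mark the Spark dependencies as "provided" — a hedged sbt fragment could look like the following (version numbers are taken from elsewhere in this thread, not verified against the actual project):

// %% appends the Scala binary version (e.g. _2.11) to the artifact name, so it is used for
// cross-built Scala libraries; plain % is for Java libraries such as slf4j.
// "provided" keeps the Spark artifacts out of the assembly jar, since the cluster supplies them.
libraryDependencies ++= Seq(
  "org.slf4j"        %  "slf4j-api"   % "1.7.5",
  "org.scalatest"    %% "scalatest"   % "3.0.1" % "test",
  "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.0" % "provided"
)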
Upgrade the scala code using the most updated Spark version
Hi friends,

I have some code written in Scala. Scala version 2.10.4 and Spark version 1.5.2 are used to run it. I would like to upgrade the code to the most recent version of Spark, meaning 2.1.0.

Here is the build.sbt:

import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

scalaVersion := "2.10.4"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
    "org.slf4j" % "slf4j-api" % "1.7.2",
    "org.slf4j" % "slf4j-log4j12" % "1.7.2",
    "org.scalatest" %% "scalatest" % "1.9.1" % "test",
    "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
    "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
    "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
    "org.apache.commons" % "commons-compress" % "1.7",
    "commons-io" % "commons-io" % "2.4",
    "org.scalanlp" % "breeze_2.10" % "0.11.2",
    "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
    "com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion = scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
}

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
  "Spray" at "http://repo.spray.cc"
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "reference.conf" => MergeStrategy.concat
    case "log4j.properties" => MergeStrategy.discard
    case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
    case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
    case _ => MergeStrategy.first
  }
}

test in assembly := {}

---

I downloaded Spark 2.1.0 and changed the Spark version and scalaVersion in the build.sbt, but unfortunately I failed to run the code. Does anybody know how I can upgrade the code to the most recent Spark version by changing the build.sbt file? Or do you have any other suggestion?

Thanks a lot,
Anahita
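The rest of the thread converges on a small set of edits to the build.sbt above. As a rough sketch — assuming the sbt-assembly setup and the excludeAll(excludeHadoop) rules otherwise stay as in the original — the settings that change for Scala 2.11 / Spark 2.1.0 are approximately:

scalaVersion := "2.11.8"  // was 2.10.4

// scalatest 1.9.1 is not published for Scala 2.11; 3.0.x is.
libraryDependencies += "org.scalatest"    %% "scalatest"       % "3.0.1" % "test"
// %% picks the _2.11 artifact automatically, so the explicit _2.10 suffixes go away.
libraryDependencies += "org.apache.spark" %% "spark-core"      % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib"     % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql"       % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
libraryDependencies += "org.scalanlp"     %  "breeze_2.11"     % "0.11.2"

// Note: one source change is also needed -- mapPartitionsWithSplit was removed in Spark 2.0.0
// and becomes mapPartitionsWithIndex (see the messages earlier in this digest).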
Re: How to run a spark on Pycharm
Hi,

Thanks for your answer. Sorry, I am a complete beginner at running code in Spark. Could you please tell me in a bit more detail how to do that? I installed IPython and Jupyter notebook on my local machine, but how can I run the code using them? Earlier, I tried to run the code with PyCharm but failed.

Thanks,
Anahita

On Fri, Mar 3, 2017 at 3:48 PM, Pushkar.Gujar <pushkarvgu...@gmail.com> wrote:
> Jupyter notebook/IPython can be connected to Apache Spark.
>
> Thank you,
> Pushkar Gujar
>
> On Fri, Mar 3, 2017 at 9:43 AM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>> Hi everyone,
>>
>> I am trying to run Spark code in PyCharm. I tried to give the path of Spark as an environment variable in the PyCharm run configuration. Unfortunately, I get an error. Does anyone know how I can run the Spark code in PyCharm?
>> It doesn't necessarily have to be PyCharm; if you know any other software, it would be nice to tell me.
>>
>> Thanks a lot,
>> Anahita
How to run a spark on Pycharm
Hi everyone,

I am trying to run Spark code in PyCharm. I tried to give the path of Spark as an environment variable in the PyCharm run configuration. Unfortunately, I get an error. Does anyone know how I can run the Spark code in PyCharm?

It doesn't necessarily have to be PyCharm; if you know any other software, it would be nice to tell me.

Thanks a lot,
Anahita
Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException
You're welcome. You need to specify the class. I meant like this:

spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"

On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
> Thank you, it is still not working:
>
> [image: Inline image 1]
>
> By the way, here is the original source:
> https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
>
> Sincerely yours,
> Raymond
>
> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>> Hi,
>>
>> I think if you remove --jars, it will work. Like:
>>
>> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>
>> I had the same problem before and solved it by removing --jars.
>>
>> Cheers,
>> Anahita
>>
>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>>> I am doing Spark streaming on a Hortonworks sandbox and am stuck here now. Can anyone tell me what's wrong with the following commands, what causes the exceptions, and how I can fix them? Thank you very much in advance.
>>>
>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> No main class set in JAR; please specify one with --class
>>>
>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
>>> spark-submit --class /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>
>>> Sincerely yours,
>>> Raymond
Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException
Hi,

I think if you remove --jars, it will work. Like:

spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar

I had the same problem before and solved it by removing --jars.

Cheers,
Anahita

On Saturday, February 25, 2017, Raymond Xie wrote:
> I am doing Spark streaming on a Hortonworks sandbox and am stuck here now. Can anyone tell me what's wrong with the following commands, what causes the exceptions, and how I can fix them? Thank you very much in advance.
>
> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> No main class set in JAR; please specify one with --class
>
> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>
> spark-submit --class /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>
> Sincerely yours,
> Raymond
submit a spark code on google cloud
Hello friends,

I am trying to run Spark code on multiple machines. To this end, I submit the Spark code as a job on the Google Cloud platform:
https://cloud.google.com/dataproc/docs/guides/submit-job

I have created a cluster with 6 nodes. Does anyone know how I can tell which nodes participate when I run the code on the cluster?

Thanks a lot,
Anahita
Re: Running a spark code on multiple machines using google cloud platform
Thanks for your answer. Do you mean Amazon EMR?

On Thu, Feb 2, 2017 at 2:30 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> You can use EMR if you want to run on a cluster.
> Kr
>
> On 2 Feb 2017 12:30 pm, "Anahita Talebi" <anahita.t.am...@gmail.com> wrote:
>> Dear all,
>>
>> I am trying to run Spark code on multiple machines using submit job in the Google Cloud platform. As the inputs of my code, I have a training and a testing dataset.
>>
>> When I use a small training data set (about 10 kB), the code runs successfully on Google Cloud, while with a large data set (about 50 GB) I receive the following error:
>>
>> 17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)
>>
>> Can anyone give me a hint on how I can solve my problem?
>>
>> PS: I cannot use a small training data set because I have an optimization code which needs to use all the data. I have to use the Google Cloud platform because I need to run the code on multiple machines.
>>
>> Thanks a lot,
>> Anahita
Running a spark code on multiple machines using google cloud platform
Dear all,

I am trying to run Spark code on multiple machines using submit job in the Google Cloud platform. As the inputs of my code, I have a training and a testing dataset.

When I use a small training data set (about 10 kB), the code runs successfully on Google Cloud, while with a large data set (about 50 GB) I receive the following error:

17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)

Can anyone give me a hint on how I can solve my problem?

PS: I cannot use a small training data set because I have an optimization code which needs to use all the data. I have to use the Google Cloud platform because I need to run the code on multiple machines.

Thanks a lot,
Anahita
Re: Running a spark code using submit job in google cloud platform
Hello, Thanks a lot Dinko. Yes, now it is working perfectly. Cheers, Anahita On Fri, Jan 13, 2017 at 2:19 PM, Dinko Srkoč <dinko.sr...@gmail.com> wrote: > On 13 January 2017 at 13:55, Anahita Talebi <anahita.t.am...@gmail.com> > wrote: > > Hi, > > > > Thanks for your answer. > > > > I have chose "Spark" in the "job type". There is not any option where we > can > > choose the version. How I can choose different version? > > There's "Preemptible workers, bucket, network, version, > initialization, & access options" link just above the "Create" and > "Cancel" buttons on the "Create a cluster" page. When you click it, > you'll find "Image version" field where you can enter the image > version. > > Dataproc versions: > * 1.1 would be Spark 2.0.2, > * 1.0 includes Spark 1.6.2 > > More about versions can be found here: > https://cloud.google.com/dataproc/docs/concepts/dataproc-versions > > Cheers, > Dinko > > > > > Thanks, > > Anahita > > > > > > On Thu, Jan 12, 2017 at 6:39 PM, A Shaikh <shaikh.af...@gmail.com> > wrote: > >> > >> You may have tested this code on Spark version on your local machine > >> version of which may be different to whats in Google Cloud Storage. > >> You need to select appropraite Spark version when you submit your job. > >> > >> On 12 January 2017 at 15:51, Anahita Talebi <anahita.t.am...@gmail.com> > >> wrote: > >>> > >>> Dear all, > >>> > >>> I am trying to run a .jar file as a job using submit job in google > cloud > >>> console. > >>> https://cloud.google.com/dataproc/docs/guides/submit-job > >>> > >>> I actually ran the spark code on my local computer to generate a .jar > >>> file. Then in the Argument folder, I give the value of the arguments > that I > >>> used in the spark code. One of the argument is training data set that > I put > >>> in the same bucket that I save my .jar file. In the bucket, I put only > the > >>> .jar file, training dataset and testing dataset. > >>> > >>> Main class or jar > >>> gs://Anahita/test.jar > >>> > >>> Arguments > >>> > >>> --lambda=.001 > >>> --eta=1.0 > >>> --trainFile=gs://Anahita/small_train.dat > >>> --testFile=gs://Anahita/small_test.dat > >>> > >>> The problem is that when I run the job I get the following error and > >>> actually it cannot read my training and testing data sets. > >>> > >>> Exception in thread "main" java.lang.NoSuchMethodError: > >>> org.apache.spark.rdd.RDD.coalesce(IZLscala/math/ > Ordering;)Lorg/apache/spark/rdd/RDD; > >>> > >>> Can anyone help me how I can solve this problem? > >>> > >>> Thanks, > >>> > >>> Anahita > >>> > >>> > >> > > >
Re: Running a spark code using submit job in google cloud platform
Hi,

Thanks for your answer. I chose "Spark" as the job type, but there isn't any option where we can choose the version. How can I choose a different version?

Thanks,
Anahita

On Thu, Jan 12, 2017 at 6:39 PM, A Shaikh <shaikh.af...@gmail.com> wrote:
> You may have tested this code against the Spark version on your local machine, which may be different from what's in Google Cloud Storage. You need to select the appropriate Spark version when you submit your job.
>
> On 12 January 2017 at 15:51, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>> Dear all,
>>
>> I am trying to run a .jar file as a job using submit job in the Google Cloud console:
>> https://cloud.google.com/dataproc/docs/guides/submit-job
>>
>> I actually ran the Spark code on my local computer to generate a .jar file. Then, in the Arguments field, I give the values of the arguments that I used in the Spark code. One of the arguments is the training data set, which I put in the same bucket where I saved my .jar file. In the bucket, I put only the .jar file, the training dataset, and the testing dataset.
>>
>> Main class or jar
>> gs://Anahita/test.jar
>>
>> Arguments
>> --lambda=.001
>> --eta=1.0
>> --trainFile=gs://Anahita/small_train.dat
>> --testFile=gs://Anahita/small_test.dat
>>
>> The problem is that when I run the job I get the following error, and it cannot actually read my training and testing data sets:
>>
>> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.coalesce(IZLscala/math/Ordering;)Lorg/apache/spark/rdd/RDD;
>>
>> Can anyone help me solve this problem?
>>
>> Thanks,
>> Anahita
Running a spark code using submit job in google cloud platform
Dear all,

I am trying to run a .jar file as a job using submit job in the Google Cloud console:
https://cloud.google.com/dataproc/docs/guides/submit-job

I actually ran the Spark code on my local computer to generate a .jar file. Then, in the Arguments field, I give the values of the arguments that I used in the Spark code. One of the arguments is the training data set, which I put in the same bucket where I saved my .jar file. In the bucket, I put only the .jar file, the training dataset, and the testing dataset.

Main class or jar
gs://Anahita/test.jar

Arguments
--lambda=.001
--eta=1.0
--trainFile=gs://Anahita/small_train.dat
--testFile=gs://Anahita/small_test.dat

The problem is that when I run the job I get the following error, and it cannot actually read my training and testing data sets:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.coalesce(IZLscala/math/Ordering;)Lorg/apache/spark/rdd/RDD;

Can anyone help me solve this problem?

Thanks,
Anahita
Fwd: Entering the variables in the Argument part in Submit job section to run a spark code on Google Cloud
Dear friends,

I am trying to run Spark code on Google Cloud using submit job:
https://cloud.google.com/dataproc/docs/tutorials/spark-scala

My question is about the "Arguments" part. In my Spark code, there are some variables whose values are defined in a shell file (.sh), as follows:

--trainFile=small_train.dat \
--testFile=small_test.dat \
--numFeatures=9947 \
--numRounds=100 \

I have tried to enter only the values, each one in a separate box, as follows, but it is not working:

data/small_train.dat
data/small_test.dat
9947
100

I have also tried to give the parameters as below, but that does not work either:

trainFile=small_train.dat
testFile=small_test.dat
numFeatures=9947
numRounds=100

I added the files small_train.dat and small_test.dat to the same bucket where I saved the .jar file. Let's say my bucket is named Anahita; then I added spark.jar, small_train.dat and small_test.dat to the bucket "Anahita".

Does anyone know how I can enter these values in the Arguments part?

Thanks in advance,
Anahita
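As a hedged illustration of why the flags need to keep their --name=value form: the build.sbt elsewhere in this digest pulls in scopt, so the application presumably declares named options and Dataproc simply forwards each "Arguments" entry to main() unchanged. A sketch — the option and field names below are assumptions for illustration, not the actual proxcocoa definitions — could look like:

import scopt.OptionParser

object ArgSketch {
  // Hypothetical flags mirroring the ones used in this thread.
  case class Config(trainFile: String = "", testFile: String = "",
                    numFeatures: Int = 0, numRounds: Int = 100)

  def main(args: Array[String]): Unit = {
    val parser = new OptionParser[Config]("proxcocoa") {
      opt[String]("trainFile").action((x, c) => c.copy(trainFile = x))
      opt[String]("testFile").action((x, c) => c.copy(testFile = x))
      opt[Int]("numFeatures").action((x, c) => c.copy(numFeatures = x))
      opt[Int]("numRounds").action((x, c) => c.copy(numRounds = x))
    }
    // Each Dataproc "Arguments" box then holds one flag, spelled exactly as declared above, e.g.:
    //   --trainFile=gs://Anahita/small_train.dat
    //   --testFile=gs://Anahita/small_test.dat
    //   --numFeatures=9947
    //   --numRounds=100
    parser.parse(args, Config()).foreach(cfg => println(cfg))
  }
}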