Re: Upgrade the scala code using the most updated Spark version

2017-03-29 Thread Anahita Talebi
Hi,

Thanks, everybody, for helping me solve my problem :)
As Zhu said, I had to use mapPartitionsWithIndex in my code.
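
For reference, here is a minimal, self-contained sketch of that change (assumptions: the RDD holds text lines as in OptUtils.scala, and the input path is hypothetical):

import org.apache.spark.sql.SparkSession

object PartitionSizes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("partition-sizes").master("local[*]").getOrCreate()
    val data = spark.sparkContext.textFile("data/sample.txt") // hypothetical input path

    // Spark 1.x: data.mapPartitionsWithSplit { case (i, lines) => Iterator(i -> lines.length) }
    // Spark 2.x: mapPartitionsWithSplit is gone; mapPartitionsWithIndex takes the same
    // (partition index, iterator of elements) pair, so the body is unchanged.
    val sizes = data.mapPartitionsWithIndex { (i, lines) =>
      Iterator(i -> lines.length) // lines is an Iterator[String]; length consumes it, which is fine here
    }
    sizes.collect().foreach(println)
    spark.stop()
  }
}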

Thanks,
Have a nice day,
Anahita

On Wed, Mar 29, 2017 at 2:51 AM, Shixiong(Ryan) Zhu  wrote:

> mapPartitionsWithSplit was removed in Spark 2.0.0. You can
> use mapPartitionsWithIndex instead.
>
> On Tue, Mar 28, 2017 at 3:52 PM, Anahita Talebi  > wrote:
>
>> Thanks.
>> I tried this one, as well. Unfortunately I still get the same error.
>>
>>
>> On Wednesday, March 29, 2017, Marco Mistroni  wrote:
>>
>>> 1.7.5
>>>
>>> On 28 Mar 2017 10:10 pm, "Anahita Talebi" 
>>> wrote:
>>>
 Hi,

 Thanks for your answer.
 What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
 I think the problem might come from this part.

 On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni 
 wrote:

> Hello
>  uhm ihave a project whose build,sbt is closest to yours, where i am
> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
> works fine
> in my projects though i don thave any of the following libraries that
> you mention
> - breeze
> - netlib,all
> -  scoopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
> anahita.t.am...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the scala version to 2.11.8 and kept the spark
>> version 1.5.2 (old version). Then I changed the scalatest version into
>> "3.0.1". With this configuration, I could run the code and compile it and
>> generate the .jar file.
>>
>> When I changed the spark version into 2.1.0, I get the same error as
>> before. So I imagine the problem should be somehow related to the version
>> of spark.
>>
>> Cheers,
>> Anahita
>>
>> 
>> 
>> 
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization =
>> "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> "org.apache.spark" %% "spark-core" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-sql" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" %
>> "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"  =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>> 
>> 
>>
>> On Tue, Mar 28, 2017 at 9:33 PM, Marco 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Shixiong(Ryan) Zhu
mapPartitionsWithSplit was removed in Spark 2.0.0. You can
use mapPartitionsWithIndex instead.

On Tue, Mar 28, 2017 at 3:52 PM, Anahita Talebi 
wrote:

> Thanks.
> I tried this one, as well. Unfortunately I still get the same error.
>
>
> On Wednesday, March 29, 2017, Marco Mistroni  wrote:
>
>> 1.7.5
>>
>> On 28 Mar 2017 10:10 pm, "Anahita Talebi" 
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks for your answer.
>>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>>> I think the problem might come from this part.
>>>
>>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni 
>>> wrote:
>>>
 Hello
  uhm ihave a project whose build,sbt is closest to yours, where i am
 using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
 works fine
 in my projects though i don thave any of the following libraries that
 you mention
 - breeze
 - netlib,all
 -  scoopt

 hth

 On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
 anahita.t.am...@gmail.com> wrote:

> Hi,
>
> Thanks for your answer.
>
> I first changed the scala version to 2.11.8 and kept the spark version
> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
> With this configuration, I could run the code and compile it and generate
> the .jar file.
>
> When I changed the spark version into 2.1.0, I get the same error as
> before. So I imagine the problem should be somehow related to the version
> of spark.
>
> Cheers,
> Anahita
>
> 
> 
> 
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := "0.1"
>
> organization := "edu.berkeley.cs.amplab"
>
> scalaVersion := "2.11.8"
>
> parallelExecution in Test := false
>
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
> "org.slf4j" % "slf4j-api" % "1.7.2",
> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
> "org.apache.spark" %% "spark-core" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.spark" %% "spark-mllib" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.spark" %% "spark-sql" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.commons" % "commons-compress" % "1.7",
> "commons-io" % "commons-io" % "2.4",
> "org.scalanlp" % "breeze_2.11" % "0.11.2",
> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
>
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
> defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
> hadoopVersion
> }
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming" %
> "2.1.0"
>
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
> ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>   "Spray" at "http://repo.spray.cc"
> )
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
> case PathList("javax", "servlet", xs @ _*)   =>
> MergeStrategy.first
> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
> MergeStrategy.first
> case "application.conf"  =>
> MergeStrategy.concat
> case "reference.conf"=>
> MergeStrategy.concat
> case "log4j.properties"  =>
> MergeStrategy.discard
> case m if m.toLowerCase.endsWith("manifest.mf")  =>
> MergeStrategy.discard
> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
> MergeStrategy.discard
> case _ => MergeStrategy.first
>   }
> }
>
> test in assembly := {}
> 
> 
> 
>
> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni 
> wrote:
>
>> Hello
>>  that looks to me like there's something dodgy withyour Scala
>> installation
>> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
>> suggest you change one thing at a time in your sbt
>> First Spark version. run it and see if it works
>> Then amend the scala version
>>
>> hth

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Thanks.
I tried this one, as well. Unfortunately I still get the same error.

On Wednesday, March 29, 2017, Marco Mistroni  wrote:

> 1.7.5
>
> On 28 Mar 2017 10:10 pm, "Anahita Talebi"  > wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>> I think the problem might come from this part.
>>
>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni > > wrote:
>>
>>> Hello
>>>  uhm ihave a project whose build,sbt is closest to yours, where i am
>>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
>>> works fine
>>> in my projects though i don thave any of the following libraries that
>>> you mention
>>> - breeze
>>> - netlib,all
>>> -  scoopt
>>>
>>> hth
>>>
>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com
>>> > wrote:
>>>
 Hi,

 Thanks for your answer.

 I first changed the scala version to 2.11.8 and kept the spark version
 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
 With this configuration, I could run the code and compile it and generate
 the .jar file.

 When I changed the spark version into 2.1.0, I get the same error as
 before. So I imagine the problem should be somehow related to the version
 of spark.

 Cheers,
 Anahita

 
 
 
 import AssemblyKeys._

 assemblySettings

 name := "proxcocoa"

 version := "0.1"

 organization := "edu.berkeley.cs.amplab"

 scalaVersion := "2.11.8"

 parallelExecution in Test := false

 {
   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
   libraryDependencies ++= Seq(
 "org.slf4j" % "slf4j-api" % "1.7.2",
 "org.slf4j" % "slf4j-log4j12" % "1.7.2",
 "org.scalatest" %% "scalatest" % "3.0.1" % "test",
 "org.apache.spark" %% "spark-core" % "2.1.0"
 excludeAll(excludeHadoop),
 "org.apache.spark" %% "spark-mllib" % "2.1.0"
 excludeAll(excludeHadoop),
 "org.apache.spark" %% "spark-sql" % "2.1.0"
 excludeAll(excludeHadoop),
 "org.apache.commons" % "commons-compress" % "1.7",
 "commons-io" % "commons-io" % "2.4",
 "org.scalanlp" % "breeze_2.11" % "0.11.2",
 "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
 "com.github.scopt" %% "scopt" % "3.3.0"
   )
 }

 {
   val defaultHadoopVersion = "1.0.4"
   val hadoopVersion =
 scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
 defaultHadoopVersion)
   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
 hadoopVersion
 }

 libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"

 resolvers ++= Seq(
   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
 ".m2/repository",
   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
   "Spray" at "http://repo.spray.cc;
 )

 mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
 case PathList("javax", "servlet", xs @ _*)   =>
 MergeStrategy.first
 case PathList(ps @ _*) if ps.last endsWith ".html"   =>
 MergeStrategy.first
 case "application.conf"  =>
 MergeStrategy.concat
 case "reference.conf"=>
 MergeStrategy.concat
 case "log4j.properties"  =>
 MergeStrategy.discard
 case m if m.toLowerCase.endsWith("manifest.mf")  =>
 MergeStrategy.discard
 case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
 MergeStrategy.discard
 case _ => MergeStrategy.first
   }
 }

 test in assembly := {}
 
 
 

 On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni > wrote:

> Hello
>  that looks to me like there's something dodgy withyour Scala
> installation
> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
> suggest you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
>
> hth
>  marco
>
> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <
> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Marco Mistroni
1.7.5

On 28 Mar 2017 10:10 pm, "Anahita Talebi"  wrote:

> Hi,
>
> Thanks for your answer.
> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
> I think the problem might come from this part.
>
> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni 
> wrote:
>
>> Hello
>>  uhm ihave a project whose build,sbt is closest to yours, where i am
>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
>> works fine
>> in my projects though i don thave any of the following libraries that you
>> mention
>> - breeze
>> - netlib,all
>> -  scoopt
>>
>> hth
>>
>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
>> anahita.t.am...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for your answer.
>>>
>>> I first changed the scala version to 2.11.8 and kept the spark version
>>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>>> With this configuration, I could run the code and compile it and generate
>>> the .jar file.
>>>
>>> When I changed the spark version into 2.1.0, I get the same error as
>>> before. So I imagine the problem should be somehow related to the version
>>> of spark.
>>>
>>> Cheers,
>>> Anahita
>>>
>>> 
>>> 
>>> 
>>> import AssemblyKeys._
>>>
>>> assemblySettings
>>>
>>> name := "proxcocoa"
>>>
>>> version := "0.1"
>>>
>>> organization := "edu.berkeley.cs.amplab"
>>>
>>> scalaVersion := "2.11.8"
>>>
>>> parallelExecution in Test := false
>>>
>>> {
>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>   libraryDependencies ++= Seq(
>>> "org.slf4j" % "slf4j-api" % "1.7.2",
>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>> "org.apache.spark" %% "spark-core" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> "org.apache.spark" %% "spark-sql" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> "org.apache.commons" % "commons-compress" % "1.7",
>>> "commons-io" % "commons-io" % "2.4",
>>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>> "com.github.scopt" %% "scopt" % "3.3.0"
>>>   )
>>> }
>>>
>>> {
>>>   val defaultHadoopVersion = "1.0.4"
>>>   val hadoopVersion =
>>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>>> defaultHadoopVersion)
>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>>> hadoopVersion
>>> }
>>>
>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>>
>>> resolvers ++= Seq(
>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>>> ".m2/repository",
>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>   "Spray" at "http://repo.spray.cc"
>>> )
>>>
>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>   {
>>> case PathList("javax", "servlet", xs @ _*)   =>
>>> MergeStrategy.first
>>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>>> MergeStrategy.first
>>> case "application.conf"  =>
>>> MergeStrategy.concat
>>> case "reference.conf"=>
>>> MergeStrategy.concat
>>> case "log4j.properties"  =>
>>> MergeStrategy.discard
>>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>>> MergeStrategy.discard
>>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>>> MergeStrategy.discard
>>> case _ => MergeStrategy.first
>>>   }
>>> }
>>>
>>> test in assembly := {}
>>> 
>>> 
>>> 
>>>
>>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni 
>>> wrote:
>>>
 Hello
  that looks to me like there's something dodgy withyour Scala
 installation
 Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
 suggest you change one thing at a time in your sbt
 First Spark version. run it and see if it works
 Then amend the scala version

 hth
  marco

 On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <
 anahita.t.am...@gmail.com> wrote:

> Hello,
>
> Thanks you all for your informative answers.
> I actually changed the scala version to the 2.11.8 and spark version
> into 2.1.0 in the build.sbt
>
> Except for these two guys (scala and spark version), I kept the same
> values for the rest in the build.sbt file.
> 
> ---
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hello again,
I just tried changing the scalatest version to 3.0.0 and removing the
libraries breeze, netlib and scopt, but I still get the same error.

On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni 
wrote:

> Hello
>  uhm ihave a project whose build,sbt is closest to yours, where i am using
> spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine
> in my projects though i don thave any of the following libraries that you
> mention
> - breeze
> - netlib,all
> -  scoopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi  > wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the scala version to 2.11.8 and kept the spark version
>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>> With this configuration, I could run the code and compile it and generate
>> the .jar file.
>>
>> When I changed the spark version into 2.1.0, I get the same error as
>> before. So I imagine the problem should be somehow related to the version
>> of spark.
>>
>> Cheers,
>> Anahita
>>
>> 
>> 
>> 
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> "org.apache.spark" %% "spark-core" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"  =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>> 
>> 
>>
>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni 
>> wrote:
>>
>>> Hello
>>>  that looks to me like there's something dodgy withyour Scala
>>> installation
>>> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
>>> suggest you change one thing at a time in your sbt
>>> First Spark version. run it and see if it works
>>> Then amend the scala version
>>>
>>> hth
>>>  marco
>>>
>>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
 Hello,

 Thanks you all for your informative answers.
 I actually changed the scala version to the 2.11.8 and spark version
 into 2.1.0 in the build.sbt

 Except for these two guys (scala and spark version), I kept the same
 values for the rest in the build.sbt file.
 
 ---
 import AssemblyKeys._

 assemblySettings

 name := "proxcocoa"

 version := "0.1"

 scalaVersion := "2.11.8"

 parallelExecution in Test := false

 {
   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
   libraryDependencies ++= Seq(
 "org.slf4j" % "slf4j-api" % 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,

Thanks for your answer.
What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
I think the problem might come from this part.

On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni 
wrote:

> Hello
>  uhm ihave a project whose build,sbt is closest to yours, where i am using
> spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine
> in my projects though i don thave any of the following libraries that you
> mention
> - breeze
> - netlib,all
> -  scoopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi  > wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the scala version to 2.11.8 and kept the spark version
>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>> With this configuration, I could run the code and compile it and generate
>> the .jar file.
>>
>> When I changed the spark version into 2.1.0, I get the same error as
>> before. So I imagine the problem should be somehow related to the version
>> of spark.
>>
>> Cheers,
>> Anahita
>>
>> 
>> 
>> 
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> "org.apache.spark" %% "spark-core" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"  =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>> 
>> 
>>
>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni 
>> wrote:
>>
>>> Hello
>>>  that looks to me like there's something dodgy withyour Scala
>>> installation
>>> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
>>> suggest you change one thing at a time in your sbt
>>> First Spark version. run it and see if it works
>>> Then amend the scala version
>>>
>>> hth
>>>  marco
>>>
>>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
 Hello,

 Thanks you all for your informative answers.
 I actually changed the scala version to the 2.11.8 and spark version
 into 2.1.0 in the build.sbt

 Except for these two guys (scala and spark version), I kept the same
 values for the rest in the build.sbt file.
 
 ---
 import AssemblyKeys._

 assemblySettings

 name := "proxcocoa"

 version := "0.1"

 scalaVersion := "2.11.8"

 parallelExecution in Test := false

 {
   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
   libraryDependencies ++= Seq(
 "org.slf4j" % "slf4j-api" % 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Marco Mistroni
Hello
 uhm, I have a project whose build.sbt is closest to yours, where I am using
Spark 2.1, Scala 2.11 and scalatest (I upgraded to 3.0.0), and it works fine.
In my project, though, I don't have any of the following libraries that you
mention:
- breeze
- netlib "all"
- scopt

hth

On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi 
wrote:

> Hi,
>
> Thanks for your answer.
>
> I first changed the scala version to 2.11.8 and kept the spark version
> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
> With this configuration, I could run the code and compile it and generate
> the .jar file.
>
> When I changed the spark version into 2.1.0, I get the same error as
> before. So I imagine the problem should be somehow related to the version
> of spark.
>
> Cheers,
> Anahita
>
> 
> 
> 
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := "0.1"
>
> organization := "edu.berkeley.cs.amplab"
>
> scalaVersion := "2.11.8"
>
> parallelExecution in Test := false
>
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
> "org.slf4j" % "slf4j-api" % "1.7.2",
> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
> "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
> "org.apache.spark" %% "spark-mllib" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
> "org.apache.commons" % "commons-compress" % "1.7",
> "commons-io" % "commons-io" % "2.4",
> "org.scalanlp" % "breeze_2.11" % "0.11.2",
> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
>
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
> defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
> hadoopVersion
> }
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
> ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>   "Spray" at "http://repo.spray.cc"
> )
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
> case PathList("javax", "servlet", xs @ _*)   =>
> MergeStrategy.first
> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
> MergeStrategy.first
> case "application.conf"  =>
> MergeStrategy.concat
> case "reference.conf"=>
> MergeStrategy.concat
> case "log4j.properties"  =>
> MergeStrategy.discard
> case m if m.toLowerCase.endsWith("manifest.mf")  =>
> MergeStrategy.discard
> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
> MergeStrategy.discard
> case _ => MergeStrategy.first
>   }
> }
>
> test in assembly := {}
> 
> 
> 
>
> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni 
> wrote:
>
>> Hello
>>  that looks to me like there's something dodgy withyour Scala installation
>> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
>> suggest you change one thing at a time in your sbt
>> First Spark version. run it and see if it works
>> Then amend the scala version
>>
>> hth
>>  marco
>>
>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <
>> anahita.t.am...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Thanks you all for your informative answers.
>>> I actually changed the scala version to the 2.11.8 and spark version
>>> into 2.1.0 in the build.sbt
>>>
>>> Except for these two guys (scala and spark version), I kept the same
>>> values for the rest in the build.sbt file.
>>> 
>>> ---
>>> import AssemblyKeys._
>>>
>>> assemblySettings
>>>
>>> name := "proxcocoa"
>>>
>>> version := "0.1"
>>>
>>> scalaVersion := "2.11.8"
>>>
>>> parallelExecution in Test := false
>>>
>>> {
>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>   libraryDependencies ++= Seq(
>>> "org.slf4j" % "slf4j-api" % "1.7.2",
>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>> "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
>>> excludeAll(excludeHadoop),
>>> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,

Thanks for your answer.

I first changed the Scala version to 2.11.8 and kept the Spark version at
1.5.2 (the old version). Then I changed the scalatest version to "3.0.1".
With this configuration, I could compile the code and generate the .jar
file.

When I changed the Spark version to 2.1.0, I got the same error as before.
So I imagine the problem is somehow related to the Spark version.

Cheers,
Anahita


import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

organization := "edu.berkeley.cs.amplab"

scalaVersion := "2.11.8"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.2",
"org.slf4j" % "slf4j-log4j12" % "1.7.2",
"org.scalatest" %% "scalatest" % "3.0.1" % "test",
"org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.commons" % "commons-compress" % "1.7",
"commons-io" % "commons-io" % "2.4",
"org.scalanlp" % "breeze_2.11" % "0.11.2",
"com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
"com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion =
scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
hadoopVersion
}

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
  "Spray" at "http://repo.spray.cc;
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
case PathList("javax", "servlet", xs @ _*)   =>
MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html"   =>
MergeStrategy.first
case "application.conf"  =>
MergeStrategy.concat
case "reference.conf"=>
MergeStrategy.concat
case "log4j.properties"  =>
MergeStrategy.discard
case m if m.toLowerCase.endsWith("manifest.mf")  =>
MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
MergeStrategy.discard
case _ => MergeStrategy.first
  }
}

test in assembly := {}


On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni  wrote:

> Hello
>  that looks to me like there's something dodgy withyour Scala installation
> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
> suggest you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
>
> hth
>  marco
>
> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi  > wrote:
>
>> Hello,
>>
>> Thanks you all for your informative answers.
>> I actually changed the scala version to the 2.11.8 and spark version into
>> 2.1.0 in the build.sbt
>>
>> Except for these two guys (scala and spark version), I kept the same
>> values for the rest in the build.sbt file.
>> 
>> ---
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>> "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" %

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Jörn Franke
I personally never add the _scalaVersion suffix to the dependency but always
cross-compile; this seems to be the cleanest approach. Additionally, Spark and
Hadoop dependencies should be "provided", not "compile". Scalatest seems to be
outdated.

I would also not use a local repo, but either an artefact manager (e.g.
Artifactory or Nexus) or the official Spark repos.

Can you publish the full source code? It is hard to assess whether the merge
strategy is needed. Maybe start with a simpler build file and a small
application and then add your source code, as in the sketch below.


> On 28. Mar 2017, at 21:33, Marco Mistroni  wrote:
> 
> Hello
>  that looks to me like there's something dodgy withyour Scala installation
> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i suggest 
> you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
> 
> hth
>  marco
> 
>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi  
>> wrote:
>> Hello, 
>> 
>> Thanks you all for your informative answers. 
>> I actually changed the scala version to the 2.11.8 and spark version into 
>> 2.1.0 in the build.sbt
>> 
>> Except for these two guys (scala and spark version), I kept the same values 
>> for the rest in the build.sbt file. 
>> ---
>> import AssemblyKeys._
>> 
>> assemblySettings
>> 
>> name := "proxcocoa"
>> 
>> version := "0.1"
>> 
>> scalaVersion := "2.11.8"
>> 
>> parallelExecution in Test := false
>> 
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>> "org.apache.spark" % "spark-core_2.11" % "2.1.0" 
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" 
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-sql_2.11" % "2.1.0" 
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>> 
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", 
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % 
>> hadoopVersion
>> }
>> 
>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>> 
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + 
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>> 
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   => 
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   => 
>> MergeStrategy.first
>> case "application.conf"  => 
>> MergeStrategy.concat
>> case "reference.conf"=> 
>> MergeStrategy.concat
>> case "log4j.properties"  => 
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  => 
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => 
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>> 
>> test in assembly := {}
>> 
>> 
>> When I compile the code, I get the following error:
>> 
>> [info] Compiling 4 Scala sources to 
>> /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>> [error] 
>> /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40:
>>  value mapPartitionsWithSplit is not a member of 
>> org.apache.spark.rdd.RDD[String]
>> [error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>> [error]  ^
>> [error] 
>> /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41:
>>  value length is not a member of Any
>> [error]   Iterator(i -> lines.length)
>> [error]   ^
>> 
>> It gets the error in the code. Does it mean that for the different version 
>> of the spark and scala, I need to change the main code? 
>> 
>> Thanks, 
>> Anahita
>> 
>> 
>> 
>> 
>> 
>> 
>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč  wrote:
>>> Adding to advices given by others ... Spark 2.1.0 works with Scala 2.11, so 
>>> set:
>>> 
>>>   scalaVersion := "2.11.8"
>>> 
>>> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,
Thanks for your answer. I just changed the sbt file and set the Scala
version to 2.10.4, but I still get the same error:


[info] Compiling 4 Scala sources to
/Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.10/classes...
[error]
/Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40:
value mapPartitionsWithSplit is not a member of
org.apache.spark.rdd.RDD[String]
[error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
[error]  ^


Thanks,
Anahita

On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni  wrote:

> Hello
>  that looks to me like there's something dodgy withyour Scala installation
> Though Spark 2.0 is built on Scala 2.11, it still support 2.10... i
> suggest you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
>
> hth
>  marco
>
> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi  > wrote:
>
>> Hello,
>>
>> Thanks you all for your informative answers.
>> I actually changed the scala version to the 2.11.8 and spark version into
>> 2.1.0 in the build.sbt
>>
>> Except for these two guys (scala and spark version), I kept the same
>> values for the rest in the build.sbt file.
>> 
>> ---
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>> "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" %
>> "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"  =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>>
>> When I compile the code, I get the following error:
>>
>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_ver
>> sion_proxcocoa-master/target/scala-2.11/classes...
>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main
>> /scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a
>> member of org.apache.spark.rdd.RDD[String]
>> [error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>> [error]  ^
>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main
>> /scala/utils/OptUtils.scala:41: value length is not a member of Any
>> [error]   Iterator(i -> lines.length)
>> [error]   ^
>> 
>> It gets the error in the code. Does it mean that for the different
>> version of the spark and scala, I need to change the main code?
>>
>> Thanks,
>> Anahita
>>
>>
>>
>>
>>
>>
>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč 
>> wrote:
>>
>>> Adding to advices given by others ... Spark 2.1.0 works with Scala 2.11,
>>> so set:
>>>
>>>   scalaVersion := "2.11.8"
>>>
>>> When you see something like:
>>>
>>>   "org.apache.spark" % 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Marco Mistroni
Hello
 that looks to me like there's something dodgy with your Scala installation.
Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... I suggest
you change one thing at a time in your sbt:
first the Spark version (run it and see if it works),
then amend the Scala version.

hth
 marco

On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi 
wrote:

> Hello,
>
> Thanks you all for your informative answers.
> I actually changed the scala version to the 2.11.8 and spark version into
> 2.1.0 in the build.sbt
>
> Except for these two guys (scala and spark version), I kept the same
> values for the rest in the build.sbt file.
> 
> ---
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := "0.1"
>
> scalaVersion := "2.11.8"
>
> parallelExecution in Test := false
>
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
> "org.slf4j" % "slf4j-api" % "1.7.2",
> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
> "org.apache.spark" % "spark-core_2.11" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
> excludeAll(excludeHadoop),
> "org.apache.commons" % "commons-compress" % "1.7",
> "commons-io" % "commons-io" % "2.4",
> "org.scalanlp" % "breeze_2.11" % "0.11.2",
> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
>
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
> defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
> hadoopVersion
> }
>
> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" %
> "2.1.0"
>
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
> ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>   "Spray" at "http://repo.spray.cc"
> )
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
> case PathList("javax", "servlet", xs @ _*)   =>
> MergeStrategy.first
> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
> MergeStrategy.first
> case "application.conf"  =>
> MergeStrategy.concat
> case "reference.conf"=>
> MergeStrategy.concat
> case "log4j.properties"  =>
> MergeStrategy.discard
> case m if m.toLowerCase.endsWith("manifest.mf")  =>
> MergeStrategy.discard
> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
> MergeStrategy.discard
> case _ => MergeStrategy.first
>   }
> }
>
> test in assembly := {}
> 
>
> When I compile the code, I get the following error:
>
> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_
> version_proxcocoa-master/target/scala-2.11/classes...
> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/
> main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a
> member of org.apache.spark.rdd.RDD[String]
> [error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
> [error]  ^
> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/
> main/scala/utils/OptUtils.scala:41: value length is not a member of Any
> [error]   Iterator(i -> lines.length)
> [error]   ^
> 
> It gets the error in the code. Does it mean that for the different version
> of the spark and scala, I need to change the main code?
>
> Thanks,
> Anahita
>
>
>
>
>
>
> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč 
> wrote:
>
>> Adding to advices given by others ... Spark 2.1.0 works with Scala 2.11,
>> so set:
>>
>>   scalaVersion := "2.11.8"
>>
>> When you see something like:
>>
>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>
>> that means that library `spark-core` is compiled against Scala 2.10,
>> so you would have to change that to 2.11:
>>
>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>
>> better yet, let SBT worry about libraries built against particular
>> Scala versions:
>>
>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>
>> The `%%` will instruct SBT to choose the library appropriate for a
>> version of Scala that is set in `scalaVersion`.
>>
>> It may be worth mentioning that the `%%` thing works only with Scala
>> libraries as they are compiled against a certain Scala version. Java
>> libraries are unaffected (have nothing to do with Scala), e.g. for
>> `slf4j` one only uses single `%`s:
>>
>>   "org.slf4j" % "slf4j-api" % "1.7.2"

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hello,

Thank you all for your informative answers.
I actually changed the Scala version to 2.11.8 and the Spark version to
2.1.0 in the build.sbt.

Except for these two (the Scala and Spark versions), I kept the same values
for the rest of the build.sbt file.
---
import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

scalaVersion := "2.11.8"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.2",
"org.slf4j" % "slf4j-log4j12" % "1.7.2",
"org.scalatest" %% "scalatest" % "1.9.1" % "test",
"org.apache.spark" % "spark-core_2.11" % "2.1.0"
excludeAll(excludeHadoop),
"org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
excludeAll(excludeHadoop),
"org.apache.spark" % "spark-sql_2.11" % "2.1.0"
excludeAll(excludeHadoop),
"org.apache.commons" % "commons-compress" % "1.7",
"commons-io" % "commons-io" % "2.4",
"org.scalanlp" % "breeze_2.11" % "0.11.2",
"com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
"com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion =
scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
hadoopVersion
}

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
  "Spray" at "http://repo.spray.cc;
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
case PathList("javax", "servlet", xs @ _*)   =>
MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html"   =>
MergeStrategy.first
case "application.conf"  =>
MergeStrategy.concat
case "reference.conf"=>
MergeStrategy.concat
case "log4j.properties"  =>
MergeStrategy.discard
case m if m.toLowerCase.endsWith("manifest.mf")  =>
MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
MergeStrategy.discard
case _ => MergeStrategy.first
  }
}

test in assembly := {}


When I compile the code, I get the following error:

[info] Compiling 4 Scala sources to
/Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
[error]
/Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40:
value mapPartitionsWithSplit is not a member of
org.apache.spark.rdd.RDD[String]
[error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
[error]  ^
[error]
/Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41:
value length is not a member of Any
[error]   Iterator(i -> lines.length)
[error]   ^

The error comes from the code itself. Does it mean that for the different
versions of Spark and Scala, I need to change the main code?

Thanks,
Anahita






On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč  wrote:

> Adding to advices given by others ... Spark 2.1.0 works with Scala 2.11,
> so set:
>
>   scalaVersion := "2.11.8"
>
> When you see something like:
>
>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>
> that means that library `spark-core` is compiled against Scala 2.10,
> so you would have to change that to 2.11:
>
>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>
> better yet, let SBT worry about libraries built against particular
> Scala versions:
>
>   "org.apache.spark" %% "spark-core" % "2.1.0"
>
> The `%%` will instruct SBT to choose the library appropriate for a
> version of Scala that is set in `scalaVersion`.
>
> It may be worth mentioning that the `%%` thing works only with Scala
> libraries as they are compiled against a certain Scala version. Java
> libraries are unaffected (have nothing to do with Scala), e.g. for
> `slf4j` one only uses single `%`s:
>
>   "org.slf4j" % "slf4j-api" % "1.7.2"
>
> Cheers,
> Dinko
>
> On 27 March 2017 at 23:30, Mich Talebzadeh 
> wrote:
> > check these versions
> >
> > function create_build_sbt_file {
> > BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
> > [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
> > cat >> $BUILD_SBT_FILE << !
> > lazy val root = (project in file(".")).
> >   settings(
> > name := "${APPLICATION}",
> > version := "1.0",
> > scalaVersion := "2.11.8",
> > mainClass in Compile := Some("myPackage.${APPLICATION}")
> >   )
> > libraryDependencies += 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Dinko Srkoč
Adding to the advice given by others ... Spark 2.1.0 works with Scala 2.11, so set:

  scalaVersion := "2.11.8"

When you see something like:

  "org.apache.spark" % "spark-core_2.10" % "1.5.2"

that means that library `spark-core` is compiled against Scala 2.10,
so you would have to change that to 2.11:

  "org.apache.spark" % "spark-core_2.11" % "2.1.0"

better yet, let SBT worry about libraries built against particular
Scala versions:

  "org.apache.spark" %% "spark-core" % "2.1.0"

The `%%` will instruct SBT to choose the library appropriate for a
version of Scala that is set in `scalaVersion`.

It may be worth mentioning that the `%%` thing works only with Scala
libraries as they are compiled against a certain Scala version. Java
libraries are unaffected (have nothing to do with Scala), e.g. for
`slf4j` one only uses single `%`s:

  "org.slf4j" % "slf4j-api" % "1.7.2"

Cheers,
Dinko

On 27 March 2017 at 23:30, Mich Talebzadeh  wrote:
> check these versions
>
> function create_build_sbt_file {
> BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
> [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
> cat >> $BUILD_SBT_FILE << !
> lazy val root = (project in file(".")).
>   settings(
> name := "${APPLICATION}",
> version := "1.0",
> scalaVersion := "2.11.8",
> mainClass in Compile := Some("myPackage.${APPLICATION}")
>   )
> libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" %
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
> "1.6.1" % "provided"
> libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
> libraryDependencies += "org.apache.phoenix" % "phoenix-spark" %
> "4.6.0-HBase-1.0"
> libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
> libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
> libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
> libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
> // META-INF discarding
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>{
> case PathList("META-INF", xs @ _*) => MergeStrategy.discard
> case x => MergeStrategy.first
>}
> }
> !
> }
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>
>
>
> On 27 March 2017 at 21:45, Jörn Franke  wrote:
>>
>> Usually you define the dependencies to the Spark library as provided. You
>> also seem to mix different Spark versions which should be avoided.
>> The Hadoop library seems to be outdated and should also only be provided.
>>
>> The other dependencies you could assemble in a fat jar.
>>
>> On 27 Mar 2017, at 21:25, Anahita Talebi 
>> wrote:
>>
>> Hi friends,
>>
>> I have a code which is written in Scala. The scala version 2.10.4 and
>> Spark version 1.5.2 are used to run the code.
>>
>> I would like to upgrade the code to the most updated version of spark,
>> meaning 2.1.0.
>>
>> Here is the build.sbt:
>>
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.10.4"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>> "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-sql_2.10" % "1.5.2"
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.10" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-27 Thread vvshvv
it yo



On Jörn Franke , Mar 28, 2017 12:11 AM wrote:

> Usually you define the dependencies on the Spark libraries as provided. You
> also seem to mix different Spark versions, which should be avoided. The
> Hadoop library seems to be outdated and should also only be provided.
>
> The other dependencies you can assemble into a fat jar.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Re: Upgrade the scala code using the most updated Spark version

2017-03-27 Thread Mich Talebzadeh
check these versions

function create_build_sbt_file {
BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
[ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
cat >> $BUILD_SBT_FILE << !
lazy val root = (project in file(".")).
  settings(
name := "${APPLICATION}",
version := "1.0",
scalaVersion := "2.11.8",
mainClass in Compile := Some("myPackage.${APPLICATION}")
  )
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" %
"provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
"1.6.1" % "provided"
libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
libraryDependencies += "org.apache.phoenix" % "phoenix-spark" %
"4.6.0-HBase-1.0"
libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
   }
}
!
}
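
A side note on the merge-strategy block above: newer sbt and sbt-assembly
releases deprecate the "mergeStrategy in assembly <<= ..." form. A roughly
equivalent setting, assuming the sbt-assembly plugin is already on the build,
is sketched below; the behaviour is the same, only the key and the assignment
operator change.

// build.sbt -- sketch only: same META-INF discarding as above, written with
// the newer assemblyMergeStrategy key instead of the deprecated <<= syntax
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard  // drop duplicate manifests/signatures
  case x                             => MergeStrategy.first    // otherwise keep the first copy seen
}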

HTH

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 27 March 2017 at 21:45, Jörn Franke  wrote:

> Usually you define the dependencies on the Spark libraries as provided. You
> also seem to mix different Spark versions, which should be avoided. The
> Hadoop library seems to be outdated and should also only be provided.
>
> The other dependencies you can assemble into a fat jar.
>
> On 27 Mar 2017, at 21:25, Anahita Talebi 
> wrote:
>
> Hi friends,
>
> I have code which is written in Scala. Scala version 2.10.4 and Spark
> version 1.5.2 are used to run the code.
>
> I would like to upgrade the code to the most updated version of spark,
> meaning 2.1.0.
>
> Here is the build.sbt:
>
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := "0.1"
>
> scalaVersion := "2.10.4"
>
> parallelExecution in Test := false
>
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
> "org.slf4j" % "slf4j-api" % "1.7.2",
> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
> "org.apache.spark" % "spark-core_2.10" % "1.5.2"
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-sql_2.10" % "1.5.2"
> excludeAll(excludeHadoop),
> "org.apache.commons" % "commons-compress" % "1.7",
> "commons-io" % "commons-io" % "2.4",
> "org.scalanlp" % "breeze_2.10" % "0.11.2",
> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
>
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
> defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
> hadoopVersion
> }
>
> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" %
> "1.5.0"
>
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
> ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
>   "Spray" at "http://repo.spray.cc;
> )
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
> case PathList("javax", "servlet", xs @ _*)   =>
> MergeStrategy.first
> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
> MergeStrategy.first
> case "application.conf"  =>
> MergeStrategy.concat
> case "reference.conf"=>
> MergeStrategy.concat
> case "log4j.properties"  =>
> MergeStrategy.discard
> case m if m.toLowerCase.endsWith("manifest.mf")  =>
> MergeStrategy.discard
> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
> MergeStrategy.discard
> case _ => MergeStrategy.first
>   }
> }
>
> test in assembly := {}
>
> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-27 Thread Jörn Franke
Usually you define the dependencies on the Spark libraries as provided. You also
seem to mix different Spark versions, which should be avoided.
The Hadoop library seems to be outdated and should also only be provided.

The other dependencies you can assemble into a fat jar.
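
As a minimal sketch of that approach (assuming Spark 2.1.0 on Scala 2.11 and
the sbt-assembly plugin; versions are assumptions, adjust them to your
cluster), the Spark modules are marked provided so that only the remaining
libraries end up in the fat jar:

// build.sbt -- sketch only
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // provided: the cluster supplies Spark at runtime, so it stays out of the fat jar
  "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided",
  // ordinary dependencies like this one get bundled by sbt-assembly
  "com.github.scopt" %% "scopt" % "3.3.0"
)

With %% the Scala binary version suffix is resolved automatically, which also
helps avoid mixing _2.10 and _2.11 artifacts in the same build.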

> On 27 Mar 2017, at 21:25, Anahita Talebi  wrote:
> 
> Hi friends, 
> 
> I have code which is written in Scala. Scala version 2.10.4 and Spark
> version 1.5.2 are used to run the code.
> 
> I would like to upgrade the code to the most updated version of spark, 
> meaning 2.1.0.
> 
> Here is the build.sbt:
> 
> import AssemblyKeys._
> 
> assemblySettings
> 
> name := "proxcocoa"
> 
> version := "0.1"
> 
> scalaVersion := "2.10.4"
> 
> parallelExecution in Test := false
> 
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
> "org.slf4j" % "slf4j-api" % "1.7.2",
> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
> "org.apache.spark" % "spark-core_2.10" % "1.5.2" 
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" 
> excludeAll(excludeHadoop),
> "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
> "org.apache.commons" % "commons-compress" % "1.7",
> "commons-io" % "commons-io" % "2.4",
> "org.scalanlp" % "breeze_2.10" % "0.11.2",
> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
> "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
> 
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", 
> defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
> }
> 
> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
> 
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + 
> ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
>   "Spray" at "http://repo.spray.cc;
> )
> 
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
> case PathList("javax", "servlet", xs @ _*)   => 
> MergeStrategy.first
> case PathList(ps @ _*) if ps.last endsWith ".html"   => 
> MergeStrategy.first
> case "application.conf"  => 
> MergeStrategy.concat
> case "reference.conf"=> 
> MergeStrategy.concat
> case "log4j.properties"  => 
> MergeStrategy.discard
> case m if m.toLowerCase.endsWith("manifest.mf")  => 
> MergeStrategy.discard
> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => 
> MergeStrategy.discard
> case _ => MergeStrategy.first
>   }
> }
> 
> test in assembly := {}
> 
> ---
> I downloaded Spark 2.1.0 and changed the Spark and Scala versions in the
> build.sbt. But unfortunately, I failed to run the code.
> 
> Does anybody know how I can upgrade the code to the most recent spark version 
> by changing the build.sbt file? 
> 
> Or do you have any other suggestion?
> 
> Thanks a lot, 
> Anahita 
>