Fwd: unsubscribe

2019-01-24 Thread Anahita Talebi
unsubscribe


Re: Upgrade the scala code using the most updated Spark version

2017-03-29 Thread Anahita Talebi
Hi,

Thanks, everybody, for helping me to solve my problem :)
As Zhu said, I had to use mapPartitionsWithIndex in my code.

Thanks,
Have a nice day,
Anahita
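
For reference, here is a minimal, self-contained sketch of that change. The
input path and the per-partition body are only illustrative, not the
project's actual code; the point is that mapPartitionsWithIndex keeps the
same (partition index, iterator) signature that mapPartitionsWithSplit had:

import org.apache.spark.sql.SparkSession

object PartitionIndexExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mapPartitionsWithIndex-example")
      .master("local[*]")
      .getOrCreate()

    // any text file works for this sketch; the path is hypothetical
    val data = spark.sparkContext.textFile("data/small_train.dat")

    // Spark 1.x:  data.mapPartitionsWithSplit { case (i, lines) => ... }  (removed in 2.0.0)
    // Spark 2.x:  same signature, new name
    val sizes = data.mapPartitionsWithIndex { case (i, lines) =>
      Iterator((i, lines.size))  // e.g. number of lines in partition i
    }

    sizes.collect().foreach(println)
    spark.stop()
  }
}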

On Wed, Mar 29, 2017 at 2:51 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com
> wrote:

> mapPartitionsWithSplit was removed in Spark 2.0.0. You can
> use mapPartitionsWithIndex instead.
>
> On Tue, Mar 28, 2017 at 3:52 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> Thanks.
>> I tried this one, as well. Unfortunately I still get the same error.
>>
>>
>> On Wednesday, March 29, 2017, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> 1.7.5
>>>
>>> On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.am...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your answer.
>>>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>>>> I think the problem might come from this part.
>>>>
>>>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello
>>>>>  uhm i have a project whose build.sbt is closest to yours, where i am
>>>>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
>>>>> works fine
>>>>> in my projects though i don't have any of the following libraries that
>>>>> you mention
>>>>> - breeze
>>>>> - netlib,all
>>>>> - scopt
>>>>>
>>>>> hth
>>>>>
>>>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
>>>>> anahita.t.am...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for your answer.
>>>>>>
>>>>>> I first changed the scala version to 2.11.8 and kept the spark
>>>>>> version 1.5.2 (old version). Then I changed the scalatest version into
>>>>>> "3.0.1". With this configuration, I could run the code and compile it and
>>>>>> generate the .jar file.
>>>>>>
>>>>>> When I changed the spark version into 2.1.0, I get the same error as
>>>>>> before. So I imagine the problem should be somehow related to the version
>>>>>> of spark.
>>>>>>
>>>>>> Cheers,
>>>>>> Anahita
>>>>>>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> import AssemblyKeys._
>>>>>>
>>>>>> assemblySettings
>>>>>>
>>>>>> name := "proxcocoa"
>>>>>>
>>>>>> version := "0.1"
>>>>>>
>>>>>> organization := "edu.berkeley.cs.amplab"
>>>>>>
>>>>>> scalaVersion := "2.11.8"
>>>>>>
>>>>>> parallelExecution in Test := false
>>>>>>
>>>>>> {
>>>>>>   val excludeHadoop = ExclusionRule(organization =
>>>>>> "org.apache.hadoop")
>>>>>>   libraryDependencies ++= Seq(
>>>>>> "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>>>>> "org.apache.spark" %% "spark-core" % "2.1.0"
>>>>>> excludeAll(excludeHadoop),
>>>>>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>>>>>> excludeAll(excludeHadoop),
>>>>>> "org.apache.spark" %% "spark-sql" % "2.1.0"
>>>>>> excludeAll(excludeHadoop),
>>>>>> "org.apache.commons" % "commons-compress" % "1.7",
>>>>>> "commons-io" % "commons-io" % "2.4",
>>>>>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>> ...

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Thanks.
I tried this one, as well. Unfortunately I still get the same error.

On Wednesday, March 29, 2017, Marco Mistroni <mmistr...@gmail.com> wrote:

> 1.7.5
>
> On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.am...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>> I think the problem might come from this part.
>>
>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> Hello
>>>  uhm i have a project whose build.sbt is closest to yours, where i am
>>> using spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it
>>> works fine
>>> in my projects though i don't have any of the following libraries that
>>> you mention
>>> - breeze
>>> - netlib,all
>>> - scopt
>>>
>>> hth
>>>
>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your answer.
>>>>
>>>> I first changed the scala version to 2.11.8 and kept the spark version
>>>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>>>> With this configuration, I could run the code and compile it and generate
>>>> the .jar file.
>>>>
>>>> When I changed the spark version into 2.1.0, I get the same error as
>>>> before. So I imagine the problem should be somehow related to the version
>>>> of spark.
>>>>
>>>> Cheers,
>>>> Anahita
>>>>
>>>> 
>>>> 
>>>> 
>>>> import AssemblyKeys._
>>>>
>>>> assemblySettings
>>>>
>>>> name := "proxcocoa"
>>>>
>>>> version := "0.1"
>>>>
>>>> organization := "edu.berkeley.cs.amplab"
>>>>
>>>> scalaVersion := "2.11.8"
>>>>
>>>> parallelExecution in Test := false
>>>>
>>>> {
>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>   libraryDependencies ++= Seq(
>>>> "org.slf4j" % "slf4j-api" % "1.7.2",
>>>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>>> "org.apache.spark" %% "spark-core" % "2.1.0"
>>>> excludeAll(excludeHadoop),
>>>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>>>> excludeAll(excludeHadoop),
>>>> "org.apache.spark" %% "spark-sql" % "2.1.0"
>>>> excludeAll(excludeHadoop),
>>>> "org.apache.commons" % "commons-compress" % "1.7",
>>>> "commons-io" % "commons-io" % "2.4",
>>>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>> "com.github.scopt" %% "scopt" % "3.3.0"
>>>>   )
>>>> }
>>>>
>>>> {
>>>>   val defaultHadoopVersion = "1.0.4"
>>>>   val hadoopVersion =
>>>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>>>> defaultHadoopVersion)
>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>>>> hadoopVersion
>>>> }
>>>>
>>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>>>
>>>> resolvers ++= Seq(
>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>>>> ".m2/repository",
>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
> ...

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hello again,
I just tried to change the scalatest version to 3.0.0 and remove the libraries
breeze, netlib and scopt, but I still get the same error.

On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:

> Hello
>  uhm i have a project whose build.sbt is closest to yours, where i am using
> spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine
> in my projects though i don't have any of the following libraries that you
> mention
> - breeze
> - netlib,all
> - scopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the scala version to 2.11.8 and kept the spark version
>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>> With this configuration, I could run the code and compile it and generate
>> the .jar file.
>>
>> When I changed the spark version into 2.1.0, I get the same error as
>> before. So I imagine the problem should be somehow related to the version
>> of spark.
>>
>> Cheers,
>> Anahita
>>
>> 
>> 
>> 
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> "org.apache.spark" %% "spark-core" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
>>   "Spray" at "http://repo.spray.cc;
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"      =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>> 
>> 
>

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,

Thanks for your answer.
What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
I think the problem might come from this part.

On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:

> Hello
>  uhm i have a project whose build.sbt is closest to yours, where i am using
> spark 2.1, scala 2.11 and scalatest (i upgraded to 3.0.0) and it works fine
> in my projects though i don't have any of the following libraries that you
> mention
> - breeze
> - netlib,all
> - scopt
>
> hth
>
> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>>
>> I first changed the scala version to 2.11.8 and kept the spark version
>> 1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
>> With this configuration, I could run the code and compile it and generate
>> the .jar file.
>>
>> When I changed the spark version into 2.1.0, I get the same error as
>> before. So I imagine the problem should be somehow related to the version
>> of spark.
>>
>> Cheers,
>> Anahita
>>
>> 
>> 
>> 
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> organization := "edu.berkeley.cs.amplab"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>> "org.apache.spark" %% "spark-core" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-mllib" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
>>   "Spray" at "http://repo.spray.cc;
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"      =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>> 
>> 

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,

Thanks for your answer.

I first changed the scala version to 2.11.8 and kept the spark version
1.5.2 (old version). Then I changed the scalatest version into "3.0.1".
With this configuration, I could run the code and compile it and generate
the .jar file.

When I changed the spark version into 2.1.0, I get the same error as
before. So I imagine the problem should be somehow related to the version
of spark.

Cheers,
Anahita


import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

organization := "edu.berkeley.cs.amplab"

scalaVersion := "2.11.8"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.2",
"org.slf4j" % "slf4j-log4j12" % "1.7.2",
"org.scalatest" %% "scalatest" % "3.0.1" % "test",
"org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
"org.apache.commons" % "commons-compress" % "1.7",
"commons-io" % "commons-io" % "2.4",
"org.scalanlp" % "breeze_2.11" % "0.11.2",
"com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
"com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion =
scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
hadoopVersion
}

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
  "Spray" at "http://repo.spray.cc;
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
case PathList("javax", "servlet", xs @ _*)   =>
MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html"   =>
MergeStrategy.first
case "application.conf"  =>
MergeStrategy.concat
case "reference.conf"=>
MergeStrategy.concat
case "log4j.properties"  =>
MergeStrategy.discard
case m if m.toLowerCase.endsWith("manifest.mf")  =>
MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
MergeStrategy.discard
case _ => MergeStrategy.first
  }
}

test in assembly := {}
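
One note on the merge-strategy block above: the <<= form only works with the
old sbt-assembly plugin style (import AssemblyKeys._ / assemblySettings). If
the plugin itself is upgraded (0.12+ / 0.14.x), the same rules would be
written roughly as below; this is a sketch of the newer syntax with the same
behavior, not something tested against this project:

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
  case "application.conf"                              => MergeStrategy.concat
  case "reference.conf"                                => MergeStrategy.concat
  case "log4j.properties"                              => MergeStrategy.discard
  case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
  case _                                               => MergeStrategy.first
}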


On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hello
>  that looks to me like there's something dodgy with your Scala installation
> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... i
> suggest you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
>
> hth
>  marco
>
> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> Hello,
>>
>> Thanks you all for your informative answers.
>> I actually changed the scala version to the 2.11.8 and spark version into
>> 2.1.0 in the build.sbt
>>
>> Except for these two guys (scala and spark version), I kept the same
>> values for the rest in the build.sbt file.
>> 
>> ---
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-l

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
Hi,
Thanks for your answer. I just changed the sbt file and set the scala
version to 2.10.4, but I still get the same error:


[info] Compiling 4 Scala sources to
/Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.10/classes...
[error]
/Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40:
value mapPartitionsWithSplit is not a member of
org.apache.spark.rdd.RDD[String]
[error] val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
[error]  ^


Thanks,
Anahita

On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hello
>  that looks to me like there's something dodgy with your Scala installation
> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... i
> suggest you change one thing at a time in your sbt
> First Spark version. run it and see if it works
> Then amend the scala version
>
> hth
>  marco
>
> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> Hello,
>>
>> Thanks you all for your informative answers.
>> I actually changed the scala version to the 2.11.8 and spark version into
>> 2.1.0 in the build.sbt
>>
>> Except for these two guys (scala and spark version), I kept the same
>> values for the rest in the build.sbt file.
>> 
>> ---
>> import AssemblyKeys._
>>
>> assemblySettings
>>
>> name := "proxcocoa"
>>
>> version := "0.1"
>>
>> scalaVersion := "2.11.8"
>>
>> parallelExecution in Test := false
>>
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>> "org.slf4j" % "slf4j-api" % "1.7.2",
>> "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>> "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>> "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.spark" % "spark-sql_2.11" % "2.1.0"
>> excludeAll(excludeHadoop),
>> "org.apache.commons" % "commons-compress" % "1.7",
>> "commons-io" % "commons-io" % "2.4",
>> "org.scalanlp" % "breeze_2.11" % "0.11.2",
>> "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>> "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>>
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>> scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
>> defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
>> hadoopVersion
>> }
>>
>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" %
>> "2.1.0"
>>
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
>> ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
>>   "Spray" at "http://repo.spray.cc;
>> )
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>> case PathList("javax", "servlet", xs @ _*)   =>
>> MergeStrategy.first
>> case PathList(ps @ _*) if ps.last endsWith ".html"   =>
>> MergeStrategy.first
>> case "application.conf"  =>
>> MergeStrategy.concat
>> case "reference.conf"=>
>> MergeStrategy.concat
>> case "log4j.properties"  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.endsWith("manifest.mf")  =>
>> MergeStrategy.discard
>> case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
>> MergeStrategy.discard
>> case _ => MergeStrategy.first
>>   }
>> }
>>
>> test in assembly := {}
>> 
>>
>> When I compile the code, I get the following error:
>>

Re: Upgrade the scala code using the most updated Spark version

2017-03-28 Thread Anahita Talebi
 are compiled against a certain Scala version. Java
> libraries are unaffected (have nothing to do with Scala), e.g. for
> `slf4j` one only uses single `%`s:
>
>   "org.slf4j" % "slf4j-api" % "1.7.2"
>
> Cheers,
> Dinko
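
To make the %% / % distinction concrete, a small build.sbt sketch using the
versions already mentioned in this thread (the comment shows what %% expands
to; this is illustrative, not the project's actual file):

// %% appends the Scala binary suffix; % does not (plain Java artifacts)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",  // resolves to spark-core_2.11 when scalaVersion is 2.11.x
  "org.slf4j"        %  "slf4j-api"  % "1.7.2"   // Java library, no Scala suffix
)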
>
> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> > check these versions
> >
> > function create_build_sbt_file {
> > BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
> > [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
> > cat >> $BUILD_SBT_FILE << !
> > lazy val root = (project in file(".")).
> >   settings(
> > name := "${APPLICATION}",
> > version := "1.0",
> > scalaVersion := "2.11.8",
> > mainClass in Compile := Some("myPackage.${APPLICATION}")
> >   )
> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" %
> > "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" %
> > "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" %
> > "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0"
> %
> > "provided"
> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" %
> > "1.6.1" % "provided"
> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" %
> > "4.6.0-HBase-1.0"
> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
> > // META-INF discarding
> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
> >{
> > case PathList("META-INF", xs @ _*) => MergeStrategy.discard
> > case x => MergeStrategy.first
> >}
> > }
> > !
> > }
> >
> > HTH
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> The
> > author will in no case be liable for any monetary damages arising from
> such
> > loss, damage or destruction.
> >
> >
> >
> >
> > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote:
> >>
> >> Usually you define the dependencies to the Spark library as provided.
> You
> >> also seem to mix different Spark versions which should be avoided.
> >> The Hadoop library seems to be outdated and should also only be
> provided.
> >>
> >> The other dependencies you could assemble in a fat jar.
> >>
> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com>
> >> wrote:
> >>
> >> Hi friends,
> >>
> >> I have a code which is written in Scala. The scala version 2.10.4 and
> >> Spark version 1.5.2 are used to run the code.
> >>
> >> I would like to upgrade the code to the most updated version of spark,
> >> meaning 2.1.0.
> >>
> >> Here is the build.sbt:
> >>
> >> import AssemblyKeys._
> >>
> >> assemblySettings
> >>
> >> name := "proxcocoa"
> >>
> >> version := "0.1"
> >>
> >> scalaVersion := "2.10.4"
> >>
> >> parallelExecution in Test := false
> >>
> >> {
> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
> >>   libraryDependencies ++= Seq(
> >> "org.slf4j" % "slf4j-api" % "1.7.2",
> >> "org.slf4j&quo

Upgrade the scala code using the most updated Spark version

2017-03-27 Thread Anahita Talebi
Hi friends,

I have a code which is written in Scala. The scala version 2.10.4 and Spark
version 1.5.2 are used to run the code.

I would like to upgrade the code to the most updated version of spark,
meaning 2.1.0.

Here is the build.sbt:

import AssemblyKeys._

assemblySettings

name := "proxcocoa"

version := "0.1"

scalaVersion := "2.10.4"

parallelExecution in Test := false

{
  val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
  libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.2",
"org.slf4j" % "slf4j-log4j12" % "1.7.2",
"org.scalatest" %% "scalatest" % "1.9.1" % "test",
"org.apache.spark" % "spark-core_2.10" % "1.5.2"
excludeAll(excludeHadoop),
"org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
excludeAll(excludeHadoop),
"org.apache.spark" % "spark-sql_2.10" % "1.5.2"
excludeAll(excludeHadoop),
"org.apache.commons" % "commons-compress" % "1.7",
"commons-io" % "commons-io" % "2.4",
"org.scalanlp" % "breeze_2.10" % "0.11.2",
"com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
"com.github.scopt" %% "scopt" % "3.3.0"
  )
}

{
  val defaultHadoopVersion = "1.0.4"
  val hadoopVersion =
scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION",
defaultHadoopVersion)
  libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
hadoopVersion
}

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"

resolvers ++= Seq(
  "Local Maven Repository" at Path.userHome.asFile.toURI.toURL +
".m2/repository",
  "Typesafe" at "http://repo.typesafe.com/typesafe/releases;,
  "Spray" at "http://repo.spray.cc;
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
case PathList("javax", "servlet", xs @ _*)   =>
MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html"   =>
MergeStrategy.first
case "application.conf"  =>
MergeStrategy.concat
case "reference.conf"=>
MergeStrategy.concat
case "log4j.properties"  =>
MergeStrategy.discard
case m if m.toLowerCase.endsWith("manifest.mf")  =>
MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  =>
MergeStrategy.discard
case _ => MergeStrategy.first
  }
}

test in assembly := {}

---
I downloaded Spark 2.1.0 and changed the Spark version and scalaVersion in
the build.sbt. But unfortunately, I failed to run the code.

Does anybody know how I can upgrade the code to the most recent spark
version by changing the build.sbt file?

Or do you have any other suggestion?

Thanks a lot,
Anahita


Re: How to run a spark on Pycharm

2017-03-03 Thread Anahita Talebi
Hi,

Thanks for your answer.

Sorry, I am a complete beginner at running code in Spark.

Could you please tell me in a bit more detail how to do that?
I installed IPython and Jupyter notebook on my local machine. But how can I
run the code using them? Earlier, I tried to run the code with PyCharm and
failed.

Thanks,
Anahita

On Fri, Mar 3, 2017 at 3:48 PM, Pushkar.Gujar <pushkarvgu...@gmail.com>
wrote:

> Jupyter notebook/ipython can be connected to apache spark
>
>
> Thank you,
> *Pushkar Gujar*
>
>
> On Fri, Mar 3, 2017 at 9:43 AM, Anahita Talebi <anahita.t.am...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I am trying to run a spark code on Pycharm. I tried to give the path of
>> spark as a environment variable to the configuration of Pycharm.
>> Unfortunately, I get the error. Does anyone know how I can run the spark
>> code on Pycharm?
>> It shouldn't be necessarily on Pycharm. if you know any other software,
>> It would be nice to tell me.
>>
>> Thanks a lot,
>> Anahita
>>
>>
>>
>


How to run a spark on Pycharm

2017-03-03 Thread Anahita Talebi
Hi everyone,

I am trying to run a Spark code in PyCharm. I tried to give the path of
Spark as an environment variable in the PyCharm run configuration.
Unfortunately, I get an error. Does anyone know how I can run the Spark
code in PyCharm?
It doesn't necessarily have to be PyCharm; if you know of any other tool, it
would be nice to tell me.

Thanks a lot,
Anahita


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Anahita Talebi
You're welcome.
You need to specify the class. I meant like this:

spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"



On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:

> Thank you, it is still not working:
>
> [image: Inline image 1]
>
> By the way, here is the original source:
>
> https://github.com/apache/spark/blob/master/examples/
> src/main/python/streaming/kafka_wordcount.py
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <anahita.t.am...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I think if you remove --jars, it will work. Like:
>>
>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
>> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>
>>  I had the same problem before and solved it by removing --jars.
>>
>> Cheers,
>> Anahita
>>
>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>> wrote:
>>
>>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>>> now, can anyone tell me what's wrong with the following code and the
>>> exception it causes and how do I fix it? Thank you very much in advance.
>>>
>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> No main class set in JAR; please specify one with --class
>>>
>>>
>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
>>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>
>>> **
>>> *Sincerely yours,*
>>>
>>>
>>> *Raymond*
>>>
>>
>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Anahita Talebi
Hi,

I think if you remove --jars, it will work. Like:

spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.
0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar

 I had the same problem before and solved it by removing --jars.

Cheers,
Anahita

On Saturday, February 25, 2017, Raymond Xie  wrote:

> I am doing a spark streaming on a hortonworks sandbox and am stuck here
> now, can anyone tell me what's wrong with the following code and the
> exception it causes and how do I fix it? Thank you very much in advance.
>
> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> No main class set in JAR; please specify one with --class
>
>
> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>
> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/
> libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/
> libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>


submit a spark code on google cloud

2017-02-07 Thread Anahita Talebi
Hello Friends,

I am trying to run a Spark code on multiple machines. To this end, I submit
the Spark code via Submit Job on the Google Cloud platform.
https://cloud.google.com/dataproc/docs/guides/submit-job

I have created a cluster with 6 nodes. Does anyone know how I can tell
which nodes participate when I run the code on the cluster?

Thanks a lot,
Anahita


Re: Running a spark code on multiple machines using google cloud platform

2017-02-02 Thread Anahita Talebi
Thanks for your answer.
Do you mean Amazon EMR?

On Thu, Feb 2, 2017 at 2:30 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> U can use EMR if u want to run. On a cluster
> Kr
>
> On 2 Feb 2017 12:30 pm, "Anahita Talebi" <anahita.t.am...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> I am trying to run a spark code on multiple machines using submit job in
>> google cloud platform.
>> As the inputs of my code, I have a training and testing datasets.
>>
>> When I use small training data set like (10kb), the code can be
>> successfully ran on the google cloud while when I have a large data set
>> like 50Gb, I received the following error:
>>
>> 17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: 
>> SparkListenerBus has already stopped! Dropping event 
>> SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)
>>
>> Does anyone can give me a hint how I can solve my problem?
>>
>> PS: I cannot use small training data set because I have an optimization code 
>> which needs to use all the data.
>>
>> I have to use google could platform because I need to run the code on 
>> multiple machines.
>>
>> Thanks a lot,
>>
>> Anahita
>>
>>


Running a spark code on multiple machines using google cloud platform

2017-02-02 Thread Anahita Talebi
Dear all,

I am trying to run a spark code on multiple machines using submit job in
google cloud platform.
As the inputs of my code, I have a training and testing datasets.

When I use a small training data set (around 10 KB), the code runs
successfully on Google Cloud, while when I have a large data set
like 50 GB, I receive the following error:

17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus:
SparkListenerBus has already stopped! Dropping event
SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)

Can anyone give me a hint how I can solve my problem?

PS: I cannot use a small training data set because I have an
optimization code which needs to use all the data.

I have to use the Google Cloud platform because I need to run the code on
multiple machines.

Thanks a lot,

Anahita


Re: Running a spark code using submit job in google cloud platform

2017-01-13 Thread Anahita Talebi
Hello,

Thanks a lot Dinko.
Yes, now it is working perfectly.


Cheers,
Anahita

On Fri, Jan 13, 2017 at 2:19 PM, Dinko Srkoč <dinko.sr...@gmail.com> wrote:

> On 13 January 2017 at 13:55, Anahita Talebi <anahita.t.am...@gmail.com>
> wrote:
> > Hi,
> >
> > Thanks for your answer.
> >
> > I have chose "Spark" in the "job type". There is not any option where we
> can
> > choose the version. How I can choose different version?
>
> There's "Preemptible workers, bucket, network, version,
> initialization, & access options" link just above the "Create" and
> "Cancel" buttons on the "Create a cluster" page. When you click it,
> you'll find "Image version" field where you can enter the image
> version.
>
> Dataproc versions:
> * 1.1 would be Spark 2.0.2,
> * 1.0 includes Spark 1.6.2
>
> More about versions can be found here:
> https://cloud.google.com/dataproc/docs/concepts/dataproc-versions
>
> Cheers,
> Dinko
>
> >
> > Thanks,
> > Anahita
> >
> >
> > On Thu, Jan 12, 2017 at 6:39 PM, A Shaikh <shaikh.af...@gmail.com>
> wrote:
> >>
> >> You may have tested this code on Spark version on your local machine
> >> version of which may be different to whats in Google Cloud Storage.
> >> You need to select appropraite Spark version when you submit your job.
> >>
> >> On 12 January 2017 at 15:51, Anahita Talebi <anahita.t.am...@gmail.com>
> >> wrote:
> >>>
> >>> Dear all,
> >>>
> >>> I am trying to run a .jar file as a job using submit job in google
> cloud
> >>> console.
> >>> https://cloud.google.com/dataproc/docs/guides/submit-job
> >>>
> >>> I actually ran the spark code on my local computer to generate a .jar
> >>> file. Then in the Argument folder, I give the value of the arguments
> that I
> >>> used in the spark code. One of the argument is training data set that
> I put
> >>> in the same bucket that I save my .jar file. In the bucket, I put only
> the
> >>> .jar file, training dataset and testing dataset.
> >>>
> >>> Main class or jar
> >>> gs://Anahita/test.jar
> >>>
> >>> Arguments
> >>>
> >>> --lambda=.001
> >>> --eta=1.0
> >>> --trainFile=gs://Anahita/small_train.dat
> >>> --testFile=gs://Anahita/small_test.dat
> >>>
> >>> The problem is that when I run the job I get the following error and
> >>> actually it cannot read  my training and testing data sets.
> >>>
> >>> Exception in thread "main" java.lang.NoSuchMethodError:
> >>> org.apache.spark.rdd.RDD.coalesce(IZLscala/math/
> Ordering;)Lorg/apache/spark/rdd/RDD;
> >>>
> >>> Can anyone help me how I can solve this problem?
> >>>
> >>> Thanks,
> >>>
> >>> Anahita
> >>>
> >>>
> >>
> >
>
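
On the build side, "selecting the appropriate Spark version" can simply mean
pinning the dependencies to whatever the chosen image ships. A rough sketch
assuming Dataproc image version 1.1 (Spark 2.0.2, Scala 2.11), per Dinko's
list above; the exact numbers must match the image the cluster was created
with:

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
)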


Re: Running a spark code using submit job in google cloud platform

2017-01-13 Thread Anahita Talebi
Hi,

Thanks for your answer.

I have chosen "Spark" as the "Job type". There is no option where we
can choose the version. How can I choose a different version?

Thanks,
Anahita


On Thu, Jan 12, 2017 at 6:39 PM, A Shaikh <shaikh.af...@gmail.com> wrote:

> You may have tested this code on Spark version on your local machine
> version of which may be different to whats in Google Cloud Storage.
> You need to select appropraite Spark version when you submit your job.
>
> On 12 January 2017 at 15:51, Anahita Talebi <anahita.t.am...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> I am trying to run a .jar file as a job using submit job in google cloud
>> console.
>> https://cloud.google.com/dataproc/docs/guides/submit-job
>>
>> I actually ran the spark code on my local computer to generate a .jar
>> file. Then in the Argument folder, I give the value of the arguments that I
>> used in the spark code. One of the argument is training data set that I put
>> in the same bucket that I save my .jar file. In the bucket, I put only the
>> .jar file, training dataset and testing dataset.
>>
>> Main class or jar
>> gs://Anahita/test.jar
>>
>> Arguments
>>
>> --lambda=.001
>> --eta=1.0
>> --trainFile=gs://Anahita/small_train.dat
>> --testFile=gs://Anahita/small_test.dat
>>
>> The problem is that when I run the job I get the following error and
>> actually it cannot read  my training and testing data sets.
>>
>> Exception in thread "main" java.lang.NoSuchMethodError: 
>> org.apache.spark.rdd.RDD.coalesce(IZLscala/math/Ordering;)Lorg/apache/spark/rdd/RDD;
>>
>> Can anyone help me how I can solve this problem?
>>
>> Thanks,
>>
>> Anahita
>>
>>
>>
>


Running a spark code using submit job in google cloud platform

2017-01-12 Thread Anahita Talebi
Dear all,

I am trying to run a .jar file as a job using submit job in google cloud
console.
https://cloud.google.com/dataproc/docs/guides/submit-job

I actually ran the Spark code on my local computer to generate a .jar file.
Then in the Arguments field, I give the values of the arguments that I used
in the Spark code. One of the arguments is the training data set, which I put
in the same bucket where I save my .jar file. In the bucket, I put only the
.jar file, the training dataset and the testing dataset.

Main class or jar
gs://Anahita/test.jar

Arguments

--lambda=.001
--eta=1.0
--trainFile=gs://Anahita/small_train.dat
--testFile=gs://Anahita/small_test.dat

The problem is that when I run the job, I get the following error and
it actually cannot read my training and testing data sets.

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.rdd.RDD.coalesce(IZLscala/math/Ordering;)Lorg/apache/spark/rdd/RDD;

Can anyone help me how I can solve this problem?

Thanks,

Anahita


Fwd: Entering the variables in the Argument part in Submit job section to run a spark code on Google Cloud

2017-01-09 Thread Anahita Talebi
Dear friends,

I am trying to run a Spark code on Google Cloud using Submit Job.
https://cloud.google.com/dataproc/docs/tutorials/spark-scala

My question is about the "Arguments" part.
In my Spark code, there are some variables whose values are defined in
a shell file (.sh), as follows:

--trainFile=small_train.dat \
--testFile=small_test.dat \
--numFeatures=9947 \
--numRounds=100 \


- I have tried to enter only the values, each value in a separate box, as
follows, but it is not working:

data/small_train.dat
data/small_test.dat
9947
100

I have also tried to give the parameters as below, but that is not
working either:
trainFile=small_train.dat
testFile=small_test.dat
numFeatures=9947
numRounds=100

I added the files small_train.dat and small_test.dat to the same bucket
where I saved the .jar file. Let's say my bucket is named Anahita: then I
added spark.jar, small_train.dat and small_test.dat to the bucket "Anahita".


Does anyone know how I can enter these values in the Arguments part?

Thanks in advance,
Anahita
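
As a side note on how such flags are usually consumed: the project's
build.sbt pulls in scopt, so the parsing side plausibly looks something like
the hypothetical sketch below (the option names match the flags above, but
the Config fields, defaults and object name are only illustrative):

case class Config(trainFile: String = "", testFile: String = "",
                  numFeatures: Int = 0, numRounds: Int = 100)

object ArgsSketch {
  def main(args: Array[String]): Unit = {
    val parser = new scopt.OptionParser[Config]("proxcocoa") {
      opt[String]("trainFile").action((x, c) => c.copy(trainFile = x))
      opt[String]("testFile").action((x, c) => c.copy(testFile = x))
      opt[Int]("numFeatures").action((x, c) => c.copy(numFeatures = x))
      opt[Int]("numRounds").action((x, c) => c.copy(numRounds = x))
    }
    // on Dataproc, each flag goes into its own Arguments box and arrives here via args
    parser.parse(args, Config()).foreach(cfg => println(cfg))
  }
}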