Re: Unable to start Spark 1.3 after building: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer
Adding a hadoop-2.6 profile is not necessary. Use hadoop-2.4, which already exists and is intended for 2.4+. In fact this declaration is missing things that Hadoop 2 needs.

On Thu, Dec 18, 2014 at 3:46 AM, Kyle Lin kylelin2...@gmail.com wrote: Hi there, these are my steps, and I got the same exception as Daniel's. Another question: how can I build a tgz file like the pre-built one I download from the official website?

1. Download trunk from git.
2. Add the following lines to pom.xml:

+    <profile>
+      <id>hadoop-2.6</id>
+      <properties>
+        <hadoop.version>2.6.0</hadoop.version>
+        <protobuf.version>2.5.0</protobuf.version>
+        <jets3t.version>0.9.0</jets3t.version>
+      </properties>
+    </profile>

3. Run: mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
4. In $SPARK_HOME, run the following command:

./bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkPi lib/spark-examples*.jar 10

Kyle

2014-12-18 2:24 GMT+08:00 Daniel Haviv danielru...@gmail.com: Thanks for your replies. I was building Spark from trunk. Daniel

On Dec 17, 2014, at 19:49, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for the correction, Sean. Do the docs need to be updated on this point, or is it safer for now just to note 2.4 specifically?

On Wed Dec 17 2014 at 5:54:53 AM Sean Owen so...@cloudera.com wrote: Spark works fine with 2.4 *and later*. The docs don't mean to imply 2.4 is the last supported version.

On Wed, Dec 17, 2014 at 10:19 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Spark 1.3 does not exist. Spark 1.2 hasn't been released just yet. Which version of Spark did you mean? Also, from what I can see in the docs, I believe the latest version of Hadoop that Spark supports is 2.4, not 2.6. Nick

On Wed Dec 17 2014 at 2:09:56 AM Kyle Lin kylelin2...@gmail.com wrote: I also got the same problem.

2014-12-09 22:58 GMT+08:00 Daniel Haviv danielru...@gmail.com: Hi, I've built Spark 1.3 with Hadoop 2.6, but when I start up the spark-shell I get the following exception:

14/12/09 06:54:24 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/12/09 06:54:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/12/09 06:54:24 INFO ui.SparkUI: Started SparkUI at http://hdname:4040
14/12/09 06:54:25 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

Any idea why? Thanks, Daniel
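[Regarding Kyle's question about producing a tgz like the pre-built downloads: the Spark 1.x source tree ships a make-distribution.sh script at the repository root for exactly this. A rough sketch, assuming the hadoop-2.4 profile Sean recommends; the exact flags vary a little between versions:

./make-distribution.sh --tgz --name custom-hadoop2.6 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0

This produces a spark-<version>-bin-<name>.tgz in the project root, analogous to the official pre-built packages.]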
Can Spark 1.0.2 run on CDH-4.3.0 with yarn? And Will Spark 1.2.0 support CDH5.1.2 with yarn?
I was not able to compile the Spark 1.1.0 source code for CDH4.3.0 with YARN. Does Spark support CDH4.3.0 with YARN? And will Spark 1.2.0 support CDH5.1.2?
Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary
Well, it's always a good idea to use matched binary versions; here it is more acutely necessary. You can use a pre-built binary -- if you use it to compile and also to run. Why does it not make sense to publish artifacts? Not sure what you mean about core vs assembly, as the assembly contains all of the modules. You don't literally need the same jar file.

On Thu, Dec 18, 2014 at 3:20 AM, Sun, Rui rui@intel.com wrote: Not using spark-submit. The app directly communicates with the Spark cluster in standalone mode. If I mark the Spark dependency as 'provided', then a spark-core jar elsewhere must be pointed to in the CLASSPATH. However, the pre-built Spark binary only has an assembly jar, not individual module jars, so you don't have a chance to point to a module jar that is the same binary as the one in the pre-built Spark distribution. Maybe the Spark distribution should contain not only the assembly jar but also the individual module jars. Any opinion?

From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Thursday, December 18, 2014 2:20 AM
To: Sean Owen
Cc: Sun, Rui; user@spark.apache.org
Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

Just to clarify, are you running the application using spark-submit after packaging with sbt package? One thing that might help is to mark the Spark dependency as 'provided', as then you shouldn't have the Spark classes in your jar. Thanks, Shivaram

On Wed, Dec 17, 2014 at 4:39 AM, Sean Owen so...@cloudera.com wrote: You should use the same binaries everywhere. The problem here is that anonymous functions get compiled to (potentially) different names when you build differently, so you actually have one function being called when another function is meant.

On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui rui@intel.com wrote: Hi, I encountered a weird bytecode incompatibility issue between the spark-core jar from the mvn repo and the official Spark pre-built binary. Steps to reproduce:

1. Download the official pre-built Spark binary 1.1.1 at http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
2. Launch the Spark cluster in pseudo-cluster mode
3. Run a small Scala app which calls RDD.saveAsObjectFile():

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1"
)

val sc = new SparkContext(args(0), "test")  // args(0) is the Spark master URI
val rdd = sc.parallelize(List(1, 2, 3))
rdd.saveAsObjectFile("/tmp/mysaoftmp")
sc.stop

This throws an exception as follows:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com): java.lang.ClassCastException: scala.Tuple2 cannot be cast to scala.collection.Iterator
[error] org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
[error] org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
[error] org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
[error] org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error] org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
[error] org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
[error] org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error] org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
[error] org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
[error] org.apache.spark.scheduler.Task.run(Task.scala:54)
[error] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
[error] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
[error] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[error] java.lang.Thread.run(Thread.java:701)

After investigation, I found that this is caused by a bytecode incompatibility between RDD.class in spark-core_2.10-1.1.1.jar and the pre-built Spark assembly. This issue also happens with Spark 1.1.0. Is there anything wrong in my usage of Spark? Or anything wrong in the process of deploying Spark module jars to the maven repo?
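[One common way to follow Shivaram's 'provided' suggestion is in the sbt build itself, so the application jar never bundles its own copy of the Spark classes and the cluster's pre-built assembly is the only copy at runtime. A minimal build.sbt sketch, using the versions discussed in this thread (the project name is a placeholder):

name := "my-spark-app"

scalaVersion := "2.10.4"

// "provided" keeps spark-core out of the packaged jar; spark-submit or the
// cluster CLASSPATH supplies the matching Spark classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"
]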
Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary
@Rui do you mean the spark-core jar in the maven central repo is incompatible with the same version of the official pre-built Spark binary? That's really weird. I thought they should have been built from the same code. Best Regards, Shixiong Zhu

2014-12-18 17:22 GMT+08:00 Sean Owen so...@cloudera.com: Well, it's always a good idea to use matched binary versions; here it is more acutely necessary. You can use a pre-built binary -- if you use it to compile and also to run. Why does it not make sense to publish artifacts? Not sure what you mean about core vs assembly, as the assembly contains all of the modules. You don't literally need the same jar file.
Re: Can Spark 1.0.2 run on CDH-4.3.0 with yarn? And Will Spark 1.2.0 support CDH5.1.2 with yarn?
The question is really: will Spark 1.1 work with a particular version of YARN? Many, but not all, versions of YARN are supported. The stable versions (2.2.x+) are; before that, support is patchier, and in fact it has been removed in Spark 1.3.

The yarn profile supports stable YARN, which is roughly 2.2.x and onwards. The yarn-alpha profile should work for YARN around 0.23.x. 2.0.x and 2.1.x were a sort of beta period; I recall that yarn-alpha works with some of it, but not all, and there is no yarn-beta profile. I believe early CDH 4.x has basically YARN beta, and later 4.x has stable. I think I'd try the yarn-alpha profile and see if it compiles, but the version of YARN in that release may well be among those that fall in the gap between alpha and stable support.

Thankfully things got a lot more stable past Hadoop / YARN 2.2 or so, so it far more often just works without version issues. And CDH 5 is based on Hadoop 2.3 and then 2.5, so you should be much more able to build whatever versions together that you want. CDH 5.1.x ships Spark 1.0.x. There should be no problem using 1.1.x, 1.2.x, etc. with it; you just need to make and support your own binaries. 5.2.x has 1.1.x; 5.3.x will have 1.2.x.

On Thu, Dec 18, 2014 at 9:18 AM, Canoe canoe...@gmail.com wrote: I was not able to compile the Spark 1.1.0 source code for CDH4.3.0 with YARN. Does Spark support CDH4.3.0 with YARN? And will Spark 1.2.0 support CDH5.1.2?
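[For concreteness, builds along the lines Sean describes might look like the commands below, in the same style as the mvn invocations earlier in this digest. The CDH artifact version strings are assumptions for illustration, and as discussed above, the yarn-alpha profile may or may not accept the YARN that ships with CDH 4.3.0:

mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.3.0 -DskipTests clean package

mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.2 -DskipTests clean package
]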
Semantics of foreachPartition()
Hi, I have the following code in my application:

tmpRdd.foreach(item => {
  println("abc: " + item)
})

tmpRdd.foreachPartition(iter => {
  iter.map(item => {
    println("xyz: " + item)
  })
})

In the output, I see only the "abc" prints (i.e., from the foreach() call). (The result is the same also if I exchange the order.) What exactly is the meaning of foreachPartition and how would I use it correctly? Thanks, Tobias
Re: Can Spark 1.0.2 run on CDH-4.3.0 with yarn? And Will Spark 1.2.0 support CDH5.1.2 with yarn?
Hi Sean, thank you for your reply. I will try to use Spark 1.1 and 1.2 on CDH 5.x. :)

2014-12-18 17:38 GMT+08:00 Sean Owen so...@cloudera.com: The question is really: will Spark 1.1 work with a particular version of YARN? Many, but not all, versions of YARN are supported.

-- "Who says the river is wide? A single reed can sail across it."
Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary
Have a look at https://issues.apache.org/jira/browse/SPARK-2075

It's not quite that the API is different, but indeed building different 'flavors' of the same version (hadoop1 vs 2) can strangely lead to this problem, even though the public API is identical and in theory the API is completely separate from the backend bindings. IIRC the idea is that only submitting via spark-submit is really supported, because there you're definitely running exactly what's on your cluster. That should always work. This sort of gotcha turns up in some specific cases, but you can always work around it by matching your embedded Spark version as well.

On Thu, Dec 18, 2014 at 9:38 AM, Shixiong Zhu zsxw...@gmail.com wrote: @Rui do you mean the spark-core jar in the maven central repo is incompatible with the same version of the official pre-built Spark binary? That's really weird. I thought they should have been built from the same code.
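[To illustrate the spark-submit route Sean mentions, a sketch of deploying the sbt-packaged jar with the cluster's own launcher; the class name, jar path and master URL are placeholders:

sbt package

./bin/spark-submit --master spark://master-host:7077 \
  --class com.example.MyApp \
  /path/to/myapp_2.10-0.1.jar

Because spark-submit puts the installed assembly on the classpath, the Spark classes executed at runtime exactly match the pre-built binary running on the cluster.]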
Re: Semantics of foreachPartition()
Hi again,

On Thu, Dec 18, 2014 at 6:43 PM, Tobias Pfeiffer t...@preferred.jp wrote:

tmpRdd.foreachPartition(iter => {
  iter.map(item => {
    println("xyz: " + item)
  })
})

Uh, with iter.foreach(...) it works... the reason apparently being that iter.map() itself returns an iterator and is thus evaluated lazily (in this case: never), while iter.foreach() is evaluated immediately. Thanks, Tobias
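[A minimal sketch of the difference, assuming an existing SparkContext sc; this is illustrative only, not code from the thread:

val tmpRdd = sc.parallelize(1 to 10)

// Prints: foreach eagerly drains the partition's iterator.
tmpRdd.foreachPartition(iter => iter.foreach(item => println("xyz: " + item)))

// Prints nothing: map only builds a lazy iterator, which is then discarded
// without ever being consumed.
tmpRdd.foreachPartition(iter => iter.map(item => println("xyz: " + item)))
]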
Re: java.io.NotSerializableException: org.apache.avro.mapred.AvroKey using spark with avro
I did not encounter this with my Avro records using Spark 1.1.0 (see https://github.com/medale/spark-mail/blob/master/analytics/src/main/scala/com/uebercomputing/analytics/basic/UniqueSenderCounter.scala). I do use the default Java serialization, but all the fields in my Avro object are Serializable (no bytes/ByteBuffer). Does your Avro schema use bytes? If so, it seems that is wrapped in ByteBuffer, which is not Serializable. A quick search turns up a fix here: https://groups.google.com/forum/#!topic/spark-users/6HQPuxsCe0c Hope this helps, Markus

On 12/17/2014 08:14 PM, touchdown wrote: Yeah, I have the same problem with 1.1.0, but not 1.0.0.
Re: Providing query dsl to Elasticsearch for Spark (2.1.0.Beta3)
Quick follow-up: this works sweetly with spark-1.1.1-bin-hadoop2.4.

On Dec 3, 2014, at 3:31 PM, Ian Wilkinson ia...@me.com wrote: Hi, I'm trying the Elasticsearch support for Spark (2.1.0.Beta3). In the following I provide the query (as query DSL):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryES {
  val sparkConf = new SparkConf().setAppName("Campaigns")
  sparkConf.set("es.nodes", "es_cluster:9200")
  sparkConf.set("es.nodes.discovery", "false")
  val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) {
    val query = """{ "query": { ... } }"""
    val campaigns = sc.esRDD(resource, query)
    campaigns.count()
  }
}

However when I submit this (using spark-1.1.0-bin-hadoop2.4), I am experiencing the following exceptions:

14/12/03 14:55:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/03 14:55:27 INFO scheduler.DAGScheduler: Failed to run count at TryES.scala:...
Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream for resource { query: { ... } }

Is the query DSL supported with esRDD, or am I missing something more fundamental? Huge thanks, ian
Re: SPARK-2243 Support multiple SparkContexts in the same JVM
Yes, although once you have multiple ClassLoaders, you are operating as if in multiple JVMs for most intents and purposes. I think the request for this kind of functionality comes from use cases where multiple ClassLoaders wouldn't work, like wanting to have one app (in one ClassLoader) managing multiple contexts.

On Thu, Dec 18, 2014 at 2:23 AM, Anton Brazhnyk anton.brazh...@genesys.com wrote: Greetings, the first comment on the issue says that the reason for not supporting multiple contexts is: "There are numerous assumptions in the code base that uses a shared cache or thread local variables or some global identifiers which prevent us from using multiple SparkContext's." Might it be worked around by creating those contexts in several classloaders, each with its own copy of the Spark classes? Thanks, Anton
Incorrect results when calling collect() ?
Hi, I'm getting some seemingly invalid results when I collect an RDD. This is happening in both Spark 1.1.0 and 1.2.0, using Java 8 on Mac. See the following code snippet:

JavaRDD<Thing> rdd = pairRDD.values();
rdd.foreach(e -> System.out.println("RDD Foreach: " + e));
rdd.collect().forEach(e -> System.out.println("Collected Foreach: " + e));

I would expect the results from the two outputters to be identical, but instead I see:

RDD Foreach: Thing1
RDD Foreach: Thing2
RDD Foreach: Thing3
RDD Foreach: Thing4
(…snip…)
Collected Foreach: Thing1
Collected Foreach: Thing1
Collected Foreach: Thing1
Collected Foreach: Thing2

So essentially all the valid entries except one are replaced by an equivalent number of duplicate objects. I've tried various map and filter operations, but the results in the RDD always appear correct until I try to collect() the results. I've also found that calling cache() on the RDD materialises the duplication, such that the RDD Foreach displays the duplicates too... Any suggestions for how I can go about debugging this would be massively appreciated. Cheers, Tristan
Can we specify driver running on a specific machine of the cluster on yarn-cluster mode?
Hi all, in yarn-cluster mode, can we make the driver run on a specific machine that we choose in the cluster? Or even on a machine that is not in the cluster?
Re: Implementing a spark version of Haskell's partition
NP man,

The thing is that since you're in a distributed env, it'd be cumbersome to do that. Remember that Spark works basically on blocks/partitions; they are the unit of distribution and parallelization. That means that actions have to be run against them **after having been scheduled on the cluster**. The latter point is the most important: it means that the RDDs aren't really created on the driver; the collection is created/transformed/... on the partitions. As a consequence, you cannot, on the driver, create such a representation of the distributed collection, since you haven't seen it yet.

That being said, you can only prepare/define some computations on the driver that will segregate the data by applying a filter on the nodes. If you want to keep the RDD operators as they are, yes, you'll need to pass over the distributed data twice. The option of using mapPartitions, for instance, would be to create an RDD[(Seq[A], Seq[A])]; however it's going to be tricky because you might have to repartition, otherwise OOMs might blow up in your face :-D. I wouldn't pick that one!

A final note: looping over the data twice is not that much of a problem (especially if you can cache it), and in fact it's way better to keep the advantage of the resilience etc. that comes with Spark.

my2c
andy

On Wed Dec 17 2014 at 7:07:05 PM Juan Rodríguez Hortalá juan.rodriguez.hort...@gmail.com wrote: Hi Andy, thanks for your response. I already thought about filtering twice, that was what I meant with "that would be equivalent to applying filter twice", but I was thinking about whether I could do it in a single pass, so that it could later be generalized to an arbitrary number of classes. I would also like to be able to generate RDDs instead of partitions of a single RDD, so I could use RDD methods like stats() on the fragments. But I think there is currently no RDD method that returns more than one RDD for a single input RDD, so maybe there is some design limitation in Spark that prevents this? Again, thanks for your answer. Greetings, Juan

On 17/12/2014 18:15, andy petrella andy.petre...@gmail.com wrote: yo, First, here is the Scala version: http://www.scala-lang.org/api/current/index.html#scala.collection.Seq@partition(p:A= Boolean):(Repr,Repr) Second: an RDD is distributed, so what you'll have to do is either partition each partition (:-D) or create two RDDs by filtering twice → hence tasks will be scheduled distinctly, and the data read twice. Choose what's best for you! hth, andy

On Wed Dec 17 2014 at 5:57:56 PM Juan Rodríguez Hortalá juan.rodriguez.hort...@gmail.com wrote: Hi all, I would like to be able to split an RDD in two pieces according to a predicate. That would be equivalent to applying filter twice, with the predicate and its complement, which is also similar to Haskell's partition list function (http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-List.html). Is there currently any way to do this in Spark? Or maybe anyone has a suggestion about how to implement this by modifying the Spark source. I think this is valuable because sometimes I need to split an RDD into several groups that are too big to fit in the memory of a single thread, so pair RDDs are not a solution for those cases. A generalization to n parts of Haskell's partition would do the job. Thanks a lot for your help. Greetings, Juan Rodriguez
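[A sketch of the two-pass approach Andy describes — splitting by filtering twice, with a cache so the parent data is not re-read for the second pass. This is an illustration, not code from the thread:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Split an RDD by a predicate into (matching, non-matching), Haskell-partition style.
def partitionRDD[A: ClassTag](rdd: RDD[A])(p: A => Boolean): (RDD[A], RDD[A]) = {
  val cached = rdd.cache()   // avoid recomputing/re-reading the parent twice
  (cached.filter(p), cached.filter(x => !p(x)))
}

Both results are full RDDs, so methods like stats() can be applied to each fragment, at the cost of scheduling two filter jobs.]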
Re: Incorrect results when calling collect() ?
It sounds a lot like your values are mutable classes and you are mutating or reusing them somewhere? It might work until you actually try to materialize them all and find that many point to the same object.

On Thu, Dec 18, 2014 at 10:06 AM, Tristan Blakers tris...@blackfrog.org wrote: Hi, I'm getting some seemingly invalid results when I collect an RDD. This is happening in both Spark 1.1.0 and 1.2.0, using Java 8 on Mac.
Re: Implementing a spark version of Haskell's partition
Hi Andy, thanks again for your thoughts on this. I haven't found much information about the internals of Spark, so I find these kinds of explanations of its low-level mechanisms very useful and interesting. It's also nice to know that the two-pass approach is a viable solution. Regards, Juan

2014-12-18 11:10 GMT+01:00 andy petrella andy.petre...@gmail.com: NP man, The thing is that since you're in a dist env, it'd be cumbersome to do that.
Re: Help with updateStateByKey
Another point to start playing with updateStateByKey is the example StatefulNetworkWordCount. See the streaming examples directory in the Spark repository. TD

On Thu, Dec 18, 2014 at 6:07 AM, Pierce Lamb richard.pierce.l...@gmail.com wrote: I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. The goal is to sessionize the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

The only difference is that I want to sessionize each page the IP hits, instead of the entire session. I was able to do this reading from a file of fake web traffic using Spark in batch mode, but now I want to do it in a streaming context. Log files are read from Kafka and parsed into K/V pairs of (String, (String, Long, Long)), i.e. (IP, (requestPage, time, time)). I then call groupByKey() on this K/V pair. In batch mode, this would produce a (String, CollectionBuffer((String, Long, Long), ...)), i.e. (IP, CollectionBuffer((requestPage, time, time), ...)). In a StreamingContext, it produces a (String, ArrayBuffer((String, Long, Long), ...)), like so:

(183.196.254.131,ArrayBuffer((/test.php,1418849762000,1418849762000)))

However, as the next microbatch (DStream) arrives, this information is discarded. Ultimately what I want is for that ArrayBuffer to fill up over time as a given IP continues to interact, and to run some computations on its data to sessionize the page time. I believe the operator to make that happen is updateStateByKey. I'm having some trouble with this operator (I'm new to both Spark and Scala); any help is appreciated. Thus far:

val grouped = ipTimeStamp.groupByKey().updateStateByKey(updateGroupByKey)

def updateGroupByKey(
    a: Seq[(String, ArrayBuffer[(String, Long, Long)])],
    b: Option[(String, ArrayBuffer[(String, Long, Long)])]
  ): Option[(String, ArrayBuffer[(String, Long, Long)])] = {
}
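[For reference, a sketch of one possible shape for this state update. Note that updateStateByKey's update function receives only the values seen for a key in the current batch plus the previous state — the key itself is not an argument — so the groupByKey() step isn't needed, and checkpointing must be enabled for stateful operations. The names mirror the thread; the body is an illustration under those assumptions, not the canonical solution:

import scala.collection.mutable.ArrayBuffer

// ipTimeStamp: DStream[(String, (String, Long, Long))] as described above,
// i.e. (IP, (requestPage, startTime, endTime))
def updateGroupByKey(
    newHits: Seq[(String, Long, Long)],
    state: Option[ArrayBuffer[(String, Long, Long)]]
  ): Option[ArrayBuffer[(String, Long, Long)]] = {
  val buf = state.getOrElse(ArrayBuffer.empty[(String, Long, Long)])
  buf ++= newHits              // accumulate this IP's page hits across batches
  Some(buf)
}

ssc.checkpoint("/tmp/streaming-checkpoint")   // required for stateful ops; path is a placeholder
val sessions = ipTimeStamp.updateStateByKey(updateGroupByKey _)
]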
Re: Spark Streaming Python APIs?
A more updated version of the streaming programming guide is here: http://people.apache.org/~tdas/spark-1.2-temp/streaming-programming-guide.html Please refer to this until we make the official release of Spark 1.2. TD

On Tue, Dec 16, 2014 at 3:50 PM, smallmonkey...@hotmail.com smallmonkey...@hotmail.com wrote: Hi zhu: maybe there is no Python API for Spark Streaming yet. baishuo smallmonkey...@hotmail.com

From: Xiaoyong Zhu
Date: 2014-12-15 10:52
To: user@spark.apache.org
Subject: Spark Streaming Python APIs?

Hi spark experts, are there any Python APIs for Spark Streaming? I didn't find the Python APIs in the Spark Streaming programming guide: http://spark.apache.org/docs/latest/streaming-programming-guide.html Xiaoyong
Re: Incorrect results when calling collect() ?
I suspected the same thing, but because the underlying data classes are deserialised by Avro I think they have to be mutable, as you need to provide the no-args constructor with settable fields. Nothing is being cached in my code anywhere, and this can be reproduced using data directly out of the newAPIHadoopRDD() call. Debugs added to the constructors of the various classes show that the right number are being constructed, though the watches set on some of the fields aren't always triggering, so I suspect maybe the serialisation is doing something a bit too clever? Tristan

On 18 December 2014 at 21:25, Sean Owen so...@cloudera.com wrote: It sounds a lot like your values are mutable classes and you are mutating or reusing them somewhere? It might work until you actually try to materialize them all and find that many point to the same object.
Re: Incorrect results when calling collect() ?
Being mutable is fine; reusing and mutating the objects is the issue. And yes, the objects you get back from Hadoop are reused by Hadoop InputFormats. You should just map the objects to a clone before using them where you need them to all exist independently at once, like before a collect(). (That said... generally speaking collect() involves copying from workers to the driver, which necessarily means a copy anyway. I suspect this isn't working that way for you since you're running it all locally?)

On Thu, Dec 18, 2014 at 10:42 AM, Tristan Blakers tris...@blackfrog.org wrote: I suspected the same thing, but because the underlying data classes are deserialised by Avro I think they have to be mutable, as you need to provide the no-args constructor with settable fields.
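[A sketch of the clone-before-collect idea, in Scala. The record class Thing and its generated Avro builder are assumptions for illustration; the point is simply to map each reused Hadoop/Avro object to an independent copy before collecting or caching:

// avroRdd: RDD[(AvroKey[Thing], NullWritable)] from newAPIHadoopRDD. Hadoop reuses
// the same AvroKey instance, so copy the datum into a fresh object per record.
val safeRdd = avroRdd.map { case (key, _) =>
  Thing.newBuilder(key.datum()).build()   // Thing and its builder API are assumptions here
}
val collected = safeRdd.collect()
]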
create table in yarn-cluster mode vs yarn-client mode
Hi, I have a simple app where I am trying to create a table. I am able to create the table when running the app in yarn-client mode, but not in yarn-cluster mode. Is this some known issue? Has this already been fixed? Please note that I am using spark-1.1 over hadoop-2.4.0.

App:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    hql("use ttt")
    hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

    sc.stop()
  }
}

Thanks, Chirag
RE: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary
Owen, since we have individual module jars published into the central maven repo for an official release, we need to make sure the official Spark assembly jar is assembled exactly from these jars, so that there is no binary compatibility issue. We could also publish the official assembly jar to maven for convenience. I suspect there is some mistake in the release procedure for the official release.

Yes, you are correct that the assembly contains all of the modules :) But I am not sure: if the app wants to build itself as an assembly including the dependent Spark modules, can it do so in that case?

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, December 18, 2014 5:23 PM
To: Sun, Rui
Cc: shiva...@eecs.berkeley.edu; user@spark.apache.org
Subject: Re: weird bytecode incompatibility issue between spark-core jar from mvn repo and official spark prebuilt binary

Well, it's always a good idea to use matched binary versions; here it is more acutely necessary. You can use a pre-built binary -- if you use it to compile and also to run. Why does it not make sense to publish artifacts? Not sure what you mean about core vs assembly, as the assembly contains all of the modules. You don't literally need the same jar file.
Re: java.io.NotSerializableException: org.apache.avro.mapred.AvroKey using spark with avro
Hi, I had the same problem. One option (starting with Spark 1.2, which is currently in preview) is to use the Avro library for Spark SQL. The other is to use Kryo serialization. By default Spark uses Java serialization; you can specify Kryo serialization while creating the Spark context:

val conf = new SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

This worked for me. Regards, Anish
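[If you go the Kryo route, it can also help to register the generated Avro classes so they serialize compactly. A sketch under the assumption that MyAvroRecord stands in for your own record class; registerKryoClasses is available from Spark 1.2, while earlier versions use a spark.kryo.registrator instead:

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyAvroRecord]))   // MyAvroRecord is a placeholder
val sc = new SparkContext(conf)
]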
Re: Incorrect results when calling collect() ?
Recording the outcome here for the record. Based on Sean's advice I've confirmed that making defensive copies of records that will be collected avoids this problem - it does seem like Avro is being a bit too aggressive when deciding it's safe to reuse an object for a new record.

On 18 December 2014 at 21:50, Sean Owen so...@cloudera.com wrote: Being mutable is fine; reusing and mutating the objects is the issue. And yes, the objects you get back from Hadoop are reused by Hadoop InputFormats. You should just map the objects to a clone before using them where you need them to exist all independently at once, like before a collect().
Re: No disk single pass RDD aggregation
Hi, This was all my fault. It turned out I had a line of code buried in a library that did a repartition. I used this library to wrap an RDD to present it to legacy code as a different interface. That's what was causing the data to spill to disk. The really stupid thing is it took me the better part of a day to find and several misguided emails to this list (including the one that started this thread). Sorry about that. Jim
pyspark 1.1.1 on windows saveAsTextFile - NullPointerException
Hi, I'm trying to use pyspark to save a simple RDD to a text file (code below), but it keeps throwing an error.

--- Python Code ---
items = ["Hello", "world"]
items2 = sc.parallelize(items)
items2.coalesce(1).saveAsTextFile('c:/tmp/python_out.csv')

--- Error ---
C:\Python27\python.exe C:/Users/Mark Jones/PycharmProjects/spark_test/spark_error_sample.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/18 13:00:53 INFO SecurityManager: Changing view acls to: Mark Jones,
14/12/18 13:00:53 INFO SecurityManager: Changing modify acls to: Mark Jones,
14/12/18 13:00:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Mark Jones, ); users with modify permissions: Set(Mark Jones, )
14/12/18 13:00:53 INFO Slf4jLogger: Slf4jLogger started
14/12/18 13:00:53 INFO Remoting: Starting remoting
14/12/18 13:00:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.19.83:54548]
14/12/18 13:00:53 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.19.83:54548]
14/12/18 13:00:53 INFO Utils: Successfully started service 'sparkDriver' on port 54548.
14/12/18 13:00:53 INFO SparkEnv: Registering MapOutputTracker
14/12/18 13:00:53 INFO SparkEnv: Registering BlockManagerMaster
14/12/18 13:00:53 INFO DiskBlockManager: Created local directory at C:\Users\MARKJO~1\AppData\Local\Temp\spark-local-20141218130053-1ab9
14/12/18 13:00:53 INFO Utils: Successfully started service 'Connection manager for block manager' on port 54551.
14/12/18 13:00:53 INFO ConnectionManager: Bound socket to port 54551 with id = ConnectionManagerId(192.168.19.83,54551)
14/12/18 13:00:53 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
14/12/18 13:00:53 INFO BlockManagerMaster: Trying to register BlockManager
14/12/18 13:00:53 INFO BlockManagerMasterActor: Registering block manager 192.168.19.83:54551 with 265.1 MB RAM
14/12/18 13:00:53 INFO BlockManagerMaster: Registered BlockManager
14/12/18 13:00:53 INFO HttpFileServer: HTTP File server directory is C:\Users\MARKJO~1\AppData\Local\Temp\spark-a43340e8-2621-46b8-a44e-8874dd178393
14/12/18 13:00:53 INFO HttpServer: Starting HTTP Server
14/12/18 13:00:54 INFO Utils: Successfully started service 'HTTP file server' on port 54552.
14/12/18 13:00:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/12/18 13:00:54 INFO SparkUI: Started SparkUI at http://192.168.19.83:4040
14/12/18 13:00:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/18 13:00:54 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
    at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
    at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
    at org.apache.hadoop.security.Groups.init(Groups.java:77)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
    at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
    at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
    at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
    at org.apache.spark.SparkContext.init(SparkContext.scala:228)
    at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:53)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
14/12/18 13:00:54 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.19.83:54548/user/HeartbeatReceiver
14/12/18 13:00:55 INFO
Re: pyspark 1.1.1 on windows saveAsTextFile - NullPointerException
It seems You are missing HADOOP_HOME in the environment. As it says: java.io.IOException: Could not locate executable *null*\bin\winutils.exe in the Hadoop binaries. That null is supposed to be your HADOOP_HOME. Thanks Best Regards On Thu, Dec 18, 2014 at 7:10 PM, mj jone...@gmail.com wrote: Hi, I'm trying to use pyspark to save a simple rdd to a text file (code below), but it keeps throwing an error. - Python Code - items=[Hello, world] items2 = sc.parallelize(items) items2.coalesce(1).saveAsTextFile('c:/tmp/python_out.csv') - Error --C:\Python27\python.exe C:/Users/Mark Jones/PycharmProjects/spark_test/spark_error_sample.py Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 14/12/18 13:00:53 INFO SecurityManager: Changing view acls to: Mark Jones, 14/12/18 13:00:53 INFO SecurityManager: Changing modify acls to: Mark Jones, 14/12/18 13:00:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Mark Jones, ); users with modify permissions: Set(Mark Jones, ) 14/12/18 13:00:53 INFO Slf4jLogger: Slf4jLogger started 14/12/18 13:00:53 INFO Remoting: Starting remoting 14/12/18 13:00:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.19.83:54548] 14/12/18 13:00:53 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.19.83:54548] 14/12/18 13:00:53 INFO Utils: Successfully started service 'sparkDriver' on port 54548. 14/12/18 13:00:53 INFO SparkEnv: Registering MapOutputTracker 14/12/18 13:00:53 INFO SparkEnv: Registering BlockManagerMaster 14/12/18 13:00:53 INFO DiskBlockManager: Created local directory at C:\Users\MARKJO~1\AppData\Local\Temp\spark-local-20141218130053-1ab9 14/12/18 13:00:53 INFO Utils: Successfully started service 'Connection manager for block manager' on port 54551. 14/12/18 13:00:53 INFO ConnectionManager: Bound socket to port 54551 with id = ConnectionManagerId(192.168.19.83,54551) 14/12/18 13:00:53 INFO MemoryStore: MemoryStore started with capacity 265.1 MB 14/12/18 13:00:53 INFO BlockManagerMaster: Trying to register BlockManager 14/12/18 13:00:53 INFO BlockManagerMasterActor: Registering block manager 192.168.19.83:54551 with 265.1 MB RAM 14/12/18 13:00:53 INFO BlockManagerMaster: Registered BlockManager 14/12/18 13:00:53 INFO HttpFileServer: HTTP File server directory is C:\Users\MARKJO~1\AppData\Local\Temp\spark-a43340e8-2621-46b8-a44e-8874dd178393 14/12/18 13:00:53 INFO HttpServer: Starting HTTP Server 14/12/18 13:00:54 INFO Utils: Successfully started service 'HTTP file server' on port 54552. 14/12/18 13:00:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 14/12/18 13:00:54 INFO SparkUI: Started SparkUI at http://192.168.19.83:4040 14/12/18 13:00:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/12/18 13:00:54 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:53) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:214) at
Downloads from S3 exceedingly slow when running on spark-ec2
I'm running a very simple Spark application that downloads files from S3, does a bit of mapping, then uploads new files. Each file is roughly 2MB and is gzip'd. I was running the same code on Amazon's EMR w/Spark and not having any download speed issues (Amazon's EMR provides a custom implementation of the s3n:// file system, FWIW). When I say exceedingly slow, I mean that it takes about 2 minutes to download and process a 2MB file (this was taking ~2 seconds on the same instance types in Amazon's EMR). When I download the same file from the EC2 machine with wget or curl, it downloads in ~ 1 second. I've also done other bandwidth checks for downloads from other external hosts - no speed problems there. Tried this w/Spark 1.1.0 and 1.1.1. When I do a thread dump on a worker, I typically see this a lot: Executor task launch worker-7 daemon prio=10 tid=0x7fd174039000 nid=0x59e9 runnable [0x7fd1f7dfb000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at sun.security.ssl.InputRecord.readFully(InputRecord.java:442) at sun.security.ssl.InputRecord.read(InputRecord.java:480) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927) - locked 0x0007e44dd140 (a java.lang.Object) at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884) at sun.security.ssl.AppInputStream.read(AppInputStream.java:102) - locked 0x0007e44e1350 (a sun.security.ssl.AppInputStream) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read(BufferedInputStream.java:254) - locked 0x0007e44ea800 (a java.io.BufferedInputStream) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:342) at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:718) at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1599) at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1535) at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1987) at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1332) at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.fs.s3native.$Proxy6.retrieveMetadata(Unknown Source) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:330) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdir(NativeS3FileSystem.java:432) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdirs(NativeS3FileSystem.java:425) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126) at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:256) at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:126) at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44) at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99) at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:94) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:986) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974) at
Re: Help with updateStateByKey
Hi Pierce, You shouldn’t have to use groupByKey because updateStateByKey will get a Seq of all the values for that key already. I used that for realtime sessionization as well. What I did was key my incoming events, then send them to udpateStateByKey. The updateStateByKey function then received a Seq of the events and the Option of the previous state for that key. The sessionization code then did its thing to check if the incoming events were part of the same session, based on a configured timeout. If a session already was active (from the previous state) and it hadn’t exceeded the timeout, it used that value. Otherwise it generated a new session id. Then the return value for the updateStateByKey function was a Tuple of session id and last timestamp. Then I joined the DStream with the session ids, which were both keyed off the same id and continued my processing. Your requirements may be different, but that’s what worked well for me. Another thing to consider is cleaning up old sessions by returning None in the updateStateByKey function. This will help with long running apps and minimize memory usage (and checkpoint size). I was using something similar to the method above on a live production stream with very little CPU and memory footprint, running for weeks at a time, processing up to 15M events per day with fluctuating traffic. Thanks, Silvio On 12/17/14, 10:07 PM, Pierce Lamb richard.pierce.l...@gmail.com wrote: I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. The goal is to sessionize the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionizat ion-with-spark-streaming-and-apache-hadoop/ The only difference is that I want to sessionize each page the IP hits, instead of the entire session. I was able to do this reading from a file of fake web traffic using Spark in batch mode, but now I want to do it in a streaming context. Log files are read from Kafka and parsed into K/V pairs of (String, (String, Long, Long)) or (IP, (requestPage, time, time)) I then call groupByKey() on this K/V pair. In batch mode, this would produce a: (String, CollectionBuffer((String, Long, Long), ...) or (IP, CollectionBuffer((requestPage, time, time), ...) In a StreamingContext, it produces a: (String, ArrayBuffer((String, Long, Long), ...) like so: (183.196.254.131,ArrayBuffer((/test.php,1418849762000,1418849762000))) However, as the next microbatch (DStream) arrives, this information is discarded. Ultimately what I want is for that ArrayBuffer to fill up over time as a given IP continues to interact and to run some computations on its data to sessionize the page time. I believe the operator to make that happen is updateStateByKey. I'm having some trouble with this operator (I'm new to both Spark Scala); any help is appreciated. Thus far: val grouped = ipTimeStamp.groupByKey().updateStateByKey(updateGroupByKey) def updateGroupByKey( a: Seq[(String, ArrayBuffer[(String, Long, Long)])], b: Option[(String, ArrayBuffer[(String, Long, Long)])] ): Option[(String, ArrayBuffer[(String, Long, Long)])] = { } - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
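For reference, a minimal sketch of the sessionization pattern described above, assuming a hypothetical Event(page, time) type, a 30-minute timeout and per-IP keying (none of this is Silvio's actual code):

  case class Event(page: String, time: Long)
  val sessionTimeoutMs = 30 * 60 * 1000L

  // State per key is (sessionId, lastSeenTimestamp); returning None drops the key's state.
  def updateSession(newEvents: Seq[Event], state: Option[(String, Long)]): Option[(String, Long)] = {
    val now = System.currentTimeMillis()
    if (newEvents.nonEmpty) {
      // Reuse the session id if the previous state is still within the timeout, otherwise start a new session
      val sessionId = state match {
        case Some((id, lastSeen)) if now - lastSeen <= sessionTimeoutMs => id
        case _ => java.util.UUID.randomUUID.toString
      }
      Some((sessionId, now))
    } else {
      // No new events for this key: keep the state until it times out, then clean it up
      state.filter { case (_, lastSeen) => now - lastSeen <= sessionTimeoutMs }
    }
  }

  // keyedEvents: DStream[(String, Event)] keyed by IP; requires ssc.checkpoint(...) to be set
  // val sessions = keyedEvents.updateStateByKey(updateSession _)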
Spark 1.2 Release Date
Is there a planned release date for Spark 1.2? I saw on the Spark Wiki https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage that we are already in the latter part of the release window. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Release-Date-tp20765.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark 1.2 Release Date
It’s on Maven Central already http://search.maven.org/#browse%7C717101892 On 12/18/14, 2:09 PM, Al M alasdair.mcbr...@gmail.com wrote: Is there a planned release date for Spark 1.2? I saw on the Spark Wiki https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage that we are already in the latter part of the release window. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Release-Date -tp20765.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
EC2 VPC script
Hi guys. I run the following command to launch a new cluster: ./spark-ec2 -k test -i test.pem -s 1 --vpc-id vpc-X --subnet-id subnet-X launch vpc_spark The instances started ok but the command never ends, with the following output: Setting up security groups... Searching for existing cluster vpc_spark... Spark AMI: ami-5bb18832 Launching instances... Launched 1 slaves in us-east-1a, regid = r-e9d603c4 Launched master in us-east-1a, regid = r-89d104a4 Waiting for cluster to enter 'ssh-ready' state... Any ideas what happened? regards Eduardo
Re: Spark 1.2 Release Date
Soon enough :) http://apache-spark-developers-list.1001551.n3.nabble.com/RESULT-VOTE-Release-Apache-Spark-1-2-0-RC2-td9815.html -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Release-Date-tp20765p20766.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Effects problems in logistic regression
Hi all! I have a problem with LogisticRegressionWithSGD. When I train a data set with one variable (which is the amount of an item) and an intercept, I get weights of (-0.4021,-207.1749) for the two features, respectively. This doesn't make sense to me because I run a logistic regression on the same data in SAS and get these weights: (-2.6604,0.000245). The range of this variable is from 0 to 59102 with a mean of 1158. The problem is that when I want to calculate the probabilities for each user in the data set, the probability is near zero or zero in many cases, because when Spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) this is a big number, in fact infinity for Spark. How can I treat this variable, or why did this happen? Thanks, Franco Barrientos Data Scientist Málaga #115, Of. 1003, Las Condes. Santiago, Chile. (+562)-29699649 (+569)-76347893 franco.barrien...@exalitica.com www.exalitica.com
Standalone Spark program
Hi, I am building a Spark-based service which requires initialization of a SparkContext in a main(): def main(args: Array[String]) { val conf = new SparkConf(false) .setMaster("spark://foo.example.com:7077") .setAppName("foobar") val sc = new SparkContext(conf) val rdd = sc.parallelize(0 until 255) val res = rdd.mapPartitions(it => it).take(1) println(s"res=$res") sc.stop() } This code works fine via the REPL, but not as a standalone program; it causes a ClassNotFoundException. This has me confused about how code is shipped out to executors. When used via the REPL, does the mapPartitions closure, it => it, get sent out when the REPL statement is executed? When this code is run as a standalone program (not via spark-submit), is the compiled code expected to be present at the executor? Thanks, Akshat
Re: Spark 1.2 Release Date
Awesome. Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Release-Date-tp20765p20767.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Effects problems in logistic regression
Are you sure this is an apples-to-apples comparison? for example does your SAS process normalize or otherwise transform the data first? Is the optimization configured similarly in both cases -- same regularization, etc.? Are you sure you are pulling out the intercept correctly? It is a separate value from the logistic regression model in Spark. On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos franco.barrien...@exalitica.com wrote: Hi all!, I have a problem with LogisticRegressionWithSGD, when I train a data set with one variable (wich is a amount of an item) and intercept, I get weights of (-0.4021,-207.1749) for both features, respectively. This don´t make sense to me because I run a logistic regression for the same data in SAS and I get these weights (-2.6604,0.000245). The rank of this variable is from 0 to 59102 with a mean of 1158. The problem is when I want to calculate the probabilities for each user from data set, this probability is near to zero or zero in much cases, because when spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) this is a big number, in fact infinity for spark. How can I treat this variable? or why this happened? Thanks , *Franco Barrientos* Data Scientist Málaga #115, Of. 1003, Las Condes. Santiago, Chile. (+562)-29699649 (+569)-76347893 franco.barrien...@exalitica.com www.exalitica.com [image: http://exalitica.com/web/img/frim.png]
Re: Standalone Spark program
You can build a jar of your project and add it to the sparkContext (sc.addJar(/path/to/your/project.jar)) then it will get shipped to the worker and hence no classNotfoundException! Thanks Best Regards On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya aara...@gmail.com wrote: Hi, I am building a Spark-based service which requires initialization of a SparkContext in a main(): def main(args: Array[String]) { val conf = new SparkConf(false) .setMaster(spark://foo.example.com:7077) .setAppName(foobar) val sc = new SparkContext(conf) val rdd = sc.parallelize(0 until 255) val res = rdd.mapPartitions(it = it).take(1) println(sres=$res) sc.stop() } This code works fine via REPL, but not as a standalone program; it causes a ClassNotFoundException. This has me confused about how code is shipped out to executors. When using via REPL, does the mapPartitions closure, it=it, get sent out when the REPL statement is executed? When this code is run as a standalone program (not via spark-submit), is the compiled code expected to be present at the the executor? Thanks, Akshat
RE: Effects problems in logistic regression
Thanks I will try. De: DB Tsai [mailto:dbt...@dbtsai.com] Enviado el: jueves, 18 de diciembre de 2014 16:24 Para: Franco Barrientos CC: Sean Owen; user@spark.apache.org Asunto: Re: Effects problems in logistic regression Can you try LogisticRegressionWithLBFGS? I verified that this will be converged to the same result trained by R's glmnet package without regularization. The problem of LogisticRegressionWithSGD is it's very slow in term of converging, and lots of time, it's very sensitive to stepsize which can lead to wrong answer. The regularization logic in MLLib is not entirely correct, and it will penalize the intercept. In general, with really high regularization, all the coefficients will be zeros except the intercept. In logistic regression, the non-zero intercept can be understood as the prior-probability of each class, and in linear regression, this will be the mean of response. I'll have a PR to fix this issue. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Thu, Dec 18, 2014 at 10:50 AM, Franco Barrientos franco.barrien...@exalitica.com mailto:franco.barrien...@exalitica.com wrote: Yes, without the “amounts” variables the results are similiar. When I put other variables its fine. De: Sean Owen [mailto:so...@cloudera.com mailto:so...@cloudera.com ] Enviado el: jueves, 18 de diciembre de 2014 14:22 Para: Franco Barrientos CC: user@spark.apache.org mailto:user@spark.apache.org Asunto: Re: Effects problems in logistic regression Are you sure this is an apples-to-apples comparison? for example does your SAS process normalize or otherwise transform the data first? Is the optimization configured similarly in both cases -- same regularization, etc.? Are you sure you are pulling out the intercept correctly? It is a separate value from the logistic regression model in Spark. On Thu, Dec 18, 2014 at 4:34 PM, Franco Barrientos franco.barrien...@exalitica.com mailto:franco.barrien...@exalitica.com wrote: Hi all!, I have a problem with LogisticRegressionWithSGD, when I train a data set with one variable (wich is a amount of an item) and intercept, I get weights of (-0.4021,-207.1749) for both features, respectively. This don´t make sense to me because I run a logistic regression for the same data in SAS and I get these weights (-2.6604,0.000245). The rank of this variable is from 0 to 59102 with a mean of 1158. The problem is when I want to calculate the probabilities for each user from data set, this probability is near to zero or zero in much cases, because when spark calculates exp(-1*(-0.4021+(-207.1749)*amount)) this is a big number, in fact infinity for spark. How can I treat this variable? or why this happened? Thanks , Franco Barrientos Data Scientist Málaga #115, Of. 1003, Las Condes. Santiago, Chile. (+562)-29699649 tel:%28%2B562%29-29699649 (+569)-76347893 tel:%28%2B569%29-76347893 franco.barrien...@exalitica.com mailto:franco.barrien...@exalitica.com www.exalitica.com http://www.exalitica.com/ http://exalitica.com/web/img/frim.png
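For anyone hitting the same thing, a minimal sketch of the LBFGS route DB Tsai suggests, with the amount feature standardized first (the training RDD name and the scaling step are assumptions, not part of the original thread):

  import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
  import org.apache.spark.mllib.feature.StandardScaler
  import org.apache.spark.mllib.regression.LabeledPoint

  // training: RDD[LabeledPoint] with the raw, wide-ranging "amount" feature
  val scaler = new StandardScaler(withMean = true, withStd = true).fit(training.map(_.features))
  val scaled = training.map(lp => LabeledPoint(lp.label, scaler.transform(lp.features)))

  val model = new LogisticRegressionWithLBFGS()
    .setIntercept(true)
    .run(scaled)

  println(s"weights=${model.weights} intercept=${model.intercept}")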
Re: When will Spark SQL support building DB index natively?
It is implemented in the same way as Hive and interoperates with the hive metastore. In 1.2 we are considering adding partitioning to the SparkSQL data source API as well.. However, for now, you should create a hive context and a partitioned table. Spark SQL will automatically select partitions when there are predicates in a query against the partitioning columns. On Wed, Dec 17, 2014 at 7:31 PM, Xuelin Cao xuelin...@yahoo.com wrote: Thanks, I didn't try the partitioned table support (sounds like a hive feature) Is there any guideline? Should I use hiveContext to create the table with partition firstly? On Thursday, December 18, 2014 2:28 AM, Michael Armbrust mich...@databricks.com wrote: - Dev list Have you looked at partitioned table support? That would only scan data where the predicate matches the partition. Depending on the cardinality of the customerId column that could be a good option for you. On Wed, Dec 17, 2014 at 2:25 AM, Xuelin Cao xuelin...@yahoo.com.invalid wrote: Hi, In Spark SQL help document, it says Some of these (such as indexes) are less important due to Spark SQL’s in-memory computational model. Others are slotted for future releases of Spark SQL. - Block level bitmap indexes and virtual columns (used to build indexes) For our use cases, DB index is quite important. I have about 300G data in our database, and we always use customer id as a predicate for DB look up. Without DB index, we will have to scan all 300G data, and it will take 1 minute for a simple DB look up, while MySQL only takes 10 seconds. We tried to create an independent table for each customer id, the result is pretty good, but the logic will be very complex. I'm wondering when will Spark SQL supports DB index, and before that, is there an alternative way to support DB index function? Thanks
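A rough sketch of what Michael describes, using a HiveContext and a table partitioned by the lookup column (table and column names here are made up for illustration):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)

  // Partition the table on the column used in lookups
  hiveContext.sql(
    "CREATE TABLE IF NOT EXISTS customer_events (payload STRING) PARTITIONED BY (customer_id STRING)")

  // A predicate on the partitioning column lets Spark SQL scan only the matching partitions
  val rows = hiveContext.sql("SELECT * FROM customer_events WHERE customer_id = '12345'")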
does spark sql support columnar compression with encoding when caching tables
Hi All, Wondering whether, when caching a table backed by lzo-compressed parquet data, Spark also compresses it (using lzo/gzip/snappy) along with column-level encoding, or just does the column-level encoding, when spark.sql.inMemoryColumnarStorage.compressed is set to true. I ask because when I try to cache the data, I notice the memory being used is almost as much as the uncompressed size of the data. Thanks!
Re: Help with updateStateByKey
This produces the expected output, thank you! On Thu, Dec 18, 2014 at 12:11 PM, Silvio Fiorito silvio.fior...@granturing.com wrote: Ok, I have a better idea of what you’re trying to do now. I think the prob might be the map. The first time the function runs, currentValue will be None. Using map on None returns None. Instead, try: Some(currentValue.getOrElse(Seq.empty) ++ newValues) I think that should give you the expected result. From: Pierce Lamb richard.pierce.l...@gmail.com Date: Thursday, December 18, 2014 at 2:31 PM To: Silvio Fiorito silvio.fior...@granturing.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: Help with updateStateByKey Hi Silvio, This is a great suggestion (I wanted to get rid of groupByKey), I have been trying to implement it this morning, but having some trouble. I would love to see your code for the function that goes inside updateStateByKey Here is my current code: def updateGroupByKey( newValues: Seq[(String, Long, Long)], currentValue: Option[Seq[(String, Long, Long)]] ): Option[Seq[(String, Long, Long)]] = { currentValue.map{ case (v) = v ++ newValues } } val grouped = ipTimeStamp.updateStateByKey(updateGroupByKey) However, when I run it the grouped DStream doesn't get populated with anything. The issue is probably that currentValue is not actually an Option[Seq[triple]] but rather an Option[triple]. However if I change it to an Option[triple] then I have to also return an Option[triple] for updateStateByKey to compile, but I want that return value to be an Option[Seq[triple]] because ultimately i want the data to look like (IPaddress, Seq[(pageRequested, startTime, EndTime), (pageRequested, startTime, EndTime)...]) and have that Seq build over time Am I thinking about this wrong? Thank you On Thu, Dec 18, 2014 at 6:05 AM, Silvio Fiorito silvio.fior...@granturing.com wrote: Hi Pierce, You shouldn’t have to use groupByKey because updateStateByKey will get a Seq of all the values for that key already. I used that for realtime sessionization as well. What I did was key my incoming events, then send them to udpateStateByKey. The updateStateByKey function then received a Seq of the events and the Option of the previous state for that key. The sessionization code then did its thing to check if the incoming events were part of the same session, based on a configured timeout. If a session already was active (from the previous state) and it hadn’t exceeded the timeout, it used that value. Otherwise it generated a new session id. Then the return value for the updateStateByKey function was a Tuple of session id and last timestamp. Then I joined the DStream with the session ids, which were both keyed off the same id and continued my processing. Your requirements may be different, but that’s what worked well for me. Another thing to consider is cleaning up old sessions by returning None in the updateStateByKey function. This will help with long running apps and minimize memory usage (and checkpoint size). I was using something similar to the method above on a live production stream with very little CPU and memory footprint, running for weeks at a time, processing up to 15M events per day with fluctuating traffic. Thanks, Silvio On 12/17/14, 10:07 PM, Pierce Lamb richard.pierce.l...@gmail.com wrote: I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. 
The goal is to sessionize the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionizat ion-with-spark-streaming-and-apache-hadoop/ The only difference is that I want to sessionize each page the IP hits, instead of the entire session. I was able to do this reading from a file of fake web traffic using Spark in batch mode, but now I want to do it in a streaming context. Log files are read from Kafka and parsed into K/V pairs of (String, (String, Long, Long)) or (IP, (requestPage, time, time)) I then call groupByKey() on this K/V pair. In batch mode, this would produce a: (String, CollectionBuffer((String, Long, Long), ...) or (IP, CollectionBuffer((requestPage, time, time), ...) In a StreamingContext, it produces a: (String, ArrayBuffer((String, Long, Long), ...) like so: (183.196.254.131,ArrayBuffer((/test.php,1418849762000,1418849762000))) However, as the next microbatch (DStream) arrives, this information is discarded. Ultimately what I want is for that ArrayBuffer to fill up over time as a given IP continues to interact and to run some computations on its data to sessionize the page time. I believe the operator to make that happen is updateStateByKey. I'm having some trouble with this operator (I'm new to both Spark Scala); any help is appreciated. Thus far: val grouped =
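Putting Silvio's fix into the function above, the working version looks roughly like this:

  def updateGroupByKey(
      newValues: Seq[(String, Long, Long)],
      currentValue: Option[Seq[(String, Long, Long)]]): Option[Seq[(String, Long, Long)]] = {
    // getOrElse covers the first batch, when there is no previous state (None) for the key yet
    Some(currentValue.getOrElse(Seq.empty) ++ newValues)
  }

  val grouped = ipTimeStamp.updateStateByKey(updateGroupByKey _)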
Re: Help with updateStateByKey
Great, glad it worked out! Just keep an eye on memory usage as you roll it out. Like I said before, if you’ll be running this 24/7 consider cleaning up sessions by returning None after some sort of timeout. On 12/18/14, 8:25 PM, Pierce Lamb richard.pierce.l...@gmail.com wrote: This produces the expected output, thank you! On Thu, Dec 18, 2014 at 12:11 PM, Silvio Fiorito silvio.fior...@granturing.com wrote: Ok, I have a better idea of what you’re trying to do now. I think the prob might be the map. The first time the function runs, currentValue will be None. Using map on None returns None. Instead, try: Some(currentValue.getOrElse(Seq.empty) ++ newValues) I think that should give you the expected result. From: Pierce Lamb richard.pierce.l...@gmail.com Date: Thursday, December 18, 2014 at 2:31 PM To: Silvio Fiorito silvio.fior...@granturing.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: Help with updateStateByKey Hi Silvio, This is a great suggestion (I wanted to get rid of groupByKey), I have been trying to implement it this morning, but having some trouble. I would love to see your code for the function that goes inside updateStateByKey Here is my current code: def updateGroupByKey( newValues: Seq[(String, Long, Long)], currentValue: Option[Seq[(String, Long, Long)]] ): Option[Seq[(String, Long, Long)]] = { currentValue.map{ case (v) = v ++ newValues } } val grouped = ipTimeStamp.updateStateByKey(updateGroupByKey) However, when I run it the grouped DStream doesn't get populated with anything. The issue is probably that currentValue is not actually an Option[Seq[triple]] but rather an Option[triple]. However if I change it to an Option[triple] then I have to also return an Option[triple] for updateStateByKey to compile, but I want that return value to be an Option[Seq[triple]] because ultimately i want the data to look like (IPaddress, Seq[(pageRequested, startTime, EndTime), (pageRequested, startTime, EndTime)...]) and have that Seq build over time Am I thinking about this wrong? Thank you On Thu, Dec 18, 2014 at 6:05 AM, Silvio Fiorito silvio.fior...@granturing.com wrote: Hi Pierce, You shouldn’t have to use groupByKey because updateStateByKey will get a Seq of all the values for that key already. I used that for realtime sessionization as well. What I did was key my incoming events, then send them to udpateStateByKey. The updateStateByKey function then received a Seq of the events and the Option of the previous state for that key. The sessionization code then did its thing to check if the incoming events were part of the same session, based on a configured timeout. If a session already was active (from the previous state) and it hadn’t exceeded the timeout, it used that value. Otherwise it generated a new session id. Then the return value for the updateStateByKey function was a Tuple of session id and last timestamp. Then I joined the DStream with the session ids, which were both keyed off the same id and continued my processing. Your requirements may be different, but that’s what worked well for me. Another thing to consider is cleaning up old sessions by returning None in the updateStateByKey function. This will help with long running apps and minimize memory usage (and checkpoint size). I was using something similar to the method above on a live production stream with very little CPU and memory footprint, running for weeks at a time, processing up to 15M events per day with fluctuating traffic. 
Thanks, Silvio On 12/17/14, 10:07 PM, Pierce Lamb richard.pierce.l...@gmail.com wrote: I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. The goal is to sessionize the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessioni zat ion-with-spark-streaming-and-apache-hadoop/ The only difference is that I want to sessionize each page the IP hits, instead of the entire session. I was able to do this reading from a file of fake web traffic using Spark in batch mode, but now I want to do it in a streaming context. Log files are read from Kafka and parsed into K/V pairs of (String, (String, Long, Long)) or (IP, (requestPage, time, time)) I then call groupByKey() on this K/V pair. In batch mode, this would produce a: (String, CollectionBuffer((String, Long, Long), ...) or (IP, CollectionBuffer((requestPage, time, time), ...) In a StreamingContext, it produces a: (String, ArrayBuffer((String, Long, Long), ...) like so: (183.196.254.131,ArrayBuffer((/test.php,1418849762000,1418849762000))) However, as the next microbatch (DStream) arrives, this information is discarded. Ultimately what I want is for that ArrayBuffer to fill up over time as a given IP continues to
UNION two RDDs
Hi Spark users, I wonder if val resultRDD = RDDA.union(RDDB) will always have records in RDDA before records in RDDB. Also, will resultRDD.coalesce(1) change this ordering? Best Regards, Jerry
Re: Standalone Spark program
Hey Akshat, What is the class that is not found, is it a Spark class or classes that you define in your own application? If the latter, then Akhil's solution should work (alternatively you can also pass the jar through the --jars command line option in spark-submit). If it's a Spark class, however, it's likely that the Spark assembly jar is not present on the worker nodes. When you build Spark on the cluster, you will need to rsync it to the same path on all the nodes in your cluster. For more information, see http://spark.apache.org/docs/latest/spark-standalone.html. -Andrew 2014-12-18 10:29 GMT-08:00 Akhil Das ak...@sigmoidanalytics.com: You can build a jar of your project and add it to the sparkContext (sc.addJar(/path/to/your/project.jar)) then it will get shipped to the worker and hence no classNotfoundException! Thanks Best Regards On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya aara...@gmail.com wrote: Hi, I am building a Spark-based service which requires initialization of a SparkContext in a main(): def main(args: Array[String]) { val conf = new SparkConf(false) .setMaster(spark://foo.example.com:7077) .setAppName(foobar) val sc = new SparkContext(conf) val rdd = sc.parallelize(0 until 255) val res = rdd.mapPartitions(it = it).take(1) println(sres=$res) sc.stop() } This code works fine via REPL, but not as a standalone program; it causes a ClassNotFoundException. This has me confused about how code is shipped out to executors. When using via REPL, does the mapPartitions closure, it=it, get sent out when the REPL statement is executed? When this code is run as a standalone program (not via spark-submit), is the compiled code expected to be present at the the executor? Thanks, Akshat
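For completeness, a small sketch of wiring the application jar into the SparkConf so the closure classes reach the executors (the jar path is a placeholder):

  val conf = new SparkConf(false)
    .setMaster("spark://foo.example.com:7077")
    .setAppName("foobar")
    // Ship the application's own classes (including the it => it closure) to the executors
    .setJars(Seq("/path/to/your/project.jar"))
  val sc = new SparkContext(conf)

  // Alternatively, after the context is created:
  // sc.addJar("/path/to/your/project.jar")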
Spark GraphX question.
Hi All, I am wondering what is the best way to remove transitive edges with maximum spanning tree. For example, Edges: 1 - 2 (30) 2 - 3 (30) 1 - 3 (25) where parenthesis is a weight for each edge. Then, I'd like to get the reduced edges graph after Transitive Reduction with considering the weight as a maximum spanning tree. Edges: 1 - 2 (30) 2 - 3 (30) Do you have a good idea for this? Thanks, Ted -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-GraphX-question-tp20768.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Creating a smaller, derivative RDD from an RDD
We have a very large RDD and I need to create a new RDD whose values are derived from each record of the original RDD, and we only retain the few new records that meet a criteria. I want to avoid creating a second large RDD and then filtering it since I believe this could tax system resources unnecessarily (tell me if that assumption is wrong.) So for example, /and this is just an example/, say we have an RDD with 1 to 1,000,000 and we iterate through each value, and compute it's md5 hash, and we only keep the results that start with 'A'. What we've tried and seems to work but which seemed a bit ugly, and perhaps not efficient, was the following in pseudocode. * Is this the best way to do this?* Thanks bigRdd.flatMap( { i = val h = md5(i) if (h.substring(1,1) == 'A') { Array(h) } else { Array[String]() } }) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Creating-a-smaller-derivative-RDD-from-an-RDD-tp20769.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: Control default partition when load a RDD from HDFS
Hmmm, how to do that? You mean for each file create a RDD? Then I will have tons of RDD. And my calculation need to rely on other input, not just the file itself Can you show some pseudo code for that logic? Regards, Shuai From: Diego García Valverde [mailto:dgarci...@agbar.es] Sent: Wednesday, December 17, 2014 11:04 AM To: Shuai Zheng; 'Sun, Rui'; user@spark.apache.org Subject: RE: Control default partition when load a RDD from HDFS Why not is a good option to create a RDD per each 200Mb file and then apply the pre-calculations before merging them? I think the partitions per RDD must be transparent to the pre-calculations, and not to set them fixed to optimize the spark maps/reduces processes. De: Shuai Zheng [mailto:szheng.c...@gmail.com] Enviado el: miércoles, 17 de diciembre de 2014 16:01 Para: 'Sun, Rui'; user@spark.apache.org Asunto: RE: Control default partition when load a RDD from HDFS Nice, that is the answer I want. Thanks! From: Sun, Rui [mailto:rui@intel.com] Sent: Wednesday, December 17, 2014 1:30 AM To: Shuai Zheng; user@spark.apache.org Subject: RE: Control default partition when load a RDD from HDFS Hi, Shuai, How did you turn off the file split in Hadoop? I guess you might have implemented a customized FileInputFormat which overrides isSplitable() to return FALSE. If you do have such FileInputFormat, you can simply pass it as a constructor parameter to HadoopRDD or NewHadoopRDD in Spark. From: Shuai Zheng [mailto:szheng.c...@gmail.com] Sent: Wednesday, December 17, 2014 4:16 AM To: user@spark.apache.org Subject: Control default partition when load a RDD from HDFS Hi All, My application load 1000 files, each file from 200M a few GB, and combine with other data to do calculation. Some pre-calculation must be done on each file level, then after that, the result need to combine to do further calculation. In Hadoop, it is simple because I can turn-off the file split for input format (to enforce each file will go to same mapper), then I will do the file level calculation in mapper and pass result to reducer. But in spark, how can I do it? Basically I want to make sure after I load these files into RDD, it is partitioned by file (not split file and also no merge there), so I can call mapPartitions. Is it any way I can control the default partition when I load the RDD? This might be the default behavior that spark do the partition (partitioned by file when first time load the RDD), but I cant find any document to support my guess, if not, can I enforce this kind of partition? Because the total file size is bigger, I dont want to re-partition in the code. Regards, Shuai _ Disclaimer: http://disclaimer.agbar.com
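A minimal sketch of the approach Rui describes, i.e. an input format that never splits files, so each file becomes exactly one partition (class name, key/value types and paths are assumptions):

  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat

  // A TextInputFormat that refuses to split files
  class WholeFileTextInputFormat extends TextInputFormat {
    override protected def isSplitable(fs: FileSystem, file: Path): Boolean = false
  }

  // Each of the ~1000 files now lands in its own partition
  val rdd = sc.hadoopFile[LongWritable, Text, WholeFileTextInputFormat]("hdfs:///path/to/files/*", 1)

  // Per-file pre-calculation can then run inside mapPartitions
  val perFile = rdd.mapPartitions { lines =>
    // ... file-level calculation here ...
    lines
  }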
Re: hello
You mean the Spark User List? It's pretty easy; check the first email, it has all the instructions. On 18 December 2014 at 21:56, csjtx1021 [via Apache Spark User List] ml-node+s1001560n20759...@n3.nabble.com wrote: i want to join you -- Regards, Harihar Nahak BigData Developer Wynyard Email:hna...@wynyardgroup.com | Extn: 8019 --Harihar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/hello-tp20759p20770.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Spark GraphX question.
Hi Ted, I'm not familiar with transitive reduction, but you can get the expected result with the graph.subgraph(...) API, filtering edges by their weight, which gives you a new graph that satisfies your condition. On 19 December 2014 at 11:11, Tae-Hyuk Ahn [via Apache Spark User List] ml-node+s1001560n20768...@n3.nabble.com wrote: Hi All, I am wondering what is the best way to remove transitive edges with maximum spanning tree. For example, Edges: 1 - 2 (30) 2 - 3 (30) 1 - 3 (25) where parenthesis is a weight for each edge. Then, I'd like to get the reduced edges graph after Transitive Reduction with considering the weight as a maximum spanning tree. Edges: 1 - 2 (30) 2 - 3 (30) Do you have a good idea for this? Thanks, Ted -- Regards, Harihar Nahak BigData Developer Wynyard Email:hna...@wynyardgroup.com | Extn: 8019 --Harihar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-GraphX-question-tp20768p20771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
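For what it's worth, a tiny sketch of that edge-filtering idea; note it only drops edges below a weight threshold, it is not a full maximum-spanning-tree or transitive-reduction computation (the threshold and the Double edge attribute are assumptions):

  import org.apache.spark.graphx._

  // graph: Graph[VD, Double] with the weight stored as the edge attribute
  val weightThreshold = 30.0
  val reduced = graph.subgraph(epred = triplet => triplet.attr >= weightThreshold)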
How to increase parallelism in Yarn
Hi, I am using Spark 1.1.1 on Yarn. When I try to run K-Means, I see from the Yarn dashboard that only 3 containers are being used. How do I increase the number of containers used? P.S: When I run K-Means on Mahout with the same settings, I see that there are 25-30 containers being used. Thanks, Suman.
Re: Creating a smaller, derivative RDD from an RDD
I don't think you can avoid examining each element of the RDD, if that's what you mean. Your approach is basically the best you can do in general. You're not making a second RDD here, and even if you did this in two steps, the second RDD is really more of a bookkeeping that a second huge data structure. You can simplify your example a bit, although I doubt it's noticeably faster: bigRdd.flatMap { i = val h = md5(i) if (h(0) == 'A') { Some(h) } else { None } } This is also fine, simpler still, and if it's slower, not by much: bigRdd.map(md5).filter(_(0) == 'A') On Thu, Dec 18, 2014 at 10:18 PM, bethesda swearinge...@mac.com wrote: We have a very large RDD and I need to create a new RDD whose values are derived from each record of the original RDD, and we only retain the few new records that meet a criteria. I want to avoid creating a second large RDD and then filtering it since I believe this could tax system resources unnecessarily (tell me if that assumption is wrong.) So for example, /and this is just an example/, say we have an RDD with 1 to 1,000,000 and we iterate through each value, and compute it's md5 hash, and we only keep the results that start with 'A'. What we've tried and seems to work but which seemed a bit ugly, and perhaps not efficient, was the following in pseudocode. * Is this the best way to do this?* Thanks bigRdd.flatMap( { i = val h = md5(i) if (h.substring(1,1) == 'A') { Array(h) } else { Array[String]() } }) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Creating-a-smaller-derivative-RDD-from-an-RDD-tp20769.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How to increase parallelism in Yarn
Hi Suman, I'll assume that you are using spark submit to run your application. You can pass the --num-executors flag to ask for more containers. If you want to allocate more memory for each executor, you may also pass in the --executor-memory flag (this accepts a string in the format 1g, 512m etc.). -Andrew 2014-12-18 14:37 GMT-08:00 Suman Somasundar suman.somasun...@oracle.com: Hi, I am using Spark 1.1.1 on Yarn. When I try to run K-Means, I see from the Yarn dashboard that only 3 containers are being used. How do I increase the number of containers used? P.S: When I run K-Means on Mahout with the same settings, I see that there are 25-30 containers being used. Thanks, Suman.
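Concretely, that would look something like the following spark-submit invocation (class name, jar and the numbers are placeholders to adjust for your cluster):

  ./bin/spark-submit --master yarn-cluster \
    --num-executors 25 \
    --executor-memory 4g \
    --class com.example.KMeansJob \
    my-kmeans-app.jar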
Re: MLLib /ALS : java.lang.OutOfMemoryError: Java heap space
Hi Jay, Please try increasing executor memory (if the available memory is more than 2GB) and reduce numBlocks in ALS. The current implementation stores all subproblems in memory and hence the memory requirement is significant when k is large. You can also try reducing k and see whether the problem is still there. I made a PR that improves the ALS implementation, which generates subproblems one by one. You can try that as well. https://github.com/apache/spark/pull/3720 Best, Xiangrui On Wed, Dec 17, 2014 at 6:57 PM, buring qyqb...@gmail.com wrote: I am not sure this can help you. I have 57 million rating,about 4million user and 4k items. I used 7-14 total-executor-cores,executal-memory 13g,cluster have 4 nodes,each have 4cores,max memory 16g. I found set as follows may help avoid this problem: conf.set(spark.shuffle.memoryFraction,0.65) //default is 0.2 conf.set(spark.storage.memoryFraction,0.3)//default is 0.6 I have to set rank value under 40, otherwise occure this problem. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-java-lang-OutOfMemoryError-Java-heap-space-tp20584p20755.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
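As a concrete starting point, a hedged sketch of the knobs Xiangrui mentions (the actual numbers are illustrative, not recommendations):

  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  // ratings: RDD[Rating]
  val rank = 20        // try a smaller rank if the executors keep running out of memory
  val iterations = 10
  val lambda = 0.01
  val numBlocks = 8    // number of user/product blocks; experiment with this per the advice above

  val model = ALS.train(ratings, rank, iterations, lambda, numBlocks)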
RE: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?
Thanks dbtsai for the info. Are you using the case class for: Case(response, vec) = ? Also, what library do I need to import to use .toBreeze ? Thanks, tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 3:27 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? You can do something like the following. val rddVector = input.map({ case (response, vec) = { val newVec = MLUtils.appendBias(vec) newVec.toBreeze(newVec.size - 1) = response newVec } } val scalerWithResponse = new StandardScaler(true, true).fit(rddVector) val trainingData = scalerWithResponse.transform(rddVector).map(x= { (x(x.size - 1), Vectors.dense(x.toArray.slice(0, x.size -1)) }) Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri tri@verizonwireless.com wrote: Thanks for the info. How do I use StandardScaler() to scale example data (10246.0,[14111.0,1.0]) ? Thx tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 1:26 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? It seems that your response is not scaled which will cause issue in LBFGS. Typically, people train Linear Regression with zero-mean/unit-variable feature and response without training the intercept. Since the response is zero-mean, the intercept will be always zero. When you convert the coefficients to the oringal space from the scaled space, the intercept can be computed by w0 = y - \sum x_n w_n where x_n is the average of column n. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, Dec 12, 2014 at 10:49 AM, Bui, Tri tri@verizonwireless.com wrote: Thanks for the confirmation. Fyi..The code below works for similar dataset, but with the feature magnitude changed, LBFGS converged to the right weights. Example, time sequential Feature value 1, 2, 3, 4, 5, would generate the error while sequential feature 14111, 14112, 14113,14115 would converge to the right weight. Why? Below is code to implement standardscaler() for sample data (10246.0,[14111.0,1.0])): val scaler1 = new StandardScaler().fit(train.map(x = x.features)) val train1 = train.map(x = (x.label, scaler1.transform(x.features))) But I keeps on getting error: value features is not a member of (Double, org.apache.spark.mllib.linalg.Vector) Should my feature vector be .toInt instead of Double? Also, the error org.apache.spark.mllib.linalg.Vector should have an s to match import library org.apache.spark.mllib.linalg.Vectors Thanks Tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 12:16 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? You need to do the StandardScaler to help the convergency yourself. LBFGS just takes whatever objective function you provide without doing any scaling. I will like to provide LinearRegressionWithLBFGS which does the scaling internally in the nearly feature. 
Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, Dec 12, 2014 at 8:49 AM, Bui, Tri tri@verizonwireless.com.invalid wrote: Hi, Trying to use LBFGS as the optimizer, do I need to implement feature scaling via StandardScaler or does LBFGS do it by default? Following code generated error “ Failure again! Giving up and returning, Maybe the objective is just poorly behaved ?”. val data = sc.textFile(file:///data/Train/final2.train) val parsedata = data.map { line = val partsdata = line.split(',') LabeledPoint(partsdata(0).toDouble, Vectors.dense(partsdata(1).split(' ').map(_.toDouble))) } val train = parsedata.map(x = (x.label, MLUtils.appendBias(x.features))).cache() val numCorrections = 10 val convergenceTol = 1e-4 val maxNumIterations = 50 val regParam = 0.1 val initialWeightsWithIntercept = Vectors.dense(new Array[Double](2)) val (weightsWithIntercept, loss) = LBFGS.runLBFGS(train, new LeastSquaresGradient(), new SquaredL2Updater(), numCorrections, convergenceTol, maxNumIterations, regParam, initialWeightsWithIntercept) Did I implement LBFGS for Linear Regression via “LeastSquareGradient()” correctly? Thanks Tri -
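A cleaned-up sketch of the scaling step being discussed, written with plain arrays so it does not need toBreeze (which, as far as I can tell, is not part of MLlib's public API); the input name is assumed to be an RDD[LabeledPoint]:

  import org.apache.spark.mllib.feature.StandardScaler
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  // Append the response as the last element so label and features are scaled together
  val combined = input.map { case LabeledPoint(label, features) =>
    Vectors.dense(features.toArray :+ label)
  }

  val scaler = new StandardScaler(withMean = true, withStd = true).fit(combined)

  // Split the scaled vector back into (response, features); the intercept is zero in this scaled space
  val trainingData = scaler.transform(combined).map { v =>
    val arr = v.toArray
    (arr.last, Vectors.dense(arr.init))
  }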
Sharing sqlContext between Akka router and routee actors ...
Hi, An Akka router creates a sqlContext and creates a bunch of routee actors with the sqlContext as a parameter. The actors then execute queries on that sqlContext. Would this pattern be an issue? Is there any other way sparkContext etc. should be shared cleanly between Akka routers/routees? Thanks,
Re: does spark sql support columnar compression with encoding when caching tables
There is only column-level encoding (run-length encoding, delta encoding, dictionary encoding) and no generic compression. On Thu, Dec 18, 2014 at 12:07 PM, Sadhan Sood sadhan.s...@gmail.com wrote: Hi All, Wondering whether, when caching a table backed by lzo-compressed parquet data, spark also compresses it (using lzo/gzip/snappy) along with the column-level encoding, or just does the column-level encoding, when *spark.sql.inMemoryColumnarStorage.compressed* is set to true. I ask because when I try to cache the data, I notice the memory being used is almost as much as the uncompressed size of the data. Thanks!
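[A minimal sketch of the setting under discussion, assuming a SQLContext named sqlContext and a registered table named events, both of which are illustrative and not from the thread. With the flag on, the in-memory columnar cache applies the per-column encodings listed above but no general-purpose codec such as lzo, gzip, or snappy; how much memory that saves depends on how well the columns encode.]

// Illustrative only; "events" and sqlContext are assumptions, not from the thread.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

sqlContext.cacheTable("events")                           // marks the table for in-memory columnar caching
sqlContext.sql("SELECT COUNT(*) FROM events").collect()   // first scan materializes the encoded column batches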
Re: Sharing sqlContext between Akka router and routee actors ...
Why do you need a router? I mean, can't you do it with just one actor which has the SQLContext inside it? On Thu, Dec 18, 2014 at 9:45 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, An Akka router creates a sqlContext and creates a bunch of routee actors with the sqlContext as a parameter. The actors then execute queries on that sqlContext. Would this pattern be an issue? Is there any other way sparkContext etc. should be shared cleanly in Akka routers/routees? Thanks,
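[A minimal sketch of the single-actor alternative suggested above, assuming Akka 2.x and Spark SQL 1.2; QueryActor and the message shape are illustrative names, not from the thread. One actor owns the SQLContext and all queries are funneled through its mailbox, so nothing Spark-related needs to be passed to routees.]

import akka.actor.{Actor, ActorSystem, Props}
import org.apache.spark.sql.SQLContext

// Runs each SQL string it receives on the single SQLContext it owns
// and replies with the collected rows.
class QueryActor(sqlContext: SQLContext) extends Actor {
  def receive = {
    case query: String => sender() ! sqlContext.sql(query).collect()
  }
}

// Usage sketch (sqlContext must already exist):
// val system = ActorSystem("spark-sql")
// val queryActor = system.actorOf(Props(new QueryActor(sqlContext)), "query-actor")
// queryActor ! "SELECT COUNT(*) FROM some_table"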
java.lang.ExceptionInInitializerError/Unable to load YARN support
All, I just built Spark-1.2 on my enterprise server (which has Hadoop 2.3 with YARN). Here're the steps I followed for the build:

$ mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
$ export SPARK_HOME=/path/to/spark/folder
$ export HADOOP_CONF_DIR=/etc/hadoop/conf

However, when I try to work with this installation either locally or on YARN, I get the following error:

Exception in thread "main" java.lang.ExceptionInInitializerError
  at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:232)
  at water.MyDriver$.main(MyDriver.scala:19)
  at water.MyDriver.main(MyDriver.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
  at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:199)
  at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:194)
  at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
  ... 15 more
Caused by: java.lang.IllegalArgumentException: Invalid rule: L RULE:[2:$1@$0](.*@XXXCOMPANY.COM)s/(.*)@XXXCOMPANY.COM/$1/L DEFAULT
  at org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:321)
  at org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:386)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247)
  at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:43)
  at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:45)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at java.lang.Class.newInstance(Class.java:374)
  at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:196)
  ... 17 more

I noticed that when I unset HADOOP_CONF_DIR, I'm able to work in the local mode without any errors. I'm able to work with pre-installed Spark 1.0, locally and on yarn, without any issues. It looks like I may be missing a configuration step somewhere. Any thoughts on what may be causing this?

NR

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ExceptionInInitializerError-Unable-to-load-YARN-support-tp20775.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
When will spark 1.2 released?
Hi, Does anyone know when spark 1.2 will be released? 1.2 has many great features that we can't wait for ;-) Sincerely, Lin wukang Sent from NetEase Mail Master
Re: SchemaRDD.sample problem
Hi, Can you clean up the code a little bit? It's hard to read what's going on. You can use pastebin or gist to put the code. On Wed, Dec 17, 2014 at 3:58 PM, Hao Ren inv...@gmail.com wrote: Hi, I am using SparkSQL on the 1.2.1 branch. The problem comes from the following 4-line code:

val t1: SchemaRDD = hiveContext hql "select * from product where is_new = 0"
val tb1: SchemaRDD = t1.sample(withReplacement = false, fraction = 0.05)
tb1.registerTempTable("t1_tmp")
(hiveContext sql "select count(*) from t1_tmp where is_new = 1") collect foreach println

We know that t1 contains only rows whose is_new field is zero. After sampling t1 by taking 5% of the rows, the sampled table should normally contain only rows where is_new = 0. However, line 4 gives a number of about 5, varying by chance. That means there are some rows where is_new = 1 in the sampled table, which is not logically possible. I am not sure SchemaRDD.sample is doing its work well. Any idea? Hao -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-sample-problem-tp20741.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Regards, Madhukara Phatak http://www.madhukaraphatak.com
Re: When will spark 1.2 released?
It’s on Maven Central already http://search.maven.org/#browse%7C717101892 On Fri, Dec 19, 2014 at 11:17 AM, vboylin1...@gmail.com vboylin1...@gmail.com wrote: Hi, Dose any know when will spark 1.2 released? 1.2 has many great feature that we can't wait now ,-) Sincely Lin wukang 发自网易邮箱大师 -- Regards, Madhukara Phatak http://www.madhukaraphatak.com
Re: When will spark 1.2 released?
Interesting, the maven artifacts were dated Dec 10th. However vote for RC2 closed recently: http://search-hadoop.com/m/JW1q5K8onk2/Patrick+spark+1.2.0subj=Re+VOTE+Release+Apache+Spark+1+2+0+RC2+ Cheers On Dec 18, 2014, at 10:02 PM, madhu phatak phatak@gmail.com wrote: It’s on Maven Central already http://search.maven.org/#browse%7C717101892 On Fri, Dec 19, 2014 at 11:17 AM, vboylin1...@gmail.com vboylin1...@gmail.com wrote: Hi, Dose any know when will spark 1.2 released? 1.2 has many great feature that we can't wait now ,-) Sincely Lin wukang 发自网易邮箱大师 -- Regards, Madhukara Phatak http://www.madhukaraphatak.com
Re: When will spark 1.2 released?
Patrick is working on the release as we speak -- I expect it'll be out later tonight (US west coast) or tomorrow at the latest. On Fri, Dec 19, 2014 at 1:09 AM, Ted Yu yuzhih...@gmail.com wrote: Interesting, the maven artifacts were dated Dec 10th. However vote for RC2 closed recently: http://search-hadoop.com/m/JW1q5K8onk2/Patrick+spark+1.2.0subj=Re+VOTE+Release+Apache+Spark+1+2+0+RC2+ Cheers On Dec 18, 2014, at 10:02 PM, madhu phatak phatak@gmail.com wrote: It’s on Maven Central already http://search.maven.org/#browse%7C717101892 On Fri, Dec 19, 2014 at 11:17 AM, vboylin1...@gmail.com vboylin1...@gmail.com wrote: Hi, Dose any know when will spark 1.2 released? 1.2 has many great feature that we can't wait now ,-) Sincely Lin wukang 发自网易邮箱大师 -- Regards, Madhukara Phatak http://www.madhukaraphatak.com
Re: When will spark 1.2 released?
Yup, as he posted before, An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Dec 18, 2014, at 10:14 PM, Andrew Ash and...@andrewash.com wrote: Patrick is working on the release as we speak -- I expect it'll be out later tonight (US west coast) or tomorrow at the latest. On Fri, Dec 19, 2014 at 1:09 AM, Ted Yu yuzhih...@gmail.com wrote: Interesting, the maven artifacts were dated Dec 10th. However vote for RC2 closed recently: http://search-hadoop.com/m/JW1q5K8onk2/Patrick+spark+1.2.0subj=Re+VOTE+Release+Apache+Spark+1+2+0+RC2+ Cheers On Dec 18, 2014, at 10:02 PM, madhu phatak phatak@gmail.com wrote: It’s on Maven Central already http://search.maven.org/#browse%7C717101892 On Fri, Dec 19, 2014 at 11:17 AM, vboylin1...@gmail.com vboylin1...@gmail.com wrote: Hi, Dose any know when will spark 1.2 released? 1.2 has many great feature that we can't wait now ,-) Sincely Lin wukang 发自网易邮箱大师 -- Regards, Madhukara Phatak http://www.madhukaraphatak.com