Hi all. I have a question about running Spark on Mesos.
I am hitting a task deserialization error on the mesos-slave. My environment is:

JDK   : 1.7.0_45 (OpenJDK)
Spark : 0.8.0-incubating
CDH   : 4.4.0
Mesos : 0.14.0-rc4

I built the Spark executor from spark-0.8.0-incubating-bin-cdh4.tgz, uploaded it to HDFS, and launched the job with sbt, but the Spark job failed. What should I do? The console output is below.

=======
[root@spark1 TextCount]# ../sbt/sbt run
[info] Set current project to Count Project (in build file:/opt/spark-0.8.0-incubating-bin-cdh4/TextCount/)
[info] Running TextCountApp
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I1202 14:46:04.393213 10802 detector.cpp:234] Master detector (scheduler(1)@192.168.100.246:45418) connected to ZooKeeper ...
I1202 14:46:04.394464 10802 detector.cpp:251] Trying to create path '/mesos' in ZooKeeper
I1202 14:46:04.398108 10802 detector.cpp:420] Master detector (scheduler(1)@192.168.100.246:45418) found 3 registered masters
I1202 14:46:04.398380 10802 detector.cpp:467] Master detector (scheduler(1)@192.168.100.246:45418) got new master pid: [email protected]:5050
[error] (run-main) org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than 4 times
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
java.lang.RuntimeException: Nonzero exit code: 1
        at scala.sys.package$.error(package.scala:27)
[error] {file:/opt/spark-0.8.0-incubating-bin-cdh4/TextCount/}default-f2b34a/compile:run: Nonzero exit code: 1
[error] Total time: 8 s, completed 2013/12/02 14:46:09
=======

On the mesos-slave, stderr shows the task deserialization error:

=======
13/12/02 14:46:09 INFO executor.MesosExecutorBackend: Registered with Mesos as executor ID 201311130859-4133791936-5050-15712-0
13/12/02 14:46:10 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/02 14:46:10 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka://spark@spark1:37454/user/BlockManagerMaster
13/12/02 14:46:10 INFO storage.MemoryStore: MemoryStore started with capacity 324.4 MB.
13/12/02 14:46:10 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20131202144610-040d
13/12/02 14:46:10 INFO network.ConnectionManager: Bound socket to port 55159 with id = ConnectionManagerId(spark3,55159)
13/12/02 14:46:10 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/12/02 14:46:10 INFO storage.BlockManagerMaster: Registered BlockManager
13/12/02 14:46:10 INFO spark.SparkEnv: Connecting to MapOutputTracker: akka://spark@spark1:37454/user/MapOutputTracker
13/12/02 14:46:10 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-7bb708b4-8503-4465-89bd-27821799f7b3
13/12/02 14:46:10 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/02 14:46:10 INFO server.AbstractConnector: Started [email protected]:34636
13/12/02 14:46:10 INFO executor.Executor: Running task ID 0
13/12/02 14:46:10 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
13/12/02 14:46:10 INFO storage.MemoryStore: ensureFreeSpace(228503) called with curMem=0, maxMem=340147568
13/12/02 14:46:10 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 223.1 KB, free 324.2 MB)
13/12/02 14:46:10 INFO broadcast.HttpBroadcast: Reading broadcast variable 0 took 0.306537306 s
13/12/02 14:46:10 ERROR executor.Executor: Exception in task ID 0
java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
        at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
        at org.apache.hadoop.io.UTF8.readChars(UTF8.java:258)
        at org.apache.hadoop.io.UTF8.readString(UTF8.java:250)
        at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
        at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
        at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1956)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1850)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
13/12/02 14:46:11 INFO executor.Executor: Running task ID 2
=======

The app I created is below.

- TextCount/src/main/scala/TextCountApp.scala
=======
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object TextCountApp {
  def main(args: Array[String]) {
    System.setProperty("spark.executor.uri",
      "hdfs://spark1/sparkarchive/spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz")
    val logFile = "hdfs://spark1/inputdata/README.md"
    val sc = new SparkContext("zk://spark1:2181/mesos", "TextCountApp")
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    val numSparks = logData.filter(line => line.contains("Spark")).count()
    println("Lines with a: %s, Lines with b: %s, Lines with Spark: %s"
      .format(numAs, numBs, numSparks))
  }
}
=======

- TextCount/count.sbt
=======
name := "Count Project"

version := "1.0"

scalaVersion := "2.9.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-cdh4.4.0"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/repo/"
=======

Regards.
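P.S. In case it helps with reproducing, this is roughly how I staged the executor archive on HDFS (a sketch from memory; the directory layout and tarball name are assumed from my setup, and the name matches the spark.executor.uri set in TextCountApp):

```shell
# Package the built Spark distribution into the executor tarball
# (name assumed; it must match spark.executor.uri in the app).
tar czf spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz spark-0.8.0-2.0.0-mr1-cdh4.4.0/

# Upload it to HDFS so each mesos-slave can fetch it via spark.executor.uri.
hadoop fs -mkdir /sparkarchive
hadoop fs -put spark-0.8.0-2.0.0-mr1-cdh4.4.0.tgz /sparkarchive/
```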
--
#################################
Sotaro Kimura <[email protected]>
#################################
