Hi all,

We have a Spark Streaming job that's been working great under Mesos for
many months with "local" as the master.

We just tried to run it on our new Mesos cluster.  The cluster has been
set up properly: the Spark examples (e.g. SparkPi) run correctly,
distributed under Mesos.  But our job does not.

This stack trace is widely seen on the web, but nowhere is a root cause
identified.  Now we're seeing it as well:

$ ./stagingviewbeta.sh
java -cp /opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-0.20.2-2.1.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-diagnostic-tools-0.20.2-2.1.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-jni-0.20.2-2.1.3.jar:./target/scala-2.9.2/vodviews2_2.9.2-1.0.jar:./lib/eventgateway2.jar:./lib/spark-core-assembly-0.7.3.jar:./lib/spark-streaming-assembly-0.7.3.jar:/usr/share/spark/lib/spark-core-assembly-0.7.3.jar:/usr/share/spark/lib/spark-streaming-assembly-0.7.3.jar:/usr/share/spark/lib/spark-examples_2.9.3-0.7.3.jar:/usr/share/spark/lib/spark-repl_2.9.3-0.7.3.jar:/usr/share/java/scala-library.jar:/usr/share/java/scala-compiler.jar:/usr/share/java/jline.jar -Dspark.local.dir=/spark/tmp/viewbeta -Djava.library.path=/opt/mapr/lib -Xms6g -Xmx6g ViewBeta mesos://bigd-mesos-01:5050 cdp-sleuth-kafka-01.cdp.webapps.rr.com:2181 localhost:9092 stagingviewbeta5 prod-eg_v2_2-big_data 1 true
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/craigv/viewbeta/lib/spark-core-assembly-0.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/craigv/viewbeta/lib/spark-streaming-assembly-0.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/spark/lib/spark-core-assembly-0.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/spark/lib/spark-streaming-assembly-0.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
13/11/13 22:29:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
topicMap: Map(prod-eg_v2_2-big_data_v0.3.201-8158c -> 1)
13/11/13 22:29:20 ERROR cluster.TaskSetManager: Task 1.0:1 failed more than 4 times; aborting job
Exception in thread "Thread-27" spark.SparkException: Job failed: Task 1.0:1 failed more than 4 times
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
        at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:303)
        at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
        at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
13/11/13 22:29:21 ERROR cluster.TaskSetManager: Task 3.0:2 failed more than 4 times; aborting job
Exception in thread "Thread-30" spark.SparkException: Job failed: Task 3.0:2 failed more than 4 times
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
        at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:303)
        at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
        at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
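For context, the "topicMap" line in the output above is just the Kafka topic -> consumer-thread-count map our driver builds from its command-line arguments before creating the stream.  A trimmed, Spark-free sketch of that step (names approximate; this is not the actual ViewBeta source):

```scala
// Hypothetical sketch of how the driver assembles the topic map
// logged as "topicMap: Map(prod-eg_v2_2-big_data_v0.3.201-8158c -> 1)".
object TopicMapSketch {
  // One entry per Kafka topic, mapping topic name to the number of
  // consumer threads requested for it (the "1" in our launch args).
  def buildTopicMap(topic: String, numThreads: Int): Map[String, Int] =
    Map(topic -> numThreads)

  def main(args: Array[String]): Unit = {
    val topicMap = buildTopicMap("prod-eg_v2_2-big_data_v0.3.201-8158c", 1)
    println("topicMap: " + topicMap)
  }
}
```

The failure happens regardless of what topic is in this map, so we don't believe the Kafka side is at fault.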

Does anyone know what's causing this?  There is very little information to
go on here, and as soon as these two exceptions appear the job hangs.

Please advise.

Thanks in advance,
Craig Vanderborgh
