Mesosphere did a great job of simplifying the process of running Spark on Mesos. I am using this guide to set up a development Mesos cluster on Google Compute Engine:
https://mesosphere.com/docs/tutorials/run-spark-on-mesos/
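For context, these are the Mesos-specific settings my setup hinges on, per the guide and the Spark-on-Mesos docs. The values below are placeholders rather than my exact ones (and the deprecation warning in the log further down suggests I'm still exporting the older MESOS_NATIVE_LIBRARY name):

    # conf/spark-env.sh -- path to the Mesos native library (placeholder):
    export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

    # conf/spark-defaults.conf -- URI each slave fetches the Spark
    # distribution from when it launches an executor (placeholder):
    spark.executor.uri hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz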
I can run the example from the guide using spark-shell (finding numbers less than 10), but when I submit an application that otherwise works fine with Spark locally, it blows up with TASK_FAILED messages, e.g.:

    CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED

Here's the command I'm using with the bundled SparkPi example:

    ./spark-submit --class org.apache.spark.examples.SparkPi \
        --master mesos://10.173.40.36:5050 \
        ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100

And the output:

    jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/03/22 16:44:02 INFO SparkContext: Running Spark version 1.3.0
    15/03/22 16:44:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/03/22 16:44:03 INFO SecurityManager: Changing view acls to: jclouds
    15/03/22 16:44:03 INFO SecurityManager: Changing modify acls to: jclouds
    15/03/22 16:44:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jclouds); users with modify permissions: Set(jclouds)
    15/03/22 16:44:03 INFO Slf4jLogger: Slf4jLogger started
    15/03/22 16:44:03 INFO Remoting: Starting remoting
    15/03/22 16:44:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:60301]
    15/03/22 16:44:03 INFO Utils: Successfully started service 'sparkDriver' on port 60301.
    15/03/22 16:44:03 INFO SparkEnv: Registering MapOutputTracker
    15/03/22 16:44:03 INFO SparkEnv: Registering BlockManagerMaster
    15/03/22 16:44:03 INFO DiskBlockManager: Created local directory at /tmp/spark-27fad7e3-4ad7-44d6-845f-4a09ac9cce90/blockmgr-a558b7be-0d72-49b9-93fd-5ef8731b314b
    15/03/22 16:44:03 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
    15/03/22 16:44:04 INFO HttpFileServer: HTTP File server directory is /tmp/spark-de9ac795-381b-4acd-a723-a9a6778773c9/httpd-7115216c-0223-492b-ae6f-4134ba7228ba
    15/03/22 16:44:04 INFO HttpServer: Starting HTTP Server
    15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
    15/03/22 16:44:04 INFO AbstractConnector: Started SocketConnector@0.0.0.0:36663
    15/03/22 16:44:04 INFO Utils: Successfully started service 'HTTP file server' on port 36663.
    15/03/22 16:44:04 INFO SparkEnv: Registering OutputCommitCoordinator
    15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
    15/03/22 16:44:04 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    15/03/22 16:44:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    15/03/22 16:44:04 INFO SparkUI: Started SparkUI at http://development-5159-d9.c.learning-spark.internal:4040
    15/03/22 16:44:04 INFO SparkContext: Added JAR file:/home/jclouds/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.173.40.36:36663/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427042644934
    Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
    I0322 16:44:05.035423 308 sched.cpp:137] Version: 0.21.1
    I0322 16:44:05.038136 309 sched.cpp:234] New master detected at master@10.173.40.36:5050
    I0322 16:44:05.039261 309 sched.cpp:242] No credentials provided. Attempting to register without authentication
    I0322 16:44:05.040351 310 sched.cpp:408] Framework registered with 20150322-040336-606645514-5050-2744-0019
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Registered as framework ID 20150322-040336-606645514-5050-2744-0019
    15/03/22 16:44:05 INFO NettyBlockTransferService: Server created on 44177
    15/03/22 16:44:05 INFO BlockManagerMaster: Trying to register BlockManager
    15/03/22 16:44:05 INFO BlockManagerMasterActor: Registering block manager development-5159-d9.c.learning-spark.internal:44177 with 265.0 MB RAM, BlockManagerId(<driver>, development-5159-d9.c.learning-spark.internal, 44177)
    15/03/22 16:44:05 INFO BlockManagerMaster: Registered BlockManager
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_RUNNING
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_RUNNING
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
    15/03/22 16:44:05 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_RUNNING
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_RUNNING
    15/03/22 16:44:05 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 100 output partitions (allowLocal=false)
    15/03/22 16:44:05 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
    15/03/22 16:44:05 INFO DAGScheduler: Parents of final stage: List()
    15/03/22 16:44:05 INFO DAGScheduler: Missing parents: List()
    15/03/22 16:44:05 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S1" due to too many failures; is Spark installed on it?
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S0" due to too many failures; is Spark installed on it?
    15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=277842493
    15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 265.0 MB)
    15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1296) called with curMem=1848, maxMem=277842493
    15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1296.0 B, free 265.0 MB)
    15/03/22 16:44:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on development-5159-d9.c.learning-spark.internal:44177 (size: 1296.0 B, free: 265.0 MB)
    15/03/22 16:44:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
    15/03/22 16:44:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_RUNNING
    15/03/22 16:44:05 INFO DAGScheduler: Submitting 100 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
    15/03/22 16:44:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 100 tasks
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_FAILED
    15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S2" due to too many failures; is Spark installed on it?
    15/03/22 16:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I suspect the Mesos slave nodes aren't finding the application jar. But when I put the jar in HDFS and pass its URL instead, spark-submit warns that it will skip the remote jar and then, presumably because the driver never downloads it, fails to find the main class:

    jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Warning: Skip remote jar hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar.
    java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

I created a StackOverflow question for this as well:
http://stackoverflow.com/questions/29198522/cant-run-spark-submit-with-an-application-jar-on-a-mesos-cluster
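For completeness, here is roughly the next thing I plan to try, in case it sharpens the question: keep the application jar local to the driver (so the main class can load) and point the slaves at a Spark distribution they can fetch for themselves via spark.executor.uri. The tarball URL below is a placeholder I haven't verified, not a known-good setting:

    # Sketch only -- the spark.executor.uri value is a placeholder.
    ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master mesos://10.173.40.36:5050 \
        --conf spark.executor.uri=hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz \
        ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100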