Re: Can't run spark-submit with an application jar on a Mesos cluster
Thanks hbogert. There it is, plain as day: it can't find my Spark binaries. I thought it was enough to set SPARK_EXECUTOR_URI in my spark-env.sh, since that is all that's necessary to run spark-shell.sh against a Mesos master, but I also had to set spark.executor.uri in my spark-defaults.conf (or in my app itself). Thanks again for your help troubleshooting this problem.

jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave 20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found

jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c ' "/home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler --executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180 --cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22331.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
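For anyone hitting the same failure, here is a minimal Python sketch of a pre-flight check that spark.executor.uri is actually present in spark-defaults.conf before submitting. The helper names and the sample values (host names, tarball path) are hypothetical and only for illustration. The parser is a simplification that assumes one "key value" or "key=value" pair per line and does not cover every form Spark itself accepts.

```python
import re


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a {property: value} dict.

    Simplified: handles "key value" and "key=value" lines, skips blank
    lines and lines starting with '#'.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split once, on the first run of whitespace and/or '='.
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def has_executor_uri(text):
    """Return True if spark.executor.uri is set to a non-empty value."""
    return bool(parse_spark_defaults(text).get("spark.executor.uri"))


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# spark-defaults.conf
spark.master        mesos://master.example:5050
spark.executor.uri  hdfs://namenode.example/dist/spark-1.3.0-bin-hadoop2.4.tgz
"""
```

Calling has_executor_uri() on the contents of the real file before running spark-submit would flag the missing property early, rather than leaving it to surface as a "spark-class: not found" error in the executor's stderr.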
Re: Can't run spark-submit with an application jar on a Mesos cluster
Well, those are only the slave logs at the Mesos level. I'm not sure from your reply whether you can ssh into a specific slave or not. If you can, you should look at the actual output of the application (Spark in this case) on a slave, e.g. in

/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}

The actual UUIDs and the executor number (in this example '4') in the path can differ from slave node to slave node. Look into those stderr and stdout files and you'll probably have your answer as to why it is failing.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22319.html
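Because the executor number and run UUID vary per node, it can be easier to glob for the files than to guess the path. The following is a rough Python sketch, assuming the directory layout shown in the path above. The function name is hypothetical, and the `latest` entry under runs/ is assumed to be a symlink to the most recent run, so it is skipped to avoid listing the same files twice.

```python
from pathlib import Path


def find_sandbox_logs(work_dir, framework_id):
    """List stdout/stderr files for every executor run of one framework.

    work_dir is the Mesos slave work directory (e.g. /tmp/mesos);
    framework_id is the framework ID as it appears in the slave log.
    """
    root = Path(work_dir)
    pattern = f"slaves/*/frameworks/{framework_id}/executors/*/runs/*/std*"
    return sorted(
        p
        for p in root.glob(pattern)
        if p.name in ("stdout", "stderr")
        # Skip symlinked run directories (assumed: runs/latest).
        and not p.parent.is_symlink()
    )
```

For example, find_sandbox_logs("/tmp/mesos", "20150322-040336-606645514-5050-2744-0037") should return the std{err,out} paths for each executor that ran on that node, assuming the slave uses /tmp/mesos as its work directory.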
Re: Can't run spark-submit with an application jar on a Mesos cluster
Thanks for the response. I'll admit I'm rather new to Mesos. Due to the nature of my setup I can't use the Mesos web portal effectively because I'm not connected by VPN, so the local network links from the mesos-master dashboard I SSH tunnelled aren't working.

Anyway, I was able to dig up some logs for a failed job (framework?) run on one of my slaves: "20150322-040336-606645514-5050-2744-0037"

$ cat mesos-slave.INFO | grep 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004115 2524 slave.cpp:1083] Got assigned task 1 for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004812 2524 slave.cpp:1193] Launching task 1 for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.005879 2524 slave.cpp:3997] Launching executor 1 of framework 20150322-040336-606645514-5050-2744-0037 in work directory '/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.006145 2524 slave.cpp:1316] Queuing task '1' for executor 1 of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.006722 2531 containerizer.cpp:424] Starting container '79cf96ba-bf58-45cd-927b-f6c864f6e44b' for executor '1' of framework '20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.089171 2529 slave.cpp:2840] Monitoring executor '1' of framework '20150322-040336-606645514-5050-2744-0037' in container '79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.108610 2529 slave.cpp:1860] Got registration for executor '1' of framework 20150322-040336-606645514-5050-2744-0037 from executor(1)@10.217.7.180:52410
I0329 20:34:26.109136 2529 slave.cpp:1979] Flushing queued task 1 for executor '1' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.112584 2527 slave.cpp:2215] Handling status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 from executor(1)@10.217.7.180:52410
I0329 20:34:26.112751 2527 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.113052 2527 slave.cpp:2458] Forwarding the update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 to master@10.173.40.36:5050
I0329 20:34:26.113131 2527 slave.cpp:2391] Sending acknowledgement for status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 to executor(1)@10.217.7.180:52410
I0329 20:34:26.115972 2527 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.214292 2530 slave.cpp:2215] Handling status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 from executor(1)@10.217.7.180:52410
I0329 20:34:26.215005 2526 status_update_manager.cpp:317] Received status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.215144 2526 slave.cpp:2458] Forwarding the update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 to master@10.173.40.36:5050
I0329 20:34:26.215277 2526 slave.cpp:2391] Sending acknowledgement for status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of framework 20150322-040336-606645514-5050-2744-0037 to executor(1)@10.217.7.180:52410
I0329 20:34:26.18 2524 status_update_manager.cpp:389] Received status update acknowledgement (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239357 2524 slave.cpp:1083] Got assigned task 4 for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239853 2524 slave.cpp:1193] Launching task 4 for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.240880 2524 slave.cpp:3997] Launching executor 4 of framework 20150322-040336-606645514-5050-2744-0037 in work directory '/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.241065 2524 slave.cpp:1316] Queuing task '4' for executor 4 of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.241554 2528 containerizer.cpp:424] Starting container 'e3cf195d-525b-4148-aa38-1789d378a948' for executor '4' of framework '20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.292538 2527 slave.cpp:2840] Monitoring executor '4' of framework '20150322-040336-606645514-5050-2744-0037' in container 'e3cf195d-525
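The "Launching executor ... in work directory" entries in this log already name each executor's sandbox, which is where the executor's stdout and stderr live. As a small Python sketch, assuming the message wording stays exactly as in the log above (the function and pattern names are hypothetical), the directories can be pulled out of mesos-slave.INFO like this:

```python
import re

# Matches the slave.cpp "Launching executor" message as it appears in the log above.
LAUNCH_RE = re.compile(
    r"Launching executor (?P<executor>\S+) of framework (?P<framework>\S+) "
    r"in work directory '(?P<work_dir>[^']+)'"
)


def executor_work_dirs(log_text, framework_id):
    """Map executor ID -> sandbox work directory for one framework."""
    dirs = {}
    for match in LAUNCH_RE.finditer(log_text):
        if match.group("framework") == framework_id:
            dirs[match.group("executor")] = match.group("work_dir")
    return dirs
```

Fed the contents of mesos-slave.INFO and the framework ID, this should yield one directory per executor, each of which contains the stdout and stderr files to inspect.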
Re: Can't run spark-submit with an application jar on a Mesos cluster
Hi,

What do the Mesos slave logs say? Usually they give a clear-cut error; they are probably local on a slave node. I'm not sure about your config, so I can't point you to a specific path. It might look something like:

/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-/executors/5/runs/latest/stderr

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22280.html
Re: Can't run spark-submit with an application jar on a Mesos cluster
I left a comment on your Stack Overflow question earlier. Can you share the output in the stderr log from your Mesos task? It can be found in the Mesos UI by going to the task's sandbox.

Tim

> On Mar 29, 2015, at 12:14 PM, seglo wrote:
>
> The latter part of this question where I try to submit the application by
> referring to it on HDFS is very similar to the recent question
>
> Spark-submit not working when application jar is in hdfs
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html
Re: Can't run spark-submit with an application jar on a Mesos cluster
The latter part of this question, where I try to submit the application by referring to it on HDFS, is very similar to the recent question:

Spark-submit not working when application jar is in hdfs
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22278.html