Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-31 Thread seglo
Thanks hbogert.  There it is, plain as day: it can't find my Spark binaries.
I thought it was enough to set SPARK_EXECUTOR_URI in my spark-env.sh, since
that is all that's necessary to run spark-shell against a Mesos master,
but I also had to set spark.executor.uri in my spark-defaults.conf (or in my
app itself).  Thanks again for your help troubleshooting this problem.
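
For anyone who finds this thread later: besides spark-defaults.conf, the same
property can be passed per submission with spark-submit's --conf flag. Below is
a minimal sketch that only assembles such a command line. The master host, the
tarball URI, the class name and the jar path are made-up placeholders, not
values from my cluster; the URI is assumed to point at a Spark tarball that the
slaves can download.

```python
import shlex


def build_spark_submit(master, executor_uri, main_class, app_jar):
    """Return a spark-submit argument list that sets spark.executor.uri."""
    return [
        "spark-submit",
        "--master", master,
        # Tells the Mesos executors where to fetch the Spark distribution from.
        "--conf", f"spark.executor.uri={executor_uri}",
        "--class", main_class,
        app_jar,
    ]


# Hypothetical values, for illustration only.
cmd = build_spark_submit(
    master="mesos://mesos-master.example:5050",
    executor_uri="hdfs://namenode.example/dist/spark-1.3.0-bin-hadoop2.4.tgz",
    main_class="com.example.MyApp",
    app_jar="/path/to/my-app.jar",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The equivalent spark-defaults.conf entry is a single line with the key
spark.executor.uri followed by the same URI.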

jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave
20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c ' "/home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class"
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler
--executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180
--cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)
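
Status 127 is the shell's "command not found" exit code, which matches the
"spark-class: not found" line in stderr. A small sketch that reproduces the
same status locally, assuming a POSIX sh is available; the path used here is a
deliberately nonexistent placeholder:

```python
import subprocess

# Ask sh to run a path that does not exist, much like the Mesos executor did
# with spark-class on a slave that had no Spark install at that location.
result = subprocess.run(
    ["sh", "-c", '"/no/such/dir/bin/spark-class"'],
    capture_output=True,
    text=True,
)
exit_status = result.returncode
print(exit_status)             # 127: the shell could not find the command
print(result.stderr.strip())   # wording of the message varies by shell
```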






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22331.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-31 Thread hbogert
Well, those are only the slave logs at the Mesos level.  I'm not sure from
your reply whether you can ssh into a specific slave; if you can, you
should look at the actual output of the application (Spark in this case) on a
slave, e.g. in
 
/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}

The actual UUID and the executor number ('4' in this example) in the path can
differ from slave node to slave node.

Look into those stderr and stdout files and you'll probably find out why it
is failing.
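
Since the executor number and run UUID vary, it can be easier to glob for the
files than to guess the path. A rough sketch, assuming the slave's work
directory is /tmp/mesos as in the path above; find_sandbox_logs is a helper
name made up for this example:

```python
from pathlib import Path


def find_sandbox_logs(work_dir, framework_id):
    """List the stdout/stderr files of every executor run of one framework."""
    root = Path(work_dir) / "slaves"
    if not root.is_dir():
        return []
    # slaves/<slave-id>/frameworks/<framework-id>/executors/<n>/runs/<run>/std*
    pattern = f"*/frameworks/{framework_id}/executors/*/runs/*/std*"
    return sorted(p for p in root.glob(pattern) if p.name in ("stderr", "stdout"))


for path in find_sandbox_logs("/tmp/mesos", "20150322-040336-606645514-5050-2744-0037"):
    print(path)
```

If a runs/latest symlink exists next to the UUID directory, the same file may
be listed twice, once under each name.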



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22319.html



Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-29 Thread seglo
Thanks for the response.  I'll admit I'm rather new to Mesos.  Due to the
nature of my setup I can't use the Mesos web portal effectively because I'm
not connected by VPN, so the local network links from the mesos-master
dashboard I SSH tunnelled aren't working.

Anyway, I was able to dig up some logs on one of my slaves for a failed job
(framework?) run, "20150322-040336-606645514-5050-2744-0037":

$ cat mesos-slave.INFO | grep 20150322-040336-606645514-5050-2744-0037

I0329 20:34:26.004115  2524 slave.cpp:1083] Got assigned task 1 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004812  2524 slave.cpp:1193] Launching task 1 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.005879  2524 slave.cpp:3997] Launching executor 1 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.006145  2524 slave.cpp:1316] Queuing task '1' for executor 1
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.006722  2531 containerizer.cpp:424] Starting container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b' for executor '1' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.089171  2529 slave.cpp:2840] Monitoring executor '1' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.108610  2529 slave.cpp:1860] Got registration for executor
'1' of framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.109136  2529 slave.cpp:1979] Flushing queued task 1 for
executor '1' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.112584  2527 slave.cpp:2215] Handling status update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.112751  2527 status_update_manager.cpp:317] Received status
update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.113052  2527 slave.cpp:2458] Forwarding the update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.113131  2527 slave.cpp:2391] Sending acknowledgement for
status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.115972  2527 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.214292  2530 slave.cpp:2215] Handling status update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.215005  2526 status_update_manager.cpp:317] Received status
update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.215144  2526 slave.cpp:2458] Forwarding the update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.215277  2526 slave.cpp:2391] Sending acknowledgement for
status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.18  2524 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239357  2524 slave.cpp:1083] Got assigned task 4 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239853  2524 slave.cpp:1193] Launching task 4 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.240880  2524 slave.cpp:3997] Launching executor 4 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.241065  2524 slave.cpp:1316] Queuing task '4' for executor 4
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.241554  2528 containerizer.cpp:424] Starting container
'e3cf195d-525b-4148-aa38-1789d378a948' for executor '4' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.292538  2527 slave.cpp:2840] Monitoring executor '4' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'e3cf195d-525
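
The "Launching executor ... in work directory" entries above are the useful
ones, because they name the sandbox directory of each executor. A minimal
sketch that pulls those directories out of the slave log text; the regular
expression is only based on the log lines shown in this message, and
executor_work_dirs is a name made up for the example:

```python
import re

# \s+ is used between words so that entries re-wrapped by a mail client still match.
LAUNCH_RE = re.compile(
    r"Launching\s+executor\s+(?P<executor>\S+)\s+of\s+framework\s+(?P<framework>\S+)"
    r"\s+in\s+work\s+directory\s+'(?P<work_dir>[^']+)'"
)


def executor_work_dirs(log_text, framework_id):
    """Map executor id -> sandbox directory for one framework."""
    dirs = {}
    for match in LAUNCH_RE.finditer(log_text):
        if match.group("framework") == framework_id:
            dirs[match.group("executor")] = match.group("work_dir")
    return dirs


sample = (
    "I0329 20:34:26.005879  2524 slave.cpp:3997] Launching executor 1 of\n"
    "framework 20150322-040336-606645514-5050-2744-0037 in work directory\n"
    "'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/"
    "20150322-040336-606645514-5050-2744-0037/executors/1/runs/"
    "79cf96ba-bf58-45cd-927b-f6c864f6e44b'\n"
)
work_dirs = executor_work_dirs(sample, "20150322-040336-606645514-5050-2744-0037")
for executor, work_dir in work_dirs.items():
    print(executor, work_dir)
```

The stderr and stdout files of each executor sit directly inside the
directory that is printed.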

Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-29 Thread hbogert
Hi,

What do the Mesos slave logs say? They usually give a clear-cut error, and
they are probably stored locally on a slave node.

I'm not sure about your config, so I can't point you to a specific path.

It might look something like:

/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-/executors/5/runs/latest/stderr





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22280.html



Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-29 Thread Timothy Chen
I left a comment on your Stack Overflow question earlier. Can you share the
output of the stderr log from your Mesos task? It can be found by going to
the task's sandbox in the Mesos UI.

Tim


> On Mar 29, 2015, at 12:14 PM, seglo wrote:
> 
> The latter part of this question where I try to submit the application by
> referring to it on HDFS is very similar to the recent question
> 
> Spark-submit not working when application jar is in hdfs
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html
> 
> 
> 



Re: Can't run spark-submit with an application jar on a Mesos cluster

2015-03-29 Thread seglo
The latter part of this question, where I try to submit the application by
referring to it on HDFS, is very similar to the recent question

Spark-submit not working when application jar is in hdfs
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277p22278.html