
Thanks for your email.

Two things:

1. I think a lot of data scientists and other people would love to be able
to fire off jobs straight from their laptops. In my opinion, it's a
commonly desired use case.

2. Did anyone actually get the Ooyala job server to work? I asked that
question 6 months ago and never got a straight answer. I ended up
writing a middle layer using Scalatra and actors to submit jobs via an
API and receive results back in JSON. In doing so I ran into the
"feature" of not being able to share a SparkContext, and it took a lot
of finagling to make things work (and it never felt "production ready").
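
For illustration only (this is not the Scalatra/actors code described above), here is a rough Python sketch of the bookkeeping such a middle layer needs: start a job, hand back an id, and report its state as JSON. `JobRegistry`, `submit`, `wait` and `status_json` are hypothetical names. Each job is assumed to be its own OS process (for example one spark-submit invocation), which sidesteps the shared-SparkContext problem rather than solving it. HTTP routing is left out entirely.

```python
import json
import subprocess
import uuid
from typing import Dict, Optional, Sequence


class JobRegistry:
    """Hypothetical job bookkeeping for a submit-via-API middle layer.

    Each submitted job runs as its own OS process (e.g. one spark-submit
    invocation), so jobs never share a SparkContext in this sketch.
    """

    def __init__(self) -> None:
        self._jobs: Dict[str, subprocess.Popen] = {}

    def submit(self, argv: Sequence[str]) -> str:
        """Start argv in the background and return a new job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = subprocess.Popen(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return job_id

    def wait(self, job_id: str, timeout: Optional[float] = None) -> int:
        """Block until the job's process exits and return its exit code."""
        return self._jobs[job_id].wait(timeout=timeout)

    def status_json(self, job_id: str) -> str:
        """Report the job's state as a JSON string, as an HTTP handler might."""
        proc = self._jobs.get(job_id)
        if proc is None:
            return json.dumps({"id": job_id, "state": "UNKNOWN"})
        code = proc.poll()  # None while the process is still running
        if code is None:
            state = "RUNNING"
        elif code == 0:
            state = "FINISHED"
        else:
            state = "FAILED"
        return json.dumps({"id": job_id, "state": state, "exit_code": code})
```

An HTTP layer on top would typically call `submit` on a POST and `status_json` on a GET; returning actual results (not just exit status) would need the job to write them somewhere the layer can read.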


On Sat, Nov 15, 2014 at 03:36:43PM +0000, Ashic Mahtab wrote:
> Hi Ben,
>
> I haven't tried it with Python, but the instructions are the same as for
> Scala compiled (jar) apps. What it's saying is that, when running on
> standalone, it's not possible to offload the entire work to the master (à la
> Hadoop) in a fire-and-forget (or rather submit-and-forget) manner. There are
> two deployment modes: client and cluster. For standalone, only client is
> supported. This means that the "submitting process" will be the driver
> process (not to be confused with the "master"). It should be perfectly
> possible to submit from your laptop to a standalone cluster, but the process
> running spark-submit will stay alive until the job finishes. If you
> terminate that process (via kill -9 or otherwise), the job will be
> terminated as well. The driver process gets executors from the Spark master
> and then does the usual divvying up of tasks, distribution, fault tolerance,
> etc., and the results get reported back to the driver process.
> Often it's not possible to have arbitrary access to the Spark master, and if
> jobs take hours to complete, it's not feasible to keep the process running on
> the laptop without interruptions, disconnects, etc. As such, a "gateway"
> machine close to the Spark master is used to submit jobs from. That way, the
> process on the gateway machine lives for the duration of the job, and no
> connection from the laptop is needed. It's not uncommon to actually have an
> API on the gateway machine. For example, Ooyala's job server provides a
> RESTful interface for submitting jobs.
> Does that help?
>
> Regards,
> Ashic
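
On the gateway machine described above, the spark-submit process has to outlive the SSH session it was started from. A minimal Python sketch, assuming a POSIX gateway; `launch_detached` is a hypothetical helper, roughly equivalent to running the command under `setsid` or `nohup ... &` from a shell. Because the driver lives in that process in client mode, killing it still kills the job.

```python
import subprocess
from typing import Sequence


def launch_detached(argv: Sequence[str], log_path: str) -> subprocess.Popen:
    """Start argv in a new session, appending its output to log_path.

    start_new_session=True (POSIX only) makes the child call setsid(), so it
    has no controlling terminal and typically does not receive the SIGHUP
    sent when the SSH session that started it is closed. The caller can then
    disconnect while the driver keeps running.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            list(argv),
            stdin=subprocess.DEVNULL,   # never wait for console input
            stdout=log,
            stderr=subprocess.STDOUT,   # interleave errors into the same log
            start_new_session=True,
        )
    finally:
        # The child holds its own duplicate of the file descriptor,
        # so the parent's handle can be closed right away.
        log.close()
```

For example, `launch_detached(["./bin/spark-submit", "--master", "spark://<master-host>:7077", "my_app.py"], "/tmp/my_app.log")`, where the host and application name are placeholders.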
> Subject: Submitting Python Applications from Remote to Master
> Hi All,
>
> I'm not quite clear on whether submitting a Python application to Spark
> standalone on EC2 is possible. Am I reading this correctly:
> "A common deployment strategy is to submit your application from a gateway
> machine that is physically co-located with your worker machines (e.g. Master
> node in a standalone EC2 cluster). In this setup, client mode is appropriate.
> In client mode, the driver is launched directly within the client
> spark-submit process, with the input and output of the application attached
> to the console. Thus, this mode is especially suitable for applications that
> involve the REPL (e.g. Spark shell). Alternatively, if your application is
> submitted from a machine far from the worker machines (e.g. locally on your
> laptop), it is common to use cluster mode to minimize network latency between
> the drivers and the executors. Note that cluster mode is currently not
> supported for standalone clusters, Mesos clusters, or python applications."
> So I shouldn't be able to do something like:
>
>     ./bin/spark-submit --master spark:/ examples/src/main/python/
>
> from a laptop connecting to a previously launched Spark cluster using the
> default spark-ec2 script, correct?
> If I am not mistaken about this, then the docs are slightly confusing: the
> above example is more or less the example here:
>
> If I am mistaken, apologies. Can you help me figure out where I went wrong?
> I've also taken to opening port 7077 to
>
> --Ben
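
Reading Ashic's reply against Ben's question: a laptop submission to a standalone master should work in client mode, with the caveat that the laptop process is the driver and must stay up for the whole job. It also has to be reachable by the executors, since they connect back to the driver, which is often not the case behind NAT or a home firewall. A small Python sketch of assembling and running that command; `build_client_submit` and `submit_and_wait` are hypothetical helpers, `spark://master.example.com:7077` is a placeholder (7077 being the standalone master's default port), and only spark-submit's `--master` and `--deploy-mode` flags are used.

```python
import subprocess
from typing import List, Sequence


def build_client_submit(master_url: str, app_path: str,
                        app_args: Sequence[str] = (),
                        spark_submit: str = "./bin/spark-submit") -> List[str]:
    """Build a spark-submit command that runs the driver in client mode.

    In client mode the process executing this command *is* the driver, so
    it must stay alive (and reachable from the executors) until the job ends.
    """
    if not master_url.startswith("spark://"):
        raise ValueError("expected a standalone master URL, e.g. spark://host:7077")
    return [
        spark_submit,
        "--master", master_url,
        "--deploy-mode", "client",  # per the quoted docs, cluster mode wasn't available for Python apps
        app_path,
        *app_args,                  # arguments passed through to the application
    ]


def submit_and_wait(argv: Sequence[str]) -> int:
    """Run spark-submit in the foreground; returns only when the driver exits."""
    return subprocess.run(list(argv)).returncode
```

Calling `submit_and_wait(build_client_submit("spark://master.example.com:7077", "my_app.py"))` from a laptop would block for the full duration of the job, which is exactly the limitation that motivates the gateway setup.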

