Ashic,

Thanks for your email.

Two things:

1. I think a whole lot of data scientists and other people would love it if
they could just fire off jobs from their laptops. It is, in my opinion, a
commonly desired use case.

2. Did anyone actually get the Ooyala job server to work? I asked that
question 6 months ago and never got a straight answer. I ended up writing a
middle layer using Scalatra and actors to submit jobs via an API and receive
results back in JSON. In doing that I ran into the "feature" that a
SparkContext cannot be shared, and it took a lot of finagling to make things
work (but it never felt "production ready").

Ognen

On Sat, Nov 15, 2014 at 03:36:43PM +0000, Ashic Mahtab wrote:
> Hi Ben,
> I haven't tried it with Python, but the instructions are the same as for
> Scala compiled (jar) apps. What it's saying is that it's not possible to
> offload the entire work to the master (a la Hadoop) in a fire-and-forget (or
> rather submit-and-forget) manner when running on standalone. There are two
> deployment modes - client and cluster. For standalone, only client is
> supported. What this means is that the "submitting process" will be the
> driver process (not to be confused with "master"). It should very well be
> possible to submit from your laptop to a standalone cluster, but the process
> running spark-submit will be alive until the job finishes. If you terminate
> the process (via kill -9 or otherwise), then the job will be terminated as
> well. The driver process will submit the work to the spark master, which will
> do the usual divvying up of tasks, distribution, fault tolerance, etc., and
> the results will get reported back to the driver process.
>
> Often it's not possible to have arbitrary access to the spark master, and if
> jobs take hours to complete, it's not feasible to have the process running on
> the laptop without interruptions, disconnects, etc. As such, a "gateway"
> machine closer to the spark master is used to submit jobs from.
> That way, the process on the gateway machine lives for the duration of the
> job, and no connection from the laptop, etc. is needed. It's not uncommon to
> actually have an API to the gateway machine. For example, Ooyala's job server
> https://github.com/ooyala/spark-jobserver provides a RESTful interface to
> submit jobs.
>
> Does that help?
>
> Regards,
> Ashic.
>
> Date: Fri, 14 Nov 2014 13:40:43 -0600
> Subject: Submitting Python Applications from Remote to Master
> From: quasi...@gmail.com
> To: user@spark.apache.org
>
> Hi All,
> I'm not quite clear on whether submitting a python application to spark
> standalone on ec2 is possible.
>
> Am I reading this correctly:
>
> *A common deployment strategy is to submit your application from a gateway
> machine that is physically co-located with your worker machines (e.g. Master
> node in a standalone EC2 cluster). In this setup, client mode is appropriate.
> In client mode, the driver is launched directly within the client
> spark-submit process, with the input and output of the application attached
> to the console. Thus, this mode is especially suitable for applications that
> involve the REPL (e.g. Spark shell). Alternatively, if your application is
> submitted from a machine far from the worker machines (e.g. locally on your
> laptop), it is common to use cluster mode to minimize network latency between
> the drivers and the executors. Note that cluster mode is currently not
> supported for standalone clusters, Mesos clusters, or python applications.*
>
> So I shouldn't be able to do something like:
>
> ./bin/spark-submit --master spark://xxxxx.compute-1.amazonaws.com:7077 examples/src/main/python/pi.py
>
> from a laptop connecting to a previously launched spark cluster using the
> default spark-ec2 script, correct?
> If I am not mistaken about this then the docs are slightly confusing -- the
> above example is more or less the example here:
> https://spark.apache.org/docs/1.1.0/submitting-applications.html
>
> If I am mistaken, apologies, can you help me figure out where I went wrong?
> I've also taken to opening port 7077 to 0.0.0.0/0.
>
> --Ben

-- 
"Convictions are more dangerous enemies of truth than lies." - Friedrich Nietzsche
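
For anyone wanting to try the gateway approach described in the quoted reply, here is a minimal sketch in Python. It only builds an ssh command that would start spark-submit in client mode on a gateway host, detached with nohup so the driver keeps running after the laptop disconnects. The host names (gateway.example.com, master.example.com), the log file name, and the helper build_gateway_submit are all made up for illustration, and it assumes spark-submit is on the PATH of the gateway and that pi.py already exists there.

```python
import shlex


def build_gateway_submit(gateway, master_url, app_path, log_path="job.log"):
    """Return an ssh argv that launches spark-submit on the gateway host.

    Hypothetical helper: the driver runs on the gateway (client mode), so
    the laptop only needs to stay connected long enough to start it.
    """
    submit = [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "client",
        app_path,
    ]
    quoted = " ".join(shlex.quote(part) for part in submit)
    # nohup plus redirected stdin/stdout/stderr lets ssh return right away
    # while the driver process keeps running on the gateway.
    remote = "nohup {} > {} 2>&1 < /dev/null &".format(
        quoted, shlex.quote(log_path)
    )
    return ["ssh", gateway, remote]


# Placeholder hosts; substitute your own gateway and master.
cmd = build_gateway_submit(
    "gateway.example.com",
    "spark://master.example.com:7077",
    "pi.py",
)
print(" ".join(cmd))
```

The returned list could be handed to subprocess.run(cmd, check=True) to actually launch the job. Results and logs then live on the gateway (job.log in this sketch), not on the laptop, so this does not replace something like a job server that hands results back over an API.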