The linked thread does a good job of answering your question. You should create a SparkContext at startup and re-use it for all of your queries. For example, we create a SparkContext in a web server at startup and are then able to use the Spark cluster to serve Ajax queries with a latency of a second or less. The executors keep running for as long as the SparkContext is alive, so there is minimal overhead in starting a job.
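A minimal sketch of that pattern in Scala, in case it helps. The names `SharedSpark`, `numbers`, and `countMultiplesOf` are made up for illustration, and the `local[2]` master and app name are placeholders rather than anything from our actual setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical holder: one SparkContext for the lifetime of the server process.
object SharedSpark {
  // A Scala object is initialized on first access, so touch it during
  // server startup. "local[2]" is a placeholder; use your cluster's master URL.
  val sc: SparkContext = new SparkContext(
    new SparkConf().setAppName("query-server").setMaster("local[2]")
  )

  // Hypothetical dataset, cached so repeated queries reuse it from memory.
  val numbers: RDD[Int] = sc.parallelize(1 to 1000000).cache()

  // Hypothetical request handler: each call runs a new job on the same
  // context, so no new application or executors have to be started.
  def countMultiplesOf(n: Int): Long = numbers.filter(_ % n == 0).count()
}

object Main {
  def main(args: Array[String]): Unit = {
    // Two "requests" served by the same long-lived context.
    println(SharedSpark.countMultiplesOf(7))
    println(SharedSpark.countMultiplesOf(1000))
    // Stop the context only when the server itself shuts down.
    SharedSpark.sc.stop()
  }
}
```

In a real web server the handler would be called from the request threads; submitting jobs to a single SparkContext from multiple threads is supported.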
On Thu, Apr 17, 2014 at 8:02 PM, Jim Carroll <jimfcarr...@gmail.com> wrote:
> Is there a way to create continuously-running, or at least
> continuously-loaded, jobs that can be 'invoked' rather than 'sent' to
> avoid the job creation overhead of a couple seconds?
>
> I read through the following:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-td2016.html
>
> Thanks.
> Jim