Could you describe in more detail what you are trying to do? Some basics of submitting jobs programmatically:
- Create a SparkContext instance and use it to build your RDDs.

- You can only have one SparkContext per JVM, so if you need to serve concurrent job requests you will have to manage the SparkContext as a shared resource on that server. Keep in mind that if something goes wrong with that SparkContext, all running jobs will probably end up in a failed state and you will need to create a new SparkContext.

- Spark currently has System.exit calls built in that can kill your running JVM. We have shadowed some of the most offensive bits within our own application to work around this; you would likely want to do the same, or maintain your own Spark fork. For example, if the SparkContext cannot connect to your cluster's master node when it is created, it calls System.exit.

- You will need to put all of the classes your platform uses in its jobs on the classpath of the Spark cluster. We do this with a JAR file loaded dynamically from S3 by the SparkContext, but there are other options.

On Mon, Apr 20, 2015 at 10:12 PM, firemonk9 <dhiraj.peech...@gmail.com> wrote:
> I have built a data analytics SaaS platform by creating REST endpoints, and
> based on the type of job request I invoke the necessary Spark job(s)
> and return the results as JSON (async). I used yarn-client mode to submit
> the jobs to the YARN cluster.
>
> hope this helps.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Instantiating-starting-Spark-jobs-programmatically-tp22577p22584.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
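P.P.S. On the classpath point: one hedged, concrete option is Spark's `spark.jars` property, which takes a comma-separated list of JAR URLs to ship to the cluster; with an S3 filesystem configured on your Hadoop classpath, an S3 path can appear there. The bucket and key below are made up:

```properties
# spark-defaults.conf fragment (illustrative; bucket/key are hypothetical)
spark.jars  s3a://my-bucket/jobs/platform-jobs.jar
```

SparkContext.addJar(...) can also be called at runtime to add a JAR after the context is created, which is closer to the dynamic loading described above.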
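P.S. The "SparkContext as a shared resource" point can be sketched roughly like this. This is only an illustration of the pattern, not real Spark code: `FakeContext`, `ContextManager`, `get_context`, and `reset` are all made-up names standing in for whatever wrapper you would put around the real SparkContext.

```python
import threading

class FakeContext:
    """Stand-in for a real SparkContext (hypothetical, illustration only)."""
    def __init__(self):
        self.stopped = False
    def stop(self):
        self.stopped = True

class ContextManager:
    """Hold the one-per-JVM context as a shared resource.

    All job requests go through get_context(); if a job detects that the
    context is broken, it calls reset() so the next caller rebuilds it.
    """
    def __init__(self, factory):
        self._factory = factory       # how to build a fresh context
        self._lock = threading.Lock()
        self._ctx = None

    def get_context(self):
        with self._lock:
            if self._ctx is None:
                self._ctx = self._factory()
            return self._ctx

    def reset(self):
        # Called when the current context is in a failed state:
        # stop it and drop the reference so a new one gets created.
        with self._lock:
            if self._ctx is not None:
                self._ctx.stop()
                self._ctx = None

manager = ContextManager(FakeContext)
ctx = manager.get_context()
assert manager.get_context() is ctx    # every request shares one context
manager.reset()                        # simulate a failed context
assert manager.get_context() is not ctx
```

In a real deployment the factory would construct the SparkContext, and the reset path is where you would also have to cope with any in-flight jobs that died with the old context.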