I assume that you would like to trigger Spark batch jobs, and not streaming jobs.
For production jobs, I recommend against scheduling batch jobs directly with cron or cron-like services such as Chronos. Jobs will sometimes fail, either because input data is missing or because of execution problems. When that happens, you need a mechanism that backfills the missing datasets by retrying jobs; otherwise your system will be brittle. The component that does this for you is called a workflow manager. I suggest using either Luigi (https://github.com/spotify/luigi) or Airflow (https://github.com/apache/incubator-airflow). You still need to schedule the workflow manager periodically so that it evaluates your pipeline status and runs jobs (at least with Luigi), but the workflow manager verifies that input data is present before starting jobs, and it can compensate for transient failures and delayed input data, which makes the system as a whole stable.

Oozie, mentioned in this thread, is also a workflow manager. Its DSL is XML-based, however, which makes it syntactically clumsy and limited in expressivity. That prevents you from using it for some complex but common scenarios, e.g. pipelines that require dynamic dependencies.

Some frameworks for running services can also execute batch jobs, e.g. Aurora and Kubernetes. Their DSLs for expressing dependencies are weak, though, so they are suitable only if your pipelines are simple. They are useful if you want to run Spark Streaming jobs. Marathon did not support batch jobs last I checked, and is only useful for streaming scenarios.
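
To make the backfill idea concrete, here is a minimal sketch in plain Python of what a workflow manager does on each periodic evaluation. It is not Luigi's or Airflow's API; the dataset layout (`raw/` and `clean/` directories with one file per day), the function names, and the `job` callable are all hypothetical. In a real pipeline, `job` would be replaced by something that launches spark-submit.

```python
import datetime as dt
from pathlib import Path
from typing import Callable, Dict


def dataset(root: Path, name: str, day: dt.date) -> Path:
    """Hypothetical layout: one file per dataset and day, e.g. root/raw/2016-07-20.txt."""
    return root / name / f"{day.isoformat()}.txt"


def run_day(root: Path, day: dt.date, job: Callable[[str], str]) -> str:
    """Produce the 'clean' dataset for one day, if needed and possible."""
    target = dataset(root, "clean", day)
    if target.exists():
        return "done"      # output already present, nothing to do
    source = dataset(root, "raw", day)
    if not source.exists():
        return "waiting"   # input missing or delayed; the next evaluation retries
    try:
        result = job(source.read_text())
    except Exception:
        return "failed"    # no output was written, so the next evaluation retries
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(result)
    tmp.replace(target)    # publish by rename so a partial output is never visible
    return "ran"


def evaluate(root: Path, today: dt.date, lookback_days: int,
             job: Callable[[str], str]) -> Dict[str, str]:
    """One evaluation pass: revisit recent days and fill in whatever is missing."""
    statuses = {}
    for offset in range(lookback_days):
        day = today - dt.timedelta(days=offset)
        statuses[day.isoformat()] = run_day(root, day, job)
    return statuses
```

Calling `evaluate` periodically, e.g. from cron, is then safe: days whose output exists are skipped, days with missing input are reported as waiting, and days that failed earlier are retried. Luigi and Airflow provide the same behaviour with far more machinery (dependency graphs, retries, visualisation), so treat this only as an illustration of the principle.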
You can find more context and advice on running batch jobs in production from the resources in this list, under the sections "End to end" and "Batch processing":
http://www.mapflat.com/lands/resources/reading-list/

Regards,

Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS

On Wed, Jul 20, 2016 at 3:47 PM, Sathish Kumaran Vairavelu
<vsathishkuma...@gmail.com> wrote:
> If you are using Mesos, then u can use Chronos or Marathon
>
> On Wed, Jul 20, 2016 at 6:08 AM Rabin Banerjee
> <dev.rabin.baner...@gmail.com> wrote:
>>
>> ++ crontab :)
>>
>> On Wed, Jul 20, 2016 at 9:07 AM, Andrew Ehrlich <and...@aehrlich.com>
>> wrote:
>>>
>>> Another option is Oozie with the spark action:
>>> https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
>>>
>>> On Jul 18, 2016, at 12:15 AM, Jagat Singh <jagatsi...@gmail.com> wrote:
>>>
>>> You can use following options
>>>
>>> * spark-submit from shell
>>> * some kind of job server. See spark-jobserver for details
>>> * some notebook environment See Zeppelin for example
>>>
>>> On 18 July 2016 at 17:13, manish jaiswal <manishsr...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> What is the best approach to trigger spark job in production cluster?