Spark still supports submitting jobs programmatically, without shell scripts.
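For reference, a minimal sketch of that programmatic path against a standalone cluster; the master URL, app name, jar path, and memory setting below are illustrative placeholders, not values taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object EmbeddedApp {
      def main(args: Array[String]): Unit = {
        // Configure everything in code instead of via spark-submit flags.
        val conf = new SparkConf()
          .setMaster("spark://master-host:7077")      // placeholder standalone master URL
          .setAppName("embedded-example")             // placeholder app name
          .setJars(Seq("/path/to/app-assembly.jar"))  // jar(s) shipped to the executors
          .set("spark.executor.memory", "2g")

        val sc = new SparkContext(conf)
        try {
          val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
          println(s"even numbers: $evens")
        } finally {
          sc.stop()
        }
      }
    }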
Koert,

The main reason that the unification can't be part of SparkContext is that YARN and standalone support deploy modes where the driver runs in a managed process on the cluster. In this case, the SparkContext is created on a remote node well after the application is launched.

On Wed, Jul 9, 2014 at 8:34 AM, Andrei <faithlessfri...@gmail.com> wrote:
> Another +1. For me it's a question of embedding. With
> SparkConf/SparkContext I can easily create larger projects with Spark as a
> separate service (just like MySQL and JDBC, for example). With spark-submit
> I'm bound to Spark as the main framework that defines how my application
> should look. In my humble opinion, using Spark as an embeddable library
> rather than the main framework and runtime is much easier.
>
> On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> +1 as well for being able to submit jobs programmatically without using a
>> shell script.
>>
>> We also experience issues submitting jobs programmatically without
>> using spark-submit. In fact, even in the Hadoop world, I rarely used
>> "hadoop jar" to submit jobs from the shell.
>>
>> On Wed, Jul 9, 2014 at 9:47 AM, Robert James <srobertja...@gmail.com>
>> wrote:
>>
>>> +1 to be able to do anything via SparkConf/SparkContext. Our app
>>> worked fine in Spark 0.9, but, after several days of wrestling with
>>> uber jars and spark-submit, and so far failing to get Spark 1.0
>>> working, we'd like to go back to doing it ourselves with SparkConf.
>>>
>>> As the previous poster said, a few scripts should be able to give us
>>> the classpath and any other params we need, and be a lot more
>>> transparent and debuggable.
>>>
>>> On 7/9/14, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
>>> > Are there any gaps beyond convenience and code/config separation in
>>> > using spark-submit versus SparkConf/SparkContext if you are willing to
>>> > set your own config?
>>> >
>>> > If there are any gaps, +1 on having parity within SparkConf/SparkContext
>>> > where possible. In my use case, we launch our jobs programmatically. In
>>> > theory, we could shell out to spark-submit, but it's not the best option
>>> > for us.
>>> >
>>> > So far, we are only using Standalone Cluster mode, so I'm not
>>> > knowledgeable about the complexities of other modes, though.
>>> >
>>> > -Suren
>>> >
>>> > On Wed, Jul 9, 2014 at 8:20 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>> >
>>> >> Not sure I understand why unifying how you submit an app for different
>>> >> platforms and dynamic configuration cannot be part of SparkConf and
>>> >> SparkContext?
>>> >>
>>> >> For the classpath, a simple script similar to "hadoop classpath" that
>>> >> shows what needs to be added should be sufficient.
>>> >>
>>> >> On Spark standalone I can launch a program just fine with just SparkConf
>>> >> and SparkContext. Not on YARN, so the spark-launch script must be doing
>>> >> a few things extra there that I am missing... which makes things more
>>> >> difficult, because I am not sure it's realistic to expect every
>>> >> application that needs to run something on Spark to be launched using
>>> >> spark-submit.
>>> >>
>>> >> On Jul 9, 2014 3:45 AM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>>> >>
>>> >>> It fulfills a few different functions. The main one is giving users a
>>> >>> way to inject Spark as a runtime dependency separately from their
>>> >>> program and make sure they get exactly the right version of Spark. So
>>> >>> a user can bundle an application and then use spark-submit to send it
>>> >>> to different types of clusters (or use different versions of Spark).
>>> >>>
>>> >>> It also unifies the way you bundle and submit an app for YARN, Mesos,
>>> >>> etc... this was something that became very fragmented over time before
>>> >>> this was added.
>>> >>>
>>> >>> Another feature is allowing users to set configuration values
>>> >>> dynamically rather than compiling them into their program. That's
>>> >>> the one you mention here. You can choose to use this feature or not.
>>> >>> If you know your configs are not going to change, then you don't need
>>> >>> to set them with spark-submit.
>>> >>>
>>> >>> On Wed, Jul 9, 2014 at 10:22 AM, Robert James <srobertja...@gmail.com>
>>> >>> wrote:
>>> >>> > What is the purpose of spark-submit? Does it do anything outside of
>>> >>> > the standard val conf = new SparkConf ... val sc = new SparkContext
>>> >>> > ... ?
>>> >
>>> > --
>>> > SUREN HIRAMAN, VP TECHNOLOGY
>>> > Velos
>>> > Accelerating Machine Learning
>>> >
>>> > 440 NINTH AVENUE, 11TH FLOOR
>>> > NEW YORK, NY 10001
>>> > O: (917) 525-2466 ext. 105
>>> > F: 646.349.4063
>>> > E: suren.hiraman@velos.io
>>> > W: www.velos.io
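One pattern that follows from Patrick's last point, sketched here as an assumption rather than anything prescribed in the thread: build the SparkConf so it honors whatever spark.* properties spark-submit injects, and only falls back to values set in code when the program is launched directly. The class name and default values below are illustrative placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object FlexibleLaunch {
      def main(args: Array[String]): Unit = {
        // new SparkConf() picks up any spark.* properties that spark-submit
        // has already set; when the app is launched directly, they are absent.
        val conf = new SparkConf().setAppName("flexible-launch-example")

        // Fall back to compiled-in defaults only if nothing was injected.
        if (!conf.contains("spark.master")) conf.setMaster("local[*]")
        if (!conf.contains("spark.executor.memory")) conf.set("spark.executor.memory", "1g")

        val sc = new SparkContext(conf)
        try {
          println(sc.parallelize(1 to 100).count())
        } finally {
          sc.stop()
        }
      }
    }

Written this way, the same assembly jar can be handed to spark-submit for a YARN or standalone cluster, or run as an ordinary JVM process when Spark is embedded as a library, which is the use case Andrei and Suren describe above.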