We use Maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. It works well for us.
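Roughly, that setup can be driven from the command line like this; the class name, master URL, and JAR path below are placeholders rather than our actual configuration:

    # Sketch only: invoke spark-submit through the exec-maven-plugin's exec goal.
    # The executable and arguments can equally live in the pom.xml <configuration> block.
    mvn package
    mvn exec:exec \
      -Dexec.executable="spark-submit" \
      -Dexec.args="--class com.example.MyApp --master spark://master:7077 target/myapp-1.0.jar arg1 arg2"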
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:

> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
> dist-style TAR.GZ packages with launch scripts, JARs, and everything
> needed to run a job. These are automatically built from source by our
> Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest
> release TAR.GZ directly from HDFS, unpack it, and launch the appropriate
> script.
>
> It makes for much cleaner development/testing/deployment to package
> everything required in one go instead of relying on cluster-specific
> classpath additions or any add-jars functionality.
>
> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>
>> When you start seriously using Spark in production, there are basically
>> two things everyone eventually needs:
>>
>> 1. Scheduled jobs - recurring hourly/daily/weekly jobs.
>> 2. Always-on jobs - jobs that require monitoring, restarting, etc.
>>
>> There are lots of ways to implement these requirements, everything from
>> crontab through to workflow managers like Oozie. We opted for the
>> following stack:
>>
>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>
>> - Marathon <https://github.com/mesosphere/marathon> - an init/control
>>   system for starting, stopping, and maintaining always-on applications.
>>
>> - Chronos <http://airbnb.github.io/chronos/> - a general-purpose
>>   scheduler for Mesos that supports job dependency graphs.
>>
>> - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>   primarily for its ability to reuse shared contexts with multiple jobs.
>>
>> The majority of our jobs are periodic (batch) jobs run through
>> spark-submit, and we have several always-on Spark Streaming jobs (also
>> run through spark-submit).
>>
>> We always use "client mode" with spark-submit because the Mesos cluster
>> has direct connectivity to the Spark cluster, and it means all of the
>> Spark stdout/stderr is externalised into the Mesos logs, which helps in
>> diagnosing problems.
>>
>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>> Spark and manage your jobs; the Mesosphere tutorials are excellent and
>> you can be up and running in literally minutes. The web UIs for both
>> make it easy to get started without talking to REST APIs, etc.
>>
>> Best,
>>
>> Michael
>>
>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>
>>> I use SBT, create an assembly, and then add the assembly JARs when I
>>> create my Spark context. The main driver I run with something like
>>> "java -cp ... MyDriver".
>>>
>>> That said, as of Spark 1.0 the preferred way to run Spark applications
>>> is via spark-submit:
>>> http://spark.apache.org/docs/latest/submitting-applications.html
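For reference, a minimal spark-submit invocation along the lines of that documentation might look like the following; the class name, master, and JAR path are all placeholders:

    # A basic spark-submit call (Spark 1.0-era flags); all names are placeholders.
    spark-submit \
      --class com.example.MyApp \
      --master yarn-client \
      /path/to/myapp-assembly.jar arg1 arg2

Here yarn-client runs the driver on the submitting machine (client mode), which is what keeps stdout/stderr local, as Michael notes above; a standalone cluster would use a spark://host:7077 master URL instead.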
>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>
>>>> I want to ask this, not because I can't read endless documentation and
>>>> several tutorials, but because there seem to be many ways of doing
>>>> things and I keep having issues. How do you run *your* Spark app?
>>>>
>>>> I had it working when I was only using YARN + Hadoop 1 (Cloudera),
>>>> then I had to get Spark and Shark working and ended up upgrading
>>>> everything and dropping CDH support. Anyway, this is what I used, with
>>>> master=yarn-client and APP_JAR being Scala code compiled with Maven:
>>>>
>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER \
>>>>   $CLASSNAME $ARGS
>>>>
>>>> Do you use this, or something else? I could never figure out this
>>>> method:
>>>>
>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>
>>>> For example:
>>>>
>>>> bin/spark-class jar \
>>>> /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
>>>> pi 10 10
>>>>
>>>> Do you use SBT or Maven to compile? Or something else?
>>>>
>>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>>> both my work email and my personal one.
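To make the packaging workflow Michael describes above a bit more concrete, here is a rough sketch of the kind of fetch-and-launch step a Chronos or Marathon job might run; every path, archive name, and script name below is hypothetical:

    # Hypothetical launch step: pull the latest Jenkins-built release from HDFS,
    # unpack it, and hand off to the bundled launch script (which wraps spark-submit).
    hadoop fs -get hdfs:///releases/my-job-latest.tar.gz .
    tar -xzf my-job-latest.tar.gz
    cd my-job
    ./bin/run-job.sh

Packaging the launch script alongside the JARs in one archive is what avoids the cluster-specific classpath additions he mentions.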