P.S. Last but not least, we use sbt-assembly to build fat JARs and dist-style TAR.GZ packages containing launch scripts, JARs and everything else needed to run a job. These are built automatically from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ directly from HDFS, unpack it and launch the appropriate script.
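To give a rough idea, the launch step of one of those Chronos/Marathon jobs boils down to something like the following (the HDFS path, package name and script name are made-up placeholders, not our actual layout):

    # fetch the latest release package from HDFS (hypothetical path)
    hadoop fs -get hdfs:///releases/my-job/my-job-latest.tar.gz .

    # unpack it and run the bundled launch script
    tar xzf my-job-latest.tar.gz
    cd my-job-latest
    ./bin/run-job.sh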
Makes for a much cleaner development / testing / deployment cycle to package everything required in one go, instead of relying on cluster-specific classpath additions or any add-jars functionality.

On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:

> When you start seriously using Spark in production there are basically two
> things everyone eventually needs:
>
> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
> 2. Always-On Jobs - that require monitoring, restarting etc.
>
> There are lots of ways to implement these requirements, everything from
> crontab through to workflow managers like Oozie.
>
> We opted for the following stack:
>
>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>
>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>      system for starting, stopping, and maintaining always-on applications.
>
>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>      scheduler for Mesos, supports job dependency graphs.
>
>    - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>      primarily for its ability to reuse shared contexts with multiple jobs.
>
> The majority of our jobs are periodic (batch) jobs run through
> spark-submit, and we have several always-on Spark Streaming jobs (also run
> through spark-submit).
>
> We always use "client mode" with spark-submit because the Mesos cluster
> has direct connectivity to the Spark cluster, and it means all the Spark
> stdout/stderr is externalised into the Mesos logs, which helps when
> diagnosing problems.
>
> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
> Spark and manage your jobs; the Mesosphere tutorials are excellent and you
> can be up and running in literally minutes. The web UIs to both make it
> easy to get started without talking to REST APIs etc.
>
> Best,
>
> Michael
>
>
> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> I use SBT, create an assembly, and then add the assembly JARs when I
>> create my Spark context. The main driver I run with something like
>> "java -cp ... MyDriver".
>>
>> That said, as of Spark 1.0 the preferred way to run Spark applications
>> is via spark-submit -
>> http://spark.apache.org/docs/latest/submitting-applications.html
>>
>>
>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>
>>> I want to ask this, not because I can't read endless documentation and
>>> several tutorials, but because there seem to be many ways of doing
>>> things and I keep having issues. How do you run /your/ Spark app?
>>>
>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I
>>> had to get Spark and Shark working and ended up upgrading everything
>>> and dropped CDH support. Anyway, this is what I used, with
>>> master=yarn-client and APP_JAR being Scala code compiled with Maven:
>>>
>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>
>>> Do you use this, or something else? I could never figure out this method:
>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>
>>> For example:
>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>
>>> Do you use SBT or Maven to compile, or something else?
>>>
>>>
>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>> both my work email and personal.
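To round off the thread: the spark-submit route Evan mentions above looks roughly like this for a client-mode launch of an assembly JAR (the class name, master URL and JAR name below are placeholders, not a real job):

    ./bin/spark-submit \
      --class com.example.MyDriver \
      --master spark://spark-master:7077 \
      --deploy-mode client \
      my-job-assembly.jar arg1 arg2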