When you start seriously using Spark in production, there are basically two
things everyone eventually needs:

   1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
   2. Always-On Jobs - long-running jobs that need monitoring, restarting, etc.

There are lots of ways to implement these requirements, everything from
crontab through to workflow managers like Oozie.

We opted for the following stack:

   - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)

   - Marathon <https://github.com/mesosphere/marathon> - init/control
   system for starting, stopping, and maintaining always-on applications
   (a sketch follows this list).

   - Chronos <http://airbnb.github.io/chronos/> - general-purpose scheduler
   for Mesos, supports job dependency graphs.

   - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
   primarily for its ability to reuse shared contexts across multiple jobs.
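
To make the Marathon piece concrete, here is a minimal sketch of registering
an always-on Spark Streaming job as a Marathon app via its REST API. The host
names, class name, jar path, and resource figures below are placeholders, not
our actual configuration:

# Minimal sketch: run spark-submit as a long-lived Marathon app.
# marathon-host, spark-master, and all paths/names are placeholder values.
curl -X POST http://marathon-host:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d '{
        "id": "streaming-enrichment-job",
        "cmd": "/opt/spark/bin/spark-submit --deploy-mode client --master spark://spark-master:7077 --class com.example.StreamingJob /opt/jobs/streaming-job-assembly.jar",
        "cpus": 1.0,
        "mem": 1024,
        "instances": 1
      }'

Marathon then supervises that process and restarts it if it exits, which is
what covers the "always-on" requirement above.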

The majority of our jobs are periodic (batch) jobs run through spark-submit,
and we also have several always-on Spark Streaming jobs (likewise run through
spark-submit).
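
On the Chronos side, a recurring batch job is just a command plus an ISO 8601
repeating schedule posted to its REST API. Again only a sketch; the host,
port, job name, jar path, and owner address are made up:

# Minimal sketch: schedule a daily spark-submit batch job in Chronos.
# chronos-host:4400, the class, jar path, and owner are placeholder values.
curl -X POST http://chronos-host:4400/scheduler/iso8601 \
  -H "Content-Type: application/json" \
  -d '{
        "name": "daily-aggregation-job",
        "command": "/opt/spark/bin/spark-submit --master spark://spark-master:7077 --class com.example.DailyAggregation /opt/jobs/batch-job-assembly.jar",
        "schedule": "R/2014-06-20T02:00:00Z/P1D",
        "epsilon": "PT30M",
        "owner": "dev@example.com"
      }'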

We always use "client mode" with spark-submit because the Mesos cluster has
direct connectivity to the Spark cluster, and it means all of the Spark
stdout/stderr is externalised into the Mesos logs, which helps when
diagnosing problems.
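
For reference, the kind of spark-submit invocation we wrap in a Marathon or
Chronos command looks roughly like this (the master URL, class, jar, and
arguments are placeholders):

# Minimal sketch of a client-mode spark-submit; all names and URLs are placeholders.
/opt/spark/bin/spark-submit \
  --deploy-mode client \
  --master spark://spark-master:7077 \
  --class com.example.MyJob \
  --executor-memory 2g \
  /opt/jobs/my-job-assembly.jar arg1 arg2

Because the driver stays in the foreground in client mode, its stdout/stderr
ends up in the sandbox of the Marathon/Chronos task, which is what makes the
Mesos logs so handy for debugging.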

I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
Spark and manage your jobs. The Mesosphere tutorials are awesome and you
can be up and running in literally minutes. The web UIs for both Marathon
and Chronos make it easy to get started without talking to the REST APIs.

Best,

Michael




On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:

> I use SBT, create an assembly, and then add the assembly jars when I
> create my spark context. I run the main executor with something like "java
> -cp ... MyDriver".
>
> That said - as of spark 1.0 the preferred way to run spark applications is
> via spark-submit -
> http://spark.apache.org/docs/latest/submitting-applications.html
>
>
> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>
>> I want to ask this, not because I can't read endless documentation and
>> several tutorials, but because there seems to be many ways of doing things
>> and I keep having issues. How do you run /your/ Spark app?
>>
>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I had
>> to get Spark and Shark working and ended up upgrading everything and dropped
>> CDH support. Anyways, this is what I used with master=yarn-client and
>> app_jar being Scala code compiled with Maven.
>>
>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER \
>>   $CLASSNAME $ARGS
>>
>> Do you use this? Or something else? I could never figure out this method:
>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>
>> For example:
>> bin/spark-class jar
>> /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
>> pi 10 10
>>
>> Do you use SBT or Maven to compile? or something else?
>>
>>
>> ** It seems that I can't get subscribed to the mailing list; I tried both
>> my work email and personal.
>>
>>
>>
>>
>
>
