We use Maven to build our code and then invoke spark-submit through the
exec plugin, passing in our parameters; a rough sketch is below. Works well
for us.
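
The invocation looks something like this (the class name, master URL, and
jar path are placeholders, not our real values):

    # Build the jar, then launch it through spark-submit using the exec
    # plugin's exec:exec goal; exec.executable and exec.args are standard
    # exec-maven-plugin properties.
    mvn package
    mvn exec:exec \
        -Dexec.executable="spark-submit" \
        -Dexec.args="--class com.example.MyApp --master spark://master:7077 target/myapp-1.0.jar arg1 arg2"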

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:

> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
> dist-style TAR.GZ packages with launch scripts, JARs, and everything needed
> to run a job.  These are automatically built from source by our Jenkins and
> stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
> directly from HDFS, unpack it, and launch the appropriate script, roughly
> as sketched below.
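>
> As a rough illustration (every path and name here is a made-up placeholder,
> not our actual layout), the fetch-and-run wrapper amounts to:
>
>     #!/bin/bash
>     # Fetch the latest release tarball from HDFS, unpack it into a scratch
>     # directory, and hand off to the job's launch script.
>     set -e
>     RELEASE=hdfs:///releases/myapp/myapp-latest.tar.gz   # hypothetical path
>     WORKDIR=$(mktemp -d)
>     hadoop fs -get "$RELEASE" "$WORKDIR/release.tar.gz"
>     tar -xzf "$WORKDIR/release.tar.gz" -C "$WORKDIR"
>     exec "$WORKDIR/myapp/bin/launch-job.sh" "$@"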
>
> It makes for a much cleaner development / testing / deployment cycle to
> package everything required in one go, instead of relying on
> cluster-specific classpath additions or any add-jars functionality.
>
>
> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>
>> When you start seriously using Spark in production there are basically
>> two things everyone eventually needs:
>>
>>    1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>    2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>>
>> There are lots of ways to implement these requirements, everything from
>> crontab through to workflow managers like Oozie.
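>>
>> For contrast, the crontab end of that spectrum is a single line per job;
>> a hypothetical entry (script path and schedule made up) looks like:
>>
>>     # Run a batch job every day at 02:00 and append its output to a log.
>>     0 2 * * * /opt/jobs/run-daily-job.sh >> /var/log/daily-job.log 2>&1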
>>
>> We opted for the following stack:
>>
>>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>
>>
>>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>>    system for starting, stopping, and maintaining always-on applications.
>>
>>
>>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>    scheduler for Mesos, supports job dependency graphs (see the sketch
>>    after this list).
>>
>>
>>    - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>    primarily for its ability to reuse shared contexts across multiple jobs
>>
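>> To give a feel for the Chronos piece: jobs are registered over its REST
>> API with a single HTTP POST. A hypothetical nightly job (host, port, name,
>> schedule, and command are all made up here) would look roughly like:
>>
>>     # Register a nightly batch job with Chronos; the ISO8601 schedule
>>     # means "repeat every day, starting 2014-06-20 02:00 UTC".
>>     curl -X POST -H "Content-Type: application/json" \
>>         http://chronos-host:8080/scheduler/iso8601 \
>>         -d '{
>>               "name": "nightly-aggregate",
>>               "command": "/opt/jobs/myapp/bin/launch-job.sh",
>>               "schedule": "R/2014-06-20T02:00:00Z/P1D",
>>               "owner": "dev@example.com",
>>               "epsilon": "PT30M"
>>             }'
>>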
>> The majority of our jobs are periodic (batch) jobs run through
>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>> through spark-submit).
>>
>> We always use "client mode" with spark-submit, because the Mesos cluster
>> has direct connectivity to the Spark cluster and it means all of the Spark
>> driver's stdout/stderr is externalised into the Mesos logs, which helps
>> when diagnosing problems.
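>>
>> A hypothetical launch along those lines (master URL, class, and jar path
>> are placeholders; client deploy mode is spark-submit's default):
>>
>>     # Run a streaming job on Mesos in client mode so the driver's
>>     # stdout/stderr ends up in the Mesos task logs.
>>     spark-submit \
>>         --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
>>         --class com.example.StreamingJob \
>>         /opt/jobs/myapp/lib/myapp-assembly-1.0.jar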
>>
>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>> Spark and manage your jobs; the Mesosphere tutorials are excellent and you
>> can be up and running in literally minutes.  The web UIs for both Marathon
>> and Chronos make it easy to get started without talking to REST APIs etc.
>>
>> Best,
>>
>> Michael
>>
>>
>>
>>
>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>
>>> I use SBT, create an assembly, and then add the assembly jars when I
>>> create my SparkContext. I run the main driver class with something like
>>> "java -cp ... MyDriver".
>>>
>>> That said, as of Spark 1.0 the preferred way to run Spark applications
>>> is via spark-submit:
>>> http://spark.apache.org/docs/latest/submitting-applications.html
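>>>
>>> The general shape, per that page (everything in angle brackets is a
>>> placeholder to fill in):
>>>
>>>     # Generic spark-submit invocation from the linked docs; replace the
>>>     # angle-bracket placeholders with your class, master, and jar.
>>>     ./bin/spark-submit \
>>>       --class <main-class> \
>>>       --master <master-url> \
>>>       --deploy-mode <deploy-mode> \
>>>       <application-jar> \
>>>       [application-arguments]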
>>>
>>>
>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>
>>>> I want to ask this, not because I can't read endless documentation and
>>>> several tutorials, but because there seem to be many ways of doing things
>>>> and I keep having issues. How do you run *your* Spark app?
>>>>
>>>> I had it working when I was only using YARN + Hadoop 1 (Cloudera), then
>>>> I had to get Spark and Shark working, ended up upgrading everything, and
>>>> dropped CDH support. Anyway, this is what I used, with master=yarn-client
>>>> and APP_JAR being Scala code compiled with Maven:
>>>>
>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER \
>>>>   $CLASSNAME $ARGS
>>>>
>>>> Do you use this, or something else? I could never figure out this
>>>> method:
>>>>
>>>> SPARK_HOME/bin/spark-class jar APP_JAR ARGS
>>>>
>>>> For example:
>>>>
>>>> bin/spark-class jar \
>>>>   /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
>>>>   pi 10 10
>>>>
>>>> Do you use SBT or Maven to compile? or something else?
>>>>
>>>>
>>>> ** It seems that I can't get subscribed to the mailing list, and I tried
>>>> both my work email and my personal one.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
