Hi Shivani,

I use sbt-assembly to create a fat jar: https://github.com/sbt/sbt-assembly
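A note in case the plugin isn't already wired up: the build file below uses the older AssemblyKeys-style API, so sbt-assembly has to be declared in project/plugins.sbt. A minimal sketch (the version number is only an example; pick whichever release matches your sbt):

    // project/plugins.sbt -- registers the sbt-assembly plugin used by the build file below
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")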
Example of the sbt file is below.

import AssemblyKeys._  // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")
    .exclude("com.esotericsoftware.minlog", "minlog"),
  ("net.debasishg" % "redisclient_2.10" % "2.12")
    .exclude("com.typesafe.akka", "akka-actor_2.10"))

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*)    => MergeStrategy.first
    case "application.conf"                    => MergeStrategy.concat
    case "unwanted.txt"                        => MergeStrategy.discard
    case x                                     => old(x)
  }
}

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

And I run it as mentioned below.

LOCALLY:
1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'

If you want to submit on the cluster:

CLUSTER:
2) spark-submit --class FifaSparkStreaming --master "spark://server-8-144:7077" --driver-memory 2048 --deploy-mode cluster FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014

Hope this helps.

Thanks,
Shrikar


On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao <raoshiv...@gmail.com> wrote:

> Hello Michael,
>
> I have a quick question for you. Can you clarify the statement "build
> fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs
> and everything needed to run a Job"? Can you give an example?
>
> I am using sbt-assembly as well to create a fat jar, and supplying the
> spark and hadoop locations in the classpath. Inside the main() function
> where the spark context is created, I use SparkContext.jarOfClass(this).toList
> to add the fat jar to my spark context. However, I seem to be running into
> issues with this approach. I was wondering if you had any inputs, Michael.
>
> Thanks,
> Shivani
>
>
> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com>
> wrote:
>
>> We use Maven for building our code and then invoke spark-submit through
>> the exec plugin, passing in our parameters. Works well for us.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com>
>> wrote:
>>
>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>>> to run a Job. These are automatically built from source by our Jenkins and
>>> stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>>> direct from HDFS, unpack it and launch the appropriate script.
>>>
>>> It makes for a much cleaner development / testing / deployment process to
>>> package everything required in one go instead of relying on cluster-specific
>>> classpath additions or any add-jars functionality.
>>>
>>>
>>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>>
>>>> When you start seriously using Spark in production, there are basically
>>>> two things everyone eventually needs:
>>>>
>>>>    1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>>    2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>>>>
>>>> There are lots of ways to implement these requirements, everything from
>>>> crontab through to workflow managers like Oozie.
>>>>
>>>> We opted for the following stack:
>>>>
>>>>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>
>>>>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>>>>    system for starting, stopping, and maintaining always-on applications.
>>>>
>>>>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>>>    scheduler for Mesos, supports job dependency graphs.
>>>>
>>>>    - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>>    primarily for its ability to reuse shared contexts with multiple jobs.
>>>>
>>>> The majority of our jobs are periodic (batch) jobs run through
>>>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>>>> through spark-submit).
>>>>
>>>> We always use "client mode" with spark-submit because the Mesos cluster
>>>> has direct connectivity to the Spark cluster, and it means all the Spark
>>>> stdout/stderr is externalised into Mesos logs, which helps with diagnosing
>>>> problems.
>>>>
>>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>>>> Spark and manage your Jobs; the Mesosphere tutorials are awesome and you
>>>> can be up and running in literally minutes. The Web UIs for both make it
>>>> easy to get started without talking to REST APIs etc.
>>>>
>>>> Best,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>>
>>>>> I use SBT, create an assembly, and then add the assembly jars when I
>>>>> create my spark context. The main executor I run with something like
>>>>> "java -cp ... MyDriver".
>>>>>
>>>>> That said - as of Spark 1.0 the preferred way to run Spark
>>>>> applications is via spark-submit -
>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>>
>>>>>
>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>>
>>>>>> I want to ask this, not because I can't read endless documentation and
>>>>>> several tutorials, but because there seem to be many ways of doing
>>>>>> things and I keep having issues. How do you run /your/ spark app?
>>>>>>
>>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then
>>>>>> I had to get Spark and Shark working and ended up upgrading everything
>>>>>> and dropped CDH support. Anyway, this is what I used, with
>>>>>> master=yarn-client and APP_JAR being Scala code compiled with Maven:
>>>>>>
>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>>
>>>>>> Do you use this? Or something else? I could never figure out this
>>>>>> method.
>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>
>>>>>> For example:
>>>>>> bin/spark-class jar
>>>>>> /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
>>>>>> pi 10 10
>>>>>>
>>>>>> Do you use SBT or Maven to compile? Or something else?
>>>>>>
>>>>>>
>>>>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>>>>> both my work and personal email addresses.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.