Hello Shrikar,

Thanks for your email. I have been using the same workflow as you. But my question was about the creation of the SparkContext: if I am specifying jars in "java -cp <jar-paths>" and adding them to my build.sbt, do I need to additionally add them in my code while creating the SparkContext (sparkContext.setJars(...))?
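For concreteness, this is the pattern I am asking about (a minimal sketch, assuming Spark 1.0; the app name, master URL and jar path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Explicitly ship the fat jar to the executors when the context is created.
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("spark://master:7077")
  .setJars(Seq("target/scala-2.10/myapp-assembly-1.0.jar"))
val sc = new SparkContext(conf)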
Thanks,
Shivani

On Fri, Jun 20, 2014 at 11:03 AM, Shrikar archak <shrika...@gmail.com> wrote:

> Hi Shivani,
>
> I use sbt-assembly to create a fat jar.
> https://github.com/sbt/sbt-assembly
>
> An example of the sbt file is below.
>
> import AssemblyKeys._ // put this at the top of the file
>
> assemblySettings
>
> mainClass in assembly := Some("FifaSparkStreaming")
>
> name := "FifaSparkStreaming"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
>   "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
>   ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
>     .exclude("org.eclipse.jetty.orbit", "javax.transaction")
>     .exclude("org.eclipse.jetty.orbit", "javax.servlet")
>     .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
>     .exclude("org.eclipse.jetty.orbit", "javax.activation")
>     .exclude("com.esotericsoftware.minlog", "minlog"),
>   ("net.debasishg" % "redisclient_2.10" % "2.12")
>     .exclude("com.typesafe.akka", "akka-actor_2.10"))
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>     case PathList("org", "apache", xs @ _*) => MergeStrategy.first
>     case "application.conf" => MergeStrategy.concat
>     case "unwanted.txt" => MergeStrategy.discard
>     case x => old(x)
>   }
> }
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>
> And I run as mentioned below.
>
> LOCALLY:
> 1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'
>
> If you want to submit on the cluster:
>
> CLUSTER:
> 2) spark-submit --class FifaSparkStreaming --master "spark://server-8-144:7077" --driver-memory 2048 --deploy-mode cluster FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014
>
> Hope this helps.
>
> Thanks,
> Shrikar
>
> On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao <raoshiv...@gmail.com> wrote:
>
>> Hello Michael,
>>
>> I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example?
>>
>> I am using sbt assembly as well to create a fat jar, and supplying the spark and hadoop locations in the classpath. Inside the main() function where the spark context is created, I use SparkContext.jarOfClass(this).toList to add the fat jar to my spark context. However, I seem to be running into issues with this approach. I was wondering if you had any inputs, Michael.
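>> Concretely, the relevant part of my main() looks roughly like this (a sketch; the master URL and app name are placeholders):
>>
>> import org.apache.spark.SparkContext
>>
>> // Locate the fat jar containing this class and ship it to the executors.
>> val jars = SparkContext.jarOfClass(this.getClass).toList
>> val sc = new SparkContext("spark://master:7077", "MyApp",
>>                           System.getenv("SPARK_HOME"), jars)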
>> Thanks,
>> Shivani
>>
>> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>>> We use Maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. Works well for us.
>>>
>>> Best Regards,
>>> Sonal
>>> Nube Technologies <http://www.nubetech.co>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>>
>>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:
>>>
>>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job. These are automatically built from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ direct from HDFS, unpack it and launch the appropriate script.
>>>>
>>>> It makes for a much cleaner development/testing/deployment process to package everything required in one go, instead of relying on cluster-specific classpath additions or any add-jars functionality.
>>>>
>>>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>>>
>>>>> When you start seriously using Spark in production, there are basically two things everyone eventually needs:
>>>>>
>>>>> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>>> 2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>>>>>
>>>>> There are lots of ways to implement these requirements, everything from crontab through to workflow managers like Oozie.
>>>>>
>>>>> We opted for the following stack:
>>>>>
>>>>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>> - Marathon <https://github.com/mesosphere/marathon> - init/control system for starting, stopping, and maintaining always-on applications.
>>>>> - Chronos <http://airbnb.github.io/chronos/> - general-purpose scheduler for Mesos; supports job dependency graphs.
>>>>> - Spark Job Server <https://github.com/ooyala/spark-jobserver> - primarily for its ability to reuse shared contexts with multiple jobs.
>>>>>
>>>>> The majority of our jobs are periodic (batch) jobs run through spark-submit, and we have several always-on Spark Streaming jobs (also run through spark-submit).
>>>>>
>>>>> We always use "client mode" with spark-submit because the Mesos cluster has direct connectivity to the Spark cluster, and it means all the Spark stdout/stderr is externalised into Mesos logs, which helps in diagnosing problems.
>>>>>
>>>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run Spark and manage your jobs. The Mesosphere tutorials are awesome, and you can be up and running in literally minutes. The Web UIs for both make it easy to get started without talking to REST APIs etc.
>>>>>
>>>>> Best,
>>>>>
>>>>> Michael
>>>>>
>>>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>>>
>>>>>> I use SBT, create an assembly, and then add the assembly jars when I create my spark context. The main executor I run with something like "java -cp ... MyDriver".
>>>>>>
>>>>>> That said, as of Spark 1.0 the preferred way to run Spark applications is via spark-submit:
>>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
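>>>>>> For example, something like this (the class name, master URL and jar path are placeholders):
>>>>>>
>>>>>> spark-submit --class MyDriver --master spark://master:7077 target/scala-2.10/my-assembly-1.0.jar arg1 arg2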
>>>>>>
>>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>>>
>>>>>>> I want to ask this, not because I can't read endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep having issues. How do you run *your* spark app?
>>>>>>>
>>>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I had to get Spark and Shark working, ended up upgrading everything, and dropped CDH support. Anyway, this is what I used, with master=yarn-client and app_jar being Scala code compiled with Maven:
>>>>>>>
>>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>>>
>>>>>>> Do you use this, or something else? I could never figure out this method:
>>>>>>>
>>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>>
>>>>>>> For example:
>>>>>>>
>>>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>>>
>>>>>>> Do you use SBT or Maven to compile? Or something else?
>>>>>>>
>>>>>>> ** It seems that I can't get subscribed to the mailing list; I tried both my work email and personal.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA