Hi Shivani,

Adding JARs to the classpath (e.g. via the "-cp" option) is needed to run your _local_ Java application, whatever it is. To deliver them to _other machines_ for execution, you need to add them to the SparkContext, and you can do that in two different ways:

1. Add them right from your code (your suggested sparkContext.setJars(...)).
2. Use spark-submit and pass the JARs on the command line.

Note that both options are easier if you assemble your code and all of its dependencies into a single "fat" JAR instead of manually listing every library you need.
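For example, option 1 looks roughly like the following. This is a minimal sketch, assuming a fat JAR built with sbt-assembly; the app name, master URL, and JAR path are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyApp")                       // placeholder name
      .setMaster("spark://master-host:7077")     // placeholder master URL
      // Ship the assembled JAR to the executors from code.
      .setJars(Seq("target/scala-2.10/myapp-assembly-1.0.jar"))
    val sc = new SparkContext(conf)
    // ... your job logic ...
    sc.stop()
  }
}

With option 2 you skip setJars() entirely and hand the same JAR to spark-submit on the command line (e.g. spark-submit --class MyApp myapp-assembly-1.0.jar).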
On Sat, Jun 21, 2014 at 1:47 AM, Shivani Rao <raoshiv...@gmail.com> wrote:

> Hello Shrikar,
>
> Thanks for your email. I have been using the same workflow as you, but my
> question was related to the creation of the SparkContext:
>
> If I am specifying JARs in "java -cp <jar-paths>" and adding them to my
> build.sbt, do I need to additionally add them in my code while creating
> the SparkContext (sparkContext.setJars(...))?
>
> Thanks,
> Shivani
>
> On Fri, Jun 20, 2014 at 11:03 AM, Shrikar archak <shrika...@gmail.com> wrote:
>
>> Hi Shivani,
>>
>> I use sbt-assembly to create a fat JAR:
>> https://github.com/sbt/sbt-assembly
>>
>> An example of the sbt build file is below.
>>
>> import AssemblyKeys._ // put this at the top of the file
>>
>> assemblySettings
>>
>> mainClass in assembly := Some("FifaSparkStreaming")
>>
>> name := "FifaSparkStreaming"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies ++= Seq(
>>   "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
>>   "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
>>   ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
>>     .exclude("org.eclipse.jetty.orbit", "javax.transaction")
>>     .exclude("org.eclipse.jetty.orbit", "javax.servlet")
>>     .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
>>     .exclude("org.eclipse.jetty.orbit", "javax.activation")
>>     .exclude("com.esotericsoftware.minlog", "minlog"),
>>   ("net.debasishg" % "redisclient_2.10" % "2.12")
>>     .exclude("com.typesafe.akka", "akka-actor_2.10"))
>>
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>     case PathList("org", "apache", xs @ _*) => MergeStrategy.first
>>     case "application.conf" => MergeStrategy.concat
>>     case "unwanted.txt" => MergeStrategy.discard
>>     case x => old(x)
>>   }
>> }
>>
>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>
>> And I run it as shown below.
>>
>> LOCALLY:
>> 1) sbt 'run AP1z4IYraYm5fqWhITWArY53x
>>    Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
>>    115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
>>    Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'
>>
>> If you want to submit on the cluster:
>>
>> CLUSTER:
>> 2) spark-submit --class FifaSparkStreaming --master
>>    "spark://server-8-144:7077" --driver-memory 2048 --deploy-mode cluster
>>    FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x
>>    Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
>>    115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
>>    Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014
>>
>> Hope this helps.
>>
>> Thanks,
>> Shrikar
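A note on Shrikar's build file: the `import AssemblyKeys._` line, assemblySettings, and the `assembly` task all come from the sbt-assembly plugin, which has to be registered in project/plugins.sbt. A minimal sketch, assuming sbt 0.13; the plugin version shown is only indicative of that era:

// project/plugins.sbt
// Registers sbt-assembly so `import AssemblyKeys._`, assemblySettings,
// and the `sbt assembly` task are available to build.sbt.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

Running `sbt assembly` then produces the single fat JAR (here target/scala-2.10/FifaSparkStreaming-assembly-1.0.jar) used in the cluster command above.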
>> On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao <raoshiv...@gmail.com> wrote:
>>
>>> Hello Michael,
>>>
>>> I have a quick question for you. Can you clarify the statement "build
>>> fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs
>>> and everything needed to run a Job"? Can you give an example?
>>>
>>> I am using sbt-assembly as well to create a fat JAR, and I supply the
>>> Spark and Hadoop locations on the classpath. Inside the main() function
>>> where the SparkContext is created, I use SparkContext.jarOfClass(this).toList
>>> to add the fat JAR to my SparkContext. However, I seem to be running into
>>> issues with this approach. I was wondering if you had any inputs, Michael.
>>>
>>> Thanks,
>>> Shivani
>>>
>>> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>>
>>>> We use Maven for building our code and then invoke spark-submit through
>>>> the exec plugin, passing in our parameters. Works well for us.
>>>>
>>>> Best Regards,
>>>> Sonal
>>>> Nube Technologies <http://www.nubetech.co>
>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>
>>>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:
>>>>
>>>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and to
>>>>> build dist-style TAR.GZ packages with launch scripts, JARs, and
>>>>> everything needed to run a Job. These are automatically built from
>>>>> source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs
>>>>> fetch the latest release TAR.GZ directly from HDFS, unpack it, and
>>>>> launch the appropriate script.
>>>>>
>>>>> It makes for a much cleaner development/testing/deployment cycle to
>>>>> package everything required in one go instead of relying on
>>>>> cluster-specific classpath additions or any add-jars functionality.
>>>>>
>>>>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>>>>
>>>>>> When you start seriously using Spark in production, there are
>>>>>> basically two things everyone eventually needs:
>>>>>>
>>>>>> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>>>> 2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>>>>>>
>>>>>> There are lots of ways to implement these requirements, everything
>>>>>> from crontab through to workflow managers like Oozie.
>>>>>>
>>>>>> We opted for the following stack:
>>>>>>
>>>>>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>>> - Marathon <https://github.com/mesosphere/marathon> - an init/control
>>>>>>   system for starting, stopping, and maintaining always-on applications.
>>>>>> - Chronos <http://airbnb.github.io/chronos/> - a general-purpose
>>>>>>   scheduler for Mesos; supports job dependency graphs.
>>>>>> - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>>>>   primarily for its ability to reuse shared contexts across multiple
>>>>>>   jobs (a rough sketch of such a job follows this message).
>>>>>>
>>>>>> The majority of our jobs are periodic (batch) jobs run through
>>>>>> spark-submit, and we have several always-on Spark Streaming jobs (also
>>>>>> run through spark-submit).
>>>>>>
>>>>>> We always use "client mode" with spark-submit because the Mesos
>>>>>> cluster has direct connectivity to the Spark cluster, and it means all
>>>>>> the Spark stdout/stderr is externalised into the Mesos logs, which
>>>>>> helps with diagnosing problems.
>>>>>>
>>>>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>>>>>> Spark and manage your jobs. The Mesosphere tutorials are awesome, and
>>>>>> you can be up and running in literally minutes. The web UIs for both
>>>>>> make it easy to get started without talking to REST APIs, etc.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Michael
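To make the Spark Job Server item concrete: a job there is a class implementing the server's SparkJob trait, so the server can run it inside a long-lived, shared SparkContext. A rough sketch against the ooyala spark-jobserver API of that era; the job name and config key are made up:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // Called by the server before runJob, to reject bad requests early.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("config must define input.path")

  // Runs inside a context the server owns, so successive jobs reuse it.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.textFile(config.getString("input.path"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .take(10)
}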
>>>>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>>>>
>>>>>>> I use SBT, create an assembly, and then add the assembly JARs when I
>>>>>>> create my SparkContext. I run the main class with something like
>>>>>>> "java -cp ... MyDriver".
>>>>>>>
>>>>>>> That said, as of Spark 1.0 the preferred way to run Spark
>>>>>>> applications is via spark-submit:
>>>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>>>>
>>>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I want to ask this, not because I can't read endless documentation
>>>>>>>> and several tutorials, but because there seem to be many ways of
>>>>>>>> doing things and I keep having issues. How do you run _your_ Spark
>>>>>>>> app?
>>>>>>>>
>>>>>>>> I had it working when I was only using YARN + Hadoop 1 (Cloudera);
>>>>>>>> then I had to get Spark and Shark working, ended up upgrading
>>>>>>>> everything, and dropped CDH support. Anyway, this is what I used,
>>>>>>>> with master=yarn-client and APP_JAR being Scala code compiled with
>>>>>>>> Maven:
>>>>>>>>
>>>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>>>>
>>>>>>>> Do you use this, or something else? I could never figure out this
>>>>>>>> method:
>>>>>>>>
>>>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>>>
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>>>>
>>>>>>>> Do you use SBT or Maven to compile? Or something else?
>>>>>>>>
>>>>>>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>>>>>>> both my work and personal email.
>
> --
> Software Engineer
> Analytics Engineering Team @ Box
> Mountain View, CA
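Tying the thread together: the pattern Evan and Shivani both describe (adding the assembly JAR when the context is created, via SparkContext.jarOfClass) looks roughly like the sketch below. Names are placeholders, and .toSeq papers over the fact that jarOfClass has returned Seq or Option depending on the Spark version:

import org.apache.spark.{SparkConf, SparkContext}

object MyDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyDriver")                             // placeholder
      .setMaster(sys.env.getOrElse("MASTER", "local[*]")) // placeholder
      // jarOfClass locates the JAR that contains this class (the fat JAR
      // itself when launched via `java -cp myapp-assembly-1.0.jar MyDriver`).
      .setJars(SparkContext.jarOfClass(this.getClass).toSeq)
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}

As Evan notes, from Spark 1.0 onwards spark-submit handles this JAR distribution for you, which is the main reason it became the preferred launcher.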