Hello Shrikar,

Thanks for your email. I have been using the same workflow as you. But my question was about the creation of the SparkContext: if I am specifying jars in "java -cp <jar-paths>" and adding them to my build.sbt, do I need to additionally add them in my code while creating the SparkContext (sparkContext.setJars(...))?
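For concreteness, this is the pattern I am asking about (a minimal sketch, assuming Spark 1.0; the app name, master URL and jar path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Explicitly ship the fat jar to the executors when the context is created.
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("spark://master:7077")
  .setJars(Seq("target/scala-2.10/myapp-assembly-1.0.jar"))
val sc = new SparkContext(conf)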
Thanks,
Shivani

On Fri, Jun 20, 2014 at 11:03 AM, Shrikar archak <shrika...@gmail.com> wrote:

> Hi Shivani,
>
> I use sbt-assembly to create a fat jar.
> https://github.com/sbt/sbt-assembly
>
> An example of the sbt file is below.
>
> import AssemblyKeys._ // put this at the top of the file
>
> assemblySettings
>
> mainClass in assembly := Some("FifaSparkStreaming")
>
> name := "FifaSparkStreaming"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
>   "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
>   ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
>     .exclude("org.eclipse.jetty.orbit", "javax.transaction")
>     .exclude("org.eclipse.jetty.orbit", "javax.servlet")
>     .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
>     .exclude("org.eclipse.jetty.orbit", "javax.activation")
>     .exclude("com.esotericsoftware.minlog", "minlog"),
>   ("net.debasishg" % "redisclient_2.10" % "2.12")
>     .exclude("com.typesafe.akka", "akka-actor_2.10"))
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>     case PathList("org", "apache", xs @ _*) => MergeStrategy.first
>     case "application.conf" => MergeStrategy.concat
>     case "unwanted.txt" => MergeStrategy.discard
>     case x => old(x)
>   }
> }
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>
> And I run as mentioned below.
>
> LOCALLY:
> 1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'
>
> If you want to submit on the cluster:
>
> CLUSTER:
> 2) spark-submit --class FifaSparkStreaming --master "spark://server-8-144:7077" --driver-memory 2048 --deploy-mode cluster FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014
>
> Hope this helps.
>
> Thanks,
> Shrikar
>
> On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao <raoshiv...@gmail.com> wrote:
>
>> Hello Michael,
>>
>> I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example?
>>
>> I am using sbt assembly as well to create a fat jar, and supplying the spark and hadoop locations in the classpath. Inside the main() function where the spark context is created, I use SparkContext.jarOfClass(this).toList to add the fat jar to my spark context. However, I seem to be running into issues with this approach. I was wondering if you had any inputs, Michael.
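>> Concretely, the relevant part of my main() looks roughly like this (a sketch; the master URL and app name are placeholders):
>>
>> import org.apache.spark.SparkContext
>>
>> // Locate the fat jar containing this class and ship it to the executors.
>> val jars = SparkContext.jarOfClass(this.getClass).toList
>> val sc = new SparkContext("spark://master:7077", "MyApp",
>>                           System.getenv("SPARK_HOME"), jars)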
>> Thanks,
>> Shivani
>>
>> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>>> We use Maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. Works well for us.
>>>
>>> Best Regards,
>>> Sonal
>>> Nube Technologies <http://www.nubetech.co>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>>
>>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <mich...@tumra.com> wrote:
>>>
>>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job. These are automatically built from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ direct from HDFS, unpack it and launch the appropriate script.
>>>>
>>>> It makes for a much cleaner development/testing/deployment process to package everything required in one go, instead of relying on cluster-specific classpath additions or any add-jars functionality.
>>>>
>>>> On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:
>>>>
>>>>> When you start seriously using Spark in production, there are basically two things everyone eventually needs:
>>>>>
>>>>> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>>> 2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>>>>>
>>>>> There are lots of ways to implement these requirements, everything from crontab through to workflow managers like Oozie.
>>>>>
>>>>> We opted for the following stack:
>>>>>
>>>>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>> - Marathon <https://github.com/mesosphere/marathon> - init/control system for starting, stopping, and maintaining always-on applications.
>>>>> - Chronos <http://airbnb.github.io/chronos/> - general-purpose scheduler for Mesos; supports job dependency graphs.
>>>>> - Spark Job Server <https://github.com/ooyala/spark-jobserver> - primarily for its ability to reuse shared contexts with multiple jobs.
>>>>>
>>>>> The majority of our jobs are periodic (batch) jobs run through spark-submit, and we have several always-on Spark Streaming jobs (also run through spark-submit).
>>>>>
>>>>> We always use "client mode" with spark-submit because the Mesos cluster has direct connectivity to the Spark cluster, and it means all the Spark stdout/stderr is externalised into Mesos logs, which helps in diagnosing problems.
>>>>>
>>>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run Spark and manage your jobs. The Mesosphere tutorials are awesome, and you can be up and running in literally minutes. The Web UIs for both make it easy to get started without talking to REST APIs etc.
>>>>>
>>>>> Best,
>>>>>
>>>>> Michael
>>>>>
>>>>> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>>>>>
>>>>>> I use SBT, create an assembly, and then add the assembly jars when I create my spark context. The main executor I run with something like "java -cp ... MyDriver".
>>>>>>
>>>>>> That said, as of Spark 1.0 the preferred way to run Spark applications is via spark-submit:
>>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
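>>>>>> For example, something like this (the class name, master URL and jar path are placeholders):
>>>>>>
>>>>>> spark-submit --class MyDriver --master spark://master:7077 target/scala-2.10/my-assembly-1.0.jar arg1 arg2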
>>>>>>
>>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>>>>>
>>>>>>> I want to ask this, not because I can't read endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep having issues. How do you run *your* spark app?
>>>>>>>
>>>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then I had to get Spark and Shark working, ended up upgrading everything, and dropped CDH support. Anyway, this is what I used, with master=yarn-client and app_jar being Scala code compiled with Maven:
>>>>>>>
>>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>>>>>
>>>>>>> Do you use this, or something else? I could never figure out this method:
>>>>>>>
>>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>>
>>>>>>> For example:
>>>>>>>
>>>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>>>
>>>>>>> Do you use SBT or Maven to compile? Or something else?
>>>>>>>
>>>>>>> ** It seems that I can't get subscribed to the mailing list; I tried both my work email and personal.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA