How about during development of the Spark application: do you use "localhost" or "spark://localhost:7077" for the Spark context master?
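
To make the question concrete, here is a rough sketch of the two options (the class name and jar path are just placeholders, not from this thread):

import org.apache.spark.SparkContext

object MyDevJob {
  def main(args: Array[String]) {
    // Option A: run everything in-process with a local master; no standalone
    // cluster and no fat jar needed.
    // val sc = new SparkContext("local[2]", "MyDevJob")

    // Option B: point at a standalone master running locally. This gives the
    // web UI and behaviour closer to production, but the workers need your
    // classes, typically via a fat jar listed in the jars argument.
    val sc = new SparkContext("spark://localhost:7077", "MyDevJob",
      System.getenv("SPARK_HOME"), Seq("target/my-dev-job-fat.jar"))

    // ... job logic ...
    sc.stop()
  }
}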
Using "spark://localhost:7077" is a good way to simulate the production driver and it provides the web ui. When using "spark://localhost:7077", is it required to create the fat jar? Wouldn't that significantly slow down the development cycle? On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi <[email protected]> wrote: > It depends how you deploy, I don't find it so complicated... > > 1) To build the fat jar I am using maven (as I am not familiar with sbt). > > Inside I have something like that, saying which libs should be used in the > fat jar (the others won't be present in the final artifact). > > <plugin> > <groupId>org.apache.maven.plugins</groupId> > <artifactId>maven-shade-plugin</artifactId> > <version>2.1</version> > <executions> > <execution> > <phase>package</phase> > <goals> > <goal>shade</goal> > </goals> > <configuration> > <minimizeJar>true</minimizeJar> > > <createDependencyReducedPom>false</createDependencyReducedPom> > <artifactSet> > <includes> > <include>org.apache.hbase:*</include> > <include>org.apache.hadoop:*</include> > <include>com.typesafe:config</include> > <include>org.apache.avro:*</include> > <include>joda-time:*</include> > <include>org.joda:*</include> > </includes> > </artifactSet> > <filters> > <filter> > <artifact>*:*</artifact> > <excludes> > <exclude>META-INF/*.SF</exclude> > <exclude>META-INF/*.DSA</exclude> > <exclude>META-INF/*.RSA</exclude> > </excludes> > </filter> > </filters> > </configuration> > </execution> > </executions> > </plugin> > > > 2) The App is the jar you have built, so you ship it to the driver node > (it depends a lot on how you are planing to use it, debian packaging, a > plain old scp, etc) to run it you can do something like: > > $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob > > where MyJob is the entry point to your job it defines a main method. > > 3) I don't know whats the "common way" but I am doing things this way: > build the fat jar, provide some launch scripts, make debian packaging, ship > it to a node that plays the role of the driver, run it over mesos using the > launch scripts + some conf. > > > 2014/1/2 Aureliano Buendia <[email protected]> > >> I wasn't aware of jarOfClass. I wish there was only one good way of >> deploying in spark, instead of many ambiguous methods. (seems like spark >> has followed scala in that there are more than one way of accomplishing a >> job, making scala an overcomplicated language) >> >> 1. Should sbt assembly be used to make the fat jar? If so, which sbt >> should be used? My local sbt or that $SPARK_HOME/sbt/sbt? Why is that spark >> is shipped with a separate sbt? >> >> 2. Let's say we have the dependencies fat jar which is supposed to be >> shipped to the workers. Now how do we deploy the main app which is supposed >> to be executed on the driver? Make jar another jar out of it? Does sbt >> assembly also create that jar? >> >> 3. Is calling sc.jarOfClass() the most common way of doing this? I cannot >> find any example by googling. What's the most common way that people use? >> >> >> >> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <[email protected]>wrote: >> >>> Hi, >>> >>> This is the list of the jars you use in your job, the driver will send >>> all those jars to each worker (otherwise the workers won't have the classes >>> you need in your job). 
>> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> This is the list of the jars you use in your job; the driver will send all those jars to each worker (otherwise the workers won't have the classes you need in your job). The easy way to go is to build a fat jar with your code and all the libs you depend on, and then use this utility to get the path: SparkContext.jarOfClass(YourJob.getClass)
>>>
>>> 2014/1/2 Aureliano Buendia <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I do not understand why the Spark context has an option for loading jars at runtime.
>>>>
>>>> As an example, consider this:
>>>> <https://github.com/apache/incubator-spark/blob/50fd8d98c00f7db6aa34183705c9269098c62486/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala#L36>
>>>>
>>>> object BroadcastTest {
>>>>   def main(args: Array[String]) {
>>>>     val sc = new SparkContext(args(0), "Broadcast Test",
>>>>       System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>>>>   }
>>>> }
>>>>
>>>> This is *the* example, or *the* application that we want to run, so what is SPARK_EXAMPLES_JAR supposed to be? In this particular case, the BroadcastTest example is self-contained, so why would it want to load other unrelated example jars?
>>>>
>>>> Finally, how does this help a real world Spark application?
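
Tying the replies together, a minimal sketch of the jarOfClass approach (MyJob is a placeholder; in the Spark versions contemporary with this thread, SparkContext.jarOfClass returns a Seq of jar paths, which is what the jars constructor argument expects):

import org.apache.spark.SparkContext

object MyJob {
  def main(args: Array[String]) {
    // jarOfClass locates the jar that contains this class (the fat jar, if the
    // job was launched from it) so the driver can ship it to the workers.
    val sc = new SparkContext(args(0), "My Job",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(MyJob.getClass))

    // ... job logic ...
    sc.stop()
  }
}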
