Aureliano,

It doesn't matter much, actually. Specifying "local" as your spark master simply runs the whole application inside a single JVM. Setting up a cluster and then specifying "spark://localhost:7077" runs it on a set of machines. Running spark in local mode is helpful for debugging, but it will be much slower than running on a cluster of 3, 4, or n machines. If you do not have a set of machines, you can use your own machine as a slave and start both the master and a slave on it. Go through the Apache Spark home page to learn more about starting the various nodes.

Thx.
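For illustration, here is a rough sketch of switching between the two masters without changing the job code. The job and the names in it are made up, and it assumes the 0.8/0.9-style SparkContext constructor and the jarOfClass helper that come up further down this thread:

import org.apache.spark.SparkContext

object MyDevJob {
  def main(args: Array[String]) {
    // "local[2]": driver and executors all run inside this one JVM with 2 threads --
    // fastest turnaround for debugging, no cluster needed.
    // "spark://localhost:7077": submit to a standalone master on this machine, which
    // exercises the real deployment path and gives you the web UI.
    val master = if (args.nonEmpty) args(0) else "local[2]"

    // Against a real master the workers need our classes, so ship the fat jar;
    // in local mode nothing has to be shipped. (jarOfClass returns a Seq[String]
    // in Spark 0.8/0.9, which is what the jars parameter expects.)
    val jars = if (master.startsWith("spark://"))
      SparkContext.jarOfClass(this.getClass)
    else
      Seq.empty[String]

    val sc = new SparkContext(master, "My Dev Job", System.getenv("SPARK_HOME"), jars)

    // A trivial computation, just to show the same code runs unchanged under either master.
    val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println("total = " + total)

    sc.stop()
  }
}

Run it with no argument while iterating, and pass spark://localhost:7077 (after building the fat jar) when you want a production-like run with the web UI.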
On Thu, Jan 2, 2014 at 5:21 PM, Aureliano Buendia <[email protected]> wrote:

> How about when developing the spark application: do you use "localhost"
> or "spark://localhost:7077" for the spark context master during development?
>
> Using "spark://localhost:7077" is a good way to simulate the production
> driver, and it provides the web UI. When using "spark://localhost:7077", is
> it required to create the fat jar? Wouldn't that significantly slow down
> the development cycle?
>
>
> On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi <[email protected]> wrote:
>
>> It depends how you deploy; I don't find it so complicated...
>>
>> 1) To build the fat jar I am using maven (as I am not familiar with sbt).
>>
>> Inside I have something like this, saying which libs should be included in
>> the fat jar (the others won't be present in the final artifact):
>>
>> <plugin>
>>   <groupId>org.apache.maven.plugins</groupId>
>>   <artifactId>maven-shade-plugin</artifactId>
>>   <version>2.1</version>
>>   <executions>
>>     <execution>
>>       <phase>package</phase>
>>       <goals>
>>         <goal>shade</goal>
>>       </goals>
>>       <configuration>
>>         <minimizeJar>true</minimizeJar>
>>         <createDependencyReducedPom>false</createDependencyReducedPom>
>>         <artifactSet>
>>           <includes>
>>             <include>org.apache.hbase:*</include>
>>             <include>org.apache.hadoop:*</include>
>>             <include>com.typesafe:config</include>
>>             <include>org.apache.avro:*</include>
>>             <include>joda-time:*</include>
>>             <include>org.joda:*</include>
>>           </includes>
>>         </artifactSet>
>>         <filters>
>>           <filter>
>>             <artifact>*:*</artifact>
>>             <excludes>
>>               <exclude>META-INF/*.SF</exclude>
>>               <exclude>META-INF/*.DSA</exclude>
>>               <exclude>META-INF/*.RSA</exclude>
>>             </excludes>
>>           </filter>
>>         </filters>
>>       </configuration>
>>     </execution>
>>   </executions>
>> </plugin>
>>
>>
>> 2) The app is the jar you have built, so you ship it to the driver node
>> (it depends a lot on how you are planning to use it: debian packaging, a
>> plain old scp, etc). To run it you can do something like:
>>
>> SPARK_CLASSPATH=PathToYour.jar $SPARK_HOME/spark-class com.myproject.MyJob
>>
>> where MyJob is the entry point to your job; it defines a main method.
>>
>> 3) I don't know what the "common way" is, but I am doing things this way:
>> build the fat jar, provide some launch scripts, make debian packaging, ship
>> it to a node that plays the role of the driver, and run it over mesos using
>> the launch scripts + some conf.
>>
>>
>> 2014/1/2 Aureliano Buendia <[email protected]>
>>
>>> I wasn't aware of jarOfClass. I wish there were only one good way of
>>> deploying in spark, instead of many ambiguous methods. (It seems spark
>>> has followed scala in that there is more than one way of accomplishing a
>>> job, making scala an overcomplicated language.)
>>>
>>> 1. Should sbt assembly be used to make the fat jar? If so, which sbt
>>> should be used: my local sbt or $SPARK_HOME/sbt/sbt? Why is spark
>>> shipped with a separate sbt?
>>>
>>> 2. Let's say we have the dependencies fat jar which is supposed to be
>>> shipped to the workers. Now how do we deploy the main app, which is supposed
>>> to be executed on the driver? Make another jar out of it? Does sbt
>>> assembly also create that jar?
>>>
>>> 3. Is calling sc.jarOfClass() the most common way of doing this? I
>>> cannot find any example by googling. What's the most common way that
>>> people use?
>>>
>>>
>>> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is the list of the jars you use in your job; the driver will send
>>>> all those jars to each worker (otherwise the workers won't have the classes
>>>> you need in your job). The easy way to go is to build a fat jar with your
>>>> code and all the libs you depend on, and then use this utility to get the
>>>> path: SparkContext.jarOfClass(YourJob.getClass)
>>>>
>>>>
>>>> 2014/1/2 Aureliano Buendia <[email protected]>
>>>>
>>>>> Hi,
>>>>>
>>>>> I do not understand why the spark context has an option for loading jars
>>>>> at runtime.
>>>>>
>>>>> As an example, consider this
>>>>> <https://github.com/apache/incubator-spark/blob/50fd8d98c00f7db6aa34183705c9269098c62486/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala#L36>:
>>>>>
>>>>> object BroadcastTest {
>>>>>   def main(args: Array[String]) {
>>>>>     val sc = new SparkContext(args(0), "Broadcast Test",
>>>>>       System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>>>>>   }
>>>>> }
>>>>>
>>>>> This is *the* example, or *the* application, that we want to run. What
>>>>> is SPARK_EXAMPLES_JAR supposed to be?
>>>>> In this particular case, the BroadcastTest example is self-contained; why
>>>>> would it want to load other unrelated example jars?
>>>>>
>>>>> Finally, how does this help a real world spark application?
>>>>>
>>>>
>>>
>>
>
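Regarding the SPARK_EXAMPLES_JAR question at the bottom of the thread: as far as I can tell, it simply points at the examples assembly jar so that the workers can fetch the compiled example classes; it is not loading unrelated jars. A real-world application typically passes its own fat jar instead, located via jarOfClass. Here is a rough sketch (the names are hypothetical, and it again assumes the 0.8/0.9-era API where jarOfClass returns a Seq):

import org.apache.spark.SparkContext

object MyBroadcastJob {
  def main(args: Array[String]) {
    // Instead of an env var like SPARK_EXAMPLES_JAR, locate the jar that contains
    // this class and hand it to the SparkContext so it is shipped to the workers.
    val jars = SparkContext.jarOfClass(this.getClass)

    val sc = new SparkContext(args(0), "My Broadcast Job",
      System.getenv("SPARK_HOME"), jars)

    // Broadcast a small lookup table once, then read it from every task.
    val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))
    val described = sc.parallelize(1 to 3).map(i => lookup.value.getOrElse(i, "?"))
    described.collect().foreach(println)

    sc.stop()
  }
}

This way nothing about the build changes between development and production; the only thing that varies is the master URL you pass in args(0).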
