It depends on how you deploy; I don't find it so complicated...
1) To build the fat jar I am using maven (as I am not familiar with sbt).
The pom contains something like this, saying which libs should be included
in the fat jar (the others won't be present in the final artifact):
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <minimizeJar>true</minimizeJar>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <artifactSet>
          <includes>
            <include>org.apache.hbase:*</include>
            <include>org.apache.hadoop:*</include>
            <include>com.typesafe:config</include>
            <include>org.apache.avro:*</include>
            <include>joda-time:*</include>
            <include>org.joda:*</include>
          </includes>
        </artifactSet>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
2) The app is the fat jar you have built, so you ship it to the driver node
(how depends a lot on how you are planning to use it: debian packaging, a
plain old scp, etc.). To run it you can do something like:
SPARK_CLASSPATH=PathToYour.jar $SPARK_HOME/spark-class com.myproject.MyJob
where MyJob is the entry point of your job; it defines a main method.
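For reference, here is a minimal sketch of what such an entry point could
look like (the names are placeholders and I am assuming the master URL is
passed as the first argument; jarOfClass gives back the path of the jar
containing the class, i.e. your fat jar, so it gets shipped to the workers):

package com.myproject

import org.apache.spark.SparkContext

// Sketch only: names and argument handling are placeholders.
object MyJob {
  def main(args: Array[String]) {
    // args(0) is assumed to be the master URL (local[n], spark://..., mesos://...)
    // jarOfClass returns the path(s) of the jar that contains this class, i.e. the fat jar
    val jars = SparkContext.jarOfClass(MyJob.getClass)
    val sc = new SparkContext(args(0), "MyJob", System.getenv("SPARK_HOME"), jars)

    // ... your job logic ...

    sc.stop()
  }
}

Since the fat jar is on the classpath (via SPARK_CLASSPATH above), jarOfClass
resolves to that jar, so the workers receive your code together with the
bundled libs.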
3) I don't know what the "common way" is, but I am doing things this way:
build the fat jar, provide some launch scripts, do the debian packaging, ship
it to a node that plays the role of the driver, and run it over mesos using
the launch scripts + some conf.
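As a side note on the "some conf" part: since com.typesafe:config is already
in the shade includes above, one option is to read the launch settings from a
config file bundled with (or next to) the jar. The key names below are just
an example, not a convention:

import com.typesafe.config.ConfigFactory

// Sketch only: the key names and config layout are illustrative.
object JobSettings {
  private val conf = ConfigFactory.load() // loads application.conf / reference.conf from the classpath

  val master: String    = conf.getString("myjob.spark.master") // e.g. "mesos://host:5050"
  val sparkHome: String = conf.getString("myjob.spark.home")   // SPARK_HOME on the driver node
}

The launch scripts then only have to point to the right file, e.g. with
-Dconfig.file=/etc/myjob/prod.conf.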
2014/1/2 Aureliano Buendia <[email protected]>
> I wasn't aware of jarOfClass. I wish there was only one good way of
> deploying in spark, instead of many ambiguous methods. (seems like spark
> has followed scala in that there is more than one way of accomplishing a
> job, making scala an overcomplicated language)
>
> 1. Should sbt assembly be used to make the fat jar? If so, which sbt
> should be used? My local sbt or the one at $SPARK_HOME/sbt/sbt? Why is it
> that spark is shipped with a separate sbt?
>
> 2. Let's say we have the dependencies fat jar which is supposed to be
> shipped to the workers. Now how do we deploy the main app which is supposed
> to be executed on the driver? Make another jar out of it? Does sbt
> assembly also create that jar?
>
> 3. Is calling sc.jarOfClass() the most common way of doing this? I cannot
> find any example by googling. What's the most common way that people use?
>
>
>
> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <[email protected]> wrote:
>
>> Hi,
>>
>> This is the list of the jars you use in your job, the driver will send
>> all those jars to each worker (otherwise the workers won't have the classes
>> you need in your job). The easy way to go is to build a fat jar with your
>> code and all the libs you depend on and then use this utility to get the
>> path: SparkContext.jarOfClass(YourJob.getClass)
>>
>>
>> 2014/1/2 Aureliano Buendia <[email protected]>
>>
>>> Hi,
>>>
>>> I do not understand why spark context has an option for loading jars at
>>> runtime.
>>>
>>> As an example, consider this:
>>> https://github.com/apache/incubator-spark/blob/50fd8d98c00f7db6aa34183705c9269098c62486/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala#L36
>>>
>>> object BroadcastTest {
>>>   def main(args: Array[String]) {
>>>     val sc = new SparkContext(args(0), "Broadcast Test",
>>>       System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>>>   }
>>> }
>>>
>>>
>>> This is *the* example, or *the* application, that we want to run, so what
>>> is SPARK_EXAMPLES_JAR supposed to be?
>>> In this particular case, the BroadcastTest example is self-contained, so
>>> why would it want to load other unrelated example jars?
>>>
>>>
>>>
>>> Finally, how does this help a real world spark application?
>>>
>>>
>>
>