At Cloudera we recommend bundling your application separately from the Spark libraries. The two biggest reasons are:

* No need to modify your application jar when upgrading or applying a patch.
* When running on YARN, the Spark jar can be cached as a YARN local resource, meaning it doesn't need to be transferred every time.
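In practice this means marking Spark as a "provided" dependency so that `sbt assembly` leaves it out of the application jar. Here is a minimal sketch of such a build file; the project name, Scala version, and Spark version are assumptions, so pick whatever matches your cluster:

    // build.sbt -- Spark is "provided": on the compile classpath, but left
    // out of the assembled application jar, since the cluster supplies it.
    name := "my-spark-app"       // hypothetical project name
    scalaVersion := "2.10.4"     // Spark 1.x is built against Scala 2.10
    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "1.0.1" % "provided"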
On Sun, Jul 27, 2014 at 8:52 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Mayur,
>
> I don't know if I exactly understand the context of what you are asking,
> but let me just mention issues I had with deploying.
>
> * As my application is a streaming application, it doesn't read any files
> from disk, so I have no Hadoop/HDFS in place and no need for it, either.
> There should be no dependency on Hadoop or HDFS, since you can perfectly
> well run Spark applications without them.
> * I use Mesos, and so far I have always made the downloaded Spark
> distribution accessible to all machines (e.g., via HTTP) and then added my
> application code by uploading a jar built with `sbt assembly`. As the
> Spark code itself must not be contained in that jar file, I had to add
> '% "provided"' in the sbt file, which in turn prevented me from running
> the application locally from IntelliJ IDEA (it would not find the
> libraries marked as "provided"); I always had to use `sbt run` instead.
> (A common sbt workaround is sketched after this message.)
> * When using Mesos, on the Spark slaves the Spark jar is loaded before
> the application jar, so the log4j file from the Spark jar is used instead
> of my custom one (this is different when running locally); I had to edit
> that file inside the Spark distribution jar to customize the logging of
> my Spark nodes. (An alternative is sketched after this message.)
>
> I wonder if the two latter problems would vanish if the Spark libraries
> were bundled together with the application. (That would be your approach
> #1, I guess.)
>
> Tobias
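Regarding the "provided" scope breaking local runs: a common sbt workaround (a sketch assuming sbt 0.13-era syntax; newer sbt spells this `run := Defaults.runTask(...).evaluated`) is to re-wire `run` onto the Compile classpath, which, unlike the Runtime classpath, still contains "provided" dependencies:

    // build.sbt -- let `sbt run` (and an IDE runner that delegates to sbt)
    // see dependencies marked "provided" by using the Compile classpath.
    run in Compile <<= Defaults.runTask(
      fullClasspath in Compile,
      mainClass in (Compile, run),
      runner in (Compile, run)
    )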
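Regarding the log4j file: instead of patching the Spark distribution jar, one alternative is to ship a custom log4j.properties to the executors and point log4j at the fetched copy. The sketch below uses `spark.files` and `spark.executor.extraJavaOptions`, both of which exist in Spark 1.x; whether the fetched file reliably takes precedence over the one inside the Spark jar on Mesos is an assumption worth testing:

    import org.apache.spark.SparkConf

    // Sketch: distribute log4j.properties to each executor's working
    // directory and tell log4j to load it from there, rather than editing
    // the copy baked into the Spark assembly jar.
    val conf = new SparkConf()
      .setAppName("my-streaming-app")   // hypothetical app name
      .set("spark.files", "/local/path/to/log4j.properties")
      .set("spark.executor.extraJavaOptions",
           "-Dlog4j.configuration=log4j.properties")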