I see - so you want the dependencies pre-installed on the cluster nodes so they do not need to be submitted along with the job jar?
Where are you planning on deploying/running Spark? Do you have your own cluster, or are you using AWS or another IaaS/PaaS provider? Either way, you'll need to get the dependencies onto each node and add them to Spark's classpaths. You could modify an existing VM image, or use Chef to distribute the jars and update the classpaths (there is a rough sketch of the Spark configuration side at the end of this message).

> On Mar 14, 2016, at 5:26 PM, prateek arora <[email protected]> wrote:
>
> Hi
>
> I do not want to create a single jar that contains all the other dependencies,
> because it would increase the size of my Spark job jar.
> So I want to copy all the libraries onto the cluster using some automation
> process, just as I am currently doing with Chef.
> But I am not sure whether that is the right approach.
>
> Regards
> Prateek
>
> On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <[email protected]> wrote:
> Have you tried setting the configuration
> `spark.executor.extraLibraryPath` to point to a location where your
> .so's are available? (Not sure if non-local files, such as HDFS, are
> supported.)
>
> On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon <[email protected]> wrote:
> > What build system are you using to compile your code?
> > If you use a dependency management system like Maven or sbt, you should be
> > able to instruct it to build a single jar that contains all the other
> > dependencies, including third-party jars and .so's. I am a Maven user
> > myself, and I use the shade plugin for this:
> > https://maven.apache.org/plugins/maven-shade-plugin/
> >
> > However, if you are using sbt or another dependency manager, someone else
> > on this list may be able to help you with that.
> >
> > If you're not using a dependency manager - well, you should be. Trying to
> > manage this manually is a pain that you do not want getting in the way of
> > your project. There are perfectly good tools to do this for you; use them.
> >
> >> On Mar 14, 2016, at 3:56 PM, prateek arora <[email protected]> wrote:
> >>
> >> Hi
> >>
> >> Thanks for the information.
> >>
> >> My problem is that if I want to write a Spark application which depends on
> >> third-party libraries like OpenCV, what is the best approach to distribute
> >> all the .so and jar files of OpenCV across the whole cluster?
> >>
> >> Regards
> >> Prateek
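
For concreteness, here is a rough sketch of what the driver-side configuration could look like once your automation has pushed the libraries onto every node. The /opt/opencv paths and the application name below are placeholders I made up for illustration, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes Chef (or a baked VM image) has already placed the OpenCV jar and
    // .so files at the same path on every worker node. Paths are hypothetical.
    val conf = new SparkConf()
      .setAppName("opencv-job")
      // native libraries (.so) picked up via the executor JVM's java.library.path
      .set("spark.executor.extraLibraryPath", "/opt/opencv/lib")
      // the OpenCV Java bindings jar, prepended to the executor classpath
      .set("spark.executor.extraClassPath", "/opt/opencv/share/OpenCV/java/opencv-310.jar")

    val sc = new SparkContext(conf)

The same two properties could instead be set in spark-defaults.conf on the machine you submit from, which keeps deployment details out of the application code.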
