I have a similar problem, so here is what I am doing currently, in case
it is useful. I have a tool that generates Pig scripts from another
representation (Informatica mappings), and in many cases the scripts
call UDFs that depend on about 300 jars and 580 native libraries.

For each Pig script I also generate a jar containing the UDFs called
from that script, and I add that jar to the script with a register
statement. Registering the 300 jars that the UDFs depend on one by one
is error-prone and tedious, so I have automated that part: a top-level
jar lists all 300 jars in the Class-Path entry of its MANIFEST.MF, and
I add only this top-level jar to the classpath. I generate the top-level
jar with Maven's assembly plugin.

I also build a zip of everything (jars and native libs) with the
assembly plugin, distribute it through the distributed cache, and add
the native libs to LD_LIBRARY_PATH.
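
To illustrate the top-level jar idea described above, here is a minimal
sketch in Python. It is not the Maven assembly configuration used in
this setup; it only shows what such a jar contains: a single
MANIFEST.MF whose Class-Path entry names the other jars. The function
names and the file name all-deps.jar are made up for the example, and
the sketch assumes ASCII jar file names and that the top-level jar sits
in the same directory as the jars it lists (Class-Path entries are
resolved relative to the jar that declares them).

```python
import zipfile
from pathlib import Path


def manifest_text(jar_names):
    """Return MANIFEST.MF content whose Class-Path lists the given jars.

    Assumes ASCII names, so characters and bytes coincide.
    """
    value = "Class-Path: " + " ".join(jar_names)
    # The JAR spec limits manifest lines to 72 bytes; a longer value
    # continues on following lines that start with a single space.
    lines = [value[:72]]
    rest = value[72:]
    while rest:
        lines.append(" " + rest[:71])
        rest = rest[71:]
    return "Manifest-Version: 1.0\r\n" + "\r\n".join(lines) + "\r\n\r\n"


def write_classpath_jar(lib_dir, jar_name="all-deps.jar"):
    """Write a manifest-only jar into lib_dir that references every other jar there."""
    lib = Path(lib_dir)
    # Skip the top-level jar itself so that re-running does not self-reference.
    jars = sorted(p.name for p in lib.glob("*.jar") if p.name != jar_name)
    with zipfile.ZipFile(lib / jar_name, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", manifest_text(jars))
    return jars
```

With a jar like this in place, only the one top-level jar has to be put
on the classpath; the JVM follows its Class-Path entry to find the rest.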

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[email protected]] 
Sent: 21 January 2011 05:57
To: [email protected]
Subject: Re: Managing pig script jar dependencies

This is becoming a bigger problem for us as well, as use of Pig becomes
more varied across the company.
Would love to hear what others have found to work for them.

D

On Wed, Jan 19, 2011 at 2:24 PM, Geoffrey Gallaway
<[email protected]> wrote:

> I'm looking for some suggestions and ideas for how to handle JAR 
> dependencies in a production environment.
>
> Most of the pig scripts I write require multiple JAR files. For 
> instance, I have a pig script that processes some data through a Solr 
> instance which requires my Solr UDF and some solr, lucene and apache 
> commons jars. These pig scripts are stored in a git repo and that git 
> repo is deployed to our production cluster. Obviously we don't want to
> store the jars in git; I'd rather store them in our mvn repo with the
> rest of the jars the company uses.
>
> The plan is to have a maven pom.xml for each pig script that defines 
> which jars that pig script depends on. A shell script will then call 
> "mvn dependency:copy-dependencies -DoutputDirectory=pig-jars" before 
> calling the actual pig command to run the script. Given that, I'm 
> trying to figure out the best solution to a few questions.
>
> * For development I'd like to store the pig jar (pig-0.7.0-core.jar) 
> in maven but there is no pom.xml for that jar (easily fixed) and that 
> jar contains all the java prerequisites (javax.servlet, apache 
> commons, etc) which seem to be making maven unhappy when I try to 
> import it into the maven company repo. Is there a pig-only jar?
>
> * What do other people use to deploy their code to various systems? 
> Check in jars with the code? Keep jars in a separate, network-based 
> directory?
>
> Geoff
> --
> Sent from my email client.
>
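
The wrapper plan in the quoted message (copy the dependencies with
Maven, then run pig) could be sketched roughly as follows. This is only
one possible shape, written in Python instead of shell. It assumes that
mvn and pig are on the PATH, that a pom.xml for the script sits in the
working directory, and that prepending one REGISTER statement per
copied jar is acceptable; the function names and the ".generated"
suffix are invented for the example.

```python
import subprocess
from pathlib import Path


def register_lines(jar_dir):
    """Return one Pig REGISTER statement per jar in jar_dir, sorted by path."""
    return [f"REGISTER {p.as_posix()};" for p in sorted(Path(jar_dir).glob("*.jar"))]


def build_script(script_path, jar_dir, out_path):
    """Write a copy of the Pig script with the REGISTER statements prepended."""
    original = Path(script_path).read_text()
    header = "\n".join(register_lines(jar_dir))
    Path(out_path).write_text(header + "\n" + original)
    return out_path


def run(script_path, jar_dir="pig-jars"):
    """Copy dependencies with Maven, then run the generated script with pig."""
    # Assumption: mvn and pig are installed and a pom.xml is in the cwd.
    subprocess.run(
        ["mvn", "dependency:copy-dependencies", f"-DoutputDirectory={jar_dir}"],
        check=True,
    )
    generated = build_script(script_path, jar_dir, str(script_path) + ".generated")
    subprocess.run(["pig", generated], check=True)
```

Generating the REGISTER lines keeps the jar list out of the script
stored in git, so the pom.xml remains the only place where the
dependencies are declared.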
