This is becoming a bigger problem for us as well, as use of Pig becomes more varied across the company. Would love to hear what others have found to work for them.
D

On Wed, Jan 19, 2011 at 2:24 PM, Geoffrey Gallaway <[email protected]> wrote:

> I'm looking for some suggestions and ideas for how to handle JAR
> dependencies in a production environment.
>
> Most of the pig scripts I write require multiple JAR files. For instance, I
> have a pig script that processes some data through a Solr instance which
> requires my Solr UDF and some solr, lucene and apache commons jars. These
> pig scripts are stored in a git repo and that git repo is deployed to our
> production cluster. Obviously we don't want to store the jars in git; I'd
> rather store them in our mvn repo with the rest of the jars the company
> uses.
>
> The plan is to have a maven pom.xml for each pig script that defines which
> jars that pig script depends on. A shell script will then call "mvn
> dependency:copy-dependencies -DoutputDirectory=pig-jars" before calling the
> actual pig command to run the script. Given that, I'm trying to figure out
> the best solution to a few questions.
>
> * For development I'd like to store the pig jar (pig-0.7.0-core.jar) in
> maven but there is no pom.xml for that jar (easily fixed) and that jar
> contains all the java prerequisites (javax.servlet, apache commons, etc)
> which seem to be making maven unhappy when I try to import it into the
> maven company repo. Is there a pig-only jar?
>
> * What do other people use to deploy their code to various systems? Check
> in jars with the code? Keep jars in a separate, network-based directory?
>
> Geoff
> --
> Sent from my email client.
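
For anyone trying the wrapper described in the quoted plan, here is a rough sketch of what it might look like, written in Python rather than shell. It is not what either of us actually runs. The function names, the per-script pom path, and the generated "*.with-jars.pig" file are all made up for illustration. Only the mvn goal and the "pig-jars" directory come from the message above. It assumes `mvn` and `pig` are on the PATH and that jar paths contain no spaces.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def mvn_copy_command(pom: Path, jar_dir: Path) -> list[str]:
    """Build the Maven call from the quoted plan for one script's pom.xml."""
    return [
        "mvn", "-f", str(pom),
        "dependency:copy-dependencies",
        f"-DoutputDirectory={jar_dir}",
    ]


def register_lines(jar_dir: Path) -> list[str]:
    """One Pig REGISTER statement per jar found, sorted for a stable order."""
    return [f"REGISTER {jar.resolve()};" for jar in sorted(jar_dir.glob("*.jar"))]


def run(script: Path, pom: Path, jar_dir: Path = Path("pig-jars")) -> None:
    """Hypothetical wrapper: fetch jars, prepend REGISTERs, then call pig."""
    # Step 1: let Maven copy the declared dependencies into jar_dir.
    subprocess.run(mvn_copy_command(pom, jar_dir), check=True)

    # Step 2: write a copy of the script with REGISTER lines on top, so the
    # version kept in git never has to hard-code jar paths.
    wrapped = script.with_name(script.stem + ".with-jars.pig")
    preamble = "\n".join(register_lines(jar_dir))
    wrapped.write_text(preamble + "\n" + script.read_text())

    # Step 3: run the generated script.
    subprocess.run(["pig", "-f", str(wrapped)], check=True)
```

Calling `run(Path("solr_index.pig"), Path("solr_index.pom.xml"))` (both file names invented) would do the three steps in order. Generating REGISTER lines is only one option; it keeps the jar list in the pom and out of the script itself.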
