I'm looking for some suggestions and ideas for how to handle JAR dependencies in a production environment.
Most of the Pig scripts I write require multiple JAR files. For instance, I have a Pig script that processes some data through a Solr instance, which requires my Solr UDF and some Solr, Lucene and Apache Commons jars.

These Pig scripts are stored in a git repo, and that git repo is deployed to our production cluster. Obviously we don't want to store the jars in git; I'd rather store them in our Maven repo with the rest of the jars the company uses.

The plan is to have a Maven pom.xml for each Pig script that defines which jars that script depends on. A shell script will then call "mvn dependency:copy-dependencies -DoutputDirectory=pig-jars" before calling the actual pig command to run the script.

Given that, I'm trying to figure out the best solution to a few questions.

* For development I'd like to store the Pig jar (pig-0.7.0-core.jar) in Maven, but there is no pom.xml for that jar (easily fixed), and the jar bundles all of its Java prerequisites (javax.servlet, Apache Commons, etc.), which seems to make Maven unhappy when I try to import it into the company Maven repo. Is there a Pig-only jar?

* What do other people use to deploy their code to various systems? Check in jars with the code? Keep jars in a separate, network-based directory?

Geoff
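
For anyone who wants something concrete to comment on, here is a minimal sketch of the wrapper described above. It is only an illustration under assumptions: the function name run_pig_with_deps, the per-script directory layout, and the JAR_DIR parameter are hypothetical, and the MVN/PIG overrides exist only so the commands can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch only. Assumed (hypothetical) layout: each Pig script lives in its own
# directory next to a pom.xml that lists its jar dependencies.

# Commands can be overridden from the environment, e.g. MVN=echo for a dry run.
MVN="${MVN:-mvn}"
PIG="${PIG:-pig}"

run_pig_with_deps() {
    script_dir="$1"    # directory holding pom.xml and the Pig script
    script_name="$2"   # name of the Pig script inside that directory
    (
        cd "$script_dir" || exit 1
        # Copy the jars declared in pom.xml into ./pig-jars
        "$MVN" dependency:copy-dependencies -DoutputDirectory=pig-jars || exit 1
        # Pass the jar directory as a parameter so the script can use
        #   REGISTER $JAR_DIR/<some jar>;
        "$PIG" -param JAR_DIR=pig-jars "$script_name"
    )
}
```

A call would look like: run_pig_with_deps path/to/script-dir myscript.pig (both arguments are placeholders). Whether the script registers jars through a JAR_DIR parameter or with hard-coded relative paths is a matter of taste; the parameter just keeps the directory name in one place.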
