Sanjay, Informatica compiles to Pig now, eh? Interesting... How do you handle jar conflicts if you bundle the whole lot? Doesn't this cost you a lot on job startup time?
Dmitriy

On Thu, Jan 20, 2011 at 5:41 PM, Kaluskar, Sanjay <[email protected]> wrote:

> I have a similar problem and I can tell you what I am doing currently,
> just in case it is useful. I have a tool that generates Pig scripts from
> some other representation (Informatica mappings), and in many cases the
> scripts also call UDFs that depend on about 300 jars and 580 native
> libraries. Additionally, I generate a jar for each Pig script that
> contains the UDFs called from that script, and I add that jar to the
> script with a register statement. But registering the 300 jars that the
> UDFs depend on individually is error-prone and tedious, so I have
> automated that part. I have a top-level jar that lists all 300 jars on
> the Class-Path in its MANIFEST.MF, and I add this top-level jar to the
> classpath. I generate that top-level jar using Maven's assembly plugin.
> I also generate a zip of everything (jars, native libs) using Maven's
> assembly plugin, distribute it with the distributed cache, and add the
> native libs to LD_LIBRARY_PATH.
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:[email protected]]
> Sent: 21 January 2011 05:57
> To: [email protected]
> Subject: Re: Managing pig script jar dependencies
>
> This is becoming a bigger problem for us as well, as use of Pig becomes
> more varied across the company.
> Would love to hear what others have found to work for them.
>
> D
>
> On Wed, Jan 19, 2011 at 2:24 PM, Geoffrey Gallaway
> <[email protected]> wrote:
>
> > I'm looking for some suggestions and ideas for how to handle JAR
> > dependencies in a production environment.
> >
> > Most of the pig scripts I write require multiple JAR files. For
> > instance, I have a pig script that processes some data through a Solr
> > instance, which requires my Solr UDF and some Solr, Lucene and Apache
> > Commons jars. These pig scripts are stored in a git repo and that git
> > repo is deployed to our production cluster. Obviously we don't want to
> > store the jars in git; I'd rather store them in our Maven repo with the
> > rest of the jars the company uses.
> >
> > The plan is to have a Maven pom.xml for each pig script that defines
> > which jars that pig script depends on. A shell script will then call
> > "mvn dependency:copy-dependencies -DoutputDirectory=pig-jars" before
> > calling the actual pig command to run the script. Given that, I'm
> > trying to figure out the best answers to a few questions.
> >
> > * For development I'd like to store the pig jar (pig-0.7.0-core.jar)
> > in Maven, but there is no pom.xml for that jar (easily fixed), and that
> > jar contains all the Java prerequisites (javax.servlet, Apache
> > Commons, etc.), which seem to be making Maven unhappy when I try to
> > import it into the company Maven repo. Is there a pig-only jar?
> >
> > * What do other people use to deploy their code to various systems?
> > Check in jars with the code? Keep jars in a separate, network-based
> > directory?
> >
> > Geoff
> > --
> > Sent from my email client.
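Sanjay builds his top-level jar with Maven's assembly plugin. To show what such a jar amounts to, here is a minimal Python sketch (not his tooling) that writes a manifest-only jar whose Class-Path lists every jar in one directory. It assumes the dependency jars live in a single directory (for example the hypothetical `pig-jars` produced by `mvn dependency:copy-dependencies`) and that their file names contain no spaces, because Class-Path entries are space-separated relative URLs resolved against the top-level jar's own location. The function names are hypothetical.

```python
import os
import zipfile

MAX_LINE_BYTES = 72  # JAR manifest lines may not exceed 72 bytes, excluding the line break


def manifest_header(name, value):
    """Format one manifest header, wrapping it into continuation lines.

    A continuation line starts with a single space, and that space counts
    toward the 72-byte limit. Assumes ASCII text, so bytes == characters.
    """
    text = f"{name}: {value}"
    lines = [text[:MAX_LINE_BYTES]]
    rest = text[MAX_LINE_BYTES:]
    while rest:
        lines.append(" " + rest[:MAX_LINE_BYTES - 1])
        rest = rest[MAX_LINE_BYTES - 1:]
    return "".join(line + "\r\n" for line in lines)


def write_classpath_jar(jar_path, dependency_dir):
    """Write a jar that contains only META-INF/MANIFEST.MF, whose Class-Path
    lists every *.jar in dependency_dir relative to jar_path's directory.

    Returns the list of Class-Path entries that were written.
    """
    base = os.path.dirname(os.path.abspath(jar_path))
    jars = sorted(
        os.path.relpath(os.path.join(dependency_dir, f), base).replace(os.sep, "/")
        for f in os.listdir(dependency_dir)
        if f.endswith(".jar")
    )
    manifest = (
        manifest_header("Manifest-Version", "1.0")
        + manifest_header("Class-Path", " ".join(jars))
        + "\r\n"  # blank line terminating the main section
    )
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest)
    return jars
```

With 300 dependencies the Class-Path value runs to many kilobytes, which is why the header is wrapped into 72-byte continuation lines rather than written as one long line.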

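Geoffrey's plan copies each script's dependencies into `pig-jars` before running pig, and Sanjay notes that registering hundreds of jars by hand is error-prone. One way to connect the two is to generate the register statements instead of writing them. The sketch below is a minimal Python illustration under stated assumptions: the helper names and file names are hypothetical, and jar paths are assumed to contain no spaces or semicolons because they are written unquoted after `REGISTER`.

```python
import os


def register_statements(jar_dir):
    """Return one Pig REGISTER statement per *.jar in jar_dir, sorted by name.

    Assumes jar paths contain no spaces or semicolons (they are written unquoted).
    """
    jars = sorted(f for f in os.listdir(jar_dir) if f.endswith(".jar"))
    return [f"REGISTER {os.path.join(jar_dir, name)};" for name in jars]


def write_wrapped_script(script_path, jar_dir, out_path):
    """Write out_path: the generated REGISTER statements, a blank line,
    then the original Pig script unchanged."""
    with open(script_path) as src:
        body = src.read()
    with open(out_path, "w") as out:
        out.write("\n".join(register_statements(jar_dir)) + "\n\n")
        out.write(body)
```

A wrapper would then run `mvn dependency:copy-dependencies -DoutputDirectory=pig-jars`, call `write_wrapped_script("solr.pig", "pig-jars", "solr-wrapped.pig")`, and pass the generated file to pig. The dependency list stays in each script's pom.xml, and Pig ships the registered jars with the job.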