Hi Dmitriy,

Well, what I have is still experimental & not in any product. But yes, we can compile to a Pig script. I try to use the native relational operators where possible & use UDFs in other cases.

I don't understand which conflicts you are referring to. Initially, I was trying to create a single jar (containing all the 300 dependencies) using the maven-dependency-plugin (BTW that seems to be the recommended approach & should work in many cases) but it turned out that some of our internal components had conflicting file names for some of the resources (should probably be fixed!). My current approach works better because I don't try to re-package any dependency.

Yes, startup times are slow - of course, I am open to other ideas :-)

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[email protected]]
Sent: 21 January 2011 07:57
To: [email protected]
Subject: Re: Managing pig script jar dependencies

Sanjay,

Informatica compiles to Pig now, eh? Interesting...

How do you handle jar conflicts if you bundle the whole lot? Doesn't this cost you a lot on job startup time?

Dmitriy

On Thu, Jan 20, 2011 at 5:41 PM, Kaluskar, Sanjay <[email protected]> wrote:

> I have a similar problem and I can tell you what I am doing currently,
> just in case it is useful. I have a tool that generates PIG scripts
> from some other representation (Informatica mappings), and in many
> cases the scripts also call UDFs that depend on about 300 jars & 580
> native libraries. Additionally, I generate a jar for each PIG script
> that contains the UDFs called from that script. I add the latter jar
> in the script in a register statement. But registering the 300 jars
> that the UDFs depend on individually is error prone & tedious; so I
> have automated that part. I have a top-level jar that includes all the
> 300 jars on its Class-path in the MANIFEST.MF and I add this top-level
> jar to the classpath. I generate that (top-level jar) using maven's
> assembly plugin. I also generate a zip of everything (jars, native
> libs) using maven's assembly plugin and use dist cache to distribute
> it and add the native libs to the LD_LIBRARY_PATH.
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:[email protected]]
> Sent: 21 January 2011 05:57
> To: [email protected]
> Subject: Re: Managing pig script jar dependencies
>
> This is becoming a bigger problem for us as well, as use of Pig
> becomes more varied across the company.
> Would love to hear what others have found to work for them.
>
> D
>
> On Wed, Jan 19, 2011 at 2:24 PM, Geoffrey Gallaway
> <[email protected]> wrote:
>
> > I'm looking for some suggestions and ideas for how to handle JAR
> > dependencies in a production environment.
> >
> > Most of the pig scripts I write require multiple JAR files. For
> > instance, I have a pig script that processes some data through a
> > Solr instance which requires my Solr UDF and some solr, lucene and
> > apache commons jars. These pig scripts are stored in a git repo and
> > that git repo is deployed to our production cluster. Obviously we
> > don't want to store the jars in git; I'd rather store them in our
> > mvn repo with the rest of the jars the company uses.
> >
> > The plan is to have a maven pom.xml for each pig script that defines
> > which jars that pig script depends on. A shell script will then call
> > "mvn dependency:copy-dependencies -DoutputDirectory=pig-jars" before
> > calling the actual pig command to run the script. Given that, I'm
> > trying to figure out the best solution to a few questions.
> >
> > * For development I'd like to store the pig jar (pig-0.7.0-core.jar)
> > in maven but there is no pom.xml for that jar (easily fixed) and
> > that jar contains all the java prerequisites (javax.servlet, apache
> > commons, etc) which seem to be making maven unhappy when I try to
> > import it into the maven company repo. Is there a pig-only jar?
> >
> > * What do other people use to deploy their code to various systems?
> > Check in jars with the code? Keep jars in a separate, network-based
> > directory?
> >
> > Geoff
> >
> > --
> > Sent from my email client.
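
A minimal sketch of the top-level jar idea described in this thread: a jar that contains nothing but a MANIFEST.MF whose Class-Path header lists every dependency jar, so only one entry has to go on the classpath. It is written in Python rather than with the maven assembly plugin used in the thread, and the names `build_pathing_jar`, `fold_header` and `all-deps.jar` are hypothetical, chosen only for illustration. It assumes ASCII jar file names and that the top-level jar lives in the same directory as the dependencies, since Class-Path entries are resolved relative to the jar that declares them.

```python
import zipfile
from pathlib import Path


def fold_header(header: str, width: int = 72) -> str:
    """Fold one manifest header so no physical line exceeds `width` characters.

    The JAR manifest format limits lines to 72 bytes; continuation lines
    start with a single space. ASCII input is assumed, so characters == bytes.
    """
    lines = [header[:width]]
    rest = header[width:]
    while rest:
        # The leading space marks a continuation and counts toward the limit.
        lines.append(" " + rest[: width - 1])
        rest = rest[width - 1:]
    return "\r\n".join(lines)


def build_pathing_jar(jar_dir, jar_name: str = "all-deps.jar") -> Path:
    """Write jar_dir/jar_name containing only a manifest that lists every
    other *.jar in jar_dir on its Class-Path. Returns the path written."""
    jar_dir = Path(jar_dir)
    # Skip the output jar itself so that re-running does not self-reference.
    names = sorted(p.name for p in jar_dir.glob("*.jar") if p.name != jar_name)
    class_path = "Class-Path: " + " ".join(names)
    manifest = "Manifest-Version: 1.0\r\n" + fold_header(class_path) + "\r\n\r\n"
    out = jar_dir / jar_name
    with zipfile.ZipFile(out, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", manifest)
    return out
```

Calling `build_pathing_jar("pig-jars")` after a `mvn dependency:copy-dependencies -DoutputDirectory=pig-jars` run would leave a single `pig-jars/all-deps.jar` to put on the classpath. None of the dependencies are re-packaged, so resources with conflicting file names in different jars are left alone; native libraries still need separate handling.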
