This is interesting - so one need not explicitly tell Pig to REGISTER a jar on the classpath for distributed-cache loading?

On Thu, Nov 8, 2012 at 3:46 AM, Eduardo Afonso Ferreira <[email protected]> wrote:
> Hi there,
>
> Just adding to the option you described, if you put your extra jars in the
> lib directory at the same level of your workflow, you don't need to use
> REGISTER on your Pig script since Oozie will include all your jars to the
> classpath it uses to run Pig.
>
> You can also put the jars in a separate directory on HDFS and refer to that
> with the property "oozie.libpath". For example, your job.properties can have
> something like follows (besides NameNode, JobTracker and other properties you
> may need):
>
> ......
> oozie.wf.application.path=hdfs://localhost:8020/user/${user.name}/your_path/your_app
> oozie.libpath=/user/${user.name}/your_path/common_libs
> ......
>
> You can write your Pig script with no need to REGISTER the jars you need and
> added to the common_libs directory.
>
> If you submit your workflow as a user named "awesome", you should have your
> whole directory structure pushed to HDFS under /user/awesome/ and you're good
> to go.
>
>
> Eduardo.
>
>
> ________________________________
> From: Harsh J <[email protected]>
> To: [email protected]
> Sent: Wednesday, November 7, 2012 2:52 PM
> Subject: Re: Pig action, REGISTER and additional jars
>
> Grant,
>
> Globbing is supported by Pig (for pig.additional.jars) only for
> LocalFileSystem.
>
> The <file>'s pre-# component can be any arbitrary HDFS path though,
> but not the argument to pig.additional.jars (these are picked up from
> resources such as uber-jars or local file systems only).
>
> On Thu, Nov 8, 2012 at 1:12 AM, Grant Ingersoll <[email protected]> wrote:
>>
>> On Nov 7, 2012, at 12:51 PM, Harsh J wrote:
>>
>>> Hi Grant,
>>>
>>> You can leverage the <argument> feature of the Pig action, in tandem
>>> with the distributed-cache-using <file> element to do this I think
>>> (over pig action schema 0.2).
>>>
>>> If you add after your <script>, the following:
>>>
>>> <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>
>>>
>>> And then in the outer section, add:
>>>
>>> <file>lib/jar1.jar#jar1.jar</file>
>>> <file>lib/jar2.jar#jar2.jar</file>
>>>
>>> (Assuming your WF has a lib/ directory with jar1.jar and jar2.jar in it)
>>>
>>> Then Oozie will load these jars onto distributed cache, and symlink
>>> them (during runtime) to the task working directory (sorta like a pwd
>>> for the task). Hence, your Pig will "see" these files locally and
>>> utilize them properly for the "pig.additional.jars" feature.
>>>
>>> Does this work for you?
>>
>> I'll give it a try.
>>
>> Is an HDFS path and glob OK?
>>
>>>
>>> On Wed, Nov 7, 2012 at 10:54 PM, Grant Ingersoll <[email protected]>
>>> wrote:
>>>> Hi,
>>>>
>>>> I was wondering how Oozie deals with additional JARs one needs for Pig
>>>> files. Currently, I have a REGISTER statement in Pig that points at the
>>>> location of the libs, but I'd like to get away from that and use Pig's
>>>> additional.jars mechanism, but I don't see support for that in the Oozie
>>>> spec for the Pig action.
>>>>
>>>> Is this possible? I'm on 3.2-SNAPSHOT.
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> --------------------------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidworks.com
>>>>
>>>
>>> --
>>> Harsh J
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidworks.com
>>
>
> --
> Harsh J

--
Harsh J
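
For reference, the <argument>/<file> suggestion quoted above could be assembled into a workflow roughly as follows. This is only a sketch under assumptions: the workflow and node names, the script name (script.pig), and the ${jobTracker}/${nameNode} parameters are placeholders that do not come from the thread, and placing <file> directly inside <pig> after <argument> is my reading of the 0.2 workflow schema rather than something the thread confirms.

```xml
<!-- Sketch only: names and parameters below are hypothetical placeholders. -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-additional-jars-wf">
    <start to="pig-node"/>

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.pig</script>

            <!-- Tell Pig which jars to pick up; names must match the
                 symlink names (the part after '#') declared below. -->
            <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>

            <!-- Ship the jars from the workflow's lib/ directory through the
                 distributed cache and symlink them into the working directory. -->
            <file>lib/jar1.jar#jar1.jar</file>
            <file>lib/jar2.jar#jar2.jar</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pig action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

If the jars are instead picked up from the workflow's lib/ directory or from oozie.libpath as Eduardo describes, the <argument> and <file> lines would presumably not be needed, since Oozie adds those jars to the classpath itself.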
