This is interesting - so one need not explicitly tell Pig to REGISTER
a jar that is on the classpath for distributed-cache loading?

On Thu, Nov 8, 2012 at 3:46 AM, Eduardo Afonso Ferreira
<[email protected]> wrote:
> Hi there,
>
> Just adding to the option you described: if you put your extra jars in the
> lib directory at the same level as your workflow, you don't need to use
> REGISTER in your Pig script, since Oozie will add all of those jars to the
> classpath it uses to run Pig.
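>
> As a rough sketch of that layout (the directory, jar, script and action
> names are placeholders made up for illustration, as are the ${jobTracker}
> and ${nameNode} parameters), the Pig action itself would then carry no
> jar-related entries:
>
> ```xml
> <!-- Hypothetical application directory on HDFS:
>        your_app/workflow.xml
>        your_app/script.pig
>        your_app/lib/jar1.jar
>        your_app/lib/jar2.jar
> -->
> <action name="pig-node">
>   <pig>
>     <job-tracker>${jobTracker}</job-tracker>
>     <name-node>${nameNode}</name-node>
>     <!-- No file or argument entries for the jars under lib/ -->
>     <script>script.pig</script>
>   </pig>
>   <ok to="end"/>
>   <error to="fail"/>
> </action>
> ```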
>
> You can also put the jars in a separate directory on HDFS and refer to it
> with the property "oozie.libpath". For example, your job.properties can have
> something like the following (besides NameNode, JobTracker and other
> properties you may need):
>
> ......
> oozie.wf.application.path=hdfs://localhost:8020/user/${user.name}/your_path/your_app
> oozie.libpath=/user/${user.name}/your_path/common_libs
> ......
>
> You can then write your Pig script without REGISTERing the jars that you
> added to the common_libs directory.
>
> If you submit your workflow as a user named "awesome", you should have your 
> whole directory structure pushed to HDFS under /user/awesome/ and you're good 
> to go.
>
>
> Eduardo.
>
>
>
> ________________________________
>  From: Harsh J <[email protected]>
> To: [email protected]
> Sent: Wednesday, November 7, 2012 2:52 PM
> Subject: Re: Pig action, REGISTER and additional jars
>
> Grant,
>
> Globbing is supported by Pig (for pig.additional.jars) only for 
> LocalFileSystem.
>
> The pre-# component of a <file> element can be any arbitrary HDFS path,
> but the argument to pig.additional.jars cannot (those jars are picked up
> only from resources such as uber-jars or local file systems).
>
> On Thu, Nov 8, 2012 at 1:12 AM, Grant Ingersoll <[email protected]> wrote:
>>
>> On Nov 7, 2012, at 12:51 PM, Harsh J wrote:
>>
>>> Hi Grant,
>>>
>>> You can leverage the <argument> feature of the Pig action, in tandem
>>> with the distributed-cache-using <file> element to do this I think
>>> (over pig action schema 0.2).
>>>
>>> If you add after your <script>, the following:
>>>
>>> <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>
>>>
>>> And then in the outer section, add:
>>>
>>> <file>lib/jar1.jar#jar1.jar</file>
>>> <file>lib/jar2.jar#jar2.jar</file>
>>>
>>> (Assuming your WF has a lib/ directory with jar1.jar and jar2.jar in it)
>>>
>>> Then Oozie will load these jars onto distributed cache, and symlink
>>> them (during runtime) to the task working directory (sorta like a pwd
>>> for the task). Hence, your Pig will "see" these files locally and
>>> utilize them properly for the "pig.additional.jars" feature.
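>>>
>>> Put together, the two pieces above might look like the sketch below.
>>> This is untested; the action name, script name, transition targets and
>>> the ${jobTracker}/${nameNode} parameters are placeholders assumed for
>>> illustration, not taken from your workflow:
>>>
>>> ```xml
>>> <action name="pig-node">
>>>   <pig>
>>>     <job-tracker>${jobTracker}</job-tracker>
>>>     <name-node>${nameNode}</name-node>
>>>     <script>script.pig</script>
>>>     <!-- Jar names here match the symlink names after '#' below -->
>>>     <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>
>>>     <!-- Ship each jar via the distributed cache under that symlink -->
>>>     <file>lib/jar1.jar#jar1.jar</file>
>>>     <file>lib/jar2.jar#jar2.jar</file>
>>>   </pig>
>>>   <ok to="end"/>
>>>   <error to="fail"/>
>>> </action>
>>> ```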
>>>
>>> Does this work for you?
>>
>> I'll give it a try.
>>
>> Is an HDFS path and glob OK?
>>
>>
>>>
>>> On Wed, Nov 7, 2012 at 10:54 PM, Grant Ingersoll <[email protected]> 
>>> wrote:
>>>> Hi,
>>>>
>>>> I was wondering how Oozie deals with additional JARs one needs for Pig
>>>> files.  Currently, I have a REGISTER statement in Pig that points at the
>>>> location of the libs. I'd like to get away from that and use Pig's
>>>> additional.jars mechanism instead, but I don't see support for it in the
>>>> Oozie spec for the Pig action.
>>>>
>>>> Is this possible?  I'm on 3.2-SNAPSHOT.
>>>>
>>>> Thanks,
>>>> Grant
>>>>
>>>> --------------------------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidworks.com
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidworks.com
>>
>>
>>
>>
>
>
>
> --
> Harsh J



-- 
Harsh J
