I agree with Jon.  We do it with the distributed cache.  The GeoIP files that 
we use are updated monthly, so it makes more sense to put them in the cache 
than to recompile Pig every month.



Will Duckworth  Senior Vice President, Software Engineering  | comScore, 
Inc.(NASDAQ:SCOR)
o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:[email protected]
-----Original Message-----
From: Jonathan Coveney [mailto:[email protected]]
Sent: Tuesday, August 28, 2012 1:53 PM
To: [email protected]
Subject: Re: Add file command in Pig

Using the distributed cache is preferable, IMHO. The UDF that uses the file 
can add it to the distributed cache itself (this should be supported in Pig 
0.9 and 0.10; I can check if you like).
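
For illustration, here is a minimal sketch of one way to put the file in the
distributed cache from the script itself, by setting the Hadoop job properties
with Pig's SET command. This is a different route from having the UDF register
the file (which is done by overriding EvalFunc.getCacheFiles()). The HDFS path
is hypothetical, and the sketch assumes GeoIP.dat has already been uploaded
there and that the cluster uses the classic mapred.* property names.

```
-- Hypothetical HDFS location; upload GeoIP.dat there first.
-- The name after '#' is the symlink created in each task's working directory.
SET mapred.cache.files 'hdfs:///data/geoip/GeoIP.dat#GeoIP.dat';
SET mapred.create.symlink 'yes';
```

With the symlink in place, a UDF running in a task should be able to open the
file by its relative name, GeoIP.dat, and the monthly update only requires
replacing the copy in HDFS.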

If you want to include it with pig, then you have to include it in the Pig jar, 
and then you can call it from the Pig script. It's a little tricky but doable. 
A bit of a hack.

2012/8/28 Haitao Yao <[email protected]>

> hi, all
>         I want to add GeoIP.dat to my pig scripts. Does Pig have the
> "add file XXX" command like hive? I want to distribute the data file
> GeoIP.dat with Pig.
>         Or is there any other workaround?
>         I don't want to install GeoIP on every hadoop node, so I want
> to distribute the data file with pig itself.
>
>         thanks.
>
>
>
> Haitao Yao
> [email protected]
> weibo: @haitao_yao
> Skype:  haitao.yao.final
>
>
