Matt,

So don't ship the GeoIP database with the jar?  Does your mapper then cache the 
locations.csv?  Would you mind sending me your UDF?  That sounds like an 
interesting solution, but I don't really understand how you would do it.  I 
was under the impression that the fastest way would be to ship and cache 
the binary database rather than reading it from HDFS.  

-Ross

----- Original Message -----
From: "Matt Davies" <m...@mattdavies.net>
To: user@pig.apache.org, "Ross" <rjnord...@semesteratsea.net>
Sent: Monday, July 11, 2011 12:34:38 PM GMT -08:00 US/Canada Pacific
Subject: Re: GeoIP database lookups

We wrote a snazzy UDF that does one initialization per mapper and handles all
the necessary conversions. It's quite efficient and fast.

The trick to maintainability is to have your UDF initialize the
locations.csv from HDFS rather than including the csv file in your jar.
That way you can easily update the locations without recompiling.
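
The skeleton is roughly the following. It's not our actual UDF, just a sketch
of the lazy, once-per-mapper init against HDFS: the csv parsing is
deliberately oversimplified (the real GeoIP CSVs use numeric IP ranges, so
you'd swap in a proper range lookup there), and the HDFS path is made up.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class IpToLocation extends EvalFunc<String> {
    // Built lazily, once per mapper JVM; every call after the first is
    // just an in-memory lookup.
    private static Map<String, String> locations;

    private static void init() throws IOException {
        locations = new HashMap<String, String>();
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical HDFS path -- point it at wherever you keep the csv.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                fs.open(new Path("/data/geoip/locations.csv"))));
        String line;
        while ((line = in.readLine()) != null) {
            // Assumed simplified layout: "<key>,<location>".  The real
            // GeoIP CSVs need a range lookup instead of an exact-match map.
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                locations.put(parts[0], parts[1]);
            }
        }
        in.close();
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        if (locations == null) {
            init();  // the first call in this mapper pays the HDFS read
        }
        return locations.get((String) input.get(0));
    }
}

Register the jar and call it from Pig like any other EvalFunc; since the csv
lives on HDFS, updating the locations is just replacing that file.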

-Matt

On Mon, Jul 11, 2011 at 12:57 PM, Ross Nordeen <rjnor...@mtu.edu> wrote:

> Hello all,
>
> Is there an accepted way to use the GeoIP database with Pig?
>
> I've found that some people have tried to write UDFs with the MaxMind Java API.
> http://www.maxmind.com/java
>
> Others say to use Pig's streaming interface and run the queries
> through a Perl script.
>
> http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/#comments
>
> I'm just trying to find the most efficient way to run this.  Any ideas?
>
> -Ross
>
