Matt,

So don't ship the GeoIP database with the jar? Does your mapper then cache the locations.csv? Would you mind sending me your UDF? That sounds like an interesting solution, but I don't really understand how you would do it. I was under the impression that the fastest way would be to ship and cache the binary database rather than reading it from HDFS.
-Ross

----- Original Message -----
From: "Matt Davies" <m...@mattdavies.net>
To: user@pig.apache.org, "Ross" <rjnord...@semesteratsea.net>
Sent: Monday, July 11, 2011 12:34:38 PM GMT -08:00 US/Canada Pacific
Subject: Re: GeoIP database lookups

We wrote a snazzy UDF that does one initialization per mapper and does all the necessary conversions. Quite efficient and fast.

The trick to maintainability is to have your UDF initialize the locations.csv from HDFS and not to include the csv file within your jar. That way you can easily update the locations without recompiling.

-Matt

On Mon, Jul 11, 2011 at 12:57 PM, Ross Nordeen <rjnor...@mtu.edu> wrote:
> Hello all,
>
> Is there an accepted way to use the GeoIP database with Pig?
>
> I've found that some people have tried to write UDFs with their Java API:
> http://www.maxmind.com/java
>
> Others say to use the streaming interface within Pig and run the queries
> through a Perl script:
> http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/#comments
>
> I'm just trying to find the most efficient way to run this. Any ideas?
>
> -Ross
>
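A minimal sketch of the kind of UDF Matt describes, for anyone following the thread. It only shows the once-per-mapper load of locations.csv from HDFS plus a simple key lookup; the class name, the HDFS path, and the "first field is the key, second is the value" parsing are all assumptions, and Matt's actual UDF presumably also handles the IP-range matching ("all the necessary conversions"), which is not shown here.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class GeoLocationLookup extends EvalFunc<String> {

    // Hypothetical HDFS path; point this at wherever the locations.csv lives.
    private static final String LOCATIONS_PATH = "/data/geoip/locations.csv";

    // Built lazily so the HDFS read happens once per mapper, on the first exec() call.
    private Map<String, String> locations;

    private void loadLocations() throws IOException {
        locations = new HashMap<String, String>();
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(LOCATIONS_PATH))));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Simplified parsing: first field is the lookup key, second is the
                // value to return. A real UDF would parse the full GeoIP record.
                String[] fields = line.split(",");
                if (fields.length >= 2) {
                    locations.put(fields[0], fields[1]);
                }
            }
        } finally {
            reader.close();
        }
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        if (locations == null) {
            loadLocations();  // one-time initialization per mapper
        }
        return locations.get(input.get(0).toString());
    }
}

From a Pig script it would be called like any other EvalFunc (jar name, relation, and field names here are made up):

REGISTER geo-udfs.jar;
DEFINE GeoLookup GeoLocationLookup();
located = FOREACH logs GENERATE GeoLookup(loc_id) AS location;

Because the csv stays on HDFS instead of inside the jar, updating the locations only means replacing the file, not recompiling the UDF, which is the maintainability point Matt makes above.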