Thanks for the tip, it looks like this will work.

On Thursday, July 24, 2014, Jeremy Karn <jk...@mortardata.com> wrote:
> Hi Russell,
>
> This might be a bit late, but here's an example of how you can load a file in Python and pass the results back to Pig: https://github.com/mortarcode/python-files
>
> It's a Mortar project, but the Pig script (https://github.com/mortarcode/python-files/blob/master/pigscripts/python-files.pig) and the Python UDF file (https://github.com/mortarcode/python-files/blob/master/udfs/python/python-files.py) should work fine without Mortar, as long as you explicitly set the AWS key parameters in the Pig script and have boto installed (a sketch of this approach follows the thread below).
>
> This example uses a small file. If you want to read a larger file, you'll need to handle the boto/S3 issues with downloading large files, or have Python read directly from HDFS. I've found that S3 actually works pretty well for small files like this, though. Reading larger files in Python doesn't work very well, because you have to worry about running out of memory when passing everything back from Python to Java.
>
> Jeremy Karn / Lead Developer
> MORTAR DATA / 519 277 4391 / www.mortardata.com
>
>
> On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
> > I need to load a file and loop through it during the execution of a Python UDF. Is this possible? How?
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > datasyndrome.com

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
datasyndrome.com
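For reference, here is a minimal sketch of the boto approach described above. It is not the actual Mortar example: it assumes the UDF is registered with streaming_python (so it runs under CPython, where boto is importable), that the Pig script exposes the AWS credentials as environment variables, and that the bucket and key names below are placeholders.

    # python_files_sketch.py -- hypothetical UDF module, not the Mortar example.
    # Assumes registration along the lines of:
    #   REGISTER 'python_files_sketch.py' USING streaming_python AS s3_udfs;
    # with AWS credentials exported as environment variables by the Pig script.
    import os
    from boto.s3.connection import S3Connection
    from pig_util import outputSchema

    _CACHED_LINES = None  # download once per UDF process, not once per tuple


    def _lookup_lines():
        global _CACHED_LINES
        if _CACHED_LINES is None:
            conn = S3Connection(os.environ['AWS_ACCESS_KEY_ID'],
                                os.environ['AWS_SECRET_ACCESS_KEY'])
            bucket = conn.get_bucket('my-bucket')           # placeholder bucket
            key = bucket.get_key('lookups/small_file.txt')  # placeholder key
            _CACHED_LINES = key.get_contents_as_string().splitlines()
        return _CACHED_LINES


    @outputSchema('num_matches:int')
    def count_matches(value):
        # Loop through the downloaded file during UDF execution.
        return sum(1 for line in _lookup_lines() if value in line)

Caching the download in a module-level variable means the file is fetched once per UDF process rather than once per input tuple.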
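For the larger-file case, one way to have Python read directly from HDFS, as suggested above, is to stream the file through the hadoop CLI so it never has to fit in the UDF's memory all at once. The path here is a placeholder:

    import subprocess


    def hdfs_lines(path):
        # Stream a file out of HDFS line by line via `hadoop fs -cat`.
        proc = subprocess.Popen(['hadoop', 'fs', '-cat', path],
                                stdout=subprocess.PIPE)
        for line in proc.stdout:
            yield line.rstrip('\n')
        proc.wait()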