Thank you, John.
I have looked at the SHIP clause, but it's a bit confusing to me; all the
examples I found were related to streaming-through.
I figured out a solution by putting the file on HDFS and specifying
the following options when running Pig:
pig -Dmapred.cache.files=hdfs://host:port/path/to/file#link_name
-Dmapred.create.symlink=yes some.pig
Then I can just use `f = open('link_name')` in my Jython UDF. Since the file is
small and loaded only once, it works well so far.
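
For reference, the UDF ends up looking roughly like this (the function name and
the return schema are just placeholders; as in my original script, @outputSchema
comes from Pig's Jython UDF support):

@outputSchema('text:chararray')
def read_cached_file():
    # 'link_name' is the symlink that mapred.create.symlink creates in the
    # task's working directory, pointing at the file from mapred.cache.files
    f = open('link_name')
    try:
        return f.read()
    finally:
        f.close()

In the Pig script the UDF file is registered with something like
REGISTER 'myudf.py' USING jython AS myfuncs; (the path and alias here are
illustrative).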
On Dec 9, 2012, at 2:18 PM, John Gordon <[email protected]> wrote:
> If you ship the file explicitly, you can use this syntax from there. It will
> pack it with the job jar and make sure it is in the working directory
> wherever the job runs. Be careful of shipping very large files; it is
> probably better to refactor your logic into multiple top-level Pig statements
> on data loaded from HDFS if you find yourself shipping fixed, very large
> files.
> ________________________________
> From: Young Ng
> Sent: 12/9/2012 12:53 PM
> To: [email protected]
> Subject: How can I load external files within jython UDF?
>
> Hi,
>
> I am trying to load some external resources within my Jython UDF functions,
> e.g.:
>
> @outputSchema(....)
> def test():
>     f = open('test.txt')
>     text = f.read()
>     f.close()
>     return text
>
> I have placed 'test.txt' in both the working folder and on HDFS, and I got the
> following error:
> IOError: (2, 'No such file or directory', 'test.txt')
>
> I have also tried to print out Jython's working directory with os.getcwd();
> below is what I got:
>
> /home/hduser/tmp/mapred/local/taskTracker/hduser/jobcache/job_201212080111_0007/attempt_201212080111_0007_m_000000_0/work
> ....
>
> I suspect that I can use an absolute path within the UDF, but how can I
> transfer the external resources to the other Hadoop datanodes?
>
>
> Thanks,
> Young Wu
>