UDFs should have an API for transparently opening and reading files from HDFS or
from the local file system using only a relative path
----------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-756
                 URL: https://issues.apache.org/jira/browse/PIG-756
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz


I have a utility function, util.INSETFROMFILE(), to which I pass a file name
during initialization.

{code}
define inQuerySet util.INSETFROMFILE('analysis/queries');
A = load 'logs' using PigStorage() as ( date int, query chararray );
B = filter A by inQuerySet(query);
{code}

This provides a computationally inexpensive way to effect map-side joins against
small sets. Functions of this style can also encapsulate more complex matching
rules.
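
As a rough illustration of why this is cheap, the core of a filter like util.INSETFROMFILE can be sketched in plain Java: the small file is read once into a hash set, and each record then costs only a constant-time membership test. This is a hypothetical, Pig-independent sketch; the class name InSetFromFile and the one-entry-per-line file format are assumptions. An actual UDF would extend Pig's FilterFunc and would still need to obtain its Reader from whichever file system it is running against, which is the gap this issue describes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical core of a set-membership filter such as util.INSETFROMFILE.
// Not Pig's API: a real UDF would wrap this logic in a FilterFunc.
public class InSetFromFile {
    private final Set<String> members = new HashSet<>();

    // Loads the small file once; assumes one member per line and skips blank lines.
    public InSetFromFile(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    members.add(trimmed);
                }
            }
        }
    }

    // The per-record predicate: an expected O(1) hash lookup, no reduce phase needed.
    public boolean contains(String query) {
        return query != null && members.contains(query);
    }

    public static void main(String[] args) throws IOException {
        InSetFromFile inQuerySet = new InSetFromFile(new StringReader("pig\nhadoop\n\nhdfs\n"));
        System.out.println(inQuerySet.contains("pig"));  // prints "true"
        System.out.println(inQuerySet.contains("hive")); // prints "false"
    }
}
```

Because the set lives in memory on every mapper, this only makes sense when the file is small, which matches the use case above.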

For rapid development and debugging, I want this code to run without
modification both on my local file system (when I run pig -exectype local) and
on HDFS.

Pig needs to provide an API for UDFs that allows them to either:

1) "know" whether they are running in local or HDFS mode, and open and read
files accordingly, or
2) simply provide a file name and read statements, and have Pig transparently
manage the local or HDFS opens and reads on the UDF's behalf.
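
Option 2 can be sketched as a small abstraction in which the UDF names only a relative path and the framework chooses the file system. The interface UdfFileOpener, the classes LocalFileOpener and FileOpeners, and the forExecType factory are all hypothetical names for illustration, not an existing Pig API; only the local branch is implemented so the sketch runs with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction (not an existing Pig API): the UDF names a relative
// path and the framework decides which file system it is resolved against.
interface UdfFileOpener {
    InputStream open(String relativePath) throws IOException;
}

// Local-mode implementation: resolves relative paths against a base directory,
// e.g. the directory pig -exectype local was started from.
final class LocalFileOpener implements UdfFileOpener {
    private final Path baseDir;

    LocalFileOpener(Path baseDir) {
        this.baseDir = baseDir;
    }

    @Override
    public InputStream open(String relativePath) throws IOException {
        return Files.newInputStream(baseDir.resolve(relativePath));
    }
}

public class FileOpeners {
    // Hypothetical factory keyed on the exectype name; only "local" is implemented,
    // so the sketch compiles without Hadoop on the classpath.
    static UdfFileOpener forExecType(String execType, Path localBase) {
        if ("local".equalsIgnoreCase(execType)) {
            return new LocalFileOpener(localBase);
        }
        // An HDFS-backed opener would be returned here.
        throw new UnsupportedOperationException("no opener for exectype " + execType);
    }

    public static void main(String[] args) throws IOException {
        // Recreate the example's relative path analysis/queries under a temp directory.
        Path base = Files.createTempDirectory("udf-demo");
        Files.createDirectories(base.resolve("analysis"));
        Files.write(base.resolve("analysis/queries"), "pig\nhadoop\n".getBytes(StandardCharsets.UTF_8));

        // The UDF code only ever sees the relative path.
        UdfFileOpener opener = forExecType("local", base);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(opener.open("analysis/queries"), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine()); // prints "pig"
        }
    }
}
```

An HDFS-backed opener could plausibly wrap Hadoop's FileSystem.get(conf).open(new org.apache.hadoop.fs.Path(relativePath)), since Hadoop qualifies relative paths against the file system's working directory (on HDFS, by default the user's home directory). If the job configuration's default file system is file:///, as one would expect in local mode, the same call should return the local file system, which suggests option 2 could be realized with a single code path.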

UDFs need to read configuration information from the file system, and the
process is much simpler if one can just flip the -exectype local switch.
