UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path
----------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-756
                 URL: https://issues.apache.org/jira/browse/PIG-756
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz

I have a utility function util.INSETFROMFILE() to which I pass a file name during initialization.

{code}
define inQuerySet util.INSETFROMFILE('analysis/queries');
A = load 'logs' using PigStorage() as ( date int, query chararray );
B = filter A by inQuerySet(query);
{code}

This provides a computationally inexpensive way to effect map-side joins for small sets. In addition, functions of this style provide the ability to encapsulate more complex matching rules.

For rapid development and debugging purposes, I want this code to run without modification both on my local file system, when I run pig -exectype local, and on HDFS.

Pig needs to provide an API for UDFs that allows them to either:

1) "know" whether they are running in local or HDFS mode, so they can open and read files as appropriate
2) just provide a file name and issue read statements, and have Pig transparently manage local or HDFS opens and reads for the UDF

UDFs need to read configuration information off the file system, and it simplifies the process if one can just flip the switch of -exectype local.
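
To make option 2 concrete, here is a minimal sketch of what such an API might look like from the UDF's side. It is not an existing Pig interface: the names RelativeFileOpener, LocalFileOpener and InSetFromFile are hypothetical, and the file is assumed to hold one set member per line. Only the local case is implemented. An HDFS implementation would presumably wrap Hadoop's FileSystem.open(Path) behind the same interface, so the UDF code would not change between modes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical abstraction: opens a relative path on whichever file system the job runs against. */
interface RelativeFileOpener {
    InputStream open(String relativePath) throws IOException;
}

/** Local-mode implementation (assumption): resolves the relative path against a base directory. */
final class LocalFileOpener implements RelativeFileOpener {
    private final Path baseDir;

    LocalFileOpener(Path baseDir) {
        this.baseDir = baseDir;
    }

    @Override
    public InputStream open(String relativePath) throws IOException {
        return Files.newInputStream(baseDir.resolve(relativePath));
    }
}

/** Hypothetical stand-in for the membership logic of util.INSETFROMFILE. */
final class InSetFromFile {
    private final Set<String> members = new HashSet<>();

    /** Reads the whole file once at initialization; assumes one set member per line. */
    InSetFromFile(RelativeFileOpener opener, String relativePath) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(opener.open(relativePath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                members.add(line);
            }
        }
    }

    /** Returns true if the value appeared as a line in the file. */
    boolean contains(String value) {
        return members.contains(value);
    }
}
```

With this shape, the UDF only ever sees the relative path 'analysis/queries'. Which opener it receives would be decided by Pig based on -exectype, which is the transparency the issue asks for.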