[ https://issues.apache.org/jira/browse/PIG-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-48. ------------------------------- Resolution: Fixed This can already be done with custom splits > LoadFunc API is too limiting > ---------------------------- > > Key: PIG-48 > URL: https://issues.apache.org/jira/browse/PIG-48 > Project: Pig > Issue Type: Improvement > Reporter: Sam Pullara > Priority: Minor > > Currently the LoadFunc API assumes that you are pulling data from a Hadoop > filesystem and that PIG will have already found the file and split it. I > would like a lower-level API that hands me the information so I can find the > data and do the split. For instance, this is a very inconvenient way to load > data from an RSS URL: > register /Users/samp/Projects/pigrss/out/getfeed-all.jar > define getFeed com.sampullara.pig.storage.GetFeed(); > URL = LOAD 'url' using PigStorage() as (url); > A = FOREACH URL GENERATE FLATTEN(getFeed(url)); > Where GetFeed is an EvalFunc because there was no way to do this as a > LoadFunc. While we are at we could add the ability to create a literal Tuple > in the PIG language :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.