[ https://issues.apache.org/jira/browse/PIG-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated PIG-758:
-----------------------------------

    Description:
As part of the multiquery optimization work there is a need to use absolute paths for load and store operations (because the current directory changes during the execution of the script). In order to do so, we are suggesting a change to the semantics of the location/filename string used in LoadFunc and Slicer/Slice.

The proposed change is:

   * Load locations without a scheme part are expected to be hdfs (mapreduce mode) or local (local mode) paths
   * Any hdfs or local path will be translated to a fully qualified absolute path before it is handed to either a LoadFunc or Slicer
   * Any scheme other than "file" or "hdfs" will result in the load path being passed through to the LoadFunc or Slicer without any modification.

Example:

If you have a LoadFunc that reads from a database, in the current system the following could be used:

{noformat}
a = load 'table' using DBLoader();
{noformat}

With the proposed changes, however, 'table' would be translated into an hdfs path ("hdfs://..../table"), which is probably not what the DBLoader would want to see. In order to make it work one could use:

{noformat}
a = load 'sql://table' using DBLoader();
{noformat}

Now the DBLoader would see the unchanged string "sql://table".

This is an incompatible change, but hopefully one that does not affect many existing Loaders/Slicers. Since this is needed with the multiquery feature, the old behavior can be restored by using the "no_multiquery" pig flag.

  was:
As part of the multiquery optimization work there is a need to use absolute paths for load and store operations (because the current directory changes during the execution of the script). In order to do so, we are suggesting a change to the semantics of the location/filename string used in LoadFunc and Slicer/Slice.

The proposed change is:

   * Load locations without a scheme part are expected to be hdfs (mapreduce mode) or local (local mode) paths
   * Any hdfs or local path will be translated to a fully qualified absolute path before it is handed to either a LoadFunc or Slicer
   * Any scheme other than "file" or "hdfs" will result in the load path being passed through to the LoadFunc or Slicer without any modification.

Example:

If you have a LoadFunc that reads from a database, in the current system the following could be used:

{code}
a = load 'table' using DBLoader();
{code}

With the proposed changes, however, 'table' would be translated into an hdfs path ("hdfs://..../table"), which is probably not what the DBLoader would want to see. In order to make it work one could use:

{code}
a = load 'sql://table' using DBLoader();
{code}

Now the DBLoader would see the unchanged string "sql://table".

This is an incompatible change, but hopefully one that does not affect many existing Loaders/Slicers. Since this is needed with the multiquery feature, the old behavior can be restored by using the "no_multiquery" pig flag.


> Converting load/store locations into fully qualified absolute paths
> -------------------------------------------------------------------
>
>                 Key: PIG-758
>                 URL: https://issues.apache.org/jira/browse/PIG-758
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>
> As part of the multiquery optimization work there is a need to use absolute
> paths for load and store operations (because the current directory changes
> during the execution of the script). In order to do so, we are suggesting a
> change to the semantics of the location/filename string used in LoadFunc and
> Slicer/Slice.
>
> The proposed change is:
>    * Load locations without a scheme part are expected to be hdfs (mapreduce
> mode) or local (local mode) paths
>    * Any hdfs or local path will be translated to a fully qualified absolute
> path before it is handed to either a LoadFunc or Slicer
>    * Any scheme other than "file" or "hdfs" will result in the load path
> being passed through to the LoadFunc or Slicer without any modification.
>
> Example:
> If you have a LoadFunc that reads from a database, in the current system the
> following could be used:
> {noformat}
> a = load 'table' using DBLoader();
> {noformat}
> With the proposed changes, however, 'table' would be translated into an hdfs
> path ("hdfs://..../table"), which is probably not what the DBLoader would
> want to see. In order to make it work one could use:
> {noformat}
> a = load 'sql://table' using DBLoader();
> {noformat}
> Now the DBLoader would see the unchanged string "sql://table".
>
> This is an incompatible change, but hopefully one that does not affect many
> existing Loaders/Slicers. Since this is needed with the multiquery feature,
> the old behavior can be restored by using the "no_multiquery" pig flag.
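
The three translation rules in the description can be sketched roughly as follows. This is only an illustration in Python, not Pig's actual implementation: the function name qualify_location, its parameters, the working directory "/user/alice" and the namenode authority "namenode:8020" are all made up for the example.

```python
import posixpath
import re

# Matches a leading URI scheme such as "hdfs:" or "sql:".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(.*)$", re.DOTALL)


def qualify_location(location, mode, cwd="/user/alice",
                     hdfs_authority="namenode:8020"):
    """Hypothetical helper: apply the proposed rules to one load location.

    mode is "mapreduce" or "local"; cwd and hdfs_authority are assumed
    values that a real system would read from its configuration.
    """
    match = _SCHEME.match(location)
    if match:
        scheme, rest = match.group(1).lower(), match.group(2)
    else:
        scheme, rest = None, location

    # Any scheme other than "file" or "hdfs" is passed through unmodified.
    if scheme is not None and scheme not in ("file", "hdfs"):
        return location

    # No scheme: hdfs in mapreduce mode, local file system in local mode.
    if scheme is None:
        scheme = "hdfs" if mode == "mapreduce" else "file"

    # Split an explicit "//authority" prefix off the path, if there is one.
    authority = ""
    path = rest
    if rest.startswith("//"):
        authority, _, tail = rest[2:].partition("/")
        path = "/" + tail
    if not authority and scheme == "hdfs":
        authority = hdfs_authority

    # Make the path absolute relative to cwd, then normalize "." and "..".
    if not path.startswith("/"):
        path = posixpath.join(cwd, path)
    path = posixpath.normpath(path)

    return f"{scheme}://{authority}{path}"
```

Under these assumptions, qualify_location("table", "mapreduce") yields "hdfs://namenode:8020/user/alice/table", while qualify_location("sql://table", "mapreduce") returns "sql://table" untouched, which is the behavior the DBLoader example relies on.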