[ 
https://issues.apache.org/jira/browse/PIG-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-758:
-----------------------------------

    Description: 
As part of the multiquery optimization work there is a need to use absolute 
paths for load and store operations (because the current directory changes 
during the execution of the script). In order to do so, we are suggesting a 
change to the semantics of the location/filename string used in LoadFunc and 
Slicer/Slice.

The proposed change is:

   * Load locations without a scheme part are expected to be hdfs (mapreduce 
mode) or local (local mode) paths
   * Any hdfs or local path will be translated to a fully qualified absolute 
path before it is handed to either a LoadFunc or Slicer
   * Any scheme other than "file" or "hdfs" will result in the load path being 
passed through to the LoadFunc or Slicer without any modification.
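
The three rules above can be sketched roughly as follows. This is only an 
illustration in Python, not Pig's actual implementation: the function name 
qualify_location, its parameters, and the placeholder authority 
"namenode:8020" are assumptions made for the example.

```python
import posixpath
import re

# Matches a leading "scheme://" part, e.g. "hdfs://" or "sql://".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*)://(.*)$")


def qualify_location(location, mode, cwd, default_authority="namenode:8020"):
    """Return the location string a LoadFunc/Slicer would see under the proposal.

    mode is "mapreduce" or "local"; cwd is the current working directory.
    default_authority is a hypothetical placeholder for the hdfs namenode.
    """
    match = _SCHEME.match(location)
    if match:
        scheme, rest = match.group(1).lower(), match.group(2)
        if scheme not in ("file", "hdfs"):
            # Rule 3: unknown schemes are passed through unmodified.
            return location
        authority, _, path = rest.partition("/")
        path = "/" + path
    else:
        # Rule 1: no scheme means hdfs (mapreduce mode) or local (local mode).
        scheme = "hdfs" if mode == "mapreduce" else "file"
        authority = default_authority if scheme == "hdfs" else ""
        path = location
    # Rule 2: make the path absolute and fully qualified.
    if not path.startswith("/"):
        path = posixpath.join(cwd, path)
    return f"{scheme}://{authority}{posixpath.normpath(path)}"
```

Under these assumptions, qualify_location("table", "mapreduce", "/user/alice") 
would yield an hdfs path ending in /user/alice/table, while 
qualify_location("sql://table", "mapreduce", "/user/alice") would return the 
string unchanged.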

Example:

If you have a LoadFunc that reads from a database, in the current system the 
following could be used:

{noformat}
a = load 'table' using DBLoader();
{noformat}

With the proposed changes, however, 'table' would be translated into an hdfs 
path ("hdfs://..../table"), which is probably not what the DBLoader would want 
to see. To make it work, one could use:

{noformat}
a = load 'sql://table' using DBLoader();
{noformat}

Now the DBLoader would see the unchanged string "sql://table".

This is an incompatible change, but hopefully it will not affect many existing 
Loaders/Slicers. Since this is needed with the multiquery feature, the previous 
behavior can be restored by using the "no_multiquery" pig flag.

  was:
As part of the multiquery optimization work there is a need to use absolute 
paths for load and store operations (because the current directory changes 
during the execution of the script). In order to do so, we are suggesting a 
change to the semantics of the location/filename string used in LoadFunc and 
Slicer/Slice.

The proposed change is:

   * Load locations without a scheme part are expected to be hdfs (mapreduce 
mode) or local (local mode) paths
   * Any hdfs or local path will be translated to a fully qualified absolute 
path before it is handed to either a LoadFunc or Slicer
   * Any scheme other than "file" or "hdfs" will result in the load path being 
passed through to the LoadFunc or Slicer without any modification.

Example:

If you have a LoadFunc that reads from a database, in the current system the 
following could be used:

{code}
a = load 'table' using DBLoader();
{code}

With the proposed changes, however, 'table' would be translated into an hdfs 
path ("hdfs://..../table"), which is probably not what the DBLoader would want 
to see. To make it work, one could use:

{code}
a = load 'sql://table' using DBLoader();
{code}

Now the DBLoader would see the unchanged string "sql://table".

This is an incompatible change, but hopefully it will not affect many existing 
Loaders/Slicers. Since this is needed with the multiquery feature, the previous 
behavior can be restored by using the "no_multiquery" pig flag.


> Converting load/store locations into fully qualified absolute paths
> -------------------------------------------------------------------
>
>                 Key: PIG-758
>                 URL: https://issues.apache.org/jira/browse/PIG-758
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>
> As part of the multiquery optimization work there is a need to use absolute 
> paths for load and store operations (because the current directory changes 
> during the execution of the script). In order to do so, we are suggesting a 
> change to the semantics of the location/filename string used in LoadFunc and 
> Slicer/Slice.
> The proposed change is:
>    * Load locations without a scheme part are expected to be hdfs (mapreduce 
> mode) or local (local mode) paths
>    * Any hdfs or local path will be translated to a fully qualified absolute 
> path before it is handed to either a LoadFunc or Slicer
>    * Any scheme other than "file" or "hdfs" will result in the load path 
> being passed through to the LoadFunc or Slicer without any modification.
> Example:
> If you have a LoadFunc that reads from a database, in the current system the 
> following could be used:
> {noformat}
> a = load 'table' using DBLoader();
> {noformat}
> With the proposed changes, however, 'table' would be translated into an hdfs 
> path ("hdfs://..../table"), which is probably not what the DBLoader would 
> want to see. To make it work, one could use:
> {noformat}
> a = load 'sql://table' using DBLoader();
> {noformat}
> Now the DBLoader would see the unchanged string "sql://table".
> This is an incompatible change, but hopefully it will not affect many 
> existing Loaders/Slicers. Since this is needed with the multiquery feature, 
> the previous behavior can be restored by using the "no_multiquery" pig flag.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
