Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

------------------------------------------------------------------------------
  = Parameter Substitution in Pig =
  
- ==  Motivation ==
+ == Motivation ==
  
  This document describes a proposal for implementing parameter substitution in 
Pig. This proposal is motivated by multiple requests from users who would like 
to create a template Pig script and then use it with different parameters on a 
regular basis. For instance, if you have daily processing that is identical 
every day except for the date it needs to process, it would be very convenient 
to put a placeholder for the date and provide the actual value at run time.
  
@@ -29, +29 @@

  
  For this example, Pig would expect `date` to be passed on the Pig command 
line or in a parameter file. The value would be substituted prior to running 
the load statement.
  
- In addition to supplying parameter value, a user can supply a command to 
execute to generate a parameter value. This can be done using `declare` 
statement.
+ In addition to supplying a parameter value, a user can supply a command to 
execute to generate the value. This can be done using the `declare` 
statement.
  
  {{{
- declare CMD `generate_date`
+ #declare CMD `generate_date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0>'5';
  .....
@@ -40, +40 @@

  
  For this example, Pig would execute the `generate_date` command when it 
encounters the `declare` statement and assign the result (stdout) to parameter 
`CMD`. The value of `CMD` is substituted prior to running the load statement.
  
- A command can take parameters which need to be substituted as well.
+ The `declare` statement starts with `#` to indicate that it is part of the 
preprocessor that performs parameter substitution rather than of the Pig 
language itself.
+ 
+ `declare` can also be used to define one parameter in terms of others:
  
  {{{
+ #declare param1 ($param2 + $param3)
+ }}}
+ 
+ With the exception of string literals, which can span multiple lines, 
`declare` is a single-line command in the initial release.
+ 
+ The command specified within a `declare` statement can take parameters, 
which are substituted as well.
+ 
+ {{{
- declare CMD `generate_date $date`
+ #declare CMD `generate_date $date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0>'5';
  .....
@@ -54, +64 @@

  Note that variables passed to a command must be resolved before the 
`declare` statement that runs it. The following sequence would cause an error:
  
  {{{
- declare A `cmd1 $B`
+ #declare A `cmd1 $B`
- declare $B `cmd2`
+ #declare $B `cmd2`
  }}}
  
  The command name itself can be a parameter.
  
  {{{
- declare CMD `$mycmd $date`
+ #declare CMD `$mycmd $date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0>'5';
  .....
@@ -96, +106 @@

  
  Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
  
- The fault parameter values can be specified in a script using `declare 
<param>=<value>` statement:
+ The `declare` command takes the highest precedence. Defining the same 
parameter in more than one `declare` command is an error: an error message is 
reported and processing is aborted.
+ 
+ Default parameter values can be specified in a script using the `#default 
<param> <value>` statement. This statement is identical to `declare` except 
that it has the lowest precedence, meaning that its value is used only if the 
parameter has not been defined before.
  
  {{{
- declare cmd=generate_name
+ #default cmd generate_name
  }}}
- 
- Default values are only used if parameters is not specified.
- 
- `declare` can also be used to define one parameter in terms of others:
- 
- {{{
- declare param1 ($param2 + $param3)
- }}}
- 
- Note that `param2` and `param3` must be defined prior to this `declare` 
statement.
  
  === Debugging ===
  
@@ -122, +124 @@

  
  A C-style preprocessor will be written to perform parameter substitution. The 
preprocessor will do the following:
  
-  1. Create  an empty `<original name>.substituted` file in the current 
working directory
+  1. Create an empty `<original name>.substituted` file in the current working 
directory
   2. Read parameters from files and the command line, and populate the 
parameter hash using the precedence rules described above.
   3. For each line in the input script
    * if comment or empty line, copy over
@@ -130, +132 @@

     * search the line for variables that need to be replaced and perform 
replacement if needed. Generate an error and abort if replacement is needed but 
the corresponding parameter is not found in the parameter hash.
     * if the param value is enclosed in backticks, run the command and capture 
its stdout. If the command succeeds, store the parameter defined in `declare` 
in the parameter hash with its value set to the command's stdout. If the 
command fails, report the error and abort the processing.
     * if the param value is not a command, store the parameter and its value 
in the parameter hash.
+   * if a default line is encountered, look up the parameter it defines in the 
parameter hash. If the parameter is not found, process the line exactly like a 
declare line; otherwise, skip the line.
    * for all other lines
     * search the line for variables that need to be replaced and perform 
replacement if needed. Generate an error and abort if replacement is needed but 
the corresponding parameter is not found in the parameter hash. (Reuse the code 
from the parameter substitution in declare statement.)
     * place the substituted line into the output file.
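
The preprocessing steps listed above could be sketched roughly as follows. This is only an illustration in Python, not the proposed implementation: the names `preprocess`, `substitute`, `run_command` and `SubstitutionError` are hypothetical, comments are assumed to be lines starting with `--`, parameter names are assumed to start with a letter or underscore (so positional references such as `$0` are left alone), and the result is returned as a list of lines rather than written to a `.substituted` file.

```python
import re
import subprocess

# Assumption: a parameter reference is "$" followed by an identifier that
# starts with a letter or underscore, so "$0" in Pig code is never replaced.
PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


class SubstitutionError(Exception):
    """Raised when preprocessing has to abort."""


def substitute(line, params):
    """Replace each $name in line; abort if a name is not in the hash."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise SubstitutionError("undefined parameter: " + name)
        return params[name]
    return PARAM_RE.sub(lookup, line)


def run_command(command):
    """Run a shell command and return its stdout; abort on failure."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise SubstitutionError("command failed: " + command)
    return result.stdout.strip()


def preprocess(lines, params, run=run_command):
    """Return the substituted script lines.

    params holds the values already merged from parameter files and the
    command line (command line winning over files).
    """
    params = dict(params)   # work on a copy of the parameter hash
    declared = set()        # names already defined by a declare line
    output = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            output.append(line)          # comment or empty line: copy over
            continue
        words = stripped.split(None, 2)
        if words[0] in ("#declare", "#default"):
            if len(words) < 3:
                raise SubstitutionError("malformed statement: " + stripped)
            keyword, name, value = words
            if keyword == "#default" and name in params:
                continue                 # lowest precedence: already defined
            if keyword == "#declare":
                if name in declared:
                    raise SubstitutionError("duplicate declare: " + name)
                declared.add(name)
            value = substitute(value, params)
            if len(value) >= 2 and value[0] == "`" and value[-1] == "`":
                value = run(value[1:-1])  # backticks: use the command's stdout
            params[name] = value         # declare overrides earlier values
        else:
            output.append(substitute(line, params))
    return output
```

Under these assumptions, calling `preprocess(script_lines, {"date": "20080615"})` on a script that contains ``#declare CMD `generate_date $date` `` would first replace `$date`, then run the resulting command, and finally use its output wherever `$CMD` appears in later lines. The `run` argument exists only so that the command runner can be swapped out, for example in tests.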
