[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-04-09 Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Pig Wiki for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  Parameters can be passed via the pig command line using the `-param param=val` 
construct. Multiple parameters can be specified. If the same parameter is 
specified multiple times, the last value will be used and a warning will be 
generated.
  
  {{{
- pig -param date='20080201'
+ pig -param date=20080201
  }}}
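To make the "last value wins" rule concrete, here is a minimal Python sketch of how the preprocessor might collect repeated `-param` options. It assumes the arguments arrive as a list like `sys.argv`, lowercases names because parameter names are case insensitive, and uses a hypothetical helper `parse_param_args`; it is not pig's actual option parser.

```python
import warnings


def parse_param_args(argv):
    """Collect -param name=value pairs; a repeated name keeps its last value."""
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "-param" and i + 1 < len(argv):
            name, sep, value = argv[i + 1].partition("=")
            if not sep:
                raise ValueError(f"expected name=value, got {argv[i + 1]!r}")
            key = name.strip().lower()  # parameter names are case insensitive
            if key in params:
                warnings.warn(f"parameter {name!r} given more than once; using the last value")
            params[key] = value
            i += 2
        else:
            i += 1  # anything else (e.g. the script name) is not a parameter
    return params
```

For example, `-param date=20080201 -param date=20080202` yields `{"date": "20080202"}` plus one warning.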
  
   Parameter File 
@@ -56, +56 @@

  {{{
  # my parameters
  
- date = '20080201'
+ date = 20080201
  cmd = `generate_name`
  }}}
  
@@ -95, +95 @@

  
   Value Format 
  
- Value formats are identical regardless of how the parameter is specified and 
can be of two types. First is a sequence of characters enclosed in single or 
double quotes. In this case the unquoted version of the value is used during 
substitution. Quotes within the value can be escaped.
+ Value formats are identical regardless of how the parameter is specified and 
can be of two types. First is a sequence of characters enclosed in single or 
double quotes. In this case the unquoted version of the value is used during 
substitution. Quotes within the value can be escaped. Single-word values that 
contain no special characters such as `%` or `=` do not need to be quoted.
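A minimal Python sketch of the two value formats, assuming the raw value text has already been separated from the parameter name. `parse_value` is a hypothetical helper; treating whitespace, `%` and `=` as the characters that force quoting is an assumption based on the description above, not a complete list.

```python
def parse_value(raw):
    """Return the string used during substitution for a raw parameter value."""
    raw = raw.strip()
    # Quoted form: strip the enclosing quotes and unescape matching quotes.
    if len(raw) >= 2 and raw[0] in "'\"" and raw[-1] == raw[0]:
        quote = raw[0]
        return raw[1:-1].replace("\\" + quote, quote)
    # Unquoted form: assumed to be a single word with no special characters.
    if not raw or any(c.isspace() for c in raw) or "%" in raw or "=" in raw:
        raise ValueError(f"value {raw!r} must be quoted")
    return raw
```

With this sketch, `'Joe\'s URL'` becomes `Joe's URL`, and both `'20080201'` and `20080201` become `20080201`.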
  
  {{{
  %declare DESC 'Joe\'s URL'


[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-04-06 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  
  == Motivation ==
  
- This document describes a proposal for implementing parameter substitution in 
pig. This proposal is motivated by multiple requests from users who would like 
to create a template pig script and then use it with different parameters on a 
regular basis. For instance, if you have daily processing that is identical 
every day except the date it needs to process, it would be very convenient to 
put a placeholder for the date and provide the actual value at run time.
+ This document describes a proposal for implementing parameter substitution in 
pig. This proposal is motivated by multiple requests from users who would like 
to create a  
+ template pig script and then use it with different parameters on a regular 
basis. For instance, if you have daily processing that is identical every day 
except the date it  
+ needs to process, it would be very convenient to put a placeholder for the 
date and provide the actual value at run time.
  
  == Requirements ==
  
@@ -17, +19 @@

  
  == Interface ==
  
- === Parameter Specification ===
+ === Using Parameters ===
  
- Parameters in a pig script will be of the form `$identifier`. 
+ Parameters in a pig script are in the form of `$identifier`. 
  
  {{{
  A = load '/data/mydata/$date';
@@ -27, +29 @@

  .
  }}}
  
- For this example, pig would expect `date` to be passed from pig command line 
or from a parameter file. The value would be substituted prior to running the 
load statement.
+ In this example, the value of the `date` parameter is expected to be passed 
on each invocation of the script and is substituted in before running the pig 
script. An error  
+ is generated if the value for any parameter is not found.
  
- In addition to supplying parameter value, a user can supply a command to 
execute to generate a parameter value. This can be done using `declare` 
statement. 
+ A parameter name have a structure of a standard language identifier: it must 
start with a latter or underscore followed by any number of letters, digits, 
and underscores. The  
+ names are case insensitive. The names can be escaped with `\` in which case 
substitution does not take place.
+ 
+ In the initial version of the software, parameters are only allowed when a pig 
script is specified. They are disabled with the `-e` switch and in interactive 
mode. 
+ 
+ === Specifying Parameters ===
+ 
+ Parameter values can be supplied in four different ways.
+ 
+  Command Line 
+ 
+ Parameters can be passed via pig command line using `-param param=val` 
construct. Multiple parameters can be specified. If the same parameter is 
specified multiple  
+ times, the last value will be used and a warning will be generated.
+ 
+ The command line for Example 4 above would look as follows:
+ 
+ {{{
+ pig -param date='20080201'
+ }}}
+ 
+  Parameter File 
+ 
+ Parameters can also be specified in a file that can be passed to pig using 
`-param_file file` construct. Multiple files can be specified. If the same 
parameter is present  
+ multiple times in the file, the last value will be used and a warning will be 
generated. If a parameter present in multiple files, the value from the last 
file will be used  
+ and a warning will be generated.
+ 
+ A parameter file will contain one line per parameter. Empty lines are 
allowed. Perl style (#) comment lines are also allowed. Comments must take a 
full line and `#` must be  
+ the first character on the line. Each parameter line will be of the form: 
`param_name=param_value`. White spaces around `=` are allowed but are 
optional. 
+ 
+ {{{
+ # my parameters
+ 
+ date = '20080201'
+ cmd = `generate_name`
+ }}}
+ 
+ Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
+ 
+  Declare Statement 
+ 
+ The `declare` command can be used from within a pig script. Its main use case 
is to describe one parameter in terms of other(s).
+ 
+ {{{
+ %declare CMD `$mycmd $date`
+ A = load '/data/mydata/$CMD';
+ B = filter A by $0'5';
+ .
+ }}}
+ 
+ The format is `%declare param value`
+ 
+ `declare` command starts with `%` to indicate that this is a preprocessor 
command that is processed prior to executing pig script. It takes the highest 
precedence. The  
+ scope of parameter value defined via `declare` is all the lines following 
`declare` command until the next `declare` command that defines this parameter 
is encountered.
+ 
+  Default Statement 
+ 
+ `default` command can be used to provide a default value for a parameter. 
This value is used if the parameter has no value defined by any other means. 
(`default` has the  
+ lowest 

[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-04-06 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  .
  }}}
  
+ In this example, the value of `date` is expected to be passed on each 
invocation of the script and is substituted before the pig script runs. An 
error is generated if the value for any parameter is not found.
- In this example, the value of the `date` parameter is expected to be passed 
on each invocation of the script and is substituted in before running the pig 
script. An error  
- is generated if the value for any parameter is not found.
  
- A parameter name have a structure of a standard language identifier: it must 
start with a latter or underscore followed by any number of letters, digits, 
and underscores. The  
+ A parameter name has the structure of a standard language identifier: it must 
start with a letter or underscore followed by any number of letters, digits, 
and underscores. The names are case insensitive. A name can be escaped with 
`\`, in which case substitution does not take place.
- names are case insensitive. The names can be escaped with `\` in which case 
substitution does not take place.
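The naming rules above map onto a small regular expression. The following Python sketch (the names `PARAM_RE` and `substitute` are hypothetical) performs a case-insensitive lookup, raises an error for a missing parameter, and leaves `\`-escaped names unsubstituted. Whether the escaping backslash is removed from the output is not specified here; the sketch assumes it is.

```python
import re

# Group 1: an escaped reference such as \$date, which is not substituted.
# Group 2: $ + letter/underscore + letters/digits/underscores. Positional
# fields like $0 never match because an identifier cannot start with a digit.
PARAM_RE = re.compile(r"\\(\$[A-Za-z_][A-Za-z0-9_]*)|\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(line, params):
    """Replace $name references in line using a case-insensitive lookup."""
    lookup = {name.lower(): value for name, value in params.items()}

    def replace(match):
        if match.group(1):
            # Assumption: the escaping backslash is dropped and $name is kept.
            return match.group(1)
        name = match.group(2)
        if name.lower() not in lookup:
            raise KeyError(f"no value found for parameter {name!r}")
        return lookup[name.lower()]

    return PARAM_RE.sub(replace, line)
```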
  
  In the initial version of the software, parameters are only allowed when a 
pig script is specified. They are disabled with the `-e` switch and in 
interactive mode. 
  
@@ -43, +41 @@

  
   Command Line 
  
- Parameters can be passed via pig command line using `-param param=val` 
construct. Multiple parameters can be specified. If the same parameter is 
specified multiple  
+ Parameters can be passed via the pig command line using the `-param param=val` 
construct. Multiple parameters can be specified. If the same parameter is 
specified multiple times, the last value will be used and a warning will be 
generated.
- times, the last value will be used and a warning will be generated.
- 
- The command line for Example 4 above would look as follows:
  
  {{{
  pig -param date='20080201'
@@ -54, +49 @@

  
   Parameter File 
  
+ Parameters can also be specified in a file that is passed to pig using the 
`-param_file file` construct. Multiple files can be specified. If the same 
parameter is present multiple times in a file, the last value will be used 
and a warning will be generated. If a parameter is present in multiple files, 
the value from the last file will be used and a warning will be generated.
- Parameters can also be specified in a file that can be passed to pig using 
`-param_file file` construct. Multiple files can be specified. If the same 
parameter is present  
- multiple times in the file, the last value will be used and a warning will be 
generated. If a parameter present in multiple files, the value from the last 
file will be used  
- and a warning will be generated.
  
+ A parameter file will contain one line per parameter. Empty lines are 
allowed. Perl-style (`#`) comment lines are also allowed. Comments must take a 
full line, and `#` must be the first character on the line. Each parameter line 
will be of the form `param_name=param_value`. Whitespace around `=` is 
allowed but optional. 
- A parameter file will contain one line per parameter. Empty lines are 
allowed. Perl style (#) comment lines are also allowed. Comments must take a 
full line and `#` must be  
- the first character on the line. Each parameter line will be of the form: 
`param_name=param_value`. White spaces around `=` are allowed but are 
optional. 
  
  {{{
  # my parameters
@@ -68, +60 @@

  cmd = `generate_name`
  }}}
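A minimal Python sketch of a reader for this parameter file format, using a hypothetical `parse_param_file` helper. It keeps values as raw text (by assumption, quotes and backticks are handled in a later step) and treats whitespace-only lines as empty.

```python
import warnings


def parse_param_file(text):
    """Parse param_name=param_value lines; a later duplicate wins with a warning."""
    params = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # empty lines and full-line # comments are skipped
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected param_name=param_value")
        key = name.strip().lower()  # whitespace around = is optional
        if key in params:
            warnings.warn(f"line {lineno}: parameter {name.strip()!r} redefined")
        params[key] = value.strip()
    return params
```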
  
- Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
+ Files and command line parameters can be combined, with command line 
parameters taking precedence.
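If each source has already been parsed into a dictionary, the precedence rules reduce to an ordered merge. `merge_params` below is a hypothetical sketch: files are applied in the order given, with a warning when a name repeats across files, and command-line values are applied last so they win.

```python
import warnings


def merge_params(file_params, cmdline_params):
    """Merge per-file dicts (in command-line order), then command-line values."""
    merged = {}
    for params in file_params:
        for name, value in params.items():
            if name in merged:
                warnings.warn(f"parameter {name!r} found in several files; using the last one")
            merged[name] = value
    merged.update(cmdline_params)  # command line takes precedence over files
    return merged
```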
  
   Declare Statement 
  
@@ -83, +75 @@

  
  The format is `%declare param value`
  
+ The `declare` command starts with `%` to indicate that it is a preprocessor 
command, processed before the pig script is executed. It takes the highest 
precedence. The scope of a parameter value defined via `declare` is all the 
lines following the `declare` command until the next `declare` command that 
defines the same parameter is encountered.
- `declare` command starts with `%` to indicate that this is a preprocessor 
command that is processed prior to executing pig script. It takes the highest 
precedence. The  
- scope of parameter value defined via `declare` is all the lines following 
`declare` command until the next `declare` command that defines this parameter 
is encountered.
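The scoping rule can be sketched as a single pass over the script that updates the parameter hash whenever a `%declare` line is seen, so each line is expanded with the values in effect at that point. This Python sketch (`apply_declares` is hypothetical) handles only plain literal values, and it drops the `%declare` lines from the output, which is an assumption about how the preprocessor treats them.

```python
import re

NAME = r"[A-Za-z_][A-Za-z0-9_]*"
DECLARE_RE = re.compile(rf"^\s*%declare\s+({NAME})\s+(.*?)\s*$")
PARAM_RE = re.compile(rf"\$({NAME})")


def apply_declares(lines, params):
    """Expand $name line by line; a %declare overrides earlier values for
    all following lines until the same name is declared again."""
    current = {k.lower(): v for k, v in params.items()}
    out = []
    for line in lines:
        # A missing parameter raises KeyError, i.e. processing aborts.
        expanded = PARAM_RE.sub(lambda m: current[m.group(1).lower()], line)
        declared = DECLARE_RE.match(expanded)
        if declared:
            current[declared.group(1).lower()] = declared.group(2)
            continue  # assumption: the declare line itself is not emitted
        out.append(expanded)
    return out
```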
  
   Default Statement 
  
- `default` command can be used to provide a default value for a parameter. 
This value is used if the parameter has no value defined by any other means. 
(`default` has the  
+ `default` command can be 

[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-02-25 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  
  === Parameter Specification ===
  
- Parameters in a pig script will be of the form `$identifier`. 
+ Parameters in a pig script will be of the form `%identifier`. 
  
  {{{
  A = load '/data/mydata/$date';


[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-02-25 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  
  For this example, pig would execute the `generate_date` command when it 
encounters the `declare` statement and assign the result (stdout) to parameter 
`CMD`. The value of `CMD` is substituted prior to running the load statement.
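Running a backticked command and capturing its stdout could look like the following Python sketch. `run_backtick_value` is a hypothetical helper; it assumes commands run through a POSIX shell and that the trailing newline of the output is stripped, neither of which is specified above. The test uses `echo` in place of `generate_date`.

```python
import subprocess


def run_backtick_value(value):
    """If value is `command`, run it and return its stdout; else return value."""
    if len(value) >= 2 and value.startswith("`") and value.endswith("`"):
        # check=True raises CalledProcessError when the command fails,
        # matching "report the error and abort the processing".
        result = subprocess.run(value[1:-1], shell=True, capture_output=True,
                                text=True, check=True)
        return result.stdout.rstrip("\n")
    return value
```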
  
- `declare` statement starts with `%` to indicate that it is part of the 
preprocessor that performs parameter substitution rather than Pig language 
itself. 
+ `declare` statement starts with `%` to indicate that it is part of the 
preprocessor that performs parameter substitution rather than Pig language 
itself. A `declare` statement extends to the end of the line unless the value is 
a literal, in which case it can span multiple lines.
  
  `declare` can also be used to define one parameter in terms of others:
  
@@ -106, +106 @@

  
  Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
  
- `declare` command takes the highest precedence. Having multiple `declare` 
commands defining the same parameter is an error that results in an error 
message and abort of the processing.
+ The `declare` command takes the highest precedence. The scope of a parameter 
value defined via `declare` is all the lines following the `declare` command 
until the next `declare` command that defines the same parameter is encountered.
  
- Default parameter values can be specified in a script using `%default param 
value` statement. This statement is identical to `declare` except that it has 
the lowest precedence meaning that its value is only used if it has not been 
defined before.
+ Default parameter values can be specified in a script using the `%default param 
value` statement. This statement is identical to `declare` except that it has 
the lowest precedence, meaning that its value is only used if the parameter has 
not been defined before. Only the first `default` statement for a particular 
parameter is meaningful; any later ones are ignored with a warning.
  
  {{{
  %default cmd generate_name


[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-02-06 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  
  For this example, parameter `date` is substituted first when the `declare` 
statement is encountered. Then the `generate_date` command is executed, passing 
the value of `date` to it as a parameter. Its output (stdout) is assigned to 
`CMD`, which is used in the load statement prior to its execution.
  
- Note that the variables passed on the command line must be resolved prior to 
the declare statement. The following sequence would cause an error:
+ Note that variables passed on the command line must be resolved prior to the 
declare statement. The following sequence would cause an error:
  
  {{{
  declare A `cmd1 $B`
@@ -67, +67 @@

  .
  }}}
  
- In this example, parameters `cmd` and `date` are substituted first when 
`declare` statement is encountered. The the resulting command is executed and 
its stdout is placed into the path prior to running the load statement.
+ In this example, parameters `cmd` and `date` are substituted first when 
`declare` statement is encountered. Then the resulting command is executed and 
its stdout is placed into the path prior to running the load statement.
  
- Note that parameter names are case insesitive and $cmd and $CMD means the 
same thing. This is to match the rest of Pig Latin.
+ Note that parameter names are case insensitive, so $cmd and $CMD mean the 
same thing. This is to match the rest of Pig Latin.
  
  === Parameter Passing ===
  
@@ -94, +94 @@

  cmd = generate_name
  }}}
  
- Files and command line parameters can be combined with command line 
parameters taking precedence over files in case of duplicate parameters.
+ Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
  
  Default parameter values can be specified in a script using the `declare 
param=value` command:
  
@@ -103, +103 @@

  }}}
  
  Default values are only used if the parameter is not specified.
+ 
+ `declare` can also be used to define one parameter in terms of others:
+ 
+ {{{
+ declare param1 ($param2 + $param3)
+ }}}
+ 
+ Note that `param2` and `param3` must be defined prior to this `declare` 
statement.
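The ordering requirement can be sketched in Python as follows: the value is expanded at the moment `declare` is processed, so every referenced parameter must already be in the hash. `declare` here is a hypothetical helper, and the sketch assumes substitution is purely textual, so `($param2 + $param3)` becomes a string such as `(1 + 2)` rather than an evaluated sum.

```python
import re

REF_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def declare(params, name, value):
    """Store name = value after expanding $refs, which must already exist."""
    def lookup(match):
        ref = match.group(1).lower()
        if ref not in params:
            raise KeyError(f"{match.group(1)!r} used in declare of {name!r} before it is defined")
        return params[ref]

    params[name.lower()] = REF_RE.sub(lookup, value)
    return params
```

Under this sketch, the earlier sequence that declares `A` in terms of `$B` before `B` exists fails with a `KeyError`.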
  
  === Debugging ===
  
@@ -115, +123 @@

  A C-style preprocessor will be written to perform parameter substitution. The 
preprocessor will do the following:
  
   1. Create  an empty `original name.substituted` file in the current 
working directory
-  2. Read parameters from files, command line , and declare statement and 
construct a hash preserving the precedence rules in case of duplicates 
described above
+  2. Read parameters from files and the command line and populate the parameter 
hash using the precedence rules described above.
   3. For each line in the input script
-   * if declare line, skip
* if comment or empty line, copy over
+   * if declare line
+* search the line for variables that need to be replaced and perform 
replacement if needed. Generate an error and abort if replacement is needed but 
the corresponding parameter is not found in the parameter hash.
+* if the param value is enclosed in backticks, run the command and capture 
its stdout. If the command succeeds, store the parameter defined in `declare` 
in the parameter hash with its value set to the command's stdout. If the command 
fails, report the error and abort the processing.
+* if the declared value is not a command, store it in the parameter hash.
* for all other lines
+* search the line for variables that need to be replaced and perform 
replacement if needed. Generate an error and abort if replacement is needed but 
the corresponding parameter is not found in the parameter hash. (Reuse the code 
from parameter substitution in the declare statement.)
-* parse each part enclosed in `%` and remove `%s`. Locate any identifier 
that starts with `$` and lookup it up in the hash. If found, make the 
substitution; otherwise, report an error and abort.
-* if the part is enclosed in backticks, run the substitued command. If the 
command succeeds, substitute the command with its stdout. If it fails, report 
an error and abort.
 * place the substituted line into the output file.
   4. If -dryrun is not specified, pass the output file to grunt to execute. 
Otherwise, print the name of the file and exit.
   5. If neither -debug nor -dryrun is specified, remove the output file.
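Steps 1 through 3 can be sketched in Python roughly as follows, using the `%declare` spelling from the later revision of this page. This is an illustration under stated assumptions, not the planned implementation: comments are taken to be Pig's `--` line comments, commands in backticks run through a POSIX shell, `%declare` lines are not copied to the output, and steps 4 and 5 (handing the file to grunt, `-dryrun`, `-debug`) are left out. `preprocess` and `substitute` are hypothetical names.

```python
import os
import re
import subprocess

NAME = r"[A-Za-z_][A-Za-z0-9_]*"
PARAM_RE = re.compile(rf"\$({NAME})")
DECLARE_RE = re.compile(rf"^\s*%declare\s+({NAME})\s+(.*?)\s*$")


def substitute(line, params):
    """Replace $name references; abort (KeyError) if a parameter is unknown."""
    def lookup(match):
        key = match.group(1).lower()
        if key not in params:
            raise KeyError(f"parameter {match.group(1)!r} not found")
        return params[key]
    return PARAM_RE.sub(lookup, line)


def preprocess(script_path, params):
    """Write <script>.substituted into the current directory; return its path."""
    params = {name.lower(): value for name, value in params.items()}
    out_path = os.path.basename(script_path) + ".substituted"
    with open(script_path) as src, open(out_path, "w") as dst:
        for line in src:
            stripped = line.strip()
            # Comment or empty line: copy over (assumes "--" comments).
            if not stripped or stripped.startswith("--"):
                dst.write(line)
                continue
            line = substitute(line, params)
            declared = DECLARE_RE.match(line)
            if declared:
                name, value = declared.group(1).lower(), declared.group(2)
                if len(value) >= 2 and value[0] == value[-1] == "`":
                    # Run the command; check=True raises if it fails.
                    value = subprocess.run(
                        value[1:-1], shell=True, capture_output=True,
                        text=True, check=True,
                    ).stdout.rstrip("\n")
                params[name] = value
                continue  # assumption: declare lines are not passed to grunt
            dst.write(line)
    return out_path
```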


[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-02-06 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  = Parameter Substitution in Pig =
  
- ==  Motivation ==
+ == Motivation ==
  
  This document describes a proposal for implementing parameter substitution in 
pig. This proposal is motivated by multiple requests from users who would like 
to create a template pig script and then use it with different parameters on a 
regular basis. For instance, if you have daily processing that is identical 
every day except the date it needs to process, it would be very convenient to 
put a placeholder for the date and provide the actual value at run time.
  
@@ -29, +29 @@

  
  For this example, pig would expect `date` to be passed from pig command line 
or from a parameter file. The value would be substituted prior to running the 
load statement.
  
- In addition to supplying parameter value, a user can supply a command to 
execute to generate a parameter value. This can be done using `declare` 
statement.
+ In addition to supplying parameter value, a user can supply a command to 
execute to generate a parameter value. This can be done using `declare` 
statement. 
  
  {{{
- declare CMD `generate_date`
+ #declare CMD `generate_date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0'5';
  .
@@ -40, +40 @@

  
  For this example, pig would execute the `generate_date` command when it 
encounters the `declare` statement and assign the result (stdout) to parameter 
`CMD`. The value of `CMD` is substituted prior to running the load statement.
  
- A command can take parameters which need to be substituted as well.
+ `declare` statement starts with `#` to indicate that it is part of the 
preprocessor that performs parameter substitution rather than Pig language 
itself. 
+ 
+ `declare` can also be used to define one parameter in terms of others:
  
  {{{
+ #declare param1 ($param2 + $param3)
+ }}}
+ 
+ With the exception of string literals, which can span multiple lines, `declare` 
is a single-line command in the initial release.
+ 
+ The command specified within a `declare` statement can take parameters, which 
need to be substituted as well.
+ 
+ {{{
- declare CMD `generate_date $date`
+ #declare CMD `generate_date $date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0'5';
  .
@@ -54, +64 @@

  Note that variables passed on the command line must be resolved prior to the 
declare statement. The following sequence would cause an error:
  
  {{{
- declare A `cmd1 $B`
+ #declare A `cmd1 $B`
- declare $B `cmd2`
+ #declare $B `cmd2`
  }}}
  
  The command name itself can be a parameter.
  
  {{{
- declare CMD `$mycmd $date`
+ #declare CMD `$mycmd $date`
  A = load '/data/mydata/$CMD';
  B = filter A by $0'5';
  .
@@ -96, +106 @@

  
  Files and command line parameters can be combined, with command line 
parameters taking precedence over files in case of duplicate parameters.
  
- The fault parameter values can be specified in a script using `declare 
param=value` statement:
+ `declare` command takes the highest precedence. Having multiple `declare` 
commands defining the same parameter is an error that results in an error 
message and abort of the processing.
+ 
+ Default parameter values can be specified in a script using `#default param 
value` statement. This statement is identical to `declare` except that it has 
the lowest precedence meaning that its value is only used if it has not been 
defined before.
  
  {{{
- declare cmd=generate_name
+ #default cmd=generate_name
  }}}
- 
- Default values are only used if parameters is not specified.
- 
- `declare` can also be used to define one parameter in terms of others:
- 
- {{{
- declare param1 ($param2 + $param3)
- }}}
- 
- Note that `param2` and `param3` must be defined prior to this `declare` 
statement.
  
  === Debugging ===
  
@@ -122, +124 @@

  
  A C-style preprocessor will be written to perform parameter substitution. The 
preprocessor will do the following:
  
-  1. Create  an empty `original name.substituted` file in the current 
working directory
+  1. Create an empty `original name.substituted` file in the current working 
directory
   2. Read parameters from files and the command line and populate the parameter 
hash using the precedence rules described above.
   3. For each line in the input script
* if comment or empty line, copy over
@@ -130, +132 @@

 * search the line for variables that need to be replaced and perform 
replacement if needed. Generate an error and abort if replacement is needed but 
the corresponding parameter is not found in the parameter hash.
 * if the param value is enclosed in backticks, run the command and capture 
its stdout. If the command succeeds, store the parameter defined in `declare` 
in the parameter hash with its value set to command's stdout. If 

[Pig Wiki] Update of ParameterSubstitution by OlgaN

2008-02-06 Apache Wiki

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ParameterSubstitution

--
  .
  }}}
  
- In this example, parameters `cmd` and `date` are substituted first when 
`declare` statement is encountered. Then the resulting command is executed and 
its stdout is placed into the path prior to running the load statement.
+ In this example, parameters `mycmd` and `date` are substituted first when 
`declare` statement is encountered. Then the resulting command is executed and 
its stdout is placed into the path prior to running the load statement.
  
  Note that parameter names are case insensitive, so $cmd and $CMD mean the 
same thing. This is to match the rest of Pig Latin.