[ 
https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565971#action_12565971
 ] 

Olga Natkovich commented on PIG-58:
-----------------------------------

This is in response to Alan's comment: 
https://issues.apache.org/jira/browse/PIG-58?focusedCommentId=12565958#action_12565958
=========================================================================================================

I think it is an interesting idea and a reasonable approach but I have a few 
concerns:

(1) I don't think that the C-style preprocessor (CPP) is necessarily that well 
known and understood among our users and developers.
(2) CPP is complex to use and implement and might be too heavyweight for what 
we are trying to do. I briefly looked at the CPP code and it is very involved. 
Translating chunks of it, or writing them on our own, would be a fairly large 
project.
(3) For C, CPP is used to influence how code is compiled. For Pig we are trying 
to influence the run-time behavior of the Pig program, and there are other ways 
to do it. One way is to embed Pig in languages such as Perl, Python, or C/C++, 
which would take care of code inclusion, conditional execution, and more. 
Similarly, we might decide to have these things in the Pig language, but I 
don't think they belong in our preprocessor. (What would the difference between 
"if" and "#if" be in Pig?) So my approach is: (a) simple things like 
parameter substitution can be done in the preprocessor; (b) more complex things 
happen in the language itself or in the language in which Pig is embedded.
(4) CPP does not provide support for command execution, which users asked for, 
and just forcing them to run commands from the command line has limitations in 
terms of parameterizing the command line and harvesting return codes and error 
messages.

There are a couple of things I like from this proposal and would like to use:

(1) use #define rather than declare
(2) extend #define to also declare commands
(3) We can later further expand #define to include more things as we need them.

This way only variable names would be used outside of #define, which is nice 
since if Pig later supports variables, such as for scalars, they would have a 
consistent representation.

So my examples from the document would now look as follows:

(1)
 A = load '/data/mydata/$date';

(2) 
#define CMD `generate_date`
A = load '/data/mydata/$CMD';

(3)
#define CMD `generate_name $date`
A = load '/data/mydata/$CMD';

(4)
#define CMD `$cmd $date`;
A = load '/data/mydata/$CMD';
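
To make the four examples concrete, here is a minimal Python sketch of how a 
preprocessor might handle them. It is only an illustration, not the Pig 
implementation: the names preprocess, substitute and run_command are 
hypothetical, and it assumes that command-line values take precedence over 
#define values, that a trailing semicolon on a #define line is optional, and 
that a failing command should abort with its return code and error message.

```python
import re
import subprocess

# "#define NAME value" with an optional trailing semicolon (assumed syntax).
DEFINE_RE = re.compile(r"^#define\s+(\w+)\s+(.*?);?\s*$")
# $name must start with a letter or underscore, so Pig's $0, $1 are left alone.
PARAM_RE = re.compile(r"\$([A-Za-z_]\w*)")


def substitute(text, params):
    """Replace each $name that has a known value; leave unknown names as is."""
    return PARAM_RE.sub(lambda m: params.get(m.group(1), m.group(0)), text)


def run_command(command):
    """Run a shell command and return its stdout; raise on a non-zero exit."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command {command!r} failed with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout.strip()


def preprocess(lines, cli_params):
    """Return the script lines with #define lines removed and $names replaced."""
    params = dict(cli_params)
    output = []
    for line in lines:
        match = DEFINE_RE.match(line)
        if match is None:
            output.append(substitute(line, params))
            continue
        name, value = match.groups()
        if name in params:
            continue  # assumption: a command-line value overrides the #define
        value = substitute(value, params)
        if len(value) >= 2 and value.startswith("`") and value.endswith("`"):
            value = run_command(value[1:-1])  # backticks: use command output
        params[name] = value
    return output
```

For example (4), calling preprocess with cmd and date supplied on the command 
line first expands $cmd and $date inside the backticks, then runs the 
resulting command and stores its output as the value of CMD.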

I think this also addresses some of the concerns from 

https://issues.apache.org/jira/browse/PIG-58?focusedCommentId=12565959#action_12565959

> parameterized Pig scripts
> -------------------------
>
>                 Key: PIG-58
>                 URL: https://issues.apache.org/jira/browse/PIG-58
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>
> This feature has been requested by several users and would be very useful in 
> conjunction with streaming. The feature would allow a Pig script to include 
> parameters that are replaced at run time. For instance, if your script needs 
> to run on a daily basis over the data of the previous day, you would be able 
> to use the script and provide a date as a run-time parameter to it.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will only be supported with Pig script files.
> (1) Parameters are specified on the command line via the -param <param>=<val> 
> construct. Multiple parameters can be specified. They are applied to the 
> script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via 
> the declare statement:
> declare <param>=<value>
> (3) Within the script the parameter will be enclosed in %%. \% can be used 
> to escape.
> Implementation:
> ============
> Use a preprocessor to do the substitution. The preprocessor would be invoked by 
> Main before grunt is instantiated and do the following:
> - create a new file in temp location
> - build a hash of parameters from command line and declare statement
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %<identifier>% look for a match in the hash. 
> Replace, if found.  Write the line to the temp file.
> - pass the temp file to grunt.
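
For reference, the implementation steps in the quoted description could be 
sketched in Python as follows. This is a rough illustration under stated 
assumptions, not the actual Pig code: preprocess_script and substitute_line 
are hypothetical names, command-line values are assumed to override declare 
defaults, and an escaped \% is assumed to be written through unchanged.

```python
import re
import tempfile

# "declare <param>=<value>" lines supply default values.
DECLARE_RE = re.compile(r"^\s*declare\s+(\w+)\s*=\s*(.*?)\s*$")
# Either an escaped percent sign (\%) or a %identifier% pattern.
TOKEN_RE = re.compile(r"\\%|%(\w+)%")


def substitute_line(line, params):
    """Replace every unescaped %identifier% that has a value in params."""
    def replace(match):
        name = match.group(1)
        if name is None:
            return match.group(0)  # matched \%: assumed to pass through as is
        return params.get(name, match.group(0))  # unknown names are kept
    return TOKEN_RE.sub(replace, line)


def preprocess_script(script_path, cli_params):
    """Write a substituted copy of the script to a temp file; return its path."""
    with open(script_path) as src:
        lines = src.readlines()

    # Build the hash: declare defaults first, then command-line overrides.
    params = {}
    for line in lines:
        match = DECLARE_RE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    params.update(cli_params)

    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as tmp:
        for line in lines:
            if DECLARE_RE.match(line):
                continue  # declare lines are not passed on to grunt
            tmp.write(substitute_line(line, params))
        return tmp.name
```

The returned path is what would then be handed to grunt in place of the 
original script.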


Reply via email to