[ https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565971#action_12565971 ]
Olga Natkovich commented on PIG-58:
-----------------------------------
This is in response to Alan's comment:
https://issues.apache.org/jira/browse/PIG-58?focusedCommentId=12565958#action_12565958
=========================================================================================================
I think it is an interesting idea and a reasonable approach, but I have a few
concerns:
(1) I don't think the C preprocessor (CPP) is necessarily well known or well
understood among our users and developers.
(2) CPP is complex to use and to implement, and it might be too heavyweight for
what we are trying to do. I briefly looked at the CPP code, and it is very
involved. Translating chunks of it, or writing our own, would be a fairly large
project.
(3) In C, CPP is used to influence how code is compiled. For Pig, we are trying
to influence the run-time behavior of the Pig program, and there are other ways
to do that. One way is to embed Pig in a language such as Perl, Python, or
C/C++, which would take care of code inclusion, conditional execution, and
more. Similarly, we might decide to add these things to the Pig language
itself, but I don't think they belong in our preprocessor. (What would the
difference between "if" and "#if" be in Pig?) So my approach is: (a) simple
things like parameter substitution are done in the preprocessor; (b) more
complex things happen in the language itself or in the language in which Pig
is embedded.
(4) CPP does not support command execution, which users have asked for.
Forcing them to run commands from the command line instead limits their
ability to parameterize the command line and to harvest return codes and error
messages.
There are a few things I like in this proposal and would like to adopt:
(1) Use #define rather than declare.
(2) Extend #define to also declare commands.
(3) Later, expand #define further to include more things as we need them.
This way, only variable names would be used outside of #define, which is nice:
if Pig later supports variables (for instance, for scalars), they would have a
consistent representation.
So my examples from the document would now look as follows:
(1)
A = load '/data/mydata/$date';
(2)
#define CMD `generate_date`
A = load '/data/mydata/$CMD';
(3)
#define CMD `generate_name $date`
A = load '/data/mydata/$CMD';
(4)
#define CMD `$cmd $date`
A = load '/data/mydata/$CMD';
I think this also addresses some of the concerns from
https://issues.apache.org/jira/browse/PIG-58?focusedCommentId=12565959#action_12565959
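The #define examples above could be handled by a preprocessor along the following lines. This is a minimal Python sketch, not an actual Pig implementation, and it rests on assumptions the comment does not pin down: a "#define NAME value" line binds NAME; a value wrapped in backticks is run as a shell command (after any $references inside it are substituted) and its trimmed standard output becomes the value; a non-zero exit status raises an error carrying the return code and stderr (one way to address concern (4)); and references are $NAME starting with a letter or underscore, so Pig positional fields such as $0 are left alone. The function names preprocess, substitute, and run_shell are hypothetical, as is the generate_name command.

```python
import re
import subprocess
from typing import Callable, Dict, List

# Assumed reference syntax: $ followed by a letter or underscore, so Pig
# positional fields such as $0 are never mistaken for parameters.
REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
# Assumed "#define NAME value" syntax, modeled on the proposal's examples.
DEFINE = re.compile(r"^\s*#define\s+([A-Za-z_][A-Za-z0-9_]*)\s+(.*?)\s*$")


def substitute(text: str, params: Dict[str, str]) -> str:
    """Replace each known $name; unknown references are left untouched."""
    return REF.sub(lambda m: params.get(m.group(1), m.group(0)), text)


def run_shell(cmd: str) -> str:
    """Run cmd via the shell and return its trimmed stdout.

    A non-zero exit status raises an error carrying the return code and
    stderr, instead of silently substituting empty output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} exited with {result.returncode}: {result.stderr.strip()}")
    return result.stdout.strip()


def preprocess(lines: List[str], params: Dict[str, str],
               run: Callable[[str], str] = run_shell) -> List[str]:
    """Expand #define lines and $name references, returning the Pig lines."""
    params = dict(params)  # copy: defines must not leak into the caller's dict
    out = []
    for line in lines:
        m = DEFINE.match(line)
        if m:
            name, value = m.group(1), substitute(m.group(2), params)
            if len(value) >= 2 and value[0] == value[-1] == "`":
                value = run(value[1:-1])  # backticks: use the command's output
            params[name] = value
            continue  # #define lines are consumed, not passed on to Pig
        out.append(substitute(line, params))
    return out


if __name__ == "__main__":
    script = [
        "#define CMD `generate_name $date`",
        "A = load '/data/mydata/$CMD';",
        "B = filter A by $0>'5';",
    ]
    # Stand-in for the hypothetical generate_name command.
    fake_run = lambda cmd: cmd.replace(" ", "_")
    for line in preprocess(script, {"date": "20080110"}, run=fake_run):
        print(line)
```

Run as a script, the demo prints "A = load '/data/mydata/generate_name_20080110';" followed by the unchanged filter line. Passing the command runner in as a parameter keeps the expansion logic testable without executing real commands.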
> parameterized Pig scripts
> -------------------------
>
> Key: PIG-58
> URL: https://issues.apache.org/jira/browse/PIG-58
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
>
> This feature has been requested by several users and would be very useful in
> conjunction with streaming. The feature would allow a Pig script to include
> parameters that are replaced at run time. For instance, if your script needs
> to run daily over the previous day's data, you would be able to reuse the
> same script, providing the date as a run-time parameter.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will only be supported for Pig script files.
> (1) Parameters are specified on the command line via the -param <param>=<val>
> construct. Multiple parameters can be specified. They are applied to the
> script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via
> a declare statement:
> declare <param>=<value>
> (3) Within the script, a parameter is enclosed in % characters (for example,
> %date%). \% can be used to escape a literal %.
> Implementation:
> ============
> Use a preprocessor to do the substitution. The preprocessor would be invoked
> by Main before grunt is instantiated and would do the following:
> - create a new file in a temp location
> - build a hash of parameters from the command line and declare statements
> - for each line in the original script:
>     if it is a declare line, skip it;
>     otherwise, for each unescaped pattern %<identifier>%, look for a match in
>     the hash and replace it if found. Write the line to the temp file.
> - pass the temp file to grunt.
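The preprocessing steps in the original proposal can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Pig's actual Main code: build_params, substitute_line, and preprocess_file are hypothetical names; -param values are assumed to override declare defaults, with a later -param winning over an earlier one for the same name (one reading of "applied in the order they are specified"); \% is assumed to emit a literal %; and unknown %name% references are left untouched.

```python
import re
import tempfile
from typing import Dict, List

# Assumed form of a default: "declare <param>=<value>", per interface item (2).
DECLARE = re.compile(r"^\s*declare\s+([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")
# One pass matches either an escaped \% or an unescaped %name% reference,
# so an escaped % can never start or end a reference.
TOKEN = re.compile(r"\\%|%([A-Za-z_][A-Za-z0-9_]*)%")


def build_params(argv: List[str], script_lines: List[str]) -> Dict[str, str]:
    """Build the parameter hash: declare lines supply defaults, and each
    -param <param>=<val> overrides them (assumption: later -param wins)."""
    params: Dict[str, str] = {}
    for line in script_lines:
        m = DECLARE.match(line)
        if m:
            params[m.group(1)] = m.group(2)
    i = 0
    while i < len(argv):
        if argv[i] == "-param":
            if i + 1 == len(argv) or "=" not in argv[i + 1]:
                raise ValueError("-param expects <param>=<val>")
            name, value = argv[i + 1].split("=", 1)
            params[name] = value
            i += 2
        else:
            i += 1
    return params


def substitute_line(line: str, params: Dict[str, str]) -> str:
    """Replace unescaped %name% patterns found in params; \\% becomes %."""
    def repl(m: "re.Match[str]") -> str:
        if m.group(0) == "\\%":
            return "%"  # escaped percent sign (assumed to emit a literal %)
        return params.get(m.group(1), m.group(0))  # unknown names untouched
    return TOKEN.sub(repl, line)


def preprocess_file(script_path: str, argv: List[str]) -> str:
    """Write a substituted copy of the script to a temp file; return its path."""
    with open(script_path) as src:
        lines = src.readlines()
    params = build_params(argv, lines)
    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as dst:
        for line in lines:
            if DECLARE.match(line):
                continue  # declare lines only supply defaults
            dst.write(substitute_line(line, params))
    return dst.name  # this path would then be handed to grunt
```

For the command line in the example, the shell strips the quotes, so argv contains date=20080110 and the load statement becomes A = load '/data/mydata/20080110';. Matching the escape and the reference with a single regular expression avoids a separate unescaping pass that could accidentally create new %name% patterns.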
--
This message is automatically generated by JIRA.