[ 
https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586465#action_12586465
 ] 

Olga Natkovich commented on PIG-58:
-----------------------------------

Pi, thanks for your prompt feedback. A couple of follow-up questions and comments.

>> 1. You assume no escaping in shell command, right?

Yes, I assume that the command is formed so that it can run as is. At this 
point, backticks are not allowed inside the command; if that turns out to be 
needed, we can change it. The only escaping done is for $identifier, to prevent 
substitution, and that is handled by the substitution code itself.

>> 2. The name "UtilFunctions" implies it does not hold state (even global 
>> state). From the way it is used, it needs either a better name or a 
>> refactor.

This is how the code was submitted to me, and renaming things is not one of 
the things I like to do :). Since it is not an interface point, I don't really 
want to change the names.

>> 3. PigFileParser.unquote still doesn't do escaping.

There are two types of escaping supported for the lines: for quotes and for 
$identifier. Since the preprocessor does not interpret any other data, I don't 
think anything else is needed. 
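For illustration only, here is a minimal sketch of an unquote that handles quote escaping while leaving \$ alone for the substitution step. It is not PigFileParser.unquote from the patch: the class name UnquoteSketch, the assumption that values are wrapped in single quotes, and the choice of \' as the quote escape are all hypothetical.

```java
public final class UnquoteSketch {
    private UnquoteSketch() {}

    /**
     * Strips one pair of surrounding single quotes and turns \' into '.
     * Any other backslash sequence (for example \$) is copied through unchanged
     * so that the parameter-substitution step can still see it.
     * Hypothetical helper; a value ending in \' is not treated specially.
     */
    public static String unquote(String s) {
        if (s.length() < 2 || s.charAt(0) != '\'' || s.charAt(s.length() - 1) != '\'') {
            return s; // not quoted: return as-is
        }
        String body = s.substring(1, s.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length() && body.charAt(i + 1) == '\'') {
                out.append('\'');
                i++; // skip the escaped quote
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("'it\\'s'"));    // it's
        System.out.println(unquote("'cost \\$5'")); // cost \$5
        System.out.println(unquote("20080110"));    // 20080110
    }
}
```

Keeping the two escapes in separate stages means the quote handling never has to know which names are parameters.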

>> 4. <DEFALT> token in PigFileParser misspelled

Will fix that :).

>> 5. Why does ParamLoader.Parse() throw IOException?

I am not sure, but it seems like all the parsers we have do that.

>> 6. In UtilFunctions.substitute, what does "replaced_line = 
>> replaced_line.replaceAll("\\\\\\$","\\$");" do?

The intent is to allow escaping of $identifier. I am not exactly sure why it 
requires so many escapes. I can try to see if there is a more intuitive way to 
do that.
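The backslash count comes from two layers of escaping stacked together. After Java unescapes the string literals, the regex is \\\$ (a literal backslash followed by a literal dollar sign) and the replacement is \$ (a literal dollar sign, because a bare $ in a replacement string starts a group reference such as $1). The net effect is that every \$ in the line becomes $, presumably after substitution has run so the unescaped $ is not substituted again. The sketch below shows that form next to two equivalents; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class DollarUnescape {
    private DollarUnescape() {}

    // The form used in UtilFunctions.substitute: regex \\\$ and replacement \$
    // once the Java string literals have been unescaped.
    public static String viaRegex(String line) {
        return line.replaceAll("\\\\\\$", "\\$");
    }

    // Same effect without regex: String.replace treats both arguments literally.
    public static String viaLiteral(String line) {
        return line.replace("\\$", "$");
    }

    // Regex form with the quoting done by the library instead of by hand.
    public static String viaQuoted(String line) {
        return line.replaceAll(Pattern.quote("\\$"), Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        String input = "price is \\$5 but $date is a parameter";
        // Each line prints: price is $5 but $date is a parameter
        System.out.println(viaRegex(input));
        System.out.println(viaLiteral(input));
        System.out.println(viaQuoted(input));
    }
}
```

The literal String.replace version is probably the most readable, since no regex or replacement syntax is involved.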

>> 7. Shouldn't logger be declared "private final Logger logger = 
>> Logger.getLogger("org.apache.pig.preprocessor.log");" (everything in one 
>> line) to make it consistent?

Will fix that.

>> 1. I prefer HashMap to Hashtable

Any particular reason?
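For context, the documented JDK differences are that Hashtable is a legacy class whose methods are all synchronized (a cost even in single-threaded code such as the preprocessor) and that it rejects null keys and values, while HashMap is unsynchronized and accepts them. Whether either matters for this patch is a judgment call; the snippet below, with a hypothetical MapChoice class, only demonstrates the null difference.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public final class MapChoice {
    private MapChoice() {}

    // Returns true if the map accepted a null value, false if it threw.
    public static boolean acceptsNullValue(Map<String, String> m) {
        try {
            m.put("date", null);
            return true;
        } catch (NullPointerException e) {
            return false; // Hashtable.put rejects null keys and values
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:   " + acceptsNullValue(new HashMap<>()));   // true
        System.out.println("Hashtable: " + acceptsNullValue(new Hashtable<>())); // false
    }
}
```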

>> 2. Pattern identifier in UtilFunctions.substitute can be made static; this 
>> makes it 1 microsecond faster :D

Will fix that :).
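A sketch of the static version, compiling the pattern once into a static final field instead of on every call. The regex is only a guess at what identifier matches (an unescaped $ followed by a name that starts with a letter or underscore, so Pig positional references such as $0 are not touched); the class StaticPatternSketch and the Map-based lookup are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class StaticPatternSketch {
    private StaticPatternSketch() {}

    // Compiled once when the class is loaded. Hypothetical pattern: a $ not
    // preceded by a backslash, followed by an identifier.
    private static final Pattern IDENTIFIER =
            Pattern.compile("(?<!\\\\)\\$([A-Za-z_][A-Za-z0-9_]*)");

    public static String substitute(String line, Map<String, String> params) {
        Matcher m = IDENTIFIER.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            // Unknown parameters are left untouched; quoteReplacement keeps any
            // $ or \ in the value from being read as replacement syntax.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap("date", "20080110");
        System.out.println(substitute("A = load '/data/$date';", params));
        // prints: A = load '/data/20080110';
    }
}
```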

I also realized that we don't have to require single-word parameters to be 
quoted. I will make that change in the parser, make the updates that you 
suggested, and submit a new patch.
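One way the relaxed rule could work is sketched below: a single-word value is taken as-is, and quotes are required only when the value contains whitespace. On the command line a POSIX shell already strips the quotes in date='20080110', so the rule mostly matters for text the preprocessor parses itself, such as declare statements. ParamArgSketch, parse, and the accepted forms are assumptions for illustration, not the parser change in the patch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public final class ParamArgSketch {
    private ParamArgSketch() {}

    /** Parses name=value, where value is a single word or a single-quoted string. */
    public static Map.Entry<String, String> parse(String arg) {
        int eq = arg.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected <param>=<val>: " + arg);
        }
        String name = arg.substring(0, eq).trim();
        String value = arg.substring(eq + 1).trim();
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1); // quoted: strip the quotes
        } else if (value.isEmpty() || value.chars().anyMatch(Character::isWhitespace)) {
            // Unquoted values must be exactly one non-empty word.
            throw new IllegalArgumentException("multi-word value must be quoted: " + arg);
        }
        return new SimpleEntry<>(name, value);
    }

    public static void main(String[] args) {
        System.out.println(parse("date=20080110"));     // date=20080110
        System.out.println(parse("date='20080110'"));   // date=20080110
        System.out.println(parse("msg='hello world'")); // msg=hello world
    }
}
```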



> parameterized Pig scripts
> -------------------------
>
>                 Key: PIG-58
>                 URL: https://issues.apache.org/jira/browse/PIG-58
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>         Attachments: PIG-58_v1.patch
>
>
> This feature has been requested by several users and would be very useful in 
> conjunction with streaming. The feature would allow a Pig script to include 
> parameters that are replaced at run time. For instance, if your script needs 
> to run daily over the previous day's data, you would be able to reuse the 
> same script and provide the date as a run-time parameter.
> Example:
> =======
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .....
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> =======
> (0) Substitution will only be supported for Pig script files.
> (1) Parameters are specified on the command line via the -param <param>=<val> 
> construct. Multiple parameters can be specified; they are applied to the 
> script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via 
> a declare statement:
> declare <param>=<value>
> (3) Within the script, a parameter name is enclosed in % signs (%<param>%). 
> \% can be used to escape a literal %.
> Implementation:
> ============
> Use a preprocessor to do the substitution. The preprocessor would be invoked 
> by Main before grunt is instantiated and would do the following:
> - create a new file in temp location
> - build a hash of parameters from the command line and declare statements
> - for each line in the original script
>   if this is a declare line, skip it
>   else, for each unescaped pattern %<identifier>%, look for a match in the 
>   hash and replace it if found. Write the line to the temp file.
> - pass the temp file to grunt.
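The steps above can be sketched roughly as follows, using the %name% placeholders and \% escape from the proposal. This is a hedged illustration, not the code in PIG-58_v1.patch: the class ParamPreprocessor, the exact declare syntax accepted, and returning lines in memory instead of writing a temp file are all simplifications. It also assumes command-line values take precedence over declare defaults, which is how a default is normally understood.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ParamPreprocessor {
    private ParamPreprocessor() {}

    // Assumed declare syntax: "declare <param>=<value>", optional trailing ';'.
    private static final Pattern DECLARE =
            Pattern.compile("^\\s*declare\\s+([A-Za-z_]\\w*)\\s*=\\s*(.*?)\\s*;?\\s*$");

    // Either an escaped \% or a %<identifier>% placeholder. Matching the escape
    // as its own token means the text after it is never read as a placeholder.
    private static final Pattern TOKEN = Pattern.compile("\\\\%|%([A-Za-z_]\\w*)%");

    public static List<String> process(List<String> script, Map<String, String> cmdLine) {
        // Build the parameter hash. Command-line values win; a declare line only
        // supplies a default for names the command line did not set.
        Map<String, String> params = new HashMap<>(cmdLine);
        for (String line : script) {
            Matcher d = DECLARE.matcher(line);
            if (d.matches()) {
                String value = d.group(2);
                if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
                    value = value.substring(1, value.length() - 1);
                }
                params.putIfAbsent(d.group(1), value);
            }
        }

        List<String> out = new ArrayList<>(); // stands in for the temp file
        for (String line : script) {
            if (DECLARE.matcher(line).matches()) {
                continue; // declare lines are consumed here, not passed to grunt
            }
            Matcher m = TOKEN.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String replacement;
                if (m.group(1) == null) {
                    replacement = "%"; // \% becomes a literal %
                } else {
                    String value = params.get(m.group(1));
                    replacement = (value != null) ? value : m.group(); // unknown: keep
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
                "declare date=20080101;",
                "A = load '/data/mydata/%date%';",
                "B = filter A by $0>'5';");
        Map<String, String> cmdLine = new HashMap<>();
        cmdLine.put("date", "20080110"); // as if from: pig -param date='20080110' myscript.pig
        for (String line : process(script, cmdLine)) {
            System.out.println(line);
        }
        // A = load '/data/mydata/20080110';
        // B = filter A by $0>'5';
    }
}
```

Doing the hash in a first pass means a declare line can appear anywhere in the script and still supply its default.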

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
