[jira] Commented: (PIG-111) Configuration of Pig

Alan Gates (JIRA) Fri, 14 Mar 2008 13:18:05 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578917#action_12578917
 ]


Alan Gates commented on PIG-111:
--------------------------------

Stefan,

Sorry it took me so long to get back to this.  

I have some issues with this patch that I should have raised earlier.

1)  While I'm fine with saying that going forward .pigrc (bash style) is not 
our preferred method, we cannot remove it without warning now.  We have users 
who depend on it, so we need to give them some warning before it vanishes.  For 
now, if we find a .pigrc we can issue a warning that says it is deprecated 
and will not be supported at some future time.
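
To make the suggestion in 1) concrete, here is a minimal sketch of such a check.
The class name, method name, and warning text are all hypothetical (nothing like
this is in the patch), and pointing users at pig.properties as the replacement
is my assumption:

```java
import java.io.File;
import java.io.PrintStream;

/** Hypothetical helper: warns about a legacy .pigrc instead of silently ignoring it. */
final class PigrcDeprecationCheck {
    // Assumed wording; the real message would be decided when this is implemented.
    static final String WARNING =
        "WARNING: .pigrc is deprecated and will not be supported in a future release; "
        + "please move your settings to pig.properties.";

    /** Prints the warning and returns true if homeDir contains a .pigrc file. */
    static boolean warnIfPresent(File homeDir, PrintStream out) {
        File pigrc = new File(homeDir, ".pigrc");
        if (pigrc.isFile()) {
            out.println(WARNING);
            return true;
        }
        return false;
    }
}
```

A caller could run this once at startup, for example
PigrcDeprecationCheck.warnIfPresent(new File(System.getProperty("user.home")), System.err),
and then go on to read the .pigrc as it does today.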

2) Why did you remove HConfiguration?  I am fine with saying that a Properties 
object is the way we communicate with the backend.  But HConfiguration is not 
exposed outside of the hadoop-specific packages, and it just extends 
Properties and handles the translation between properties and hadoop 
configuration.  I don't see a problem with that.  Am I missing something?  As a 
general rule we should not remove classes unless there is a strong reason to do 
so.
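
For reference, the shape I have in mind in 2) is roughly the following. This is
not the actual HConfiguration code; the class and method names are hypothetical,
and a plain Map<String, String> stands in for the hadoop configuration object so
that the sketch compiles without any hadoop classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical stand-in for an HConfiguration-like class: a Properties subclass
 * that copies values in from, and back out to, a backend configuration.
 */
final class BackendProperties extends Properties {
    /** Copies every entry of the backend configuration into this Properties object. */
    BackendProperties(Map<String, String> backendConf) {
        for (Map.Entry<String, String> e : backendConf.entrySet()) {
            setProperty(e.getKey(), e.getValue());
        }
    }

    /** Translates the current string properties back into a backend configuration. */
    Map<String, String> toBackendConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String name : stringPropertyNames()) {
            conf.put(name, getProperty(name));
        }
        return conf;
    }
}
```

The rest of Pig only ever sees a Properties object; the translation stays
inside the backend-specific package.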

3) Currently, when a hadoop JobConf object is constructed, if a hadoop-site.xml 
file is in the class path, the values from that file are picked up and used as 
part of defining the parameters for the JobConf.  Your change in 
HExecutionEngine to set the namenode explicitly to local if it had not been set 
(around line 115) interferes with this.  If a user defines his cluster in 
hadoop-site.xml instead of pig.properties and you then explicitly set the 
cluster to local (because the hadoop-site.xml won't get picked up until the 
JobConf is constructed later), this causes his map reduce job to try to run 
locally (as the reading of the hadoop-site.xml doesn't overwrite the already 
set values).  I'm not clear that this is something we want to change.  Hadoop 
will always have configuration values that users will want to set, and we 
don't want to import those into our config files (that is, we want to support 
using hadoop-site.xml).  Perhaps the solution would be to not default to local 
if exectype is set to mapreduce.
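
A rough sketch of what I mean in 3), under stated assumptions: the class and
method names are hypothetical, "fs.default.name" is assumed to be the property
holding the namenode, and mergeSiteValues only models the behavior described
above (values read later from hadoop-site.xml do not overwrite values that are
already set). It is not the actual HExecutionEngine or JobConf code:

```java
import java.util.Properties;

/** Hypothetical sketch: default to local only when exectype is not mapreduce. */
final class ClusterDefaults {
    // Assumed property name for the namenode setting.
    static final String NAMENODE_KEY = "fs.default.name";

    /** Sets the namenode to "local" only if it is unset and we are not in mapreduce mode. */
    static void applyDefaults(Properties props, String execType) {
        boolean mapreduce = "mapreduce".equalsIgnoreCase(execType);
        if (!mapreduce && props.getProperty(NAMENODE_KEY) == null) {
            props.setProperty(NAMENODE_KEY, "local");
        }
    }

    /** Models the later hadoop-site.xml read: fills in missing values, never overwrites. */
    static void mergeSiteValues(Properties props, Properties site) {
        for (String name : site.stringPropertyNames()) {
            if (props.getProperty(name) == null) {
                props.setProperty(name, site.getProperty(name));
            }
        }
    }
}
```

With this ordering, a mapreduce user who only has a hadoop-site.xml still ends
up with the cluster defined there, while a local run with nothing configured
still defaults to local.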

> Configuration of Pig
> --------------------
>
>                 Key: PIG-111
>                 URL: https://issues.apache.org/jira/browse/PIG-111
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Craig Macdonald
>            Assignee: Stefan Groschupf
>         Attachments: after.png, before.png, config.patch.1502, 
> PIG-111-v04.patch, PIG-111-v05.patch, PIG-111-v06.patch, 
> PIG-111_v_3_sg.patch, PIG-111_v_7_r633244M.patch, PIG-111_v_8_r633244M.patch, 
> PIG-93-v01.patch, PIG-93-v02.patch
>
>
> This JIRA discusses issues relating to the configuration of Pig.
> Use cases:
>  
> 1. I want to configure Pig programmatically from Java
>  Motivation: pig can be embedded from another Java program, and configuration 
> should be accessible to be set by the client code
> 2. I want to configure Pig from the command line
> 3. I want to configure Pig from the Pig shell (Grunt)
> 4. I want Pig to remember my configuration for every Pig session
>  Motivation: to save me typing in some configuration stuff every time.
> 5. I want Pig to remember my configuration for this script.
>  Motivation: I must use a common configuration for 50% of my Pig scripts - 
> can I share this configuration between scripts?
> Current Status: 
>  * Pig uses System properties for some configuration
>  * A configuration properties object in PigContext is not used.
>  * pigrc can contain properties
>  * Configuration properties cannot be set from Grunt
> Proposed solutions to use cases:
> 1. Configuration should be set in PigContext, and accessible from client code.
> 2. System properties are copied to PigContext, or can be specified on the 
> command line (duplication with System properties)
> 3. Allow configuration properties to be set using the "set" command in Grunt
> 4. Pigrc can contain properties. Is this enough, or can other configuration 
> stuff be set, e.g. aliases, imports, etc.?
> 5. Add an include directive to pig, to allow a shared configuration/Pig 
> script to be included.
> Connections to Shell scripting: 
>  * The source command in Bash allows another bash script file to be included 
> - this allows shared variables to be set in one file shared between a set of 
> scripts.
>  * Aliases can be set, according to user preferences, etc.
>  * All this can be done in your .bashrc file
> Issues: 
>  * What happens when you change a property after the property has been read?
>  * Can Grunt read a pigrc containing various statements etc before the 
> PigServer is completely configured?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
