Hi Sanjay,

You can specify a 'hadoop.job.ugi' property for your mapreduce job.
e.g., hadoop.job.ugi=username,groupname

Hope this helps.

Regards,
- Youngwoo

2010/10/21 Kaluskar, Sanjay <[email protected]>

> I am trying to do the same (submitting a Pig script to a remote cluster
> from a Windows machine), and the job gets submitted after setting the
> following in pig.properties:
>
> fs.default.name=hdfs://<node>:54310
> mapred.job.tracker=hdfs://<node>:54510
>
> However, my script fails because it looks for inputs under /user/DrWho.
> Is it possible to specify the Hadoop cluster user in pig.properties? How
> does one control it? Where is DrWho coming from?
>
> Thanks,
> -sanjay
>
> -----Original Message-----
> From: Gerrit Jansen van Vuuren [mailto:[email protected]]
> Sent: Sunday, October 17, 2010 6:47 PM
> To: [email protected]
> Subject: RE: accessing remote cluster with Pig
>
> Glad it worked for you :)
>
> I use the standard Apache Pig distributions.
> There are several places where environment variables can be set, and I
> have no idea which one Cloudera uses, but here is a list:
>
> /etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here,
> which set the home variables and are managed by Puppet)
> /etc/bash.bashrc (not a good idea to set them here)
> $HOME/.bashrc (quick for users who don't have root permissions, but not
> for production)
> $PIG_HOME/conf/pig-env.sh (standard in all Hadoop-related projects;
> gets sourced by $PIG_HOME/bin/pig)
>
> To see which variables your Pig is picking up, you can manually insert
> the line echo "home:$PIG_HOME conf:$PIG_CONF_DIR" into the
> $PIG_HOME/bin/pig file just before it calls java.
>
> Cheers,
> Gerrit
>
> -----Original Message-----
> From: Anze [mailto:[email protected]]
> Sent: Sunday, October 17, 2010 7:49 AM
> To: [email protected]
> Subject: Re: accessing remote cluster with Pig
>
> Gerrit, thank you for your answer! It has pointed me in the right
> direction.
>
> It looks like Pig (at least mine) ignores PIG_HOME.
> But with your help I was able to debug a bit further:
> -----
> $ find / -name 'pig.properties'
> /etc/pig/conf.dist/pig.properties
> /etc/pig/conf/pig.properties
> /usr/lib/pig/example-confs/conf.default/pig.properties
> /usr/lib/pig/conf/pig.properties
> -----
>
> I changed /usr/lib/pig/conf/pig.properties and bingo - this is what my
> Pig uses.
>
> So while the Cloudera packaging creates /etc/pig/conf/pig.properties
> (the "Debian way"), that file is not used at all. And it probably
> ignores the environment variables too.
>
> Thanks again! :)
>
> Anze
>
> On Sunday 17 October 2010, Gerrit Jansen van Vuuren wrote:
> > Hi,
> >
> > Pig configuration is in the file $PIG_HOME/conf/pig.properties.
> >
> > The two parameters that tell Pig where to find the namenode and job
> > tracker are:
> >
> > E.g. (assuming you're using the default ports):
> >
> > ----[ $PIG_HOME/conf/pig.properties ]---------------
> >
> > fs.default.name=hdfs://<namenode url>:8020/
> > mapred.job.tracker=<jobtracker url>:8021
> >
> > --------------
> >
> > With these properties set you don't need to specify pig -x mapreduce;
> > just pig is enough.
> >
> > Cheers,
> > Gerrit
> >
> > -----Original Message-----
> > From: Anze [mailto:[email protected]]
> > Sent: Saturday, October 16, 2010 9:53 PM
> > To: [email protected]
> > Subject: accessing remote cluster with Pig
> >
> > Hi again! :)
> >
> > I am trying to run Pig on a local machine, but I want it to connect
> > to a remote cluster.
> > I can't make it use my settings - whatever I do, I get this:
> > -----
> > $ pig -x mapreduce
> > 10/10/16 22:17:43 INFO pig.Main: Logging error messages to:
> > /home/pigtest/conf/pig_1287260263699.log
> > 2010-10-16 22:17:43,896 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> > Connecting to hadoop file system at: file:///
> > grunt>
> > -----
> >
> > I have copied the hadoop settings files (/etc/hadoop/conf/*) from the
> > remote cluster's namenode to /home/pigtest/conf/ and exported
> > PIG_CLASSPATH, PIGDIR, HADOOP_CLASSPATH, ... I have also tried
> > changing /etc/pig/conf/pig.configuration (I even wrote some free text
> > there so it would at least give me an error message) - nothing. It
> > still connects to file:/// and still doesn't display a message about a
> > jobtracker:
> > -----
> > $ export HADOOPDIR=/etc/hadoop/conf
> > $ export PIG_PATH=/etc/pig/conf
> > $ export PIG_CLASSPATH=$HADOOPDIR
> > $ export PIG_HADOOP_VERSION=0.20.2
> > $ export PIG_HOME="/usr/lib/pig"
> > $ export PIG_CONF_DIR="/etc/pig/"
> > $ export PIG_LOG_DIR="/var/log/pig"
> > $ pig -x mapreduce
> > 10/10/16 22:32:34 INFO pig.Main: Logging error messages to:
> > /home/pigtest/conf/pig_1287261154272.log
> > 2010-10-16 22:32:34,471 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> > Connecting to hadoop file system at: file:///
> > grunt>
> > -----
> >
> > I am guessing I am doing something fundamentally wrong. How do I
> > change Pig's settings?
> >
> > More info: using the Cloudera package hadoop-pig from CDH3b3
> > (0.7.0+16-1~lenny-cdh3b3). I would appreciate some pointers.
> >
> > Kind regards,
> >
> > Anze
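
[Editor's sketch] Putting the thread's answers together - Gerrit's fs.default.name / mapred.job.tracker properties plus Youngwoo's hadoop.job.ugi override - the resulting pig.properties can be sketched as below. The hostnames are placeholders, not values from the thread; the script writes to a demo path under /tmp, whereas on the Cloudera layout Anze found, the file Pig actually reads is /usr/lib/pig/conf/pig.properties.

```shell
#!/bin/sh
# Sketch: build the pig.properties suggested in the thread.
# Hostnames below are placeholders; copy the result over
# /usr/lib/pig/conf/pig.properties (the file this Cloudera Pig reads).
PIG_PROPS="${TMPDIR:-/tmp}/pig.properties.demo"   # demo path only

cat > "$PIG_PROPS" <<'EOF'
# remote cluster endpoints (placeholder hosts, default ports from the thread)
fs.default.name=hdfs://namenode.example.com:8020/
mapred.job.tracker=jobtracker.example.com:8021
# run jobs as this remote user/group instead of the local login
# (avoids the /user/DrWho problem from the thread)
hadoop.job.ugi=username,groupname
EOF

grep -c '=' "$PIG_PROPS"   # prints 3: one line per key=value pair
```

The hadoop.job.ugi=username,groupname line is verbatim from Youngwoo's reply; in a real setup both values would come from the remote cluster's accounts.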

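[Editor's sketch] Gerrit's debugging tip - echoing home:$PIG_HOME conf:$PIG_CONF_DIR from inside $PIG_HOME/bin/pig - can also be done non-invasively from the shell before launching Pig. The default paths below are the Cloudera ones mentioned in the thread and are assumptions; other installs may differ.

```shell
#!/bin/sh
# Sketch: print, before launching Pig, the home/conf locations the
# wrapper script would see - the same line Gerrit suggests inserting
# into $PIG_HOME/bin/pig. Defaults are assumed Cloudera paths.
PIG_HOME="${PIG_HOME:-/usr/lib/pig}"
PIG_CONF_DIR="${PIG_CONF_DIR:-$PIG_HOME/conf}"

echo "home:$PIG_HOME conf:$PIG_CONF_DIR"

# show which pig.properties would win (the one under PIG_CONF_DIR),
# or say so if it is missing on this machine
ls "$PIG_CONF_DIR/pig.properties" 2>/dev/null \
  || echo "no pig.properties in $PIG_CONF_DIR"
```

If the echoed paths are not the ones you edited, that explains the file:/// symptom from Anze's original message: Pig is reading some other pig.properties.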