I am trying to do the same (submitting a Pig script to a remote cluster from a Windows machine), and the job gets submitted after setting the following in pig.properties:
fs.default.name=hdfs://<node>:54310
mapred.job.tracker=hdfs://<node>:54510

However, my script fails because it looks for inputs under /user/DrWho. Is it possible to specify the Hadoop cluster user in pig.properties? How does one control it? Where is DrWho coming from?

Thanks,
-sanjay

-----Original Message-----
From: Gerrit Jansen van Vuuren [mailto:[email protected]]
Sent: Sunday, October 17, 2010 6:47 PM
To: [email protected]
Subject: RE: accessing remote cluster with Pig

Glad it worked for you :)

I use the standard Apache Pig distributions. There are several places where environment variables can be changed and set, and I have no idea which one Cloudera uses, but here is a list:

/etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here, which set the home variables and are managed by Puppet)
/etc/bash.bashrc (not a good idea to set it here)
$HOME/.bashrc (quick for users who don't have root permissions, but not for production)
$PIG_HOME/conf/pig-env.sh (standard in all Hadoop-related projects; gets sourced by $PIG_HOME/bin/pig)

To see which variables your Pig is picking up, you can manually insert the line

echo "home:$PIG_HOME conf:$PIG_CONF_DIR"

into the $PIG_HOME/bin/pig file just before it calls java.

Cheers,
Gerrit

-----Original Message-----
From: Anze [mailto:[email protected]]
Sent: Sunday, October 17, 2010 7:49 AM
To: [email protected]
Subject: Re: accessing remote cluster with Pig

Gerrit, thank you for your answer! It has pointed me in the right direction.

It looks like Pig (at least mine) ignores PIG_HOME. But with your help I was able to debug a bit further:
-----
$ find / -name 'pig.properties'
/etc/pig/conf.dist/pig.properties
/etc/pig/conf/pig.properties
/usr/lib/pig/example-confs/conf.default/pig.properties
/usr/lib/pig/conf/pig.properties
-----
I have changed /usr/lib/pig/conf/pig.properties and bingo - this is what my Pig uses. So while Cloudera packaging makes /etc/pig/conf/pig.properties (the "Debian way"), it is not used at all.
And it probably ignores the environment vars too.

Thanks again! :)

Anze

On Sunday 17 October 2010, Gerrit Jansen van Vuuren wrote:
> Hi,
>
> Pig configuration is in the file: $PIG_HOME/conf/pig.properties
>
> The two parameters that tell Pig where to find the namenode and job tracker
> are:
>
> E.g. (assuming you're using the default ports)
>
> ----[ $PIG_HOME/conf/pig.properties ]---------------
>
> fs.default.name=hdfs://<namenode url>:8020/
> mapred.job.tracker=<jobtracker url>:8021
>
> --------------
>
> Having these properties you don't need to specify pig -x mapreduce, just
> pig is enough.
>
>
> Cheers,
> Gerrit
>
> -----Original Message-----
> From: Anze [mailto:[email protected]]
> Sent: Saturday, October 16, 2010 9:53 PM
> To: [email protected]
> Subject: accessing remote cluster with Pig
>
> Hi again! :)
>
> I am trying to run Pig on a local machine, but I want it to connect to a
> remote cluster. I can't make it use my settings - whatever I do, I get
> this:
> -----
> $ pig -x mapreduce
> 10/10/16 22:17:43 INFO pig.Main: Logging error messages to:
> /home/pigtest/conf/pig_1287260263699.log
> 2010-10-16 22:17:43,896 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> grunt>
> -----
>
> I have copied the hadoop settings files (/etc/hadoop/conf/*) from the
> remote cluster's namenode to /home/pigtest/conf/ and exported PIG_CLASSPATH,
> PIGDIR, HADOOP_CLASSPATH,... I have also tried changing
> /etc/pig/conf/pig.configuration (even wrote there some free text so it
> would at least give me an error message) - nothing.
> It still connects to file:///
> and it still doesn't display a message about a jobtracker:
> -----
> $ export HADOOPDIR=/etc/hadoop/conf
> $ export PIG_PATH=/etc/pig/conf
> $ export PIG_CLASSPATH=$HADOOPDIR
> $ export PIG_HADOOP_VERSION=0.20.2
> $ export PIG_HOME="/usr/lib/pig"
> $ export PIG_CONF_DIR="/etc/pig/"
> $ export PIG_LOG_DIR="/var/log/pig"
> $ pig -x mapreduce
> 10/10/16 22:32:34 INFO pig.Main: Logging error messages to:
> /home/pigtest/conf/pig_1287261154272.log
> 2010-10-16 22:32:34,471 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> grunt>
> -----
>
> I am guessing I am doing something fundamentally wrong. How do I change
> Pig's settings?
>
> More info: using Cloudera package hadoop-pig from CDH3b3
> (0.7.0+16-1~lenny-cdh3b3). I would appreciate some pointers.
>
> Kind regards,
>
> Anze
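
For anyone retracing the debugging steps in this thread (finding every pig.properties on the machine and checking what each one says about the namenode and job tracker), here is a rough shell sketch. It is only an illustration: show_pig_settings is a made-up helper name, not part of Pig, and the directories in the example call are simply the ones from the find output quoted above, so they may well differ on other installs. It does not tell you which copy Pig actually loads; it only makes the copies easy to compare.

```shell
#!/bin/sh
# show_pig_settings is a hypothetical helper (not part of Pig).
# For every pig.properties under the given directories, print the file
# name and any uncommented fs.default.name / mapred.job.tracker lines.
show_pig_settings() {
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -name 'pig.properties' -type f 2>/dev/null |
        while IFS= read -r file; do
            echo "== $file"
            grep -E '^(fs\.default\.name|mapred\.job\.tracker)=' "$file" \
                || echo "(neither property set)"
        done
    done
}

# Example call; these paths are assumptions taken from the thread.
show_pig_settings /etc/pig /usr/lib/pig
```

A copy that prints "(neither property set)" would leave Pig on its defaults if that is the copy being read, which matches the "Connecting to hadoop file system at: file:///" symptom described in the original question.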
