Hi James,
It looks like Pig is not finding the Hadoop configuration files on its
classpath when it starts up. When the configuration cannot be found,
fs.default.name falls back to its default of file:///, so Pig reads the
local file system instead of HDFS; that matches the "Connecting to
hadoop file system at: file:///" line in your log.
Assuming you have installed Hadoop 0.20.0 somewhere on your local
file system, say <hadoop installation directory>, please add the
following to your classpath:
export HADOOPDIR=<hadoop installation directory>/conf
export PIG_CLASSPATH=$PIG_HOME/pig.jar:$HADOOPDIR
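For example, if Hadoop were unpacked under /usr/local/hadoop-0.20.0 and
Pig under /usr/local/pig-0.5.0 (both paths are only illustrations;
substitute your own), the complete setup would look like this:
# point Pig at the directory holding the Hadoop *-site.xml files
export HADOOPDIR=/usr/local/hadoop-0.20.0/conf
export PIG_HOME=/usr/local/pig-0.5.0
export PIG_CLASSPATH=$PIG_HOME/pig.jar:$HADOOPDIR
# then start Grunt
$PIG_HOME/bin/pig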
This should solve your problem, provided the rest of your configuration
has been done as per the instructions on the Pig wiki.
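A quick way to verify it worked: when Grunt starts, the "Connecting to
hadoop file system at:" line should now show your namenode's hdfs://
URI rather than file:///.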
Thanks and regards,
Pratyush
James R. Leek wrote:
Hi, I seem to be having an odd problem with Pig; at least I haven't
found any documentation on it. I've been using Hadoop 0.20.1 to do some
parsing of my data, and I thought Pig might be a good tool to process
what comes out. I got Pig 0.5.0, which seemed to be working OK until
I tried to read from my HDFS. Pig only seems to be reading from my
local file system. (Well, actually it's NFS.)
Anyway, Pig starts up and says:
2009-11-23 23:12:28,799 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
but this is a lie. When I try to access a file from it (my HDFS is
mounted at /data/pig/dfs; /data is a local drive on each node in the
cluster), I get:
grunt> virus5 = load '/data/pig/dfs/virus_output/part-r-00000';
grunt> dump virus5;
2009-11-23 23:15:11,377 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2009-11-23 23:15:11,377 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2009-11-23 23:15:14,356 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2009-11-23 23:15:14,386 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-23 23:15:14,402 [Thread-5] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
2009-11-23 23:15:14,890 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2009-11-23 23:15:14,891 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2009-11-23 23:15:14,891 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2009-11-23 23:15:14,917 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in: "file:/tmp/temp1663096198/tmp782307025"
2009-11-23 23:15:14,917 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2009-11-23 23:15:14,923 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2100: file:/data/pig/dfs/virus_output/part-r-00000 does not
exist.
Details at logfile: /home/leek2/pig_1259046747727.log
This does work with Hadoop, however:
hadoop dfs -ls /data/pig/dfs/virus_output/part-r-00000
Found 1 items
-rw-r--r-- 1 leek2 supergroup 1151360535 2009-11-23 15:58
/data/pig/dfs/virus_output/part-r-00000
I can read from my local file system just fine, though. Pig does seem
to be connecting to the Hadoop cluster? Does anyone know what I'm
doing wrong?
Thanks,
Jim