Hi, I seem to be having an odd problem with Pig, or at least I haven't found any documentation on it. I've been using Hadoop 0.20.1 to do some parsing of my data, and I thought Pig might be a good tool to process the output. I got Pig 0.5.0, which seemed to work fine until I tried to read from my HDFS: Pig only seems to read from my local file system. (Well, actually it's NFS.)

Anyway, Pig starts up and says:

2009-11-23 23:12:28,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
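For context, here's roughly how I'm launching Pig. My understanding is that PIG_CLASSPATH has to include the Hadoop conf directory so Pig can pick up fs.default.name, and that without it Pig falls back to file:/// — but I may be wrong about that, and the HADOOP_HOME path below is just from my machine:

```shell
# Paths here are from my setup; adjust as needed.
export HADOOP_HOME=/opt/hadoop-0.20.1
# My understanding: Pig reads fs.default.name from the Hadoop config
# on its classpath; without this it seems to default to file:///
export PIG_CLASSPATH="$HADOOP_HOME/conf"
pig
```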

but this is a lie. When I try to access a file from it (my HDFS is mounted at /data/pig/dfs; /data is a local drive on each node in the cluster), I get:

grunt> virus5 = load '/data/pig/dfs/virus_output/part-r-00000';
grunt> dump virus5;
2009-11-23 23:15:11,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-11-23 23:15:11,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-11-23 23:15:14,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-11-23 23:15:14,386 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-23 23:15:14,402 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-11-23 23:15:14,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-11-23 23:15:14,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-11-23 23:15:14,891 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2009-11-23 23:15:14,917 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1663096198/tmp782307025"
2009-11-23 23:15:14,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2009-11-23 23:15:14,923 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: file:/data/pig/dfs/virus_output/part-r-00000 does not exist.
Details at logfile: /home/leek2/pig_1259046747727.log

The file does exist in HDFS, though; the hadoop command finds it fine:

hadoop dfs -ls /data/pig/dfs/virus_output/part-r-00000
Found 1 items
-rw-r--r-- 1 leek2 supergroup 1151360535 2009-11-23 15:58 /data/pig/dfs/virus_output/part-r-00000
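So Hadoop itself clearly sees the file. I assume Pig is supposed to get the namenode address from fs.default.name in the Hadoop conf, so I also checked that the property is actually set in my install (the conf path is just where my copy lives; on Hadoop 0.20.x I believe this lives in core-site.xml):

```shell
# Confirming fs.default.name is set in my Hadoop 0.20.1 conf;
# the path is from my install, yours may differ.
grep -B1 -A2 'fs.default.name' "$HADOOP_HOME/conf/core-site.xml"
```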


I can read from my local file system just fine, though. So Pig doesn't actually seem to be connecting to the Hadoop cluster? Does anyone know what I'm doing wrong?

Thanks,
Jim
