Hi, I seem to be having an odd problem with Pig, or at least I haven't found any documentation on it. I've been using Hadoop 0.20.1 to do some parsing of my data, and I thought Pig might be a good tool to process the output. I got Pig 0.5.0, which seemed to work fine until I tried to read from my HDFS: Pig only seems to read from my local file system. (Well, actually it's NFS.)

Anyway, Pig starts up and says:

2009-11-23 23:12:28,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
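For context, here's roughly how I'm launching Pig. My understanding is that PIG_CLASSPATH has to include the Hadoop conf directory so Pig can pick up fs.default.name, and that without it Pig falls back to file:/// — but I may be wrong about that, and the HADOOP_HOME path below is just from my machine:

```shell
# Paths here are from my setup; adjust as needed.
export HADOOP_HOME=/opt/hadoop-0.20.1
# My understanding: Pig reads fs.default.name from the Hadoop config
# on its classpath; without this it seems to default to file:///
export PIG_CLASSPATH="$HADOOP_HOME/conf"
pig
```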

but this is a lie. When I try to access a file from it (my HDFS is mounted at /data/pig/dfs; /data is a local drive on each node in the cluster), I get:

grunt> virus5 = load '/data/pig/dfs/virus_output/part-r-00000';
grunt> dump virus5;
2009-11-23 23:15:11,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-11-23 23:15:11,377 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-11-23 23:15:14,356 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-11-23 23:15:14,386 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-23 23:15:14,402 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-11-23 23:15:14,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-11-23 23:15:14,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-11-23 23:15:14,891 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2009-11-23 23:15:14,917 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1663096198/tmp782307025"
2009-11-23 23:15:14,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2009-11-23 23:15:14,923 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: file:/data/pig/dfs/virus_output/part-r-00000 does not exist.
Details at logfile: /home/leek2/pig_1259046747727.log

The file does exist in HDFS, though; the hadoop command finds it fine:

hadoop dfs -ls /data/pig/dfs/virus_output/part-r-00000
Found 1 items
-rw-r--r-- 1 leek2 supergroup 1151360535 2009-11-23 15:58 /data/pig/dfs/virus_output/part-r-00000
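So Hadoop itself clearly sees the file. I assume Pig is supposed to get the namenode address from fs.default.name in the Hadoop conf, so I also checked that the property is actually set in my install (the conf path is just where my copy lives; on Hadoop 0.20.x I believe this lives in core-site.xml):

```shell
# Confirming fs.default.name is set in my Hadoop 0.20.1 conf;
# the path is from my install, yours may differ.
grep -B1 -A2 'fs.default.name' "$HADOOP_HOME/conf/core-site.xml"
```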


I can read from my local file system just fine, though. So Pig doesn't actually seem to be connecting to the Hadoop cluster? Does anyone know what I'm doing wrong?

Thanks,
Jim
