Hi James,
It looks like Pig is not finding the Hadoop configuration files on its
classpath when it starts up. When the configuration cannot be found,
fs.default.name falls back to its default of file:///, so Pig reads the
local file system instead of HDFS; that matches the "Connecting to
hadoop file system at: file:///" line in your log.
Assuming you have installed Hadoop 0.20.0 somewhere on your local
file system, say <hadoop installation directory>, please add the
following to your classpath:
export HADOOPDIR=<hadoop installation directory>/conf
export PIG_CLASSPATH=$PIG_HOME/pig.jar:$HADOOPDIR
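For example, if Hadoop were unpacked under /usr/local/hadoop-0.20.0 and
Pig under /usr/local/pig-0.5.0 (both paths are only illustrations;
substitute your own), the complete setup would look like this:
# point Pig at the directory holding the Hadoop *-site.xml files
export HADOOPDIR=/usr/local/hadoop-0.20.0/conf
export PIG_HOME=/usr/local/pig-0.5.0
export PIG_CLASSPATH=$PIG_HOME/pig.jar:$HADOOPDIR
# then start Grunt
$PIG_HOME/bin/pig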
This should solve your problem, provided the rest of your configuration
has been done as per the instructions on the Pig wiki.
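A quick way to verify it worked: when Grunt starts, the "Connecting to
hadoop file system at:" line should now show your namenode's hdfs://
URI rather than file:///.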
Thanks and regards,
Pratyush
James R. Leek wrote:
Hi, I seem to be having an odd problem with Pig; at least I haven't
found any documentation on it. I've been using Hadoop 0.20.1 to do some
parsing of my data, and I thought Pig might be a good tool to process
what comes out. I got Pig 0.5.0, which seemed to be working OK until
I tried to read from my HDFS. Pig only seems to be reading from my
local file system. (Well, actually it's NFS.)
Anyway, Pig starts up and says:
2009-11-23 23:12:28,799 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
but this is a lie. When I try to access a file from it (my HDFS is
mounted at /data/pig/dfs; /data is a local drive on each node in the
cluster), I get:
grunt> virus5 = load '/data/pig/dfs/virus_output/part-r-00000';
grunt> dump virus5;
2009-11-23 23:15:11,377 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2009-11-23 23:15:11,377 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2009-11-23 23:15:14,356 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2009-11-23 23:15:14,386 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2009-11-23 23:15:14,402 [Thread-5] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
2009-11-23 23:15:14,890 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2009-11-23 23:15:14,891 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2009-11-23 23:15:14,891 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2009-11-23 23:15:14,917 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in: "file:/tmp/temp1663096198/tmp782307025"
2009-11-23 23:15:14,917 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2009-11-23 23:15:14,923 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2100: file:/data/pig/dfs/virus_output/part-r-00000 does not
exist.
Details at logfile: /home/leek2/pig_1259046747727.log
This does work with Hadoop, however:
hadoop dfs -ls /data/pig/dfs/virus_output/part-r-00000
Found 1 items
-rw-r--r-- 1 leek2 supergroup 1151360535 2009-11-23 15:58
/data/pig/dfs/virus_output/part-r-00000
I can read from my local file system just fine, though. Pig does seem
to be connecting to the Hadoop cluster? Does anyone know what I'm
doing wrong?
Thanks,
Jim