Dear users,
I'm new to Hadoop and Pig and I really feel I need some help...
I managed to set up a Hadoop cluster on two Ubuntu boxes. All Hadoop daemons
start without any problems.
I can also successfully copy files to the HDFS from the local filesystem.
The problem is that I can't run Pig in mapreduce mode. I can run it in local mode
though...
Every time I try to run an example script (from the Pig wiki examples), I get
this:
2009-12-30 20:50:08,872 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
2009-12-30 20:50:09,037 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Initializing JVM Metrics with processName=JobTracker, sessionId=
Furthermore, from the grunt shell I seem to be connected to the local filesystem:
grunt> ls shows me the local files, not the HDFS ones.
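My understanding is that in mapreduce mode Pig should pick up the same default
filesystem that Hadoop itself uses, i.e. the fs.default.name entry from the Hadoop
configuration, something along these lines (the host and port are the same ones I
use in the load statements further down):

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>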
I don't know how to make the settings in the file
<PIG_HOME>/conf/pig.properties (please note also that I created this file
manually). Which environment variables (PIG_CLASSPATH, HADOOPDIR, others?)
should I set there? Should it be this way:
<property><name>....</name><value>....</value></property>
or this way: export <variable_name>=value ?
Any example concerning this file would be highly appreciated, as I haven't found
any so far.
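To make the question concrete, these are the two forms I mean (PIG_CLASSPATH and
HADOOPDIR are simply the names I have seen mentioned, and /path/to/hadoop/conf is
a placeholder for wherever the Hadoop configuration directory lives; I don't know
which form, if either, pig.properties expects):

XML-style, as in the Hadoop *-site.xml files:
<property><name>HADOOPDIR</name><value>/path/to/hadoop/conf</value></property>

or shell-style, as in hadoop-env.sh:
export PIG_CLASSPATH=/path/to/hadoop/conf
export HADOOPDIR=/path/to/hadoop/conf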
I also tried to change Pig's execution mode using the command 'pig -x mapreduce',
but I got this message in bash: pig: invalid option -- 'x'
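For completeness, here is the exact exchange at the shell prompt (assuming the
'pig' being run is the launcher script from the Pig distribution's bin directory,
which is what I intend):

$ pig -x mapreduce
pig: invalid option -- 'x'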
You can find below the full error stack I got when I tried to access a file in
HDFS through the grunt shell.
My commands in the grunt shell:
grunt> A= load 'hdfs://master:54310/id.out';
grunt> dump A;
(I got the same error below using this command, which gives the full path to the
file in HDFS:
grunt> A= load 'hdfs://master:54310/user/hadoop/id.out';
grunt> dump A; )
The error stack:
2009-12-30 21:11:15,266 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
- Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized
2009-12-30 21:11:15,268 [Thread-21] WARN org.apache.hadoop.mapred.JobClient -
Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2009-12-30 21:11:20,267 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2009-12-30 21:11:20,267 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Map reduce job failed
2009-12-30 21:11:20,267 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:173)
at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:67)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:189)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441)
2009-12-30 21:11:20,268 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
java.io.IOException: Unable to open iterator for alias: A [Job terminated with
anomalous status FAILED]
at org.apache.pig.PigServer.openIterator(PigServer.java:410)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
at org.apache.pig.Main.main(Main.java:282)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
... 6 more
2009-12-30 21:11:20,268 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
Unable to open iterator for alias: A [Job terminated with anomalous status
FAILED]
2009-12-30 21:11:20,268 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
java.io.IOException: Unable to open iterator for alias: A [Job terminated with
anomalous status FAILED]
If a more experienced user can figure out what the problem is, I would be
grateful!
Regards,
Anastasia Th.