Hi,
I have an EC2 box set up with Pig 0.8.1, which runs my jobs fine in local mode.
Now I want to configure the NameNode (NN) and JobTracker (JT) so that the job
goes to the EMR cluster I've spun up.
I have a local pigconf directory with the Hadoop XML files, and I have pointed
both HADOOP_CONF_DIR and PIG_CLASSPATH at it.

In core-site.xml I have:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.116.83.74:9000</value>
  </property>


In mapred-site.xml I have:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.116.83.74:9001</value>
  </property>
</configuration>
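
The two settings above can be read back from the files to confirm what the
client will pick up. This is a minimal sketch, assuming each file on disk is a
complete <configuration> document; the XML is embedded as strings here so the
example is self-contained, and read_properties is a hypothetical helper, not
part of Pig or Hadoop.

```python
import xml.etree.ElementTree as ET

# Sample contents mirroring the two fragments quoted above. Assumption: the
# real files wrap their properties in a single <configuration> element.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.116.83.74:9000</value>
  </property>
</configuration>"""

MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.116.83.74:9001</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Merge both files into one view, as a client reading the directory would.
settings = {}
settings.update(read_properties(CORE_SITE))
settings.update(read_properties(MAPRED_SITE))

for key in ("fs.default.name", "mapred.job.tracker"):
    print(key, "=", settings.get(key, "<not set>"))
```

To run it against the real files, the embedded strings could be replaced with
the contents of core-site.xml and mapred-site.xml from the pigconf directory.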


Now when Pig tries to connect, I get:
2011-12-01 16:10:58,009 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/mashlogic/ayon/pigconf/pig_1322784657959.log
2011-12-01 16:10:58,950 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.116.83.74:9000
2011-12-01 16:10:59,814 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage


The log file says:

Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to /10.116.83.74:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
at org.apache.hadoop.ipc.Client.call(Client.java:1110)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:815)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:724)
================================================================================

My EMR cluster is running Hive jobs just fine, so if I can get it to run my
Pig jobs too, I'll be happy.
 
-Ayon