I haven't used the Cloudera distribution before. Pig bundles the Apache Hadoop 0.20.2 client library. If Cloudera made changes to Hadoop, that could be the issue.

One thing you can try is to build hadoop20.jar yourself (see http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt) and put it in Pig's lib directory, replacing the original hadoop20.jar.
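
The replacement step can be sketched roughly as below. This is only an illustration, not something taken from the howto: the function name replace_hadoop_jar and the example paths are made up, and it assumes you have already built a jar by following the link above. It keeps a copy of the bundled jar so the change can be undone.

```shell
#!/bin/sh
# replace_hadoop_jar NEW_JAR PIG_LIB_DIR   (hypothetical helper name)
# Backs up PIG_LIB_DIR/hadoop20.jar once, then copies NEW_JAR over it.
replace_hadoop_jar() {
    new_jar=$1
    lib_dir=$2
    target="$lib_dir/hadoop20.jar"

    # Refuse to continue if the rebuilt jar is missing.
    [ -f "$new_jar" ] || { echo "missing: $new_jar" >&2; return 1; }

    # Keep the bundled jar as hadoop20.jar.orig; never overwrite an existing backup.
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp "$target" "$target.orig" || return 1
    fi

    cp "$new_jar" "$target"
}

# Example call (both paths are placeholders):
# replace_hadoop_jar /path/to/rebuilt/hadoop20.jar "$PIG_HOME/lib"
```

To roll back, copy hadoop20.jar.orig over hadoop20.jar again.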

Daniel

felix gao wrote:
Daniel,

No, I am using 0.20.2 from Cloudera.
Here are all the jars under Pig's lib:
$ ls ~/pig-0.7.0/lib
automaton.jar  hadoop-LICENSE.txt  hadoop-lzo.jar  hadoop18.jar  hadoop20.jar  hbase-0.20.0-test.jar  hbase-0.20.0.jar  jdiff  zookeeper-hbase-1329.jar

$ ls $HADOOP_HOME
CHANGES.txt  build.xml      hadoop-0.20.2+737-ant.jar       hadoop-ant-0.20.2+737.jar       hadoop-examples.jar          ivy      webapps
LICENSE.txt  cloudera       hadoop-0.20.2+737-core.jar      hadoop-ant.jar                  hadoop-test-0.20.2+737.jar   ivy.xml
NOTICE.txt   conf           hadoop-0.20.2+737-examples.jar  hadoop-core-0.20.2+737.jar      hadoop-test.jar              lib
README.txt   contrib        hadoop-0.20.2+737-test.jar      hadoop-core.jar                 hadoop-tools-0.20.2+737.jar  logs
bin          example-confs  hadoop-0.20.2+737-tools.jar     hadoop-examples-0.20.2+737.jar  hadoop-tools.jar             pids


$ ls  $HADOOP_HOME/lib
aspectjrt-1.6.5.jar           commons-logging-api-1.0.4.jar        jackson-mapper-asl-1.5.2.jar  junit-4.5.jar                       servlet-api-2.5-6.1.14.jar
aspectjtools-1.6.5.jar        commons-net-1.4.1.jar                jasper-compiler-5.5.12.jar    kfs-0.2.2.jar                       slf4j-api-1.4.3.jar
commons-cli-1.2.jar           core-3.1.1.jar                       jasper-runtime-5.5.12.jar     kfs-0.2.LICENSE.txt                 slf4j-log4j12-1.4.3.jar
commons-codec-1.4.jar         hadoop-fairscheduler-0.20.2+737.jar  jdiff                         log4j-1.2.15.jar                    xmlenc-0.52.jar
commons-daemon-1.0.1.jar      hadoop-lzo-0.4.6.jar                 jets3t-0.6.1.jar              mockito-all-1.8.2.jar
commons-el-1.0.jar            hsqldb-1.8.0.10.LICENSE.txt          jetty-6.1.14.jar              mysql-connector-java-5.0.8-bin.jar
commons-httpclient-3.0.1.jar  hsqldb-1.8.0.10.jar                  jetty-util-6.1.14.jar         native
commons-logging-1.0.4.jar     jackson-core-asl-1.5.2.jar           jsp-2.1                       oro-2.0.8.jar

Please tell me how to get this working with Pig.

Thanks,

Felix






On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai <[email protected]> wrote:

It looks like the Hadoop client jar does not match the version on the server side. Are
you using Hadoop 0.20.2 from Apache?
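
One rough way to check this is to read the version out of the core jar's file name on the server side and compare it with the stock 0.20.2 client that Pig bundles. The sketch below is an illustration only: the helper names jar_version and is_stock_version are made up, and it assumes the jar follows one of the two naming patterns seen in this thread's directory listing. A file name says nothing about RPC compatibility by itself; a suffix such as +737 only hints that the build carries extra patches.

```shell
#!/bin/sh
# jar_version JAR   (hypothetical helper name)
# Prints the version embedded in a Hadoop core jar file name, e.g.
#   hadoop-core-0.20.2+737.jar  or  hadoop-0.20.2+737-core.jar
# Returns 1 for names without a version, such as hadoop-core.jar.
jar_version() {
    name=$(basename "$1" .jar)
    case $name in
        hadoop-core-*) echo "${name#hadoop-core-}" ;;
        hadoop-*-core) v=${name#hadoop-}; echo "${v%-core}" ;;
        *) return 1 ;;
    esac
}

# is_stock_version JAR EXPECTED   (hypothetical helper name)
# Succeeds only when the jar's version string is exactly EXPECTED.
is_stock_version() {
    [ "$(jar_version "$1")" = "$2" ]
}

# Example call (the path is taken from the listing in this thread):
# is_stock_version "$HADOOP_HOME/hadoop-core-0.20.2+737.jar" 0.20.2 \
#     || echo "server jar is not stock 0.20.2"
```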

Daniel

-----Original Message-----
From: felix gao
Sent: Thursday, December 09, 2010 5:48 PM
To: [email protected]
Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage


I keep seeing a "Failed to create DataStorage" error when I try to run Pig:

$ java -cp pig-0.7.0-core.jar:$HADOOP_CONF_DIR org.apache.pig.Main -x mapreduce
10/12/09 20:35:31 INFO pig.Main: Logging error messages to: /home/testpig/pig-0.7.0/pig_1291944931735.log
2010-12-09 20:35:31,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2010-12-09 20:35:32,333 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

$ cat pig_1291944931735.log
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:216)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:126)
at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
at org.apache.pig.PigServer.<init>(PigServer.java:184)
at org.apache.pig.PigServer.<init>(PigServer.java:173)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:354)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 8 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

If I run java -cp pig-0.7.0-core.jar org.apache.pig.Main -x mapreduce instead
(without $HADOOP_CONF_DIR on the classpath), I can at least see the grunt shell.

However, when I use hadoop commands directly:
$ hadoop fs -ls
Found 1 items
-rw-r--r--   1 testpig supergroup     454557 2010-12-09 19:31 /user/testpig/access_log.2010-08-30-23-01.lzo

everything seems to connect to HDFS fine.

My environment has the following settings:
PIG_HOME=/home/testpig/pig-0.7.0
HADOOP_HOME=/usr/lib/hadoop-0.20    (cloudera distribution)
HADOOP_CONF_DIR=/usr/lib/hadoop-0.20/conf
JAVA_HOME=/usr/java/default

pig-env.sh has the following settings:
export PIG_OPTS="$PIG_OPTS -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
export PIG_CLASSPATH=$PIG_CLASSPATH:/home/testpig/hadoop-lzo.jar:/home/testpig/elephant-bird.jar:/home/testpig/elephant-bird/lib/*
export PIG_HADOOP_VERSION=20
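
When checking settings like these, it can help to rule out classpath entries that simply do not exist on disk. The following is a minimal sketch and not part of Pig: the function name missing_classpath_entries is made up, and it only tests that each colon-separated entry exists, treating a trailing /* as a Java classpath wildcard over a directory. It does not verify that the jars are the right versions.

```shell
#!/bin/sh
# missing_classpath_entries CLASSPATH   (hypothetical helper name)
# Prints every colon-separated entry that does not exist on disk.
# The body runs in a subshell, so the IFS and globbing changes do not leak out.
missing_classpath_entries() (
    IFS=:
    set -f                      # keep the shell from expanding the /* wildcard
    for entry in $1; do
        # An empty PIG_CLASSPATH leaves an empty leading entry; skip it.
        [ -n "$entry" ] || continue
        case $entry in
            */\*) path=${entry%/\*} ;;   # wildcard entry: check the directory
            *)    path=$entry ;;
        esac
        [ -e "$path" ] || echo "$entry"
    done
)

# Example call:
# missing_classpath_entries "$PIG_CLASSPATH"
```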


What is going on there?

Thanks a lot.

Felix

