hadoop20.jar is more than hadoop-core.jar: it includes all the hadoop
classes plus the dependent libraries. Where did you get hadoop? Is it
from CDH, and which version is it?
Daniel
felix gao wrote:
Daniel,
Here is what I did. The jar is already built by Cloudera, so I copied it
into pig's lib dir as hadoop20.jar:
mv hadoop-core-0.20.2+737.jar hadoop20.jar
Then I ran:
java -Dfs.default.name=hdfs://localhost:8020
-Dmapred.job.tracker=localhost:8021 -jar pig-0.7.0-core.jar
10/12/10 14:21:42 INFO pig.Main: Logging error messages to:
/home/felix/pig-0.7.0/pig_1292008902688.log
2010-12-10 14:21:43,014 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: hdfs://localhost:8020
2010-12-10 14:21:43,275 [main] ERROR org.apache.pig.Main - ERROR 2999:
Unexpected internal error. Failed to create DataStorage
That still doesn't fix my problem, it seems.
Felix
On Fri, Dec 10, 2010 at 11:10 AM, Daniel Dai <[email protected]> wrote:
I haven't used the Cloudera distribution before. Pig bundles the Apache
hadoop 0.20.2 client library, so if Cloudera made changes to hadoop, that
could be the issue.
One thing you can try is building hadoop20.jar yourself
(http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt) and putting it
in lib, replacing the original hadoop20.jar.
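For reference, the repack usually amounts to something like the following. This is a rough, untested sketch (jar paths taken from this thread), not a substitute for the howto above; the key point is that hadoop20.jar must carry hadoop-core's classes plus the jars from $HADOOP_HOME/lib:

```shell
# Untested sketch: repack a hadoop20.jar from the CDH client jars.
cd ~/pig-0.7.0/lib
mv hadoop20.jar hadoop20.jar.apache      # keep the stock Apache jar around
work=$(mktemp -d)
(cd "$work" && unzip -q "$HADOOP_HOME/hadoop-core-0.20.2+737.jar")
for dep in "$HADOOP_HOME"/lib/*.jar; do
  (cd "$work" && unzip -qn "$dep")       # -n: the first copy of a class wins
done
(cd "$work" && zip -qr ~/pig-0.7.0/lib/hadoop20.jar .)
rm -rf "$work"
# If you run pig via "java -jar pig-0.7.0-core.jar", the old hadoop classes
# are baked into that core jar, so rebuild pig against the new lib jar
# (the exact ant target is in the howto):
cd ~/pig-0.7.0 && ant
```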
Daniel
felix gao wrote:
Daniel,
No, I am using 0.20.2 from Cloudera.
Here are all the jars under pig's lib:
$ ls ~/pig-0.7.0/lib
automaton.jar hadoop-LICENSE.txt hadoop-lzo.jar hadoop18.jar
hadoop20.jar hbase-0.20.0-test.jar hbase-0.20.0.jar jdiff
zookeeper-hbase-1329.jar
$ ls $HADOOP_HOME
CHANGES.txt  LICENSE.txt  NOTICE.txt  README.txt  bin  build.xml  cloudera
conf  contrib  example-confs  hadoop-0.20.2+737-ant.jar
hadoop-0.20.2+737-core.jar  hadoop-0.20.2+737-examples.jar
hadoop-0.20.2+737-test.jar  hadoop-0.20.2+737-tools.jar
hadoop-ant-0.20.2+737.jar  hadoop-ant.jar  hadoop-core-0.20.2+737.jar
hadoop-core.jar  hadoop-examples-0.20.2+737.jar  hadoop-examples.jar
hadoop-test-0.20.2+737.jar  hadoop-test.jar  hadoop-tools-0.20.2+737.jar
hadoop-tools.jar  ivy  ivy.xml  lib  logs  pids  webapps
$ ls $HADOOP_HOME/lib
aspectjrt-1.6.5.jar  aspectjtools-1.6.5.jar  commons-cli-1.2.jar
commons-codec-1.4.jar  commons-daemon-1.0.1.jar  commons-el-1.0.jar
commons-httpclient-3.0.1.jar  commons-logging-1.0.4.jar
commons-logging-api-1.0.4.jar  commons-net-1.4.1.jar  core-3.1.1.jar
hadoop-fairscheduler-0.20.2+737.jar  hadoop-lzo-0.4.6.jar
hsqldb-1.8.0.10.LICENSE.txt  hsqldb-1.8.0.10.jar
jackson-core-asl-1.5.2.jar  jackson-mapper-asl-1.5.2.jar
jasper-compiler-5.5.12.jar  jasper-runtime-5.5.12.jar  jdiff
jets3t-0.6.1.jar  jetty-6.1.14.jar  jetty-util-6.1.14.jar  jsp-2.1
junit-4.5.jar  kfs-0.2.2.jar  kfs-0.2.LICENSE.txt  log4j-1.2.15.jar
mockito-all-1.8.2.jar  mysql-connector-java-5.0.8-bin.jar  native
oro-2.0.8.jar  servlet-api-2.5-6.1.14.jar  slf4j-api-1.4.3.jar
slf4j-log4j12-1.4.3.jar  xmlenc-0.52.jar
Please tell me how to get this working with Pig.
Thanks,
Felix
On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai <[email protected]> wrote:
Looks like the hadoop client jar does not match the server-side version.
Are you using hadoop 0.20.2 from Apache?
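One quick check (a sketch; "hadoop version" is part of the stock 0.20 CLI) is to see which build the client tools report, and compare it with whatever hadoop jar Pig ends up loading:

```shell
# Prints the build the installed hadoop client was compiled from,
# e.g. "Hadoop 0.20.2+737" for CDH. If this differs from the hadoop
# classes on Pig's classpath, the RPC handshake can fail exactly as in
# the stack trace: the server drops the connection, the client's
# readInt() hits end-of-stream, and that surfaces as java.io.EOFException.
hadoop version
```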
Daniel
-----Original Message----- From: felix gao
Sent: Thursday, December 09, 2010 5:48 PM
To: [email protected]
Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to
create DataStorage
I keep seeing a "Failed to create DataStorage" error when trying to run pig:
$ java -cp pig-0.7.0-core.jar:$HADOOP_CONF_DIR org.apache.pig.Main -x
mapreduce
10/12/09 20:35:31 INFO pig.Main: Logging error messages to:
/home/testpig/pig-0.7.0/pig_1291944931735.log
2010-12-09 20:35:31,997 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting
to hadoop file system at: hdfs://localhost:8020
2010-12-09 20:35:32,333 [main] ERROR org.apache.pig.Main - ERROR 2999:
Unexpected internal error. Failed to create DataStorage
$ cat pig_1291944931735.log
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:216)
at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:126)
at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
at org.apache.pig.PigServer.<init>(PigServer.java:184)
at org.apache.pig.PigServer.<init>(PigServer.java:173)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:354)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed
on
local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 8 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
If I instead run java -cp pig-0.7.0-core.jar org.apache.pig.Main -x
mapreduce, I can at least see the grunt shell.
However, when using the hadoop commands directly:
$ hadoop fs -ls
Found 1 items
-rw-r--r-- 1 testpig supergroup 454557 2010-12-09 19:31
/user/testpig/access_log.2010-08-30-23-01.lzo
everything seems to be fine connecting to hdfs.
My environment has the following settings:
PIG_HOME=/home/testpig/pig-0.7.0
HADOOP_HOME=/usr/lib/hadoop-0.20 (cloudera distribution)
HADOOP_CONF_DIR=/usr/lib/hadoop-0.20/conf
JAVA_HOME=/usr/java/default
pig-env.sh has the following settings:
export PIG_OPTS="$PIG_OPTS
-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
export
PIG_CLASSPATH=$PIG_CLASSPATH:/home/testpig/hadoop-lzo.jar:/home/testpig/elephant-bird.jar:/home/testpig/elephant-bird/lib/*
export PIG_HADOOP_VERSION=20
What is going on there?
Thanks a lot.
Felix