I just fixed the problem. I am using CDH3b2. Apparently Cloudera has its own Pig distribution. There are some major patches in their version of Pig 0.7:

0011-PIG-1452-to-remove-hadoop20.jar-from-lib-and-use-had.patch
0012-CLOUDERA-BUILD.-Build-pig-against-CDH3b3-snapshot.patch
Now I am really confused about which version to use from now on. Thanks for the help.

Felix

On Fri, Dec 10, 2010 at 11:30 AM, Daniel Dai <[email protected]> wrote:

> hadoop20.jar is more than hadoop-core.jar; it includes all hadoop classes
> and dependent libraries. Where did you get hadoop? Is that from CDH? Which
> version is it?
>
> Daniel
>
> felix gao wrote:
>
>> Daniel,
>>
>> Here is what I did. The jar is already built by Cloudera, so I did
>> mv hadoop-core-0.20.2+737.jar hadoop20.jar in pig's lib dir.
>>
>> Then I ran
>> java -Dfs.default.name=hdfs://localhost:8020 -Dmapred.job.tracker=localhost:8021 -jar pig-0.7.0-core.jar
>> 10/12/10 14:21:42 INFO pig.Main: Logging error messages to: /home/felix/pig-0.7.0/pig_1292008902688.log
>> 2010-12-10 14:21:43,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
>> 2010-12-10 14:21:43,275 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>
>> It seems that still doesn't fix my problem.
>>
>> Felix
>>
>> On Fri, Dec 10, 2010 at 11:10 AM, Daniel Dai <[email protected]> wrote:
>>
>>> I didn't use the Cloudera distribution before. Pig bundles the Apache
>>> hadoop 0.20.2 client library. If Cloudera made some changes to hadoop,
>>> that could be an issue.
>>>
>>> One thing you can try is to build hadoop20.jar yourself
>>> (http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt) and put it in
>>> lib, replacing the original hadoop20.jar.
>>>
>>> Daniel
>>>
>>> felix gao wrote:
>>>
>>>> Daniel,
>>>>
>>>> No, I am using 0.20.2 from Cloudera.
>>>> Here are all the jars under pig's lib:
>>>>
>>>> $ ls ~/pig-0.7.0/lib
>>>> automaton.jar  hadoop-LICENSE.txt  hadoop-lzo.jar  hadoop18.jar  hadoop20.jar
>>>> hbase-0.20.0-test.jar  hbase-0.20.0.jar  jdiff  zookeeper-hbase-1329.jar
>>>>
>>>> $ ls $HADOOP_HOME
>>>> CHANGES.txt  build.xml  hadoop-0.20.2+737-ant.jar  hadoop-ant-0.20.2+737.jar
>>>> hadoop-examples.jar  ivy  webapps  LICENSE.txt  cloudera
>>>> hadoop-0.20.2+737-core.jar  hadoop-ant.jar  hadoop-test-0.20.2+737.jar
>>>> ivy.xml  NOTICE.txt  conf  hadoop-0.20.2+737-examples.jar
>>>> hadoop-core-0.20.2+737.jar  hadoop-test.jar  lib  README.txt  contrib
>>>> hadoop-0.20.2+737-test.jar  hadoop-core.jar  hadoop-tools-0.20.2+737.jar
>>>> logs  bin  example-confs  hadoop-0.20.2+737-tools.jar
>>>> hadoop-examples-0.20.2+737.jar  hadoop-tools.jar  pids
>>>>
>>>> $ ls $HADOOP_HOME/lib
>>>> aspectjrt-1.6.5.jar  commons-logging-api-1.0.4.jar  jackson-mapper-asl-1.5.2.jar
>>>> junit-4.5.jar  servlet-api-2.5-6.1.14.jar  aspectjtools-1.6.5.jar
>>>> commons-net-1.4.1.jar  jasper-compiler-5.5.12.jar  kfs-0.2.2.jar
>>>> slf4j-api-1.4.3.jar  commons-cli-1.2.jar  core-3.1.1.jar
>>>> jasper-runtime-5.5.12.jar  kfs-0.2.LICENSE.txt  slf4j-log4j12-1.4.3.jar
>>>> commons-codec-1.4.jar  hadoop-fairscheduler-0.20.2+737.jar  jdiff
>>>> log4j-1.2.15.jar  xmlenc-0.52.jar  commons-daemon-1.0.1.jar
>>>> hadoop-lzo-0.4.6.jar  jets3t-0.6.1.jar  mockito-all-1.8.2.jar
>>>> commons-el-1.0.jar  hsqldb-1.8.0.10.LICENSE.txt  jetty-6.1.14.jar
>>>> mysql-connector-java-5.0.8-bin.jar  commons-httpclient-3.0.1.jar
>>>> hsqldb-1.8.0.10.jar  jetty-util-6.1.14.jar  native
>>>> commons-logging-1.0.4.jar  jackson-core-asl-1.5.2.jar  jsp-2.1  oro-2.0.8.jar
>>>>
>>>> Please tell me how to get this working with pig.
>>>>
>>>> Thanks,
>>>>
>>>> Felix
>>>>
>>>> On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai <[email protected]> wrote:
>>>>
>>>>> Looks like the hadoop client jar does not match the version of the
>>>>> server side. Are you using hadoop 0.20.2 from Apache?
>>>>>
>>>>> Daniel
>>>>>
>>>>> -----Original Message----- From: felix gao
>>>>> Sent: Thursday, December 09, 2010 5:48 PM
>>>>> To: [email protected]
>>>>> Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage
>>>>>
>>>>> I kept seeing a "Failed to create DataStorage" error when trying to run pig:
>>>>>
>>>>> $ java -cp pig-0.7.0-core.jar:$HADOOP_CONF_DIR org.apache.pig.Main -x mapreduce
>>>>> 10/12/09 20:35:31 INFO pig.Main: Logging error messages to: /home/testpig/pig-0.7.0/pig_1291944931735.log
>>>>> 2010-12-09 20:35:31,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
>>>>> 2010-12-09 20:35:32,333 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>>>>
>>>>> $ cat pig_1291944931735.log
>>>>> Error before Pig is launched
>>>>> ----------------------------
>>>>> ERROR 2999: Unexpected internal error.
>>>>> Failed to create DataStorage
>>>>>
>>>>> java.lang.RuntimeException: Failed to create DataStorage
>>>>>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>>>>>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:216)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:126)
>>>>>         at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
>>>>>         at org.apache.pig.PigServer.<init>(PigServer.java:184)
>>>>>         at org.apache.pig.PigServer.<init>(PigServer.java:173)
>>>>>         at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
>>>>>         at org.apache.pig.Main.main(Main.java:354)
>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
>>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>         at $Proxy0.getProtocolVersion(Unknown Source)
>>>>>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>>>>>         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>>>>>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>>>>>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>>>>>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>>>>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>>>>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>>>>>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>>>>>         ... 8 more
>>>>> Caused by: java.io.EOFException
>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>
>>>>> If I run java -cp pig-0.7.0-core.jar org.apache.pig.Main -x mapreduce, I can at least see the grunt shell.
>>>>>
>>>>> However, when using hadoop commands, everything seems to be fine connecting to hdfs:
>>>>>
>>>>> $ hadoop fs -ls
>>>>> Found 1 items
>>>>> -rw-r--r--   1 testpig supergroup     454557 2010-12-09 19:31 /user/testpig/access_log.2010-08-30-23-01.lzo
>>>>>
>>>>> My environment has the following settings:
>>>>> PIG_HOME=/home/testpig/pig-0.7.0
>>>>> HADOOP_HOME=/usr/lib/hadoop-0.20 (cloudera distribution)
>>>>> HADOOP_CONF_DIR=/usr/lib/hadoop-0.20/conf
>>>>> JAVA_HOME=/usr/java/default
>>>>>
>>>>> pig-env.sh has the following settings:
>>>>> export PIG_OPTS="$PIG_OPTS -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
>>>>> export PIG_CLASSPATH=$PIG_CLASSPATH:/home/testpig/hadoop-lzo.jar:/home/testpig/elephant-bird.jar:/home/testpig/elephant-bird/lib/*
>>>>> export PIG_HADOOP_VERSION=20
>>>>>
>>>>> What is going on there?
>>>>>
>>>>> Thanks a lot.
>>>>>
>>>>> Felix
>
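[Editor's note] The EOFException in the stack trace above is the classic symptom of a Hadoop RPC client/server version mismatch: Pig 0.7.0's bundled hadoop20.jar speaks the Apache 0.20.2 wire protocol, while the CDH namenode runs the patched 0.20.2+737 build. Before launching Pig, it can save time to compare the version baked into the jar filenames on each side. Below is a minimal, hypothetical sketch (the `jar_version` and `check_match` helper names are mine, not part of Pig or Hadoop), assuming the jars follow the usual hadoop-core-<version>.jar naming seen in the listings above:

```shell
#!/bin/sh
# Hypothetical helper: extract the version suffix from a hadoop-core jar
# filename, e.g. "hadoop-core-0.20.2+737.jar" -> "0.20.2+737".
jar_version() {
    basename "$1" .jar | sed 's/^hadoop-core-//'
}

# Compare the client-side jar (what Pig bundles) against the server-side
# jar (what the cluster runs); return non-zero on a mismatch.
check_match() {
    client=$(jar_version "$1")
    server=$(jar_version "$2")
    if [ "$client" = "$server" ]; then
        echo "versions match: $client"
    else
        echo "mismatch: client=$client server=$server" >&2
        return 1
    fi
}

# Example with matching versions; with the paths from this thread one
# would compare Pig's lib jar against $HADOOP_HOME's jar instead.
check_match hadoop-core-0.20.2+737.jar hadoop-core-0.20.2+737.jar
```

When the versions differ, the thread's two remedies apply: rebuild hadoop20.jar against the cluster's Hadoop (the howto URL Daniel posted), or use Cloudera's own Pig package, which is already built against CDH.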
