The CDH3 distribution has security patches built in; my understanding is that this changes the wire protocol, so both your server and client libraries must be compatible. You don't need the Cloudera version of Pig, I think, but you do need their version of the Hadoop jars on both sides -- so you can't use the fat pig.jar; you must use the pig-nohadoop.jar version and put the Cloudera Hadoop jars on your classpath.
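As a rough sketch of what that launch can look like -- the jar names and paths below are assumptions for a typical CDH3-style layout, not taken from this thread -- the idea is to build a classpath from the CDH Hadoop jars and the conf dir, then run the Hadoop-free Pig jar against it:

```shell
# Sketch only -- jar names and paths are assumptions; adjust to your install.
HADOOP_HOME=/usr/lib/hadoop-0.20
PIG_HOME=/home/testpig/pig-0.7.0

# Join the CDH core jar and its lib/ jars into one colon-separated classpath.
CP=$(echo "$HADOOP_HOME"/hadoop-core-*.jar "$HADOOP_HOME"/lib/*.jar | tr ' ' ':')

# Launch the Hadoop-free Pig jar against the CDH client libraries and conf.
# Shown as an echo here; run the printed command to start the grunt shell.
echo java -cp "$PIG_HOME/pig-nohadoop.jar:$CP:$HADOOP_HOME/conf" org.apache.pig.Main -x mapreduce
```

The point is that the only Hadoop classes on the classpath are Cloudera's, so the RPC protocol the Pig client speaks matches the one the CDH NameNode expects.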
-D

On Fri, Dec 10, 2010 at 11:46 AM, felix gao <[email protected]> wrote:
> I just fixed the problem.
> I am using CDH3b2. Apparently Cloudera has their own Pig distribution.
> There are some major patches going on for their version of Pig 0.7:
> 0011-PIG-1452-to-remove-hadoop20.jar-from-lib-and-use-had.patch
> 0012-CLOUDERA-BUILD.-Build-pig-against-CDH3b3-snapshot.patch
>
> Now I am really confused about which version to use from now on.
>
> Thanks for the help.
>
> Felix
>
> On Fri, Dec 10, 2010 at 11:30 AM, Daniel Dai <[email protected]> wrote:
>
>> hadoop20.jar is more than hadoop-core.jar; it includes all Hadoop classes
>> and dependent libraries. Where did you get Hadoop? Is that from CDH?
>> Which version is it?
>>
>> Daniel
>>
>> felix gao wrote:
>>
>>> Daniel,
>>>
>>> Here is what I did. The jar is already built by Cloudera, so I did
>>>   mv hadoop-core-0.20.2+737.jar hadoop20.jar
>>> into pig's lib dir, then I ran:
>>>
>>> java -Dfs.default.name=hdfs://localhost:8020 -Dmapred.job.tracker=localhost:8021 -jar pig-0.7.0-core.jar
>>> 10/12/10 14:21:42 INFO pig.Main: Logging error messages to: /home/felix/pig-0.7.0/pig_1292008902688.log
>>> 2010-12-10 14:21:43,014 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
>>> 2010-12-10 14:21:43,275 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>>
>>> It seems that still doesn't fix my problem.
>>>
>>> Felix
>>>
>>> On Fri, Dec 10, 2010 at 11:10 AM, Daniel Dai <[email protected]> wrote:
>>>
>>>> I haven't used the Cloudera distribution before. Pig bundles the Apache
>>>> Hadoop 0.20.2 client library. If Cloudera made some changes to Hadoop,
>>>> that could be an issue.
>>>>
>>>> One thing you can try is building hadoop20.jar by yourself
>>>> (http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt) and putting it
>>>> in lib (replacing the original hadoop20.jar).
>>>>
>>>> Daniel
>>>>
>>>> felix gao wrote:
>>>>
>>>>> Daniel,
>>>>>
>>>>> No, I am using 0.20.2 from Cloudera.
>>>>> Here are all the jars under pig's lib:
>>>>>
>>>>> $ ls ~/pig-0.7.0/lib
>>>>> automaton.jar  hadoop-LICENSE.txt  hadoop-lzo.jar  hadoop18.jar
>>>>> hadoop20.jar  hbase-0.20.0-test.jar  hbase-0.20.0.jar  jdiff
>>>>> zookeeper-hbase-1329.jar
>>>>>
>>>>> $ ls $HADOOP_HOME
>>>>> CHANGES.txt  build.xml  hadoop-0.20.2+737-ant.jar
>>>>> hadoop-ant-0.20.2+737.jar  hadoop-examples.jar  ivy  webapps
>>>>> LICENSE.txt  cloudera  hadoop-0.20.2+737-core.jar  hadoop-ant.jar
>>>>> hadoop-test-0.20.2+737.jar  ivy.xml
>>>>> NOTICE.txt  conf  hadoop-0.20.2+737-examples.jar
>>>>> hadoop-core-0.20.2+737.jar  hadoop-test.jar  lib
>>>>> README.txt  contrib  hadoop-0.20.2+737-test.jar  hadoop-core.jar
>>>>> hadoop-tools-0.20.2+737.jar  logs
>>>>> bin  example-confs  hadoop-0.20.2+737-tools.jar
>>>>> hadoop-examples-0.20.2+737.jar  hadoop-tools.jar  pids
>>>>>
>>>>> $ ls $HADOOP_HOME/lib
>>>>> aspectjrt-1.6.5.jar  commons-logging-api-1.0.4.jar
>>>>> jackson-mapper-asl-1.5.2.jar  junit-4.5.jar  servlet-api-2.5-6.1.14.jar
>>>>> aspectjtools-1.6.5.jar  commons-net-1.4.1.jar
>>>>> jasper-compiler-5.5.12.jar  kfs-0.2.2.jar  slf4j-api-1.4.3.jar
>>>>> commons-cli-1.2.jar  core-3.1.1.jar  jasper-runtime-5.5.12.jar
>>>>> kfs-0.2.LICENSE.txt  slf4j-log4j12-1.4.3.jar
>>>>> commons-codec-1.4.jar  hadoop-fairscheduler-0.20.2+737.jar  jdiff
>>>>> log4j-1.2.15.jar  xmlenc-0.52.jar
>>>>> commons-daemon-1.0.1.jar  hadoop-lzo-0.4.6.jar  jets3t-0.6.1.jar
>>>>> mockito-all-1.8.2.jar
>>>>> commons-el-1.0.jar  hsqldb-1.8.0.10.LICENSE.txt  jetty-6.1.14.jar
>>>>> mysql-connector-java-5.0.8-bin.jar
>>>>> commons-httpclient-3.0.1.jar  hsqldb-1.8.0.10.jar
>>>>> jetty-util-6.1.14.jar  native
>>>>> commons-logging-1.0.4.jar  jackson-core-asl-1.5.2.jar  jsp-2.1
>>>>> oro-2.0.8.jar
>>>>>
>>>>> Please tell me how to get this working with pig.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Felix
>>>>>
>>>>> On Fri, Dec 10, 2010 at 12:20 AM, Daniel Dai <[email protected]> wrote:
>>>>>
>>>>>> Looks like the hadoop client jar does not match the version on the
>>>>>> server side. Are you using hadoop 0.20.2 from Apache?
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> -----Original Message----- From: felix gao
>>>>>> Sent: Thursday, December 09, 2010 5:48 PM
>>>>>> To: [email protected]
>>>>>> Subject: Strange problem with Pig 0.7.0 and Hadoop 0.20.2 and Failed to create DataStorage
>>>>>>
>>>>>> I keep seeing a "Failed to create DataStorage" error when trying to run pig:
>>>>>>
>>>>>> $ java -cp pig-0.7.0-core.jar:$HADOOP_CONF_DIR org.apache.pig.Main -x mapreduce
>>>>>> 10/12/09 20:35:31 INFO pig.Main: Logging error messages to: /home/testpig/pig-0.7.0/pig_1291944931735.log
>>>>>> 2010-12-09 20:35:31,997 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
>>>>>> 2010-12-09 20:35:32,333 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>>>>>
>>>>>> $ cat pig_1291944931735.log
>>>>>> Error before Pig is launched
>>>>>> ----------------------------
>>>>>> ERROR 2999: Unexpected internal error. Failed to create DataStorage
>>>>>>
>>>>>> java.lang.RuntimeException: Failed to create DataStorage
>>>>>>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>>>>>>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>>>>>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:216)
>>>>>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:126)
>>>>>>     at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
>>>>>>     at org.apache.pig.PigServer.<init>(PigServer.java:184)
>>>>>>     at org.apache.pig.PigServer.<init>(PigServer.java:173)
>>>>>>     at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
>>>>>>     at org.apache.pig.Main.main(Main.java:354)
>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
>>>>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>     at $Proxy0.getProtocolVersion(Unknown Source)
>>>>>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>>>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>>>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>>>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>>>>>>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>>>>>>     ... 8 more
>>>>>> Caused by: java.io.EOFException
>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>
>>>>>> If I run java -cp pig-0.7.0-core.jar org.apache.pig.Main -x mapreduce,
>>>>>> I can at least see the grunt shell.
>>>>>>
>>>>>> However, when using hadoop commands:
>>>>>> $ hadoop fs -ls
>>>>>> Found 1 items
>>>>>> -rw-r--r--   1 testpig supergroup     454557 2010-12-09 19:31 /user/testpig/access_log.2010-08-30-23-01.lzo
>>>>>> everything seems to be fine connecting to hdfs.
>>>>>>
>>>>>> My environment has the following settings:
>>>>>> PIG_HOME=/home/testpig/pig-0.7.0
>>>>>> HADOOP_HOME=/usr/lib/hadoop-0.20 (cloudera distribution)
>>>>>> HADOOP_CONF_DIR=/usr/lib/hadoop-0.20/conf
>>>>>> JAVA_HOME=/usr/java/default
>>>>>>
>>>>>> pig-env.sh has the following settings:
>>>>>> export PIG_OPTS="$PIG_OPTS -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"
>>>>>> export PIG_CLASSPATH=$PIG_CLASSPATH:/home/testpig/hadoop-lzo.jar:/home/testpig/elephant-bird.jar:/home/testpig/elephant-bird/lib/*
>>>>>> export PIG_HADOOP_VERSION=20
>>>>>>
>>>>>> What is going on there?
>>>>>>
>>>>>> Thanks a lot.
>>>>>>
>>>>>> Felix
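For anyone landing on this thread with the same EOFException: the diagnosis above is a client/server Hadoop version mismatch, and a quick way to confirm it is to compare what the client jar on Pig's classpath reports against what the cluster runs. A minimal sketch -- the `check_match` helper is hypothetical, and the commented-out invocations assume `org.apache.hadoop.util.VersionInfo` (the class behind `hadoop version`) is runnable from each jar:

```shell
# Hypothetical helper: prints whether two Hadoop version strings agree.
check_match() {
  if [ "$1" = "$2" ]; then
    echo "versions match: $1"
  else
    echo "MISMATCH: client=$1 server=$2"
  fi
}

# Assumed usage (paths are illustrative, taken from the thread's layout):
#   client=$(java -cp "$PIG_HOME/lib/hadoop20.jar" org.apache.hadoop.util.VersionInfo | head -1)
#   server=$(hadoop version | head -1)
#   check_match "$client" "$server"

# With the versions from this thread, the mismatch is plain:
check_match "Hadoop 0.20.2" "Hadoop 0.20.2+737"
```

Any difference in the two strings (here, Apache 0.20.2 vs Cloudera's 0.20.2+737) means the RPC protocols may not line up, which is exactly what surfaces as `java.io.EOFException` during `getProtocolVersion`.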
