Dmitriy, thanks for answering! I will try it and post here how it goes... Right now I'm in the middle of a Pig 0.7 session (I gave up and exported the data from HBase to HDFS). Next week... :)
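For anyone following along: the classpath setup Dmitriy suggests below might look roughly like this. The paths, jar names, and the CDH layout are assumptions on my part, so adjust them to your own install:

```shell
# Sketch only - paths and jar names are assumptions; adjust to your install.
# Idea: let Pig pick up the cluster's own Hadoop jars and conf instead of
# the Hadoop classes bundled into the snapshot jar, so client and namenode
# agree on the RPC wire format.
export HADOOP_HOME=/usr/lib/hadoop        # typical CDH location, assumed
export PIG_CLASSPATH="/etc/hadoop/conf:/etc/hbase/conf:$HADOOP_HOME"/hadoop-core-*.jar
bin/pig   # the pig launcher script appends PIG_CLASSPATH to its classpath
```

The jar Pig builds without bundled Hadoop goes by different names depending on the version (e.g. a `*-core.jar` or `*-withouthadoop.jar`), so check what your `ant` build actually produced.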
Anze

On Thursday 28 October 2010, Dmitriy Ryaboy wrote:
> It works with 20.2, and the error trace you pasted appears to be
> completely independent of HBaseStorage.
>
> I see that you are using the snapshot jar -- try putting your hadoop
> jars and various dependencies on your classpath, and only using the
> -nohadoop jar that pig also builds.
>
> -D
>
> On Thu, Oct 28, 2010 at 1:42 AM, Anze <[email protected]> wrote:
> > Does anyone know: should Pig (0.8 - svn trunk) work with Hadoop 0.20.2?
> >
> > I still can't start Pig...
> >
> > Thanks,
> >
> > Anze
> >
> > On Wednesday 27 October 2010, Anze wrote:
> >> Thanks, I guess I would have tripped over that later on - but it doesn't
> >> help with this immediate problem (of course, because Pig fails at startup,
> >> before I'm working with HBase at all).
> >>
> >> I have tracked the error message to HDataStorage.init() and added some
> >> debugging info:
> >> -----
> >> public void init() {
> >>     // check if name node is set, if not we set local as fail back
> >>     String nameNode = this.properties.getProperty(FILE_SYSTEM_LOCATION);
> >>     System.out.println("NAMENODE: " + nameNode); // debug
> >>     if (nameNode == null || nameNode.length() == 0) {
> >>         nameNode = "local";
> >>     }
> >>     this.configuration = ConfigurationUtil.toConfiguration(this.properties);
> >>     try {
> >>         if (this.uri != null) {
> >>             this.fs = FileSystem.get(this.uri, this.configuration);
> >>         } else {
> >>             this.fs = FileSystem.get(this.configuration);
> >>         }
> >>     } catch (IOException e) {
> >>         e.printStackTrace(); // debug
> >>         throw new RuntimeException("Failed to create DataStorage", e);
> >>     }
> >>     short defaultReplication = fs.getDefaultReplication();
> >>     properties.setProperty(DEFAULT_REPLICATION_FACTOR_KEY,
> >>         Short.valueOf(defaultReplication).toString());
> >> }
> >> -----
> >>
> >> The run now looks like this:
> >> -----
> >> root:/opt/pig# bin/pig
> >> PIG_HOME: /opt/pig/bin/..
> >> PIG_CONF_DIR: /opt/pig/bin/../conf
> >> 2010-10-27 10:18:18,728 [main] INFO org.apache.pig.Main - Logging error
> >> messages to: /opt/pig/pig_1288167498720.log
> >> 2010-10-27 10:18:18,940 [main] INFO
> >> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> Connecting to hadoop file system at: hdfs://<MY NAMENODE>:8020/
> >> NAMENODE: hdfs://<MY NAMENODE>:8020/
> >> java.io.IOException: Call to <MY NAMENODE>/10.0.0.3:8020 failed on local
> >> exception: java.io.EOFException
> >>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >>     at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >>     at $Proxy0.getProtocolVersion(Unknown Source)
> >>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >>     at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> >>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
> >>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
> >>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> >>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:73)
> >>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> >>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:212)
> >>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:132)
> >>     at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
> >>     at org.apache.pig.PigServer.<init>(PigServer.java:225)
> >>     at org.apache.pig.PigServer.<init>(PigServer.java:214)
> >>     at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
> >>     at org.apache.pig.Main.run(Main.java:450)
> >>     at org.apache.pig.Main.main(Main.java:107)
> >> Caused by: java.io.EOFException
> >>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >> 2010-10-27 10:18:19,124 [main] ERROR org.apache.pig.Main - ERROR 2999:
> >> Unexpected internal error. Failed to create DataStorage
> >> Details at logfile: /opt/pig/pig_1288167498720.log
> >> -----
> >>
> >> I have replaced the name of my server with <MY NAMENODE> in the above
> >> listing. BTW, this works as it should:
> >> # hadoop fs -ls hdfs://<MY NAMENODE>:8020/
> >>
> >> I would appreciate some pointers, I have no idea what is causing this...
> >>
> >> Anze
> >>
> >> On Wednesday 27 October 2010, Dmitriy Ryaboy wrote:
> >> > The same way you have /etc/hadoop/conf on the classpath, you want to
> >> > put the hbase conf directory on the classpath.
> >> >
> >> > -D
> >> >
> >> > On Tue, Oct 26, 2010 at 11:50 PM, Anze <[email protected]> wrote:
> >> > >> ... You have all the conf files in PIG_CLASSPATH right?
> >> > >
> >> > > I think I do:
> >> > > ***
> >> > > PIG_HOME: /opt/pig/bin/..
> >> > > PIG_CONF_DIR: /opt/pig/bin/../conf
> >> > > dry run:
> >> > > /usr/lib/jvm/java-6-sun/bin/java -Xmx1000m
> >> > > -Dpig.log.dir=/opt/pig/bin/../logs -Dpig.log.file=pig.log
> >> > > -Dpig.home.dir=/opt/pig/bin/..
> >> > > -Dpig.root.logger=INFO,console,DRFA -classpath
> >> > > /opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/etc/hadoop/conf:/opt/pig/bin/../build/classes:/opt/pig/bin/../build/test/classes:/opt/pig/bin/../pig-*-core.jar:/opt/pig/bin/../build/pig-0.8.0-SNAPSHOT.jar:/opt/pig/bin/../lib/automaton.jar:/opt/pig/bin/../lib/hbase-0.20.6.jar:/opt/pig/bin/../lib/hbase-0.20.6-test.jar:/opt/pig/bin/../lib/zookeeper-hbase-1329.jar
> >> > > org.apache.pig.Main
> >> > > ***
> >> > >
> >> > > Generated log file contains:
> >> > > ***
> >> > > Error before Pig is launched
> >> > > ----------------------------
> >> > > ERROR 2999: Unexpected internal error. Failed to create DataStorage
> >> > >
> >> > > java.lang.RuntimeException: Failed to create DataStorage
> >> > >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
> >> > >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> >> > >     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:212)
> >> > >     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:132)
> >> > >     at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
> >> > >     at org.apache.pig.PigServer.<init>(PigServer.java:225)
> >> > >     at org.apache.pig.PigServer.<init>(PigServer.java:214)
> >> > >     at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
> >> > >     at org.apache.pig.Main.run(Main.java:450)
> >> > >     at org.apache.pig.Main.main(Main.java:107)
> >> > > Caused by: java.io.IOException: Call to
> >> > > namenode.admundus.com/10.0.0.3:8020 failed on local exception:
> >> > > java.io.EOFException
> >> > >     at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >> > >     at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >> > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >> > >     at $Proxy0.getProtocolVersion(Unknown Source)
> >> > >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >> > >     at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> >> > >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
> >> > >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
> >> > >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >> > >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >> > >     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >> > >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >> > >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >> > >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> >> > >     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
> >> > >     ... 9 more
> >> > > Caused by: java.io.EOFException
> >> > >     at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >> > >     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >> > >     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >> > > ===========================================================================
> >> > >
> >> > > And Pig complains:
> >> > > ***
> >> > > log4j:WARN No appenders could be found for logger
> >> > > (org.apache.hadoop.conf.Configuration).
> >> > > log4j:WARN Please initialize the log4j system properly.
> >> > > 2010-10-27 08:46:44,762 [main] INFO org.apache.pig.Main - Logging
> >> > > error messages to: /opt/pig/bin/pig_1288162004754.log
> >> > > 2010-10-27 08:46:44,970 [main] INFO
> >> > > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> >> > > Connecting to hadoop file system at: hdfs://...:8020/
> >> > > 2010-10-27 08:46:45,158 [main] ERROR org.apache.pig.Main - ERROR
> >> > > 2999: Unexpected internal error. Failed to create DataStorage
> >> > > Details at logfile: /opt/pig/bin/pig_1288162004754.log
> >> > > ***
> >> > >
> >> > > Any idea what is wrong? I have searched the net and most answers
> >> > > talk about incompatible versions of Hadoop and Pig (but the posts
> >> > > are old).
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Anze
> >> > >
> >> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> >> > >> Yeah, Pig 0.8 is not officially released yet; it will be cut at the
> >> > >> end of the month or the beginning of next month.
> >> > >>
> >> > >> "Failed to create DataStorage" sounds vaguely familiar... can you
> >> > >> send the full pig session and the full error? I think it's not
> >> > >> connecting to hbase on the client side, or something along those
> >> > >> lines. You have all the conf files in PIG_CLASSPATH, right?
> >> > >>
> >> > >> -D
> >> > >>
> >> > >> On Tue, Oct 26, 2010 at 6:32 AM, Anze <[email protected]> wrote:
> >> > >> > Hmmm, not quite there yet. :-/
> >> > >> >
> >> > >> > I installed:
> >> > >> > - HBase 0.20.6
> >> > >> > - Cloudera CDH3b3 Hadoop (0.20.2)
> >> > >> > - Pig 0.8 (since the official download is empty (?), I fetched
> >> > >> >   the Pig trunk from SVN and built it)
> >> > >> >
> >> > >> > Now it complains about "Failed to create DataStorage". Any ideas?
> >> > >> > Should I upgrade Hadoop too?
> >> > >> >
> >> > >> > This is getting a bit complicated to install. :)
> >> > >> >
> >> > >> > I would appreciate some pointers - google revealed nothing useful.
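For what it's worth, an EOFException under the getProtocolVersion RPC call like the one above is typically the Hadoop client and namenode disagreeing on the RPC wire format, i.e. the Hadoop classes Pig actually loaded don't match the cluster (CDH3b3's Hadoop carries patches that can make it wire-incompatible with stock 0.20.2 client jars). A quick sanity check along these lines might help; the paths below are hypothetical, adjust them to your layout:

```shell
# Hypothetical paths - adjust to your layout.
hadoop version | head -1                  # the version the cluster actually runs
find /opt/pig -name '*hadoop*.jar'        # stray hadoop jars on Pig's side
# a large count below means Hadoop classes are bundled inside the Pig jar:
unzip -l /opt/pig/build/pig-0.8.0-SNAPSHOT.jar | grep -c 'org/apache/hadoop/'
```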
> >> > >> >
> >> > >> > Thanks,
> >> > >> >
> >> > >> > Anze
> >> > >> >
> >> > >> > On Tuesday 26 October 2010, Anze wrote:
> >> > >> >> Great! :)
> >> > >> >>
> >> > >> >> Thanks for helping me out.
> >> > >> >>
> >> > >> >> All the best,
> >> > >> >>
> >> > >> >> Anze
> >> > >> >>
> >> > >> >> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> >> > >> >> > I think that you might be able to get away with 20.2 if you
> >> > >> >> > don't use the filtering options.
> >> > >> >> >
> >> > >> >> > On Mon, Oct 25, 2010 at 3:39 PM, Anze <[email protected]> wrote:
> >> > >> >> > > Dmitriy, thanks for the answer!
> >> > >> >> > >
> >> > >> >> > > The problem with upgrading to HBase 0.20.6 is that Cloudera
> >> > >> >> > > doesn't ship it yet, and we would like to keep our install at
> >> > >> >> > > "official" versions, even if beta. Of course, since this is
> >> > >> >> > > a development/testing cluster, we could bend the rules if
> >> > >> >> > > really necessary...
> >> > >> >> > >
> >> > >> >> > > I have written a small MR job (actually, just an "M" job :)
> >> > >> >> > > that exports the tables to files (allowing me to use Pig
> >> > >> >> > > 0.7), but that is a bit cumbersome and slow.
> >> > >> >> > >
> >> > >> >> > > If I install the latest Pig (0.8), will it work at all with
> >> > >> >> > > HBase 0.20.2? In other words, are scan filters (which were
> >> > >> >> > > fixed in 0.20.6) needed only for user-defined parameters,
> >> > >> >> > > or also for Pig's own optimizations when reading from HBase?
> >> > >> >> > > Hope my question makes sense...
> >> > >> >> > >
> >> > >> >> > > :)
> >> > >> >> > >
> >> > >> >> > > Thanks again,
> >> > >> >> > >
> >> > >> >> > > Anze
> >> > >> >> > >
> >> > >> >> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> >> > >> >> > >> Anze, the reason we bumped up to 20.6 in the ticket was
> >> > >> >> > >> that HBase 20.2 had a bug in it. Ask the HBase folks,
> >> > >> >> > >> but I'd say you should upgrade.
> >> > >> >> > >> FWIW, we upgraded from 20.2 to 20.6 a few months back and
> >> > >> >> > >> it's been working smoothly.
> >> > >> >> > >>
> >> > >> >> > >> The Elephant-Bird HBase loader for Pig 0.6 does add row
> >> > >> >> > >> keys and most of the other features we added to the
> >> > >> >> > >> built-in loader for Pig 0.8 (notably, it does not do
> >> > >> >> > >> storage). But I don't recommend downgrading to Pig 0.6, as
> >> > >> >> > >> 0.7 and especially 0.8 are great improvements to the
> >> > >> >> > >> software.
> >> > >> >> > >>
> >> > >> >> > >> -D
> >> > >> >> > >>
> >> > >> >> > >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <[email protected]> wrote:
> >> > >> >> > >> > Hi all!
> >> > >> >> > >> >
> >> > >> >> > >> > I am struggling to find a working solution to load data
> >> > >> >> > >> > from HBase directly. I am using Cloudera CDH3b3, which
> >> > >> >> > >> > comes with Pig 0.7. What would be the easiest way to
> >> > >> >> > >> > load data from HBase? If it matters: we need the row
> >> > >> >> > >> > keys to be included, too.
> >> > >> >> > >> >
> >> > >> >> > >> > I have checked ElephantBird, but it seems to require Pig
> >> > >> >> > >> > 0.6. I could downgrade, but it seems... well... :)
> >> > >> >> > >> >
> >> > >> >> > >> > On the other hand, loading row keys from HBase was only
> >> > >> >> > >> > added in Pig 0.8:
> >> > >> >> > >> > https://issues.apache.org/jira/browse/PIG-915
> >> > >> >> > >> > https://issues.apache.org/jira/browse/PIG-1205
> >> > >> >> > >> > But judging from the last issue, Pig 0.8 requires HBase
> >> > >> >> > >> > 0.20.6?
> >> > >> >> > >> >
> >> > >> >> > >> > I can install the latest Pig from source if needed, but
> >> > >> >> > >> > I'd rather leave Hadoop and HBase at their current
> >> > >> >> > >> > versions (0.20.2 and 0.89.20100924 respectively).
> >> > >> >> > >> >
> >> > >> >> > >> > Should I write my own UDF? I'd appreciate some pointers.
> >> > >> >> > >> >
> >> > >> >> > >> > Thanks,
> >> > >> >> > >> >
> >> > >> >> > >> > Anze
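For readers landing on this thread later: once Pig 0.8 and HBase are set up, loading rows together with their keys uses the built-in HBaseStorage loader with the -loadKey option discussed above (PIG-1205). A sketch of what that could look like; the table name 'users' and the 'info:*' columns are made-up, and it assumes a working Pig 0.8 + HBase install with the hbase conf directory on the classpath:

```shell
# Sketch - 'users' and 'info:name'/'info:email' are made-up names;
# '-loadKey' makes HBaseStorage prepend the row key to each tuple.
pig <<'EOF'
raw = LOAD 'hbase://users'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'info:name info:email', '-loadKey')
      AS (rowkey:bytearray, name:chararray, email:chararray);
DUMP raw;
EOF
```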
