It works with 20.2, and the error trace you pasted appears to be
completely independent of HBaseStorage.

I see that you are using the snapshot jar -- try putting your Hadoop
jars and their dependencies on your classpath, and using only the
-nohadoop jar that Pig also builds.
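For reference, a minimal sketch of that setup. Everything here is hypothetical, not verbatim: the paths, the jar names, and the exact name of the -nohadoop build product vary by install, so check what your Pig build actually produced and where your distro puts the Hadoop jars.

```shell
#!/bin/sh
# Sketch only: all paths below are assumptions -- substitute your own
# Hadoop/Pig locations. The idea: the conf dirs and the cluster's own
# Hadoop client jar go on the classpath, and Pig is run from the jar
# built WITHOUT bundled Hadoop classes, so no stale bundled client can
# shadow the cluster's RPC version (the usual cause of EOFException on
# getProtocolVersion).
HADOOP_CONF=/etc/hadoop/conf
HBASE_CONF=/etc/hbase/conf
HADOOP_CORE=/usr/lib/hadoop/hadoop-0.20.2-core.jar
PIG_JAR=/opt/pig/build/pig-0.8.0-SNAPSHOT-withouthadoop.jar

# Conf dirs first so fs.default.name and the HBase quorum are found,
# then the cluster's Hadoop jar, then Pig itself.
CLASSPATH="$HADOOP_CONF:$HBASE_CONF:$HADOOP_CORE:$PIG_JAR"

# Print the resulting command rather than exec'ing it, so the sketch
# can be inspected before running against a real cluster.
echo java -cp "$CLASSPATH" org.apache.pig.Main
```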

-D

On Thu, Oct 28, 2010 at 1:42 AM, Anze <[email protected]> wrote:
>
> Does anyone know whether Pig (0.8 - svn trunk) should work with Hadoop 0.20.2?
>
> I still can't start Pig...
>
> Thanks,
>
> Anze
>
>
> On Wednesday 27 October 2010, Anze wrote:
>> Thanks, I guess I would trip over that later on - but for this immediate
>> problem it doesn't help (of course, because Pig fails at the start, when
>> I'm not working with HBase yet).
>>
>> I have tracked the error message to HBaseStorage.init() and added some
>> debugging info:
>> -----
>>     public void init() {
>>         // check if name node is set, if not we set local as fail back
>>         String nameNode = this.properties.getProperty(FILE_SYSTEM_LOCATION);
>>         System.out.println("NAMENODE: " + nameNode); // debug
>>         if (nameNode == null || nameNode.length() == 0) {
>>             nameNode = "local";
>>         }
>>         this.configuration = ConfigurationUtil.toConfiguration(this.properties);
>>         try {
>>             if (this.uri != null) {
>>                 this.fs = FileSystem.get(this.uri, this.configuration);
>>             } else {
>>                 this.fs = FileSystem.get(this.configuration);
>>             }
>>         } catch (IOException e) {
>>             e.printStackTrace(); // debug
>>             throw new RuntimeException("Failed to create DataStorage", e);
>>         }
>>         short defaultReplication = fs.getDefaultReplication();
>>         properties.setProperty(DEFAULT_REPLICATION_FACTOR_KEY,
>>                 Short.valueOf(defaultReplication).toString());
>>     }
>> -----
>>
>> The run now looks like this:
>> -----
>> root:/opt/pig# bin/pig
>> PIG_HOME: /opt/pig/bin/..
>> PIG_CONF_DIR: /opt/pig/bin/../conf
>> 2010-10-27 10:18:18,728 [main] INFO  org.apache.pig.Main - Logging error
>> messages to: /opt/pig/pig_1288167498720.log
>> 2010-10-27 10:18:18,940 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
>> to hadoop file system at: hdfs://<MY NAMENODE>:8020/
>> NAMENODE: hdfs://<MY NAMENODE>:8020/
>> java.io.IOException: Call to <MY NAMENODE>/10.0.0.3:8020 failed on local
>> exception: java.io.EOFException
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>         at $Proxy0.getProtocolVersion(Unknown Source)
>>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>         at
>> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>>         at
>> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSyst
>> em.java:82) at
>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>>         at
>> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.ja
>> va:73) at
>> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.
>> java:58) at
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecut
>> ionEngine.java:212) at
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecut
>> ionEngine.java:132) at
>> org.apache.pig.impl.PigContext.connect(PigContext.java:183) at
>> org.apache.pig.PigServer.<init>(PigServer.java:225)
>>         at org.apache.pig.PigServer.<init>(PigServer.java:214)
>>         at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>>         at org.apache.pig.Main.run(Main.java:450)
>>         at org.apache.pig.Main.main(Main.java:107)
>> Caused by: java.io.EOFException
>>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>         at
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>> 2010-10-27 10:18:19,124 [main] ERROR org.apache.pig.Main - ERROR 2999:
>> Unexpected internal error. Failed to create DataStorage
>> Details at logfile: /opt/pig/pig_1288167498720.log
>> -----
>>
>> I have replaced the name of my server with <MY NAMENODE> in the above
>> listing. BTW, this works as it should:
>> # hadoop fs -ls hdfs://<MY NAMENODE>:8020/
>>
>> I would appreciate some pointers; I have no idea what is causing this...
>>
>> Anze
>>
>> On Wednesday 27 October 2010, Dmitriy Ryaboy wrote:
>> > The same way you have /etc/hadoop/conf on the classpath, you want to
>> > put the hbase conf directory on the classpath.
>> >
>> > -D
>> >
>> > On Tue, Oct 26, 2010 at 11:50 PM, Anze <[email protected]> wrote:
>> > >> ... You have all the conf files in PIG_CLASSPATH right?
>> > >
>> > > I think I do:
>> > > ***
>> > > PIG_HOME: /opt/pig/bin/..
>> > > PIG_CONF_DIR: /opt/pig/bin/../conf
>> > > dry run:
>> > > /usr/lib/jvm/java-6-sun/bin/java -Xmx1000m -Dpig.log.dir=/opt/pig/bin/../logs -Dpig.log.file=pig.log -Dpig.home.dir=/opt/pig/bin/.. -Dpig.root.logger=INFO,console,DRFA -classpath /opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/etc/hadoop/conf:/opt/pig/bin/../build/classes:/opt/pig/bin/../build/test/classes:/opt/pig/bin/../pig-*-core.jar:/opt/pig/bin/../build/pig-0.8.0-SNAPSHOT.jar:/opt/pig/bin/../lib/automaton.jar:/opt/pig/bin/../lib/hbase-0.20.6.jar:/opt/pig/bin/../lib/hbase-0.20.6-test.jar:/opt/pig/bin/../lib/zookeeper-hbase-1329.jar org.apache.pig.Main
>> > > ***
>> > >
>> > > Generated log file contains:
>> > > ***
>> > > Error before Pig is launched
>> > > ----------------------------
>> > > ERROR 2999: Unexpected internal error. Failed to create DataStorage
>> > >
>> > > java.lang.RuntimeException: Failed to create DataStorage
>> > >
>> > >        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>> > >        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>> > >        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:212)
>> > >        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:132)
>> > >        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
>> > >        at org.apache.pig.PigServer.<init>(PigServer.java:225)
>> > >        at org.apache.pig.PigServer.<init>(PigServer.java:214)
>> > >        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>> > >        at org.apache.pig.Main.run(Main.java:450)
>> > >        at org.apache.pig.Main.main(Main.java:107)
>> > > Caused by: java.io.IOException: Call to namenode.admundus.com/10.0.0.3:8020 failed on local exception: java.io.EOFException
>> > >        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>> > >        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>> > >        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>> > >        at $Proxy0.getProtocolVersion(Unknown Source)
>> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>> > >        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>> > >        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>> > >        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>> > >        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>> > >        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>> > >        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>> > >        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>> > >        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>> > >        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>> > >        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>> > >        ... 9 more
>> > > Caused by: java.io.EOFException
>> > >        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>> > >        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>> > >        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>> > >
>> > > ==============================================================================
>> > >
>> > > And Pig complains:
>> > > ***
>> > > log4j:WARN No appenders could be found for logger
>> > > (org.apache.hadoop.conf.Configuration).
>> > > log4j:WARN Please initialize the log4j system properly.
>> > > 2010-10-27 08:46:44,762 [main] INFO  org.apache.pig.Main - Logging error messages to: /opt/pig/bin/pig_1288162004754.log
>> > > 2010-10-27 08:46:44,970 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://...:8020/
>> > > 2010-10-27 08:46:45,158 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
>> > > Details at logfile: /opt/pig/bin/pig_1288162004754.log
>> > > ***
>> > >
>> > > Any idea what is wrong? I have searched the net and most answers talk
>> > > about incompatible versions of Hadoop and Pig (but the posts are old).
>> > >
>> > > Thanks,
>> > >
>> > > Anze
>> > >
>> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> > >> Yeah pig 8 is not officially released yet, it will be cut at the end
>> > >> of the month or beginning of next month.
>> > >>
>> > >> Failed to create DataStorage sounds vaguely familiar... can you send
>> > >> the full Pig session and the full error? I think it's not connecting
>> > >> to hbase on the client side, or something along those lines. You have
>> > >> all the conf files in PIG_CLASSPATH, right?
>> > >>
>> > >> -D
>> > >>
>> > >> On Tue, Oct 26, 2010 at 6:32 AM, Anze <[email protected]> wrote:
>> > >> > Hmmm, not quite there yet. :-/
>> > >> >
>> > >> > I installed:
>> > >> > - HBase 0.20.6
>> > >> > - Cloudera CDH3b3 Hadoop (0.20.2)
>> > >> > - Pig 0.8 (since the official download is empty (?), I fetched the
>> > >> > Pig trunk from SVN and built it)
>> > >> >
>> > >> > Now it complains about "Failed to create DataStorage". Any ideas?
>> > >> > Should I upgrade Hadoop too?
>> > >> >
>> > >> > This is getting a bit complicated to install. :)
>> > >> >
>> > >> > I would appreciate some pointers - google revealed nothing useful.
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Anze
>> > >> >
>> > >> > On Tuesday 26 October 2010, Anze wrote:
>> > >> >> Great! :)
>> > >> >>
>> > >> >> Thanks for helping me out.
>> > >> >>
>> > >> >> All the best,
>> > >> >>
>> > >> >> Anze
>> > >> >>
>> > >> >> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> > >> >> > I think that you might be able to get away with 20.2 if you don't
>> > >> >> > use the filtering options.
>> > >> >> >
>> > >> >> > On Mon, Oct 25, 2010 at 3:39 PM, Anze <[email protected]> wrote:
>> > >> >> > > Dmitriy, thanks for the answer!
>> > >> >> > >
>> > >> >> > > The problem with upgrading to HBase 0.20.6 is that cloudera
>> > >> >> > > doesn't ship it yet and we would like to keep our install at
>> > >> >> > > "official" versions, even if beta. Of course, since this is a
>> > >> >> > > development / testing cluster, we could bend the rules if
>> > >> >> > > really necessary...
>> > >> >> > >
>> > >> >> > > I have written a small MR job (actually, just "M" job :) that
>> > >> >> > > exports the tables to files (allowing me to use Pig 0.7), but
>> > >> >> > > that is a bit cumbersome and slow.
>> > >> >> > >
>> > >> >> > > If I install the latest Pig (0.8), will it work at all with
>> > >> >> > > HBase 0.20.2? In other words, are scan filters (which were
>> > >> >> > > fixed in 0.20.6) needed as part of user-defined parameters or
>> > >> >> > > as part of Pig optimizations in reading from HBase? Hope my
>> > >> >> > > question makes sense...
>> > >> >> > >
>> > >> >> > > :)
>> > >> >> > >
>> > >> >> > > Thanks again,
>> > >> >> > >
>> > >> >> > > Anze
>> > >> >> > >
>> > >> >> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> > >> >> > >> Anze, the reason we bumped up to 20.6 in the ticket was
>> > >> >> > >> because HBase 20.2 had a bug in it. Ask the HBase folks, but
>> > >> >> > >> I'd say you should upgrade.
>> > >> >> > >> FWIW we upgraded to 20.6 from 20.2 a few months back and it's
>> > >> >> > >> been working smoothly.
>> > >> >> > >>
>> > >> >> > >> The Elephant-Bird hbase loader for pig 0.6 does add row keys
>> > >> >> > >> and most of the other features we added to the built-in
>> > >> >> > >> loader for pig 0.8 (notably, it does not do storage). But I
>> > >> >> > >> don't recommend downgrading to pig 0.6, as 7 and especially 8
>> > >> >> > >> are great improvements to the software.
>> > >> >> > >>
>> > >> >> > >> -D
>> > >> >> > >>
>> > >> >> > >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <[email protected]>
> wrote:
>> > >> >> > >> > Hi all!
>> > >> >> > >> >
>> > >> >> > >> > I am struggling to find a working solution to load data from
>> > >> >> > >> > HBase directly. I am using Cloudera CDH3b3 which comes with
>> > >> >> > >> > Pig 0.7. What would be the easiest way to load data from
>> > >> >> > >> > HBase? If it matters: we need the rows to be included, too.
>> > >> >> > >> >
>> > >> >> > >> > I have checked ElephantBird, but it seems to require Pig
>> > >> >> > >> > 0.6. I could downgrade, but it seems... well... :)
>> > >> >> > >> >
>> > >> >> > >> > On the other hand, loading from HBase with rows included
>> > >> >> > >> > was only added in Pig 0.8:
>> > >> >> > >> > https://issues.apache.org/jira/browse/PIG-915
>> > >> >> > >> > https://issues.apache.org/jira/browse/PIG-1205
>> > >> >> > >> > But judging from the latter issue, Pig 0.8 requires HBase
>> > >> >> > >> > 0.20.6?
>> > >> >> > >> >
>> > >> >> > >> > I can install latest Pig from source if needed, but I'd
>> > >> >> > >> > rather leave Hadoop and HBase at their versions (0.20.2 and
>> > >> >> > >> > 0.89.20100924 respectively).
>> > >> >> > >> >
>> > >> >> > >> > Should I write my own UDF? I'd appreciate some pointers.
>> > >> >> > >> >
>> > >> >> > >> > Thanks,
>> > >> >> > >> >
>> > >> >> > >> > Anze
>
>
