The same way you have /etc/hadoop/conf on the classpath, you want to put the hbase conf directory on the classpath.
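
Something along these lines should do it before launching pig -- the hbase path below is only a placeholder, point it at whatever directory actually holds your hbase-site.xml (a rough example of using the 0.8 loader itself is at the very bottom of this mail):

  # put the hbase config dir (the one with hbase-site.xml) on Pig's classpath,
  # next to the hadoop conf dir that is already there
  export PIG_CLASSPATH=/path/to/hbase/conf:$PIG_CLASSPATH
  pig

That way the client picks up the zookeeper quorum and the rest of the hbase settings the same way it already picks up the namenode address from /etc/hadoop/conf.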
-D

On Tue, Oct 26, 2010 at 11:50 PM, Anze <[email protected]> wrote:
>
>> ... You have all the conf files in PIG_CLASSPATH right?
>
> I think I do:
> ***
> PIG_HOME: /opt/pig/bin/..
> PIG_CONF_DIR: /opt/pig/bin/../conf
> dry run:
> /usr/lib/jvm/java-6-sun/bin/java -Xmx1000m -Dpig.log.dir=/opt/pig/bin/../logs
> -Dpig.log.file=pig.log -Dpig.home.dir=/opt/pig/bin/..
> -Dpig.root.logger=INFO,console,DRFA -classpath
> /opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/etc/hadoop/conf:/opt/pig/bin/../build/classes:/opt/pig/bin/../build/test/classes:/opt/pig/bin/../pig-*-core.jar:/opt/pig/bin/../build/pig-0.8.0-SNAPSHOT.jar:/opt/pig/bin/../lib/automaton.jar:/opt/pig/bin/../lib/hbase-0.20.6.jar:/opt/pig/bin/../lib/hbase-0.20.6-test.jar:/opt/pig/bin/../lib/zookeeper-hbase-1329.jar
> org.apache.pig.Main
> ***
>
> Generated log file contains:
> ***
> Error before Pig is launched
> ----------------------------
> ERROR 2999: Unexpected internal error. Failed to create DataStorage
>
> java.lang.RuntimeException: Failed to create DataStorage
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:212)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:132)
>         at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
>         at org.apache.pig.PigServer.<init>(PigServer.java:225)
>         at org.apache.pig.PigServer.<init>(PigServer.java:214)
>         at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>         at org.apache.pig.Main.run(Main.java:450)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Call to namenode.admundus.com/10.0.0.3:8020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy0.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>         at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>         ... 9 more
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> ================================================================================
>
> And Pig complains:
> ***
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
> log4j:WARN Please initialize the log4j system properly.
> 2010-10-27 08:46:44,762 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/pig/bin/pig_1288162004754.log
> 2010-10-27 08:46:44,970 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://...:8020/
> 2010-10-27 08:46:45,158 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
> Details at logfile: /opt/pig/bin/pig_1288162004754.log
> ***
>
> Any idea what is wrong? I have searched the net and most answers talk about incompatible versions of Hadoop and Pig (but the posts are old).
>
> Thanks,
>
> Anze
>
>
> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> Yeah, Pig 0.8 is not officially released yet; it will be cut at the end of the month or the beginning of next month.
>>
>> "Failed to create DataStorage" sounds vaguely familiar... can you send the full pig session and the full error? I think it's not connecting to hbase on the client side, or something along those lines. You have all the conf files in PIG_CLASSPATH right?
>>
>> -D
>>
>> On Tue, Oct 26, 2010 at 6:32 AM, Anze <[email protected]> wrote:
>> > Hmmm, not quite there yet. :-/
>> >
>> > I installed:
>> > - HBase 0.20.6
>> > - Cloudera CDH3b3 Hadoop (0.20.2)
>> > - Pig 0.8 (since the official download is empty (?) I fetched the Pig trunk from SVN and built it)
>> >
>> > Now it complains about "Failed to create DataStorage". Any ideas? Should I upgrade Hadoop too?
>> >
>> > This is getting a bit complicated to install. :)
>> >
>> > I would appreciate some pointers - google revealed nothing useful.
>> >
>> > Thanks,
>> >
>> > Anze
>> >
>> > On Tuesday 26 October 2010, Anze wrote:
>> >> Great! :)
>> >>
>> >> Thanks for helping me out.
>> >>
>> >> All the best,
>> >>
>> >> Anze
>> >>
>> >> On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> >> > I think that you might be able to get away with 20.2 if you don't use the filtering options.
>> >> >
>> >> > On Mon, Oct 25, 2010 at 3:39 PM, Anze <[email protected]> wrote:
>> >> > > Dmitriy, thanks for the answer!
>> >> > >
>> >> > > The problem with upgrading to HBase 0.20.6 is that Cloudera doesn't ship it yet and we would like to keep our install at "official" versions, even if beta. Of course, since this is a development / testing cluster, we could bend the rules if really necessary...
>> >> > >
>> >> > > I have written a small MR job (actually, just an "M" job :) that exports the tables to files (allowing me to use Pig 0.7), but that is a bit cumbersome and slow.
>> >> > >
>> >> > > If I install the latest Pig (0.8), will it work at all with HBase 0.20.2? In other words, are scan filters (which were fixed in 0.20.6) needed as part of user-defined parameters, or as part of Pig's optimizations when reading from HBase? Hope my question makes sense... :)
>> >> > >
>> >> > > Thanks again,
>> >> > >
>> >> > > Anze
>> >> > >
>> >> > > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
>> >> > >> Anze, the reason we bumped up to 20.6 in the ticket was that HBase 20.2 had a bug in it. Ask the HBase folks, but I'd say you should upgrade.
>> >> > >> FWIW, we upgraded to 20.6 from 20.2 a few months back and it's been working smoothly.
>> >> > >>
>> >> > >> The Elephant-Bird hbase loader for pig 0.6 does add row keys and most of the other features we added to the built-in loader for pig 0.8 (notably, it does not do storage). But I don't recommend downgrading to pig 0.6, as 0.7 and especially 0.8 are great improvements to the software.
>> >> > >>
>> >> > >> -D
>> >> > >>
>> >> > >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <[email protected]> wrote:
>> >> > >> > Hi all!
>> >> > >> >
>> >> > >> > I am struggling to find a working solution to load data from HBase directly. I am using Cloudera CDH3b3, which comes with Pig 0.7. What would be the easiest way to load data from HBase?
>> >> > >> > If it matters: we need the row keys to be included, too.
>> >> > >> >
>> >> > >> > I have checked ElephantBird, but it seems to require Pig 0.6. I could downgrade, but it seems... well... :)
>> >> > >> >
>> >> > >> > On the other hand, loading from HBase with row keys was only added in Pig 0.8:
>> >> > >> > https://issues.apache.org/jira/browse/PIG-915
>> >> > >> > https://issues.apache.org/jira/browse/PIG-1205
>> >> > >> > But judging from the last issue, Pig 0.8 requires HBase 0.20.6?
>> >> > >> >
>> >> > >> > I can install the latest Pig from source if needed, but I'd rather leave Hadoop and HBase at their current versions (0.20.2 and 0.89.20100924 respectively).
>> >> > >> >
>> >> > >> > Should I write my own UDF? I'd appreciate some pointers.
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> >
>> >> > >> > Anze
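
P.S. In case it helps once the conf directory is on the classpath: with the built-in 0.8 loader discussed above, loading rows together with their row keys looks roughly like this. This is an untested sketch -- the table name, column family, column names and paths are made up, adjust them to your own schema:

  # write a tiny example script (all names below are placeholders)
  cat > /tmp/hbase_load_example.pig <<'EOF'
  -- '-loadKey' asks HBaseStorage to emit the row key as the first field of each tuple
  raw = LOAD 'hbase://mytable'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:col1 info:col2', '-loadKey')
        AS (rowkey:chararray, col1:chararray, col2:chararray);
  DUMP raw;
  EOF
  pig /tmp/hbase_load_example.pig

Without '-loadKey' you only get the requested columns back; with it the key is prepended to each tuple, which is the part that was missing before 0.8.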
