Re: loading from HBase - Pig 0.7

Anze Mon, 25 Oct 2010 15:39:53 -0700

Dmitriy, thanks for the answer!

The problem with upgrading to HBase 0.20.6 is that Cloudera doesn't ship it 
yet, and we would like to keep our install at "official" versions, even if 
they are betas. Of course, since this is a development / testing cluster, we 
could bend the rules if really necessary...


I have written a small MR job (actually, just an "M" job :) that exports the 
tables to files (allowing me to use Pig 0.7), but that is a bit cumbersome and 
slow.

If I install the latest Pig (0.8), will it work at all with HBase 0.20.2? 
In other words, are scan filters (which were fixed in 0.20.6) needed as part 
of user-defined parameters or as part of Pig optimizations in reading from 
HBase? Hope my question makes sense... :)

Thanks again,

Anze


On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> Anze, the reason we bumped up to 20.6 in the ticket was because HBase
> 20.2 had a bug in it. Ask the HBase folks, but I'd say you should
> upgrade.
> FWIW we upgraded to 20.6 from 20.2 a few months back and it's been
> working smoothly.
> 
> The Elephant-Bird HBase loader for Pig 0.6 does add row keys and most
> of the other features we added to the built-in loader for Pig 0.8
> (notably, it does not do storage). But I don't recommend downgrading
> to Pig 0.6, as 0.7 and especially 0.8 are great improvements to the
> software.
> 
> -D
> 
> On Mon, Oct 25, 2010 at 7:01 AM, Anze wrote:
> > Hi all!
> > 
> > I am struggling to find a working solution to load data from HBase
> > directly. I am using Cloudera CDH3b3 which comes with Pig 0.7. What
> > would be the easiest way to load data from HBase?
> > If it matters: we need the row keys to be included, too.
> > 
> > I have checked ElephantBird, but it seems to require Pig 0.6. I could
> > downgrade, but it seems... well... :)
> > 
> > On the other hand, loading from HBase with row keys was only added in Pig 0.8:
> > https://issues.apache.org/jira/browse/PIG-915
> > https://issues.apache.org/jira/browse/PIG-1205
> > But judging from the last issue Pig 0.8 requires HBase 0.20.6?
> > 
> > I can install latest Pig from source if needed, but I'd rather leave
> > Hadoop and HBase at their versions (0.20.2 and 0.89.20100924
> > respectively).
> > 
> > Should I write my own UDF? I'd appreciate some pointers.
> > 
> > Thanks,
> > 
> > Anze
