Re: loading from HBase - Pig 0.7

Anze Tue, 26 Oct 2010 02:50:38 -0700

Great! :)

Thanks for helping me out.


All the best,

Anze

On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> I think that you might be able to get away with 0.20.2 if you don't use
> the filtering options.
> 
> On Mon, Oct 25, 2010 at 3:39 PM, Anze <[email protected]> wrote:
> > Dmitriy, thanks for the answer!
> > 
> > The problem with upgrading to HBase 0.20.6 is that Cloudera doesn't ship
> > it yet and we would like to keep our install at "official" versions,
> > even if beta. Of course, since this is a development / testing cluster,
> > we could bend the rules if really necessary...
> > 
> > I have written a small MR job (actually, just "M" job :) that exports the
> > tables to files (allowing me to use Pig 0.7), but that is a bit
> > cumbersome and slow.
> > 
> > If I install the latest Pig (0.8), will it work at all with HBase 0.20.2?
> > In other words, are scan filters (which were fixed in 0.20.6) needed as
> > part of user-defined parameters or as part of Pig optimizations in
> > reading from HBase? Hope my question makes sense... :)
> > 
> > Thanks again,
> > 
> > Anze
> > 
> > On Tuesday 26 October 2010, Dmitriy Ryaboy wrote:
> >> Anze, the reason we bumped up to 0.20.6 in the ticket was that HBase
> >> 0.20.2 had a bug in it. Ask the HBase folks, but I'd say you should
> >> upgrade.
> >> FWIW we upgraded to 0.20.6 from 0.20.2 a few months back and it's been
> >> working smoothly.
> >> 
> >> The Elephant-Bird HBase loader for Pig 0.6 does add row keys and most
> >> of the other features we added to the built-in loader for Pig 0.8
> >> (notably, it does not do storage). But I don't recommend downgrading
> >> to Pig 0.6, as 0.7 and especially 0.8 are great improvements to the
> >> software.
> >> 
> >> -D
> >> 
> >> On Mon, Oct 25, 2010 at 7:01 AM, Anze <[email protected]> wrote:
> >> > Hi all!
> >> > 
> >> > I am struggling to find a working solution to load data from HBase
> >> > directly. I am using Cloudera CDH3b3 which comes with Pig 0.7. What
> >> > would be the easiest way to load data from HBase?
> >> > If it matters: we need the row keys to be included, too.
> >> > 
> >> > I have checked ElephantBird, but it seems to require Pig 0.6. I could
> >> > downgrade, but it seems... well... :)
> >> > 
> >> > On the other hand, loading from HBase with row keys is only added in
> >> > Pig 0.8: https://issues.apache.org/jira/browse/PIG-915
> >> > https://issues.apache.org/jira/browse/PIG-1205
> >> > But judging from the last issue, Pig 0.8 requires HBase 0.20.6?
> >> > 
> >> > I can install latest Pig from source if needed, but I'd rather leave
> >> > Hadoop and HBase at their versions (0.20.2 and 0.89.20100924
> >> > respectively).
> >> > 
> >> > Should I write my own UDF? I'd appreciate some pointers.
> >> > 
> >> > Thanks,
> >> > 
> >> > Anze
