Thanks Dmitriy !

Indeed, it works when I use the caster AND declare the value column
(named 'value' or 'metrics') as long:
grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:long);

I don't really understand why the HBaseStorage LoadFunc considers that
cf:qualifier == value, but why not... I'll look at the code :)
I'll try to set up an easy way to reproduce it and open a JIRA for it.
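
On Dmitriy's point (quoted below) that Pig and HBase convert bytes to longs differently, here is a minimal sketch of the two interpretations. It assumes, as far as I understand it, that HBase's Bytes.toLong reads 8 raw big-endian bytes, while Pig's default Utf8StorageConverter parses the bytes as the decimal text of a number. The function names are hypothetical and only illustrate the idea; this is not the actual Pig or HBase code.

```python
import struct


def hbase_style_to_long(raw: bytes) -> int:
    """Hypothetical stand-in for a binary decode: 8 big-endian bytes -> signed long."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes, got %d" % len(raw))
    return struct.unpack(">q", raw)[0]


def text_style_to_long(raw: bytes) -> int:
    """Hypothetical stand-in for a text decode: bytes hold the digits of the number."""
    return int(raw.decode("utf-8"))


# The same logical value, 42, stored in the two encodings (illustrative data).
binary_cell = struct.pack(">q", 42)  # 8 raw bytes
text_cell = b"42"                    # 2 bytes of ASCII digits

print(hbase_style_to_long(binary_cell))
print(text_style_to_long(text_cell))

# Reading a cell with the wrong decoder fails instead of returning the number.
for decoder, cell in ((text_style_to_long, binary_cell),
                      (hbase_style_to_long, text_cell)):
    try:
        decoder(cell)
    except ValueError as err:
        print("mismatch:", err)
```

So a cell loaded as a raw bytearray and cast to long later has to be decoded with the same convention it was written with, which is presumably why the choice of -caster matters here.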

btw, I'm not sure I understood your last comment: how did you manage to
pull byte arrays, then?

shazz


On Tue, Sep 6, 2011 at 7:10 PM, Dmitriy Ryaboy <[email protected]> wrote:

> (fwiw, HBaseStorage works fine for me when I use it to pull whole protocol
> buffer messages down as byte arrays)
>
> On Tue, Sep 6, 2011 at 10:10 AM, Dmitriy Ryaboy <[email protected]>
> wrote:
>
> > That's interesting... we should be able to return a byte array properly
> > (though this is a bit risky for people who try to later turn this bytearray
> > into a long using Pig, since the conversion from bytes to longs in Pig is
> > different than in HBase).
> >
> > Could you guys open a jira, preferably with an easy way to reproduce the
> > error?
> >
> > D
> >
> >
> > On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <[email protected]> wrote:
> >
> >> My load looks like this
> >>
> >> .... AS (key:chararray, value:long);
> >>
> >> and I'm able to return data.
> >>
> >> I changed the load to
> >>
> >> .... AS (key:chararray, value:bytearray);
> >>
> >> and had results that match yours.
> >>
> >> Try changing the value to long or int type and see if that helps.
> >>
> >> -bp
> >>
> >>
> >> On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <[email protected]> wrote:
> >>
> >> > the 'funny' thing is that if I look at the other CF, name (which maps a
> >> > byte id back to the name, the reverse direction):
> >> >
> >> > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> >> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
> >> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> >> > metrics:bytearray);
> >> >
> >> > I've got the same issue:
> >> > (,proc.loadavg.1m)
> >> > (,proc.loadavg.5m)
> >> > (,Measurement_1)
> >> > (,Measurement_2)
> >> > (,Measurement_3)
> >> >
> >> > So there is a real issue with byte array....
> >> >
> >> > On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <[email protected]> wrote:
> >> >
> >> > > Hello Bryce,
> >> > >
> >> > > not better... :-(
> >> > >
> >> > > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> >> > > metrics:bytearray);
> >> > > grunt> dump tsd_metrics2;
> >> > >
> >> > > [...]
> >> > >
> >> > > (Measurement_1,)
> >> > > (Measurement_2,)
> >> > > (Measurement_3,)
> >> > > (proc.loadavg.1m,)
> >> > > (proc.loadavg.5m,)
> >> > >
> >> > >
> >> > > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <[email protected]> wrote:
> >> > >
> >> > >> Try adding -caster=HBaseBinaryConverter along with loadKey
> >> > >>
> >> > >> '-caster=HBaseBinaryConverter -loadKey=true'
> >> > >>
> >> > >> -bp
> >> > >>
> >> > >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <[email protected]> wrote:
> >> > >>
> >> > >> > Hello Norbert,
> >> > >> >
> >> > >> > Unfortunately, same result :
> >> > >> > (Measurement_1,)
> >> > >> > (Measurement_2,)
> >> > >> > (Measurement_3,)
> >> > >> > (proc.loadavg.1m,)
> >> > >> > (proc.loadavg.5m,)
> >> > >> >
> >> > >> > the row key is correctly extracted (Measurement_1 for example) but the
> >> > >> > value, the id I need for timestamp data querying, the bytearray, is not :(
> >> > >> >
> >> > >> > shazz
> >> > >> >
> >> > >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <[email protected]> wrote:
> >> > >> >
> >> > >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <[email protected]> wrote:
> >> > >> > > > So from Pig when I want to retrieve only the metrics and their value
> >> > >> > > > (= id for the data table) I do :
> >> > >> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> >> > >> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > >> > > > '-loadKey true') AS (metrics:bytearray);
> >> > >> > > > dump tsd_metrics;
> >> > >> > >
> >> > >> > > Shazz -- if you use the "-loadKey" option to HBaseStorage, then your
> >> > >> > > LOAD schema includes an extra column containing the row key, and you
> >> > >> > > should add an equivalent field to your schema column mapping (the AS
> >> > >> > > clause). Try the following:
> >> > >> > >
> >> > >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> >> > >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> >> > >> > >
> >> > >> > > Norbert
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
