(fwiw, HBaseStorage works fine for me when I use it to pull whole protocol buffer messages down as byte arrays)
On Tue, Sep 6, 2011 at 10:10 AM, Dmitriy Ryaboy <[email protected]> wrote:

> That's interesting... we should be able to return a byte array properly
> (though this is a bit risky for people who try to later turn this bytearray
> into a long using Pig, since the conversion from bytes to longs in Pig is
> different than in HBase).
>
> Could you guys open a jira, preferably with an easy way to reproduce the
> error?
>
> D
>
>
> On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <[email protected]> wrote:
>
>> My load looks like this
>>
>> .... AS (key:chararray, value:long);
>>
>> and I'm able to return data.
>>
>> I changed the load to
>>
>> .... AS (key:chararray, value:bytearray);
>>
>> and had results that match yours.
>>
>> Try changing the value to long or int type and see if that helps.
>>
>> -bp
>>
>>
>> On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <[email protected]> wrote:
>>
>> > the 'funny' thing is that if I look at the other CF name (from an byte id
>> > gives the name, reverse way) :
>> >
>> > grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
>> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
>> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
>> > metrics:bytearray);
>> >
>> > I've got the same issue:
>> > (,proc.loadavg.1m)
>> > (,proc.loadavg.5m)
>> > (,Measurement_1)
>> > (,Measurement_2)
>> > (,Measurement_3)
>> >
>> > So there is a real issue with byte array....
>> >
>> > On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <[email protected]> wrote:
>> >
>> > > Hello Bryce,
>> > >
>> > > not better... :-(
>> > >
>> > > grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
>> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
>> > > metrics:bytearray);
>> > > grunt> dump tsd_metrics2;
>> > >
>> > > [...]
>> > >
>> > > (Measurement_1,)
>> > > (Measurement_2,)
>> > > (Measurement_3,)
>> > > (proc.loadavg.1m,)
>> > > (proc.loadavg.5m,)
>> > >
>> > >
>> > > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <[email protected]> wrote:
>> > >
>> > >> Try adding -caster=HBaseBinaryConverter along with loadKey
>> > >>
>> > >> '-caster=HBaseBinaryConverter -loadKey=true'
>> > >>
>> > >> -bp
>> > >>
>> > >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <[email protected]> wrote:
>> > >>
>> > >> > Hello Norbert,
>> > >> >
>> > >> > Unfortunately, same result :
>> > >> > (Measurement_1,)
>> > >> > (Measurement_2,)
>> > >> > (Measurement_3,)
>> > >> > (proc.loadavg.1m,)
>> > >> > (proc.loadavg.5m,)
>> > >> >
>> > >> > the row key is well extracted (Measurement_1 for example) but the value, the
>> > >> > id I need for timestamp data querying, the bytearray, is not :(
>> > >> >
>> > >> > shazz
>> > >> >
>> > >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <[email protected]> wrote:
>> > >> >
>> > >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <[email protected]> wrote:
>> > >> > > > So from Pig when I want to retrieve only the metrics and their value (= id
>> > >> > > > for the data table) I do :
>> > >> > > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
>> > >> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > >> > > > '-loadKey true') AS (metrics:bytearray);
>> > >> > > > dump tsd_metrics;
>> > >> > >
>> > >> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
>> > >> > > LOAD schema includes an extra column containing the row key, and you
>> > >> > > should add equivalent to your schema column mapping (the AS clause).
>> > >> > > Try the following:
>> > >> > >
>> > >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
>> > >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
>> > >> > >
>> > >> > > Norbert
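
For anyone reading this thread later: here is a rough illustration of Dmitriy's caveat that the conversion from bytes to longs in Pig is different than in HBase. It is only a sketch in Python, not Pig or HBase code. It assumes HBase's `Bytes.toBytes(long)` layout (8 bytes, big-endian) and, as far as I know, that Pig's default text-based caster reads the bytes as a UTF-8 decimal string. The function names are hypothetical and exist only for this example.

```python
import struct
from typing import Optional


def hbase_long_to_bytes(value: int) -> bytes:
    """Encode a long as 8 big-endian bytes (the layout HBase's Bytes.toBytes(long) uses)."""
    return struct.pack(">q", value)


def hbase_bytes_to_long(raw: bytes) -> int:
    """Decode 8 big-endian bytes back into a long (binary interpretation)."""
    return struct.unpack(">q", raw)[0]


def text_bytes_to_long(raw: bytes) -> Optional[int]:
    """Hypothetical stand-in for a text-based caster: treat the bytes as a
    UTF-8 decimal string and parse it, returning None when that fails."""
    try:
        return int(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None


binary_cell = hbase_long_to_bytes(42)  # what a binary writer would store
text_cell = b"42"                      # what a text writer would store

# The binary decoder round-trips the binary cell; the text decoder does not.
print(hbase_bytes_to_long(binary_cell))
print(text_bytes_to_long(text_cell))
print(text_bytes_to_long(binary_cell))
```

The same stored bytes give a usable number under one interpretation and nothing under the other, which is why the caster has to match the way the cell was written.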
