I tried the code below with avro-1.3.3.jar included. BinaryDecoder shows as
deprecated. With that code I get 1 for the first value and 0 for the second,
which doesn't seem to be correct. So I tried the DecoderFactory instead, as
DecoderFactory.get().binaryDecoder(byteValue, null), but it says get is not
defined. What do you think is going wrong here?
Also, is there another way to decode this, other than manually? Can we use
Gora for this?
Sorry if this is the wrong place to post. I could post to the Avro, Gora or
HBase list if you guys feel that's more appropriate.

Thanks.

-----Original Message-----
From: Ferdy Galema [mailto:[email protected]] 
Sent: Thursday, May 30, 2013 1:16 AM
To: [email protected]
Subject: Re: Extracting status code from hbase

Just to add how to manually decode ProtocolStatus:

ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
BinaryDecoder bd = new BinaryDecoder(bis);
System.out.print(bd.readInt()); //first value is an Integer
//second value is an array of strings
for (long i = bd.readArrayStart(); i != 0; i = bd.arrayNext()) {
  for (long j = 0; j < i; j++) {
    System.out.print(bd.readString(null).toString());
  }
}
System.out.print(bd.readLong()); //last value is a Long
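
To show what those bytes look like on the wire, here is a minimal sketch that uses only the JDK, so it does not depend on any particular Avro or HBase version. It assumes the layout read by the snippet above (an int, then an array of strings, then a long) and the big-endian 4-byte layout that Bytes.toBytes(int) produces. The class name, method names and sample bytes are made up for illustration and are not part of Nutch, Gora, Avro or HBase.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch, Gora, Avro or HBase.
class ColumnDecodingSketch {

    // Plain int columns: 4 bytes, most significant byte first.
    static int readBigEndianInt(byte[] value) {
        if (value.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + value.length);
        }
        return ByteBuffer.wrap(value).getInt(); // ByteBuffer is big-endian by default
    }

    // Avro int/long: a zigzag value stored as a base-128 varint, low bits first.
    static long readVarLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zigzag
            }
        }
        throw new IOException("varint longer than 10 bytes");
    }

    // Avro string: a length (encoded as a long) followed by that many UTF-8 bytes.
    static String readString(InputStream in) throws IOException {
        long len = readVarLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad string length " + len);
        byte[] buf = new byte[(int) len];
        for (int off = 0; off < buf.length; ) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) throw new EOFException("truncated string");
            off += n;
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Avro array: blocks of (count, items...), terminated by a block with count 0.
    // A negative count means abs(count) items, preceded by the block size in bytes.
    static List<String> readStringArray(InputStream in) throws IOException {
        List<String> items = new ArrayList<>();
        for (long count = readVarLong(in); count != 0; count = readVarLong(in)) {
            if (count < 0) {
                count = -count;
                readVarLong(in); // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                items.add(readString(in));
            }
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample values, not taken from a real crawl.
        byte[] status = {0, 0, 0, 2};
        System.out.println(readBigEndianInt(status)); // prints 2

        // int 1, array ["a"], long 0 in Avro binary encoding
        byte[] record = {0x02, 0x02, 0x02, 0x61, 0x00, 0x00};
        InputStream in = new ByteArrayInputStream(record);
        System.out.println(readVarLong(in));     // prints 1
        System.out.println(readStringArray(in)); // prints [a]
        System.out.println(readVarLong(in));     // prints 0
    }
}
```

Note that an empty array is written as a single 0 block count, so reading a 0 right after the first int is what an empty array looks like in this encoding.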



On Thu, May 30, 2013 at 10:10 AM, Ferdy Galema <[email protected]> wrote:

> Hi,
>
> I might be able to shed some light on this.
>
> Every row in the HBase webtable is a serialized
> org.apache.nutch.storage.WebPage. When you use the Gora-related wrappers
> (such as the Nutch jobs ParserJob and FetcherJob), the Avro schemas are
> used for you, so you don't have to think about how the data is encoded.
> You get a WebPage object that has all the fields you specified as input
> to the job.
>
> When you want to bypass this for some reason, you can decode the fields
> manually:
>
> The bytes of the columns are encoded in a specific manner. I see that
> your test code simply tries to interpret every value as a UTF-8 encoded
> string. That is what Bytes.toString(byte[]) assumes. Although this works
> for certain fields because they actually are UTF-8 encoded strings, some
> values are encoded differently. HBaseByteInterface (in the Gora project)
> shows the different encodings. For example, the status field f:st
> "status" is an Integer (as indicated in the WebPage class) that is
> encoded with Bytes.toBytes(int) and should be decoded accordingly with
> Bytes.toInt(byte[]). The difficult fields, f:prot "ProtocolStatus" and
> p:st "ParseStatus", are Avro records (so they have a schema of their own).
> To decode those, you can use the same code as in HBaseByteInterface,
> that is, a combination of SpecificDatumReader and BinaryDecoder.
>
> Good luck.
>
>
>
>
> On Thu, May 30, 2013 at 4:12 AM, Shah, Nishant <[email protected]> wrote:
>
>> Thanks for the reply. Will look into your suggestions.
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:[email protected]]
>> Sent: Wednesday, May 29, 2013 7:09 PM
>> To: [email protected]
>> Subject: Re: Extracting status code from hbase
>>
>> Oh, BTW, I meant to refer you to the test at line 178 of [0]:
>> testPutNested.
>> hth
>> Lewis
>>
>>
>> On Wed, May 29, 2013 at 7:07 PM, Lewis John Mcgibbney < 
>> [email protected]> wrote:
>>
>> > This is most certainly better aimed at either the Gora or HBase lists.
>> > Obtaining a better (and consistent) understanding of such data
>> > structures, and of course abstracting users from them, is what we
>> > have been addressing in current Gora development (see GORA-174). You
>> > will want to look specifically at some of the testing we do for this
>> > stuff over in Gora, namely in [0-1].
>> > Specifically, the Query API in Gora for some data store
>> > implementations could `probably` do with some attention... so
>> > please voice your opinion over on user@gora if it tickles your fancy.
>> > Thanks
>> > Lewis
>> >
>> >
>> > [0]
>> > http://svn.apache.org/viewvc/gora/trunk/gora-core/src/test/java/org/apache/gora/store/DataStoreTestBase.java?view=markup
>> > [1]
>> > http://svn.apache.org/viewvc/gora/trunk/gora-core/src/examples/avro/webpage.json?view=markup
>> >
>> >
>> > On Wed, May 29, 2013 at 3:55 PM, Shah, Nishant <[email protected]> wrote:
>> >
>> >> Hi Everyone,
>> >>
>> >> I found my error. I was trying to use toString on a field which is
>> >> an int, float or long. But this leads me to another question.
>> >> The protocol status is a nested structure, similar to parseStatus.
>> >> How could we parse these to get the individual majorcode, minorcode
>> >> and args?
>> >> Also, how can we detect whether a URL has returned a 404, a 200 or
>> >> any other status code?
>> >> Thanks.
>> >>
>> >> -----Original Message-----
>> >> From: Shah, Nishant
>> >> Sent: Wednesday, May 29, 2013 1:51 PM
>> >> To: [email protected]
>> >> Subject: Extracting status code from hbase
>> >>
>> >> Hi Everyone,
>> >>
>> >> I have Nutch 2.1 set up with HBase. Once I am done with the
>> >> crawl, I want to extract all the information from the column family 'f'.
>> >> For this I do:
>> >>
>> >> Scan s = new Scan();
>> >> ResultScanner scanner = table.getScanner(s);
>> >> try {
>> >>   // Scanners return Result instances.
>> >>   // Now, for the actual iteration. One way is to use a for loop
>> >>   // like so:
>> >>   for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
>> >>     // print out the row we found and the columns we were looking for
>> >>     System.out.println("Found row: " + rr);
>> >>     String[] rrs = getColumnsInColumnFamily(rr, "f");
>> >>     NavigableMap familyMap = rr.getFamilyMap(Bytes.toBytes("f"));
>> >>     Iterator entries = familyMap.entrySet().iterator();
>> >>     while (entries.hasNext()) {
>> >>       Entry thisEntry = (Entry) entries.next();
>> >>       Object key = thisEntry.getKey();
>> >>       Object val = thisEntry.getValue();
>> >>       System.out.println(Bytes.toString((byte[]) key) + "="
>> >>           + Bytes.toString((byte[]) val));
>> >>     }
>> >>   }
>> >> } finally {
>> >>   // release the scanner's server-side resources
>> >>   scanner.close();
>> >> }
>> >>
>> >> The value for status is blank. It's not null, but blank. Same is 
>> >> the case with headers. 'mtdt' family and rest of the 'f' family is fine.
>> >> Can anyone suggest why this is happening ?
>> >> Thanks,
>> >> Nishant
>> >>
>> >
>> >
>> >
>> > --
>> > *Lewis*
>> >
>>
>>
>>
>> --
>> *Lewis*
>>
>
>
>
> --
> *Ferdy Galema*
> Kalooga Development
>



--
*Ferdy Galema*
Kalooga Development
