Hi Everyone,

I found my error: I was calling toString on fields that are stored as int, float, or long.

But this leads me to another question. The protocol status is a nested structure, similar to parseStatus. How can we parse these to get the individual major code, minor code, and args? Also, how can we detect whether a URL returned a 404, a 200, or any other status code?

Thanks.
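For anyone hitting the same blank values, here is a minimal sketch of the toString problem. It assumes the numeric cells hold fixed-width big-endian bytes, which is how HBase's Bytes.toBytes(int), Bytes.toBytes(long), and Bytes.toBytes(float) encode them. NumericCellDecoder is a hypothetical helper, not part of Nutch or HBase, and it uses java.nio.ByteBuffer so it runs without the HBase jars. With HBase on the classpath, Bytes.toInt, Bytes.toLong, and Bytes.toFloat should do the same job.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Nutch or HBase.
// Assumes cells were written as fixed-width big-endian values.
final class NumericCellDecoder {

    private NumericCellDecoder() {}

    // Decodes the first 4 bytes as a big-endian int.
    static int toInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    // Decodes the first 8 bytes as a big-endian long.
    static long toLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    // Decodes the first 4 bytes as an IEEE 754 float.
    static float toFloat(byte[] raw) {
        return ByteBuffer.wrap(raw).getFloat();
    }

    // What a string decode does with the same bytes: small numbers turn
    // into control characters, which print as nothing visible.
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

A cell holding the int 2 is the four bytes 00 00 00 02. Decoded as text, that is four control characters, so the printed value looks blank even though it is not null. Which decoder to use depends on the field's declared type, and that has to come from the schema because the bytes carry no type information.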
-----Original Message-----
From: Shah, Nishant
Sent: Wednesday, May 29, 2013 1:51 PM
To: [email protected]
Subject: Extracting status code from hbase

Hi Everyone,

I have my Nutch 2.1 setup with HBase. Once I am done with the crawl, I want to extract all the information from the column family 'f'. For this I do:

    Scan s = new Scan();
    ResultScanner scanner = table.getScanner(s);
    try {
        // Scanners return Result instances.
        // Now, for the actual iteration. One way is to use a loop
        // like so:
        for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            // Print out the row we found and the columns we were looking for.
            System.out.println("Found row: " + rr);
            String[] rrs = getColumnsInColumnFamily(rr, "f");
            NavigableMap familyMap = rr.getFamilyMap(Bytes.toBytes("f"));
            Iterator entries = familyMap.entrySet().iterator();
            while (entries.hasNext()) {
                Entry thisEntry = (Entry) entries.next();
                Object key = thisEntry.getKey();
                Object val = thisEntry.getValue();
                // Decodes every value as a string, whatever its stored type.
                System.out.println(Bytes.toString((byte[]) key) + "=" + Bytes.toString((byte[]) val));
            }
        }
    } finally {
        // Release the scanner's server-side resources.
        scanner.close();
    }

The value for status is blank. It's not null, but blank. The same is the case with headers. The 'mtdt' family and the rest of the 'f' family are fine.

Can anyone suggest why this is happening?

Thanks,
Nishant

