I do not know what is stored in HBase after injecting a website.
When I scan the webpage table in the HBase shell, I get:
hbase(main):028:0> scan '1_webpage'
ROW                      COLUMN+CELL
com.xinhuanet.www:http/  column=f:fi, timestamp=1371110099941, value=\x00'\x8D\x00
com.xinhuanet.www:http/  column=f:ts, timestamp=1371110099941, value=\x00\x00\x01?<\x87\xBA\x0A
com.xinhuanet.www:http/  column=mk:_injmrk_, timestamp=1371110099941, value=y
com.xinhuanet.www:http/  column=mk:dist, timestamp=1371110099941, value=0
com.xinhuanet.www:http/  column=mtdt:_csh_, timestamp=1371110099941, value=?\x80\x00\x00
com.xinhuanet.www:http/  column=s:s, timestamp=1371110099941, value=?\x80\x00\x00
1 row(s) in 0.0300 seconds
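The escaped values can be turned back into numbers with a small decoder. This is only a sketch under two assumptions: that the shell prints cells the way HBase's Bytes.toStringBinary does (printable ASCII as-is, any other byte as \xNN), and that the numbers were written big-endian, as HBase's Bytes.toBytes(int/long/float) does. The class and method names are hypothetical. Under those assumptions f:fi decodes to 2592000 (30 days in seconds), f:ts to 1371110095370 (epoch milliseconds, a few seconds before the cell timestamp), and both s:s and mtdt:_csh_ to the float 1.0.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical helper: decode cell values copied from the HBase shell.
 * ASSUMPTIONS: printable ASCII is shown literally and other bytes as \xNN
 * (like Bytes.toStringBinary), and numbers are stored big-endian
 * (like Bytes.toBytes). ByteBuffer reads big-endian by default.
 */
public class DecodeCells {

    // Turn text such as "\x00'\x8D\x00" back into raw bytes {0x00, 0x27, 0x8D, 0x00}.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                // "\xNN": two hex digits encode one byte
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // literal printable character, e.g. '?' is 0x3F and '<' is 0x3C
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    static int toInt(String escaped) {
        return ByteBuffer.wrap(unescape(escaped)).getInt();
    }

    static long toLong(String escaped) {
        return ByteBuffer.wrap(unescape(escaped)).getLong();
    }

    static float toFloat(String escaped) {
        return ByteBuffer.wrap(unescape(escaped)).getFloat();
    }

    public static void main(String[] args) {
        // f:fi, assumed to be fetchInterval stored as an int (seconds)
        System.out.println("f:fi = " + toInt("\\x00'\\x8D\\x00"));
        // f:ts, assumed to be fetchTime stored as a long (epoch milliseconds)
        System.out.println("f:ts = " + toLong("\\x00\\x00\\x01?<\\x87\\xBA\\x0A"));
        // s:s and mtdt:_csh_ share the same bytes, read here as a float
        System.out.println("s:s  = " + toFloat("?\\x80\\x00\\x00"));
    }
}
```

If these assumptions hold, 2592000 matches Nutch's default fetch interval and 1.0 matches a default injected score, which would suggest the injector only wrote these initial values so far.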
So, are only 6 columns set in HBase? And what is the actual data stored in them?
I found that the source code has a WebPage class. I could not understand all of
it, but I think there should be 24 fields in HBase for each website:
public static final String[] _ALL_FIELDS = {
    "baseUrl", "status", "fetchTime", "prevFetchTime", "fetchInterval",
    "retriesSinceFetch", "modifiedTime", "prevModifiedTime", "protocolStatus",
    "content", "contentType", "prevSignature", "signature", "title", "text",
    "parseStatus", "score", "reprUrl", "headers", "outlinks", "inlinks",
    "markers", "metadata", "batchId"};
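One way to compare the scan with the 24 field names is to map each column back to a WebPage field. The column-to-field table below is an assumption based on my reading of the default Nutch 2.x gora-hbase-mapping.xml (scalar fields as family:qualifier, map fields such as markers and metadata as a whole column family), and the class name is hypothetical; the mapping file actually used by a given build should be checked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: relate scanned HBase columns to WebPage fields.
 * ASSUMPTION: the mapping mirrors the default gora-hbase-mapping.xml.
 */
public class ColumnsToFields {

    // Field names copied from WebPage._ALL_FIELDS.
    static final String[] ALL_FIELDS = {
        "baseUrl", "status", "fetchTime", "prevFetchTime", "fetchInterval",
        "retriesSinceFetch", "modifiedTime", "prevModifiedTime", "protocolStatus",
        "content", "contentType", "prevSignature", "signature", "title", "text",
        "parseStatus", "score", "reprUrl", "headers", "outlinks", "inlinks",
        "markers", "metadata", "batchId"
    };

    // Assumed mapping: exact columns for scalar fields, families for map fields.
    static final Map<String, String> COLUMN_TO_FIELD = new LinkedHashMap<>();
    static {
        COLUMN_TO_FIELD.put("f:fi", "fetchInterval");
        COLUMN_TO_FIELD.put("f:ts", "fetchTime");
        COLUMN_TO_FIELD.put("s:s", "score");
        COLUMN_TO_FIELD.put("mk", "markers");
        COLUMN_TO_FIELD.put("mtdt", "metadata");
    }

    // Try the exact "family:qualifier" first, then fall back to the family.
    static String fieldFor(String column) {
        String exact = COLUMN_TO_FIELD.get(column);
        if (exact != null) return exact;
        int colon = column.indexOf(':');
        if (colon < 0) return null;
        return COLUMN_TO_FIELD.get(column.substring(0, colon));
    }

    public static void main(String[] args) {
        String[] scanned = {"f:fi", "f:ts", "mk:_injmrk_", "mk:dist", "mtdt:_csh_", "s:s"};
        List<String> present = new ArrayList<>();
        for (String col : scanned) {
            String field = fieldFor(col);
            System.out.println(col + " -> " + field);
            if (field != null && !present.contains(field)) present.add(field);
        }
        List<String> absent = new ArrayList<>();
        for (String f : ALL_FIELDS) {
            if (!present.contains(f)) absent.add(f);
        }
        System.out.println(present.size() + " of " + ALL_FIELDS.length + " fields stored");
        System.out.println("not yet set: " + absent);
    }
}
```

Under this assumed mapping the 6 columns come from only 5 fields, because markers is a map and contributes two columns (_injmrk_ and dist). If the Gora HBase store writes only fields that have been set (again an assumption here), the other 19 fields would appear only after later steps fill them in.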
Thanks
HeChuan