Thanks a lot
1. Is there a document describing the column symbols (like column=s:s)? There
are a lot of symbols I cannot understand.
2. I found a problem in my new Nutch project: when I run only the generate job
in Eclipse after the inject job, no "column=mk:_gnmrk_" appears in
HBase. With Nutch 2.1, the generate job does write this
column. Has this changed?
Thanks
HeChuan
At 2013-06-14 02:00:24,"Tejas Patil" <[email protected]> wrote:
>row-key = url
>
>column=f:fi : fetchInterval (the delay between re-fetches of a
>page)
>column=f:ts : fetchTime (indicates when the url will be eligible
>for fetching)
>column=mk:_injmrk_ : markers
>column=mk:dist
>column=mtdt:_csh_ : metadata
>column=s:s : status (is the url fetched, unfetched, newly
>injected, gone, redirected, etc.)
>
>
>
>On Thu, Jun 13, 2013 at 6:40 AM, RS <[email protected]> wrote:
>
>> I do not know what is stored in HBase after injecting a website.
>> When I run scan 'webpage' in the hbase shell, I get:
>> hbase(main):028:0> scan '1_webpage'
>> ROW COLUMN+CELL
>> com.xinhuanet.www:http/ column=f:fi, timestamp=1371110099941,
>> value=\x00'\x8D\x00
>> com.xinhuanet.www:http/ column=f:ts, timestamp=1371110099941,
>> value=\x00\x00\x01?<\x87\xBA\x0A
>> com.xinhuanet.www:http/ column=mk:_injmrk_,
>> timestamp=1371110099941, value=y
>> com.xinhuanet.www:http/ column=mk:dist,
>> timestamp=1371110099941, value=0
>> com.xinhuanet.www:http/ column=mtdt:_csh_,
>> timestamp=1371110099941, value=?\x80\x00\x00
>> com.xinhuanet.www:http/ column=s:s, timestamp=1371110099941,
>> value=?\x80\x00\x00
>> 1 row(s) in 0.0300 seconds
>>
>>
>> So, are only 6 columns set in HBase? And what is the actual data
>> stored in them?
>> I found that the source code has a WebPage class. I could not
>> understand all of it, but I think there should be 24 fields in HBase for each
>> website.
>> public static final String[] _ALL_FIELDS =
>> {"baseUrl","status","fetchTime","prevFetchTime","fetchInterval","retriesSinceFetch","modifiedTime","prevModifiedTime","protocolStatus","content","contentType","prevSignature","signature","title","text","parseStatus","score","reprUrl","headers","outlinks","inlinks","markers","metadata","batchId",};
>>
>>
>> Thanks
>> HeChuan
>>
>>
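
For reference, the raw values in the scan output above can be decoded by hand. The sketch below is not from the thread. It assumes the cells hold big-endian fixed-width values (the default byte order of java.nio.ByteBuffer), and the choice of int, long, and float for each column is a guess based on the byte lengths. CellDecoder and its method names are made up for this illustration. The hbase shell prints printable bytes as characters, so ' is 0x27, ? is 0x3F, and < is 0x3C.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Nutch or HBase.
class CellDecoder {
    // ByteBuffer reads big-endian by default.
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }
    static float toFloat(byte[] b) { return ByteBuffer.wrap(b).getFloat(); }

    public static void main(String[] args) {
        // f:fi  value=\x00'\x8D\x00  (assumed to be a 4-byte int)
        byte[] fi = {0x00, 0x27, (byte) 0x8D, 0x00};
        // f:ts  value=\x00\x00\x01?<\x87\xBA\x0A  (assumed to be an 8-byte long)
        byte[] ts = {0x00, 0x00, 0x01, 0x3F, 0x3C, (byte) 0x87, (byte) 0xBA, 0x0A};
        // s:s and mtdt:_csh_  value=?\x80\x00\x00  (assumed to be a 4-byte float)
        byte[] s = {0x3F, (byte) 0x80, 0x00, 0x00};

        System.out.println("f:fi = " + toInt(fi));
        System.out.println("f:ts = " + toLong(ts));
        System.out.println("s:s  = " + toFloat(s));
    }
}
```

Under these assumptions, f:fi decodes to 2592000 (30 days, if the unit is seconds), f:ts to 1371110095370 (milliseconds since the epoch, a few seconds before the cell timestamp 1371110099941), and the four bytes shared by s:s and mtdt:_csh_ to 1.0. The column-to-field mapping itself is worth checking against conf/gora-hbase-mapping.xml in the Nutch 2.x source tree.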