Thanks a lot.
    1. Is there a document that describes the column symbols (like column=s:s)?
There are a lot of symbols I cannot understand.
    2. I found a problem in my new Nutch project: when I run only the generate
job in Eclipse after the inject job, there is no "column=mk:_gnmrk_" in
HBase, but with Nutch 2.1 the generate job does put this column in. Has this
changed?

Thanks
HeChuan


At 2013-06-14 02:00:24,"Tejas Patil" <[email protected]> wrote:
>row-key = url
>
>column=f:fi          : fetchInterval (the delay between re-fetches of a
>page)
>column=f:ts          : fetchTime (indicates when the url will be eligible
>for fetching)
>column=mk:_injmrk_   : markers
>column=mk:dist
>column=mtdt:_csh_    : metadata
>column=s:s           : status (is the url fetched, unfetched, newly
>injected, gone, redirected etc..)
>
>
>
>On Thu, Jun 13, 2013 at 6:40 AM, RS <[email protected]> wrote:
>
>> I do not know what is stored in HBase after injecting a website.
>> When I run scan 'webpage' in the HBase shell, I get:
>> hbase(main):028:0> scan '1_webpage'
>> ROW                                  COLUMN+CELL
>>  com.xinhuanet.www:http/             column=f:fi, timestamp=1371110099941,
>> value=\x00'\x8D\x00
>>  com.xinhuanet.www:http/             column=f:ts, timestamp=1371110099941,
>> value=\x00\x00\x01?<\x87\xBA\x0A
>>  com.xinhuanet.www:http/             column=mk:_injmrk_,
>> timestamp=1371110099941, value=y
>>  com.xinhuanet.www:http/             column=mk:dist,
>> timestamp=1371110099941, value=0
>>  com.xinhuanet.www:http/             column=mtdt:_csh_,
>> timestamp=1371110099941, value=?\x80\x00\x00
>>  com.xinhuanet.www:http/             column=s:s, timestamp=1371110099941,
>> value=?\x80\x00\x00
>> 1 row(s) in 0.0300 seconds
>>
>>
>> So, are only 6 columns set in HBase? And what is the real data
>> stored in them?
>> I found that in the source code there is a WebPage class. I could not
>> understand all of it, but I think there should be 24 fields in HBase for
>> each website.
>>   public static final String[] _ALL_FIELDS =
>> {"baseUrl","status","fetchTime","prevFetchTime","fetchInterval","retriesSinceFetch","modifiedTime","prevModifiedTime","protocolStatus","content","contentType","prevSignature","signature","title","text","parseStatus","score","reprUrl","headers","outlinks","inlinks","markers","metadata","batchId",};
>>
>>
>> Thanks
>> HeChuan
>>
>>
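
The raw cell values in the scan output above can be decoded to see what they hold. The following is only a minimal sketch, not part of Nutch: it assumes the values are big-endian (as HBase's Bytes utility writes them) and that the '?', "'" and '<' characters in the shell output are literal printable bytes rather than placeholders introduced by the mail archive. The `cells` dictionary and the variable names are made up for illustration.

```python
import struct
from datetime import datetime, timezone

# Cell values copied from the scan output (assumed to be the literal bytes).
cells = {
    "f:fi": b"\x00'\x8d\x00",                 # 4 bytes
    "f:ts": b"\x00\x00\x01?<\x87\xba\x0a",    # 8 bytes
    "mtdt:_csh_": b"?\x80\x00\x00",           # 4 bytes
    "s:s": b"?\x80\x00\x00",                  # 4 bytes
}

# fetchInterval: big-endian 32-bit int, in seconds.
fetch_interval = struct.unpack(">i", cells["f:fi"])[0]

# fetchTime: big-endian 64-bit int, milliseconds since the Unix epoch.
fetch_time_ms = struct.unpack(">q", cells["f:ts"])[0]
fetch_time = datetime.fromtimestamp(fetch_time_ms / 1000, tz=timezone.utc)

# The same four bytes read as an int and as an IEEE 754 float.
s_as_int = struct.unpack(">i", cells["s:s"])[0]
s_as_float = struct.unpack(">f", cells["s:s"])[0]
csh_as_float = struct.unpack(">f", cells["mtdt:_csh_"])[0]

print("fetchInterval:", fetch_interval, "s =", fetch_interval // 86400, "days")
print("fetchTime:", fetch_time_ms, "ms =", fetch_time.isoformat())
print("s:s as int:", s_as_int, "as float:", s_as_float)
print("mtdt:_csh_ as float:", csh_as_float)
```

Under these assumptions f:fi decodes to 2592000 seconds (30 days), and f:ts to a millisecond timestamp a few seconds before the cell timestamp 1371110099941. The bytes in s:s read as the float 1.0, which suggests this cell may hold a float value such as a score rather than an integer status code; the gora-hbase-mapping.xml shipped with your Nutch version is the place to check which field each column maps to.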
