Why does a URL with a fetch status of 'fetch_gone' show up as
'db_unfetched' in the crawldb? Shouldn't the crawldb entry have a
status of 'db_gone'? This is happening in Nutch 1.0.

Here is one example of what I'm talking about:
=========================================
[jkon...@rampage search]$ ./bin/nutch readseg -get testParseSegment/20091202111849 "http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s"
Crawl Generate::
Version: 7
Status: 1 (db_unfetched)
Fetch time: Fri Nov 27 16:28:09 PST 2009
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 7776000 seconds (90 days)
Score: 7.535359E-10
Signature: null
Metadata: _ngt_: 1259781530311

Crawl Fetch::
Version: 7
Status: 37 (fetch_gone)
Fetch time: Wed Dec 02 12:25:21 PST 2009
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 6998400 seconds (81 days)
Score: 2.47059988E10
Signature: null
Metadata: _ngt_: 1259781530311_pst_: notfound(14), lastModified=0:
http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s

[jkon...@rampage search]$ ./bin/nutch readdb testParseSegment/c -url "http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s"
URL: http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s
Version: 7
Status: 1 (db_unfetched)
Fetch time: Sat Apr 03 01:25:21 PDT 2010
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 6998400 seconds (81 days)
Score: 2.47059988E10
Signature: null
Metadata: _pst_: notfound(14), lastModified=0:
http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s
=========================================


Thanks,
  Jason
