Why does a URL with a fetch status of 'fetch_gone' show up in the crawldb as 'db_unfetched'? Shouldn't the crawldb entry have a status of 'db_gone'? This is happening in Nutch 1.0.
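To make the expectation concrete, here is a hypothetical, simplified sketch of the status transition I would expect when the crawldb is updated from a fetched segment. It is not Nutch's actual CrawlDbReducer code, and the class and method names are made up for illustration. Status codes 1 (db_unfetched) and 37 (fetch_gone) are the values printed in the readseg/readdb output; the value 3 for db_gone is an assumption based on Nutch 1.x's CrawlDatum constants.

```java
import java.util.Map;

/**
 * Hypothetical, simplified model of the crawldb status update I expected.
 * This is NOT Nutch's CrawlDbReducer; it only illustrates the mapping.
 */
public class ExpectedStatusUpdate {
    // 1 and 37 appear in the readseg/readdb output;
    // 3 (db_gone) is assumed from Nutch 1.x CrawlDatum constants.
    static final int DB_UNFETCHED = 1;
    static final int DB_GONE = 3;
    static final int FETCH_GONE = 37;

    static final Map<Integer, String> NAMES = Map.of(
            DB_UNFETCHED, "db_unfetched",
            DB_GONE, "db_gone",
            FETCH_GONE, "fetch_gone");

    /** Returns the crawldb status I would expect after applying a fetch status. */
    static int expectedDbStatus(int oldDbStatus, int fetchStatus) {
        if (fetchStatus == FETCH_GONE) {
            // A permanent failure (here a "notfound" protocol status) should mark the page gone.
            return DB_GONE;
        }
        // Other fetch statuses are out of scope for this sketch: keep the old status.
        return oldDbStatus;
    }

    public static void main(String[] args) {
        int expected = expectedDbStatus(DB_UNFETCHED, FETCH_GONE);
        int observed = DB_UNFETCHED; // what readdb actually reported
        System.out.println("expected: " + NAMES.get(expected)
                + ", observed: " + NAMES.get(observed));
    }
}
```

Running it prints `expected: db_gone, observed: db_unfetched`, which is exactly the mismatch I'm asking about.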
Here is one example of what I'm talking about:

=========================================
[jkon...@rampage search]$ ./bin/nutch readseg -get testParseSegment/20091202111849 "http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s"
Crawl Generate::
Version: 7
Status: 1 (db_unfetched)
Fetch time: Fri Nov 27 16:28:09 PST 2009
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 7776000 seconds (90 days)
Score: 7.535359E-10
Signature: null
Metadata: _ngt_: 1259781530311

Crawl Fetch::
Version: 7
Status: 37 (fetch_gone)
Fetch time: Wed Dec 02 12:25:21 PST 2009
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 6998400 seconds (81 days)
Score: 2.47059988E10
Signature: null
Metadata: _ngt_: 1259781530311_pst_: notfound(14), lastModified=0: http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s

[jkon...@rampage search]$ ./bin/nutch readdb testParseSegment/c -url "http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s"
URL: http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s
Version: 7
Status: 1 (db_unfetched)
Fetch time: Sat Apr 03 01:25:21 PDT 2010
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 6998400 seconds (81 days)
Score: 2.47059988E10
Signature: null
Metadata: _pst_: notfound(14), lastModified=0: http://answers.yahoo.com/question/index?qid=20080802122654AA7qj6s
=========================================

Thanks,
Jason