Nils Hoeller wrote:
Is there a way to index the whole WebDB,
meaning the normal sites that have been indexed
plus the sites that are one depth deeper and so are only stored in the WebDB?

This is supposed to be possible, but I don't think anyone has tried it in a while, and I fear it may no longer work.

If you specify '-refetchonly' when you generate your fetchlist then it should generate a fetchlist with fetch=false entries for all of the previously unfetched pages. Then the fetcher should pass these through to the output with null content, and the indexer should index the url and incoming anchor texts.
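
The intended flow above can be sketched as a small stand-alone Java model. This is only an illustration under stated assumptions: the class names (RefetchOnlySketch, FetchListEntry, FetcherOutput), the status strings, and the download() stub are all hypothetical and are not Nutch's real classes or API. It shows fetch=false entries being passed through to the output with null content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the fetchlist -> fetcher hand-off; not Nutch's real classes.
final class RefetchOnlySketch {

    // One fetchlist entry: a URL plus the fetch flag described in the text.
    static final class FetchListEntry {
        final String url;
        final boolean fetch;
        FetchListEntry(String url, boolean fetch) {
            this.url = url;
            this.fetch = fetch;
        }
    }

    // One fetcher output record; content is null when the page was not fetched.
    static final class FetcherOutput {
        final String url;
        final String status;   // illustrative strings: "SUCCESS" or "NOTFETCHING"
        final String content;
        FetcherOutput(String url, String status, String content) {
            this.url = url;
            this.status = status;
            this.content = content;
        }
    }

    // Stand-in for a real download (assumption: no network access in this sketch).
    static String download(String url) {
        return "<html>body of " + url + "</html>";
    }

    // Fetch entries flagged fetch=true; pass the rest through with null content.
    static List<FetcherOutput> run(List<FetchListEntry> fetchlist) {
        List<FetcherOutput> out = new ArrayList<>();
        for (FetchListEntry e : fetchlist) {
            if (e.fetch) {
                out.add(new FetcherOutput(e.url, "SUCCESS", download(e.url)));
            } else {
                out.add(new FetcherOutput(e.url, "NOTFETCHING", null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FetcherOutput> out = run(Arrays.asList(
                new FetchListEntry("http://a.example/", true),
                new FetchListEntry("http://b.example/", false)));
        for (FetcherOutput o : out) {
            System.out.println(o.url + " " + o.status + " hasContent=" + (o.content != null));
        }
    }
}
```

The point of the sketch is that every entry reaches the output, so a later stage still sees the URL even when there is no page content for it.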

But glancing at the current code, it looks like IndexSegment.java does not index entries with ProtocolStatus.NOTFETCHING.

If you desire this behavior, please file a bug report. Also, please try to patch IndexSegment.java so that it does index these entries. If this works, please attach your patch to the bug report.
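
As a rough idea of what such a patch would need to decide, here is a minimal stand-alone sketch. It does not reproduce IndexSegment.java: the Status enum, the indexUnfetched switch, and the field names "url", "anchor" and "content" are assumptions made for illustration. Only the NOTFETCHING status name comes from the discussion above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the indexing decision; not the real IndexSegment code.
final class IndexUnfetchedSketch {

    // Illustrative status values; only NOTFETCHING is named in the discussion.
    enum Status { SUCCESS, NOTFETCHING, FAILED }

    // Index fetched pages always; index unfetched ones only when asked to.
    static boolean shouldIndex(Status status, boolean indexUnfetched) {
        if (status == Status.SUCCESS) {
            return true;
        }
        return indexUnfetched && status == Status.NOTFETCHING;
    }

    // Build a field map from the URL and incoming anchor texts.
    // Content is added only when the page was actually fetched.
    static Map<String, String> buildDocument(String url, String content, List<String> anchors) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", url);
        doc.put("anchor", String.join(" ", anchors));
        if (content != null) {
            doc.put("content", content);
        }
        return doc;
    }

    public static void main(String[] args) {
        if (shouldIndex(Status.NOTFETCHING, true)) {
            System.out.println(buildDocument("http://b.example/", null,
                    Arrays.asList("example site", "more info")));
        }
        System.out.println(buildDocument("http://c.example/", null,
                Collections.<String>emptyList()));
    }
}
```

With the switch off, the sketch matches the behavior observed in the current code (unfetched entries are skipped); with it on, an unfetched entry yields a document holding only its URL and anchor text.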

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
