Nils Hoeller wrote:
> Is there a way to index the whole WebDB,
> which means the normal sites that have been indexed
> plus the sites that are one depth deeper and so
> are only stored in the WebDB?

This is supposed to be possible, but I don't think anyone has tried it in a
while, and I fear it may no longer work.
If you specify '-refetchonly' when you generate your fetchlist then it
should generate a fetchlist with fetch=false entries for all of the
previously unfetched pages. Then the fetcher should pass these through
to the output with null content, and the indexer should index the url
and incoming anchor texts.
But glancing at the current code it looks like IndexSegment.java does
not index entries with ProtocolStatus.NOTFETCHING.
If you desire this behavior, please file a bug report. Also, please try
to patch IndexSegment.java so that it does index these entries. If this
works, please attach your patch to the bug report.
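
To make the intended change concrete, here is a rough, self-contained
sketch of the filtering rule. It is not the actual IndexSegment.java
code: the Entry class, the Status enum and the method names are all
hypothetical stand-ins, and only the NOTFETCHING name comes from the
discussion above. It assumes the current check skips NOTFETCHING entries
and that a patched check would let them through, indexing only the url
and incoming anchor texts because the content is null.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types are real Nutch classes. */
final class IndexFilterSketch {

    /** Simplified status codes. Only the NOTFETCHING name is taken from the thread. */
    enum Status { SUCCESS, NOTFETCHING, FAILED }

    /** Hypothetical segment entry: url, status, content (null if unfetched), anchors. */
    static final class Entry {
        final String url;
        final Status status;
        final String content;
        final List<String> anchors;

        Entry(String url, Status status, String content, List<String> anchors) {
            this.url = url;
            this.status = status;
            this.content = content;
            this.anchors = anchors;
        }
    }

    /** Assumed current behavior: only successfully fetched entries are indexed. */
    static boolean indexedBefore(Entry e) {
        return e.status == Status.SUCCESS;
    }

    /** Assumed patched behavior: unfetched (NOTFETCHING) entries are indexed too. */
    static boolean indexedAfter(Entry e) {
        return e.status == Status.SUCCESS || e.status == Status.NOTFETCHING;
    }

    /** Text to index: the url and anchor texts, plus content when there is any. */
    static String indexableText(Entry e) {
        StringBuilder sb = new StringBuilder(e.url);
        for (String anchor : e.anchors) {
            sb.append(' ').append(anchor);
        }
        if (e.content != null) {
            sb.append(' ').append(e.content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry unfetched = new Entry("http://example.com/deep",
                Status.NOTFETCHING, null, Arrays.asList("deep page"));
        System.out.println("before patch: " + indexedBefore(unfetched));
        System.out.println("after patch:  " + indexedAfter(unfetched));
        System.out.println("index text:   " + indexableText(unfetched));
    }
}
```

The real patch would have to make the same kind of change to whatever
status check IndexSegment.java performs, and make sure the rest of the
indexing code copes with null content.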
Doug
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers