The webdb and the segments are two separate things. The webdb
is used by the fetcher to keep track of the status of each URL
(such as the last fetch time and whether there was an error). The
segments contain the fetched data itself, along with the index built
from that data, which is what searches run against.

So you've deleted the page from the webdb, and now you need
to remove it from the index. You can use the PruneIndexTool
to do this. Here's a link for more info:

http://lucene.apache.org/nutch/apidocs/org/apache/nutch/tools/PruneIndexTool.html
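
To make the separation concrete, here is a rough toy model of why the page still shows up after the webdb delete. This is only a sketch: ToyCrawlStore and all of its methods are made-up names for illustration, not Nutch classes or APIs, and the real webdb and index are of course not in-memory collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are real Nutch APIs.
class ToyCrawlStore {
    // Stands in for the webdb: URL -> fetch status kept for the fetcher.
    final Map<String, String> webdb = new HashMap<>();
    // Stands in for the index held with the segments: URLs a search can return.
    final Set<String> index = new HashSet<>();

    // A fetch records status in the webdb and makes the page searchable.
    void fetchAndIndex(String url) {
        webdb.put(url, "fetched");
        index.add(url);
    }

    // Analogous to removing a page with WebDBWriter: only the webdb changes.
    void removeFromWebDb(String url) {
        webdb.remove(url);
    }

    // Analogous to pruning the index: only the index changes.
    void pruneFromIndex(String url) {
        index.remove(url);
    }

    // A search consults the index only and never looks at the webdb.
    boolean searchable(String url) {
        return index.contains(url);
    }
}
```

In this model, calling removeFromWebDb leaves searchable(url) returning true even though a webdb lookup now returns null, which matches what you are seeing. Only the separate pruneFromIndex step makes the page disappear from searches.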

Howie

Does anyone know how to force a page to be deleted? I have run the
WebDBWriter class and removed the page from the database, but it still
shows up in the search results. Further checks using WebDBReader give a
'null' response when looking for the page.

