The webdb and the segments are two separate things. The webdb
is used by the fetcher to keep track of the status of each URL
(e.g. the last fetch time and whether the fetch failed). The segments
contain the data from the fetches themselves, plus the index built
from that data, which is what searches actually read.

So you've deleted the page from the webdb, but the page is still in
the segment index, which is why it keeps showing up in searches. To
remove it from the index, you can use PruneIndexTool. Here's a link
for more info:

http://lucene.apache.org/nutch/apidocs/org/apache/nutch/tools/PruneIndexTool.html
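To make the distinction concrete, here is a toy model of the two stores. The class and method names are hypothetical and are not Nutch's real classes; the maps and sets simply stand in for the webdb (per-URL status) and the segment index (what searches return). Deleting from the webdb leaves the index untouched until a separate pruning step runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model, NOT Nutch's actual API: it only illustrates
 * that the webdb and the segment index are independent stores.
 */
public class CrawlStoresModel {
    // Stand-in for the webdb: URL -> last known fetch status.
    final Map<String, String> webdb = new HashMap<>();
    // Stand-in for the segment index: URLs that a search can return.
    final Set<String> segmentIndex = new HashSet<>();

    // Fetching a page records its status AND adds it to the index.
    void fetchAndIndex(String url) {
        webdb.put(url, "fetched");
        segmentIndex.add(url);
    }

    // Analogous to removing the page with WebDBWriter: only the webdb changes.
    void deleteFromWebDb(String url) {
        webdb.remove(url);
    }

    // Analogous to running PruneIndexTool: the page leaves the index.
    void pruneFromIndex(String url) {
        segmentIndex.remove(url);
    }

    // Searches consult the index, never the webdb.
    boolean searchFinds(String url) {
        return segmentIndex.contains(url);
    }

    public static void main(String[] args) {
        CrawlStoresModel m = new CrawlStoresModel();
        String url = "http://example.com/page.html";
        m.fetchAndIndex(url);
        m.deleteFromWebDb(url);
        System.out.println(m.webdb.get(url));   // null, like the WebDBReader check
        System.out.println(m.searchFinds(url)); // true: still found by search
        m.pruneFromIndex(url);
        System.out.println(m.searchFinds(url)); // false once the index is pruned
    }
}
```

Running `main` prints `null`, `true`, `false`, which mirrors the situation described in the question: the webdb lookup comes back empty, yet the page is still searchable until the index itself is pruned.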

Howie

>Does anyone know how to force a page to be deleted? I have run the
>WebDBWriter class and removed the page from the database, but it still
>shows up in search results. Further checks using WebDBReader give a
>'null' response when looking for the page.



_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
