That's a good question, and one you could add to the FAQ on the Wiki. From a quick scan of the source code, it doesn't look like you can delete a URL from the CrawlDb directly. However, you can filter it out with bin/nutch mergedb (which uses the CrawlDbMerger class, so check its javadocs), which effectively removes the URL from the CrawlDb.
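A rough sketch of the idea, assuming you run mergedb with its `-filter` flag so that the configured URL filters are applied while the CrawlDb is rewritten, and that you use the regex URL filter (`conf/regex-urlfilter.txt`). Its rules are `+regex` (keep) or `-regex` (drop), the first matching rule wins, and a URL that matches no rule is dropped. The class below is a hypothetical stand-alone simulation of that decision, not Nutch code. `CrawlDbFilterSketch` and its methods are made-up names, and example.com is a placeholder URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's API): simulates which CrawlDb URLs would
 * survive a merge when regex-urlfilter.txt style rules are applied.
 */
public class CrawlDbFilterSketch {

    /** One rule: '+' keeps matching URLs, '-' drops them. */
    record Rule(boolean include, Pattern pattern) {}

    /** Parses rule lines such as "-^http://example\\.com/bad$" or "+."; skips blanks and comments. */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** The first rule whose regex is found in the URL decides; no match means the URL is dropped. */
    static boolean accept(String url, List<Rule> rules) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r.include();
            }
        }
        return false;
    }

    /** Returns the URLs that would be kept in the merged (filtered) CrawlDb. */
    static List<String> filter(List<String> urls, List<Rule> rules) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (accept(url, rules)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Drop one specific page, keep everything else.
        List<Rule> rules = parseRules(List.of(
                "# drop the one page we no longer want",
                "-^http://www\\.example\\.com/unwanted\\.html$",
                "+."));
        List<String> crawlDb = List.of(
                "http://www.example.com/",
                "http://www.example.com/unwanted.html",
                "http://www.example.com/other.html");
        System.out.println(filter(crawlDb, rules));
    }
}
```

In practice you would put the `-` line for the unwanted URL above the catch-all `+.` in regex-urlfilter.txt, run something like `bin/nutch mergedb <new_crawldb> <old_crawldb> -filter`, and then use the new CrawlDb in place of the old one. Check the usage message of your Nutch version for the exact arguments. Because the rule is anchored with `^` and `$`, only that exact URL is dropped.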
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: oddaniel <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, April 19, 2008 4:20:04 AM
> Subject: Delete Urls from CrawlsDB
>
> Is it possible to remove or delete one of the urls that has been crawled from
> the crawl database? If this is possible, how can it be done?
> --
> View this message in context:
> http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p16773512.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
