Re: Delete Urls from CrawlsDB

ogjunk-nutch Tue, 22 Apr 2008 20:46:53 -0700

That's a good question, and one you could add to the FAQ on the Wiki.
From a quick scan of the source code, it doesn't look like you can delete a
URL directly. However, you can filter it out with bin/nutch mergedb (it uses
the CrawlDbMerger class, so check its javadocs), which effectively removes the
URL from the CrawlDb.
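Here is a minimal sketch of what that filtering step decides. It assumes three things: the urlfilter-regex plugin is the active URL filter, its rules come from a file such as regex-urlfilter.txt, and your Nutch version's mergedb accepts a -filter flag (check its usage message). The Java below does not call Nutch at all. The class name UrlFilterSketch and the example.com URLs are hypothetical. It only models the rule semantics: each line is +regex or -regex, the first pattern found in the URL decides, and a URL that matches no rule is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, self-contained model of regex URL filter rules.
 * It does not use Nutch classes; it only shows which CrawlDb URLs
 * would survive a "mergedb ... -filter" pass under these rules.
 */
public class UrlFilterSketch {

    /** One "+regex" (include) or "-regex" (exclude) rule line. */
    record Rule(boolean include, Pattern pattern) {}

    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and '#' comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
        return rules;
    }

    /** First rule whose pattern is found in the URL wins; no match means reject. */
    static boolean accept(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.include();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parseRules(List.of(
            "# drop the one URL we want gone (hypothetical URL)",
            "-^http://www\\.example\\.com/unwanted\\.html$",
            "# keep everything else",
            "+."));

        // Hypothetical URLs standing in for CrawlDb keys.
        List<String> crawlDbUrls = List.of(
            "http://www.example.com/",
            "http://www.example.com/unwanted.html",
            "http://www.example.com/other.html");

        for (String url : crawlDbUrls) {
            System.out.println((accept(rules, url) ? "keep   " : "remove ") + url);
        }
    }
}
```

Running it prints "keep" for the two other URLs and "remove" for unwanted.html. The first-match, reject-by-default logic is why a catch-all +. rule matters: without it, a filtered merge would drop every URL the rules do not explicitly accept. The real run would then be something like bin/nutch mergedb crawl/crawldb_new crawl/crawldb -filter (paths hypothetical). It writes to a new directory, which you swap in for the old CrawlDb afterwards.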


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: oddaniel <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, April 19, 2008 4:20:04 AM
> Subject: Delete Urls from CrawlsDB
> 
> 
> Is it possible to remove or delete one of the urls that has been crawled from
> the crawl database? If this is possible, how can it be done?
> -- 
> View this message in context: 
> http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p16773512.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
