Hi, thanks for the response. I spent the whole day searching for how to use filters to get CrawlDbMerger to delete unwanted URLs from the crawldb. I'm confused and don't have a clue how to do this. Could you please tell me exactly how?
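For anyone else reading the archive, here is a sketch of the approach Otis describes, assuming a standard Nutch install (the crawldb paths and the URL pattern are placeholders for your own values): add an exclusion rule to the regex URL filter, then run mergedb with the -filter switch so the configured URL filters are applied while the db is rewritten.

```shell
# 1. Add a reject rule to conf/regex-urlfilter.txt. Rules starting with
#    "-" exclude matching URLs; this pattern is a placeholder:
#
#      -^http://example\.com/unwanted
#
# 2. "Merge" the single crawldb into a new one with filtering enabled
#    (usage: bin/nutch mergedb <output_crawldb> <crawldb> [-filter]):
bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter

# 3. Swap the filtered db in place of the old one:
mv crawl/crawldb crawl/crawldb_old
mv crawl/crawldb_filtered crawl/crawldb
```

The filtered URLs are simply never written to the output crawldb, which has the effect of deleting them. Remember to remove or adjust the reject rule afterwards if you want the URL to be crawlable again later.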
ogjunk-nutch wrote:
>
> That's a good question, and one that you could add to the FAQ on the Wiki.
> From the quick scan of the source code, it doesn't look like you can
> delete a URL directly. However, you can filter it out with bin/nutch
> mergedb (uses CrawlDbMerger class, so check its javadocs), effectively
> removing the URL from CrawlDb.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
>> From: oddaniel <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Saturday, April 19, 2008 4:20:04 AM
>> Subject: Delete Urls from CrawlsDB
>>
>> Is it possible to remove or delete one of the urls that has been crawled
>> from the crawl database? If this is possible, how can it be done?
>> --
>> View this message in context:
>> http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p16773512.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p17010176.html
Sent from the Nutch - User mailing list archive at Nabble.com.
