A similar question came up on this list a few days or weeks ago. In short:
Write a URLFilter that matches the URLs to be deleted. Use the
CrawlDbReader tool (bin/nutch readdb) to view the crawldb. It can both
dump the whole crawldb into a text file (-dump) and look up individual
URLs ad hoc (-url).
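To make the filter idea concrete, here is a minimal, self-contained sketch of the matching logic only. It is not a ready-made Nutch plugin: the class name DeleteUrlFilter, the example.com URLs and the regex are made up for illustration. A real plugin would implement org.apache.nutch.net.URLFilter, whose filter(String) method follows the same contract as below (return the URL to keep it, return null to drop it), and would have to be registered as a plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: mirrors the URLFilter contract without depending on Nutch.
class DeleteUrlFilter {
    // Compiled regexes describing the URLs that should be removed.
    private final List<Pattern> deletePatterns = new ArrayList<>();

    DeleteUrlFilter(List<String> regexes) {
        for (String regex : regexes) {
            deletePatterns.add(Pattern.compile(regex));
        }
    }

    // Returns null if the URL matches a delete pattern (reject it),
    // otherwise returns the URL unchanged (keep it).
    String filter(String urlString) {
        for (Pattern p : deletePatterns) {
            if (p.matcher(urlString).find()) {
                return null;
            }
        }
        return urlString;
    }

    public static void main(String[] args) {
        // Assumed example pattern: drop everything under /unwanted/ on example.com.
        List<String> regexes = new ArrayList<>();
        regexes.add("^http://example\\.com/unwanted/");
        DeleteUrlFilter f = new DeleteUrlFilter(regexes);

        System.out.println(f.filter("http://example.com/unwanted/page.html"));
        System.out.println(f.filter("http://example.com/keep.html"));
    }
}
```

Note that a filter like this only removes entries when the crawldb is actually rewritten with filtering switched on, so check the usage output of the crawldb tools in your Nutch version for the relevant option.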
On 11/14/2011 02:20 PM, mina wrote:
I crawl sites with Nutch 1.3. Now I want to delete a URL from the crawldb. How
can I do this? And how can I see the URLs in the crawldb?