There was a similar question just a couple of days or weeks ago. In short: write a URLFilter that matches the URLs to be deleted. To view the crawldb, use the CrawlDbReader tool (bin/nutch readdb); it can both dump the crawldb into a text file and look up individual URLs ad hoc.
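
To illustrate the URLFilter idea, here is a minimal, self-contained sketch. It does not depend on the Nutch jars: the nested UrlFilter interface is only a stand-in for the contract of org.apache.nutch.net.URLFilter (return the URL to keep it, return null to drop it). The class names and the example.com URLs are made up for illustration, and a real filter would additionally have to be packaged and registered as a Nutch plugin.

```java
import java.util.Collections;
import java.util.Set;

public class DeleteUrlFilterSketch {

    // Stand-in (assumption) for the contract of Nutch's URLFilter:
    // return the URL to keep it, or null to reject it.
    public interface UrlFilter {
        String filter(String urlString);
    }

    // Hypothetical filter that rejects an explicit set of URLs
    // and passes every other URL through unchanged.
    public static final class DeleteUrlFilter implements UrlFilter {
        private final Set<String> urlsToDelete;

        public DeleteUrlFilter(Set<String> urlsToDelete) {
            this.urlsToDelete = urlsToDelete;
        }

        @Override
        public String filter(String urlString) {
            return urlsToDelete.contains(urlString) ? null : urlString;
        }
    }

    public static void main(String[] args) {
        UrlFilter filter = new DeleteUrlFilter(
                Collections.singleton("http://example.com/remove-me.html"));

        // Rejected URL: prints "null".
        System.out.println(filter.filter("http://example.com/remove-me.html"));
        // Any other URL is returned as-is.
        System.out.println(filter.filter("http://example.com/keep-me.html"));
    }
}
```

A filter like this only removes entries when a crawldb job is actually run with URL filtering applied, so how you wire it in depends on your setup.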

On 11/14/2011 02:20 PM, mina wrote:
I crawl sites with Nutch 1.3. Now I want to delete a URL from the crawldb; how can I
do this? How can I see the URLs in the crawldb?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/remove-crawled-url-from-crawldb-in-nutch-1-3-tp3506810p3506810.html
Sent from the Nutch - User mailing list archive at Nabble.com.
