Re: How to remove domain from Nutch DB

Markus Jelsma Mon, 20 Jun 2011 08:28:15 -0700

Updating the crawldb with all segments should work. Don't forget the -filter 
option.
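
A minimal sketch of that update, assuming a Nutch 1.x layout where the crawl folder holds `crawldb` and `segments`. The `NUTCH` and `CRAWL_DIR` values and the function name are placeholders for illustration, not anything from the original setup. The exclusion rule for the domain has to be in regex-urlfilter.txt already, above the final catch-all `+.` line, because the first matching rule wins.

```shell
# Sketch only: both paths are assumptions; adjust them to your installation.
NUTCH="${NUTCH:-bin/nutch}"        # Nutch 1.x launcher script
CRAWL_DIR="${CRAWL_DIR:-crawl}"    # hypothetical crawl folder

# Re-run the crawldb update over every segment with URL filtering enabled,
# so that URLs rejected by the configured URL filters are dropped.
update_crawldb_filtered() {
  "$NUTCH" updatedb "$CRAWL_DIR/crawldb" -dir "$CRAWL_DIR/segments" -filter
}
```

Calling `update_crawldb_filtered` from the Nutch installation directory runs the update. Back up the crawl folder first if the data is hard to regenerate.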


On Monday 20 June 2011 16:54:12 Dietrich wrote:
> How can one remove documents from a specific domain from an existing Nutch
> db? Adding a filter to regex-urlfilter.txt seems to prevent them from
> being added to the linkDb, but documents already in there are not
> affected at all, and I could not see how else to do this.
> It can't possibly be that I have to completely recreate the crawl folder,
> can it?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
