Order is important when defining rules in the urlfilter files. A URL is accepted or rejected according to the first pattern in the file that matches it, so a specific exclusion has to come before a catch-all inclusion.
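To make the ordering point concrete, here is a rough Python sketch of first-match-wins evaluation, using the two rules from the quoted message. It is not Nutch code. The function name first_match_decision is made up for illustration, and it assumes that each pattern is searched for anywhere in the URL and that a URL matching no rule is rejected.

```python
import re

def first_match_decision(rules, url):
    """Return True (accept) or False (reject) based on the first matching rule.

    Hypothetical helper, not part of Nutch. Each rule is '+' or '-'
    followed by a regular expression. Assumption: a URL that matches
    no rule at all is rejected.
    """
    for line in rules:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        if re.search(pattern, url):
            return sign == "+"  # the first matching rule decides
    return False

# Rules in the order originally tried: the catch-all '+' comes first.
original = [
    r"+^http://([a-z0-9]*\.)*",
    r"-^http://([a-z0-9]*?\.)*remita.net",
]

# Rules in the suggested order: the exclusion comes first.
reordered = [
    r"-^http://([a-z0-9]*?\.)*remita.net",
    r"+^http://([a-z0-9]*\.)*",
]

if __name__ == "__main__":
    for name, rules in (("original", original), ("reordered", reordered)):
        for url in ("http://www.remita.net/page", "http://example.com/"):
            print(name, url, first_match_decision(rules, url))
```

With the original order the catch-all rule matches every http URL first, so the remita.net exclusion is never reached. With the exclusion first, remita.net URLs are rejected and everything else falls through to the catch-all.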
> I have tried using the crawl-urlfilter.txt.
>
> +^http://([a-z0-9]*\.)*
> -^http://([a-z0-9]*?\.)*remita.net

I think you want

-^http://([a-z0-9]*?\.)*remita.net
+^http://([a-z0-9]*\.)*

Howie

> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: Someone Please respond ... Deleting Urls already crawled from
> the crawlDB
> Date: Mon, 5 May 2008 14:12:20 +0800
>
> Please try "CrawlDbMerger",
>
> This tool merges several CrawlDb-s into one, optionally filtering URLs
> through the current URLFilters, to skip prohibited pages.
>
> It's possible to use this tool just for filtering - in that case only one
> CrawlDb should be specified in arguments.
>
>
> -----Original Message-----
> From: oddaniel [mailto:[EMAIL PROTECTED]
> Sent: May 5, 2008 13:27
> To: [email protected]
> Subject: Someone Please respond ... Deleting Urls already crawled from the
> crawlDB
>
>
> Guys i have been trying to get this done for weeks now. No progress. Someone
> please help me. I am trying to delete a domain already crawled from my
> crawldb and index.
>
> I have a list of domains already crawled in my index. How do I exclude or
> delete domains from my crawl output folder. I have tried using the
> crawl-urlfilter.txt.
>
> +^http://([a-z0-9]*\.)*
> -^http://([a-z0-9]*?\.)*remita.net
>
> Hoping it will exclude the domain remita.net from the crawldb or index and
> include all the other urls. Then I run the LinkDbMerger, SegmentMerger,
> CrawlDbMerger, IndexMerger. No change. All domains remain part of my output.
>
> Please how can I get this done.
> --
> View this message in context:
> http://www.nabble.com/Someone-Please-respond-...-Deleting-Urls-already-crawled-from-the-crawlDB-tp17053927p17053927.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
