Olaf,

Cheers, I am still confused though. How does nutch know which of the two to use? That is, how do I tell nutch whether it's doing an intranet or an internet crawl? Do I rename regex-urlfilter.txt to crawl-urlfilter.txt if I want to do internet crawls?

Steve

-----Original Message-----
From: Olaf Thiele [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 31, 2005 4:40 PM
To: [email protected]
Subject: Re: What's the difference between crawl-urlfilter.txt and regex-urlfilter.txt?

Hi Steve,

the crawl-urlfilter is for intranet crawling while regex-urlfilter is for internet crawling.

Kind regards,
Olaf

On Thu, 31 Mar 2005 12:01:19 +0800, Steve Follmer <[EMAIL PROTECTED]> wrote:
>
> What's the difference between crawl-urlfilter.txt and
> regex-urlfilter.txt? They look very similar. Why does nutch have both,
> and what do they do different?
>
> My best guess is that the first is used only by the crawl tool and the
> second is used by nutch proper. The crawl tool and nutch proper seem
> to also have separate .xml config files. I further guess that this is
> just an artifact of having two separate tools that need separate but
> equal configuration?
>
> -Poindexter
>

--
<SimpleHuman gender="male">
  <Physical name="Olaf Thiele" />
  <Virtual adress="http://www.olafthiele.de" />
</SimpleHuman>
