Olaf, cheers. I am still confused, though.
How does nutch know which of the two to use,
that is, how do I tell nutch whether it's doing an intranet
or an internet crawl? Do I rename regex-urlfilter.txt
to crawl-urlfilter.txt if I want to do internet crawls?
Steve


-----Original Message-----
From: Olaf Thiele [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 31, 2005 4:40 PM
To: [email protected]
Subject: Re: What's the difference between crawl-urlfilter.txt and
regex-urlfilter.txt?


Hi Steve,
the crawl-urlfilter is for intranet crawling while regex-urlfilter is
for internet crawling.
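
To make the distinction concrete, here is a rough Python sketch of how a filter file of this kind could be evaluated. It assumes the layout is one rule per line, a leading + (include) or - (exclude) followed by a regular expression, with the first matching rule deciding and unmatched URLs being dropped. The two rule sets, the example.com domain, and the function names are made up for illustration and are not copied from either shipped file or from nutch's code.

```python
import re

def load_rules(text):
    """Parse rules: '+regex' includes, '-regex' excludes, '#' lines are comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False

# Hypothetical intranet-style rules: stay inside one domain, drop the rest.
INTRANET = load_rules(r"""
# skip image suffixes
-\.(gif|jpg|png)$
# accept hosts in example.com (placeholder domain)
+^http://([a-z0-9]*\.)*example\.com/
# skip everything else
-.
""")

# Hypothetical internet-style rules: same skips, but accept anything else.
INTERNET = load_rules(r"""
-\.(gif|jpg|png)$
+.
""")

if __name__ == "__main__":
    for url in ("http://www.example.com/index.html", "http://other.org/"):
        print(url, accepts(INTRANET, url), accepts(INTERNET, url))
```

Under these assumptions the only real difference between the two rule sets is the final catch-all line, which would explain why the two files look so similar.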

Kind regards,
Olaf



On Thu, 31 Mar 2005 12:01:19 +0800, Steve Follmer <[EMAIL PROTECTED]>
wrote:
> 
> What's the difference between crawl-urlfilter.txt and
> regex-urlfilter.txt? They look very similar. Why does nutch have both,
> and what do they do differently?
> 
> My best guess is that the first is used only by the crawl tool and the
> second is used by nutch proper. The crawl tool and nutch proper also
> seem to have separate .xml config files. I further guess that this is
> just an artifact of having two separate tools that need separate but
> equal configuration?
> 
> -Poindexter
> 
> 


-- 

<SimpleHuman gender="male">
   <Physical name="Olaf Thiele" />
   <Virtual adress="http://www.olafthiele.de" />
</SimpleHuman>
