Re: Fetching outside the domain ?

goudal Wed, 18 Oct 2006 06:13:29 -0700

"Tomi NA" <[EMAIL PROTECTED]>
> Date: Wed, 18 Oct 2006 12:03:04 +0200
> Subject: Re: Fetching outside the domain ?



>Frederic, what exactly is the problem? You'd like the recrawl not to
>leave your web site? You can do that very easily: set the
>"db.ignore.external.links" property in nutch-site.xml to "true" (you
>can copy the xml property from nutch-default and then change the value
>to "true");
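
For reference, the override described in the quoted advice might look roughly like this in nutch-site.xml. This is only a sketch, assuming the usual property layout copied from nutch-default; the comment is my own wording, not text from the Nutch distribution:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed override: copied from nutch-default, with the value
       changed to "true" so that links to external hosts are ignored. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```

Values set in nutch-site.xml take precedence over those in nutch-default, so the default file can stay untouched.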

Well, but what about all the filter files (regex, prefix, suffix and such 
.txt files) in the conf directory?

Why is the crawl filter configuration file not used while recrawling?

Btw, we have some local virtual hosts. How does db.ignore.external.links 
deal with those?


>
>> Btw, as a beginner, totally ignorant of Java, and a time-starved system
>> engineer in charge of too many things, is there any doc that really
>> explains the behaviour of Nutch?
>
>A good place to read about nutch is the nutch wiki:
>http://wiki.apache.org/nutch/

I have not found an explanation there of the different steps or of the 
overall structure of the thing.


f.g.
