Re: [Nutch-general] Fetching outside the domain ?

goudal Wed, 18 Oct 2006 06:13:37 -0700

"Tomi NA" <[EMAIL PROTECTED]>
> Date: Wed, 18 Oct 2006 12:03:04 +0200
> Subject: Re: Fetching outside the domain ?



>Frederic, what exactly is the problem? You'd like the recrawl not to
>leave your web site? You can do that very easily: set the
>"db.ignore.external.links" property in nutch-site.xml to "true" (you
>can copy the xml property from nutch-default and then change the value
>to "true");
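
If I understand the suggestion correctly, it would mean adding something like
the fragment below inside the <configuration> element of nutch-site.xml in the
conf directory. This is only my reading of it: the property name is the one
given above, and the layout is assumed to match the entries in
nutch-default.xml.

```xml
<!-- Assumed override in conf/nutch-site.xml: same property name as in
     nutch-default.xml, with only the value changed to "true". -->
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>
```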

Well, but what about all the filter files (regex, prefix, suffix and the other
.txt files) in the conf directory?

Why is the crawl filter configuration file not used while recrawling?

Btw, we have some local virtual hosts. How does db.ignore.external.links
deal with those?


>
>> Btw, as a beginner, totally ignorant of Java, and a system engineer with no
>> time who is in charge of too many things, is there any doc that really
>> explains the behaviour of Nutch?
>
>A good place to read about nutch is the nutch wiki:
>http://wiki.apache.org/nutch/

I have not found an explanation there of the different steps, or of the overall
structure of the thing.


f.g.




