I'm using release 0.9.  Is it better to use 0.9 or a nightly build in a 
production environment?

I am at present in the process of trying out nightly build #334.

Regards,
 
Hilkiah G. Lavinier MEng (Hons), ACGI 
6 Winston Lane, 
Goodwill, 
Roseau, Dominica 
Mbl: (767) 275 3382
Hm : (767) 440 3924
Fax: (767) 440 4991
VoIP USA: (646) 432 4487
 
Email: [EMAIL PROTECTED]
Email: [EMAIL PROTECTED]
IM: Yahoo hilkiah / MSN [EMAIL PROTECTED]
IM: ICQ #8978201  / AOL hilkiah21

----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, January 20, 2008 3:24:43 PM
Subject: Re: db.ignore.external.links


Hilkiah Lavinier wrote:
> Hi I need to better understand the impact of the
> db.ignore.external.links property.

Which version of Nutch is this?


> I have this set to true in my nutch-site.xml file.  Based on the
> description, I expect that links to sites not included in the initial
> inject list won't get indexed. However after running a -depth 10 from
> an initial list of 15 sites, nutch has indexed (confirmed from
> searching with tomcat) hundreds of sites that were NOT included in
> the initial seed list.  How come?  Is there some other option that I
> must set to say "only index the pages for the sites included in the
> initially supplied seed list".

No, this property should have the effect you expected. If it doesn't
work properly, then it's a bug that needs to be fixed. Please be aware
that certain aspects of redirect treatment have been changed recently;
AFAIK the option should work correctly with the current code in trunk.
The new URLs outside the initial seed hosts may come from redirects to
external hosts.
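
To illustrate the redirect point, here is a minimal sketch of the kind of
host comparison an "ignore external links" option implies. It is not
Nutch's actual code: the function names (host_of, keep_outlink,
filter_outlinks) and the example.* URLs are hypothetical, and the sketch
assumes that "external" means "different host name than the page the link
was found on".

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name of a URL ('' if it has none)."""
    return (urlparse(url).hostname or "").lower()


def keep_outlink(from_url, to_url, ignore_external_links=True):
    """Decide whether an outlink found on from_url should be kept.

    Assumption: a link is external when its host differs from the host
    of the page it was found on.
    """
    if not ignore_external_links:
        return True
    return host_of(from_url) == host_of(to_url)


def filter_outlinks(from_url, outlinks, ignore_external_links=True):
    """Drop external outlinks from a page's list of outlinks."""
    return [u for u in outlinks
            if keep_outlink(from_url, u, ignore_external_links)]


# A page on a seed host: the link to another host is dropped.
seed_page = "http://seed.example/index.html"
print(filter_outlinks(seed_page, ["http://seed.example/about",
                                  "http://other.example/"]))

# If a seed URL was redirected and the page ended up recorded under the
# redirect target, links on that host now count as internal.
redirected_page = "http://other.example/landing"
print(filter_outlinks(redirected_page, ["http://other.example/page2"]))
```

Under these assumptions the check only looks at the page a link was found
on, so once a redirect has placed a page from an external host in the
crawl, further pages on that host pass the filter as well.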


> 
> For what it's worth, I'm using the urlfilter-suffix instead of the
> urlfilter-regex since I read somewhere that the regex filter causes
> crashes and the suffix one is more stable etc.

This shouldn't matter in this case.



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com







      