I'm using release 0.9. Is it better to use 0.9 or a nightly build in a production environment?
I am currently trying out nightly build #334.

Regards,
Hilkiah G. Lavinier MEng (Hons), ACGI
6 Winston Lane, Goodwill, Roseau, Dominica
Mbl: (767) 275 3382
Hm : (767) 440 3924
Fax: (767) 440 4991
VoIP USA: (646) 432 4487
Email: [EMAIL PROTECTED]
Email: [EMAIL PROTECTED]
IM: Yahoo hilkiah / MSN [EMAIL PROTECTED]
IM: ICQ #8978201 / AOL hilkiah21

----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, January 20, 2008 3:24:43 PM
Subject: Re: db.ignore.external.links

Hilkiah Lavinier wrote:
> Hi, I need to better understand the impact of the
> db.ignore.external.links property.

Which version of Nutch is this?

> I have this set to true in my nutch-site.xml file. Based on the
> description, I expect that links to sites not included in the initial
> inject list won't get indexed. However after running a -depth 10 from
> an initial list of 15 sites, nutch has indexed (confirmed from
> searching with tomcat) hundreds of sites that were NOT included in
> the initial seed list. How come? Is there some other option that I
> must set to say "only index the pages for the sites included in the
> initially supplied seed list".

No, this property should have the effect you expected. If it doesn't
work properly, then it's a bug that needs to be fixed. Please be aware
that certain aspects of redirect treatment have been changed recently.
AFAIK the option should work correctly with the current code in trunk.
The new URLs outside the initial seed hosts may come from redirects to
external hosts.

>
> For what it's worth, I'm using the urlfilter-suffix instead of the
> urlfilter-regex since I read somewhere that the regex filter causes
> crashes and the suffix one is more stable etc.

This shouldn't matter in this case.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
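
For reference, the setting discussed in this thread would typically be written in nutch-site.xml roughly as follows. This is only a minimal sketch, assuming the usual Hadoop-style property layout that Nutch configuration files use; the comment text is illustrative wording and not the description shipped with Nutch.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sketch only: overrides the default from nutch-default.xml.
       Expected effect per the thread: outlinks that lead to hosts
       other than the page's own host are dropped, so the crawl stays
       on the injected seed hosts. As noted in the reply, redirects to
       external hosts may still introduce new hosts in some versions. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
</configuration>
```

If external hosts still show up with this set, comparing the behaviour of release 0.9 against a current nightly build, as suggested in the reply, is probably the quickest way to tell whether the redirect handling is responsible.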
