Ledio Ago wrote:
Hi Michael! Did you get an answer on this one? It seems like the refetch interval
is hardcoded, no matter what you set it to in the config file, since
FETCH_GENERATION_DELAY_MS takes effect after the first fetch.

Anybody out there: is this correct, or are we reading this wrong? If this is
correct, then the refetching feature doesn't work.

This is not the case (i.e. you are reading this wrong :) ). The FETCH_GENERATION_DELAY_MS constant specifies how much time must pass before pages already selected for inclusion in a fetchlist are reconsidered for selection, UNLESS they have been updated with updatedb (after fetching).

This is to prevent selecting the same pages if you run FetchListTool twice in rapid succession, while at the same time not waiting indefinitely if you lost or discarded that fetchlist. 7 days was considered a good balance (some large fetch jobs may run for days, so it could be a couple of days before you have a chance to run updatedb with the results of fetching).
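
A minimal Java sketch of the selection rule described above, for illustration only. The class, its method names and the in-memory map are hypothetical stand-ins for the WebDB, not Nutch's actual FetchListTool or updatedb code. In this model, selecting a page pushes its next eligible time out by FETCH_GENERATION_DELAY_MS, and running updatedb after fetching replaces that with the fetch time plus the configured refetch interval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of fetchlist generation and updatedb; not Nutch source code.
public class GenerationDelaySketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    // The 7-day value discussed in the thread.
    static final long FETCH_GENERATION_DELAY_MS = 7 * DAY_MS;

    // url -> earliest time (ms) at which the page may be selected again
    private final Map<String, Long> nextFetchTime = new LinkedHashMap<>();

    /** Adds a new page that is immediately due for fetching. */
    public void inject(String url, long now) {
        nextFetchTime.put(url, now);
    }

    /** Generate step: select every due page and hold it back for the generation delay. */
    public List<String> generate(long now) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : nextFetchTime.entrySet()) {
            if (e.getValue() <= now) {
                selected.add(e.getKey());
                // Not reconsidered until the delay expires, unless updatedb runs first.
                e.setValue(now + FETCH_GENERATION_DELAY_MS);
            }
        }
        return selected;
    }

    /** updatedb step: after a fetch, the configured refetch interval takes over. */
    public void update(String url, long fetchTime, long refetchIntervalMs) {
        nextFetchTime.put(url, fetchTime + refetchIntervalMs);
    }

    public static void main(String[] args) {
        GenerationDelaySketch db = new GenerationDelaySketch();
        db.inject("a", 0);
        db.inject("b", 0);
        System.out.println(db.generate(0));               // [a, b]
        System.out.println(db.generate(60 * 60 * 1000L)); // [] (rapid re-run)
        db.update("a", 2 * DAY_MS, 30 * DAY_MS);          // "a" fetched; "b" fetchlist lost
        System.out.println(db.generate(8 * DAY_MS));      // [b] (delay expired for b only)
        System.out.println(db.generate(32 * DAY_MS));     // [a, b]
    }
}
```

In this model the 7-day delay only covers the gap between generating a fetchlist and running updatedb; once a page has been updated, its next selection time comes from the refetch interval (30 days in the example).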

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers