Hi,

Can someone please explain how the following scenario works?

I need to crawl a site with 50K URLs. The site is dynamic and updated
frequently. Assuming it takes two days to crawl the site completely, is
there some configuration (a fetch schedule or something else) so that once
one crawl cycle completes, the next cycle starts automatically two days
later to pick up the new URLs? If this feature is not available, should we
control the repeated crawling of the site manually through some sort of
scripting?
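
For context, this is what we were planning to try in conf/nutch-site.xml.
It is just a sketch: the property names are real Nutch properties, but the
interval values (in seconds) are our own guesses aimed at a two-day cycle:

  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>
  <property>
    <name>db.fetch.interval.default</name>
    <value>172800</value> <!-- 2 days, in seconds -->
  </property>
  <property>
    <name>db.fetch.schedule.adaptive.min_interval</name>
    <value>86400</value> <!-- never mark a page due sooner than 1 day -->
  </property>

As far as we understand, though, this only controls when a URL becomes due
for refetching; something still has to launch the generate/fetch/update
cycle itself, which is really our question.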

In fact, we will have more than 50 sites to crawl, each one separately. If
we need to manage re-crawling for each site, will we need 50 separate
scripts to handle them (see the sketch below for what we mean)? Has anyone
faced this situation?
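
If scripting is the way to go, this is roughly the single wrapper we had in
mind instead of 50 separate scripts. It is a rough sketch, assuming Nutch
1.x with its one-step crawl command, one seed directory per site under
/data/seeds, and placeholder values for the install path, depth, and topN:

  #!/bin/bash
  # Re-crawl every site in turn, keeping one crawl directory per site
  # so the sites stay separate. Intended to be run from cron.
  NUTCH_HOME=/opt/nutch                 # placeholder install path
  for SEEDS in /data/seeds/*; do
      SITE=$(basename "$SEEDS")
      "$NUTCH_HOME/bin/nutch" crawl "$SEEDS" \
          -dir "/data/crawls/$SITE" -depth 3 -topN 50000
  done

A crontab entry such as "0 2 */2 * * /data/recrawl.sh" would then start the
whole cycle roughly every two days, though that assumes the previous run has
finished in time. Is this the right approach, or is there a built-in way?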


Thanks,
Senthil