Hi Raja,

The FetchSchedule interface [0] defines the contract for implementations that manipulate fetch times and re-fetch intervals. FetchScheduleFactory [1] caches the instance in the ObjectCache. Neither the interface nor the factory class automates or semi-automates the actual scheduling, i.e. they do not execute any fetching themselves. Instead, the parameters and behaviour defined by your FetchSchedule implementation are consulted when a fetching job is executed.
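
To make the "consulted, not executed" point concrete, here is a minimal sketch in plain Java. It is not the Nutch API: ScheduleSketch, PageState, isDue, recordFetch and runJob are hypothetical names invented for illustration, and the fixed interval is a simplifying assumption (an adaptive schedule would adjust it).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified illustration of the idea; NOT the Nutch FetchSchedule interface.
public class ScheduleSketch {

    /** Minimal per-page record: when the page is next due and its re-fetch interval. */
    static final class PageState {
        long nextFetchTimeMs;
        final long intervalMs;

        PageState(long nextFetchTimeMs, long intervalMs) {
            this.nextFetchTimeMs = nextFetchTimeMs;
            this.intervalMs = intervalMs;
        }
    }

    /** The schedule only answers a question; it never starts a fetch itself. */
    static boolean isDue(PageState page, long nowMs) {
        return page.nextFetchTimeMs <= nowMs;
    }

    /** After a fetch, push the next fetch time forward by the (fixed) interval. */
    static void recordFetch(PageState page, long fetchTimeMs) {
        page.nextFetchTimeMs = fetchTimeMs + page.intervalMs;
    }

    /**
     * One run of an externally triggered job (e.g. started by cron).
     * Pages are fetched only if they are due AND this method is actually called.
     * Returns the number of pages "fetched" in this run.
     */
    static int runJob(List<PageState> pages, long nowMs) {
        int fetched = 0;
        for (PageState page : pages) {
            if (isDue(page, nowMs)) {
                recordFetch(page, nowMs);
                fetched++;
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        List<PageState> pages = new ArrayList<>();
        pages.add(new PageState(0L, day));       // already due
        pages.add(new PageState(2 * day, day));  // not due until day 2

        System.out.println("fetched on day 1: " + runJob(pages, day));
        System.out.println("fetched on day 2: " + runJob(pages, 2 * day));
    }
}
```

Nothing happens between the two runJob calls, however overdue a page becomes; something external has to start each run.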

You asked whether you can control this through scripts; the answer is yes. I run continuous crawls as nightly jobs, all of which are scripted and managed via cron. Simply put, if the page is due to be crawled AND the job is executed, then the page will be fetched within the next segment or batch.

hth
Lewis

[0] http://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/crawl/FetchSchedule.java
[1] http://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/crawl/FetchScheduleFactory.java

On Thu, Apr 18, 2013 at 5:53 AM, vivekvl <[email protected]> wrote:
> Curious to know whether Nutch AdaptiveFetchSchedule can do recrawling
> automatically?
>
> I observed Hadoop automatically reinitiates the interrupted Jobs. Otherwise
> Hadoop is always up and running with Nutch jobs configured to it. In this
> scenario if a page is ready to be crawled based on adaptive schedule,
> whether Nutch will recrawl the page?
>
> Also I like to know the best approach for continuous crawling for live
> environment.
>
> Thanks,
> Raja

