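Nutch selects only records that are eligible for fetching. A record becomes eligible either because its last fetch failed transiently or because its fetch interval has expired. This means that fetches that failed due to network issues are retried within 24 hours, while successfully fetched pages are refetched only once the current time exceeds the previous fetchTime + interval. So rerunning the same crawl command skips pages that were already fetched successfully until their interval has expired.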
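
To make the rule concrete, here is a rough sketch of the eligibility check in Java. This is not Nutch's actual code: the class, enum, and method names are made up for illustration, and the 24-hour retry is modelled as a fixed delay, which is a simplifying assumption on my part.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: none of these names come from the Nutch code base.
final class FetchEligibility {

    // Hypothetical record states; Nutch's real status model is richer.
    enum Status { UNFETCHED, FETCHED, RETRY }

    // Assumption: a transiently failed fetch becomes due again after 24 hours.
    static final Duration RETRY_DELAY = Duration.ofHours(24);

    // Earliest time at which the record may be selected again.
    static Instant dueTime(Status status, Instant lastAttempt, Duration interval) {
        switch (status) {
            case FETCHED:
                return lastAttempt.plus(interval);    // fetchTime + interval
            case RETRY:
                return lastAttempt.plus(RETRY_DELAY); // transient failure
            default:
                return Instant.MIN;                   // never fetched: always due
        }
    }

    // A record is eligible once the current time exceeds its due time.
    static boolean isEligible(Status status, Instant lastAttempt,
                              Duration interval, Instant now) {
        return now.isAfter(dueTime(status, lastAttempt, interval));
    }
}
```

With an interval of, say, 30 days, a page fetched successfully an hour ago is not eligible, so an immediate rerun of the same crawl would skip it. A page whose fetch failed transiently more than 24 hours ago would be picked up again.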
-----Original message-----
> From: kamaci <[email protected]>
> Sent: Wed 20-Mar-2013 23:46
> To: [email protected]
> Subject: Does Nutch Checks Whether A Page crawled before or not
>
> Lets assume that I am crawling wikipedia.org with depth 1 and topN 1. After
> it finishes crawling if I rerun that command and after finishes again and
> again. What happens? Does Nutch skips previous fetched pages or try to crawl
> same pages again?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Does-Nutch-Checks-Whether-A-Page-crawled-before-or-not-tp4049564.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

