Just saw the code to confirm that. Protocol Status = "2" corresponds to FAILED. Nutch will attempt to fetch those urls in subsequent rounds in the hope that it can fetch them. After the limit 'db.fetch.retry.max' is reached, it will mark such a url as DB_GONE and won't reattempt it further.
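If you want failed urls to be given up sooner, you can lower that limit in nutch-site.xml. A minimal sketch (the property name is from nutch-default.xml; the value 2 here is just an illustrative choice):

```xml
<!-- nutch-site.xml : override the retry limit for failed fetches -->
<property>
  <name>db.fetch.retry.max</name>
  <!-- after this many failed fetch attempts the url is marked DB_GONE
       and is not retried again; 2 is an example value, the default is 3 -->
  <value>2</value>
</property>
```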
On Sat, May 4, 2013 at 12:04 PM, Tejas Patil <[email protected]> wrote:

> My guess is that those urls were not fetched successfully and so its been
> retried in every round of crawl.
>
>
> On Sat, May 4, 2013 at 11:55 AM, raviksingh <[email protected]> wrote:
>
>> Hi,
>> I have written a java program that call "crawl" command. This fetches
>> and updates the results in MySQL. However, if called again the same urls
>> are fetched again and again. Which certainly slows the process. Status
>> for many urls is now "2". They still get fetched every time. What can be
>> the problem. Please help.
>>
>> Regards
>> Ravi Singh
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-Crawls-Again-and-again-tp4060834.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.

