Just saw the code to confirm that. Protocol Status = "2" corresponds
to FAILED. Nutch will attempt to fetch those urls again in subsequent rounds
in the hope that it can fetch them. After the limit 'db.fetch.retry.max' is
reached, it will mark the url as DB_GONE and won't reattempt it further.
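For reference, that retry limit can be overridden in conf/nutch-site.xml. A minimal sketch (the value 3 is, as far as I recall, the stock default from nutch-default.xml; adjust to taste):

```xml
<!-- conf/nutch-site.xml: override the fetch retry limit.
     Once a url has failed this many fetch attempts, updatedb marks it
     DB_GONE and it is no longer selected for fetching. -->
<property>
  <name>db.fetch.retry.max</name>
  <value>3</value>
</property>
```

Lowering the value makes Nutch give up on persistently failing urls sooner, at the cost of abandoning urls that fail only transiently.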


On Sat, May 4, 2013 at 12:04 PM, Tejas Patil <[email protected]>wrote:

> My guess is that those urls were not fetched successfully and so they have
> been retried in every round of the crawl.
>
>
> On Sat, May 4, 2013 at 11:55 AM, raviksingh <[email protected]>wrote:
>
>> Hi,
>>     I have written a Java program that calls the "crawl" command. It
>> fetches urls and updates the results in MySQL. However, when called again,
>> the same urls are fetched again and again, which certainly slows the
>> process. The status for many urls is now "2", yet they still get fetched
>> every time. What can the problem be?
>> Please help.
>>
>> Regards
>> Ravi Singh
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-Crawls-Again-and-again-tp4060834.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
>
