Thanks Markus, will try this.

Thanks
Manish Verma


> On Jan 15, 2016, at 5:28 AM, Markus Jelsma <[email protected]> wrote:
> 
> Hello - you mean transient errors such as temporary connection issues, 
> timeouts, etc.? That is not possible, but they will be retried one day 
> later. One option is to dump the failed URLs via readdb using the -retry 
> switch; the output contains the failed URLs. Parse the dump, get the URLs, 
> and use the freegen tool to force a recrawl. You can also patch the 
> generator tool to restrict it to records with a non-zero retry count.
> 
> Markus
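
For reference, a rough sketch of the "dump, parse, freegen" step described
above, assuming a Nutch 1.x plain-text CrawlDb dump (produced by something
like bin/nutch readdb crawl/crawldb -dump crawldb_dump -retry 1) in which
each record starts with the URL followed by a tab and its CrawlDatum fields;
the paths, record layout, and script are illustrative only:

#!/usr/bin/env python
"""Rough sketch: pull URLs out of a plain-text readdb dump and write a
seed list that the freegen tool can consume.

Assumed (not from the thread): the dump was produced by something like
  bin/nutch readdb crawl/crawldb -dump crawldb_dump -retry 1
and each record begins with the URL, a tab, then its CrawlDatum fields.
"""

import os
import re
import sys

# A record header line is the URL, a tab, then the CrawlDatum fields.
URL_LINE = re.compile(r'^(https?://\S+)\t')

def extract_urls(dump_dir):
    """Yield every URL found in the part files of the dump directory."""
    for name in sorted(os.listdir(dump_dir)):
        if name.startswith(('.', '_')):
            continue  # skip .crc files and _SUCCESS markers
        path = os.path.join(dump_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as part:
            for line in part:
                match = URL_LINE.match(line)
                if match:
                    yield match.group(1)

if __name__ == '__main__':
    dump_dir = sys.argv[1] if len(sys.argv) > 1 else 'crawldb_dump'
    seed_dir = sys.argv[2] if len(sys.argv) > 2 else 'failed_seeds'
    if not os.path.isdir(seed_dir):
        os.makedirs(seed_dir)
    with open(os.path.join(seed_dir, 'urls.txt'), 'w') as out:
        for url in extract_urls(dump_dir):
            out.write(url + '\n')
    # The seed directory is then handed to the free generator, e.g.:
    #   bin/nutch freegen failed_seeds crawl/segments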
> 
> -----Original message-----
>> From: Manish Verma <[email protected]>
>> Sent: Thursday 14th January 2016 23:27
>> To: [email protected]
>> Subject: Need To Crawl Only Failed URLS
>> 
>> Hi,
>> 
>> I want to crawl just the failed URLs. I picked the failed URLs from the 
>> log and want to crawl only these. 
>> I believe that if I put these in as seeds and run the crawl script, it 
>> will generate a fetch list containing not only these failed URLs but all 
>> the URLs present in the crawlDb for the next fetch cycle?
>> 
>> Any thoughts?
>> 
>> Thanks, Manish
>> 
>> 
