Hello - you mean transient errors such as temporary connection issues,
timeouts etc.? Generating a fetch list with only those URLs is not possible
out of the box, but they will be retried one day later anyway. One option is
to dump the failed URLs via readdb with the -retry switch; the output
contains the failed URLs. Parse the dump, extract the URLs and use the
freegen tool to force a recrawl. You can also patch the generator tool to
restrict selection to records with a non-zero retry count.
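
The parse step can be a quick script, roughly like the minimal sketch below.
It assumes the plain "normal" text dump written by
bin/nutch readdb <crawldb> -dump <dumpdir> -retry, where each record starts
with the URL followed by a tab and "Version:"; file names and the exact
layout can differ between Nutch versions, so treat it as an illustration
rather than a finished tool.

    #!/usr/bin/env python3
    """Collect the URLs from a readdb dump directory and write them to a
    seed file that freegen can consume."""
    import glob
    import sys

    def extract_urls(dump_dir):
        urls = set()
        # readdb -dump writes one or more part-* text files into dump_dir
        for path in glob.glob(f"{dump_dir}/part-*"):
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    # record headers look like "<url>\tVersion: 7"
                    if "\tVersion:" in line:
                        urls.add(line.split("\t", 1)[0].strip())
        return sorted(urls)

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            sys.exit("usage: extract_retry_urls.py <dump_dir> <seed_file>")
        dump_dir, seed_file = sys.argv[1], sys.argv[2]
        urls = extract_urls(dump_dir)
        with open(seed_file, "w", encoding="utf-8") as out:
            out.write("\n".join(urls) + "\n")
        print(f"wrote {len(urls)} URLs to {seed_file}")

Put the resulting seed file in a directory of its own, e.g.
failed_seeds/seeds.txt, and point freegen at that directory
(bin/nutch freegen failed_seeds crawl/segments); freegen builds a segment
directly from the text file, so the fetch list contains only those URLs.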

Markus
 
-----Original message-----
> From:Manish Verma <[email protected]>
> Sent: Thursday 14th January 2016 23:27
> To: [email protected]
> Subject: Need To Crawl Only Failed URLS
> 
> Hi,
> 
> I want to crawl just the failed URLs. I picked the failed URLs from the log
> and want to crawl just these.
> I believe that if I put these as seeds and run the crawl script, it will
> generate a fetch list containing not only these failed URLs but all URLs
> present in the crawlDB for the next fetch cycle?
> 
> Any thoughts?
> 
> Thanks Manish
> 
> 
