I crawled my website but forgot to whitelist my Nutch server's IP, so, as 
expected, Nutch got a 403 and did not fetch the URL.

After I whitelisted the server, Nutch was still unable to re-crawl the URL; it 
still sees it as 403.

The same thing happened when the website was down: the crawlDB recorded 500 
for all links, and Nutch refuses to re-fetch them.

What should I do? My workaround is to delete the crawl folder and re-crawl 
from scratch, but is this the right way? If so, it is really not good, as I 
also need to clean the Solr core to make sure that Nutch and Solr stay in sync.

Kind regards,
Hany Shehata
Enterprise Engineer
Green Six Sigma Certified
Solutions Architect, Marketing and Communications IT
Corporate Functions | HSBC Operations, Services and Technology (HOST)
ul. Kapelanka 42A, 30-347 Kraków, Poland

Tie line: 7148 7689 4698
External: +48 123 42 0698
Mobile: +48 723 680 278
E-mail: hany.n...@hsbc.com
Protect our environment - please only print this if you have to!

