I think it is because I am crawling a single host: eventually the host
throttles the crawler's connections and returns 503 for connection attempts.

I can work around this by crawling in smaller batches and pausing for a bit
between batches (and probably also by reducing the number of threads or
increasing the delay between connections per host). But since I can't
guarantee I'll never encounter a 503 again, I wanted to be sure Nutch handles
it correctly by re-fetching the affected page in a subsequent round of
crawling.
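
For reference, this is roughly the kind of nutch-site.xml override I have in
mind for the thread and delay part. It is only a sketch: the property names
are the ones I believe are listed in nutch-default.xml, the values are
illustrative guesses rather than tested settings, and I have not confirmed
that a 503 counts as a "recoverable error" for db.fetch.retry.max.

```xml
<!-- Sketch of nutch-site.xml overrides; values are illustrative, not tested. -->
<configuration>
  <!-- Seconds to wait between successive requests to the same server. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>10.0</value>
  </property>
  <!-- Maximum number of threads fetching from a single queue (host) at once. -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>1</value>
  </property>
  <!-- Total number of fetcher threads. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>5</value>
  </property>
  <!-- How many times a URL that hit a recoverable error is generated for
       fetch again. Whether a 503 falls into this category is exactly what
       I am unsure about. -->
  <property>
    <name>db.fetch.retry.max</name>
    <value>3</value>
  </property>
</configuration>
```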




