I think it is because I am crawling a single host: eventually the host throttles the crawler's connections and returns 503 for further connection attempts.
I can work around this by crawling in smaller batches and pausing between them (and probably also by reducing the number of threads or adding a delay between connections per host). But since I can't guarantee I'll never encounter a 503 again, I want to be sure Nutch handles it correctly by re-fetching the page in a subsequent round of crawling.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-have-nutch-2-retry-503-errors-tp4123311p4123475.html
Sent from the Nutch - User mailing list archive at Nabble.com.
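
To make the workaround concrete, here is a rough sketch of the idea in plain Python. It is not Nutch code and does not use any Nutch API. The function name `crawl_in_batches`, its parameters, and the injected `fetch` callable (which is assumed to return an HTTP status code) are all hypothetical, chosen only for illustration. It fetches in small batches, pauses between batches, and puts any URL that returned 503 back in the queue for a later round.

```python
import time
from typing import Callable, Iterable, List, Tuple


def crawl_in_batches(
    urls: Iterable[str],
    fetch: Callable[[str], int],          # hypothetical: returns an HTTP status code
    batch_size: int = 50,
    pause_seconds: float = 30.0,
    max_rounds: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str]]:
    """Fetch URLs in batches; re-queue 503 responses for a later round.

    Returns (fetched, still_failing), where still_failing holds the URLs
    that were still answering 503 after max_rounds rounds.
    """
    pending = list(urls)
    fetched: List[str] = []

    for round_no in range(max_rounds):
        if not pending:
            break
        retry: List[str] = []
        for start in range(0, len(pending), batch_size):
            for url in pending[start:start + batch_size]:
                if fetch(url) == 503:
                    # Host is throttling us: try this URL again next round.
                    retry.append(url)
                else:
                    # Any other status is treated as final in this sketch.
                    fetched.append(url)
            # Pause between batches, but not after the last batch of a round.
            if start + batch_size < len(pending):
                sleep(pause_seconds)
        pending = retry
        # Give the host a break before the next round, if there is one.
        if pending and round_no + 1 < max_rounds:
            sleep(pause_seconds)

    return fetched, pending
```

Passing `sleep` and `fetch` in as arguments is just a convenience so the logic can be tried without a network or real delays. In a real crawl the batch size and pause would have to be tuned to whatever the host tolerates.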

