Is it just slower, or do these URLs actually crash Nutch? Can you tell us more about the crashes you are getting, e.g. logs?
On 17 July 2014 15:06, Adam Estrada <[email protected]> wrote:

> All,
>
> I am coming across a few pages that are not responsive at all which is
> causing Nutch to #failwhale before finishing the current crawl. I have
> increased http.timeout and it still crashes. How can I get Nutch to
> skip over unresponsive URLs that are causing the entire thing to bail?
>
> Thanks,
> Adam

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

