Is it just slower, or do these URLs actually crash Nutch? Can you tell us
more about the crashes you are getting, e.g. logs etc.?


On 17 July 2014 15:06, Adam Estrada <[email protected]> wrote:

> All,
>
> I am coming across a few pages that are not responsive at all, which is
> causing Nutch to #failwhale before finishing the current crawl. I have
> increased http.timeout and it still crashes. How can I get Nutch to
> skip over unresponsive URLs that are causing the entire thing to bail?
>
> Thanks,
> Adam
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
