Hi, there are some failure scenarios in Nutch that I'm not sure how to handle.
1. I run Nutch on a huge number of URLs and some kind of OOM exception is thrown, or one of those "cannot allocate memory" errors. The result is that my segment is half complete. How can I recover from this? Do I have to recrawl all the URLs that were in the segment? If so, how do I mark them for recrawl in the crawldb?

2. I run Nutch on a huge number of URLs and some URLs are not parsed successfully. I get an index that has all the URLs that worked but is missing the ones that didn't. How can I handle them without having to recrawl the whole thing?

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-handle-failures-in-nutch-tp3898768p3898768.html
Sent from the Nutch - User mailing list archive at Nabble.com.
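For reference, one possible recovery path for the first question can be sketched as below. This is a hedged sketch, not a verified fix: it assumes Nutch 1.x CLI semantics, and the crawldb path, segment name, and `-topN` value are placeholders. The key assumption is that a segment which never went through `updatedb` was never merged into the crawldb, so its URLs are still recorded there as unfetched. The commands are echoed as a dry run so the script is safe to execute anywhere.

```shell
# Hypothetical recovery sketch for a half-complete segment (Nutch 1.x).
# All paths below are placeholders, not values from the original thread.
CRAWLDB=crawl/crawldb                        # placeholder crawldb path
BAD_SEGMENT=crawl/segments/20120401123000    # placeholder half-complete segment

# 1. Discard the half-complete segment; since updatedb never ran on it,
#    nothing from it reached the crawldb.
echo rm -r "$BAD_SEGMENT"

# 2. Regenerate a fetch list. Caveat: 'generate' temporarily marks the
#    URLs it selects, so the lost segment's URLs may be skipped until
#    crawl.gen.delay (7 days by default) expires.
echo bin/nutch generate "$CRAWLDB" crawl/segments -topN 50000

# 3. Fetch the new segment and resume the normal cycle.
echo bin/nutch fetch 'crawl/segments/<new-segment>'
echo bin/nutch updatedb "$CRAWLDB" 'crawl/segments/<new-segment>'
```

For the second question, re-running `bin/nutch parse` on the affected segment after fixing the parse problem is sometimes an option, but whether earlier parse output must be removed first depends on the Nutch version, so treat that as an assumption to verify.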

