On 2010-05-03 19:59, Emmanuel de Castro Santana wrote:
> "Unfortunately, no. You should at least crawl without parsing, so that
> when you download the content you can run the parsing separately, and
> repeat it if it fails."
>
> I've just found this in the FAQ; can it be done?
>
> http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F
The first method of recovery mentioned there works only (sometimes) for crawls performed with the LocalJobTracker and the local file system. It does not work in any other case.

> By the way, about not parsing: isn't it necessary to parse the content anyway
> in order to generate links for the next segment? If this is true, one would
> have to run the parse separately, which would give the same result.

Yes, but if the parsing fails you still have the downloaded content, which you can re-parse after you have fixed the config or the code...

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
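
For reference, the fetch-then-parse-separately workflow described above looks roughly like this on a Nutch 1.x command line. This is only a sketch: the crawl/crawldb and crawl/segments paths are placeholders for your own layout, and the segment-selection one-liner is just a convenience.

  # In conf/nutch-site.xml, turn off parsing during the fetch so the raw
  # content is stored even if parsing later fails:
  #
  #   <property>
  #     <name>fetcher.parse</name>
  #     <value>false</value>
  #   </property>

  # Generate a fetch list and fetch it (no parsing happens in this step).
  bin/nutch generate crawl/crawldb crawl/segments
  SEGMENT=$(ls -d crawl/segments/* | tail -1)   # newest segment directory
  bin/nutch fetch $SEGMENT

  # Parse the downloaded content as a separate step; if it fails, fix the
  # parser config or code and re-run only this step.
  bin/nutch parse $SEGMENT

  # Update the crawldb with the newly discovered links for the next cycle.
  bin/nutch updatedb crawl/crawldb $SEGMENT

The point is that the fetched content stays in the segment (in its content/ subdirectory), so only the parse step has to be repeated if parsing goes wrong.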