Hi Marek,

One reason for separating the fetching and parsing stages is that if an error occurred during a fetch that also did the parsing, the error would be inherently harder to track down and resolve. Any crawl data collected during that fetch could also be lost or damaged.
On the other hand, if we parse the fetched data (fetched without parsing) after the fetch has completed and we encounter an error, then we can assume that the error is somewhere in the parsing stage and not in the fetching.

I am not sure if there is a way to change this back without hacking some of your own code... maybe the best way is to use a reliable script.

On Fri, Jun 10, 2011 at 11:01 AM, Marek Bachmann <[email protected]> wrote:

> ... and I wonder if there is a way to change this behaviour back to let the
> fetcher start the parsing.
>
> The syntax help of the command hasn't been updated it seems:
>
> root@hrz-vm180:/home/nutchServer/nutch/runtime/local/bin# ./nutch fetch
> Usage: Fetcher <segment> [-threads n] [-noParsing]

--
*Lewis*
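As a rough sketch of the kind of script mentioned above: a small wrapper that runs the fetch and then the parse as two separate steps, so a failure can be pinned on one stage or the other. The function name fetch_then_parse, the NUTCH variable and the exit codes are made up for illustration, and it assumes your bin/nutch offers a separate parse command that takes the segment as its argument. Treat it as a starting point, not a tested recipe.

```shell
#!/bin/sh
# Hypothetical wrapper: fetch a segment, then parse it as a separate step.
# NUTCH is an assumed variable pointing at the nutch launcher script.
NUTCH="${NUTCH:-./nutch}"

fetch_then_parse() {
    segment="$1"
    if [ -z "$segment" ]; then
        echo "usage: fetch_then_parse <segment>" >&2
        return 2
    fi

    # Stage 1: fetch only. A failure here points at the fetching stage.
    if ! "$NUTCH" fetch "$segment"; then
        echo "fetch failed for $segment" >&2
        return 1
    fi

    # Stage 2: parse the already fetched data. A failure here points at
    # the parsing stage; this wrapper does not touch the fetched data.
    if ! "$NUTCH" parse "$segment"; then
        echo "parse failed for $segment" >&2
        return 3
    fi
}
```

After sourcing the file from the bin directory, it would be called as fetch_then_parse followed by the path of the segment to process. A return code of 1 means the fetch failed, and 3 means the fetch went through but the parse did not.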

