Re: Fetcher does no parsing by default in 1.3

lewis john mcgibbney Fri, 10 Jun 2011 03:12:29 -0700

Hi Marek,

One reason for keeping the fetching and parsing stages separate is that if an
error occurred during a fetch that also undertook parsing, it would be
inherently harder to root out and resolve. It could also mean that any crawl
data collected during the fetch is lost or damaged.

On the other hand, if we fetch without parsing, then parse the fetched data
once the fetch has completed and encounter an error, we can assume that the
error is somewhere within the parsing stage and not the fetching.

I am not sure if there is a way to change this back without hacking some of
your own code. Maybe the best way is to use a reliable script that runs the
fetch and then the parse as separate steps.
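
A minimal sketch of such a script, assuming it is run from the directory
holding the nutch launcher and that the segment path is passed in by the
caller. The function name fetch_then_parse, the NUTCH variable and the
example segment path are only illustrative, not part of Nutch itself:

```shell
#!/bin/sh
# Sketch only: run fetch and parse as two separate steps for one segment.
# NUTCH (path to the nutch launcher) and fetch_then_parse are assumed names.
NUTCH="${NUTCH:-./nutch}"

fetch_then_parse() {
    segment="$1"
    # Stop at the first failing step so it is clear which stage broke.
    "$NUTCH" fetch "$segment" || { echo "fetch failed: $segment" >&2; return 1; }
    "$NUTCH" parse "$segment" || { echo "parse failed: $segment" >&2; return 2; }
}

# Example call (hypothetical segment path):
#   fetch_then_parse crawl/segments/20110610120000
```

A return code of 1 points at the fetch and 2 at the parse, so an error can
be traced to the right stage straight away.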

On Fri, Jun 10, 2011 at 11:01 AM, Marek Bachmann
<[email protected]> wrote:

> ... and I wonder if there is a way to change this behaviour back to let the
> fetcher start the parsing.
>
> The syntax help of the command hasn't been updated it seems:
>
>  root@hrz-vm180:/home/nutchServer/nutch/runtime/local/bin# ./nutch fetch
> Usage: Fetcher <segment> [-threads n] [-noParsing]
>
>
>

-- 
*Lewis*
