Suppose I want Nutch to fetch URLs and a) follow links in HTML documents *only*, and b) provide HTML and all other documents found to some external tool as is, i.e. unparsed.
Is it correct that it is sufficient to activate only the parse-html plugin out of all the parse-* plugins, or is even this not necessary?
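To make the question concrete, here is a rough sketch of the kind of override I have in mind for conf/nutch-site.xml. The plugin.includes property is the one Nutch uses to select plugins, but the particular plugin list below is only my assumption for illustration and will differ between versions:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed override: keep parse-html as the only parse-* plugin.
       The other plugin names are illustrative and depend on the Nutch version. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-html|index-basic</value>
  </property>
</configuration>
```

The value is a regular expression matched against plugin ids, so listing parse-html alone should leave the other parse-* plugins inactive, if I understand the mechanism correctly.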
(Is there a more detailed description of what the individual stages of Nutch do, beyond the tutorial?)
Thanks, Harald.

