eyal edri wrote:
> Hi,
> Is there a way to tell Nutch not to parse the pages it fetches, i.e. just
> to extract the links from them?
Extracting links requires that a page is downloaded first (otherwise
where would you extract the links from?) and parsed (otherwise how would
you extract links from an unintelligible byte[]?).
> I know there is a "-noParsing" option, but I still need to download some
> content types and handle them with the parse-XXX plugins, so I'm not sure
> it will work if I use that option.
No download -> no parsing -> no outlinks.
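To make the "no download -> no parsing -> no outlinks" chain concrete, here is a minimal sketch. It is not Nutch's parser or plugin API. It uses the JDK's Swing HTML parser (ParserDelegator) as a stand-in, and the names OutlinkSketch and extractOutlinks are hypothetical. It assumes the page has already been fetched and is UTF-8 HTML. The point is that the fetched content is only a byte[], and the links appear only after a parser has run over it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration, not Nutch code: outlinks only exist after parsing.
public class OutlinkSketch {

    // Parse already-downloaded bytes (assumed UTF-8 HTML) and collect <a href> values.
    public static List<String> extractOutlinks(byte[] content) throws IOException {
        List<String> links = new ArrayList<>();
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8);
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Only anchor tags with an href attribute count as outlinks here.
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // Without this parse step the content stays an opaque byte[].
        new ParserDelegator().parse(reader, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page the fetcher has already downloaded.
        byte[] fetched = ("<html><body><a href=\"http://example.com/a\">A</a> "
                + "<a href=\"/b\">B</a></body></html>").getBytes(StandardCharsets.UTF_8);
        System.out.println(extractOutlinks(fetched));
    }
}
```

If the download step is skipped there are no bytes to hand to extractOutlinks. If the parse step is skipped the bytes are never turned into anchor tags. Either way, no outlinks come out.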
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com