eyal edri wrote:
Hi,

Is there a way to tell Nutch not to parse the pages it fetches, meaning just
to extract the links from them?

Extracting links requires that a page is downloaded first (otherwise where would you extract the links from?) and parsed (otherwise how would you extract links from an unintelligible byte[]?).
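
To illustrate why parsing cannot be skipped, here is a minimal sketch in Python. It is not Nutch code: the names OutlinkParser and extract_outlinks are hypothetical, and it assumes the page has already been downloaded as raw bytes in a known encoding. Outlinks only become available once those bytes have been decoded and run through an HTML parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OutlinkParser(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL the page came from.
                self.outlinks.append(urljoin(self.base_url, value))


def extract_outlinks(raw_bytes, base_url, encoding="utf-8"):
    """Decode downloaded bytes, parse them, and return the outlinks found."""
    # Assumption: the encoding is known; a real crawler would detect it.
    text = raw_bytes.decode(encoding, errors="replace")
    parser = OutlinkParser(base_url)
    parser.feed(text)
    parser.close()
    return parser.outlinks
```

Without the downloaded bytes there is nothing to pass to extract_outlinks, and without the parsing step the bytes never turn into a list of links.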


I know there is a "-no parsing" option, but I still need to download some
content types using the parse-XXX plugins, so I'm not sure it will work if I
use that option.

No download -> no parsing -> no outlinks.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
