Maybe you can try HTML::LinkExtractor: http://search.capan.org/~podmaster/HTML-LinkExtractor-0.13
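
If a standalone link extractor is what you are after, the idea (collect the href values and ignore everything else on the page) can also be sketched in Python with only the standard library. This is just a rough illustration and not something Nutch itself provides; the names `LinkCollector` and `extract_links` are made up for the example, and it only looks at `<a href="...">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href targets of <a> tags only."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Ignore every tag except anchors; no text or other content is kept.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> links found in the HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<a href="/docs">Docs</a><p>ignored text</p><a href="http://other.example/x">X</a>'
    for link in extract_links(page, "http://example.com/"):
        print(link)
```

Something like this would have to run outside of the fetch/parse cycle, so it does not answer whether the parse-XXX plugins can stay enabled for the other content types.
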
eyal edri wrote:
Hi, is there a way to tell Nutch not to parse the pages it fetches, meaning it would just extract the links from them? I know there is a "-no parsing" attribute, but I still need to download some content types using the parse-XXX plugins, so I'm not sure it will work if I use that option. Thank you,
