Anton Beza wrote:
> Hello,
> I'm trying to find a way to re-parse the pages stored through Nutch.
> I want to be able to access the pages Nutch has already processed and
> stored, apply a new parser, and replace the old content with the new.
> Is this possible in Nutch 0.8, or will it have to be altered to achieve
> this?
Just remove the following directories from each segment: crawl_parse,
parse_text and parse_data. Then run bin/nutch parse on each of these
segments.
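As a rough sketch, the steps could be scripted like this. It assumes the
segments sit on the local filesystem (not DFS), that the script is run from
the Nutch root directory, and that the new parser is already enabled in your
configuration. The function name reparse_segments and the crawl/segments path
are only illustrative.

```shell
#!/bin/sh
# Sketch: drop the old parse output from every segment, then re-parse it.
# Assumptions (not from the original post): segments are on the local
# filesystem and each subdirectory of the segments directory is one segment.

reparse_segments() {
    segments_dir=$1            # e.g. crawl/segments (hypothetical path)
    nutch=${2:-bin/nutch}      # Nutch launcher script, defaults to bin/nutch

    for segment in "$segments_dir"/*/; do
        # Skip the unexpanded pattern when there are no segments at all.
        [ -d "$segment" ] || continue
        segment=${segment%/}

        # Remove the output of the previous parse run; the fetched
        # content in the segment is left untouched.
        rm -rf "$segment/crawl_parse" "$segment/parse_text" "$segment/parse_data"

        # Parse the stored content again with the current configuration.
        "$nutch" parse "$segment" || return 1
    done
}
```

You would then call it as: reparse_segments crawl/segments
Removing the three directories first matters because they hold the output
of the earlier parse run. Try this on a copy of one segment before running
it over the whole crawl.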
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com