Thanks! I'd like to automate this. Do you know which Java class does the actual parsing?
Thanks again,
Anton

On 7/26/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Anton Beza wrote:
> > Hello,
> >
> > I'm trying to find a way to re-parse the pages stored through Nutch.
> >
> > I want to be able to access the pages Nutch has already processed and
> > stored, apply a new parser, and replace the old content with the new.
> >
> > Is this possible in Nutch 0.8, or will it have to be altered to achieve
> > this?
>
> Just remove the following directories from each segment: crawl_parse,
> parse_text, parse_data, and then run bin/nutch parse on these segments.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
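
The steps quoted above could be automated roughly as follows. This is only a sketch under some assumptions: the segments live on the local filesystem (not in HDFS), each subdirectory of the segments directory is one segment, and bin/nutch is reachable from the working directory. The function names and the `run` parameter are hypothetical, introduced here for illustration; they are not part of Nutch.

```python
import shutil
import subprocess
from pathlib import Path

# Parse output directories named in the quoted reply.
PARSE_DIRS = ("crawl_parse", "parse_text", "parse_data")


def reparse_segment(segment, nutch="bin/nutch", run=subprocess.run):
    """Delete old parse output from one segment, then re-run the parser."""
    segment = Path(segment)
    for name in PARSE_DIRS:
        # ignore_errors=True: silently skip directories that are already gone
        shutil.rmtree(segment / name, ignore_errors=True)
    # Same as typing: bin/nutch parse <segment>
    run([nutch, "parse", str(segment)], check=True)


def reparse_all(segments_dir, nutch="bin/nutch", run=subprocess.run):
    """Re-parse every segment under segments_dir (assumed local layout)."""
    for segment in sorted(Path(segments_dir).iterdir()):
        if segment.is_dir():
            reparse_segment(segment, nutch=nutch, run=run)
```

Calling `reparse_all("crawl/segments")` would process each segment in turn; the path here is only an example. The `run` parameter exists so the external command can be swapped out for a dry run. As for which class does the work, the bin/nutch script maps each command name to a Java class, so the entry for `parse` there should point to it.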
