Look through the source code for SegmentDumper. You can do something similar by overriding reduce(): it gets called once for every page in the segment, with access to all of the parse and fetch data for that document.
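A minimal, self-contained sketch of that "one reduce() call per page" pattern, in plain Java with no Nutch or Hadoop on the classpath. SegmentRecord, reduce(), run() and the tag-stripping "re-parse" are all hypothetical stand-ins, not Nutch 0.8's API. In a real job the framework does the grouping by URL for you, and you would override your Reducer's reduce() and write out a new ParseText instead of returning a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentReduceSketch {
    // HYPOTHETICAL stand-in for one record stored in a segment (e.g. the fetched
    // content or the existing parse text), keyed by the page URL.
    record SegmentRecord(String url, String kind, String data) {}

    // HYPOTHETICAL stand-in for an overridden reduce(): called once per URL,
    // with every record the segment holds for that page.
    static String reduce(String url, List<SegmentRecord> values) {
        String content = null;
        for (SegmentRecord r : values) {
            if (r.kind().equals("content")) {
                content = r.data();
            }
        }
        if (content == null) {
            return null; // no stored content, so nothing to re-parse
        }
        // Toy "re-parse" (assumption): strip HTML tags to get new plain text.
        return content.replaceAll("<[^>]*>", "").trim();
    }

    // Simulates what the framework does before reduce(): group all records by
    // URL (sorted by key), then call reduce() exactly once per URL.
    static Map<String, String> run(List<SegmentRecord> records) {
        Map<String, List<SegmentRecord>> grouped = new TreeMap<>();
        for (SegmentRecord r : records) {
            grouped.computeIfAbsent(r.url(), k -> new ArrayList<>()).add(r);
        }
        Map<String, String> newParseText = new TreeMap<>();
        for (Map.Entry<String, List<SegmentRecord>> e : grouped.entrySet()) {
            String text = reduce(e.getKey(), e.getValue());
            if (text != null) {
                newParseText.put(e.getKey(), text); // would overwrite the old ParseText
            }
        }
        return newParseText;
    }

    public static void main(String[] args) {
        List<SegmentRecord> records = List.of(
            new SegmentRecord("http://a.example/", "content", "<html><b>Hello</b> A</html>"),
            new SegmentRecord("http://a.example/", "parse_text", "old A"),
            new SegmentRecord("http://b.example/", "parse_text", "old B"));
        System.out.println(run(records)); // prints {http://a.example/=Hello A}
    }
}
```

Pages without stored content (like b.example above) are simply skipped, which is why having fetcher.content.store enabled matters for re-parsing.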
Hope this helps...

--Mike

On 4/30/07, Anton Beza <[EMAIL PROTECTED]> wrote:
Hi,

I'm trying to iterate through all of the pages that Nutch stored so that I can re-parse them and save over their current ParseText. I have fetcher.content.store set to true, and all of the pages' full content is stored. How would I iterate through each stored page using Nutch 0.8?

Thanks,
Anton
