Look through the source code for SegmentDumper. You can do something similar by overriding reduce(): it is called once for every page in the segment and has access to all of the parse and fetch data for that document.
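To make the idea concrete, here is a rough sketch of the per-page reduce() pattern in plain Java. It is not Nutch code: RawContent, ParsedText, and reparse() are hypothetical stand-ins I made up so the example compiles on its own, and the real reduce() in a segment job has a different signature and receives Nutch's own record types. The point is only the shape: one call per URL, pick the stored content out of the mixed values, and build the replacement text from it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: none of these types are Nutch or Hadoop classes.
class ReparseSketch {

    // Hypothetical stand-in for the raw fetched content stored for a page.
    static final class RawContent {
        final String html;
        RawContent(String html) { this.html = html; }
    }

    // Hypothetical stand-in for the parse text stored for a page.
    static final class ParsedText {
        final String text;
        ParsedText(String text) { this.text = text; }
    }

    // Hypothetical re-parser: replaces tags with spaces, then collapses whitespace.
    static String reparse(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Called once per page. 'values' mixes every record stored under that URL,
    // so each value is checked by type to find the raw content.
    static ParsedText reduce(String url, Iterator<Object> values) {
        RawContent content = null;
        while (values.hasNext()) {
            Object value = values.next();
            if (value instanceof RawContent) {
                content = (RawContent) value;
            }
            // Any old ParsedText is ignored; it is about to be replaced.
        }
        if (content == null) {
            return null; // no content was stored for this page
        }
        return new ParsedText(reparse(content.html));
    }

    public static void main(String[] args) {
        List<Object> records = new ArrayList<>();
        records.add(new ParsedText("stale text"));
        records.add(new RawContent("<html><body><p>Hello</p> <b>Nutch</b></body></html>"));
        ParsedText fresh = reduce("http://example.com/", records.iterator());
        System.out.println(fresh.text);
    }
}
```

In a real job the returned value would be handed to the output collector rather than returned, and pages with no stored content would simply be skipped.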
Hope this helps...

--Mike

On 4/30/07, Anton Beza <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to iterate through all of the pages that Nutch stored so that I
> can re-parse them and save over their current ParseText.
>
> I have fetcher.content.store set to true and all of the pages' full content
> is stored.
>
> How would I iterate through each stored page using Nutch 0.8?
>
> Thanks,
> Anton
