Look through the source code for SegmentDumper. You can do something
similar by overriding reduce() -- it gets called once for every page in
the segment, with access to all the parse and fetch data for that
document.
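
To make the shape of that concrete, here is a rough plain-Java sketch of the
"reduce() called once per page" pattern. It is not the Nutch or Hadoop API:
SegmentRecord, PageReducer, runReduce() and the tag-stripping reparse() are all
hypothetical stand-ins I made up for illustration, and the real SegmentDumper
code will look different. The idea is only that records from the different
segment parts are grouped by URL, and your reduce logic sees all of them for
one page at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentReparseSketch {

    // Hypothetical stand-in for one record from one part of a segment
    // (e.g. "content", "parse_text", "crawl_fetch"), keyed by page URL.
    static final class SegmentRecord {
        final String url;
        final String part;
        final String value;

        SegmentRecord(String url, String part, String value) {
            this.url = url;
            this.part = part;
            this.value = value;
        }
    }

    // Hypothetical callback mirroring the idea of reduce(): invoked once
    // per URL with every record that belongs to that page.
    interface PageReducer {
        String reduce(String url, List<SegmentRecord> values);
    }

    // Groups records by URL, then calls the reducer once per page.
    static Map<String, String> runReduce(List<SegmentRecord> records, PageReducer reducer) {
        Map<String, List<SegmentRecord>> byUrl = new TreeMap<>();
        for (SegmentRecord r : records) {
            byUrl.computeIfAbsent(r.url, k -> new ArrayList<>()).add(r);
        }
        Map<String, String> newParseText = new TreeMap<>();
        for (Map.Entry<String, List<SegmentRecord>> e : byUrl.entrySet()) {
            newParseText.put(e.getKey(), reducer.reduce(e.getKey(), e.getValue()));
        }
        return newParseText;
    }

    // Toy "re-parse": strips tags from the stored raw content. If a page
    // has no stored content, its existing parse text is kept unchanged.
    static String reparse(String url, List<SegmentRecord> values) {
        String existing = "";
        for (SegmentRecord r : values) {
            if (r.part.equals("content")) {
                return r.value.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
            }
            if (r.part.equals("parse_text")) {
                existing = r.value;
            }
        }
        return existing;
    }

    public static void main(String[] args) {
        List<SegmentRecord> records = new ArrayList<>();
        records.add(new SegmentRecord("http://a.example/", "content", "<html><body><h1>Hi</h1> there</body></html>"));
        records.add(new SegmentRecord("http://a.example/", "parse_text", "stale text"));
        records.add(new SegmentRecord("http://b.example/", "parse_text", "old text"));

        // Print the regenerated text for each page, one URL per line.
        Map<String, String> result = runReduce(records, SegmentReparseSketch::reparse);
        for (Map.Entry<String, String> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In a real job the grouping is done by the framework and you would write the
new text out in place of the old ParseText rather than collecting it in a map.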

Hope this helps...
--Mike

On 4/30/07, Anton Beza <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to iterate through all of the pages that Nutch stored so that I
> can re-parse them and save over their current ParseText.
>
> I have fetcher.content.store set to true and all of the pages' full content
> is stored.
>
> How would I iterate through each stored page using Nutch 0.8?
>
> Thanks,
> Anton
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general