On Mon, Sep 29, 2008 at 9:19 PM, Kevin MacDonald <[EMAIL PROTECTED]> wrote:
> Once I have done a crawl I have a need to pass all of the raw HTML and
> javascript that has been fetched through a custom parser. During a fetch
> does nutch store all of the raw content including HTML tags on disk?

Yes, if fetcher.store.content is set to true (which is the default). The raw
content of each page is saved under the <segment>/content directory. To
retrieve the content for a particular URL, you can try this:

  bin/nutch readseg -get <segment> <url> -noparse -noparsedata -nofetch -nogenerate -noparsetext

> Thanks
>
> Kevin
>

-- 
Doğacan Güney
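
For feeding the stored content into a custom parser, the readseg call above could be wrapped in a small script. The following is only a minimal sketch, not part of the original reply: the function names, the default `bin/nutch` path, and the example segment path are hypothetical, and it assumes the output can be decoded as text. The output may also contain record metadata in addition to the page body, so it is worth checking on your own segment first.

```python
import subprocess


def build_readseg_command(segment, url, nutch="bin/nutch"):
    """Build the argument list for dumping only the content record of one URL.

    `nutch` is an assumed path to the Nutch launcher script.
    """
    return [
        nutch, "readseg", "-get", segment, url,
        # Suppress every segment part except the raw content.
        "-noparse", "-noparsedata", "-nofetch", "-nogenerate", "-noparsetext",
    ]


def fetch_raw_content(segment, url, nutch="bin/nutch"):
    """Run readseg and return whatever it prints to standard output."""
    result = subprocess.run(
        build_readseg_command(segment, url, nutch),
        capture_output=True,
        text=True,   # assumption: the page decodes as text
        check=True,  # raise CalledProcessError if readseg fails
    )
    return result.stdout
```

A call such as `fetch_raw_content("crawl/segments/<segment>", "http://example.com/")` would then return the text that the custom parser needs to process.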
