Subject: How to dump the crawled HTML pages?
From: Paul Lypaczewski

Hi, I am new to Nutch. I have just started using it to crawl an intranet and extract a specific field from the HTML pages. As a first step, I would like to dump all the crawled HTML pages into a directory. I guess I should add a filter class to do this, but I have no idea where to start. Can someone give me some advice on how to begin, or tell me which classes' source code I should read?

Thank you very much!
Paul
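Below is a minimal sketch of the "write every HTML page into a directory" step, in Java because Nutch itself is written in Java. It assumes that the URL, content type and raw bytes of each fetched page have already been read from the crawl output. In Nutch 1.x the raw pages are stored in each segment's `content` directory, and `bin/nutch readseg -dump` (class `org.apache.nutch.segment.SegmentReader`) dumps a segment as text, which makes that class a reasonable place to start reading. `PageDumper`, `urlToFileName`, `isHtml` and `dumpIfHtml` are hypothetical names and are not part of Nutch.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/**
 * Hypothetical helper (not a Nutch class): writes fetched pages to a
 * directory, one file per URL. The (url, contentType, bytes) values are
 * assumed to come from whatever reads the crawl segments.
 */
class PageDumper {

    /** Maps a URL to a single, filesystem-safe file name. */
    static String urlToFileName(String url) {
        // URL-encoding turns ':' and '/' into %3A and %2F, so the whole URL
        // becomes one path component. The encoding can be reversed with
        // URLDecoder, so distinct URLs get distinct names.
        return URLEncoder.encode(url, StandardCharsets.UTF_8) + ".html";
    }

    /** True if the reported content type looks like HTML or XHTML. */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        // Parameters such as "; charset=UTF-8" are allowed after the type.
        return ct.startsWith("text/html") || ct.startsWith("application/xhtml+xml");
    }

    /**
     * Writes the page into outDir if it is HTML.
     * Returns the written path, or null if the page was skipped.
     */
    static Path dumpIfHtml(Path outDir, String url, String contentType, byte[] content)
            throws IOException {
        if (!isHtml(contentType)) {
            return null;
        }
        Files.createDirectories(outDir);
        Path target = outDir.resolve(urlToFileName(url));
        // The raw bytes are written unchanged, so the page keeps its original charset.
        Files.write(target, content);
        return target;
    }
}
```

Encoding the URL keeps file names readable and reversible. However, very long URLs can exceed typical file-name limits (often 255 bytes). Naming files by a hash of the URL would avoid this, at the cost of readability.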

