How to dump the crawled HTML pages?

Paul Lypaczewski Fri, 17 Dec 2010 10:31:31 -0800

Hi,
I am new to Nutch. I have just started using it to crawl an intranet and extract
a certain field from the HTML pages. As a first step, I would like to dump all
the HTML pages to a directory. I guess I should add a filter class to do this,
but I have no idea where to start.
Can someone give me some advice on how to start, or tell me which class's source
code I should read?
Thank you very much!
Paul
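
To make the goal concrete, here is a minimal sketch of the "dump every page to a directory" step on its own. It does not use any Nutch API: the class name PageDumper, its methods, and the sample URLs are all hypothetical, and the page bytes are assumed to be already available in a map keyed by URL. In a real crawl they would have to come from wherever Nutch stores the fetched content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: writes each page to its own file.
final class PageDumper {

    // Derive a file name from a URL by replacing every character outside
    // [A-Za-z0-9._-] with '_'. Distinct URLs can collide under this scheme.
    static String fileNameFor(String url) {
        return url.replaceAll("[^A-Za-z0-9._-]", "_") + ".html";
    }

    // Write every page into outDir (created if missing) and return the
    // number of files written.
    static int dumpAll(Map<String, byte[]> pages, Path outDir) {
        try {
            Files.createDirectories(outDir);
            int written = 0;
            for (Map.Entry<String, byte[]> page : pages.entrySet()) {
                Files.write(outDir.resolve(fileNameFor(page.getKey())), page.getValue());
                written++;
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample data standing in for crawled pages (assumption).
        Map<String, byte[]> pages = new LinkedHashMap<>();
        pages.put("http://intranet/a.html",
                "<html><body>A</body></html>".getBytes(StandardCharsets.UTF_8));
        pages.put("http://intranet/b.html",
                "<html><body>B</body></html>".getBytes(StandardCharsets.UTF_8));

        Path outDir = Paths.get(args.length > 0 ? args[0] : "dump");
        System.out.println("Wrote " + dumpAll(pages, outDir) + " pages to " + outDir);
    }
}
```

The file naming here is deliberately simple. If two URLs must never map to the same file, a hash of the URL would be a safer name.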
