Getting html pages through a Nutch crawl (for a dataset)

Sameendra Samarawickrama Sun, 22 Jan 2012 02:52:16 -0800

Hi,
I am using Nutch to generate a small web dataset, on which I plan to run a
focused crawler later.


I did a test crawl and now have the 'segments' folder built up. Now I need
to get the exact html pages it fetched, starting from the seed url(s).

Is it possible to create a dataset this way? If so, how do I get those html
pages?

Thanks a lot!
