Hi,
I am using Nutch to build a small dataset of web pages, on which I plan to
run a focused crawler later.

I did a test crawl and now have the 'segments' folder built up. Next I need
to get the exact HTML pages it fetched starting from the seed URL(s).

Is it possible to create a dataset this way? If so, how do I get those HTML
pages?

Thanks a lot!
