Hi, I'm a newbie to Nutch. I installed Nutch and used it to run a crawl successfully.
The problem is that when I checked the crawled files under /segments/***/fetcher/, they were not in .html or any similar format. (There are two files named "data" and "index" under each subfolder.)
Since I want to crawl thousands of web pages and parse the HTML code of each one, I was wondering: what should I do so that the crawled pages are stored in HTML format?
Thanks. -- sarah
