Crawled pages are not in HTML -- what should I do?

Sarah Zhai Wed, 17 Aug 2005 17:50:56 -0700

Hi,
I'm a newbie to Nutch.
I installed Nutch and used it to run a crawl successfully.

The problem is that when I checked the crawled files under /segments/***/fetcher/, they were not in .html or any similar format. (There are two files named "data" and "index" under each subfolder.)


Since I want to crawl thousands of web pages and parse the HTML code of each one, I was wondering: what should I do so that the crawled pages are stored in HTML format?


Thanks.

--
sarah
