How to use Nutch to parse Web-pages!

Morrowwind Tue, 15 Jan 2008 11:46:40 -0800

Hi,

My project involves web page processing, and as a first step I need to parse
the web pages to extract all of their plain text.


I have finished the crawling part using Nutch, but I'm having trouble with
the parsing part. My data is in the crawldb folder. How can I extract the
plain text from the crawled web pages and store it in a .txt file?
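
To make the goal concrete, here is a minimal sketch of the kind of output I
am after. It does not use Nutch at all: it assumes the raw HTML of a page is
already available as a string, and it uses only the Python standard library.
The names TextExtractor, html_to_text and save_text are made up for this
illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside skipped elements and not pure whitespace.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the plain text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def save_text(html, path):
    """Write the extracted plain text of one page to a .txt file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(html_to_text(html))
```

Calling save_text(page_html, "page.txt") would then write one text fragment
per line. What I am missing is the equivalent step on the Nutch side, starting
from the crawled data.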

Could anyone give me a hint, please?

Thanks a lot.


