Hi,

My project is about web page processing, and I first need to parse the web
pages to extract all of their plain text.

I have finished the crawling part using Nutch, but I'm having trouble with
the parsing part. My data is in the crawldb folder. How can I extract the
plain text from the web pages and store it in a .txt file?

Could anyone give me a hint, please?

Thanks a lot.


-- 
View this message in context: 
http://www.nabble.com/How-to-use-Nutch-to-parse-Web-pages%21-tp14845212p14845212.html
Sent from the Nutch - User mailing list archive at Nabble.com.
