Hi, my project is about web page processing, and as a first step I need to parse the web pages to extract all of their plain text.
I have finished the crawling part using Nutch, but I'm having trouble with the parsing part. My data is in the crawldb folder. How can I extract the plain text from the crawled web pages and store it in a .txt file? Could anyone give me a hint, please? Thanks a lot.
