Check this out: http://kuthrax.blogspot.com/2008/01/how-to-retrieve-parsed-content-from.html

On Jan 15, 2008 2:46 PM, Morrowwind <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> My project is about web page processing and I need to parse the web-pages to
> get all the plain text first.
>
> Now I have finished the crawling part using nutch, and I'm in trouble with
> the parsing part. I have my data in crawldb folder. How can I parse the
> plain text out of the web pages and store them in a .txt file?
>
> Could anyone give me a hint please.
>
> Thanks a lot.
>
> --
> View this message in context:
> http://www.nabble.com/How-to-use-Nutch-to-parse-Web-pages%21-tp14845212p14845212.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
