Check this out:

http://kuthrax.blogspot.com/2008/01/how-to-retrieve-parsed-content-from.html


On Jan 15, 2008 2:46 PM, Morrowwind <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> My project is about web page processing, and I need to parse the web pages
> to get all the plain text first.
>
> I have now finished the crawling part using Nutch, but I'm having trouble
> with the parsing part. I have my data in the crawldb folder. How can I parse
> the plain text out of the web pages and store it in a .txt file?
>
> Could anyone give me a hint, please?
>
> Thanks a lot.
>
>
> --
> View this message in context:
> http://www.nabble.com/How-to-use-Nutch-to-parse-Web-pages%21-tp14845212p14845212.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
