Hello-

   I've done this. I think the command is

   nutch readseg -dump <segment_dir> <dumpfile>

which dumps the HTML of everything in a segment. You can also specify which URL you are interested in; run nutch readseg with no arguments for details.
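
If you then want to pull individual pages back out of that dump, something like the following Python sketch might be a starting point. It assumes each record in the dump text starts with a "Recno:: <n>" line followed by a "URL:: <url>" line; that layout is an assumption, so check it against your own dump before relying on it. The name split_dump is made up for this example and is not part of Nutch.

```python
import re
from typing import Dict

# ASSUMPTION: each record in the dump starts with a "Recno:: <n>" line
# followed by a "URL:: <url>" line. Verify this against your own dump.
RECORD_START = re.compile(r"^Recno::[ \t]*\d+[ \t]*$", re.MULTILINE)
URL_LINE = re.compile(r"^URL::[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def split_dump(dump_text: str) -> Dict[str, str]:
    """Map each URL to the raw text of its record (hypothetical helper)."""
    records: Dict[str, str] = {}
    # Splitting on the record marker leaves the text between markers.
    for chunk in RECORD_START.split(dump_text):
        match = URL_LINE.search(chunk)
        if match is None:
            continue  # text before the first record, or a malformed chunk
        # Keep everything after the URL line as that record's body.
        records[match.group(1)] = chunk[match.end():].strip()
    return records
```

You would read the dump text into a string and pass it to split_dump; each value still contains whatever per-record fields the dump wrote around the HTML, so further trimming is likely needed.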

                       see you
                           -Jim


----- Original Message ----- From: "LoneEagle70" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, October 17, 2007 5:53 AM
Subject: Extracting html pages from db



Hi,

I was able to install Nutch 0.9, crawl a site, and use the web page to do
full-text searches of my db.

But we need to extract information from all the HTML pages.

So, is there a way to extract the HTML pages from the db?
--
View this message in context: http://www.nabble.com/Extracting-html-pages-from-db-tf4640373.html#a13253122
Sent from the Nutch - User mailing list archive at Nabble.com.

