Hello-
I've done this before. I think the command is
nutch readseg -dump <segment_dir> <dumpfile>
which dumps the HTML of everything in a segment. You can also specify which
URL you are interested in; run nutch readseg with no arguments for details.
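
If you then want each page's HTML separately, something like the rough Python sketch below might help. It is only a sketch under assumptions I have not checked against your dump: that each record starts with a "Recno:: N" line followed by a "URL:: ..." line, and that the raw page follows a line reading exactly "Content:". The function name split_dump is my own, not part of Nutch.

```python
import re
from typing import Dict

# Assumed record layout of the dump text (an assumption, verify on a real dump):
#   Recno:: 0
#   URL:: http://example.com/
#
#   Content::
#   Version: 2
#   url: http://example.com/
#   contentType: text/html
#   Content:
#   <html>...</html>
RECORD_START = re.compile(r"^Recno:: \d+$", re.MULTILINE)


def split_dump(dump_text: str) -> Dict[str, str]:
    """Map each URL in the dump to the raw content that follows it.

    Hypothetical helper: everything after the first line that is exactly
    "Content:" up to the next "Recno::" line is treated as the page body.
    """
    pages: Dict[str, str] = {}
    # Split on record separators; drop whatever precedes the first record.
    for record in RECORD_START.split(dump_text)[1:]:
        url_match = re.search(r"^URL:: (.+)$", record, re.MULTILINE)
        body_match = re.search(r"^Content:\n", record, re.MULTILINE)
        if url_match is None or body_match is None:
            continue  # record has no URL line or no raw content section
        pages[url_match.group(1).strip()] = record[body_match.end():].strip()
    return pages
```

You would read the dump file into a string and pass it to split_dump. If the dump also contains other sections for a record (crawl data, parse text and so on), they would end up inside the page body with this sketch, so check the readseg usage text for a way to dump only the content.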
see you
-Jim
----- Original Message -----
From: "LoneEagle70" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, October 17, 2007 5:53 AM
Subject: Extracting html pages from db
Hi,
I was able to install Nutch 0.9, crawl a site, and use the web page to do
full-text search of my db.
But we need to extract information from all the HTML pages.
So, is there a way to extract HTML pages from the db?