You could index the HTML files directly with Lucene. There are a few sample chapters at http://lucenebook.com to get you familiar with Lucene's API docs.
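
On the URL question: local files are normally addressed with the "file:" scheme instead of "http:". Nutch has a protocol-file plugin for this, though whether it is enabled in your configuration is something you would need to check. Below is a rough sketch, not anything taken from Nutch or Lucene, of how you might generate the list of file: URLs from a directory of HTML files. The class name LocalFileUrls and the method toFileUrls are made up for illustration, and it assumes your pages end in ".html".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists file: URLs for the HTML files under a directory.
public class LocalFileUrls {

    // Walk the directory tree and return one file: URL per ".html" file.
    // The ".html" suffix check is an assumption about how the pages are named.
    static List<String> toFileUrls(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".html"))
                // Path.toUri() produces a URI with the "file" scheme
                .map(p -> p.toAbsolutePath().toUri().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String url : toFileUrls(root)) {
            System.out.println(url);
        }
    }
}
```

Running it against your HTML directory prints one URL per line, which you could redirect into a plain text file and use as the list of URLs to inject.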

best of luck :)

gk

----- Original Message -----
From: "Benny" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, August 22, 2005 7:53 PM
Subject: Index local file.


Hi,

Can someone give me some hints on how to index local files?

I have a lot of plain HTML files (more than 50K pages, around 2-3k
per page). I'd rather not put them on a web server and index them by
URL; I'd like Nutch to index them from the local hard disk. Is that
possible? If so, what kind of URL do I need to inject into the db? For
example, with a web server we use

http://domain/file.html

What about the format for a file on the local HD? I assume it is no
longer "http", so what is the protocol supposed to be? The files are
still in plain HTML format.


Benny




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
