You can also use the TextMining.org toolbox, which provides classes to
extract text from PDF and DOC files, using the Jakarta POI project. They are
all free, under the Apache License.
(URL: http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=6&mode=thread&order=0&thold=0)
You need to be able to extract the text from them and feed that to Lucene.
http://www.pdfbox.org can extract text from PDF documents.
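To make the "extract the text, then feed it to the index" flow concrete, here is a rough sketch in plain Java. Everything in it is an assumption for illustration: TextExtractor is a hypothetical hook where a PDF or DOC extractor would plug in, and TinyIndex is a toy in-memory index standing in for Lucene. None of these names come from Lucene, PDFBox, or the TextMining.org toolbox.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: these types are NOT Lucene, PDFBox, or TextMining.org APIs.
class ExtractAndIndexSketch {

    /** Hypothetical hook: turns a file's raw bytes into plain text. */
    interface TextExtractor {
        String extract(byte[] rawBytes);
    }

    /** Toy stand-in for the search index: maps each lower-cased term to the documents containing it. */
    static final class TinyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docName, String text) {
            // Split on anything that is not a letter or digit.
            for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
                }
            }
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
        }
    }

    /** Picks an extractor by file extension, extracts the text, and hands it to the index. */
    static void indexFile(TinyIndex index, Map<String, TextExtractor> extractors,
                          String fileName, byte[] rawBytes) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        index.add(fileName, extractor.extract(rawBytes));
    }

    public static void main(String[] args) {
        Map<String, TextExtractor> extractors = new HashMap<>();
        // A real setup would register PDF and DOC extractors here; this one only decodes plain text.
        extractors.put("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));

        TinyIndex index = new TinyIndex();
        indexFile(index, extractors, "notes.txt",
                  "Lucene indexes plain text".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.search("lucene"));
    }
}
```

The point of the split is that the index only ever sees plain text, so supporting another file format means registering one more extractor and leaving the indexing side alone.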
Ben
On Fri, 17 Oct 2003, Andre Hughes wrote:
Hello,
Can the Lucene search engine index and search through PDF documents?
What are the file format limits for