RE: Does the Lucene search engine work with PDF's?

2003-10-20 Thread MOYSE Gilles (Cetelem)
You can also use the TextMining.org toolbox, which provides classes to
extract text from PDF and DOC files, using the Jakarta POI project. All of
them are free, under the Apache License.

The URL:
http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=6&mode=thread&order=0&thold=0
(URL tested today)
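
One way to handle several file formats is to pick a text extractor by file extension before anything reaches the indexer. The sketch below uses only the Java standard library. `ExtractorRegistry` and the placeholder extractors are hypothetical names made up for this example; they are not the TextMining.org or Jakarta POI API, and a real extractor would call into one of those libraries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical registry that maps a file extension to a text extractor. */
class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    /** Register an extractor for an extension such as "pdf" or "doc" (case-insensitive). */
    void register(String extension, Function<byte[], String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Return the extracted text, or empty if no extractor handles this file type. */
    Optional<String> extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = byExtension.get(extension);
        if (extractor == null) {
            return Optional.empty();
        }
        return Optional.of(extractor.apply(content));
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // Placeholder extractors: real ones would call a PDF or DOC parsing library.
        registry.register("pdf", bytes -> "text from a PDF of " + bytes.length + " bytes");
        registry.register("doc", bytes -> "text from a DOC of " + bytes.length + " bytes");

        System.out.println(registry.extract("manual.PDF", new byte[10]));
        System.out.println(registry.extract("photo.png", new byte[10]));
    }
}
```

With this shape, supporting a new format means registering one more extractor; the indexing side never needs to know which format the text came from.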

You can also try the jGuru page: http://www.jguru.com/faq/view.jsp?EID=1074237

Gilles Moyse


-----Original Message-----
From: Andre Hughes [mailto:[EMAIL PROTECTED]
Sent: Saturday, 18 October 2003 00:05
To: [EMAIL PROTECTED]
Subject: Does the Lucene search engine work with PDF's?


Hello,
Can the Lucene search engine index and search through PDF documents?
What are the file format limits for the Lucene search engine?
 
Thanks in Advance,
 
Andre'


Re: Does the Lucene search engine work with PDF's?

2003-10-17 Thread Ben Litchfield


You need to be able to extract the text from them and feed that to Lucene.
PDFBox (http://www.pdfbox.org) can extract text from PDF documents.
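
As a rough illustration of that extract-then-index flow, here is a minimal sketch that uses only the Java standard library. `TextExtractor` and `TinyIndex` are hypothetical names invented for this example: the first stands in for whatever library does the PDF extraction, the second is a toy inverted index standing in for Lucene. Neither reflects the real API of PDFBox or Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a PDF text extractor. */
interface TextExtractor {
    String extractText(byte[] rawDocument);
}

/** Toy inverted index standing in for Lucene: it only ever sees plain text. */
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Split the text into lowercase terms and record which document contains each. */
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Return the ids of documents containing the term, in sorted order. */
    List<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }
}

class ExtractThenIndex {
    /** Step 1: extract text from the raw file. Step 2: hand that text to the index. */
    static void indexDocument(TinyIndex index, TextExtractor extractor,
                              String docId, byte[] rawDocument) {
        String text = extractor.extractText(rawDocument);
        index.add(docId, text);
    }

    public static void main(String[] args) {
        // Fake extractor for the demo: pretends the bytes are already UTF-8 text.
        TextExtractor fake = raw -> new String(raw, StandardCharsets.UTF_8);
        TinyIndex index = new TinyIndex();
        indexDocument(index, fake, "report.pdf",
                "Quarterly sales report".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.search("sales"));
    }
}
```

The point of the split is that the index never touches the PDF itself; swapping in a real extractor changes only the `TextExtractor` implementation.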

Ben


On Fri, 17 Oct 2003, Andre Hughes wrote:

> Hello,
> Can the Lucene search engine index and search through PDF documents?
> What are the file format limits for the Lucene search engine?
>
> Thanks in Advance,
>
> Andre'

