About the PDF documents, see https://issues.apache.org/jira/browse/PDFBOX-361. I've resolved it using the PDFBox trunk version.
On Tue, Jul 21, 2009 at 2:59 AM, Paco Avila <[email protected]> wrote:

> Sometimes it fails to index PDF, MS Word, and MS Excel documents. I know
> this is due to the PDFBox and POI libraries (I have submitted some of
> these documents to them), but it is important to know when there is a
> problem with the text extractors.
>
> On Mon, Jul 20, 2009 at 8:02 PM, Fabiano Nunes <[email protected]> wrote:
>
> > What kind of documents?
> >
> > On Thu, Jul 9, 2009 at 6:49 PM, Paco Avila <[email protected]> wrote:
> >
> > > Sometimes when I put a document in the repository, the text extractor
> > > fails and the document is not indexed. It would be nice to test
> > > whether a document has been indexed or not, but currently I can't see
> > > an easy way to achieve this behaviour.
> > >
> > > Any idea?
>
> --
> Paco Avila
> GIT Consultors
> tel: +34 971 498310
> fax: +34 971496189
> e-mail: [email protected]
> http://www.git.es
