Re: PDF indexing issues

Furkan KAMACI Fri, 15 Nov 2013 09:28:21 -0800

You should check the Apache PDFBox project, since these errors are logged by PDFBox classes. A similar issue is reported here:
https://issues.apache.org/jira/browse/PDFBOX-940
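
Before digging into individual PDFs, it may help to see how often each PDFBox class reports an error. The following is only a rough sketch: it assumes the Solr log keeps the "LEVEL - timestamp; logger; message" layout of the lines quoted in this thread, and count_pdfbox_errors is a hypothetical helper name, not part of Solr or PDFBox.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this thread:
#   ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox...PDCIDFont; message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+-\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});\s*"
    r"(?P<logger>[\w.$]+);\s*"
    r"(?P<message>.*)$"
)


def count_pdfbox_errors(lines):
    """Count ERROR entries per PDFBox logger class (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # continuation line or a different log layout
        is_error = match["level"] == "ERROR"
        is_pdfbox = match["logger"].startswith("org.apache.pdfbox.")
        if is_error and is_pdfbox:
            counts[match["logger"]] += 1
    return counts


# The three entries quoted in this thread, joined back onto single lines.
sample = [
    "ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont; "
    "Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'",
    "ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont; "
    "Error: Could not parse predefined CMAP file for '--UCS2'",
    "ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter; "
    "FlateFilter: stop reading corrupt stream due to a DataFormatException",
]

if __name__ == "__main__":
    for logger, count in count_pdfbox_errors(sample).most_common():
        print(count, logger)
```

To run it against a real log, pass an open file object instead of the sample list. The tally only shows which PDFBox component complains most; it does not identify the offending PDF files, since the quoted log lines carry no file names.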



2013/11/15 Marcello Lorenzi <mlore...@sorint.it>

> Hi,
> during our testing of Apache Solr 4.3, we noticed some errors
> that occurred during PDF indexing:
>
> ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
> ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for '--UCS2'
>
> and
>
> ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter;
> FlateFilter: stop reading corrupt stream due to a DataFormatException
>
> Could these errors be related to the format of the PDF files?
>
> Thanks,
> Marcello
>
