These errors are logged by Apache PDFBox classes, so you should check with the PDFBox project. A similar issue is reported here:
https://issues.apache.org/jira/browse/PDFBOX-940


2013/11/15 Marcello Lorenzi <mlore...@sorint.it>

> Hi,
> during our testing of Apache Solr 4.3, we noticed some errors
> during PDF indexing:
>
> ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
> ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont;
> Error: Could not parse predefined CMAP file for '--UCS2'
>
> and
>
> ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter;
> FlateFilter: stop reading corrupt stream due to a DataFormatException
>
> Could these errors be related to the format of the PDF files?
>
> Thanks,
> Marcello
>
