Issues with extraction content of PDF files

Zheng Lin Edwin Yeo Fri, 18 Dec 2015 09:58:30 -0800

Hi,

I'm indexing some PDF documents in Solr. However, certain PDF files
contain Chinese text, and after indexing, the content field holds
either a series of "??????" or nothing at all.


I've also tried the Tika app, and I get the same results.

What could be causing this?

I've shared one of the affected files on Dropbox, which you can access
via the link here:
https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0


Regards,
Edwin
