Re: No Unicode mapping warnings

Nick Burch Tue, 26 Jul 2016 08:04:02 -0700

On Tue, 26 Jul 2016, Oliver Steinau wrote:

I'm having problems extracting text from a small (43 KB) PDF file using tika-1.13 -- I get a bunch of warnings like
WARN  No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss
WARN  No Unicode mapping for C0097 (31) in font FDLICI+PSOwstswiss

Can you try with the ExtractText tool from Apache PDFBox? http://pdfbox.apache.org/2.0/commandline.html#extracttext

If that works fine, then it's a Tika bug and we'll need to look into it. If that fails with the same problem, then you'd need to report a bug to PDFBox and attach the problematic PDF file to the JIRA issue. (Tika would then get the fix on the next release.)


Nick
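
When filing such a report, it can help to list which glyphs lack a mapping in which font. The following is only a minimal sketch, assuming the log lines follow the format of the two warnings quoted above; the class name UnmappedGlyphSummary and the method summarize are hypothetical helpers for illustration, not part of Tika or PDFBox.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnmappedGlyphSummary {
    // Assumed format, copied from the warnings quoted in this thread:
    // "WARN  No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss"
    private static final Pattern WARNING = Pattern.compile(
        "No Unicode mapping for (\\S+) \\((\\d+)\\) in font (\\S+)");

    /** Groups "name (code)" entries by font, keeping first-seen order and dropping repeats. */
    public static Map<String, Set<String>> summarize(List<String> logLines) {
        Map<String, Set<String>> byFont = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                String entry = m.group(1) + " (" + m.group(2) + ")";
                byFont.computeIfAbsent(m.group(3), k -> new LinkedHashSet<>()).add(entry);
            }
            // Lines that do not match the assumed format are ignored.
        }
        return byFont;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "WARN  No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss",
            "WARN  No Unicode mapping for C0097 (31) in font FDLICI+PSOwstswiss");
        summarize(lines).forEach((font, glyphs) ->
            System.out.println(font + ": " + glyphs));
    }
}
```

The per-font summary can then be pasted into the bug report alongside the attached PDF.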
