Re: No Unicode mapping warnings

Oliver Steinau Tue, 26 Jul 2016 08:24:46 -0700

PDFBox gives (kind of) the same warnings, and also returns garbage (albeit different).


I'll try and report a bug.


Oliver


On 26.07.2016 17:02, Nick Burch wrote:

On Tue, 26 Jul 2016, Oliver Steinau wrote:
I'm having problems extracting text from a small (43 KB) PDF file using tika-1.13 -- I get a bunch of warnings like
WARN  No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss
WARN  No Unicode mapping for C0097 (31) in font FDLICI+PSOwstswiss
Can you try with the ExtractText tool from Apache PDFBox? http://pdfbox.apache.org/2.0/commandline.html#extracttext
If that works fine, then it's a Tika bug and we'll need to look into it. If that fails with the same problem, then you'd need to report a bug to PDFBox and attach a problematic PDF file to the Jira issue. (Tika would then get the fix in the next release.)
Nick
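
When filing the bug report suggested above, it can help to condense a long run of these warnings into one entry per font. The following is only a minimal sketch: the regular expression is inferred from the two warning lines quoted in this thread, and the function name `summarize_warnings` is hypothetical. It also assumes the first token is a glyph name and the number in parentheses is a character code.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the two warnings quoted in this thread:
#   WARN  No Unicode mapping for <glyph> (<code>) in font <font>
WARNING_RE = re.compile(
    r"No Unicode mapping for (?P<glyph>\S+) \((?P<code>\d+)\) in font (?P<font>\S+)"
)


def summarize_warnings(lines):
    """Group the unmapped (glyph, code) pairs by font name.

    Lines that do not match the assumed warning shape are ignored.
    """
    by_font = defaultdict(set)
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            pair = (match.group("glyph"), int(match.group("code")))
            by_font[match.group("font")].add(pair)
    # Sort each font's pairs so the summary is stable across runs.
    return {font: sorted(pairs) for font, pairs in by_font.items()}


if __name__ == "__main__":
    sample = [
        "WARN  No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss",
        "WARN  No Unicode mapping for C0097 (31) in font FDLICI+PSOwstswiss",
    ]
    for font, pairs in summarize_warnings(sample).items():
        print(font, pairs)
```

Feeding the saved log output of either tool through this function yields one entry per font, which may be easier to paste into a bug report than the raw warnings.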
