I'm having problems extracting text from a small (43 KB) PDF file using
tika-1.13. I get a bunch of warnings like
WARN No Unicode mapping for C0104 (38) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0097 (31) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0110 (43) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0105 (39) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0115 (47) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0114 (46) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0065 (17) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0117 (49) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0102 (36) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0098 (32) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0071 (20) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0246 (54) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0223 (53) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0058 (16) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0066 (18) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0120 (51) in font FDLICI+PSOwstswiss
WARN No Unicode mapping for C0072 (21) in font FDLICI+PSOwstswiss
and tika returns only garbage.
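
One thing I noticed: the glyph names in the warnings look like "C" followed by a four-digit decimal character code. That is only my guess from the log output, not something I have confirmed in the PDF itself or in the Tika/PDFBox sources. A small sketch to check the guess (the helper name decode_glyph_name is made up for illustration):

```python
import re

# Glyph names copied from the warnings above, in the same order.
GLYPH_NAMES = [
    "C0104", "C0097", "C0110", "C0105", "C0115", "C0114", "C0065", "C0117",
    "C0102", "C0098", "C0071", "C0246", "C0223", "C0058", "C0066", "C0120",
    "C0072",
]


def decode_glyph_name(name):
    """Assumption: the name is 'C' plus a decimal Latin-1/Unicode code point.

    Returns the guessed character, or None if the name does not fit.
    """
    match = re.fullmatch(r"C(\d{4})", name)
    if match is None:
        return None
    return chr(int(match.group(1)))


# Join the guessed characters; '?' marks names that do not fit the pattern.
decoded = "".join(decode_glyph_name(n) or "?" for n in GLYPH_NAMES)
print(decoded)
```

Under that assumption every name maps to an ordinary letter or punctuation mark (including ö and ß, which would fit, since the document is German), so the characters themselves do not look exotic; the font apparently just uses glyph names that the extractor cannot map to Unicode.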
The file displays just fine in Acrobat Reader, and what's more,
pdftotext.exe extracts the text without any problems.
Is there anything I can do about this?
Thanks a lot,
Oliver