On 10.12.2020 at 08:46, Mikael Hagstrom wrote:

Hi,


I'm using PDFBox to extract text from PDF files. It works for most files, but I have an issue with one particular file.


I'm getting two errors.

 1. o.a.pdfbox.pdmodel.font.PDCIDFontType2 : Could not read embedded OTF for font CTQGJF+Arial-BoldMT
 2. java.io.IOException: Error:TTF.loca unknown offset format. at org.apache.fontbox.ttf.IndexToLocationTable.read(IndexToLocationTable.java:71)


(2) should include a number.

Both mean that the font / the PDF is broken.
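
For context on (2): as far as I can tell, that exception is raised when the indexToLocFormat field in the font's 'head' table is neither 0 (short loca offsets) nor 1 (long loca offsets). Below is a minimal sketch for checking that value yourself, assuming you have already extracted the raw TrueType font program from the PDF as a byte array. `LocaFormatCheck` and its methods are hypothetical names for illustration and are not part of PDFBox or FontBox.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of PDFBox or FontBox. */
final class LocaFormatCheck {

    /**
     * Returns the raw indexToLocFormat value from the 'head' table of a
     * TrueType font program. 0 means short loca offsets, 1 means long ones.
     */
    static int readIndexToLocFormat(byte[] font) {
        if (font.length < 12) {
            throw new IllegalArgumentException("Too short to be a TrueType font");
        }
        // ByteBuffer is big-endian by default, which matches TrueType.
        ByteBuffer buf = ByteBuffer.wrap(font);
        // numTables follows the 4-byte sfnt version.
        int numTables = buf.getShort(4) & 0xFFFF;
        for (int i = 0; i < numTables; i++) {
            // 12-byte header, then one 16-byte record per table.
            int record = 12 + 16 * i;
            if (record + 16 > font.length) {
                throw new IllegalArgumentException("Truncated table directory");
            }
            String tag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (tag.equals("head")) {
                // Record layout: tag, checksum, offset, length (4 bytes each).
                long offset = buf.getInt(record + 8) & 0xFFFFFFFFL;
                // indexToLocFormat is a 16-bit field 50 bytes into 'head'.
                if (offset + 52 > font.length) {
                    throw new IllegalArgumentException("'head' table lies outside the font data");
                }
                return buf.getShort((int) offset + 50);
            }
        }
        throw new IllegalArgumentException("No 'head' table found");
    }

    /** Only 0 and 1 are defined values for indexToLocFormat. */
    static boolean isValidLocaFormat(int format) {
        return format == 0 || format == 1;
    }
}
```

If `isValidLocaFormat(readIndexToLocFormat(bytes))` is false, the embedded font itself is damaged, and the producer of the PDF is the place to fix it.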



There are two fonts embedded in the file. I get the same error for both fonts. These fonts are embedded as TTF rather than OTF. There is also some text added in front of the font name that looks a bit random.
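
The seemingly random text in front of the font name is most likely a subset tag: when only part of a font is embedded, PDF producers prefix the font name with six uppercase letters and a plus sign, so the prefix by itself does not indicate damage. A small sketch of how such a tag could be recognized and stripped; `SubsetTag` and its methods are hypothetical names for illustration, not PDFBox API.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class SubsetTag {

    // Six uppercase ASCII letters followed by '+' at the start of the name.
    private static final Pattern SUBSET_PREFIX = Pattern.compile("^[A-Z]{6}\\+");

    /** True if the font name starts with a subset tag such as "CTQGJF+". */
    static boolean hasSubsetTag(String fontName) {
        return SUBSET_PREFIX.matcher(fontName).find();
    }

    /** Returns the font name without the 7-character subset prefix, if present. */
    static String baseName(String fontName) {
        return hasSubsetTag(fontName) ? fontName.substring(7) : fontName;
    }
}
```

For the font in the log above, `SubsetTag.baseName("CTQGJF+Arial-BoldMT")` returns `Arial-BoldMT`.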

There are no issues opening the file in a PDF reader. Information from the PDF reader below.


Can you extract the text with Adobe?

Please share the file (upload it to a file-sharing host) and tell us which PDFBox version you're using.

Tilman



Is there some way of dealing with these two errors?



Regards,

Mikael

