Hello there,

> I know about ligatures, and normally PDFBox handles them well, e.g. ff ffi fi fl
> are quite common in TeX-produced PDF documents.
> But why should PDFBox reproduce a fi (FI) ligature as fl (FL)?

When does this problem occur? Are you getting "fl" instead of "fi" during text extraction (e.g. with the PDFTextStripper utility), or are you seeing it during PDF rendering (e.g. with the PageDrawer utility)?

Debugging could be more or less rewarding depending on which tools you are using and how familiar you are with font encodings and charsets. The basic idea is to find the value of the "problematic" byte in the PDF text object, and then to look up its character name.

If you could share the PDF document, I might take a look at it sometime. It could be another Type1C font issue where I am to blame.

VR
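
P.S. A minimal sketch of the lookup idea, in plain Java with no PDFBox calls. The class name LigatureLookup and both tables are hypothetical and written by hand for illustration: the code slots follow TeX's OT1 text encoding, which is only an assumption about this document, and the Unicode values follow the Adobe Glyph List. The real code-to-name mapping has to be read from the font in the PDF itself.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hand-written tables, not PDFBox code.
final class LigatureLookup {
    // Byte value -> glyph name, using the OT1 ligature slots (an assumption;
    // the actual font's encoding or Type1C charset may differ).
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();
    // Glyph name -> Unicode string, as listed in the Adobe Glyph List.
    private static final Map<String, String> NAME_TO_UNICODE = new HashMap<>();

    static {
        CODE_TO_NAME.put(0x0B, "ff");
        CODE_TO_NAME.put(0x0C, "fi");
        CODE_TO_NAME.put(0x0D, "fl");
        CODE_TO_NAME.put(0x0E, "ffi");
        CODE_TO_NAME.put(0x0F, "ffl");

        NAME_TO_UNICODE.put("ff", "\uFB00");
        NAME_TO_UNICODE.put("fi", "\uFB01");
        NAME_TO_UNICODE.put("fl", "\uFB02");
        NAME_TO_UNICODE.put("ffi", "\uFB03");
        NAME_TO_UNICODE.put("ffl", "\uFB04");
    }

    // Step 1: byte from the text object -> character name.
    static String glyphName(int code) {
        return CODE_TO_NAME.getOrDefault(code, ".notdef");
    }

    // Step 2: character name -> Unicode; "?" if the name is unknown.
    static String toUnicode(int code) {
        return NAME_TO_UNICODE.getOrDefault(glyphName(code), "?");
    }

    public static void main(String[] args) {
        int[] problematicBytes = {0x0C, 0x0D};
        for (int b : problematicBytes) {
            System.out.printf("0x%02X -> %s -> U+%04X%n",
                    b, glyphName(b), (int) toUnicode(b).charAt(0));
        }
    }
}
```

If the byte in the text object resolves to the name "fi" but the extracted text contains "fl", the name-to-Unicode step (or the charset parsing before it) is the place to look.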

