These are most likely ligatures in the original PDF. In your examples, "<" stands in for fi, "+" for Th, and ">" for ft. Ligatures for fi, fl, ffl, and ft are pretty common, and some word processing programs automatically replace the original character sequences with the corresponding ligatures. I haven't really seen a Th ligature before, but it makes sense, because without custom kerning the stem of the T and the stem of the h typically look too far apart.
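If you need a workaround on the extracted text, here is a minimal sketch in plain Java (no PDFBox calls). It assumes two things that I have not verified against your file: that some ligatures may come out as real Unicode ligature characters (Unicode defines code points for fi, fl and ffl, but none for ft or Th), and that the stray characters map exactly as in your three examples. The class name `LigatureFix` and the `OBSERVED` table are made up for illustration.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

class LigatureFix {
    // Hypothetical table built only from the three examples in the quoted
    // message; the real glyph-to-text mapping depends on the font in the PDF.
    static final Map<String, String> OBSERVED = new LinkedHashMap<>();
    static {
        OBSERVED.put("<", "fi");
        OBSERVED.put("+", "Th");
        OBSERVED.put(">", "ft");
    }

    // Expands real Unicode ligature characters (e.g. U+FB01) to plain letters.
    // NFKC also folds other compatibility characters, such as full-width forms.
    static String expandUnicodeLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // Blindly substitutes the observed stand-in characters. This also rewrites
    // any genuine "<", "+" or ">" in the text, so it only suits text where
    // those characters cannot occur legitimately.
    static String expandObserved(String text) {
        String out = text;
        for (Map.Entry<String, String> e : OBSERVED.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expandUnicodeLigatures("cruci\uFB01xion"));
        System.out.println(expandObserved("cruci<xion +ey a>er"));
    }
}
```

The second method is a blunt instrument; it only helps if the extracted text never contains those three characters for any other reason.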
HTH,
Mirko

On Wed, Aug 31, 2011 at 12:59 PM, Hesham G. <[email protected]> wrote:
> Hello ,
>
> I have a PDF that I extract its text using PDFBox. The PDF is read fine
> using Mac's Preview, but in PDFBox some words are read in a strange way.
> Examples:
> crucifixion => cruci<xion
> They => +ey
> after => a>er
>
> You can check a 1 page PDF sample here :
> http://www.4shared.com/document/F5DG_rHu/pdf_with_strange_text.html
>
> Is this something with the PDF or it concerns PDFBox ?
>
>
> Best regards ,
> Hesham

