Mirko,
Thanks a lot for your reply.
Shouldn't PDFBox handle these ligatures automatically, as stated in previous
PDFBox versions?
Best regards,
Hesham
---------------------------------------------
Included message:
These are most likely ligatures in the original PDF. Ligatures for fi, fl,
ffl, and ft are pretty common, and some word processing programs
automatically replace the original character sequences with their
corresponding ligatures. I haven't really seen a Th ligature before, but it
makes sense, because the vertical stem of the T and the vertical stem of the h
typically look too far apart without custom kerning.
HTH,
Mirko
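If the extracted text contains genuine Unicode ligature code points (for example U+FB01 for fi or U+FB04 for ffl), they can be expanded after extraction. The sketch below uses the standard `java.text.Normalizer` with NFKC; it is a generic post-processing step, not a PDFBox API, and it assumes the ligatures came out as those Unicode characters. Note that Unicode defines compatibility code points only for ff, fi, fl, ffi, ffl, and st, so an ft or Th ligature glyph is not covered, and NFKC also folds other compatibility characters, so check that this is acceptable for your text.

```java
import java.text.Normalizer;

class LigatureNormalizer {
    // NFKC replaces compatibility characters with their plain equivalents, so
    // U+FB01 (fi) becomes "fi" and U+FB04 (ffl) becomes "ffl". It also folds
    // other compatibility forms (full-width letters, superscript digits, etc.).
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Prints "crucifixion flow baffle"
        System.out.println(expandLigatures("cruci\uFB01xion \uFB02ow ba\uFB04e"));
    }
}
```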
On Wed, Aug 31, 2011 at 12:59 PM, Hesham G. wrote:
Hello,
I have a PDF whose text I extract using PDFBox. The PDF displays fine in
Mac's Preview, but PDFBox extracts some words in a strange way.
Examples:
crucifixion => cruci<xion
They => +ey
after => a>er
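The three examples suggest a consistent substitution: '<' stands for fi, '+' for Th, and '>' for ft. That pattern is consistent with ligature glyphs whose character codes the font does not map to Unicode text (for instance, a missing ToUnicode mapping), which would need to be confirmed in the PDF itself. As a stopgap, a hypothetical per-document replacement table could undo the substitution after extraction. The mapping below is inferred only from these three examples, is not a PDFBox feature, and would also rewrite any genuine '<', '+', or '>' in the text. It uses `Map.of`, so it assumes Java 9 or later.

```java
import java.util.Map;

class GarbledLigatureRepair {
    // Hypothetical mapping inferred only from the examples in this thread:
    // '<' -> "fi", '+' -> "Th", '>' -> "ft". It is specific to this one PDF
    // and also rewrites any genuine '<', '+' or '>' characters.
    private static final Map<Character, String> REPLACEMENTS =
            Map.of('<', "fi", '+', "Th", '>', "ft");

    // Replaces each garbled character with the letters it most likely stands
    // for; all other characters are copied through unchanged.
    static String repair(String extracted) {
        StringBuilder out = new StringBuilder(extracted.length());
        for (char c : extracted.toCharArray()) {
            out.append(REPLACEMENTS.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "crucifixion They after"
        System.out.println(repair("cruci<xion +ey a>er"));
    }
}
```

A character-by-character pass is used rather than chained `String.replace` calls so that each garbled character is translated exactly once.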
You can check a one-page sample PDF here:
http://www.4shared.com/document/F5DG_rHu/pdf_with_strange_text.html
Is this a problem with the PDF, or does it concern PDFBox?
Best regards,
Hesham