Errors when decomposing Arabic Ligatures
----------------------------------------
Key: PDFBOX-415
URL: https://issues.apache.org/jira/browse/PDFBOX-415
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 0.7.3
Reporter: Justin LeFebvre

For the Arabic ligatures U+FC5E through U+FC63, the decomposition of each contains a space (U+0020), which causes a word to be broken up into two words.

Also, the U+FDF2 ligature is handled differently by different fonts. Some encode it as U+0644 U+0644 U+0647 and add an extra, separate U+0627. U+FDF2 should be encoded as U+0627 U+0644 U+0644 U+0647.
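
The behaviour described above can be reproduced outside PDFBox with the JDK's own Unicode compatibility decomposition. The following is a minimal sketch, not the PDFBox fix: the class `LigatureDecompositionSketch` and its methods `decompose` and `toHex` are hypothetical names introduced for illustration, and the sketch assumes that dropping the U+0020 from the decomposition of U+FC5E through U+FC63 is an acceptable workaround for text extraction.

```java
import java.text.Normalizer;

public class LigatureDecompositionSketch {

    // Hypothetical helper, not part of PDFBox: applies Unicode compatibility
    // decomposition (NFKD) to one code point. For U+FC5E..U+FC63 the standard
    // mapping begins with U+0020, so the space is removed here to keep the
    // surrounding word in one piece (an assumption about the desired result).
    static String decompose(int codePoint) {
        String source = new String(Character.toChars(codePoint));
        String decomposed = Normalizer.normalize(source, Normalizer.Form.NFKD);
        if (codePoint >= 0xFC5E && codePoint <= 0xFC63) {
            decomposed = decomposed.replace(" ", "");
        }
        return decomposed;
    }

    // Formats a string as space-separated U+XXXX code points for inspection.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unmodified NFKD of U+FC5E, showing the space reported in the issue.
        System.out.println(toHex(Normalizer.normalize("\uFC5E", Normalizer.Form.NFKD)));
        // The same ligature with the space stripped by the sketch.
        System.out.println(toHex(decompose(0xFC5E)));
        // U+FDF2, whose standard decomposition starts with U+0627.
        System.out.println(toHex(decompose(0xFDF2)));
    }
}
```

Running `main` prints the raw decomposition of U+FC5E, then the stripped one, then the decomposition of U+FDF2, which makes it easy to compare against whatever sequence a given font produces for the same ligature.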