Hi Team,
PDF viewers do not render all Tamil letters as expected in PDFs generated with PDFBox. It seems I have to apply the required glyph substitutions myself while generating the PDF to get the expected rendering. I am attempting these substitutions; any help would be appreciated.

Ligature Substitutions - Tamil Use Cases

Below are the 5 possible cases for a base character (consonant) joining with a vowel sign. There are 18 base characters; the examples use க, and the cases are the same for the remaining seventeen.

Case 1 - Vowel sign follows the base character - No change required; PDF viewers render as expected.
க + ா = கா

Case 2 - Vowel sign (or pulli ்) sits on top of the base character - No change required; PDF viewers render as expected.
க + ி = கி
க + ீ = கீ
க + ் = க்

Case 3 - Base character follows the vowel sign (the sign is drawn on the left) - The glyphs need to be reversed.
க + ெ = கெ -> ெ + க = கெ
க + ே = கே -> ே + க = கே
க + ை = கை -> ை + க = கை

Case 4 - Base character follows the first part of a composite vowel sign - The vowel sign needs to be split and the glyphs reordered.
க + ொ = கொ -> க + ெ + ா -> ெ + க + ா = கொ
க + ோ = கோ -> க + ே + ா -> ே + க + ா = கோ
க + ௌ = கௌ -> க + ெ + ௗ -> ெ + க + ௗ = கௌ

Case 5 - Base character and vowel sign map to a new glyph ID - The resulting glyph has no Unicode code point, so one new glyph must be substituted for the sequence of glyphs.
க + ு = கு -> கு (gid 6698)
க + ூ = கூ -> கூ (gid 6716)

The same cases in table form (code points as reported by the JDK; gid = glyph ID in the TTF; Actual and Expected = the PDF generated with PDFBox):

Input | Char sequence | Code points (JDK)            | gid (TTF)  | Actual* | Expected
க்    | க + ்         | 2965 (U+0B95), 3021 (U+0BCD) | 1828, 1862 | க்      | க் - All good
கா    | க + ா         | 2965 (U+0B95), 3006 (U+0BBE) | 1828, 1851 | கா      | கா - All good
கி    | க + ி         | 2965 (U+0B95), 3007 (U+0BBF) | 1828, 1852 | கி      | கி - All good
கீ    | க + ீ         | 2965 (U+0B95), 3008 (U+0BC0) | 1828, 1853 | கீ      | கீ - All good
கு    | க + ு         | 2965 (U+0B95), 3009 (U+0BC1) | 1828, 1854 | கு      | கு (gid = 6698) - New glyph expected
கூ    | க + ூ         | 2965 (U+0B95), 3010 (U+0BC2) | 1828, 1855 | கூ      | கூ (gid = 6716) - New glyph expected
கெ    | க + ெ         | 2965 (U+0B95), 3014 (U+0BC6) | 1828, 1856 | கெ      | ெ + க = கெ - Reversed glyphs expected
கே    | க + ே         | 2965 (U+0B95), 3015 (U+0BC7) | 1828, 1857 | கே      | ே + க = கே - Reversed glyphs expected
கை    | க + ை         | 2965 (U+0B95), 3016 (U+0BC8) | 1828, 1858 | கை      | ை + க = கை - Reversed glyphs expected
கொ    | க + ொ         | 2965 (U+0B95), 3018 (U+0BCA) | 1828, 1859 | கொ      | க + ெ + ா -> ெ + க + ா = கொ - Split and reorder expected
கோ    | க + ோ         | 2965 (U+0B95), 3019 (U+0BCB) | 1828, 1860 | கோ      | க + ே + ா -> ே + க + ா = கோ - Split and reorder expected
கௌ    | க + ௌ         | 2965 (U+0B95), 3020 (U+0BCC) | 1828, 1861 | கௌ      | க + ெ + ௗ -> ெ + க + ௗ = கௌ - Split and reorder expected

* Actual: the dotted circle that appears in the generated PDF is not visible here.

The actual output and the expected output are attached.

Just to obtain the expected output, I did a hard-coded workaround:
- Substituted the glyph IDs that have no Unicode mapping with a hard-coded lookup in PDCIDFontType2#public byte[] encode(int unicode).
- Reversed, split and reordered the input char sequence before calling showText.
- Added the glyph IDs that have no Unicode mapping to the TrueTypeEmbedder subsetter, so that those glyphs are embedded in the generated PDF.

How can these substitutions be handled in an efficient, general way? I have been looking at the GlyphSubstitutionTable, fontbox.cmap.Identity-H and fontbox.unicode.Scripts.txt, but couldn't work it out so far. Any help would be appreciated.

Thank you,
Jeyan
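The JDK column of the table (character, decimal code point, hex code point) can be reproduced with plain JDK calls, which helps when checking other syllables. This is a minimal sketch; the class name CodePointDump is made up for illustration, and Character.getName is added only to show the official Unicode name of each code point.

```java
import java.util.Locale;

public class CodePointDump {

    /** One line per code point: the character, its decimal and hex code point, and its Unicode name. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format(Locale.ROOT,
                "Character : %s Codepoint : %d unicode : U+%04X (%s)%n",
                new String(Character.toChars(cp)), cp, cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        // KA (U+0B95) followed by the pulli / virama (U+0BCD)
        System.out.print(describe("\u0B95\u0BCD"));
    }
}
```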
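The reverse/split/reorder step of the workaround (Cases 3 and 4) could be made generic instead of being hard-coded per syllable. A minimal sketch, assuming plain consonant + vowel sign syllables only: java.text.Normalizer in NFD form performs the Case 4 split, because ொ, ோ and ௌ have canonical decompositions (U+0BC6 U+0BBE, U+0BC7 U+0BBE and U+0BC6 U+0BD7), and a pre-base vowel sign (U+0BC6, U+0BC7, U+0BC8) is then moved in front of its consonant. The class and method names are hypothetical. It does not handle conjuncts such as க்ஷ, where the sign would have to move in front of the whole cluster, and it imitates the workaround rather than doing real OpenType shaping.

```java
import java.text.Normalizer;

public class TamilVisualOrder {

    // Pre-base vowel signs, drawn to the LEFT of the consonant (Case 3):
    // U+0BC6 VOWEL SIGN E, U+0BC7 VOWEL SIGN EE, U+0BC8 VOWEL SIGN AI
    static boolean isPreBaseVowelSign(int cp) {
        return cp == 0x0BC6 || cp == 0x0BC7 || cp == 0x0BC8;
    }

    // Tamil consonant range U+0B95..U+0BB9 (it also contains a few unassigned code points)
    static boolean isTamilConsonant(int cp) {
        return cp >= 0x0B95 && cp <= 0x0BB9;
    }

    /** Rewrites logical (Unicode) order into the visual glyph order used by the workaround. */
    static String toVisualOrder(String logical) {
        // Case 4: NFD applies the canonical decompositions of the two-part vowel signs:
        // U+0BCA -> U+0BC6 U+0BBE, U+0BCB -> U+0BC7 U+0BBE, U+0BCC -> U+0BC6 U+0BD7
        String decomposed = Normalizer.normalize(logical, Normalizer.Form.NFD);
        int[] cps = decomposed.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            // Case 3 (and the first half of Case 4): put the pre-base sign before its consonant
            if (i + 1 < cps.length && isTamilConsonant(cps[i]) && isPreBaseVowelSign(cps[i + 1])) {
                out.appendCodePoint(cps[i + 1]).appendCodePoint(cps[i]);
                i++; // the vowel sign has been consumed as well
            } else {
                out.appendCodePoint(cps[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"\u0B95\u0BC6", "\u0B95\u0BCA", "\u0B95\u0BCC", "\u0B95\u0BBE"};
        for (String s : samples) {
            StringBuilder hex = new StringBuilder();
            toVisualOrder(s).codePoints().forEach(cp -> hex.append(String.format("U+%04X ", cp)));
            System.out.println(hex.toString().trim());
        }
    }
}
```

Because the reordered string is what gets passed to showText, text extracted or copied from the resulting PDF will likely come out in this visual order as well.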
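Case 5 cannot be expressed in Unicode at all, so it has to happen at the glyph-ID level, which is what the hard-coded change in PDCIDFontType2#encode does. Below is a sketch of a table-driven version, assuming the lookup tables are filled per font. The glyph IDs used (1828, 1851, 1854, 1855, 6698, 6716) are the values from the table for this particular TTF and will differ in any other font; in a general solution the ligature entries would be read from the font's GSUB table rather than typed in. TamilLigatureMapper and its methods are hypothetical, not part of PDFBox or FontBox.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TamilLigatureMapper {

    // Per-font lookup tables (hypothetical helper, not a PDFBox/FontBox API).
    private final Map<Integer, Integer> cmap = new HashMap<>();      // code point -> gid
    private final Map<String, Integer> ligatures = new HashMap<>();  // char sequence -> gid
    private int maxLigatureLength = 0;

    void addGlyph(int codePoint, int gid) {
        cmap.put(codePoint, gid);
    }

    void addLigature(String sequence, int gid) {
        ligatures.put(sequence, gid);
        maxLigatureLength = Math.max(maxLigatureLength, sequence.length());
    }

    /** Greedy longest match: a ligature wins over the individual glyphs it replaces. */
    List<Integer> toGlyphIds(String text) {
        List<Integer> gids = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int matched = 0;
            Integer ligatureGid = null;
            // Tamil is in the BMP, so char lengths equal code point counts here
            for (int len = Math.min(maxLigatureLength, text.length() - i); len >= 2; len--) {
                ligatureGid = ligatures.get(text.substring(i, i + len));
                if (ligatureGid != null) {
                    matched = len;
                    break;
                }
            }
            if (ligatureGid != null) {
                gids.add(ligatureGid);
                i += matched;
            } else {
                int cp = text.codePointAt(i);
                gids.add(cmap.getOrDefault(cp, 0)); // 0 = .notdef
                i += Character.charCount(cp);
            }
        }
        return gids;
    }

    public static void main(String[] args) {
        TamilLigatureMapper mapper = new TamilLigatureMapper();
        // Glyph IDs from the table (valid for that one TTF only)
        mapper.addGlyph(0x0B95, 1828); // KA
        mapper.addGlyph(0x0BBE, 1851); // VOWEL SIGN AA
        mapper.addGlyph(0x0BC1, 1854); // VOWEL SIGN U
        mapper.addGlyph(0x0BC2, 1855); // VOWEL SIGN UU
        mapper.addLigature("\u0B95\u0BC1", 6698); // KA + U
        mapper.addLigature("\u0B95\u0BC2", 6716); // KA + UU
        System.out.println(mapper.toGlyphIds("\u0B95\u0BC1")); // [6698]
        System.out.println(mapper.toGlyphIds("\u0B95\u0BBE")); // [1828, 1851]
    }
}
```

A mapper like this only decides which glyph IDs to emit; as in the workaround, the ligature glyphs would still have to be written to the content stream and kept by the subsetter.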
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org For additional commands, e-mail: users-h...@pdfbox.apache.org