Translation of combining diacritics Type 1 font

jd9225 Wed, 19 Feb 2020 19:00:44 -0800

Hello,

I am a researcher at RIT's DPRL, using PDFBox 2.0.7 and PDFTextStripper on
a document that uses the Type 1 font MHVHUS+CMR10.  I am looking for the
matrix (or values) used to translate diacritics into position, or for some
other way to find where a diacritic is placed.


In my example, the Type 1 font is an embedded subset within the PDF
document and uses Type1Encoding.  When I access the glyph for a diacritic,
e.g. dieresis, through getPath, the path is positioned above the lowercase
characters.  For uppercase characters I can still get the diacritic, but
its path is in the same position as for lowercase characters rather than
above the uppercase character.  In addition, the name reported is that of
the combining diacritic, e.g. dieresiscmb, which is not present in
getCharStringsDict or getCharSet.
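
To make the kind of translation in question concrete, here is a minimal
sketch that uses only java.awt.geom and no PDFBox calls.  The rectangles
stand in for glyph outlines, the numbers are made up, and the class and
method names (AccentPlacementSketch, box, accentShift) are hypothetical.
It only shows what a "shift the accent above the base glyph" transform
could look like; it is not a claim about how PDFBox or the PDF producer
actually computes the position.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

class AccentPlacementSketch {

    /** Builds a rectangular stand-in for a glyph outline (made-up data, not read from a font). */
    static GeneralPath box(double x, double y, double w, double h) {
        GeneralPath p = new GeneralPath();
        p.append(new Rectangle2D.Double(x, y, w, h), false);
        return p;
    }

    /**
     * Returns the translation that centres the accent horizontally over the
     * base and lifts it so its lower edge sits `gap` units above the top of
     * the base. Coordinates are y-up, as in glyph space.
     */
    static AffineTransform accentShift(GeneralPath base, GeneralPath accent, double gap) {
        Rectangle2D b = base.getBounds2D();
        Rectangle2D a = accent.getBounds2D();
        double dx = b.getCenterX() - a.getCenterX();
        double dy = (b.getMaxY() + gap) - a.getMinY();
        return AffineTransform.getTranslateInstance(dx, dy);
    }

    public static void main(String[] args) {
        // Hypothetical outlines: an uppercase base and an accent drawn at lowercase height.
        GeneralPath upperBase = box(30, 0, 690, 683);
        GeneralPath dieresis = box(100, 565, 300, 105);

        AffineTransform shift = accentShift(upperBase, dieresis, 50);
        System.out.println("dx=" + shift.getTranslateX() + " dy=" + shift.getTranslateY());
    }
}
```

With real data the two outlines would come from getPath, and the values of
interest are the dx and dy that were actually applied in the document.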

On a side note, combining diacritic names cause problems when using the
PDPageContentStream class to showText the Unicode text: the call throws an
IllegalArgumentException saying that the combining diacritic does not exist
in the font, even though the character's TextPosition and font were
obtained from PDFTextStripper.  Let me know if I should open a ticket for
this issue.
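
For reference, the combining form of the text can be detected and composed
with the JDK alone.  The sketch below is an illustration only: the class and
method names (CombiningMarkSketch, hasCombiningMark, compose) and the sample
string are hypothetical, and composing to a precomposed character would only
help with showText if the font actually has a glyph for that character,
which an embedded subset may not.

```java
import java.text.Normalizer;

class CombiningMarkSketch {

    /** True if the string contains a Unicode non-spacing mark such as U+0308. */
    static boolean hasCombiningMark(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.getType(cp) == Character.NON_SPACING_MARK);
    }

    /** Composes base + combining mark into one code point where Unicode defines one (NFC). */
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Hypothetical extracted text: 'U' followed by COMBINING DIAERESIS (U+0308).
        String extracted = "U\u0308";
        System.out.println(hasCombiningMark(extracted));
        System.out.println(compose(extracted).length());
    }
}
```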

How are the diacritical accents for Type 1 fonts translated from their
stored location into place?
[image: diacriticdieresis.png]
(I have cc'd my advisor)

Thank you,
Jessica Diehl
