https://bugs.kde.org/show_bug.cgi?id=344849
--- Comment #8 from 4aa7f...@opayq.com ---
I have read the relevant parts of the PDF standard (ISO 32000-1:2008) and can only confirm the assessment in the Sejda bug report (which has since been closed).

According to section 7.9.2.2 "Text String Type" of ISO 32000-1:2008, fields such as the "Author" field in the example document must be represented as a PDF "text string". A text string is encoded either as UTF-16BE with a byte order mark or in PDFDocEncoding. PDFDocEncoding can encode nearly all Latin-1 characters; however, it is NOT the same as either ISO Latin-1 or Windows-1252! The mapping of PDFDocEncoding bytes to characters is defined in Annex D, table D.2 "Latin Character Set and Encodings".

Note that both PDFDocEncoding and Windows-1252 can in fact encode the characters "–‰", although at different byte values (Windows-1252 uses 0x96 and 0x89). Thus, the string need not be encoded as UTF-16BE, and the provided PDF document is valid: the characters "–‰" are correctly encoded as "0x85 0x8B" in PDFDocEncoding. It seems that Okular does not correctly parse PDFDocEncoded text strings. (The other example document works correctly because U+2012 cannot be encoded in PDFDocEncoding, so UTF-16BE was used, which Okular reads correctly.)

_______________________________________________
Okular-devel mailing list
Okular-devel@kde.org
https://mail.kde.org/mailman/listinfo/okular-devel
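To illustrate the text-string rule from section 7.9.2.2, here is a minimal Python sketch. It is not Okular's or Poppler's code. It assumes the raw bytes of the string object are already available with escape sequences resolved, and the names `decode_pdf_text_string` and `encode_pdf_text_string` are hypothetical. The table transcribes only the byte values where Annex D, table D.2 differs from Latin-1. All other bytes are treated as Latin-1, and the few codes the standard leaves undefined are not special-cased.

```python
# Minimal sketch: decoding/encoding a PDF "text string" (ISO 32000-1, 7.9.2.2).
# Assumption: `raw` is the byte content of the string object with escape
# sequences / hex notation already resolved. Function names are hypothetical.

UTF16BE_BOM = b"\xfe\xff"

# Byte values where PDFDocEncoding (Annex D, table D.2) differs from ISO Latin-1.
# All other bytes are treated as Latin-1 here; undefined codes are not special-cased.
_PDFDOC_OVERRIDES = {
    0x18: "\u02d8", 0x19: "\u02c7", 0x1A: "\u02c6", 0x1B: "\u02d9",  # breve caron circumflex dotaccent
    0x1C: "\u02dd", 0x1D: "\u02db", 0x1E: "\u02da", 0x1F: "\u02dc",  # hungarumlaut ogonek ring tilde
    0x80: "\u2022", 0x81: "\u2020", 0x82: "\u2021", 0x83: "\u2026",  # bullet dagger daggerdbl ellipsis
    0x84: "\u2014", 0x85: "\u2013", 0x86: "\u0192", 0x87: "\u2044",  # emdash endash florin fraction
    0x88: "\u2039", 0x89: "\u203a", 0x8A: "\u2212", 0x8B: "\u2030",  # guilsinglleft/right minus perthousand
    0x8C: "\u201e", 0x8D: "\u201c", 0x8E: "\u201d", 0x8F: "\u2018",  # quotedblbase quotedblleft/right quoteleft
    0x90: "\u2019", 0x91: "\u201a", 0x92: "\u2122", 0x93: "\ufb01",  # quoteright quotesinglbase trademark fi
    0x94: "\ufb02", 0x95: "\u0141", 0x96: "\u0152", 0x97: "\u0160",  # fl Lslash OE Scaron
    0x98: "\u0178", 0x99: "\u017d", 0x9A: "\u0131", 0x9B: "\u0142",  # Ydieresis Zcaron dotlessi lslash
    0x9C: "\u0153", 0x9D: "\u0161", 0x9E: "\u017e", 0xA0: "\u20ac",  # oe scaron zcaron Euro
}

# Reverse map for encoding: character -> PDFDocEncoding byte.
_PDFDOC_REVERSE = {ch: code for code, ch in _PDFDOC_OVERRIDES.items()}


def decode_pdf_text_string(raw: bytes) -> str:
    """UTF-16BE if the string starts with the FE FF marker, else PDFDocEncoding."""
    if raw.startswith(UTF16BE_BOM):
        return raw[len(UTF16BE_BOM):].decode("utf-16-be")
    # Iterating over bytes yields ints; unmapped ints become the same Latin-1 code point.
    return "".join(_PDFDOC_OVERRIDES.get(b, chr(b)) for b in raw)


def encode_pdf_text_string(text: str) -> bytes:
    """Prefer PDFDocEncoding; fall back to UTF-16BE with marker if a character does not fit."""
    utf16 = UTF16BE_BOM + text.encode("utf-16-be")
    out = bytearray()
    for ch in text:
        if ch in _PDFDOC_REVERSE:
            out.append(_PDFDOC_REVERSE[ch])
        elif ord(ch) <= 0xFF and ord(ch) not in _PDFDOC_OVERRIDES:
            out.append(ord(ch))  # same byte value as in Latin-1
        else:
            return utf16  # e.g. U+2012 FIGURE DASH has no PDFDocEncoding byte
    # A PDFDocEncoded string starting with "þÿ" would be misread as UTF-16BE.
    if out.startswith(UTF16BE_BOM):
        return utf16
    return bytes(out)
```

With this sketch, the bytes 0x85 0x8B from the example document decode to "–‰", whereas Python's `cp1252` codec turns the same bytes into "…‹". A string containing U+2012 is emitted as UTF-16BE with the FE FF marker, matching the behaviour of the other example document.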