From: "Doug Ewell" <[EMAIL PROTECTED]>
Cryptically naming these two CSS classes ".he" and ".heb", which
provides no indication of which is the Unicode encoding and which is the
Latin-1 hack, merely makes a bad suggestion worse.

It was not cryptic: ".he" was meant for Hebrew (generic, properly Unicode-encoded, suitable for any modern Hebrew), and ".heb" for Biblical Hebrew, where a legacy encoding may still be needed in the absence of workable Unicode support for now. They are not the same language, however, so a change of encoding may be justified. I was not advocating mixing encodings within the same text for the same language...
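
To make the intent explicit, the CSS could look something like this (the font names are only placeholders for whatever fonts are actually installed, not a recommendation):

    /* generic modern Hebrew, properly Unicode-encoded; any Unicode Hebrew font will do */
    .he  { font-family: "SomeUnicodeHebrewFont", serif; }

    /* Biblical Hebrew displayed through a legacy font using the Latin-1 override hack */
    .heb { font-family: "SomeLegacyBiblicalHebrewFont"; }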


But I was fairly sure that technical jargon in Hebrew would probably not need Biblical Hebrew, except for illustration purposes within small, delimited block quotes or spans, where there are simultaneous changes of:
- language level;
- needed character set, some characters not being encodable with Unicode;
- encoding (from Unicode to the Latin-1 override hack);
- the specific font needed to render the legacy encoding.
In that case, it is acceptable to have the general text in modern Hebrew, properly encoded in Unicode, even if the small illustrative quotes remain entirely in a non-standard mapping and won't appear correctly without the necessary font.
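
In markup, that mix could look roughly like this (the lang value and the text placeholders are just illustrative):

    <p class="he" lang="he">...running technical text in modern Hebrew, Unicode-encoded...</p>
    <blockquote class="heb">...short illustrative quote in Biblical Hebrew, kept in the legacy font-specific mapping...</blockquote>

The surrounding document stays properly Unicode-encoded; only the small quoted block depends on the non-standard font.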


Note that PDF files DO mix encodings within the embedded fonts that PDF writers dynamically create to cover only the necessary glyphs. These encodings are specific to the document, one per embedded font... This is why PDF files can encode text whose characters still have no Unicode mappings. You can see this when you attempt to copy/paste text fragments from sections of a PDF that use embedded fonts: the pasted text will not reproduce the same characters as what you see in the PDF reader. Copy/pasting does work, however, for PDF files that use external fonts with standard mappings.



