15.8.2011 0:47, Asmus Freytag wrote:
Not all documents are HTML or CSS.
The Numericana page that was cited argues for using Symbol font on web pages, and I showed a few errors in its argumentation in that respect. I also wrote: “However, it might be argued that the Symbol font has been used in text documents (normally not plain text but text that may contain different fonts) and that the characters so used are existing usage that needs to be taken into account. There are two big ifs here: if this involves symbols that do not exist as Unicode characters and if the existing usage is relevant enough, then there might be something to be considered for inclusion into Unicode. The burden of proof lies, of course, on both ifs, with those who propose new characters.”
Maybe I should have stopped there…
The question here is whether it's useful to add additional code points to allow plain-text coverage of certain widely spread fonts (of which "the" symbol font is one) so that it's possible to use, for example, automated processes to re-encode font runs in older documents to make them more fully portable.
“Rich text” that uses the Symbol font can be converted to plain Unicode text, naturally losing any specific formatting such as particular shapes of glyphs, to the extent that the glyphs of Symbol can be identified as representing certain Unicode characters.
If a font vendor has decided to include glyphs that cannot be so resolved, I would say that it is up to the vendor to step up and suggest what the glyphs stand for as text characters and, if they do not exist in Unicode, make a proposal on encoding them.
I wrote previously that Symbol 0xD6 is a particular glyph for SQUARE ROOT U+221A and that Symbol 0x60 has behavior that does not match Unicode coding principles (it combines with the _next_ character). It seems that the latter applies to some implementations only, whereas in others it is a spacing character, which makes its encoding as a Unicode character even more questionable.
The Symbol font also contains both sans-serif and serif variants of some characters like the registered sign “®”. If there is evidence that there are texts that use both variants, then one might ask whether an addition should be made to let this distinction be made in plain text. I would say no (in this rather hypothetical issue), since the presence of a text character in two font shapes in a single font does not imply that the shapes need to be encoded as separate Unicode characters. A font may well contain alternative glyphs for a character, and it should be up to the font implementation and use to control the use of alternative glyphs if needed.
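
To make the conversion idea concrete, here is a minimal sketch in Python of re-encoding a Symbol font run as plain Unicode text. The function name and the table are hypothetical and deliberately partial. Only the 0xD6 mapping comes from the discussion above; the alpha entry and the code positions 0xD2 and 0xE2 for the serif and sans-serif registered sign are my assumptions about the font's layout. Codes that cannot be identified, such as 0x60, are reported rather than guessed.

```python
# Hypothetical, partial table: Symbol font code -> Unicode character.
# Only 0xD6 is taken from the discussion; the other positions are assumptions.
SYMBOL_TO_UNICODE = {
    0x61: "\u03B1",  # GREEK SMALL LETTER ALPHA (assumed position)
    0xD6: "\u221A",  # SQUARE ROOT
    0xD2: "\u00AE",  # REGISTERED SIGN, serif glyph (assumed position)
    0xE2: "\u00AE",  # REGISTERED SIGN, sans-serif glyph (assumed position)
}

REPLACEMENT = "\uFFFD"  # REPLACEMENT CHARACTER marks unresolved glyphs


def symbol_run_to_unicode(run: bytes):
    """Convert one Symbol font run to plain text.

    Returns the converted text and the list of codes that had no
    known Unicode counterpart in the table.
    """
    text = []
    unresolved = []
    for code in run:
        char = SYMBOL_TO_UNICODE.get(code)
        if char is None:
            # No agreed character identity: flag it instead of inventing one.
            unresolved.append(code)
            text.append(REPLACEMENT)
        else:
            text.append(char)
    return "".join(text), unresolved


if __name__ == "__main__":
    converted, missing = symbol_run_to_unicode(bytes([0xD6, 0x61, 0x60]))
    print(converted, [hex(c) for c in missing])
```

Both glyph variants of the registered sign fold to the same character U+00AE here, so the shape distinction is lost in plain text, which is the intended outcome under the argument above.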
-- Yucca, http://www.cs.tut.fi/~jkorpela/

