On 12/20/2012 2:36 PM, Jukka K. Korpela wrote:
2012-12-20 14:13, David Starner wrote:

It may be useful to try to agree on official or semi-official names for
characters in a language. Such a list hardly needs to cover all of the over
100,000 Unicode characters.

Why not? Why should an English speaker sticking an arbitrary character
into a character map program get a name for it but a non-English
speaker not?

For most characters, a “translated” name would be arbitrary. I would compare this to names of biological species. Most species lack names in most languages, and when names exist, they are often vaguely and inconsistently used.

But when real people, not biologists, want to look up information they have precisely two choices: they can look at a visual index (for species that can be arranged visually) or they can look up the scientific name for the species based on the only thing they know: the local popular name.


That’s why people use scientific (Linnaean) names. We use common names for common animals, but it just would not make sense to assign a name to the millions of insect species in each human language. The scientific name is a crucial key to information. With Unicode characters, both the number and the name act as such keys, though the name is usually descriptive of meaning, too.
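As a minimal sketch of the "two keys" point, the snippet below uses Python's standard `unicodedata` module to go from a character to its code point and formal name, and back again from the name. The helper name `describe` is only an illustration and is not part of any library.

```python
import unicodedata

def describe(ch):
    """Return the two lookup keys for a character: code point and formal name."""
    code_point = f"U+{ord(ch):04X}"   # the number, in the usual U+XXXX notation
    name = unicodedata.name(ch)       # the formal (English) character name
    return code_point, name

code_point, name = describe("\u00e9")
# The formal name works as a key in the other direction as well.
same_char = unicodedata.lookup(name)
```

Either key identifies the character unambiguously; only the name also carries a description a reader can interpret.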

Unlike species, all characters for living scripts have popular local names in at least one language other than English.

It may not be desirable to blindly translate ALL such names into ALL languages, but major languages (not only English) may be used by people that are familiar with or study many other languages and scripts. For those languages, their community of scholars represents another set of users who benefit from translated names.

Finally, for arcane scripts, there's usually an easily translatable part of the character name (think of LATIN SMALL LETTER) and an arbitrary part of the name (e.g. A) which comes from a transliteration scheme, a catalog number or the like.
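The split between a translatable descriptive part and an arbitrary part can be sketched roughly as follows. The `PREFIXES` table and the function `translate_name` are hypothetical, and the French renderings are illustrative assumptions rather than an authoritative translation; only the descriptive prefix is replaced, and the remainder of the name is passed through unchanged.

```python
import unicodedata

# Hypothetical table: descriptive prefix -> illustrative French rendering.
PREFIXES = {
    "LATIN SMALL LETTER": "LETTRE MINUSCULE LATINE",
    "LATIN CAPITAL LETTER": "LETTRE MAJUSCULE LATINE",
}

def translate_name(ch):
    """Translate the descriptive prefix of a name, keeping the rest as-is."""
    name = unicodedata.name(ch)
    for prefix, translated in PREFIXES.items():
        if name.startswith(prefix + " "):
            rest = name[len(prefix) + 1:]   # the arbitrary part, e.g. "A"
            return f"{translated} {rest}"
    return name  # no known prefix: fall back to the formal name
```

With this table, `translate_name("a")` gives `LETTRE MINUSCULE LATINE A`, while a character whose prefix is not listed keeps its formal name.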

If a language doesn't have a unique transliteration scheme for a particular script, the choices are either to use the one present in the Unicode Standard, or to use one from another, culturally more relevant language (e.g. a French-based instead of an English-based transliteration).



So Unicode names should not be translated at all, any more than you
translate General Category values for example.

Why wouldn't you?

Because those values are identifiers.

No, names have multiple uses, especially if you take the formal name as one in a series of "aliases" for each character. That's why it's often more useful to think of translations of the full code charts and character index, instead of "just" the formal names. (The latter, by themselves, are not so useful.)


There's an argument that they're generally useful
for programmers only and programming often requires English knowledge,
but if I were explaining the character categories in Esperanto, I
would certainly say that Sm is matematikaj simboloj or Simbolo
Matematika, not act like "Symbol, Math" should have any importance to
my audience.

We can and often should *explain* meanings of identifiers in different languages, but that’s different from naming things. The value “Sm” has a technical meaning, and it is not identical with the common-language expression “mathematical symbol” or its variants, though rather close.
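The distinction between an identifier and its explanation can be illustrated with a small sketch. The value returned by `unicodedata.category` is the identifier and stays the same for every audience; the `EXPLANATIONS` table and the function `explain_category` are hypothetical, with the Esperanto and English wording taken from this thread.

```python
import unicodedata

# Hypothetical per-language explanations, keyed by the untranslated identifier.
EXPLANATIONS = {
    "en": {"Sm": "Symbol, Math"},
    "eo": {"Sm": "matematikaj simboloj"},
}

def explain_category(ch, lang):
    """Return the General Category identifier and a localized explanation, if any."""
    value = unicodedata.category(ch)                  # identifier, e.g. "Sm"
    explanation = EXPLANATIONS.get(lang, {}).get(value)
    return value, explanation
```

Here `explain_category("+", "eo")` returns the pair `("Sm", "matematikaj simboloj")`: the identifier is unchanged, and only the explanation varies by language.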


The linguistic content of the short labels is indeed limited. However, I can see good reasons to provide alternate abbreviations for characters, e.g. for ZWSP or WJ, because these terms are used in places where they do not act as identifiers.

A./
