On 12/20/2012 2:36 PM, Jukka K. Korpela wrote:
2012-12-20 14:13, David Starner wrote:
It may be useful to try to agree on official or semi-official names for
characters in a language. Such a list hardly needs to cover all of
the over 100,000 Unicode characters.
Why not? Why should an English speaker sticking an arbitrary character
into a character map program get a name for it but a non-English
speaker not?
For most characters, a “translated” name would be arbitrary. I would
compare this to names of biological species. Most species lack names
in most languages, and when names exist, they are often vaguely and
inconsistently used.
But when real people, not biologists, want to look up information they
have precisely two choices: they can look at a visual index (for species
that can be arranged visually) or they can look up the scientific name
for the species based on the only thing they know: the local popular name.
That’s why people use scientific (Linnaean) names. We use common names
for common animals, but it just would not make sense to assign a name
to the millions of insect species in each human language. The
scientific name is a crucial key to information. With Unicode
characters, both the number and the name act as such keys, though the
name is usually descriptive of meaning, too.
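To illustrate the point that both the number and the name act as keys, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for this example; only `unicodedata.name` and `unicodedata.lookup` come from the library.

```python
import unicodedata

def describe(ch):
    """Return the two keys for a character: its code point label and formal name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

number, name = describe("a")

# Either key leads back to the same character.
from_number = chr(int(number[2:], 16))
from_name = unicodedata.lookup(name)
```

Both `from_number` and `from_name` recover the original character, which is the sense in which the number and the formal name are interchangeable as lookup keys.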
Unlike species, all characters for living scripts have popular local
names in at least one language other than English.
It may not be desirable to blindly translate ALL such names into ALL
languages, but major languages (not only English) may be used by people
who are familiar with or study many other languages and scripts. For
those languages, their community of scholars represents another set of
users who benefit from translated names.
Finally, for arcane scripts, there's usually an easily translatable part
of the character name (think of LATIN SMALL LETTER) and an arbitrary
part of the name (e.g. A) which comes from a transliteration scheme, a
catalog number or the like.
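The split between a translatable part and an arbitrary part could be sketched roughly as follows. This is only an illustration: the table `PREFIX_FR` and the function `localized_name` are hypothetical, the French strings are sample values rather than an authoritative list, and splitting on the last word is a simplification that only works for names of the form "descriptive prefix + one token".

```python
import unicodedata

# Hypothetical table (illustrative only): translatable prefix -> localized prefix.
PREFIX_FR = {
    "LATIN SMALL LETTER": "LETTRE MINUSCULE LATINE",
    "LATIN CAPITAL LETTER": "LETTRE MAJUSCULE LATINE",
}

def localized_name(ch, table):
    """Translate the descriptive prefix; keep the arbitrary final token as-is."""
    name = unicodedata.name(ch)
    prefix, _, tail = name.rpartition(" ")
    if prefix in table:
        return f"{table[prefix]} {tail}"
    # No translation known: fall back to the formal name unchanged.
    return name
```

Under these assumptions, `localized_name("a", PREFIX_FR)` yields a French-style name ending in the untouched token `A`, while a character whose prefix is not in the table simply keeps its formal name.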
If a language doesn't have a unique transliteration scheme for a
particular script, the choices are either to use the one present in
the Unicode Standard, or to use one from another, culturally more
relevant language (e.g. a French-based instead of an English-based
transliteration).
So Unicode names should not be translated at all, any more than you
translate General Category values for example.
Why wouldn't you?
Because those values are identifiers.
No, names have multiple uses, especially if you take the formal name as
one in a series of "aliases" for each character. That's why it's often
more useful to think of translations of the full code charts and
character index, instead of "just" the formal names. (The latter, by
themselves, are not so useful.)
There's an argument that they're generally useful
for programmers only and programming often requires English knowledge,
but if I were explaining the character categories in Esperanto, I
would certainly say that Sm is matematikaj simboloj or Simbolo
Matematika, not act like "Symbol, Math" should have any importance to
my audience.
We can and often should *explain* meanings of identifiers in different
languages, but that’s different from naming things. The value “Sm” has
a technical meaning, and it is not identical with the common-language
expression “mathematical symbol” or its variants, though rather close.
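The distinction between an identifier and its explanation can be sketched like this, again with Python's `unicodedata`. The table `LABELS_EO` and the function `explain_category` are hypothetical names for this example; the Esperanto label is the one suggested earlier in this thread.

```python
import unicodedata

# Hypothetical table: the identifier stays fixed, only the explanation is localized.
LABELS_EO = {"Sm": "matematikaj simboloj"}

def explain_category(ch, labels):
    """Return (General Category identifier, localized explanation or None)."""
    code = unicodedata.category(ch)
    return code, labels.get(code)
```

Here `explain_category("+", LABELS_EO)` returns the unchanged value `Sm` alongside the Esperanto explanation, so software keeps matching on the identifier while readers see a label in their own language.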
The linguistic content of the short labels is indeed limited; however, I
can see good reasons to provide alternate abbreviations for characters,
e.g. for ZWSP or WJ, because these terms are used in places where they
do not act as identifiers.
A./