Luke Dashjr <luke at dashjr dot org> wrote:

> That is, 100 decimal is "one hundred" with a binary value of 110 0100.
> But the same "100" in tonal would be "san" with a binary value of
> 1 0000 0000.

"100" with the meaning of "one hundred" is spoken as "ciento" in Spanish, "ekatón" in Greek, "sto" in Russian, etc. So pronunciation by itself doesn't necessarily justify separate encoding.

Within English-speaking contexts, "100" can also be a binary number, or an octal number with a binary value of 100 0000. In my world as a developer, it's often a hex number, as in tonal. In most of these cases it's typically pronounced "one zero zero" or "one oh oh." So the numeric value of a string of digits within a positional system also doesn't necessarily justify separate encoding.
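To make the point concrete, here is a minimal Python sketch that reads the same digit string "100" under several radix assumptions. The names DIGITS, interpret and values are only illustrative; nothing here is specific to tonal beyond its being base 16.

```python
# The same digit string, read under different radix assumptions.
DIGITS = "100"

def interpret(digits: str, base: int) -> int:
    """Return the numeric value of `digits` read as a base-`base` numeral."""
    return int(digits, base)

# base 2 = binary, 8 = octal, 10 = decimal, 16 = hex (and tonal)
values = {base: interpret(DIGITS, base) for base in (2, 8, 10, 16)}

for base, value in values.items():
    # Show the decimal value and its binary representation side by side.
    print(f"base {base:>2}: value {value:>3} = binary {value:b}")
```

The characters in the string are identical in every case; only the interpretation supplied by the caller differs.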

TTS systems always have to rely on contextual hints to decide how a string of digits should be read aloud. Anyone who has worked on them will agree.

> And in the other example, one is "B with double lines" vs "bitcoins".

As David pointed out, currency symbols really aren’t analogous to anything else. They are never built from combining characters, and are never decomposable to them. This has nothing really to do with TTS or pronunciation. One person in the Ubuntu thread mentioned pronunciation, but that is not the primary reason.
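As a rough spot check of the decomposition point, the following Python sketch applies canonical decomposition (NFD) to a handful of common currency signs and, for contrast, to a precomposed accented letter. The sample list and the helper name canonical_decomposition are my own choices for illustration; this covers canonical decomposition only and is a sanity check on a few characters, not a proof about every currency symbol.

```python
import unicodedata

# A small, hand-picked sample of currency signs (an assumption for illustration).
CURRENCY_SIGNS = ["$", "¢", "£", "¥", "€", "₩", "₹"]

def canonical_decomposition(ch: str) -> str:
    """Return the NFD (canonical decomposition) form of `ch`."""
    return unicodedata.normalize("NFD", ch)

# True for a sign if NFD leaves it untouched, i.e. it does not decompose.
unchanged = {ch: canonical_decomposition(ch) == ch for ch in CURRENCY_SIGNS}

# Contrast: U+00E9 (e with acute) decomposes to "e" plus a combining accent.
accented = canonical_decomposition("\u00e9")

for ch, same in unchanged.items():
    print(f"U+{ord(ch):04X} unchanged under NFD: {same}")
print("U+00E9 decomposes to:", [f"U+{ord(c):04X}" for c in accented])
```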

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
