On 08.05.2012 14:00, Nicolas Cellier wrote:
leadingChar was here merely for handling Han unification
http://en.wikipedia.org/wiki/Han_unification.
As I understand it, the language variant is encoded in leadingChar,
while the generic ideogram is encoded in the unicode value.
From what I can tell (based on the indexes), it initially also allowed for non-Unicode WideStrings (see Character asUnicode and its use of EncodedCharSet). That was probably a bad idea in the first place, since there is no 16-bit string type in Pharo; I guess adding one was the original thought behind it...
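
To make the leadingChar / unicode value split concrete, here is a minimal sketch in Python rather than Smalltalk. It assumes the layout I remember from Squeak-derived images (code point in the low 22 bits, leadingChar in the 8 bits above); that layout is not stated anywhere in this thread, and the names pack and unpack are made up for illustration.

```python
# Sketch only. ASSUMPTION: low 22 bits hold the Unicode code point and the
# 8 bits above hold leadingChar. pack/unpack are hypothetical helper names.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def pack(leading_char, code_point):
    """Combine a leadingChar tag and a code point into one integer value."""
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading_char must fit in 8 bits")
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code_point must fit in 22 bits")
    return (leading_char << CODE_BITS) | code_point


def unpack(value):
    """Split a packed value back into (leading_char, code_point)."""
    return value >> CODE_BITS, value & CODE_MASK


# With leadingChar 0 the packed value is just the code point.
plain_u_umlaut = pack(0, 0xFC)
# The same code point with a stray leadingChar is a different value,
# so it no longer compares equal to the plain character.
tagged_u_umlaut = pack(17, 0xFC)
```

Under that assumption, a plain ü and a ü carrying a stray leadingChar share a code point but differ as values, which is the kind of mismatch discussed further down.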

Later on it was changed to do Han unification, but:
- The code for selecting/displaying a font based on leadingChar is a mess, never tested, and no compatible fonts come installed in Pharo by default.
- There is no code to, say, automatically add leadingChars based on the current environment to character events, so even if it works, input might still display incorrectly.
- Using it in Translation support requires storing in a special format which includes the leading characters.
- The LanguageEnvironments that currently deal with this also have other roles, which is purely empirical and, in some cases, outdated (I'm looking at you, defaultSystemConverter).

We tried to clean a lot already and avoid using leadingChar, except
maybe for east-asian languages, so in case of umlaut, I would classify
this as a bug.

Nicolas
Yeah, that's clearly something that went entirely wrong when the translation table was initialized. 17 has never been a valid leadingChar.
FWIW, it was reinitialized and works as expected in 1.4.

TL;DR of the non-bug-related stuff: if it were up to me, I'd say nuke leadingChar from orbit (which, as you say, is already done in most cases), then find some other way than what is currently done in LanguageEnvironment (if possible) to query the correct encoding of strings you pass to the system.

Cheers,
Henry
