Re: Why no combining-character form for U+00F8?

Jukka K. Korpela Thu, 16 Aug 2012 23:46:14 -0700

2012-08-17 1:44, Ian Clifton wrote:

Andreas Prilop <[email protected]> writes:

On Thu, 16 Aug 2012, Ian Clifton wrote:

Having just been to Norway, and wanting to email my friends all
about it, I came across a curiosity: neither of the combining
characters U+0337, U+0338 seem to work in usually-reliable Emacs


Windows:
  http://en.wikipedia.org/wiki/Keyboard_layout#US-International

Unix:
  http://en.wikipedia.org/wiki/Compose_key#Common_compose_combinations

Maybe I should explain at this point: I’ve got used to using combining
characters as a way of composing characters myself, using direct input
of characters by hexadecimal character number (<ctrl-X> 8 [RET] hex
[RET] in Emacs, <shift><ctrl-U>hex<shift><ctrl> in many Unix tools).

It’s certainly useful to know one universal method for entering any
Unicode character in one’s favorite environment. Even if it is somewhat
clumsy, with a key combination prefix that looks odd at first sight,
it’s convenient once you’ve learned it and use it regularly.
Often such methods are more or less hidden in software. (I have used
Emacs for about 30 years, and I did not know about <ctrl-X> 8 ...)

Not
the most efficient method, but by remembering the character numbers of a
handful of combining accents, I can assemble most of the accented
characters I use. Perhaps I should start trying to learn these compose
combinations, as they’re shorter and mostly mnemonic.

It depends, especially on how often you need a character. If you
frequently need, say, Latin-1 Supplement characters, the US International
keyboard layout can be handy, but you need to check the placement of a
character until you have memorized it. There is nothing
intuitive or easy to remember in the allocation of “ø” to the key
combination AltGr L (or at least I fail to see anything). Using Unicode
numbers, you would still need to check the number until you memorize it,
and there is nothing intuitive in the allocation of “ø” to U+00F8. But
memorizing a Unicode number (for a frequently needed character) lets you
enter it in any software that has some general method for entering a
character by its number. And modern editors and word processors normally
have one.

There is an essential difference between using a combining mark and using
a precomposed character: they are distinct at the character level, and
in any processing, they are handled as distinct unless the software is
programmed to treat them as “the same”, either via normalization or
otherwise. For example, in Emacs, E <ctrl-X> 8 [RET] 301 [RET] produces
an “e” followed by a combining mark, and while Emacs displays it as “é”,
it’s still different (at the character level) from what you get by
<ctrl-X> 8 [RET] E9 [RET]. In searches, for example, they do not match.
In rendering, they would normally produce the same result, but not
necessarily; transferred to another program, they might produce
different results, if the program cannot handle combining marks or
handles them too simply.
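To make the difference concrete, here is a small sketch in Python (my choice of language for illustration, not something discussed in the thread). It uses only the standard unicodedata module; the helper name same_after_nfc is made up for this example.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (two code points)
combined = "e\u0301"
# U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
precomposed = "\u00e9"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison sees two different character sequences.
print(combined == precomposed)            # False
print(len(combined), len(precomposed))    # 2 1

# A naive substring search for the precomposed form misses the combined one.
print(precomposed in "caf" + combined)    # False

# Only after explicit normalization are they treated as "the same".
print(same_after_nfc(combined, precomposed))  # True
```

Whether a given editor or search tool normalizes like this is up to that program; the sketch only shows that it has to be done explicitly.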

My point is that on practical grounds, precomposed characters like
U+00E9 for “é” are generally safer, especially in older software, than a
representation using a combining mark. (For “ø”, there is no choice, as
discussed.)
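The “no choice” situation for “ø” can be checked in the same way. The following Python sketch (again only an illustration, using the standard unicodedata module) shows that U+00E9 has a canonical decomposition while U+00F8 has none, so normalization never turns “o” plus U+0338 into “ø”.

```python
import unicodedata

e_acute = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
o_stroke = "\u00f8"   # U+00F8 LATIN SMALL LETTER O WITH STROKE

# U+00E9 decomposes into "e" + U+0301 ...
print(unicodedata.decomposition(e_acute))          # 0065 0301
# ... but U+00F8 has no decomposition mapping at all.
print(repr(unicodedata.decomposition(o_stroke)))   # ''

# "o" followed by U+0338 COMBINING LONG SOLIDUS OVERLAY stays a
# two-character sequence under NFC; it is not composed to U+00F8.
sequence = "o\u0338"
composed = unicodedata.normalize("NFC", sequence)
print(composed == o_stroke)   # False
print(len(composed))          # 2
```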


Yucca
