2012-08-17 1:44, Ian Clifton wrote:
Andreas Prilop <[email protected]> writes:
On Thu, 16 Aug 2012, Ian Clifton wrote:
Having just been to Norway, and wanting to email my friends all
about it, I came across a curiosity: neither of the combining
characters U+0337, U+0338 seem to work in usually-reliable Emacs
Windows:
http://en.wikipedia.org/wiki/Keyboard_layout#US-International
Unix:
http://en.wikipedia.org/wiki/Compose_key#Common_compose_combinations
Maybe I should explain at this point: I’ve got used to using combining
characters as a way of composing characters myself, using direct input
of characters by hexadecimal character number (<ctrl-X> 8 [RET] hex
[RET] in Emacs, <shift><ctrl-U>hex<shift><ctrl> in many Unix tools).
It’s certainly useful to know one universal method for entering any
Unicode character in one’s favorite environment. Even if it is somewhat
clumsy, with a key combination prefix that looks odd at first sight,
it’s convenient to use once you’ve learned it and use it regularly.
Often such methods are more or less hidden in software. (I have used
Emacs for about 30 years, and I did not know about <ctrl-X> 8 ...)
Not the most efficient method, but by remembering the character numbers of a
handful of combining accents, I can assemble most of the accented
characters I use. Perhaps I should start trying to learn these compose
combinations, as they’re shorter and mostly mnemonic.
It depends, especially on the frequency of needing a character. If you
e.g. need Latin 1 supplement characters frequently, US International
keyboard layout can be handy, but you need to check the placement of a
character until you’ve memorized it. There is nothing
intuitive or easy to remember in the allocation of “ø” to the key
combination AltGr L (or at least I fail to see any). Using Unicode
numbers, you would still need to check the number until you memorize it,
and there is nothing intuitive in the allocation of “ø” to U+00F8. But
memorizing a Unicode number (for a frequently needed character) lets you
enter it in any software that has some general method for entering a
character by its number, and modern editors and word processors normally
have one.
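
To make the "character by number" idea concrete, here is a small Python sketch that does what such an input method does internally: it turns a hexadecimal number into the character at that code point. The helper name char_from_hex is made up for this illustration; chr() and unicodedata.name() are from the Python standard library.

```python
import unicodedata

def char_from_hex(hex_digits):
    """Return the character whose code point is given in hexadecimal.

    Hypothetical helper for illustration only; it mirrors what typing
    a hex number into an editor's "insert by number" command does.
    """
    return chr(int(hex_digits, 16))

# The numbers mentioned in this thread: two precomposed letters
# and two combining marks.
for hex_digits in ("F8", "E9", "301", "338"):
    ch = char_from_hex(hex_digits)
    print(f"U+{int(hex_digits, 16):04X}  {unicodedata.name(ch)}")
```

The number is the same whatever program you type it into, which is the portability argument above.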
There is an essential difference between using a combining mark and using
a precomposed character: they are distinct at the character level, and
in any processing, they are handled as distinct unless programmed to
treat them as “the same”, either via normalization or otherwise. For
example, in Emacs, e <ctrl-X> 8 [RET] 301 [RET] produces an “e” followed
by a combining mark, and while Emacs displays it as “é”, it’s still
different (at character level) from what you get by <ctrl-X> 8 [RET] E9
[RET]. In searches, for example, they do not match. In rendering, they
would normally produce the same result, but not necessarily; transferred
to another program, they might produce different results, if the program
cannot handle combining marks or handles them too simply.
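
The difference at the character level can be seen outside Emacs too. The following is only a sketch in Python, using the standard unicodedata module; the variable names and the same_text helper are mine, chosen for the illustration. It compares the two forms of “é” directly and then after normalization to NFC.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point (hex E9)
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (hex 301)

# Distinct at the character level: different content, different length.
print(precomposed == combining)            # False
print(len(precomposed), len(combining))    # 1 2

# A plain search for one form does not find the other.
print("caf\u00e9".find(combining))         # -1

def same_text(a, b):
    """Compare two strings after normalizing both to NFC.

    Illustrative helper: this is one way a program can be
    "programmed to treat them as the same".
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, combining))   # True
```

A program that skips the normalization step behaves like the first comparison, which is why searches fail to match.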
My point is that on practical grounds, precomposed characters like
U+00E9 for “é” are generally safer, especially in older software, than a
representation using a combining mark. (For “ø”, there is no choice, as
discussed.)
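
As a rough check of the “ø” case, this Python sketch asks the Unicode character database (through the standard unicodedata module) for canonical decompositions. It assumes nothing beyond the standard library; the loop over the two overlay characters is just my way of testing both U+0337 and U+0338.

```python
import unicodedata

# é has a canonical decomposition into e + combining acute accent.
print(repr(unicodedata.decomposition("\u00e9")))   # '0065 0301'

# ø has no decomposition at all, so it is not equivalent to
# "o" plus any combining mark.
print(repr(unicodedata.decomposition("\u00f8")))   # ''

# Normalizing "o" + solidus overlay to NFC does not yield ø;
# the sequence stays two code points long.
for overlay in ("\u0337", "\u0338"):
    composed = unicodedata.normalize("NFC", "o" + overlay)
    print(composed == "\u00f8", len(composed))     # False 2
```

So even software that normalizes will not treat an “o” with an overlay as “ø”.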
Yucca