Re: Modified keypresses

Tony Mechelynck Sun, 17 Apr 2011 13:51:55 -0700

On 17/04/11 16:19, Benjamin R. Haskell wrote:
[...]

Err, if you're using the 7-bit Control Sequence Introducer (\e[ = <Esc>
+ <[> = \033 \133), then CSI sequences are virtually always valid UTF-8.
The 8-bit single-character variant does avoid UTF-8, though. In a
properly formed stream of bytes, 0x9b is never the first character of a
UTF-8 sequence, since it has the appearance of a continuation byte.

[...]

U+009B is an unprintable codepoint in Unicode, set apart to mean <CSI>.
Its UTF-8 representation is 0xC2 0x9B. Couldn't that be used in UTF-8?
It is valid UTF-8, but a *control* code, not a printable one, and it
still means <CSI>.


See http://www.unicode.org/charts/PDF/U0080.pdf which says:

009B <control>
     = CONTROL SEQUENCE INTRODUCER
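
For anyone who wants to check this at a prompt, here is a minimal Python
sketch. The helper name is_valid_utf8 is made up for this example; it relies
only on Python's standard UTF-8 codec and the unicodedata module.

```python
import unicodedata

CSI = "\u009b"  # the 8-bit CSI, taken as a Unicode codepoint

# Encoding U+009B as UTF-8 gives two bytes, 0xC2 0x9B.
encoded = CSI.encode("utf-8")

# General category "Cc" means "control", i.e. not a printable character.
category = unicodedata.category(CSI)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone_9b_ok = is_valid_utf8(b"\x9b")      # bare continuation byte
encoded_ok = is_valid_utf8(b"\xc2\x9b")  # U+009B properly encoded
seven_bit_ok = is_valid_utf8(b"\x1b[")   # 7-bit CSI, <Esc> + <[>

print(encoded.hex(), category, lone_9b_ok, encoded_ok, seven_bit_ok)
```

So a lone 0x9B byte is rejected, as said in the quoted text, while both the
two-byte form and the 7-bit <Esc>[ form pass as UTF-8.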

Yes, it might confuse some Windows users whose OS misguidedly pretends
that its Windows-1252 is ISO-8859-1; but cp1252 and Latin1 are *not* the
same, whatever Bill Gates may decree. (OTOH, it is intentional that
the first 256 codepoints of Unicode are the same as in Latin1, and that
the first half of those even have the same disk representation in UTF-8
as in Latin1 and US-ASCII.)
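
The difference is easy to see with Python's bundled codecs. This is only a
sketch; the variable names are mine, and "cp1252" and "latin-1" are the names
Python uses for Windows-1252 and ISO-8859-1.

```python
# The same byte means different things in the two encodings.
as_latin1 = b"\x9b".decode("latin-1")  # U+009B, the CSI control code
as_cp1252 = b"\x9b".decode("cp1252")   # U+203A, a printable angle quote

# Latin1 maps each byte value 0..255 to the codepoint of the same number.
all_bytes = bytes(range(256))
latin1_matches_unicode = (
    all_bytes.decode("latin-1") == "".join(chr(i) for i in range(256))
)

# The lower half (0..127) is byte-for-byte identical in US-ASCII,
# Latin1 and UTF-8.
low_half = "".join(chr(i) for i in range(128))
same_on_disk = (
    low_half.encode("ascii")
    == low_half.encode("latin-1")
    == low_half.encode("utf-8")
)

print(hex(ord(as_latin1)), hex(ord(as_cp1252)),
      latin1_matches_unicode, same_on_disk)
```

In cp1252 the byte 0x9B is a printable quotation mark rather than <CSI>,
which is where the confusion comes from.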



Best regards,
Tony.
--

Ye gods! Give me strength to suffer what cannot be changed, courage to
change what must be changed, and wisdom to tell the two apart.
                -- Marcus Aurelius

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
