On 18/04/11 02:31, Ben Schmidt wrote:
- It sounds like changing the internal Vim byte-stream representation for
keypresses to actually be CSI could be a good idea. By making careful use of the
private area we could ensure Vim can represent everything it needs to, plus
almost by definition it can represent all the keys/modifiers required, plus it
is somewhat future-proof. Does anyone have any objection to this?
That sounds reasonable to me. I use rxvt-unicode, though, which is (apparently)
the only major holdout from the xterm-keycodes monoculture. So, I have a
particular interest in making sure the conversion to/from CSI works right.
(Volunteering services, here. Not complaining.)
Sounds good.
Err, if you're using the 7-bit Control Sequence Introducer (\e[ = <Esc> + <[> =
\033 \133), then CSI sequences are virtually always valid UTF-8. The 8-bit
single-character variant does avoid UTF-8, though. In a properly formed stream
of bytes, 0x9b is never the first byte of a UTF-8 sequence, since it has the
appearance of a continuation byte.
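
The difference between the two forms can be checked directly. This is a
minimal sketch in Python, used here only for illustration and not anything Vim
does internally. The cursor-up sequence and the helper names are assumptions
made for the example.

```python
# 7-bit CSI is ESC followed by '[': both bytes are plain ASCII.
seven_bit_up = b"\x1b[A"   # cursor-up as many terminals send it (assumed example)
# 8-bit CSI is the single byte 0x9b.
eight_bit_up = b"\x9bA"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


print(is_valid_utf8(seven_bit_up))   # the 7-bit form is ordinary ASCII
print(is_valid_utf8(eight_bit_up))   # a stray 0x9b cannot start a sequence
print(is_continuation_byte(0x9B))
```

So a stream that begins a key sequence with 0x9b can never be mistaken for
well-formed UTF-8 text, whereas one that begins with \e[ can.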
OK, great. It's internal and so completely under our control, so we can
use 0x9b always. Part of the input handling code would be to convert
known sequences beginning with \e[ to 0x9b. We then can ensure we never
accidentally interpret valid UTF-8 as keystrokes, as keystrokes will
always be invalid UTF-8.
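
A rough sketch of that conversion step, in Python for illustration only. The
table of known sequences and the name normalise() are hypothetical, and real
input handling would also have to deal with timeouts and partially received
sequences, which this ignores.

```python
ESC = 0x1B
CSI = 0x9B

# Hypothetical table of recognised sequences: the bytes that follow "ESC [".
KNOWN_TAILS = (b"A", b"B", b"C", b"D", b"5~", b"6~")


def normalise(raw: bytes) -> bytes:
    """Rewrite known 'ESC [' sequences to start with the single byte 0x9b."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == ESC and raw[i + 1:i + 2] == b"[":
            # Look for a recognised tail directly after the two-byte introducer.
            tail = next((t for t in KNOWN_TAILS if raw.startswith(t, i + 2)), None)
            if tail is not None:
                out.append(CSI)
                out += tail
                i += 2 + len(tail)
                continue
        # Anything else, including unknown escape sequences, passes through.
        out.append(raw[i])
        i += 1
    return bytes(out)


print(normalise(b"ab\x1b[Acd"))
```

Ordinary text, including multi-byte UTF-8, is copied unchanged, so only the
recognised key sequences end up containing the 0x9b byte.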
There is a potential gotcha here. If you put a macro with special keys
into your .vimrc, it becomes invalid UTF-8, so the next time you open
it, it may be converted from latin1 or something like that, because Vim
falls back to the next encoding in fencs. The conversion would result in
valid UTF-8, and so the keystrokes would no longer be interpreted as
keystrokes. This is currently a gotcha anyway, though, so I don't think
it is a problem. I presume even if Vim would ordinarily fall back,
:scriptencoding forces its interpretation even if invalid UTF-8 is
encountered, so there should be no problem when actually sourcing
scripts.
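
To make the gotcha concrete, here is a small Python sketch of the fallback. It
only loosely mimics trying each entry of fencs in turn. The function name, the
encoding list, and the sample macro bytes are assumptions for the example.

```python
# A "macro" containing an 8-bit CSI key sequence (hypothetical encoding).
macro = b"\x9bA"


def read_with_fallback(data: bytes, encodings=("utf-8", "latin-1")):
    """Decode with the first encoding that accepts the data."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding matched")


text, used = read_with_fallback(macro)
rewritten = text.encode("utf-8")   # what ends up in a UTF-8 buffer

print(used)        # UTF-8 was rejected, so the fallback encoding was used
print(rewritten)   # 0x9b has become the two-byte form of U+009B
```

After the round trip the bytes are valid UTF-8 (0xc2 0x9b), so they no longer
look like a keystroke introducer.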
Single-byte encodings are a different story. IIRC, currently Vim takes
special care to avoid interpreting CSI in a buffer as keystrokes, but
interprets code point 0x80 in single-byte encodings as beginning a
keystroke. This would change if we used CSI, as CSI would be interpreted
as a keystroke, and 0x80 would not. I don't think this is actually a
problem. It only makes a difference if you play back a macro you've
yanked anyway (or something like that). A minor backwards
incompatibility.
Out of interest, what is Unicode code point 0x80?
from http://www.unicode.org/charts/PDF/U0080.pdf
0080 <control>
with no alternative description.
In both Latin1 (true Latin1, not Windows-1252) and Unicode, both 0x00 to
0x1F and 0x80 to 0x9F are unprintable control characters. Most of them
have some additional description. For instance U+009B = Latin1 0x9B = CSI.
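
This is easy to confirm with Python's unicodedata module (a quick check for
illustration; the helper name is made up for the example):

```python
import unicodedata


def is_control(cp: int) -> bool:
    """True if the code point has Unicode general category Cc (control)."""
    return unicodedata.category(chr(cp)) == "Cc"


c0_all_control = all(is_control(cp) for cp in range(0x00, 0x20))
c1_all_control = all(is_control(cp) for cp in range(0x80, 0xA0))

# Latin1 maps each byte to the code point with the same number.
csi = bytes([0x9B]).decode("latin-1")

# Control characters have no formal character name in the database.
name_0080 = unicodedata.name(chr(0x80), "<no name>")

print(c0_all_control, c1_all_control, hex(ord(csi)), name_0080)
```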
Something else to think about, though: What about users who set termcap
options which include CSI in their vimrc? Will they continue to work?
Will it continue to be easy to set termcap options by using CTRL-V +
press-the-key-that-is-giving-you-problems? It is almost certainly going
to be impossible to do this 100% backwards compatibly, but it's an issue
we should think about. My initial thoughts are that it would probably
work just as well (or badly) as it always has, but that the added input
filtering may make it a bit less transparent, and perhaps we should have
a mechanism for doing it more reliably.
I'll get to the rest later. As far as this part goes, though, I don't
think there are any compelling reasons not to use CSI for the internal
byte-stream representation. There are just a couple of minor
incompatibilities and rough edges to handle with care. Any other issues
anyone knows about that should be considered on this front?
Ben.
Best regards,
Tony.
--
If God is dead, who will save the Queen?