Re: Modified keypresses

Paul LeoNerd Evans Mon, 18 Apr 2011 06:51:41 -0700

On Sun, Apr 17, 2011 at 11:23:34PM +1000, Ben Schmidt wrote:
> - It sounds like changing the internal Vim byte-stream representation
>   for keypresses to actually be CSI could be a good idea. By making
>   careful use of the private area we could ensure Vim can represent
>   everything it needs to, plus almost by definition it can represent all
>   the keys/modifiers required, plus it is somewhat future-proof. Does
>   anyone have any objection to this?
No objection here. It's reasonably compact and should be able to
represent all that's needed.


> - I have one possible objection: I'm interested in how close to
>   losslessly we can convert between registers (which can hold
>   keypresses) and buffers (which cannot). In many encodings, it cannot
>   be done losslessly, but in UTF-8, which is very common these days,
>   perhaps it can. I suspect perhaps CSI was designed with this in mind.
>   Does anyone know if CSI sequences are ever valid UTF-8 or are designed
>   not to be? If they are not ever valid UTF-8 we have the nice situation
>   that there isn't an ambiguity between keypresses and characters. If
>   not, perhaps designing something with that property would be wise,
>   rather than using CSI.
UTF-8 is specifically designed to play nicely with ECMA-35/48 in this
regard.

Encoding CSI as Escape '[' means that it doesn't include any high-bit
bytes, only US-ASCII.

Encoding CSI as a single 0x9b byte means that it does use a byte in the
C1 range. UTF-8 carefully avoids using a C1 byte in the -initial-
position of a multibyte sequence. Any multibyte UTF-8 sequence uses only
a G1 byte as the initial byte, though it may include C1 bytes in its
continuation bytes. A CSI byte could appear as the non-initial byte
within a longer UTF-8 sequence, so some care needs to be applied on
parsing (i.e. you cannot simply strstr() or strchr() search). However,
no ambiguity arises from simple prefix-based parsing, presuming valid
UTF-8 and CSI sequence input.
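
To illustrate the prefix-based parsing described above, here is a minimal
sketch in Python. It is not Vim's or libtermkey's code; the names tokenize()
and utf8_length() are made up for this example, and it assumes the input is a
mix of valid UTF-8 and well-formed 8-bit CSI sequences (parameter bytes
0x30-0x3F, intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E, as in
ECMA-48). The sample sequence "CSI 5 A" is arbitrary.

```python
CSI = 0x9B  # single-byte (C1) form of CSI


def utf8_length(lead):
    """Return the UTF-8 sequence length implied by a lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # C1 bytes and other continuation bytes never start a character


def tokenize(data):
    """Split bytes into ('csi', bytes) and ('char', str) tokens, left to right."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == CSI:
            j = i + 1
            while j < len(data) and 0x30 <= data[j] <= 0x3F:  # parameter bytes
                j += 1
            while j < len(data) and 0x20 <= data[j] <= 0x2F:  # intermediate bytes
                j += 1
            if j >= len(data) or not 0x40 <= data[j] <= 0x7E:  # final byte
                raise ValueError("malformed CSI sequence at offset %d" % i)
            tokens.append(("csi", data[i:j + 1]))
            i = j + 1
        else:
            n = utf8_length(b)
            if n == 0:
                raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
            tokens.append(("char", data[i:i + n].decode("utf-8")))
            i += n
    return tokens


# U+00DB encodes as 0xC3 0x9B, so its second byte looks like CSI.
sample = "\u00db".encode("utf-8") + b"\x9b5A" + b"x"
print(tokenize(sample))
print(sample.find(b"\x9b"))  # a naive byte search lands inside U+00DB
```

The naive search reports offset 1, which is the continuation byte of U+00DB,
whereas the real CSI starts at offset 2. Walking from the start and always
consuming a whole character or a whole sequence never hits that case.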

> - Whatever byte-stream representation we land on, though, we would need
>   a more careful survey of what internal Vim things are needed, and a
>   sensible design for using the private area. That shouldn't be too hard
>   and probably doesn't need to involve the community too much.

What things did you have in mind that would even need the private use
area? Would these be non-keypress events? The CSI model is able to
represent any of the keypresses already, so I'm not sure what's left to
consider.

> - More importantly, though, we need some unambiguous specification of
>   when to use the rich key representation and when not to use it. I
>   think this needs a fair amount of further discussion. E.g. when Vim
>   receives ^M from a terminal, should it leave it as ^M or should it
>   convert it to a CSI sequence? What about in the GUI? What about when
>   keys are stuffed into the input buffer? What about when storing the
>   LHS and RHS of a mapping?

This has been the subject of some debate lately with regard to the
terminal input side of the puzzle.

The way libtermkey handles this is that it has a set of "translations",
better names for some sequences. It renames 0x08 as "Backspace", 0x09 as
"Tab", 0x0d as "Enter" and 0x1b as "Escape". Any of the other C0s take
their Ctrl-[letter] interpretation. This can be disabled by a flag.

This matters because terminals still need to match users' expectations
about which keys trigger the various termios actions. termios can only
store a single byte value for each of these events, so, for example,
EOF has to be 0x04, the single-byte representation of "Ctrl-D".

So, I would say, there are 4 keys which tend to go by different names:

 Ctrl-H  =>  Backspace
 Ctrl-I  =>  Tab
 Ctrl-M  =>  Enter
 Ctrl-[  =>  Escape

The others are relatively uncontentious and can be left as-is.
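
As a rough sketch of that naming rule (in Python, and not libtermkey's actual
API; name_c0() and its translate flag are hypothetical names for this
example), the four translations are applied first and every other C0 byte
falls back to its Ctrl-[letter] name:

```python
# The four C0 bytes that usually go by a friendlier name.
TRANSLATIONS = {
    0x08: "Backspace",
    0x09: "Tab",
    0x0D: "Enter",
    0x1B: "Escape",
}


def name_c0(byte, translate=True):
    """Name a C0 control byte (0x00-0x1F).

    With translate=False the four special names are disabled and every byte
    gets its plain Ctrl-[letter] interpretation.
    """
    if not 0x00 <= byte <= 0x1F:
        raise ValueError("not a C0 byte: 0x%02x" % byte)
    if translate and byte in TRANSLATIONS:
        return TRANSLATIONS[byte]
    # A C0 byte is the corresponding 0x40-0x5F character with bit 6 cleared.
    return "Ctrl-" + chr(byte + 0x40)


print(name_c0(0x0D))                   # Enter
print(name_c0(0x0D, translate=False))  # Ctrl-M
print(name_c0(0x04))                   # Ctrl-D
```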

> - Finally, we need to finish discussing how mappings will be triggered.
> 
> - My initial thoughts on this: Vim should use generic non-CSI codes when
>   ambiguity exists: That includes when it receives codes from a terminal
>   which are not CSI, so e.g., when receiving ^M which could be Enter or
>   Ctrl-M, it should just store it as ^M. However, when it receives a key
>   in the GUI or via CSI and it knows specifically it was produced by
>   Enter or Ctrl-M it should store it specifically using CSI. For
>   backward compatibility reasons, existing <>-notation should be
>   considered ambiguous, but we should come up with an extension of this
>   notation to be considered specific. Perhaps just adding an extra
>   character after the opening '<'--for instance <!Enter> could mean
>   Enter, specifically. When interpreting keypresses, specific keypresses
>   would trigger specific mappings, or ambiguous mappings if no
>   corresponding specific one exists; ambiguous keypresses would never
>   trigger a specific mapping. This means you could map <Enter> and it
>   would work in dumb terminals and the GUI, and existing plugins would
>   keep working in both, too, including when you actually press Ctrl-M.
>   However, in the GUI/smart terminal, you could override that
>   default/ambiguous map, by mapping <!C-M> which would take precedence,
>   but only be active when we are sure Ctrl-M was pressed and not Enter.
>   Do others think this approach would work?

Ooooh. I think I could get to like that approach. It would allow people
to map specific things without breaking general ones.

Though does that still handle such cases as the physical keys Ctrl-C vs
Ctrl-Shift-C? Can I ask to map <!Ctrl-C> to mean only the lowercase,
without the (implied) uppercase including shift?
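
For reference, the precedence rule Ben proposes can be sketched like this in
Python. This is only my reading of the proposal, not anything Vim implements;
the GENERIC table, the lookup() function and the mapping names are all
hypothetical. It does not try to answer the Ctrl-C vs Ctrl-Shift-C question.

```python
# Which generic (ambiguous) code each specific key collapses to.
GENERIC = {
    "Enter": "^M", "C-M": "^M",
    "Tab": "^I", "C-I": "^I",
}


def lookup(specific_maps, ambiguous_maps, key, is_specific):
    """Find the mapping for a keypress, or None if there is none.

    A specific keypress prefers a specific mapping and otherwise falls back
    to the ambiguous mapping for its generic code. An ambiguous keypress
    only ever consults the ambiguous mappings.
    """
    if is_specific:
        if key in specific_maps:
            return specific_maps[key]
        key = GENERIC.get(key, key)
    return ambiguous_maps.get(key)


ambiguous_maps = {"^M": "rhs-for-<Enter>"}  # :map <Enter> ...
specific_maps = {"C-M": "rhs-for-<!C-M>"}   # :map <!C-M> ...

print(lookup(specific_maps, ambiguous_maps, "C-M", True))    # specific map wins
print(lookup(specific_maps, ambiguous_maps, "Enter", True))  # falls back to <Enter>
print(lookup(specific_maps, ambiguous_maps, "^M", False))    # dumb terminal: <Enter>
```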

-- 
Paul "LeoNerd" Evans

[email protected]
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/
