On 03/04/10 06:36, Tony Mechelynck wrote:
Hi Bram,

1. (Minor bug): On this system (gvim 7.2.411, Huge version with
GTK2-GNOME GUI), typing Ctrl-K in Insert mode followed by two spaces
doesn't give the expected result: instead of U+00A0 ("Alt-space", the
non-breaking space) I get U+E000, a CJK character. Ctrl-K NS works
correctly.

2. U+E000 is displayed in gvim as CJK halfwidth. Shouldn't it be fullwidth?

3. "\<Char-nnnn>" gives wrong results for some Unicode codepoints. I
tried to find examples and counterexamples, as follows (in the comment
after the :echo statements, the UTF-8 expansion in hex):

:echo "«\<Char-0x40>»" | " 40
«@»
:echo "«\<Char-0x80>»" | " C2 80
«<80><fe>X»
:echo "«\<Char-0x100>»" | " C4 80
«Ā<fe>X»
:echo "«\<Char-0x101>»" | " C4 81
«ā»
:echo "«\<Char-0x180>»" | " C6 80
«ƀ<fe>X»
:echo "«\<Char-0x190>»" | " C6 90
«Ɛ»
:echo "«\<Char-0x1A0>»" | " C6 A0
«Ơ»
:echo "«\<Char-0x1C0>»" | " C7 80
«ǀ<fe>X»
:echo "«\<Char-0x4E00>»" | " E4 B8 80
«一<fe>X»
:echo "«\<Char-0x4E01>»" | " E4 B8 81
«丁»
:echo "«\<Char-0x4E20>»" | " E4 B8 A0
«丠»
:echo "«\<Char-0x4E40>»" | " E4 B9 80
«乀<fe>X»
:echo "«\<Char-0xE000>»" | " EE 80 80
«<ee><80><fe>X<80><fe>X»
:echo "«\<Char-57344>»" | " EE 80 80
«<ee><80><fe>X<80><fe>X»
:echo "«\<Char-0xE001>»" | " EE 80 81
«<ee><80><fe>X<81>»
:echo "«\<Char-0xE040>»" | " EE 81 80
«<fe>X»

This seems to indicate that the extra bytes 0xFE 0x58 appear after any
0x80 in the UTF-8 expansion of the character. (I added the « »
characters to "bound" the display so that any extra whitespace would be
visible, but they have no effect on the bug.)
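
The pattern can be checked mechanically against the table above. What follows is only a minimal Python sketch of the hypothesis, not Vim's code: with_spurious_bytes and hexdump are hypothetical helpers. The first one UTF-8 encodes a codepoint and appends 0xFE 0x58 after every 0x80 byte, so the predicted bytes can be compared with what :echo displayed.

```python
def with_spurious_bytes(codepoint: int) -> bytes:
    """Model of the observed behaviour (a hypothesis, not Vim source):
    UTF-8 encode the codepoint, then add 0xFE 0x58 after each 0x80 byte."""
    out = bytearray()
    for b in chr(codepoint).encode("utf-8"):
        out.append(b)
        if b == 0x80:
            out += b"\xfe\x58"
    return bytes(out)


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'EE 80 80'."""
    return " ".join(f"{b:02X}" for b in data)


# A few of the codepoints tried in the table above.
for cp in (0x40, 0x80, 0x100, 0x101, 0x4E00, 0x4E01, 0xE000, 0xE001):
    clean = chr(cp).encode("utf-8")
    predicted = with_spurious_bytes(cp)
    status = "affected" if predicted != clean else "ok"
    print(f"U+{cp:04X}: {hexdump(clean)} -> {hexdump(predicted)} ({status})")
```

Under this model, exactly the codepoints whose UTF-8 form contains a 0x80 byte come out as "affected", which agrees with the <fe>X pairs in the listing.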

The bug does not occur after Ctrl-V u in Insert mode or when using
<Char-...> in an Insert-mode mapping. It does occur when "\<Char-...>"
is used in commands other than :echo. Note the following:

:let j = "\<Char-0xE000>"
:let j
j <ee><80><fe>X<80><fe>X
i<Ctrl-R>=j<Enter>
î<t_þ>X<t_þ>X

(where <Ctrl-R> and <Enter> are one keystroke each, not counting
modifiers). Apparently gvim tries to interpret 0x80 0xFE as a "special
key", and "resolves" it (incorrectly) as <t_þ>.

Two very big files were loaded when I first noticed bug #3, but
restarting gvim without them reproduced the bug again with the same
spurious bytes.


Best regards,
Tony.

Update: There is a second case which triggers incorrect behaviour in "\<Char-nnnn>" when 'encoding' is UTF-8:

- As noted above, after every 0x80 byte in the UTF-8 representation, the bytes 0xFE 0x58 are spuriously added. If the 0x80 is the last byte of the sequence, they follow the correct multibyte glyph as two invalid bytes; if a 0x80 occurs earlier in the sequence, they land in the middle of it and make the whole multibyte sequence invalid. (The 0x80 can never be the first byte, because the first byte of a multibyte UTF-8 sequence is >= 0xC0 [0xC2 actually, except for "overlong" sequences representing ASCII bytes], and it cannot be an "only byte" because single-byte sequences are <= 0x7F.)

- In addition, after every 0x9B byte, the bytes 0xFD 0x4F are added, also immediately after that byte, breaking the UTF-8 sequence if it isn't the last byte.

- The above are repeatable "every time", even from one run of gvim to the next, and I always get 0x80 0xFE 0x58 instead of 0x80, and 0x9B 0xFD 0x4F instead of 0x9B, in all the UTF-8 sequences generated by the "\<Char-nnnn>" construct.

- Removing the spurious bytes (including those in the middle of a byte sequence) makes the correct multibyte glyph appear immediately (I'm assuming, of course, that 'encoding' is still set to UTF-8).

- The fact that those two byte values, 0x80 aka Alt-Null and 0x9B aka Alt-Escape aka CSI, play special roles in gvim's representation of special keys might help to spot where the bug comes from. (In case I didn't say it: I tested all this in GUI mode, in my usual "Huge" gvim with GTK2/Gnome GUI, and, of course, with +multi_byte among others. Currently at patchlevel 7.2.411.)
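
The two substitutions listed above, and the observation that deleting the extra bytes restores the glyph, can be modelled in a few lines. This is a hedged Python sketch built only from the byte values reported in this message; SPURIOUS, corrupt and repair are hypothetical names, and nothing here is taken from Vim's source.

```python
# Extra byte pair observed after each trigger byte (values from this report).
SPURIOUS = {0x80: b"\xfe\x58", 0x9B: b"\xfd\x4f"}


def corrupt(text: str) -> bytes:
    """Model what "\\<Char-nnnn>" appears to produce: the UTF-8 bytes,
    with the reported pair inserted right after every 0x80 or 0x9B."""
    out = bytearray()
    for b in text.encode("utf-8"):
        out.append(b)
        out += SPURIOUS.get(b, b"")
    return bytes(out)


def repair(data: bytes) -> bytes:
    """Drop the extra pair after each 0x80 / 0x9B. This is unambiguous
    because 0xFE and 0xFD never occur in valid UTF-8."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        out.append(b)
        extra = SPURIOUS.get(b)
        if extra is not None and data[i + 1:i + 3] == extra:
            i += 2  # skip the two spurious bytes
        i += 1
    return bytes(out)


# U+46C0 encodes as E4 9B 80, so it exercises both cases at once.
damaged = corrupt("\u46c0")
print(damaged.hex())          # bytes with both pairs inserted
print(repair(damaged).hex())  # original UTF-8 sequence again
```

U+46C0 is a convenient test value because its UTF-8 form has a 0x9B in the middle and a 0x80 at the end; U+06DB (DB 9B) covers the case where 0x9B is the last byte.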


I'm crossposting this update to vim_dev because my first post (in vim_multibyte) got no reply whatsoever; but it was only four days ago, and the Easter holiday is upon us; maybe I wasn't patient enough.


Have a nice holiday, and Happy Vimming!
Tony.
--
Immortality -- a fate worse than death.
                -- Edgar A. Shoaff

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

To unsubscribe, reply using "remove me" as the subject.
