I've noticed that in ksh's vi mode, ranged operations are performed
without regard to the cursor's position within a UTF-8 byte sequence.
E.g.:

 1. type "echo тест | hexdump -C"
 2. leave insert mode
 3. "0", "2E", "dh", Enter
 4. you end up with "те" followed by a stray 0x82 (the last byte of the
    letter under the cursor).

This happens because Endword() moves the cursor to the whitespace after
the word and then decrements the cursor position by 1, so that it points
to the last byte of the last letter.  Then del_range() removes the bytes
between the cursor position and the preceding UTF-8 start byte, which
include the start byte of the letter under the cursor.
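
To make the effect concrete, here is a minimal standalone sketch of a
byte-oriented range delete.  It is not the ksh code: raw_del_range() is a
hypothetical stand-in for del_range(), and the offsets used below are
picked by hand for illustration rather than traced from vi.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for del_range(): remove bytes [a, b) from buf
 * without looking at UTF-8 boundaries.  Returns the new length.
 */
static size_t
raw_del_range(unsigned char *buf, size_t len, size_t a, size_t b)
{
	assert(a <= b && b <= len);
	if (len != b)
		memmove(&buf[a], &buf[b], len - b);
	return len - (b - a);
}
```

Called on the eight bytes of "тест" (d1 82 d0 b5 d1 81 d1 82) with a = 4
(the start byte of "с") and b = 7 (the continuation byte of the final
"т"), it leaves d1 82 d0 b5 82, i.e. "те" plus an orphaned 0x82.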

My diff makes del_range() (and yank_range(), which operates in the same
manner) always skip back to the beginning of the UTF-8 sequence that
each end of the range falls in.  This is more of a band-aid - the proper
fix would be to make sure that the cursor never rests on a continuation
byte - but it is less invasive and does not hurt code readability too
much.
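
The snapping idea can be sketched on its own, outside ksh.  The names
is_u8cont(), snap_to_start() and snapped_del_range() are hypothetical,
and the bounds checks (pos > 0, b < len) are additions of this sketch
that the diff itself does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int
is_u8cont(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/* Move pos back to the start byte of the sequence it falls in. */
static size_t
snap_to_start(const unsigned char *buf, size_t pos)
{
	while (pos > 0 && is_u8cont(buf[pos]))
		pos--;
	return pos;
}

/* Remove bytes [a, b) after snapping both ends to start bytes. */
static size_t
snapped_del_range(unsigned char *buf, size_t len, size_t a, size_t b)
{
	assert(a <= b && b <= len);
	a = snap_to_start(buf, a);
	if (b < len)		/* do not inspect the byte past the end */
		b = snap_to_start(buf, b);
	if (len != b)
		memmove(&buf[a], &buf[b], len - b);
	return len - (b - a);
}
```

With the eight bytes of "тест" and a = 4, b = 7, the end of the range is
pulled back to 6, so only "с" is removed and the buffer stays valid
UTF-8 ("тет").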

Comments?  OKs?

Dmitrij D. Czarkoff

Index: vi.c
RCS file: /var/cvs/src/bin/ksh/vi.c,v
retrieving revision 1.40
diff -u -p -r1.40 vi.c
--- vi.c        11 Oct 2016 19:52:54 -0000      1.40
+++ vi.c        14 Oct 2016 10:47:25 -0000
@@ -1323,6 +1323,10 @@ redo_insert(int count)
 static void
 yank_range(int a, int b)
 {
+       while (isu8cont((unsigned char)es->cbuf[a]))
+               a--;
+       while (isu8cont((unsigned char)es->cbuf[b]))
+               b--;
        yanklen = b - a;
        if (yanklen != 0)
                memmove(ybuf, &es->cbuf[a], yanklen);
@@ -1493,6 +1497,10 @@ putbuf(const char *buf, int len, int rep
 static void
 del_range(int a, int b)
 {
+       while (isu8cont((unsigned char)es->cbuf[a]))
+               a--;
+       while (isu8cont((unsigned char)es->cbuf[b]))
+               b--;
        if (es->linelen != b)
                memmove(&es->cbuf[a], &es->cbuf[b], es->linelen - b);
        es->linelen -= b - a;
