Re: ksh(1): vi mode UTF-8 bug

Ingo Schwarze Mon, 29 May 2017 07:16:48 -0700

Hi,

Anton Lindqvist wrote on Sun, May 28, 2017 at 06:07:00PM +0200:
> On Sun, May 28, 2017 at 10:56:19AM +0200, Walter Alejandro Iglesias wrote:


>> There is still a similar issue when you try to "replace" a utf-8
>> character (in command mode press 'r' to replace a single character or
>> 'R' to replace a string).

> Thanks for the report, please try out the diff below.
> As I understand the problem: the current code assumes that the character
> to replace consists of a single byte, which is not true for Unicode
> characters.

Correct.  That needs to be improved.

> When replacing such a character, delete the continuation
> bytes and then replace the start byte with the replacement.
> This ensures no continuation bytes are left behind.
> I made use of putbuf() since it has the side-effect of advancing the
> cursor.
> Lastly, adjust the cursor to be positioned on the last replaced
> character.
> 
> NUL-terminating the line buffer is necessary in order for the following
> to work:
> 
> 1. Insert ö
> 
> 2. Press esc, h (back one char), ro (replace with o), ax (append x)
> 
> Note that replacing a character with a Unicode character does not work
> either.
> 
> Comments? OK?
> 
> Index: bin/ksh/vi.c
> ===================================================================
> RCS file: /cvs/src/bin/ksh/vi.c,v
> retrieving revision 1.45
> diff -u -p -r1.45 vi.c
> --- bin/ksh/vi.c      28 May 2017 07:27:01 -0000      1.45
> +++ bin/ksh/vi.c      28 May 2017 15:59:59 -0000
> @@ -926,13 +926,22 @@ vi_cmd(int argcnt, const char *cmd)
>                       if (cmd[1] == 0)
>                               vi_error();
>                       else {
> -                             int     n;
> -
>                               if (es->cursor + argcnt > es->linelen)
>                                       return -1;

These two lines are no longer accurate.  They try to make sure there
are enough characters under and to the right of the cursor to match
the number you want to replace (for example, with "2r"), and beep
otherwise - but they count bytes, which is wrong.
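
To illustrate the mismatch, here is a minimal sketch that counts characters
rather than bytes.  The helper name u8_count_chars() is hypothetical, and the
isu8cont() definition is assumed to match the one in ksh (a byte of the form
10xxxxxx is a continuation byte); neither is taken from the diff.

```c
#include <stddef.h>

/* Assumed to match ksh's isu8cont(): true for UTF-8 continuation bytes. */
static int
isu8cont(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/*
 * Hypothetical helper: count the characters in buf[0..len) by counting
 * every byte that is not a continuation byte.
 */
static int
u8_count_chars(const char *buf, int len)
{
	int i, n = 0;

	for (i = 0; i < len; i++)
		if (!isu8cont((unsigned char)buf[i]))
			n++;
	return n;
}
```

For a line holding only "ö" (bytes 0xc3 0xb6), linelen is 2 but there is only
one character, so the byte-based test would let "2r" through.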

To catch the error condition of an excessive argument, I think you
first need to iterate to the right, using the c1 variable and isu8cont(),
and return -1 if you hit the end prematurely.  Do not change anything
in that case.

If you succeed so far, you know you have to replace the range
[es->cursor, c1].
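
A rough sketch of that scan, written against a plain buffer instead of the
es structure.  The function name u8_span() and its parameters are
hypothetical, isu8cont() is assumed to behave as in ksh, and the convention
that the returned offset points just past the last byte to replace is my
assumption, not something the diff prescribes.

```c
#include <stddef.h>

/* Assumed to match ksh's isu8cont(): true for UTF-8 continuation bytes. */
static int
isu8cont(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/*
 * Hypothetical helper: starting at byte offset cursor, step over argcnt
 * characters.  Return the byte offset just past the last of them, or -1
 * if the line ends first.  The buffer is never modified.
 */
static int
u8_span(const char *buf, int linelen, int cursor, int argcnt)
{
	int c1 = cursor;

	while (argcnt-- > 0) {
		if (c1 >= linelen)
			return -1;	/* not enough characters: beep */
		c1++;			/* step over the start byte */
		while (c1 < linelen && isu8cont((unsigned char)buf[c1]))
			c1++;		/* and over its continuation bytes */
	}
	return c1;
}
```

With the line "aöx" (4 bytes) and the cursor on "a", a count of 2 yields
offset 3, while a count of 4 yields -1 and leaves everything untouched.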

> -                             for (n = 0; n < argcnt; ++n)
> -                                     es->cbuf[es->cursor + n] = cmd[1];
> -                             es->cursor += n - 1;
> +
> +                             while (argcnt-- > 0) {
> +                                     for (cur = es->cursor + 1;
> +                                         cur < es->linelen; cur++)
> +                                             if (!isu8cont(es->cbuf[cur]))
> +                                                     break;
> +                                     if (cur > 1)
> +                                             del_range(es->cursor, cur - 1);

Given that you don't know the length (in bytes) of the character
to insert yet, I think it may be simpler to delete the byte under the
cursor as well, even though that is slightly inefficient for the ASCII
case.

> +                                     putbuf(&cmd[1], 1, 1);

It seems that here, you may need to measure the length of the character
to insert in bytes and then call something like

  putbuf(cmd + 1, #bytes, 0);
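
One possible way to measure that length, as a sketch only.  The helper name
u8_len() is hypothetical, isu8cont() is assumed to behave as in ksh, and the
sketch assumes the complete, NUL-terminated replacement character is
available in the buffer, which may not hold for cmd as it stands.

```c
#include <stddef.h>

/* Assumed to match ksh's isu8cont(): true for UTF-8 continuation bytes. */
static int
isu8cont(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/*
 * Hypothetical helper: return the length in bytes of the first character
 * of the NUL-terminated string s, i.e. one start byte plus all directly
 * following continuation bytes.  Returns 0 for the empty string.
 */
static int
u8_len(const char *s)
{
	int n;

	if (s[0] == '\0')
		return 0;
	n = 1;
	/* The terminating NUL is not a continuation byte, so this stops. */
	while (isu8cont((unsigned char)s[n]))
		n++;
	return n;
}
```

The result would then take the place of #bytes in the putbuf() call above,
for example putbuf(cmd + 1, u8_len(cmd + 1), 0).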


My impression is that the 's' command is likely also affected, but that
can be fixed in a separate patch.

Yours,
  Ingo
