Stephane Chazelas <stephane.chaze...@gmail.com> wrote:
|2016-01-11 16:10:28 +0100, Steffen Nurpmeso:
|[...]
|> And your diff includes
|>
|> +.Pp
|> +For compatibility with
|> +.St -p1003.1-2008
|> +.Xr fold 1 ,
|> +if a double-width character is followed by two backspace characters
|> +instead of the usual one, both are regarded as belonging to that
|> +character, and the second one does not decrement the column count.
|>
|> Have you actually ever encountered anything that behaves like this
|> in canonical mode? I have not, except that all tested terminals
|> (a very restricted set, Thomas Dickey and Marc Lehmann, to name
|> a few, would know much better than myself) do so in non-canonical
|> mode. And that is weird enough given that they delete the glyph
|> that makes up the double-width column and, in order to achieve
|> that, place a space before the cursor.
|>
|> But i think POSIX text utilities behave in canonical mode, because
|> beforehand the quote from above we read
|>
|>   Although terminal input in canonical processing mode requires
|>   the erase character (frequently set to <backspace>) to erase the
|>   previous character (not byte or column position), terminal
|>   output is not buffered and is extremely difficult, if not
|>   impossible, to parse correctly; the interpretation depends
|>   entirely on the physical device that actually
|>   displays/prints/stores the output.
|>
|> So if this is true then i think this is even worth a POSIX issue?
|I'm under the impression you're mixing two things, the \b
|processing on input and on output. I don't think the input
|processing matters as far as colrm is concerned.

It seems to me you are right.

|$ printf '|\uFF21\b\b|\n'
|||
|$ printf '|\uFF21\b|\n'
|||
|
|in both xterm and gnome-terminal. As in, you need two backspace
|characters to delete that character. With only one, the cursor
|moves back one column, and if you write another character, the
|double-width glyph is erased (leaving an empty single-width
|space and your replacement character).
|
|So it's right that colrm should assume that
|<a-double-width-character>\b\b doesn't change the cursor
|position.

Oh i want to make clear that i never had any doubt Ingo looked
into this thoroughly.

|The terminal device line-discipline (so in the kernel), in
|canonical mode, when you *type* backspace/delete after *typing*
|that double-width A, at least on Linux echoes back only one \b
|("\b \b", not "\b\b \b\b"), which means it doesn't erase that
|double-width A properly.
|
|I don't think that issue can be solved, as \b is the sequence to
|move the cursor to the left by one column. So one has to issue
|\b\b to move the cursor over a double-width character. Linux (at
|least) has an IUTF8 termios setting (stty iutf8) to tell the line
|discipline that the charset is UTF-8, so for instance when you
|type é<Backspace>, it erases the é instead of its last byte from
|its internal buffer (and echoes "\b \b"), but it doesn't know
|the width of each character as would be displayed by the
|terminal, and that is almost impossible to achieve at the line
|discipline level.
|
|Of relevance:
|
|https://unix.stackexchange.com/questions/245013/get-the-display-width\
|-of-a-string-of-characters
|http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-char\
|acters-and-character-width

Interesting pointers, but especially the first with the following
discussion, thanks!
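
To make the output-side column model from the quoted text concrete,
here is a minimal sketch in Python. It is not the colrm or fold code
from the diff, only an illustration of "each character advances the
cursor by its display width, each \b moves it left by one column".
The names char_width and cursor_column are hypothetical, and the
unicodedata lookup is only an assumed, rough stand-in for wcwidth(3);
it is not locale aware and control characters other than \b are not
handled.

```python
import unicodedata


def char_width(ch):
    """Rough stand-in for wcwidth(3); an assumption, not locale aware."""
    if unicodedata.combining(ch):
        return 0  # combining marks take no column of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1


def cursor_column(text, start=0):
    """Column the cursor ends up in after writing text, counting from 0."""
    col = start
    for ch in text:
        if ch == "\b":
            col = max(col - 1, 0)  # \b moves left one column, never past 0
        else:
            col += char_width(ch)
    return col


wide_a = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A
print(cursor_column("|" + wide_a))           # 3
print(cursor_column("|" + wide_a + "\b\b"))  # 1
print(cursor_column("|" + wide_a + "\b"))    # 2
```

Under this model two backspaces are needed to get back to the column
where the double-width character started, while a single one leaves
the cursor in the middle of the glyph.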

In the shell, call-through hooks like "posix.wcwidth STRING",
"posix.wcswidth STRING" or similar would be a good thing to have,
and it still seems to me that UNIX/POSIX has a lot of room for
improving the situation regarding real native language support
(e.g., Perl is so much more sophisticated in this area).
Nonetheless..
Sorry for the noise!

--steffen