Stephane Chazelas <stephane.chaze...@gmail.com> wrote:
 |2016-01-11 16:10:28 +0100, Steffen Nurpmeso:
 |[...]
 |> And your diff includes
 |> 
 |>   +.Pp
 |>   +For compatibility with
 |>   +.St -p1003.1-2008
 |>   +.Xr fold 1 ,
 |>   +if a double-width character is followed by two backspace characters
 |>   +instead of the usual one, both are regarded as belonging to that
 |>   +character, and the second one does not decrement the column count.
 |> 
 |> Have you actually ever encountered anything that behaves like this
 |> in canonical mode?  I have not, except that all tested terminals
 |> (a very restricted set, Thomas Dickey and Marc Lehmann, to name
 |> a few, would know much better than myself) do so in non-canonical
 |> mode.  And that is weird enough given that they delete the glyph
 |> that makes up the double-width column and, in order to achieve
 |> that, place a space before the cursor.
 |> 
 |> But i think POSIX text utilities behave as in canonical mode,
 |> because right before the text quoted above we read
 |> 
 |>   Although terminal input in canonical processing mode requires
 |>   the erase character (frequently set to <backspace>) to erase the
 |>   previous character (not byte or column position), terminal
 |>   output is not buffered and is extremely difficult, if not
 |>   impossible, to parse correctly; the interpretation depends
 |>   entirely on the physical device that actually
 |>   displays/prints/stores the output.
 |> 
 |> So if this is true then i think this is even worth a POSIX issue?

 |I'm under the impression you're mixing two things, the \b
 |processing on input and on output. I don't think the input
 |processing matters as far as colrm is concerned.

It seems to me you are right.

 |$ printf '|\uFF21\b\b|\n'
 |||
 |$ printf '|\uFF21\b|\n'
 |||
 |
 |in both xterm and gnome-terminal. As in, you need two backspace
 |characters to delete that character. With only one, the cursor
 |moves back one column, and if you write another character, the
 |double-width glyph is erased (leaving an empty single-width
 |space and your replacement character).
 |
 |So it's right that colrm should assume that
 |<a-double-width-character>\b\b doesn't change the cursor
 |position.

Oh i want to make clear that i never had any doubt Ingo looked
into this thoroughly.
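
To make the column rule from the quoted manual text concrete, here is a
minimal Python sketch.  It is not the colrm code: final_column and
char_width are hypothetical names, the width lookup only approximates
wcwidth(3), and it assumes that the first backspace steps back by the
width of the preceding character.

```python
import unicodedata


def char_width(ch):
    """Rough stand-in for wcwidth(3): 0, 1 or 2 columns (an approximation)."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def final_column(text):
    """Return the column count after processing text, starting from 0."""
    col = 0
    prev_width = 1   # assumed width a backspace steps over
    absorb = False   # True right after the first \b following a wide char
    for ch in text:
        if ch == "\b":
            if absorb:
                # Second backspace after a double-width character:
                # it belongs to that character and changes nothing.
                absorb = False
            else:
                col = max(0, col - prev_width)
                absorb = prev_width == 2
                prev_width = 1
            continue
        width = char_width(ch)
        col += width
        if width > 0:
            prev_width = width
        absorb = False
    return col


if __name__ == "__main__":
    print(final_column("|\uFF21\b\b|"))
    print(final_column("|\uFF21\b|"))
```

Under these assumptions both printf strings from the quoted example end
at the same column, because the second backspace after the double-width
character is absorbed.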

 |The terminal device line-discipline (so in the kernel), in
 |canonical mode, when you *type* backspace/delete after *typing*
 |that double-width A, at least on Linux echoes back only one \b
 |("\b \b", not "\b\b \b\b"), which means it doesn't erase that
 |double-width A properly.
 |
 |I don't think that issue can be solved, as \b is the sequence to
 |move the cursor to the left by one column. So one has to issue
 |\b\b to move the cursor over a double-width character. Linux (at
 |least) has an IUTF8 termios setting (stty iutf8) to tell the line
 |discipline that the charset is UTF-8, so for instance when you
 |type é<Backspace>, it erases the é instead of its last byte from
 |its internal buffer (and echoes "\b \b"), but it doesn't know
 |the width of each character as would be displayed by the
 |terminal, and that is almost impossible to achieve at the line
 |discipline level.
 |
 |Of relevance:
 |
 |https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
 |http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width
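
The difference between erasing the last byte and erasing the last
character, as described above for IUTF8, can be sketched in a few lines
of Python.  This is only an illustration, not the kernel's line
discipline code; erase_last and its iutf8 flag are hypothetical names.

```python
def erase_last(buf, iutf8):
    """Remove the last input unit from buf (a bytearray), in place.

    Without iutf8 exactly one byte is dropped.  With iutf8, trailing
    UTF-8 continuation bytes (10xxxxxx) are dropped together with the
    lead byte they belong to.
    """
    while buf:
        byte = buf.pop()
        if not iutf8 or (byte & 0xC0) != 0x80:
            break


if __name__ == "__main__":
    for flag in (False, True):
        buf = bytearray("\u00e9".encode("utf-8"))  # two bytes: C3 A9
        erase_last(buf, iutf8=flag)
        print(flag, bytes(buf))
```

Note that neither variant knows how many columns the erased character
occupied on screen, which is the limitation the quoted text points out.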

Interesting pointers, thanks, especially the first one with the
discussion that follows it!  In the shell, call-through hooks like
"posix.wcwidth STRING", "posix.wcswidth STRING" or similar would
be a good thing to have, and it still seems to me that UNIX/POSIX
has a lot of room for improving the situation regarding real native
language support.  (E.g., Perl is so much more sophisticated in
this area.)
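
What such a width hook might compute can be sketched in Python.  The
names wcwidth_sketch and wcswidth_sketch are hypothetical, and the
classification via unicodedata is only an approximation of what a
libc wcwidth(3) with its locale tables would return.

```python
import unicodedata


def wcwidth_sketch(ch):
    """Approximate column width of one character, -1 if not printable."""
    category = unicodedata.category(ch)
    if category == "Cc":                  # control characters such as \b
        return -1
    if category in ("Mn", "Me", "Cf"):    # combining marks, format chars
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def wcswidth_sketch(text):
    """Approximate column width of a string, -1 if any part is unprintable."""
    total = 0
    for ch in text:
        width = wcwidth_sketch(ch)
        if width < 0:
            return -1
        total += width
    return total


if __name__ == "__main__":
    for sample in ("abc", "\uFF21bc", "e\u0301", "a\bb"):
        print(repr(sample), wcswidth_sketch(sample))
```
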
Nonetheless..  Sorry for the noise!

--steffen
