Stephane Chazelas <[email protected]> wrote:
|2016-01-11 16:10:28 +0100, Steffen Nurpmeso:
|[...]
|> And your diff includes
|>
|> +.Pp
|> +For compatibility with
|> +.St -p1003.1-2008
|> +.Xr fold 1 ,
|> +if a double-width character is followed by two backspace characters
|> +instead of the usual one, both are regarded as belonging to that
|> +character, and the second one does not decrement the column count.
|>
|> Have you actually ever encountered anything that behaves like this
|> in canonical mode? I have not, except that all tested terminals
|> (a very restricted set, Thomas Dickey and Marc Lehmann, to name
|> a few, would know much better than myself) do so in non-canonical
|> mode. And that is weird enough given that they delete the glyph
|> that makes up the double-width column and, in order to achieve
|> that, place a space before the cursor.
|>
|> But i think POSIX text utilities behave in canonical mode, because
|> just before the passage quoted above we read
|>
|> Although terminal input in canonical processing mode requires
|> the erase character (frequently set to <backspace>) to erase the
|> previous character (not byte or column position), terminal
|> output is not buffered and is extremely difficult, if not
|> impossible, to parse correctly; the interpretation depends
|> entirely on the physical device that actually
|> displays/prints/stores the output.
|>
|> So if this is true then i think this is even worth a POSIX issue?
|I'm under the impression you're mixing two things, the \b
|processing on input and on output. I don't think the input
|processing matters as far as colrm is concerned.
It seems to me you are right.
|$ printf '|\uFF21\b\b|\n'
|||
|$ printf '|\uFF21\b|\n'
|||
|
|in both xterm and gnome-terminal. As in, you need two backspace
|characters to delete that character. With only one, the cursor
|moves back one column, and if you write another character, the
|double-width glyph is erased (leaving an empty single-width
|space and your replacement character).
|
|So it's right that colrm should assume that
|<a-double-width-character>\b\b doesn't change the cursor
|position.
Oh i want to make clear that i never had any doubt Ingo looked
into this thoroughly.
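
For the archive, a minimal sketch of the output-side column accounting
described above, under stated assumptions: every \b moves the cursor left
by exactly one column, and a character advances it by its display width.
The helper names char_width and cursor_column are made up for this
illustration, and char_width only approximates wcwidth(3) via Python's
unicodedata tables. It is not colrm's actual code.

```python
import unicodedata

def char_width(ch):
    """Rough stand-in for wcwidth(3): 0, 1 or 2 columns (an approximation)."""
    if unicodedata.combining(ch):
        return 0                      # combining marks occupy no column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # wide and fullwidth characters
    return 1

def cursor_column(text):
    """Cursor column after writing text, starting from column 0."""
    col = 0
    for ch in text:
        if ch == "\b":
            col = max(0, col - 1)     # one column left, never past the margin
        else:
            col += char_width(ch)
    return col

wide_a = "\uFF21"                     # FULLWIDTH LATIN CAPITAL LETTER A
for sample in ("|", "|" + wide_a, "|" + wide_a + "\b", "|" + wide_a + "\b\b"):
    print(repr(sample), cursor_column(sample))
```

Under this model the column after "|", the wide A and two backspaces equals
the column after "|" alone, which is the net effect the new manual text
describes for a double-width character followed by \b\b.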
|The terminal device line-discipline (so in the kernel), in
|canonical mode, when you *type* backspace/delete after *typing*
|that double-width A, at least on Linux echoes back only one \b
|("\b \b", not "\b\b \b\b"), which means it doesn't erase that
|double-width A properly.
|
|I don't think that issue can be solved, as \b is the sequence to
|move the cursor to the left by one column. So one has to issue
|\b\b to move the cursor over a double-width character. Linux (at
|least) has an IUTF8 termios setting (stty iutf8) to tell the line
|discipline that the charset is UTF-8, so for instance when you
|type é<Backspace>, it erases the é instead of its last byte from
|its internal buffer (and echoes "\b \b"), but it doesn't know
|the width of each character as would be displayed by the
|terminal, and that is almost impossible to achieve at the line
|discipline level.
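
A simplified model of the input-side erase just described, to make the
byte versus character distinction concrete. The function erase_last and
its iutf8 flag are hypothetical names for this sketch; it is not the
kernel's code, only the idea of skipping back over UTF-8 continuation
bytes before dropping the lead byte.

```python
def erase_last(buf, iutf8):
    """Return buf without what a single erase keypress would remove."""
    if not buf:
        return buf
    end = len(buf) - 1
    if iutf8:
        # Walk back over continuation bytes (bit pattern 10xxxxxx) so the
        # whole multi-byte character goes, not just its final byte.
        while end > 0 and (buf[end] & 0xC0) == 0x80:
            end -= 1
    return buf[:end]

line = "é".encode("utf-8")                 # two bytes: c3 a9
print(erase_last(line, iutf8=False))       # the lead byte c3 is left behind
print(erase_last(line, iutf8=True))        # the buffer is empty again

wide = "|\uFF21".encode("utf-8")           # '|' plus a three-byte wide A
print(erase_last(wide, iutf8=True))        # only b'|' remains
```

Note that the last case removes the wide A from the buffer correctly, yet
nothing in this logic knows that the glyph took two columns on screen, so
the echoed "\b \b" is still one column short.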
|
|Of relevance:
|
|https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
|http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width
Interesting pointers, especially the first together with the
discussion that follows it, thanks! In the shell, call-through
hooks like "posix.wcwidth STRING", "posix.wcswidth STRING" or
similar would be a good thing to have, and it still seems to me
that UNIX/POSIX has a lot of room for improving the situation
regarding real native language support (e.g., Perl is so much
more sophisticated in this area).
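
Roughly what such a "posix.wcswidth STRING" hook would be expected to
report, as a hedged sketch. The name display_width is invented here, the
widths come from Python's unicodedata tables rather than from the C
library's wcswidth(3), and returning -1 for control characters merely
imitates the C convention for non-printable input.

```python
import unicodedata

def display_width(s):
    """Columns needed to display s, or -1 if s holds a control character."""
    total = 0
    for ch in s:
        if unicodedata.category(ch) == "Cc":
            return -1                 # mimic wcswidth(3) on non-printables
        if unicodedata.combining(ch):
            continue                  # combining marks add no columns
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        total += 2 if wide else 1
    return total

for sample in ("abc", "\uFF21", "e\u0301", "a\b"):
    print(repr(sample), display_width(sample))
```

A real hook would have to follow the locale and the terminal's own idea
of character widths, which is exactly where the two links above show
things getting murky.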
Nonetheless.. Sorry for the noise!
--steffen