2016-01-11 16:10:28 +0100, Steffen Nurpmeso:
[...]
> And your diff includes
> 
>   +.Pp
>   +For compatibility with
>   +.St -p1003.1-2008
>   +.Xr fold 1 ,
>   +if a double-width character is followed by two backspace characters
>   +instead of the usual one, both are regarded as belonging to that
>   +character, and the second one does not decrement the column count.
> 
> Have you actually ever encountered anything that behaves like this
> in canonical mode?  I have not, except that all tested terminals
> (a very restricted set, Thomas Dickey and Marc Lehmann, to name
> a few, would know much better than myself) do so in non-canonical
> mode.  And that is weird enough given that they delete the glyph
> that makes up the double-width column and, in order to achieve
> that, place a space before the cursor.
> 
> But i think POSIX text utilities behave in canonical mode, because
> beforehand the quote from above we read
> 
>   Although terminal input in canonical processing mode requires
>   the erase character (frequently set to <backspace>) to erase the
>   previous character (not byte or column position), terminal
>   output is not buffered and is extremely difficult, if not
>   impossible, to parse correctly; the interpretation depends
>   entirely on the physical device that actually
>   displays/prints/stores the output.
> 
> So if this is true then i think this is even worth a POSIX issue?
> I repeat that i have not yet encountered any utility which behaves
> the way that POSIX describes and Ingo tries to address with
> special processing?
[...]

I'm under the impression you're mixing two things, the \b
processing on input and on output. I don't think the input
processing matters as far as colrm is concerned.

$ printf '|\uFF21\b\b|\n'
||
$ printf '|\uFF21\b|\n'
| |

in both xterm and gnome-terminal. As in, you need two backspace
characters to delete that character. With only one, the cursor
moves back one column, and if you write another character, the
double-width glyph is erased (leaving an empty single-width
space and your replacement character).

So it's right that colrm should assume that
<a-double-width-character>\b\b doesn't change the cursor
position.
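
To make the column arithmetic concrete, here is a rough Python sketch
of the terminal behaviour described above. It is only a model under
stated assumptions, not colrm's code and not any terminal's
implementation: render() and width() are hypothetical names, width is
approximated from the East Asian Width property, \b moves left exactly
one column, and overwriting either half of a double-width glyph blanks
the whole glyph. It handles a single line only.

```python
import unicodedata

def width(ch):
    # Assumption: East Asian Wide/Fullwidth take 2 columns, all else 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def render(text):
    """Return (visible line, final cursor column) for one line of output."""
    cells = []   # one entry per column; "" marks the right half of a wide glyph
    col = 0
    for ch in text:
        if ch == "\b":
            col = max(0, col - 1)      # \b moves left by one column only
            continue
        w = width(ch)
        while len(cells) < col + w:
            cells.append(" ")
        # Overwriting either half of a wide glyph blanks the other half.
        for c in range(col, col + w):
            if cells[c] == "":
                cells[c - 1] = " "
            elif c + 1 < len(cells) and cells[c + 1] == "":
                cells[c + 1] = " "
        cells[col] = ch
        for c in range(col + 1, col + w):
            cells[c] = ""
        col += w
    return "".join(cells).rstrip(), col

print(render("|\uFF21\b\b|"))   # two backspaces
print(render("|\uFF21\b|"))     # one backspace
```

Under these assumptions the first call ends at column 2 and the second
at column 3, which matches the two printf results shown earlier.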

The terminal device line-discipline (so in the kernel), in
canonical mode, when you *type* backspace/delete after *typing*
that double-width A, at least on Linux, echoes back only one \b
("\b \b", not "\b\b \b\b"), which means it doesn't erase that
double-width A properly.

I don't think that issue can be solved, as \b is the sequence to
move the cursor to the left by one column. So one has to issue
\b\b to move the cursor over a double-width character. Linux (at
least) has an IUTF8 termios setting (stty iutf8) to tell the line
discipline that the charset is UTF-8, so for instance when you
type é<Backspace>, it erases the é instead of its last byte from
its internal buffer (and echoes "\b \b"), but it doesn't know
the width of each character as would be displayed by the
terminal, and that is almost impossible to achieve at the line
discipline level.
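
As an illustration of that input-side limitation, here is a small
Python model of erasing from the line buffer. It is a sketch, not the
kernel's code: erase_last() and ECHO are hypothetical names, and the
only facts it relies on are that UTF-8 continuation bytes look like
10xxxxxx and that the echo is "\b \b" as described above.

```python
ECHO = b"\b \b"   # echoed on erase; covers exactly one column

def erase_last(buf, iutf8):
    """Drop the last byte (or, with iutf8, the last UTF-8 character)
    from buf in place and return what would be echoed."""
    if not buf:
        return b""
    if iutf8:
        # Strip trailing continuation bytes (10xxxxxx) first.
        while len(buf) > 1 and (buf[-1] & 0xC0) == 0x80:
            buf.pop()
    buf.pop()     # the lead byte, or simply the last byte
    return ECHO   # same echo whatever the display width was

line = bytearray("a\u00e9".encode("utf-8"))
erase_last(line, iutf8=False)
print(bytes(line))    # the first byte of the é is left behind

line = bytearray("a\uFF21".encode("utf-8"))
print(erase_last(line, iutf8=True), bytes(line))
```

In the second case the three bytes of the fullwidth A are removed from
the buffer, but the echo still only covers one of its two columns.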

Of relevance:

https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width

-- 
Stephane
