I realized that the issue I describe in the message below (sent to Ingo in private) happens on the tty console (no X11) regardless of what you set in LC_CTYPE. The problem arises when you have a non-English keyboard: you can easily type by accident some non-ASCII character smaller than 0xc0, and then your line is lost. So, I think it *is* a bug.
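To make the byte range concrete, here is a minimal sketch in C. It is not the actual ksh code: the helper names `is_cont_byte` and `cursor_left` are hypothetical, and it only assumes that the line editor treats every byte in 0x80..0xbf as a UTF-8 continuation byte and skips such bytes when moving the cursor left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true for bytes 0x80..0xbf, the range UTF-8
 * reserves for continuation bytes. In a single-byte encoding such as
 * ISO-8859-1 the same values are complete characters on their own. */
static int
is_cont_byte(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/* Hypothetical cursor movement: step one "character" to the left by
 * moving back one byte and then skipping anything that looks like a
 * continuation byte. This is an assumed strategy for illustration. */
static size_t
cursor_left(const unsigned char *buf, size_t pos)
{
	assert(buf != NULL);
	if (pos == 0)
		return 0;
	pos--;
	while (pos > 0 && is_cont_byte(buf[pos]))
		pos--;
	return pos;
}
```

With the UTF-8 bytes 'a' 0xc3 0xa9 'b', moving left from 'b' (offset 3) stops at offset 1, the start of the two-byte character. With the single-byte input 'a' 0xa9 'b', moving left from 'b' (offset 2) skips 0xa9 and stops on 'a' at offset 0, so the cursor and the displayed line no longer agree.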
Vi editing mode users can work around the problem using vi-show8.

As far as I can understand, there isn't an easy, solid way to know whether a non-ASCII character is UTF-8, so (with all the respect others' work deserves, and please correct me if I'm wrong) each time you fix some UTF-8 input issue there is a big chance you're unfixing the non-UTF-8 non-ASCII input. :-)

To preserve a safe ASCII implementation you should consider keeping the UTF-8 hacks aside. You could do the same as you do with nvi, diverting the effort developers are now putting into these hacks into implementing a UTF-8 version of ksh as a package for those who think they need UTF-8 support in the console (don't count me among them).


----- Forwarded message from Walter Alejandro Iglesias <[email protected]> -----

Date: Fri, 2 Jun 2017 14:47:55 +0200
From: Walter Alejandro Iglesias <[email protected]>
To: Ingo Schwarze <[email protected]>
Subject: Re: ksh(1): vi mode UTF-8 bug
User-Agent: Mutt/1.8.2hg (2017-04-18)

Hi Ingo,

On Mon, May 29, 2017 at 07:28:37PM +0200, Ingo Schwarze wrote:
> Hi Walter,
>
> Walter Alejandro Iglesias wrote on Mon, May 29, 2017 at 06:44:40PM +0200:
>
> > Are those wide char versions of C functions consistent enough to write
> > a separate implementation to be loaded when LC_CTYPE is set to utf-8?
>
> Sure, you can rewrite the complete shell to use wchar_t * rather
> than char *, and if you do that, you can use the new code to handle
> ASCII as well, no need to have two copies.  But that would be a
> huge effort, even more error-prone than the small, careful adjustments
> we are doing now, and would have a number of additional downsides;
> among others, losing the ability to handle arbitrary bytes, while
> in UTF-8 mode.
>
> For an editor, going wchar_t might be better because having substantial
> amounts of UTF-8 in user input is a common case in some files that
> people edit.
>
> For a shell, editing strings that contain non-ASCII is not the main
> purpose.  Sure, it is nice if the command line is able to handle
> strings containing an occasional UTF-8 character.  But the main
> purpose of the shell remains to safely input and execute Unix-style
> command lines, where non-ASCII characters are a non-essential addition
> at best.
>
> Yours,
>   Ingo
>
>
> For more details, see
> https://www.openbsd.org/papers/eurobsdcon2016-utf8.pdf

There is an issue that has been present since I don't know when (a regression?). I suppose it's caused by the way you test whether non-ASCII characters are UTF-8. It happens with both the vi and emacs editing modes. With LC_CTYPE=C, if you type non-ASCII characters smaller than 0xc0 and pass over them with the cursor, you'll see that the cursor behaves as if they were part of a wide character. This overwrites characters, commands such as x, r or s get confused, and recalling the line from the history file gets screwed up too.

Given that OpenBSD formally supports only UTF-8, are you aware of this issue and do you accept it as part of the deal for handling UTF-8, or may I report it as a bug?

----- End forwarded message -----
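
The point about not being able to tell reliably whether non-ASCII input is UTF-8 can be illustrated with a rough sketch in C. The function `looks_like_utf8` is a hypothetical helper, not part of ksh, and it is deliberately simplified: it checks only the lead-byte and continuation-byte structure and does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) has the lead-byte and
 * continuation-byte structure of UTF-8, 0 otherwise. Simplified on
 * purpose: overlong forms and surrogates are not rejected. */
static int
looks_like_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i++];
		size_t need;

		if (c < 0x80)
			need = 0;		/* plain ASCII */
		else if ((c & 0xe0) == 0xc0)
			need = 1;		/* lead byte of a 2-byte sequence */
		else if ((c & 0xf0) == 0xe0)
			need = 2;		/* lead byte of a 3-byte sequence */
		else if ((c & 0xf8) == 0xf0)
			need = 3;		/* lead byte of a 4-byte sequence */
		else
			return 0;		/* stray continuation or invalid byte */

		/* Each expected continuation byte must be in 0x80..0xbf. */
		for (; need > 0; need--) {
			if (i >= len || (buf[i] & 0xc0) != 0x80)
				return 0;
			i++;
		}
	}
	return 1;
}
```

A lone 0xa9 (the copyright sign in ISO-8859-1) is rejected, so a whole-string check can flag some single-byte input. The pair 0xc3 0xa9, however, is accepted, and it is both the UTF-8 encoding of "é" and the two ISO-8859-1 characters "Ã©". A check of this kind can say that input is not UTF-8, but it cannot prove that it is.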
