Re: ksh(1): don't output invalid UTF-8 characters

2017-06-25 Thread Anton Lindqvist
For reference, I just committed the fix, see message below. Thanks to all who helped out. > CVSROOT: /cvs > Module name: src > Changes by: an...@cvs.openbsd.org 2017/06/25 02:51:53 > > Modified files: > bin/ksh: emacs.c > > Log message: > Don't output partial UTF-8

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Walter Alejandro Iglesias
On Mon, Jun 05, 2017 at 10:46:49PM +0200, Ingo Schwarze wrote: > Hi Walter, > > Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 09:21:31PM +0200: > > On Mon, Jun 05, 2017 at 06:06:34PM +0200, Ingo Schwarze wrote: > >> Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 04:50:21PM

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Ingo Schwarze
Hi Walter, Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 09:21:31PM +0200: > On Mon, Jun 05, 2017 at 06:06:34PM +0200, Ingo Schwarze wrote: >> Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 04:50:21PM +0200: > But this time I don't think you need a capture of the sequence.

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Walter Alejandro Iglesias
In article <20170605192131.ga60...@server.roquesor.com> you wrote: > >Encodings using more bytes than required are invalid. In particular, >1100 and 1101 are not valid start bytes, the byte after >1110 must be at least 1010, and the byte after must >be at
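The rule quoted above (overlong encodings are invalid) can be sketched as a small validator. This is only an illustration, not code from the ksh sources: the name `utf8_seq_len` is hypothetical, and the surrogate and U+10FFFF limits are additional restrictions taken from the UTF-8 definition rather than from this message.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the ksh sources: return the length in
 * bytes (1-4) of the well-formed UTF-8 sequence at the start of s, or 0
 * if the first n bytes do not begin with one.
 */
static size_t
utf8_seq_len(const unsigned char *s, size_t n)
{
	size_t i, len;
	unsigned char lo = 0x80, hi = 0xBF;	/* allowed range of 2nd byte */

	if (n == 0)
		return 0;
	if (s[0] < 0x80)
		return 1;		/* ASCII */
	if (s[0] < 0xC2)
		return 0;		/* continuation byte, or overlong C0/C1 */
	if (s[0] < 0xE0) {
		len = 2;
	} else if (s[0] < 0xF0) {
		len = 3;
		if (s[0] == 0xE0)
			lo = 0xA0;	/* below A0 would be overlong */
		else if (s[0] == 0xED)
			hi = 0x9F;	/* above 9F would be a surrogate */
	} else if (s[0] < 0xF5) {
		len = 4;
		if (s[0] == 0xF0)
			lo = 0x90;	/* below 90 would be overlong */
		else if (s[0] == 0xF4)
			hi = 0x8F;	/* above 8F exceeds U+10FFFF */
	} else {
		return 0;		/* F5..FF never appear in UTF-8 */
	}
	if (n < len)
		return 0;		/* truncated sequence */
	if (s[1] < lo || s[1] > hi)
		return 0;
	for (i = 2; i < len; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 0;	/* not a continuation byte */
	return len;
}
```

Only the second byte needs a start-byte-specific range; every later byte is an ordinary continuation byte.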

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Kurt H Maier
On Mon, Jun 05, 2017 at 09:21:31PM +0200, Walter Alejandro Iglesias wrote: > > I wonder how plan9 handles utf8. > In general, by getting rid of TTYs and character-addressed interfaces almost entirely. Probably not the best fit for OpenBSD. khm

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Walter Alejandro Iglesias
On Mon, Jun 05, 2017 at 06:06:34PM +0200, Ingo Schwarze wrote: > Hi Walter, > > Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 04:50:21PM +0200: > > > report (I'm on chapter 2 of K&R :-)). I wish with time I'll learn how > > to do it. > > IIRC, you said you saw some undesirable

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Ingo Schwarze
Hi Walter, Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 04:50:21PM +0200: > I'm still not skilled enough to make a proper patch or a clear bug > report (I'm on chapter 2 of K&R :-)). I wish with time I'll learn how > to do it. IIRC, you said you saw some undesirable behaviour with ksh

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Walter Alejandro Iglesias
Just to apologize to developers here, I'm still not skilled enough to make a proper patch or a clear bug report (I'm on chapter 2 of K&R :-)). I wish with time I'll learn how to do it. I came to the ksh utf8 discussion because I've been playing with some mail mime encoder just to learn C and

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-05 Thread Ingo Schwarze
Hi Anton, Anton Lindqvist wrote on Sun, Jun 04, 2017 at 11:09:35AM +0200: > Although this discussion hasn't settled, True. I think nicm@ has convinced me that the shell *can* try to be nicer towards terminals, without risking hangs if done very carefully. Probably that's worth doing, it makes

Re: ksh(1): don't output invalid UTF-8 characters

2017-06-04 Thread Anton Lindqvist
Hi, Although this discussion hasn't settled, here's a new diff trying to address the previously raised issues: - The new function x_e_getu8() tries to read a complete UTF-8 character. When a continuation byte is expected but not received, it resets its state and retries. The fix to u8len()
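The read-and-reset behaviour described for x_e_getu8() can be illustrated with a minimal sketch. It is not the committed code: `struct bytesrc`, `src_getc`, `u8_startlen` and `read_u8` are hypothetical names, the input comes from a memory buffer rather than the terminal, and silently dropping the bytes of an abandoned partial character is an assumption of this sketch. It also does not check the second-byte ranges that exclude overlong forms.

```c
#include <stddef.h>

/* Hypothetical byte source standing in for the shell's terminal read. */
struct bytesrc {
	const unsigned char *buf;
	size_t len, pos;
};

/* Return the next byte, or -1 at end of input. */
static int
src_getc(struct bytesrc *src)
{
	return src->pos < src->len ? src->buf[src->pos++] : -1;
}

/* Expected sequence length for a start byte, or 0 if c cannot start one. */
static int
u8_startlen(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if (c >= 0xC2 && c <= 0xDF)
		return 2;
	if (c >= 0xE0 && c <= 0xEF)
		return 3;
	if (c >= 0xF0 && c <= 0xF4)
		return 4;
	return 0;
}

/*
 * Read one complete character into out (room for 4 bytes) and return
 * its length in bytes, or -1 if the input ends first.
 */
static int
read_u8(struct bytesrc *src, unsigned char *out)
{
	int c, len = 0, n = 0;

	while ((c = src_getc(src)) != -1) {
		/* Continuation expected but not received: reset the state. */
		if (n > 0 && (c & 0xC0) != 0x80)
			n = 0;
		if (n == 0) {
			/* Retry with this byte as a possible start byte. */
			len = u8_startlen(c);
			if (len == 0)
				continue;	/* stray or invalid byte: drop */
		}
		out[n++] = c;
		if (n == len)
			return len;
	}
	return -1;
}
```

With a real terminal the read would block instead of returning end of input, which is where the concern about hangs raised in this thread comes in.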

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-22 Thread Boudewijn Dijkstra
Op Fri, 19 May 2017 15:17:55 +0200 schreef Anton Lindqvist : On Fri, May 19, 2017 at 09:33:33AM -0300, Lucas Gabriel Vuotto wrote: On 19/05/17 03:42, Anton Lindqvist wrote: > > +static int > +u8len(unsigned char c) > +{ > + switch (c & 0xF0) { > + case 0xF0: > +

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Nicholas Marriott
Having a look at ksh, I don't see how Anton's original diff is much different from x_emacs() looping around x_e_getc() until it finishes a long key input? It would be better to stop reading early if an invalid UTF-8 byte is input rather than always requiring exactly N bytes; he needs to fix his

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Nicholas Marriott
On Fri, May 19, 2017 at 09:29:06PM +0200, Ingo Schwarze wrote: > On a side note, i don't think gnome-terminal and konsole are relevant. > I never installed them before and did so now for the first time for > testing, but they installed so many libraries that i feel uncomfortable > and unsafe using

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Nicholas Marriott
Hi On Fri, May 19, 2017 at 10:23:08PM +0200, Ingo Schwarze wrote: > Hi Nicholas, > > Nicholas Marriott wrote on Fri, May 19, 2017 at 07:04:53PM +0100: > > > Perhaps I haven't understood what you are saying correctly, > > What matters most is that sending an incomplete character > followed by

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Ingo Schwarze
Hi Nicholas, Nicholas Marriott wrote on Fri, May 19, 2017 at 07:04:53PM +0100: > Perhaps I haven't understood what you are saying correctly, What matters most is that sending an incomplete character followed by U+0008 (ASCII BACKSPACE) is a no-op, both in the sense that it doesn't change the

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Ingo Schwarze
Hi Nicholas, Nicholas Marriott wrote on Fri, May 19, 2017 at 07:27:36PM +0100: > ksh has problems for me with Anton's example in several terminals, > not just in tmux. Mostly the cursor seems to end up one character > off rather than in the prompt, which is less visibly incorrect > perhaps, but

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Nicholas Marriott
ksh has problems for me with Anton's example in several terminals, not just in tmux. Mostly the cursor seems to end up one character off rather than in the prompt, which is less visibly incorrect perhaps, but still wrong. I don't know that ksh will be able to predict this reliably (not uncommon

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Nicholas Marriott
Hi Perhaps I haven't understood what you are saying correctly, but I don't think it is possible to send control characters or any other invalid UTF-8 bytes inside UTF-8 characters and safely predict what the terminal will do. How about these examples: printf '\343\203\010\217a\n' printf

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Ingo Schwarze
Hi Anton, Anton Lindqvist wrote on Fri, May 19, 2017 at 08:42:05AM +0200: > 1. Run ksh under tmux. > > 2. Input the following characters, without spaces: > > a (any character) ^B (backward-char) ö (any UTF-8 character) > > 3. At this point, the prompt gets overwritten. > > Since ksh read

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Anton Lindqvist
On Fri, May 19, 2017 at 09:33:33AM -0300, Lucas Gabriel Vuotto wrote: > Hi, > > On 19/05/17 03:42, Anton Lindqvist wrote: > > Hi, > > I did submit this problem[1] earlier but with an incomplete analysis and > > fix. Here's a second attempt. > > > > This does only occur when running ksh with

Re: ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Lucas Gabriel Vuotto
Hi, On 19/05/17 03:42, Anton Lindqvist wrote: > Hi, > I did submit this problem[1] earlier but with an incomplete analysis and > fix. Here's a second attempt. > > This only occurs when running ksh with emacs mode under tmux. How to > reproduce: > > 1. Run ksh under tmux. > > 2. Input the

ksh(1): don't output invalid UTF-8 characters

2017-05-19 Thread Anton Lindqvist
Hi, I did submit this problem[1] earlier but with an incomplete analysis and fix. Here's a second attempt. This only occurs when running ksh with emacs mode under tmux. How to reproduce: 1. Run ksh under tmux. 2. Input the following characters, without spaces: a (any character) ^B