Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-27 Thread Sören Tempel
Ingo Schwarze wrote:
> It would, and in principle, that would be an improvement.
> But i think editline(3) code quality is insufficient for use in a
> shell. It's all quite messy and hastily and sloppily written.
> I tried to polish some of it in the past, but got distracted,
> so editline(3)

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-27 Thread Theo de Raadt
> But i think editline(3) code quality is insufficient for use in a
> shell. It's all quite messy and hastily and sloppily written.
I also mistrust it. It may be tempting to use a library, but the shell is quite standalone code, and benefits from internal purpose-built code.

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-27 Thread Ingo Schwarze
Hi Soeren,
Soeren Tempel wrote on Mon, Jun 07, 2021 at 07:02:25PM +0200:
> Nice, wasn't aware that you also had a patch ready.
Yeah, that was due to the fact that we, as developers, often use the lists but sometimes also send comments and patches privately, to reduce the mail volume for

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-27 Thread Ingo Schwarze
Hi Jeremie,
Jeremie Courreges-Anglas wrote on Thu, Jun 03, 2021 at 11:17:08PM +0200:
> On Wed, Jun 02 2021, Ingo Schwarze wrote:
>> I'm also adding a few comments as suggested by jca@. Parsing of UTF-8
>> is less trivial than one might think, witnessed once again by the fact
>> that i got this

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-08 Thread Nicholas Marriott
Looks good to me, ok nicm
On Wed, Jun 02, 2021 at 09:00:16PM +0200, Ingo Schwarze wrote:
> Hi,
>
> feeling hesitant to commit into ksh without at least one proper OK,
> i'm resending this patch here, sorry if i missed private feedback.
>
> What the existing code does:
> It tries to make sure

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-07 Thread ropers
Hiya,
@Ingo: Sorry I have been out of touch. I have arguably been out of sorts, though hopefully not out of order in your book.
> Index: emacs.c
> ===
> RCS file: /cvs/src/bin/ksh/emacs.c,v
> retrieving revision 1.87
> diff -u -p

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-07 Thread Sören Tempel
Ingo Schwarze wrote:
> Hi,
Hello,
> Which problem needs fixing:
> Of the four-byte UTF-8 sequences, only a subset is identified by the
> existing code. The other four-byte UTF-8 sequences still get chopped
> up resulting in individual bytes being passed on.
>
>
> I'm also adding a few

Re: Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-03 Thread Jeremie Courreges-Anglas
On Wed, Jun 02 2021, Ingo Schwarze wrote:
> Hi,
>
> feeling hesitant to commit into ksh without at least one proper OK,
> i'm resending this patch here, sorry if i missed private feedback.
I found two mails in my drafts folder, sorry, you didn't miss any feedback from me.
> What the existing

Patch: ksh: fix input handling for 4 byte UTF-8 sequences

2021-06-02 Thread Ingo Schwarze
Hi,
feeling hesitant to commit into ksh without at least one proper OK, i'm resending this patch here, sorry if i missed private feedback.
What the existing code does:
It tries to make sure that multi-byte UTF-8 characters get passed on by the shell without fragmentation, not one byte at a time.
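To illustrate what passing on a multi-byte character "without fragmentation" can mean, here is a minimal sketch in C. It is not the code from ksh's emacs.c and not the patch under discussion; the function name utf8_complete_len and its return convention are assumptions made for this example only. Given the bytes read so far, it reports how many of them may be handed on together, or 0 if more input is needed first.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from ksh.
 * Returns the number of bytes at the start of buf that can be passed
 * on as one unit, or 0 if the sequence is still incomplete and the
 * caller should read more input before passing anything on.
 */
static size_t
utf8_complete_len(const unsigned char *buf, size_t n)
{
	size_t need, i;

	if (n == 0)
		return 0;

	if (buf[0] < 0x80)
		need = 1;		/* 0xxxxxxx: ASCII */
	else if ((buf[0] & 0xE0) == 0xC0)
		need = 2;		/* 110xxxxx */
	else if ((buf[0] & 0xF0) == 0xE0)
		need = 3;		/* 1110xxxx */
	else if ((buf[0] & 0xF8) == 0xF0)
		need = 4;		/* 11110xxx */
	else
		return 1;		/* stray byte: pass it on alone */

	for (i = 1; i < need; i++) {
		if (i >= n)
			return 0;	/* incomplete: wait for more bytes */
		if ((buf[i] & 0xC0) != 0x80)
			return i;	/* broken sequence: pass on what we have */
	}
	return need;
}
```

This sketch only checks the bit patterns of lead and continuation bytes. It does not reject overlong encodings, surrogates, or code points above U+10FFFF, which a careful parser would also have to consider.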

Re: ksh: fix input handling for 4 byte UTF-8 sequences

2021-05-08 Thread Sören Tempel
Ping.
Sören Tempel wrote:
> Hello,
>
> Currently, ksh does not correctly calculate the length of 4 byte UTF-8
> sequences in emacs input mode. For demonstration purposes try inputting
> an emoji (e.g. U+1F421) at your shell prompt. These 4 byte sequences can
> be identified by checking if the

ksh: fix input handling for 4 byte UTF-8 sequences

2021-04-04 Thread Sören Tempel
Hello,
Currently, ksh does not correctly calculate the length of 4 byte UTF-8 sequences in emacs input mode. For demonstration purposes try inputting an emoji (e.g. U+1F421) at your shell prompt. These 4 byte sequences can be identified by checking if the first four bits are set and the fifth bit
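For readers unfamiliar with the encoding, the following C sketch shows how the length of a UTF-8 sequence can be derived from its first byte according to the standard bit patterns (a 4-byte sequence starts with a byte of the form 11110xxx). It is an illustration only, not the patch from this thread; the function name utf8_seq_len is an assumption made for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from ksh.
 * Returns the sequence length announced by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence.
 */
static size_t
utf8_seq_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;	/* 0xxxxxxx: ASCII */
	if ((lead & 0xE0) == 0xC0)
		return 2;	/* 110xxxxx */
	if ((lead & 0xF0) == 0xE0)
		return 3;	/* 1110xxxx */
	if ((lead & 0xF8) == 0xF0)
		return 4;	/* 11110xxx */
	return 0;		/* continuation byte (10xxxxxx) or 0xF8..0xFF */
}
```

The emoji mentioned above, U+1F421, is encoded as the bytes F0 9F 90 A1, so its lead byte 0xF0 yields a length of 4. Note that this pattern test is deliberately loose: bytes such as 0xC0, 0xC1, and 0xF5 to 0xF7 match a pattern here but never occur in well-formed UTF-8.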