Ingo Schwarze <schwa...@usta.de> wrote:
> Hi,

Hello,
> Which problem needs fixing:
> Of the four-byte UTF-8 sequences, only a subset is identified by the
> existing code. The other four-byte UTF-8 sequences still get chopped
> up, resulting in individual bytes being passed on.
>
> I'm also adding a few comments as suggested by jca@. Parsing of UTF-8
> is less trivial than one might think, witnessed once again by the fact
> that I got this code wrong in the first place.
>
> I also changed "cc & 0x20" to "cc > 0x9f" and "cc & 0x30" to "cc > 0x8f"
> for uniformity and readability - UTF-8 parsing is bad enough without
> needless micro-optimization, right?

Nice, I wasn't aware that you also had a patch ready. It sounds good to
me and also fixes the problem I originally experienced with four-byte
UTF-8 sequences.

> Note that even with the patch below, moving backward and forward
> over a blowfish icon on the command line still does not work because
> the character is width 2 and the ksh code intentionally does not
> use wcwidth(3). But maybe it improves something in tmux? Not sure.

Moving the cursor over emojis (e.g. U+1F421) is currently broken
because the ksh code doesn't correctly determine the number of columns
a given character occupies (which is what you would normally use
wcwidth(3) for). I tried fixing this, but without wchar.h doing so
seemed very cumbersome. Inputting emojis, on the other hand, works with
your patch and was broken previously.

> Either way, unless it causes regressions, this (or a further improved
> version) should go in because what is there is clearly wrong.
>
> OK?

Your diff looks good to me.

BTW, is there any reason why ksh doesn't use editline for all its
line-editing needs? That would allow handling all these nitty-gritty
details in one central place.

Greetings,
Sören
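The second-byte checks discussed in this thread can be illustrated with a minimal C sketch. The function utf8_seq_len() below is hypothetical and is not the ksh code; it assumes the well-formed byte ranges from RFC 3629 and returns the length of a valid sequence, or 0 for malformed or truncated input. Because a continuation byte always lies in 0x80..0xbf, "cc & 0x20" is nonzero exactly when "cc > 0x9f", and "cc & 0x30" is nonzero exactly when "cc > 0x8f", so the rewrite mentioned above does not change behavior. The extra checks for lead bytes 0xed and 0xf4 come from the RFC table and are an assumption about what a complete validator needs, not a description of the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the ksh code): return the length (1-4) of
 * the well-formed UTF-8 sequence starting at s, or 0 if the bytes are
 * malformed or the buffer of n bytes is too short.  The accepted byte
 * ranges follow the table in RFC 3629.
 */
static int
utf8_seq_len(const unsigned char *s, size_t n)
{
	unsigned char c, cc;
	int len, i;

	if (n == 0)
		return 0;
	c = s[0];
	if (c < 0x80)
		return 1;		/* plain ASCII */
	if (c < 0xc2 || c > 0xf4)
		return 0;		/* stray continuation, overlong 2-byte lead, or > U+10FFFF */
	len = c < 0xe0 ? 2 : c < 0xf0 ? 3 : 4;
	if (n < (size_t)len)
		return 0;		/* truncated sequence */

	/* Every trailing byte must be a continuation byte, 0x80..0xbf. */
	for (i = 1; i < len; i++)
		if ((s[i] & 0xc0) != 0x80)
			return 0;

	/* Some lead bytes further restrict the range of the second byte. */
	cc = s[1];
	if (c == 0xe0 && cc <= 0x9f)
		return 0;		/* overlong three-byte form */
	if (c == 0xed && cc > 0x9f)
		return 0;		/* UTF-16 surrogates U+D800..U+DFFF */
	if (c == 0xf0 && cc <= 0x8f)
		return 0;		/* overlong four-byte form */
	if (c == 0xf4 && cc > 0x8f)
		return 0;		/* code points beyond U+10FFFF */
	return len;
}
```

For example, the blowfish U+1F421 encodes as F0 9F 90 A1, for which the sketch returns 4, while the overlong four-byte form F0 8F BF BF is rejected with 0.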