Hi,

the following is a very simple patch to completely clean up the file less/search.c with respect to UTF-8 handling. It also fixes an outright bug: Searching for uppercase UTF-8 characters currently doesn't work because passing a Unicode codepoint (in this case, the "ch" retrieved with step_char()) to isupper(3) is just totally wrong: isupper(3) is only defined for EOF and for values representable as an unsigned char.
The new loop is fairly standard. Invalid bytes are simply skipped.

OK?
  Ingo

P.S.
I'm sending this even though my pappend() patch

  https://marc.info/?l=openbsd-tech&m=155249735725712

is still looking for review. Both are touching different files and are
independent of each other. The cleanup is now maybe about half done
with only one or two functions remaining in the most affected file
line.c, but the files cmdbuf.c, cvt.c, and filename.c still remain,
so i might have to speed up a bit to get it done before release comes
too close.


Index: search.c
===================================================================
RCS file: /cvs/src/usr.bin/less/search.c,v
retrieving revision 1.19
diff -u -p -r1.19 search.c
--- search.c	2 Aug 2017 19:35:57 -0000	1.19
+++ search.c	14 Mar 2019 13:48:59 -0000
@@ -75,12 +75,14 @@ static struct pattern_info filter_info;
 static int
 is_ucase(char *str)
 {
-	char *str_end = str + strlen(str);
-	LWCHAR ch;
+	wchar_t ch;
+	int len;
 
-	while (str < str_end) {
-		ch = step_char(&str, +1, str_end);
-		if (isupper(ch))
+	for (; *str != '\0'; str += len) {
+		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
+			mbtowc(NULL, NULL, MB_CUR_MAX);
+			len = 1;
+		} else if (iswupper(ch))
 			return (1);
 	}
 	return (0);
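
For anyone who wants to try the loop outside of less, here is a minimal standalone sketch of the same technique. The function name has_upper() is made up for illustration and is not part of the patch. It assumes the caller has already selected the desired LC_CTYPE with setlocale(3); without that, only ASCII letters are recognized.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical standalone version of the loop from the patch:
 * return 1 if str contains an uppercase character according to
 * the current LC_CTYPE, 0 otherwise.
 */
int
has_upper(const char *str)
{
	wchar_t ch;
	int len;

	for (; *str != '\0'; str += len) {
		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
			/* Invalid byte: reset the conversion state and skip it. */
			mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
		} else if (iswupper(ch))
			return 1;
	}
	return 0;
}
```

The loop condition stops at the terminating NUL, so mbtowc(3) never returns 0 inside the loop and len is always at least 1. After a failed conversion, the call with a NULL string resets the internal conversion state before the next byte is tried.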