less(1) UTF-8 bugfix and cleanup: search.c

Ingo Schwarze Thu, 14 Mar 2019 08:05:29 -0700

Hi,

the following is a very simple patch to completely clean up the
file less/search.c with respect to UTF-8 handling.  It also fixes
an outright bug: searching for uppercase UTF-8 characters currently
doesn't work because passing a Unicode codepoint (in this case, the
"ch" retrieved with step_char()) to isupper(3) is just wrong.  That
function is only defined for arguments that are representable as an
unsigned char or equal to EOF.


The new loop is fairly standard.  Invalid bytes are simply skipped.
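
For readers who want to try the loop outside of less(1), here is a
minimal standalone sketch of the same idea.  The name has_upper() is
hypothetical and only mirrors is_ucase() from the patch below; the
assert() is an addition for illustration.  It assumes the caller has
already selected a locale with setlocale(LC_CTYPE, ""), as mbtowc(3)
decodes according to the current LC_CTYPE.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Return 1 if the NUL-terminated multibyte string contains at least
 * one uppercase character according to the current LC_CTYPE locale,
 * 0 otherwise.  has_upper() is a hypothetical name for illustration.
 */
static int
has_upper(const char *str)
{
	wchar_t ch;
	int len;

	assert(str != NULL);
	for (; *str != '\0'; str += len) {
		if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
			/* Invalid sequence: reset the state, skip one byte. */
			mbtowc(NULL, NULL, MB_CUR_MAX);
			len = 1;
		} else if (iswupper(ch))
			return 1;
	}
	return 0;
}
```

With a UTF-8 locale active, a string containing a non-ASCII uppercase
letter should be detected, whereas a byte that does not start a valid
character is stepped over one byte at a time and never reaches
iswupper(3).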

OK?
  Ingo


P.S.
I'm sending this even though my pappend() patch
  https://marc.info/?l=openbsd-tech&m=155249735725712
is still looking for review.

The two patches touch different files and are independent of each other.
The cleanup is now maybe about half done, with only one or two
functions remaining in the most affected file, line.c, but the files
cmdbuf.c, cvt.c, and filename.c still remain, so I might have to
speed up a bit to get it done before release comes too close.


Index: search.c
===================================================================
RCS file: /cvs/src/usr.bin/less/search.c,v
retrieving revision 1.19
diff -u -p -r1.19 search.c
--- search.c    2 Aug 2017 19:35:57 -0000       1.19
+++ search.c    14 Mar 2019 13:48:59 -0000
@@ -75,12 +75,14 @@ static struct pattern_info filter_info;
 static int
 is_ucase(char *str)
 {
-       char *str_end = str + strlen(str);
-       LWCHAR ch;
+       wchar_t ch;
+       int len;
 
-       while (str < str_end) {
-               ch = step_char(&str, +1, str_end);
-               if (isupper(ch))
+       for (; *str != '\0'; str += len) {
+               if ((len = mbtowc(&ch, str, MB_CUR_MAX)) == -1) {
+                       mbtowc(NULL, NULL, MB_CUR_MAX);
+                       len = 1;
+               } else if (iswupper(ch))
                        return (1);
        }
        return (0);
