subject:"rev\(1\)\: pull MB_CUR_MAX out of the hot loop"

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-12 Thread Ingo Schwarze

Hi Martijn,

Martijn van Duren wrote on Sun, Jan 09, 2022 at 08:09:20AM +0100:

> I fully agree with your reasoning and also prefer this one over the
> previous two diff.
> 
> OK martijn@

Thanks for checking; committed.
Also thanks to cheloha@ for his work on this issue
and to milllert@ for his feedback.
  Ingo


> On Sun, 2022-01-09 at 00:45 +0100, Ingo Schwarze wrote:

>> Index: rev.c
>> ===
>> RCS file: /cvs/src/usr.bin/rev/rev.c,v
>> retrieving revision 1.13
>> diff -u -p -r1.13 rev.c
>> --- rev.c10 Apr 2016 17:06:52 -  1.13
>> +++ rev.c8 Jan 2022 23:19:46 -
>> @@ -46,13 +46,14 @@ void usage(void);
>>  int
>>  main(int argc, char *argv[])
>>  {
>> -char *filename, *p = NULL, *t, *u;
>> +char *filename, *p = NULL, *t, *te, *u;
>>  FILE *fp;
>>  ssize_t len;
>>  size_t ps = 0;
>> -int ch, rval;
>> +int ch, multibyte, rval;
>>  
>>  setlocale(LC_CTYPE, "");
>> +multibyte = MB_CUR_MAX > 1;
>>  
>>  if (pledge("stdio rpath", NULL) == -1)
>>  err(1, "pledge");
>> @@ -83,14 +84,16 @@ main(int argc, char *argv[])
>>  if (p[len - 1] == '\n')
>>  --len;
>>  for (t = p + len - 1; t >= p; --t) {
>> -if (isu8cont(*t))
>> -continue;
>> -u = t;
>> -do {
>> -putchar(*u);
>> -} while (isu8cont(*(++u)));
>> +te = t;
>> +if (multibyte)
>> +while (t > p && isu8cont(*t))
>> +--t;
>> +for (u = t; u <= te; ++u)
>> +if (putchar(*u) == EOF)
>> +err(1, "stdout");
>>  }
>> -putchar('\n');
>> +if (putchar('\n') == EOF)
>> +err(1, "stdout");
>>  }
>>  if (ferror(fp)) {
>>  warn("%s", filename);
>> @@ -104,7 +107,7 @@ main(int argc, char *argv[])
>>  int
>>  isu8cont(unsigned char c)
>>  {
>> -return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
>> +return (c & (0x80 | 0x40)) == 0x80;
>>  }
>>  
>>  void

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Martijn van Duren

I fully agree with your reasoning and also prefer this one over the
previous two diff.

OK martijn@

On Sun, 2022-01-09 at 00:45 +0100, Ingo Schwarze wrote:
> Hi,
> 
> Martijn van Duren wrote on Sat, Jan 08, 2022 at 08:30:20AM +0100:
> 
> > Why not go for the following diff?
> > It has a comparable speed increase, but without the added complexity
> > of a second inner loop.
> 
> Actually, i like the idea of not duplicating the loop, in cases where
> that is possible without a noticeable performance cost.
> 
> I admit i OK'ed the original UTF-8 diff almost six years ago,
> but looking at the code again, i now dislike how it is testing
> every byte twice for no apparent reason.  Even worse, i regard
> the lack of I/O error checking on write operations as a bug.
> Note that it does make sense to proceed to the next file on *read*
> errors, whereas a *write* error should better be fatal.
> 
> For those reason, and because i prefer versions of isu8cont()
> that do not inspect the locale, i propose the following patch
> instead.
> 
> As an aside, i tried using fwrite(3) instead of the manual
> putchar(*u) loop, but that is almost exactly as slow as repeatedly
> calling MB_CUR_MAX, probably due to either locking or orientation
> setting or some aspect of iov handling or buffering or more than
> one of those; i refrained from profiling it.
> 
>  - 8< - schnipp - >8 - 8< - schnapp - >8 -
> 
> Index: rev.c
> ===
> RCS file: /cvs/src/usr.bin/rev/rev.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 rev.c
> --- rev.c 10 Apr 2016 17:06:52 -  1.13
> +++ rev.c 8 Jan 2022 23:19:46 -
> @@ -46,13 +46,14 @@ void usage(void);
>  int
>  main(int argc, char *argv[])
>  {
> - char *filename, *p = NULL, *t, *u;
> + char *filename, *p = NULL, *t, *te, *u;
>   FILE *fp;
>   ssize_t len;
>   size_t ps = 0;
> - int ch, rval;
> + int ch, multibyte, rval;
>  
>   setlocale(LC_CTYPE, "");
> + multibyte = MB_CUR_MAX > 1;
>  
>   if (pledge("stdio rpath", NULL) == -1)
>   err(1, "pledge");
> @@ -83,14 +84,16 @@ main(int argc, char *argv[])
>   if (p[len - 1] == '\n')
>   --len;
>   for (t = p + len - 1; t >= p; --t) {
> - if (isu8cont(*t))
> - continue;
> - u = t;
> - do {
> - putchar(*u);
> - } while (isu8cont(*(++u)));
> + te = t;
> + if (multibyte)
> + while (t > p && isu8cont(*t))
> + --t;
> + for (u = t; u <= te; ++u)
> + if (putchar(*u) == EOF)
> + err(1, "stdout");
>   }
> - putchar('\n');
> + if (putchar('\n') == EOF)
> + err(1, "stdout");
>   }
>   if (ferror(fp)) {
>   warn("%s", filename);
> @@ -104,7 +107,7 @@ main(int argc, char *argv[])
>  int
>  isu8cont(unsigned char c)
>  {
> - return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
> + return (c & (0x80 | 0x40)) == 0x80;
>  }
>  
>  void
> 
>  - 8< - schnipp - >8 - 8< - schnapp - >8 -
> 
> The rest of this mail contains test reports.
> These test reports use two test programs:
> 
>  - 8< - schnipp - >8 - 8< - schnapp - >8 -
> 
> /* makeutf8.c */
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int
> main(void)
> {
>   FILE*in, *out;
>   char*line = NULL;
>   size_t   linesize = 0;
>   unsigned int wc, wcl = 0;
> 
>   if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
>   err(1, "setlocale");
> 
>   if ((in = fopen("/usr/src/gnu/usr.bin/perl/lib/unicore/"
>   "UnicodeData.txt", "r")) == NULL)
>   err(1, "in");
>   if ((out = fopen("utf8.txt", "w")) == NULL)
>   err(1, "out");
> 
>   while (getline(, , in) != -1) {
>   if (sscanf(line, "%x;", ) < 1)
>   err(1, "scanf: %s", line);
>   if (wc > wcl + 1)
>   if (fputwc(L'\n', out) == WEOF)
>   err(1, "write EOL");
>   if (fputwc(wc, out) == WEOF) {
>   if (errno == EILSEQ)
>   warn("write U+%04x", wc);
>   else
>   err(1, "write U+%04x", wc);
>   }
>   wcl = wc;
>   }
>   if (ferror(in))
>   errx(1, "read error");
>

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Ingo Schwarze

Oopsie...

Ingo Schwarze wrote on Sun, Jan 09, 2022 at 12:45:27AM +0100:

> # Demonstate broken error handling in the old code.
> 
>$ ./obj/wrap /obin/rev   
>$ echo $?  
>   42

Stupid me, this is what i meant, of course:

   $ ./obj/wrap /oldbin/rev
   $ echo $?
  0

The point is that the -current rev(1) silently ignores write errors
*and returns indicating success* even though it produced no output or
incomplete output.  But i can't show that if i mistype the path to
the copy of the old binary that i saved earlier...

Yours,
  Ingo

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Ingo Schwarze

Hi,

Martijn van Duren wrote on Sat, Jan 08, 2022 at 08:30:20AM +0100:

> Why not go for the following diff?
> It has a comparable speed increase, but without the added complexity
> of a second inner loop.

Actually, i like the idea of not duplicating the loop, in cases where
that is possible without a noticeable performance cost.

I admit i OK'ed the original UTF-8 diff almost six years ago,
but looking at the code again, i now dislike how it is testing
every byte twice for no apparent reason.  Even worse, i regard
the lack of I/O error checking on write operations as a bug.
Note that it does make sense to proceed to the next file on *read*
errors, whereas a *write* error should better be fatal.

For those reason, and because i prefer versions of isu8cont()
that do not inspect the locale, i propose the following patch
instead.

As an aside, i tried using fwrite(3) instead of the manual
putchar(*u) loop, but that is almost exactly as slow as repeatedly
calling MB_CUR_MAX, probably due to either locking or orientation
setting or some aspect of iov handling or buffering or more than
one of those; i refrained from profiling it.

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   8 Jan 2022 23:19:46 -
@@ -46,13 +46,14 @@ void usage(void);
 int
 main(int argc, char *argv[])
 {
-   char *filename, *p = NULL, *t, *u;
+   char *filename, *p = NULL, *t, *te, *u;
FILE *fp;
ssize_t len;
size_t ps = 0;
-   int ch, rval;
+   int ch, multibyte, rval;
 
setlocale(LC_CTYPE, "");
+   multibyte = MB_CUR_MAX > 1;
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -83,14 +84,16 @@ main(int argc, char *argv[])
if (p[len - 1] == '\n')
--len;
for (t = p + len - 1; t >= p; --t) {
-   if (isu8cont(*t))
-   continue;
-   u = t;
-   do {
-   putchar(*u);
-   } while (isu8cont(*(++u)));
+   te = t;
+   if (multibyte)
+   while (t > p && isu8cont(*t))
+   --t;
+   for (u = t; u <= te; ++u)
+   if (putchar(*u) == EOF)
+   err(1, "stdout");
}
-   putchar('\n');
+   if (putchar('\n') == EOF)
+   err(1, "stdout");
}
if (ferror(fp)) {
warn("%s", filename);
@@ -104,7 +107,7 @@ main(int argc, char *argv[])
 int
 isu8cont(unsigned char c)
 {
-   return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
+   return (c & (0x80 | 0x40)) == 0x80;
 }
 
 void

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

The rest of this mail contains test reports.
These test reports use two test programs:

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

/* makeutf8.c */

#include 
#include 
#include 
#include 
#include 

int
main(void)
{
FILE*in, *out;
char*line = NULL;
size_t   linesize = 0;
unsigned int wc, wcl = 0;

if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
err(1, "setlocale");

if ((in = fopen("/usr/src/gnu/usr.bin/perl/lib/unicore/"
"UnicodeData.txt", "r")) == NULL)
err(1, "in");
if ((out = fopen("utf8.txt", "w")) == NULL)
err(1, "out");

while (getline(, , in) != -1) {
if (sscanf(line, "%x;", ) < 1)
err(1, "scanf: %s", line);
if (wc > wcl + 1)
if (fputwc(L'\n', out) == WEOF)
err(1, "write EOL");
if (fputwc(wc, out) == WEOF) {
if (errno == EILSEQ)
warn("write U+%04x", wc);
else
err(1, "write U+%04x", wc);
}
wcl = wc;
}
if (ferror(in))
errx(1, "read error");
if (fputwc(L'\n', out) == WEOF)
err(1, "write final EOL");

fclose(out);
fclose(in);
return 0;
}

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

/* wrap.c */

#include 

int
main(int argc, char *argv[])
{

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Todd C . Miller

On Sat, 08 Jan 2022 08:30:20 +0100, Martijn van Duren wrote:

> Why not go for the following diff?
> It has a comparable speed increase, but without the added complexity of
> a second inner loop.

Personally, that is what I would have done as well, but I am OK
with either direction.

 - todd

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Martijn van Duren

On Fri, 2022-01-07 at 15:00 -0600, Scott Cheloha wrote:
> On Fri, Jan 07, 2022 at 01:43:24PM -0600, Scott Cheloha wrote:
> > 
> > [...]
> > 
> > Like this?
> > 
> > [...]
> 
> Updated: make the for-loop update expressions match.
> 
Why not go for the following diff?
It has a comparable speed increase, but without the added complexity of
a second inner loop.

Either diff is fine by me though, so if you want to go ahead with your
diff: OK martijn@

$ export LC_ALL=C
$ for i in $(jot 10); do time rev /usr/share/dict/words > /dev/null; done
0m00.17s real 0m00.16s user 0m00.02s system
0m00.17s real 0m00.17s user 0m00.01s system
0m00.17s real 0m00.17s user 0m00.01s system
0m00.17s real 0m00.18s user 0m00.00s system
0m00.17s real 0m00.17s user 0m00.01s system
0m00.20s real 0m00.16s user 0m00.03s system
0m00.23s real 0m00.16s user 0m00.01s system
0m00.17s real 0m00.18s user 0m00.00s system
0m00.17s real 0m00.21s user 0m00.00s system
0m00.17s real 0m00.16s user 0m00.01s system
$ for i in $(jot 10); do time ./obj/rev.scott /usr/share/dict/words > 
/dev/null; done
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.02s system
0m00.06s real 0m00.04s user 0m00.02s system
0m00.05s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.05s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.03s user 0m00.01s system
0m00.04s real 0m00.05s user 0m00.00s system
$ for i in $(jot 10); do time ./obj/rev.martijn /usr/share/dict/words > 
/dev/null; done
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.03s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.06s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.05s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.01s system
$ export LC_ALL=en_US.UTF-8
$ for i in $(jot 10); do time rev /usr/share/dict/words > /dev/null; done
0m00.17s real 0m00.17s user 0m00.00s system
0m00.17s real 0m00.18s user 0m00.00s system
0m00.17s real 0m00.17s user 0m00.00s system
0m00.17s real 0m00.18s user 0m00.01s system
0m00.17s real 0m00.17s user 0m00.01s system
0m00.17s real 0m00.18s user 0m00.00s system
0m00.17s real 0m00.17s user 0m00.00s system
0m00.17s real 0m00.17s user 0m00.02s system
0m00.17s real 0m00.17s user 0m00.01s system
0m00.17s real 0m00.18s user 0m00.00s system
$ for i in $(jot 10); do time ./obj/rev.scott /usr/share/dict/words > 
/dev/null; done
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.05s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.00s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.01s system
0m00.04s real 0m00.04s user 0m00.00s system
$ for i in $(jot 10); do time ./obj/rev.martijn /usr/share/dict/words > 
/dev/null; done
0m00.04s real 0m00.04s user 0m00.01s system
0m00.05s real 0m00.04s user 0m00.00s system
0m00.05s real 0m00.05s user 0m00.00s system
0m00.05s real 0m00.03s user 0m00.02s system
0m00.05s real 0m00.05s user 0m00.00s system
0m00.05s real 0m00.05s user 0m00.00s system
0m00.05s real 0m00.05s user 0m00.00s system
0m00.05s real 0m00.04s user 0m00.01s system
0m00.05s real 0m00.04s user 0m00.01s system
0m00.05s real 0m00.05s user 0m00.00s system

martijn@

ps. I don't have your fancy nanotime. Where can I find that?

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   8 Jan 2022 07:09:19 -
@@ -43,6 +43,8 @@
 int isu8cont(unsigned char);
 void usage(void);
 
+int multibyte;
+
 int
 main(int argc, char *argv[])
 {
@@ -53,6 +55,7 @@ main(int argc, char *argv[])
int ch, rval;
 
setlocale(LC_CTYPE, "");
+   multibyte = MB_CUR_MAX > 1;
 
if (pledge("stdio rpath",

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Todd C . Miller

On Fri, 07 Jan 2022 15:00:52 -0600, Scott Cheloha wrote:

> Updated: make the for-loop update expressions match.

Looks good to me.  OK millert@

 - todd

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Scott Cheloha

On Fri, Jan 07, 2022 at 01:43:24PM -0600, Scott Cheloha wrote:
>
> [...]
> 
> Like this?
> 
> [...]

Updated: make the for-loop update expressions match.

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   7 Jan 2022 21:00:11 -
@@ -50,9 +50,10 @@ main(int argc, char *argv[])
FILE *fp;
ssize_t len;
size_t ps = 0;
-   int ch, rval;
+   int ch, multibyte, rval;
 
setlocale(LC_CTYPE, "");
+   multibyte = MB_CUR_MAX > 1;
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -82,13 +83,18 @@ main(int argc, char *argv[])
while ((len = getline(, , fp)) != -1) {
if (p[len - 1] == '\n')
--len;
-   for (t = p + len - 1; t >= p; --t) {
-   if (isu8cont(*t))
-   continue;
-   u = t;
-   do {
-   putchar(*u);
-   } while (isu8cont(*(++u)));
+   if (multibyte) {
+   for (t = p + len - 1; t >= p; t--) {
+   if (isu8cont(*t))
+   continue;
+   u = t;
+   do {
+   putchar(*u);
+   } while (isu8cont(*(++u)));
+   }
+   } else {
+   for (t = p + len - 1; t >= p; t--)
+   putchar(*t);
}
putchar('\n');
}
@@ -104,7 +110,7 @@ main(int argc, char *argv[])
 int
 isu8cont(unsigned char c)
 {
-   return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
+   return (c & (0x80 | 0x40)) == 0x80;
 }
 
 void

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Scott Cheloha

On Fri, Jan 07, 2022 at 07:28:30PM +0100, Ingo Schwarze wrote:
> 
> [...]
> 
> Scott Cheloha wrote on Fri, Jan 07, 2022 at 09:45:41AM -0600:
> 
> > In rev(1), we call MB_CUR_MAX for every byte in the input stream.
> > This is extremely expensive.
> 
> [...]
> 
> So i agree with jour general idea to improve rev(1) and other
> small utility programs that might have similar issues.
> 
> > It is much cheaper to call it once per line and use a simpler loop
> > (see the inlined patch below) if the current locale doesn't handle
> > multibyte characters:
> 
> [useful measurements snipped]
> 
> > CC schwarze@ to double-check I'm not misunderstanding MB_CUR_MAX.
> > I'm under the impression the return value cannot change unless
> > we call setlocale(3).
> 
> Mostly correct, except that uselocale(3) may also change it.
> But of course, we don't call uselocale(3) in small utility programs.
> 
> > ok?
> 
> I think your patch can still be improved a bit before committing.
> 
> millert@ wrote:
> 
> > Why not just store the result of MB_CUR_MAX in a variable and use
> > that?  It's value is not going to change during execution of the
> > program.  This made a big difference in sort(1) once upon a time.
> 
> I agree with that.
> 
> Even with your patch, rev(1) would still be slow for large files
> consisting of very short lines.

Like this?

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   7 Jan 2022 19:41:23 -
@@ -50,9 +50,10 @@ main(int argc, char *argv[])
FILE *fp;
ssize_t len;
size_t ps = 0;
-   int ch, rval;
+   int ch, rval, multibyte;
 
setlocale(LC_CTYPE, "");
+   multibyte = MB_CUR_MAX > 1;
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -82,13 +83,18 @@ main(int argc, char *argv[])
while ((len = getline(, , fp)) != -1) {
if (p[len - 1] == '\n')
--len;
-   for (t = p + len - 1; t >= p; --t) {
-   if (isu8cont(*t))
-   continue;
-   u = t;
-   do {
-   putchar(*u);
-   } while (isu8cont(*(++u)));
+   if (multibyte) {
+   for (t = p + len - 1; t >= p; --t) {
+   if (isu8cont(*t))
+   continue;
+   u = t;
+   do {
+   putchar(*u);
+   } while (isu8cont(*(++u)));
+   }
+   } else {
+   for (t = p + len - 1; t >= p; t--)
+   putchar(*t);
}
putchar('\n');
}
@@ -104,7 +110,7 @@ main(int argc, char *argv[])
 int
 isu8cont(unsigned char c)
 {
-   return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
+   return (c & (0x80 | 0x40)) == 0x80;
 }
 
 void

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Ingo Schwarze

Hi Scott,

thank you for spending quite some work on our small utility programs,
and sorry for failing to follow up as often as i should.

Scott Cheloha wrote on Fri, Jan 07, 2022 at 09:45:41AM -0600:

> In rev(1), we call MB_CUR_MAX for every byte in the input stream.
> This is extremely expensive.

That's actually quite ridiculous.

The reason why MB_CUR_MAX is so expensive is -- multi-thread support.
That is of course completely pointless for small utility programs,
on any operating system.  And on OpenBSD in particular, it is
mostly useless even for large application programs that actually
use multiple threads because we basically only support one single
locale in the first place.

So i had a brief look at the implementation of MB_CUR_MAX in our libc
to see whether it could be made less expensive, but i don't see a
simple way without potentially breaking some application programs
(for example, consider a third-party webserver that wants to run
with a POSIX locale in one thread but with a UTF-8 locale in
another thread).

Apart from that, even if it could be made less expensive, avoiding
to call it in tight loops might still be valuable in the interest
of portability.  On other systems, it must necessarily be expensive:
that web server might want to run one thread with UTF-8, one with
UTF-16, one with UTF-32, and one with Shift JIS...

So i agree with jour general idea to improve rev(1) and other
small utility programs that might have similar issues.

> It is much cheaper to call it once per line and use a simpler loop
> (see the inlined patch below) if the current locale doesn't handle
> multibyte characters:

[useful measurements snipped]

> CC schwarze@ to double-check I'm not misunderstanding MB_CUR_MAX.
> I'm under the impression the return value cannot change unless
> we call setlocale(3).

Mostly correct, except that uselocale(3) may also change it.
But of course, we don't call uselocale(3) in small utility programs.

> ok?

I think your patch can still be improved a bit before committing.

millert@ wrote:

> Why not just store the result of MB_CUR_MAX in a variable and use
> that?  It's value is not going to change during execution of the
> program.  This made a big difference in sort(1) once upon a time.

I agree with that.

Even with your patch, rev(1) would still be slow for large files
consisting of very short lines.

> Index: rev.c
> ===
> RCS file: /cvs/src/usr.bin/rev/rev.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 rev.c
> --- rev.c 10 Apr 2016 17:06:52 -  1.13
> +++ rev.c 7 Jan 2022 15:38:39 -
> @@ -82,13 +82,18 @@ main(int argc, char *argv[])
>   while ((len = getline(, , fp)) != -1) {
>   if (p[len - 1] == '\n')
>   --len;
> - for (t = p + len - 1; t >= p; --t) {
> - if (isu8cont(*t))
> - continue;
> - u = t;
> - do {
> - putchar(*u);
> - } while (isu8cont(*(++u)));
> + if (MB_CUR_MAX == 1) {
> + for (t = p + len - 1; t >= p; t--)
> + putchar(*t);
> + } else {
> + for (t = p + len - 1; t >= p; --t) {

I dislike how you use t-- in one loop but --t in the other.
It made me wonder for a moment whether that needs to be different
depending on the character set.  Wouldn't it be less confusing
to pick one of the forms and stick to it?

> + if (isu8cont(*t))
> + continue;
> + u = t;
> + do {
> + putchar(*u);
> + } while (isu8cont(*(++u)));
> + }
>   }
>   putchar('\n');
>   }
> @@ -104,7 +109,7 @@ main(int argc, char *argv[])
>  int
>  isu8cont(unsigned char c)
>  {
> - return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
> + return (c & (0x80 | 0x40)) == 0x80;
>  }

The isu8cont() function is a semi-standard function used in many
OpenBSD utility programs, originally designed by tedu@, i think,
and it has proven highly useful to get UTF-8 support going.

But back in the day, i think we failed to realize that MB_CUR_MAX
is expensive.  In that sense, it is probably somewhat ill-designed:
it is typically called in tight loops, so i guess there aren't
many use cases where the original version is really that good.

Given that ksh(1) and ypldap(8) already use versions of this
function identical to your version, i'm OK with your suggestion
of keeping the name while changing the semantics of the function,

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Todd C . Miller

Why not just store the result of MB_CUR_MAX in a variable and use
that?  It's value is not going to change during execution of the
program.  This made a big difference in sort(1) once upon a time.

 - todd

rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Scott Cheloha

In rev(1), we call MB_CUR_MAX for every byte in the input stream.
This is extremely expensive.

It is much cheaper to call it once per line and use a simpler loop
(see the inlined patch below) if the current locale doesn't handle
multibyte characters:

# -current rev(1),
$ for i in $(jot 10); do nanotime rev /usr/share/dict/words > /dev/null; done
0.136695757 real 0.130 user 0.010 sys
0.075677725 real 0.060 user 0.000 sys
0.075275764 real 0.070 user 0.000 sys
0.075140009 real 0.070 user 0.000 sys
0.075186630 real 0.070 user 0.010 sys
0.075256959 real 0.080 user 0.000 sys
0.076920069 real 0.080 user 0.000 sys
0.075097523 real 0.060 user 0.010 sys
0.075369093 real 0.070 user 0.000 sys
0.075532266 real 0.070 user 0.000 sys

# patched rev(1)
$ for i in $(jot 10); do nanotime obj/rev /usr/share/dict/words > /dev/null; 
done
0.068547813 real 0.060 user 0.010 sys
0.022880303 real 0.020 user 0.000 sys
0.022530839 real 0.020 user 0.000 sys
0.022801439 real 0.020 user 0.000 sys
0.022595941 real 0.020 user 0.000 sys
0.022768434 real 0.020 user 0.000 sys
0.022536526 real 0.020 user 0.000 sys
0.022611791 real 0.020 user 0.000 sys
0.022943240 real 0.020 user 0.000 sys
0.022329260 real 0.020 user 0.000 sys

Over 3 times as fast as -current rev(1).

Multibyte locales also benefit:

# patched rev(1), LC_CTYPE=en_US.UTF-8
for i in $(jot 10); do LC_CTYPE=en_US.UTF-8 nanotime obj/rev 
/usr/share/dict/words >/dev/null; done
0.088514093 real 0.070 user 0.010 sys
0.026025780 real 0.020 user 0.000 sys
0.025384554 real 0.020 user 0.000 sys
0.025385471 real 0.020 user 0.000 sys
0.025474371 real 0.020 user 0.010 sys
0.025685188 real 0.030 user 0.000 sys
0.02518 real 0.030 user 0.000 sys
0.025783925 real 0.020 user 0.000 sys
0.025348339 real 0.020 user 0.000 sys
0.025672734 real 0.020 user 0.010 sys

About 3 times as fast as -current rev(1).

CC schwarze@ to double-check I'm not misunderstanding MB_CUR_MAX.  I'm
under the impression the return value cannot change unless we call
setlocale(3).

ok?

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   7 Jan 2022 15:38:39 -
@@ -82,13 +82,18 @@ main(int argc, char *argv[])
while ((len = getline(, , fp)) != -1) {
if (p[len - 1] == '\n')
--len;
-   for (t = p + len - 1; t >= p; --t) {
-   if (isu8cont(*t))
-   continue;
-   u = t;
-   do {
-   putchar(*u);
-   } while (isu8cont(*(++u)));
+   if (MB_CUR_MAX == 1) {
+   for (t = p + len - 1; t >= p; t--)
+   putchar(*t);
+   } else {
+   for (t = p + len - 1; t >= p; --t) {
+   if (isu8cont(*t))
+   continue;
+   u = t;
+   do {
+   putchar(*u);
+   } while (isu8cont(*(++u)));
+   }
}
putchar('\n');
}
@@ -104,7 +109,7 @@ main(int argc, char *argv[])
 int
 isu8cont(unsigned char c)
 {
-   return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
+   return (c & (0x80 | 0x40)) == 0x80;
 }
 
 void

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

Re: rev(1): pull MB_CUR_MAX out of the hot loop

rev(1): pull MB_CUR_MAX out of the hot loop

12 matches

Site Navigation

Mail list logo

Footer information