Bad news everyone,

gyes from GNU coreutils in ports seems to top out at ~185 MiB/s when
writing to /dev/null on my otherwise idle laptop (per sysutils/pv from
ports):

$ doas nice -n -20 gyes | pv > /dev/null
 921MiB 0:00:05 [ 185MiB/s] [    <=>                                           ]
1.80GiB 0:00:10 [ 185MiB/s] [         <=>                                      ]
3.07GiB 0:00:17 [ 186MiB/s] [               <=>                                ]
4.34GiB 0:00:24 [ 185MiB/s] [                      <=>                         ]
5.06GiB 0:00:28 [ 185MiB/s] [                          <=>                     ]

Under the same conditions, our yes(1) tops out at a mere ~20 MiB/s:

$ doas nice -n -20 yes | pv > /dev/null  
 206MiB 0:00:10 [20.7MiB/s] [         <=>                                      ]
 414MiB 0:00:20 [20.7MiB/s] [                  <=>                             ]
 641MiB 0:00:31 [20.7MiB/s] [                             <=>                  ]
 828MiB 0:00:40 [20.7MiB/s] [                                     <=>          ]
1014MiB 0:00:49 [20.7MiB/s] [                                              <=> ]

Not great.  Not great at all.

Attached is a patch to improve our yes(1) throughput and perhaps
restore glory to src/usr.bin.  Basically we bypass stdio and write(2)
up to a page of the expletive all at once.  Or, if the expletive is
too long to fit in a page at least twice, we just write(2) it directly.
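
The packing arithmetic described above can be sketched on its own.
This is only an illustration: packed_buflen() and pack_lines() are
hypothetical helper names, not functions from the patch, and the page
size is passed in as a parameter rather than taken from getpagesize(3).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return how many bytes
 * of a page-sized buffer get filled with whole copies of a line of
 * exlen bytes (newline included).  Returns 0 when the line does not
 * fit at least twice, in which case it would be written directly.
 */
size_t
packed_buflen(size_t pagesize, size_t exlen)
{
	if (exlen == 0 || exlen > pagesize / 2)
		return 0;
	return pagesize / exlen * exlen;
}

/*
 * Fill buf (at least buflen bytes) with back-to-back copies of line.
 * buflen must be a multiple of exlen, as returned by packed_buflen().
 */
void
pack_lines(char *buf, size_t buflen, const char *line, size_t exlen)
{
	size_t off;

	for (off = 0; off < buflen; off += exlen)
		memcpy(buf + off, line, exlen);
}
```

Assuming a 4096-byte page, the default "y\n" (2 bytes) fills all 4096
bytes, so each write(2) carries 2048 lines.  A 3-byte line such as
"no\n" fills 4095 bytes (1365 copies), and a 3000-byte line is longer
than half a page, so it is written as-is.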

With the enclosed patch, OpenBSD yes(1) now tops out at ~211 MiB/s
under the aforementioned conditions:

$ doas nice -n -20 yes | pv > /dev/null 
1.04GiB 0:00:05 [ 211MiB/s] [    <=>                                           ]
2.07GiB 0:00:10 [ 212MiB/s] [         <=>                                      ]
3.10GiB 0:00:15 [ 211MiB/s] [              <=>                                 ]
4.14GiB 0:00:20 [ 211MiB/s] [                  <=>                             ]
5.18GiB 0:00:25 [ 211MiB/s] [                       <=>                        ]

It's possible sysutils/pv itself is a bottleneck here, but I think a
tenfold throughput improvement is probably pretty robust.  Also, there
may be a better buffer size, but this seems like a good enough place
to start.
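
For anyone curious about other buffer sizes, something like the
following rough sketch could be used to compare them.  time_writes()
is a hypothetical helper, not part of the patch.  It writes straight
to a file such as /dev/null instead of through a pipe to pv, so its
numbers are not comparable to the figures above; at best they hint at
how one buffer size fares relative to another.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer of buflen bytes to path count
 * times and return the total bytes written, or -1 on error.  The
 * elapsed wall-clock time in seconds is stored in *secs.
 */
long long
time_writes(const char *path, size_t buflen, long count, double *secs)
{
	struct timespec t0, t1;
	long long total = 0;
	size_t off;
	ssize_t nw;
	char *buf;
	long i;
	int fd;

	if ((buf = malloc(buflen)) == NULL)
		return -1;
	memset(buf, 'y', buflen);
	if ((fd = open(path, O_WRONLY)) == -1) {
		free(buf);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++) {
		/* Same short-write handling as the loop in the patch. */
		for (off = 0; off < buflen; off += nw) {
			nw = write(fd, buf + off, buflen - off);
			if (nw <= 0) {
				close(fd);
				free(buf);
				return -1;
			}
		}
		total += buflen;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	close(fd);
	free(buf);
	return total;
}
```

Calling it with the same total byte count for several buflen values
(one page, a few pages, 64K, and so on) and dividing the return value
by *secs gives a bytes-per-second figure for each size.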

--

No, no, I'm not being serious.  Sorry.  :)

The throughput improvement with such a small code change is
interesting though.

Index: yes.c
===================================================================
RCS file: /cvs/src/usr.bin/yes/yes.c,v
retrieving revision 1.9
diff -u -p -r1.9 yes.c
--- yes.c       13 Oct 2015 07:03:26 -0000      1.9
+++ yes.c       18 Jun 2021 06:23:02 -0000
@@ -32,18 +32,55 @@
 
 #include <err.h>
 #include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
 #include <unistd.h>
 
 int
 main(int argc, char *argv[])
 {
+       char *buf, *expletive, *tmp;
+       size_t buflen, exlen, off, pagesize;
+       ssize_t nw;
+
        if (pledge("stdio", NULL) == -1)
                err(1, "pledge");
 
-       if (argc > 1)
-               for (;;)
-                       puts(argv[1]);
-       else
-               for (;;)
-                       puts("y");
+       if (argc == 1) {
+               expletive = "y\n";
+               exlen = 2;
+       } else {
+               expletive = argv[1];
+               exlen = strlen(expletive);
+               expletive[exlen] = '\n';        /* overwrite NUL with NL */
+               exlen += 1;
+       }
+
+       /*
+        * If possible, pack a page-sized buffer with as many copies of
+        * the expletive as we can fit.  Batching multiple lines into
+        * each write(2) improves throughput.
+        */
+       pagesize = getpagesize();
+       if (exlen <= pagesize / 2) {
+               buflen = pagesize / exlen * exlen;
+               buf = malloc(buflen);
+               if (buf == NULL)
+                       err(1, NULL);
+               for (tmp = buf; tmp < buf + buflen; tmp += exlen)
+                       memcpy(tmp, expletive, exlen);
+       } else {
+               buf = expletive;
+               buflen = exlen;
+       }
+
+       for (;;) {
+               for (off = 0; off < buflen; off += nw) {
+                       nw = write(STDOUT_FILENO, buf + off, buflen - off);
+                       if (nw == 0 || nw == -1)
+                               err(1, "write");
+               }
+       }
+
+       return 1;
 }
