Re: new site.8: document site*.tgz and /{upgrade,install}.site
On Fri, Nov 05, 2021 at 05:36:48PM -0700, Andrew Hewus Fresh wrote:
> I like it, some comments in-line but overall I think this would have
> helped me get started with siteXX stuff, so OK afresh1@

Great, I'll commit with tweaks as per below, thanks!

> > +If existent and executable,
> 
> This whole sentence reads weird to me.  I don't know if this is better,
> but it confused me.
> 
>     If they exist and are executable at the end of the install or
>     upgrade process, /install.site or /upgrade.site respectively, are
>     run with chroot(8) based at the system's root.
> 
> > +.Pa /install.site
> > +and
> > +.Pa /upgrade.site
> > +are run last during install and upgrade, respectively, with
> > +.Xr chroot 8
> > +based at the system's root.

I'll adapt this to

    If they exist and are executable, /install.site and /upgrade.site
    are run at the end of the install and upgrade process, respectively,
    with chroot(8) based at the system's root.

> > +.Sh FILES
> > +.Bl -tag -width "site${VERSION}-$(hostname -s).tgz" -compact
> > +.It Pa site${ Ns Va VERSION Ns }.tgz
> > +Generic set.
> > +.It Pa site${ Ns Va VERSION Ns }-$( Ns Ic hostname Fl s Ns ).tgz
> > +Host-specific set.
> > +.It Pa /upgrade.site
> > +Generic post-upgrade script.
> > +.It Pa /install.site
> > +Generic post-install script.
> > +.El
> > +.Sh EXAMPLES
> > +Create
> > +.Nm
> > +sets and update the index:
> > +.Bd -literal -offset indent
> > +# tar -czhf site70.tgz generic/
> > +# tar -czhf site70-puffy.tgz puffy/
> > +# ls -lT > index.txt
> > +.Ed
> > +.Pp
> > +Trigger an upgrade of
> > +.Xr packages 7
> > +upon next boot:
> > +.Bd -literal -offset indent
> > +echo 'pkg_add -Iu' >>/etc/rc.firsttime
> 
> I think some clarity that this should go into one of those siteXX.tgz
> files as "/upgrade.site" would help folks connect how to get those files
> onto the install media and make it clearer that those files get
> extracted into the chroot and that you don't have to figure out how to
> get them into the root of the install media.

That's one way and certainly the only one for fresh installs (unless you
modify the install media, which seems pointless), but for upgrades you
can create /upgrade.site and the installer will pick it up from there.

That is,

    cat << EOF >/upgrade.site
    echo 'pkg_add -Iu' >>/etc/rc.firsttime
    EOF
    chmod +x /upgrade.site
    sysupgrade

will upgrade packages after sysupgrade reboots -- no need for dealing
with siteXX.tgz on your existing install.

But you're right, I completely missed creating /upgrade.site in that
example, so I'll fix that.

> Perhaps that is clear enough from "run [...] with chroot", but I think
> this example makes things less clear, at least the way it is.

So I'll leave that for now.  Again, we can polish in-tree.
Re: new site.8: document site*.tgz and /{upgrade,install}.site
I like it, some comments in-line but overall I think this would have
helped me get started with siteXX stuff, so OK afresh1@

On Fri, Nov 05, 2021 at 03:19:03PM +, Klemens Nanni wrote:
> On Wed, Oct 27, 2021 at 07:35:28PM -0500, Aaron Poffenberger wrote:
> > Looks good. Nice to see this moving forward. Thanks.
> > 
> > On 2021-10-27 21:13 +, Klemens Nanni wrote:
> > > On Mon, Sep 06, 2021 at 02:29:50PM -0500, Aaron Poffenberger wrote:
> > > > Ping.
> > > > 
> > > > Will someone commit this? Seems like no one knows about
> > > > /upgrade.site and it fits well with sysupgrade(8).
> > > 
> > > sysupgrade(8) is one potential /upgrade.site user but I disagree
> > > about documenting the latter through the former.
> > > 
> > > Here is a new manual roughly merging the FAQ bits with what you,
> > > Aaron, provided.
> > > 
> > > site(8) is a lame but fitting name; autoinstall(8) and sysupgrade(8)
> > > reference it and both sets and scripts are documented.
> > > 
> > > Examples are intentionally brief to be shorter and more concise
> > > than the FAQ while showing enough to get going.
> > > 
> > > I am quite certain that wording, examples and the references from
> > > other manuals can be tweaked, but this diff looks like a good
> > > enough start and --if this is the way to go-- I'd prefer to commit
> > > and keep polishing in-tree.
> > > 
> > > Feedback? Objections? OK?
> 
> I have not received any feedback on this except from Aaron.
> 
> Anyone? Anything?
> 
> Index: distrib/sets/lists/man/mi
> ===
> RCS file: /cvs/src/distrib/sets/lists/man/mi,v
> retrieving revision 1.1645
> diff -u -p -r1.1645 mi
> --- distrib/sets/lists/man/mi	2 Nov 2021 22:07:33 -	1.1645
> +++ distrib/sets/lists/man/mi	3 Nov 2021 02:26:16 -
> @@ -2561,6 +2561,7 @@
>  ./usr/share/man/man8/security.8
>  ./usr/share/man/man8/sendmail.8
>  ./usr/share/man/man8/sensorsd.8
> +./usr/share/man/man8/site.8
>  ./usr/share/man/man8/sftp-server.8
>  ./usr/share/man/man8/showmount.8
>  ./usr/share/man/man8/shutdown.8
> Index: usr.sbin/sysupgrade/sysupgrade.8
> ===
> RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
> retrieving revision 1.10
> diff -u -p -r1.10 sysupgrade.8
> --- usr.sbin/sysupgrade/sysupgrade.8	3 Oct 2019 12:43:58 -	1.10
> +++ usr.sbin/sysupgrade/sysupgrade.8	27 Oct 2021 20:42:26 -
> @@ -67,6 +67,10 @@ This is the default if the system is cur
>  Upgrade to a snapshot.
>  This is the default if the system is currently running a snapshot.
>  .El
> +.Pp
> +See
> +.Xr site 8
> +for how to customize the upgrade process.
>  .Sh FILES
>  .Bl -tag -width "/auto_upgrade.conf" -compact
>  .It Pa /auto_upgrade.conf
> @@ -83,7 +87,8 @@ Directory the upgrade is downloaded to.
>  .Xr signify 1 ,
>  .Xr installurl 5 ,
>  .Xr autoinstall 8 ,
> -.Xr release 8
> +.Xr release 8 ,
> +.Xr site 8
>  .Sh HISTORY
>  .Nm
>  first appeared in
> Index: share/man/man8/Makefile
> ===
> RCS file: /cvs/src/share/man/man8/Makefile,v
> retrieving revision 1.102
> diff -u -p -r1.102 Makefile
> --- share/man/man8/Makefile	1 May 2021 16:11:10 -	1.102
> +++ share/man/man8/Makefile	27 Oct 2021 20:38:11 -
> @@ -6,7 +6,7 @@ MAN=	afterboot.8 autoinstall.8 boot_conf
>  	crash.8 daily.8 \
>  	diskless.8 genassym.sh.8 intro.8 netstart.8 rc.8 \
>  	rc.conf.8 rc.d.8 rc.shutdown.8 rc.subr.8 release.8 \
> -	security.8 ssl.8 starttls.8 sticky.8 yp.8
> +	security.8 site.8 ssl.8 starttls.8 sticky.8 yp.8
>  
>  SUBDIR=	man8.alpha man8.amd64 man8.arm64 man8.armv7 \
>  	man8.hppa man8.i386 man8.landisk \
> Index: share/man/man8/autoinstall.8
> ===
> RCS file: /cvs/src/share/man/man8/autoinstall.8,v
> retrieving revision 1.23
> diff -u -p -r1.23 autoinstall.8
> --- share/man/man8/autoinstall.8	18 Jul 2021 11:08:34 -	1.23
> +++ share/man/man8/autoinstall.8	27 Oct 2021 20:42:24 -
> @@ -32,6 +32,10 @@ file and HTTP to fetch the file.
>  If that fails, the installer asks for the location which can either be
>  a URL or a local path.
>  .Pp
> +See
> +.Xr site 8
> +for how to provide custom configuration.
> +.Pp
>  To start unattended installation or upgrade choose '(A)utoinstall' at the
>  install prompt.
>  If there is only one network interface, the installer fetches the response
> @@ -235,7 +239,8 @@ host foo {
>  .Sh SEE ALSO
>  .Xr dhcp-options 5 ,
>  .Xr dhcpd.conf 5 ,
> -.Xr diskless 8
> +.Xr diskless 8 ,
> +.Xr site 8
>  .Sh HISTORY
>  The
>  .Nm
> Index: share/man/man8/site.8
> ===
> RCS file: share/man/man8/site.8
> diff -N share/man/man8/site.8
> --- /dev/null	1 Jan 1970 00:00:00 -
> +++ share/man/man8/site.8	27 Oct 2021 21:11:48
ftp: do not URI encode tilde as per RFC 2396
ftp(1) implements RFC 1738 from Dec. 1994, but RFC 2396 from Aug. 1998
updated this and the tl;dr is: do not encode the tilde character.

In theory, this shouldn't make a difference as servers should decode
"%7e" to "~", BUT not all servers do so and thus some respond with 404.

I've hit this in the past already but didn't look at the code.  Now I
came across this link that made me fix it:
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download

curl(1) and wget(1) both fetch this file successfully, i.e. they behave
identically wrt. "%" and send it as-is:

    $ url='https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download'
    $ curl -I -sS -w '%{url}\n' "$url" | tail -1
    200 https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download

    $ wget -d "$url" 2>&1 | grep -Ew 'GET|OK'
    GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download HTTP/1.1
    HTTP/1.1 200 OK
    200 OK

ftp(1) still sticks with RFC 1738 and encodes it:

    $ ftp -d "$url"
    host review.trustedfirmware.org, port https, path
    changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download,
    save as patch?download, auth none.
    Trying 64:ff9b::339f:1211...
    Requesting https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download
    GET /changes/TF-A%2Ftrusted-firmware-a%7e7726/revisions/9/patch?download HTTP/1.1
    Connection: close
    Host: review.trustedfirmware.org
    User-Agent: OpenBSD ftp

    received 'HTTP/1.1 404 Not Found'
    ftp: Error retrieving https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download: 404 Not Found

RFC 2396 2.4.2. When to Escape and Unescape says:

    In some cases, data that could be represented by an unreserved
    character may appear escaped; for example, some of the unreserved
    "mark" characters are automatically escaped by some systems.
    If the given URI scheme defines a canonicalization algorithm, then
    unreserved characters may be unescaped according to that algorithm.
    For example, "%7e" is sometimes used instead of "~" in an http URL
    path, but the two are equivalent for an http URL.

So they're equivalent and tilde does not need encoding.  Again, servers
should always decode it anyway, but not all of them do, so better send
it as-is/unencoded.

Diff below is effectively a one-character change that removes "~" from
`*unsafe_chars' but it also updates the comments to use RFC 2396
wording and order of characters for easier code-to-standard comparison.

This allows me to fetch this patch:

    $ ./obj/ftp -d "$url"
    host review.trustedfirmware.org, port https, path
    changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download,
    save as patch?download, auth none.
    Trying 64:ff9b::339f:1211...
    Requesting https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download
    GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download HTTP/1.1
    Connection: close
    Host: review.trustedfirmware.org
    User-Agent: OpenBSD ftp

    received 'HTTP/1.1 200 OK'
    received 'Date: Fri, 05 Nov 2021 23:34:49 GMT'
    received 'Server: Apache'
    received 'X-Frame-Options: DENY'
    received 'Content-Disposition: attachment; filename="155911c.diff.base64"'
    received 'X-Content-Type-Options: nosniff'
    received 'Cache-Control: private'
    received 'Pragma: no-cache'
    received 'Expires: Mon, 01 Jan 1990 00:00:00 GMT'
    received 'X-FYI-Content-Encoding: base64'
    received 'X-FYI-Content-Type: application/mbox'
    received 'Content-Type: text/plain;charset=iso-8859-1'
    received 'Vary: Accept-Encoding'
    received 'Connection: close'
    received 'Transfer-Encoding: chunked'
    9544 bytes received in 0.00 seconds (12.21 MB/s)

Looking at wget's source somewhat backs up this way of handling tilde by
explicitly mentioning broken "%7e" decoding, i.e. servers expecting an
unencoded "~" as-is:

From wget-1.2.1/src/url.c 103f:

    /* Table of "reserved" and "unsafe" characters.  Those terms are
       rfc1738-speak, as such largely obsoleted by rfc2396 and later
       specs, but the general idea remains.

       A reserved character is the one that you can't decode without
       changing the meaning of the URL.  For example, you can't decode
       "/foo/%2f/bar" into "/foo///bar" because the number and contents
       of path components is different.  Non-reserved characters can be
       changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar".
       The unsafe characters are loosely based on rfc1738, plus "$" and
       ",", as recommended by rfc2396, and minus "~", which is very
       frequently used (and sometimes unrecognized as %7E by
Re: [patch] httpd static gzip compression
...

|You would be asking for
|
|https://example.com/whatever/ls.1
|
|and with Accept-Encoding: gzip in the http header, and the
|webserver would then look if it has a file
|
|/whatever/ls.1.gz
|
|(instead of without .gz) in its document tree and send you that, with
|"Content-Encoding: gzip" http header.

As an outsider i find this thread very amusing.

As you all know the normal approach is to have a cache directory where
some "compress" module performs on-the-fly storage if the client asks
for some file, accepts compression, and the compressed version yet does
not exist.

Funnily i once got not even an answer when i asked for static
compression, since on-the-fly compression was already available, "so
what", i have forgotten which webserver that has been.  Cleanup via cron
anyhow.

Now something for Theo, from the webserver i use.

  #if defined HAVE_ZLIB_H && defined HAVE_LIBZ
  # define USE_ZLIB
  # include <zlib.h>
  #endif
  #ifndef Z_DEFAULT_COMPRESSION
  #define Z_DEFAULT_COMPRESSION -1
  #endif
  #ifndef MAX_WBITS
  #define MAX_WBITS 15
  #endif

  #if defined HAVE_BZLIB_H && defined HAVE_LIBBZ2
  # define USE_BZ2LIB
    /* we don't need stdio interface */
  # define BZ_NO_STDIO
  # include <bzlib.h>
  #endif

  #if defined HAVE_BROTLI_ENCODE_H && defined HAVE_BROTLI
  # define USE_BROTLI
  # include <brotli/encode.h>
  #endif

  #if defined HAVE_ZSTD_H && defined HAVE_ZSTD
  # define USE_ZSTD
  # include <zstd.h>
  #endif

  #if defined HAVE_SYS_MMAN_H && defined HAVE_MMAP && defined ENABLE_MMAP
  #define USE_MMAP

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: [patch] httpd static gzip compression
On Fri, Nov 05, 2021 at 08:24:21AM -0600, Theo de Raadt wrote:
> prx wrote:
> > I think this remark should be placed into perspective.
> > 
> > When a file is requested, its gzipped version is sent if:
> > * The client asks for it with the appropriate header.
> > * The server admin configured httpd to do so **and** compressed files.
> 
> and if the gzip'd file does not contain the same contents as the
> ungzip'd file, then two different clients may get different results.

Just as big-endian and little-endian architectures can be fed different
files from the same ISO-9660 filesystem.  The issue you describe
certainly exists, but it's not without precedent.

It could even be considered a feature: if the client supports
compression, then send more verbose content.
Re: [patch] httpd static gzip compression
Theo de Raadt(dera...@openbsd.org) on 2021.11.05 08:24:21 -0600:
> prx wrote:
> > I think this remark should be placed into perspective.
> > 
> > When a file is requested, its gzipped version is sent if:
> > * The client asks for it with the appropriate header.
> > * The server admin configured httpd to do so **and** compressed files.
> 
> and if the gzip'd file does not contain the same contents as the
> ungzip'd file, then two different clients may get different results.

Yes, but it's the responsibility of whoever puts the content there to do
the correct thing.  If i put a html file into a web directory and call
it foo.jpg i would not expect a sensible result either.
Re: rpki-client show attr name in rrdp parse errors
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.11.05 15:26:57 +0100:
> On Wed, Nov 03, 2021 at 12:58:17PM +0100, Claudio Jeker wrote:
> > In one place this is already done but this makes sure we show the bad
> > attribute in all cases where a non conforming attribute is found.
> 
> Found another bunch of those non conforming attribute errors.  Adjust
> them as well.
> 
> OK?

ok

> -- 
> :wq Claudio
> 
> Index: rrdp_notification.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/rrdp_notification.c,v
> retrieving revision 1.9
> diff -u -p -r1.9 rrdp_notification.c
> --- rrdp_notification.c	29 Oct 2021 09:27:36 -	1.9
> +++ rrdp_notification.c	5 Nov 2021 14:06:18 -
> @@ -141,7 +141,7 @@ start_notification_elem(struct notificat
>  			continue;
>  		}
>  		PARSE_FAIL(p, "parse failed - non conforming "
> -		    "attribute found in notification elem");
> +		    "attribute '%s' found in notification elem", attr[i]);
>  	}
>  	if (!(has_xmlns && nxml->version && nxml->session_id && nxml->serial))
>  		PARSE_FAIL(p, "parse failed - incomplete "
> @@ -185,7 +185,7 @@ start_snapshot_elem(struct notification_
>  			continue;
>  		}
>  		PARSE_FAIL(p, "parse failed - non conforming "
> -		    "attribute found in snapshot elem");
> +		    "attribute '%s' found in snapshot elem", attr[i]);
>  	}
>  	if (hasUri != 1 || hasHash != 1)
>  		PARSE_FAIL(p, "parse failed - incomplete snapshot attributes");
> @@ -239,7 +239,7 @@ start_delta_elem(struct notification_xml
>  			continue;
>  		}
>  		PARSE_FAIL(p, "parse failed - non conforming "
> -		    "attribute found in snapshot elem");
> +		    "attribute '%s' found in snapshot elem", attr[i]);
>  	}
>  	/* Only add to the list if we are relevant */
>  	if (hasUri != 1 || hasHash != 1 || delta_serial == 0)
Re: [patch] httpd static gzip compression
Ingo Schwarze(schwa...@usta.de) on 2021.11.05 14:37:15 +0100:
> Hi Theo,
> 
> Theo de Raadt wrote on Thu, Nov 04, 2021 at 08:27:47AM -0600:
> > prx wrote:
> >> On 2021/11/04 14:21, prx wrote:
> 
> >>> The attached patch adds support for static gzip compression.
> >>>
> >>> In other words, if a client supports gzip compression, when "file"
> >>> is requested, httpd will check if "file.gz" is available to serve.
> 
> >> This diff doesn't compress "on the fly".
> >> It's up to the webmaster to compress files **before** serving them.
> 
> > Does any other program work this way?
> 
> Yes.  The man(1) program does.  At least on the vast majority of
> Linux systems (at least those using the man-db implementation
> of man(1)), on FreeBSD, and on DragonFly BSD.
> 
> Those systems store most manual pages as gzipped man(7) and mdoc(7)
> files, and man(1) decompresses them every time a user wants to look
> at one of them.  You say "man ls", and what you get is actually
> /usr/share/man/man1/ls.1.gz or something like that.
> 
> For man(1), that is next to useless because du -sh /usr/share/man =
> 42.6M uncompressed.  But it has repeatedly caused bugs in the past.
> I would love to remove the feature from mandoc, but even though it is
> rarely used in OpenBSD (some ports installed gzipped manuals in the
> past, but i think the ports tree has been clean now for quite some
> time; you might still need the feature when installing software
> or unpacking foreign manual page packages without using ports)
> it would be a bad idea to remove it because it is too widely used
> elsewhere.  Note that even the old BSD man(1) supported it.
> 
> > Where you request one filename, and it gives you another?
> 
> You did not ask what web servers do, but we are discussing a patch to
> a webserver.  For this reason, let me note in passing that even some
> otherwise extremely useful sites get it very wrong the other way round:
> 
>   $ ftp https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
>   Trying 130.89.148.77...
>   Requesting https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
>   100% |**| 8050       00:00
>   8050 bytes received in 0.00 seconds (11.74 MB/s)
>   $ file ls.1.en.gz
>   ls.1.en.gz: troff or preprocessor input text
>   $ grep -A 1 '^.SH NAME' ls.1.en.gz
>   .SH NAME
>   ls \- list directory contents
>   $ gunzip ls.1.en.gz
>   gunzip: ls.1.en.gz: unrecognized file format

But with this patch, you are not asking the webserver for

    https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz

You would be asking for

    https://example.com/whatever/ls.1

and with Accept-Encoding: gzip in the http header, and the webserver
would then look if it has a file

    /whatever/ls.1.gz

(instead of without .gz) in its document tree and send you that, with a
"Content-Encoding: gzip" http header.

And because of that header, your client will know that the data is
gzipped and will unzip it before writing the file to the output.
I.e. there is no such problem (unless the patch has a bug).

/Benno

> > I have a difficult time understanding why gzip has to sneak its way
> > into everything.
> >
> > I always prefer software that does precisely what I expect it to do.
> 
> Certainly.
> 
> I have no strong opinion whether a webserver qualifies as "everything",
> though, nor did i look at the diff.  While all manpages are small in
> the real world, some web servers may have to store huge amounts of data
> that clients might request, so disk space might occasionally be an
> issue on a web server even in 2021.  Also, some websites deliver huge
> amounts of data to the client even when the user merely asked for some
> text (not sure such sites would consider running OpenBSD httpd(8), but
> whatever :) - when browsing the web, bandwidth is still occasionally an
> issue even in 2021, even though that is a rather absurd fact.
> 
> Yours,
> Ingo
Re: new site.8: document site*.tgz and /{upgrade,install}.site
On Wed, Oct 27, 2021 at 07:35:28PM -0500, Aaron Poffenberger wrote:
> Looks good. Nice to see this moving forward. Thanks.
> 
> On 2021-10-27 21:13 +, Klemens Nanni wrote:
> > On Mon, Sep 06, 2021 at 02:29:50PM -0500, Aaron Poffenberger wrote:
> > > Ping.
> > > 
> > > Will someone commit this? Seems like no one knows about
> > > /upgrade.site and it fits well with sysupgrade(8).
> > 
> > sysupgrade(8) is one potential /upgrade.site user but I disagree about
> > documenting the latter through the former.
> > 
> > Here is a new manual roughly merging the FAQ bits with what you,
> > Aaron, provided.
> > 
> > site(8) is a lame but fitting name; autoinstall(8) and sysupgrade(8)
> > reference it and both sets and scripts are documented.
> > 
> > Examples are intentionally brief to be shorter and more concise than
> > the FAQ while showing enough to get going.
> > 
> > I am quite certain that wording, examples and the references from
> > other manuals can be tweaked, but this diff looks like a good enough
> > start and --if this is the way to go-- I'd prefer to commit and keep
> > polishing in-tree.
> > 
> > Feedback? Objections? OK?

I have not received any feedback on this except from Aaron.

Anyone? Anything?
Index: distrib/sets/lists/man/mi
===
RCS file: /cvs/src/distrib/sets/lists/man/mi,v
retrieving revision 1.1645
diff -u -p -r1.1645 mi
--- distrib/sets/lists/man/mi	2 Nov 2021 22:07:33 -	1.1645
+++ distrib/sets/lists/man/mi	3 Nov 2021 02:26:16 -
@@ -2561,6 +2561,7 @@
 ./usr/share/man/man8/security.8
 ./usr/share/man/man8/sendmail.8
 ./usr/share/man/man8/sensorsd.8
+./usr/share/man/man8/site.8
 ./usr/share/man/man8/sftp-server.8
 ./usr/share/man/man8/showmount.8
 ./usr/share/man/man8/shutdown.8
Index: usr.sbin/sysupgrade/sysupgrade.8
===
RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
retrieving revision 1.10
diff -u -p -r1.10 sysupgrade.8
--- usr.sbin/sysupgrade/sysupgrade.8	3 Oct 2019 12:43:58 -	1.10
+++ usr.sbin/sysupgrade/sysupgrade.8	27 Oct 2021 20:42:26 -
@@ -67,6 +67,10 @@ This is the default if the system is cur
 Upgrade to a snapshot.
 This is the default if the system is currently running a snapshot.
 .El
+.Pp
+See
+.Xr site 8
+for how to customize the upgrade process.
 .Sh FILES
 .Bl -tag -width "/auto_upgrade.conf" -compact
 .It Pa /auto_upgrade.conf
@@ -83,7 +87,8 @@ Directory the upgrade is downloaded to.
 .Xr signify 1 ,
 .Xr installurl 5 ,
 .Xr autoinstall 8 ,
-.Xr release 8
+.Xr release 8 ,
+.Xr site 8
 .Sh HISTORY
 .Nm
 first appeared in
Index: share/man/man8/Makefile
===
RCS file: /cvs/src/share/man/man8/Makefile,v
retrieving revision 1.102
diff -u -p -r1.102 Makefile
--- share/man/man8/Makefile	1 May 2021 16:11:10 -	1.102
+++ share/man/man8/Makefile	27 Oct 2021 20:38:11 -
@@ -6,7 +6,7 @@ MAN=	afterboot.8 autoinstall.8 boot_conf
 	crash.8 daily.8 \
 	diskless.8 genassym.sh.8 intro.8 netstart.8 rc.8 \
 	rc.conf.8 rc.d.8 rc.shutdown.8 rc.subr.8 release.8 \
-	security.8 ssl.8 starttls.8 sticky.8 yp.8
+	security.8 site.8 ssl.8 starttls.8 sticky.8 yp.8
 
 SUBDIR=	man8.alpha man8.amd64 man8.arm64 man8.armv7 \
 	man8.hppa man8.i386 man8.landisk \
Index: share/man/man8/autoinstall.8
===
RCS file: /cvs/src/share/man/man8/autoinstall.8,v
retrieving revision 1.23
diff -u -p -r1.23 autoinstall.8
--- share/man/man8/autoinstall.8	18 Jul 2021 11:08:34 -	1.23
+++ share/man/man8/autoinstall.8	27 Oct 2021 20:42:24 -
@@ -32,6 +32,10 @@ file and HTTP to fetch the file.
 If that fails, the installer asks for the location which can either be
 a URL or a local path.
 .Pp
+See
+.Xr site 8
+for how to provide custom configuration.
+.Pp
 To start unattended installation or upgrade choose '(A)utoinstall' at the
 install prompt.
 If there is only one network interface, the installer fetches the response
@@ -235,7 +239,8 @@ host foo {
 .Sh SEE ALSO
 .Xr dhcp-options 5 ,
 .Xr dhcpd.conf 5 ,
-.Xr diskless 8
+.Xr diskless 8 ,
+.Xr site 8
 .Sh HISTORY
 The
 .Nm
Index: share/man/man8/site.8
===
RCS file: share/man/man8/site.8
diff -N share/man/man8/site.8
--- /dev/null	1 Jan 1970 00:00:00 -
+++ share/man/man8/site.8	27 Oct 2021 21:11:48 -
@@ -0,0 +1,87 @@
+.\" $OpenBSD: $
+.\"
+.\" Copyright (c) 2021 Klemens Nanni
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH
Re: rpki-client show attr name in rrdp parse errors
On Wed, Nov 03, 2021 at 12:58:17PM +0100, Claudio Jeker wrote:
> In one place this is already done but this makes sure we show the bad
> attribute in all cases where a non conforming attribute is found.

Found another bunch of those non conforming attribute errors.  Adjust
them as well.

OK?
-- 
:wq Claudio

Index: rrdp_notification.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/rrdp_notification.c,v
retrieving revision 1.9
diff -u -p -r1.9 rrdp_notification.c
--- rrdp_notification.c	29 Oct 2021 09:27:36 -	1.9
+++ rrdp_notification.c	5 Nov 2021 14:06:18 -
@@ -141,7 +141,7 @@ start_notification_elem(struct notificat
 			continue;
 		}
 		PARSE_FAIL(p, "parse failed - non conforming "
-		    "attribute found in notification elem");
+		    "attribute '%s' found in notification elem", attr[i]);
 	}
 	if (!(has_xmlns && nxml->version && nxml->session_id && nxml->serial))
 		PARSE_FAIL(p, "parse failed - incomplete "
@@ -185,7 +185,7 @@ start_snapshot_elem(struct notification_
 			continue;
 		}
 		PARSE_FAIL(p, "parse failed - non conforming "
-		    "attribute found in snapshot elem");
+		    "attribute '%s' found in snapshot elem", attr[i]);
 	}
 	if (hasUri != 1 || hasHash != 1)
 		PARSE_FAIL(p, "parse failed - incomplete snapshot attributes");
@@ -239,7 +239,7 @@ start_delta_elem(struct notification_xml
 			continue;
 		}
 		PARSE_FAIL(p, "parse failed - non conforming "
-		    "attribute found in snapshot elem");
+		    "attribute '%s' found in snapshot elem", attr[i]);
 	}
 	/* Only add to the list if we are relevant */
 	if (hasUri != 1 || hasHash != 1 || delta_serial == 0)
Re: [patch] httpd static gzip compression
prx wrote:
> I think this remark should be placed into perspective.
> 
> When a file is requested, its gzipped version is sent if:
> * The client asks for it with the appropriate header.
> * The server admin configured httpd to do so **and** compressed files.

and if the gzip'd file does not contain the same contents as the
ungzip'd file, then two different clients may get different results.
Re: [patch] httpd static gzip compression
* Theo de Raadt le [04-11-2021 08:27:47 -0600]:
> prx wrote:
> > * Stuart Henderson le [04-11-2021 14:09:39 +]:
> > > On 2021/11/04 14:21, prx wrote:
> > > > Hello,
> > > > The attached patch adds support for static gzip compression.
> > > >
> > > > In other words, if a client supports gzip compression, when
> > > > "file" is requested, httpd will check if "file.gz" is available
> > > > to serve.
> > > >
> > > > Regards.
> > > >
> > > > prx
> > >
> > > btw this was rejected before,
> > >
> > > https://github.com/reyk/httpd/issues/21
> >
> > This diff doesn't compress "on the fly".
> > It's up to the webmaster to compress files **before** serving them.
> 
> Does any other program work this way?
> 
> Where you request one filename, and it gives you another?
> 
> I have a difficult time understanding why gzip has to sneak its way
> into everything.
> 
> I always prefer software that does precisely what I expect it to do.

I think this remark should be placed into perspective.

When a file is requested, its gzipped version is sent if:
* The client asks for it with the appropriate header.
* The server admin configured httpd to do so **and** compressed files.

In this situation, the client does get the expected file.  Of course,
the admin has the responsibility to give the same content in "file" and
"file.gz".

The cost for the server is negligible but reduces bandwidth usage (for
both client and server).

According to previous comments, find below a modified patch to enable
static gzip compression on a location match, i.e.:

    server "foo" {
        #[... snip ... ]
        location "/*.html" {
            gzip_static
        }
        location "/*.css" {
            gzip_static
        }
    }

Regards.

Index: httpd.conf.5
===
RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
retrieving revision 1.119
diff -u -p -r1.119 httpd.conf.5
--- httpd.conf.5	24 Oct 2021 16:01:04 -	1.119
+++ httpd.conf.5	5 Nov 2021 14:04:22 -
@@ -425,6 +425,10 @@ A variable that is set to a comma separa
 features in use
 .Pq omitted when TLS client verification is not in use .
 .El
+.It Ic gzip_static
+Enable static gzip compression.
+.Pp
+When a file is requested, serves the file with .gz added to its path if it exists.
 .It Ic hsts Oo Ar option Oc
 Enable HTTP Strict Transport Security.
 Valid options are:
Index: httpd.h
===
RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v
retrieving revision 1.158
diff -u -p -r1.158 httpd.h
--- httpd.h	24 Oct 2021 16:01:04 -	1.158
+++ httpd.h	5 Nov 2021 14:04:22 -
@@ -87,6 +87,7 @@
 #define SERVER_DEF_TLS_LIFETIME	(2 * 3600)
 #define SERVER_MIN_TLS_LIFETIME	(60)
 #define SERVER_MAX_TLS_LIFETIME	(24 * 3600)
+#define SERVER_DEFAULT_GZIP_STATIC	0
 
 #define MEDIATYPE_NAMEMAX	128	/* file name extension */
 #define MEDIATYPE_TYPEMAX	64	/* length of type/subtype */
@@ -546,6 +547,7 @@ struct server_config {
 	struct server_fcgiparams fcgiparams;
 	int			 fcgistrip;
 	char			 errdocroot[HTTPD_ERRDOCROOT_MAX];
+	int			 gzip_static;
 
 	TAILQ_ENTRY(server_config) entry;
 };
Index: parse.y
===
RCS file: /cvs/src/usr.sbin/httpd/parse.y,v
retrieving revision 1.127
diff -u -p -r1.127 parse.y
--- parse.y	24 Oct 2021 16:01:04 -	1.127
+++ parse.y	5 Nov 2021 14:04:22 -
@@ -141,7 +141,7 @@ typedef struct {
 %token	TIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD REQUEST
 %token	ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS REWRITE
 %token	CA CLIENT CRL OPTIONAL PARAM FORWARDED FOUND NOT
-%token	ERRDOCS
+%token	ERRDOCS GZIPSTATIC
 %token	<v.string>	STRING
 %token	<v.number>	NUMBER
 %type	<v.port>	port
@@ -553,6 +553,7 @@ serveroptsl	: LISTEN ON STRING opttls po
 		| logformat
 		| fastcgi
 		| authenticate
+		| gzip_static
 		| filter
 		| LOCATION optfound optmatch STRING	{
 			struct server	*s;
@@ -1217,6 +1218,14 @@ fcgiport	: NUMBER	{
 		}
 		;
 
+gzip_static	: NO GZIPSTATIC	{
+			srv->srv_conf.gzip_static = SERVER_DEFAULT_GZIP_STATIC;
+		}
+		| GZIPSTATIC	{
+			srv->srv_conf.gzip_static = 1;
+		}
+		;
+
tcpip		: TCP '{' optnl tcpflags_l '}'
 		| TCP tcpflags
 		;
@@ -1441,6 +1450,7 @@ lookup(char *s)
 		{ "fastcgi",		FCGI },
 		{ "forwarded",		FORWARDED },
 		{ "found",		FOUND },
+		{ "gzip_static",	GZIPSTATIC },
 		{ "hsts",		HSTS },
 		{ "include",
Re: [patch] httpd static gzip compression
Hi Theo, Theo de Raadt wrote on Thu, Nov 04, 2021 at 08:27:47AM -0600: > prx wrote: >> On 2021/11/04 14:21, prx wrote: >>> The attached patch add support for static gzip compression. >>> >>> In other words, if a client support gzip compression, when "file" is >>> requested, httpd will check if "file.gz" is avaiable to serve. >> This diff doesn't compress "on the fly". >> It's up to the webmaster to compress files **before** serving them. > Does any other program work this way? Yes. The man(1) program does. At least on the vast majority of Linux systems (at least those using the man-db implementation of man(1)), on FreeBSD, and on DragonFly BSD. Those systems store most manual pages as gzipped man(7) and mdoc(7) files, and man(1) decompresses them every time a user wants to look at one of them. You say "man ls", and what you get is actually /usr/share/man/man1/ls.1.gz or something like that. For man(1), that is next to useless because du -sh /usr/share/man = 42.6M uncompressed. But it has repeatedly caused bugs in the past. I would love to remove the feature from mandoc, but even though it is rarely used in OpenBSD (some ports installed gzipped manuals in the past, but i think the ports tree has been clean now for quite some time; you might still need the feature when installing software or unpacking foreign manual page packages without using ports) it would be a bad idea to remove it because it is too widely used elsewhere. Note that even the old BSD man(1) supported it. > Where you request one filename, and it gives you another? You did not ask what web servers do, but we are discussing a patch to a webserver. For this reason, let me note in passing that even some otherwise extremely useful sites get it very wrong the other way round: $ ftp https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz Trying 130.89.148.77... 
Requesting https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
100% |**|  8050       00:00
8050 bytes received in 0.00 seconds (11.74 MB/s)
$ file ls.1.en.gz
ls.1.en.gz: troff or preprocessor input text
$ grep -A 1 '^.SH NAME' ls.1.en.gz
.SH NAME
ls \- list directory contents
$ gunzip ls.1.en.gz
gunzip: ls.1.en.gz: unrecognized file format

> I have a difficult time understanding why gzip has to sneak its way
> into everything.
>
> I always prefer software that does precisely what I expect it to do.

Certainly.  I have no strong opinion whether a webserver qualifies as
"everything", though, nor did i look at the diff.

While all manpages are small, in the real world some web servers may
have to store huge amounts of data that clients might request, so disk
space might occasionally be an issue on a web server even in 2021.
Also, some websites deliver huge amounts of data to the client even
when the user merely asked for some text (not sure such sites would
consider running OpenBSD httpd(8), but whatever :) - when browsing the
web, bandwidth is still occasionally an issue even in 2021, even though
that is a rather absurd fact.

Yours,
  Ingo
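The lookup prx describes is simple: only serve "file.gz" when the client advertised gzip support and the compressed sibling actually exists. The sketch below is NOT httpd's code; the function name is hypothetical and the strstr() check is a deliberate simplification of real Accept-Encoding parsing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * If the client's Accept-Encoding mentions gzip and "<path>.gz"
 * exists as a regular file, return the .gz path (caller frees).
 * Otherwise return NULL so the caller serves the original file.
 */
char *
gzip_static_path(const char *path, const char *accept_encoding)
{
	struct stat st;
	size_t len;
	char *gzpath;

	if (accept_encoding == NULL ||
	    strstr(accept_encoding, "gzip") == NULL)
		return NULL;

	len = strlen(path) + sizeof(".gz");	/* ".gz" plus NUL */
	if ((gzpath = malloc(len)) == NULL)
		return NULL;
	snprintf(gzpath, len, "%s.gz", path);

	if (stat(gzpath, &st) == -1 || !S_ISREG(st.st_mode)) {
		free(gzpath);
		return NULL;	/* fall back to the uncompressed file */
	}
	return gzpath;
}
```

A server taking this path must also emit "Content-Encoding: gzip" and base Content-Length on the compressed file, or the client sees garbage like in the Debian example above.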
Re: UNIX sockets: use vnode(9) lock to protect `v_socket' dereference
On Fri, Nov 05, 2021 at 08:31:07AM +0100, Martin Pieuchot wrote: > On 26/10/21(Tue) 14:12, Vitaliy Makkoveev wrote: > > Another step to make UNIX sockets locking fine grained. > > > > The listening socket has the references from file descriptors layer and > > from the vnode(9) layer. This means when we close(2)'ing such socket it > > still referenced by concurrent thread through connect(2) path. > > > > When we bind(2) UNIX socket we link it to vnode(9) by assigning > > `v_socket'. When we connect(2)'ing socket to the socket we previously > > bind(2)'ed we finding it by namei(9) and obtain it's reference through > > `v_socket'. This socket has no extra reference in file descriptor > > layer and could be closed by concurrent thread. > > > > This time we have `unp_lock' rwlock(9) which protects the whole layer > > and the dereference of `v_socket' is safe. But with the fine grained > > locking the `v_socket' will not be protected by global lock. When we > > obtain the vnode(9) by namei(9) in connect(9) or bind(9) paths it is > > already exclusively locked by vlode(9) lock. But in unp_detach() which > > is called on the close(2)'ing socket we assume `unp_lock' protects > > `v_socket'. > > > > I propose to use exclusive vnode(9) lock to protect `v_socket'. With the > > fine grained locking, the `v_socket' dereference in unp_bind() or > > unp_connect() threads will be safe because unp_detach() thread will wait > > the vnode(9) lock release. The vnode referenced by `unp_vnod' has > > reference counter bumped so it's dereference is also safe without > > `unp_lock' held. > > This makes sense to me. Using the vnode lock here seems the simplest > approach. > > > The `i_lock' should be take before `unp_lock' and unp_detach() should > > release solock(). To prevent connections on this socket the > > 'SO_ACCEPTCONN' bit cleared in soclose(). > > This is done to prevent races when solock() is released inside soabort(), > right? Is it the only one or some more care is needed? 
> Will this stay with per-socket locks or is this only necessary because of
> the global `unp_lock'?
>

Only bound sockets have an associated vnode(9); we link them in
unp_bind().  I say "bound" instead of "listening" because datagram
sockets can also be bound, but they are not connection oriented.
soabort() kills the not-yet-accept(2)ed peers of connection-oriented
sockets when we close a listening socket, i.e. one that was bind(2)ed
and then marked as accepting connections by listen(2).  Since the
unaccepted peers we kill with soabort() have no associated vnode(9), we
don't release `unp_lock' there because we don't need to call vrele(9).

This diff also removes lock dances in unp_connect().  Our thread owns
the `so' passed to unp_connect(): its file descriptor has its reference
counter bumped, so `so' can't be killed by a concurrent thread.  But we
obtain `so2', the bound socket, from the vnode(9).  Today the global
`unp_lock' protects both sockets, but with per-socket locking only `so'
will be protected, so the `so2' dereference would be unsafe because a
concurrent thread could kill it.  Also, unp_attach(), called by
sonewconn() within unp_connect(), could release `unp_lock'.  The
upcoming diff that moves the unp_gc() machinery out from under
`unp_lock' will release `unp_lock' in unp_detach() to enforce the lock
order `unp_gc_lock' -> `unp_lock'.

We keep the vnode(9) locked while linking the socket to it in
unp_bind(), and while connecting sockets in unp_connect().  So
unp_detach() should also lock the vnode(9), because the vnode(9) lock
protects `v_socket'.  This will matter with the upcoming fine-grained
locking; for now this diff mostly makes `v_socket' protection
consistent.

Could I commit this diff?
> > Index: sys/kern/uipc_socket.c > > === > > RCS file: /cvs/src/sys/kern/uipc_socket.c,v > > retrieving revision 1.265 > > diff -u -p -r1.265 uipc_socket.c > > --- sys/kern/uipc_socket.c 14 Oct 2021 23:05:10 - 1.265 > > +++ sys/kern/uipc_socket.c 26 Oct 2021 11:05:59 - > > @@ -315,6 +315,8 @@ soclose(struct socket *so, int flags) > > /* Revoke async IO early. There is a final revocation in sofree(). */ > > sigio_free(>so_sigio); > > if (so->so_options & SO_ACCEPTCONN) { > > + so->so_options &= ~SO_ACCEPTCONN; > > + > > while ((so2 = TAILQ_FIRST(>so_q0)) != NULL) { > > (void) soqremque(so2, 0); > > (void) soabort(so2); > > Index: sys/kern/uipc_usrreq.c > > === > > RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v > > retrieving revision 1.150 > > diff -u -p -r1.150 uipc_usrreq.c > > --- sys/kern/uipc_usrreq.c 21 Oct 2021 22:11:07 - 1.150 > > +++ sys/kern/uipc_usrreq.c 26 Oct 2021 11:05:59 - > > @@ -474,20 +474,30 @@
Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic
On Fri, Nov 05, 2021 at 09:50:05AM +0100, Mark Kettenis wrote: > > Date: Fri, 5 Nov 2021 07:18:03 +0100 > > From: Martin Pieuchot > > > > On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote: > > > This completely removes global rwlock(9) from the unp_internalize() and > > > unp_externalize() normal paths but only leaves it in unp_externalize() > > > error path. Also we don't need to simultaneously hold both fdplock() > > > and `unp_lock' in unp_internalize(). As non obvious profit this > > > simplifies the future lock dances in the UNIX sockets layer. > > > > > > It's safe to call fptounp() without `unp_lock' held. We always got this > > > file descriptor by fd_getfile(9) so we always have the extra reference > > > and this descriptor can't be closed by concurrent thread. Some sockets > > > could be destroyed through 'PRU_ABORT' path but they don't have > > > associated file descriptor and they are not accessible in the > > > unp_internalize() path. > > > > > > The `unp_file' access without `unp_lock' held is also safe. Each socket > > > could have the only associated file descriptor and each file descriptor > > > could have the only associated socket. We only assign `unp_file' in the > > > unp_internalize() path where we got the socket by fd_getfile(9). This > > > descriptor has the extra reference and couldn't be closed concurrently. > > > We could override `unp_file' but with the same address because the > > > associated file descriptor can't be changed so the address will be also > > > the same. So while unp_gc() concurrently runs the dereference of > > > non-NULL `unp_file' is always safe. > > > > Using an atomic operation for `unp_msgcount' is ok with me, one comment > > about `unp_rights' below. > > > > [...] 
> > > > > > > > - rw_enter_write(_lock); > > > - if (unp_rights + nfds > maxfiles / 10) { > > > - rw_exit_write(_lock); > > > + if (atomic_add_int_nv(_rights, nfds) > maxfiles / 10) { > > > + atomic_sub_int(_rights, nfds); > > > > I can't believe this is race free. If two threads, T1 and T2, call > > atomic_add at the same time both might end up returning EMFILE even > > if only the first one currently does. This could happen if T1 exceeds > > the limit and T2 does atomic_add on an already-exceeded `unp_rights' > > before T1 could do atomic_sub. > > > > I suggest using a mutex to protect `unp_rights' instead to solve this > > issue. > > Yes, that would be better. Otherwise it would be trivial to DOS > anything that does file descriptor passing. > Thanks for pointing, this sounds reasonable. The updated diff introduces `unp_rights_mtx' mutex(9) to protect `unp_rights'. Index: sys/kern/uipc_usrreq.c === RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v retrieving revision 1.153 diff -u -p -r1.153 uipc_usrreq.c --- sys/kern/uipc_usrreq.c 30 Oct 2021 16:35:31 - 1.153 +++ sys/kern/uipc_usrreq.c 5 Nov 2021 12:56:41 - @@ -52,14 +52,19 @@ #include #include #include +#include #include /* * Locks used to protect global data and struct members: * I immutable after creation * U unp_lock + * R unp_rights_mtx + * a atomic */ + struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock"); +struct mutex unp_rights_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); /* * Stack of sets of files that were passed over a socket but were @@ -99,7 +104,7 @@ SLIST_HEAD(,unp_deferral)unp_deferred = SLIST_HEAD_INITIALIZER(unp_deferred); ino_t unp_ino;/* [U] prototype for fake inode numbers */ -intunp_rights; /* [U] file descriptors in flight */ +intunp_rights; /* [R] file descriptors in flight */ intunp_defer; /* [U] number of deferred fp to close by the GC task */ intunp_gcing; /* [U] GC task currently running */ @@ -927,17 +932,18 @@ restart: */ rp = (struct fdpass *)CMSG_DATA(cm); - rw_enter_write(_lock); for (i 
= 0; i < nfds; i++) { struct unpcb *unp; fp = rp->fp; rp++; if ((unp = fptounp(fp)) != NULL) - unp->unp_msgcount--; - unp_rights--; + atomic_dec_long(>unp_msgcount); } - rw_exit_write(_lock); + + mtx_enter(_rights_mtx); + unp_rights -= nfds; + mtx_leave(_rights_mtx); /* * Copy temporary array to message and adjust length, in case of @@ -985,13 +991,13 @@ unp_internalize(struct mbuf *control, st return (EINVAL); nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int); - rw_enter_write(_lock); + mtx_enter(_rights_mtx); if (unp_rights + nfds > maxfiles / 10) { - rw_exit_write(_lock); + mtx_leave(_rights_mtx); return (EMFILE); } unp_rights += nfds; - rw_exit_write(_lock); +
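The race mpi@ objects to, and why the mutex version is safe, can be replayed sequentially. This is an illustration with plain ints standing in for the kernel primitives, not kernel code; function names are made up for the demo.

```c
#include <assert.h>

/*
 * Check-then-add, as done while holding unp_rights_mtx in the
 * updated diff: the limit test and the increment are one atomic
 * step, so a request that fits can never fail.
 */
static int
reserve_locked(int *rights, int nfds, int limit)
{
	/* in the kernel this whole function runs with the mutex held */
	if (*rights + nfds > limit)
		return -1;		/* EMFILE */
	*rights += nfds;
	return 0;
}

/*
 * Replay the bad interleaving of the add-then-check scheme:
 * limit 10, 8 descriptors already in flight.  T1 asks for 4 (must
 * fail), T2 asks for 2 (should succeed), but T2 runs inside the
 * window between T1's add and T1's rollback.
 */
static int
demo_spurious_failure(void)
{
	int rights = 8, limit = 10, t2_result;

	rights += 4;			/* T1: atomic_add_int_nv -> 12 > 10 */
	/* T2 checks before T1 rolls back: sees 12 + 2 > 10 and fails */
	t2_result = (rights + 2 > limit) ? -1 : 0;
	rights -= 4;			/* T1: atomic_sub_int rollback */
	return t2_result;		/* -1, although 8 + 2 <= 10 */
}

/* The locked variant admits exactly up to the limit. */
static int
demo_locked_ok(void)
{
	int rights = 8;

	if (reserve_locked(&rights, 2, 10) != 0)	/* fits */
		return -1;
	if (reserve_locked(&rights, 1, 10) != -1)	/* now full */
		return -1;
	return rights;					/* 10 */
}
```

The demo shows the failure is transient overshoot, not corruption: the counter is always restored, but a concurrent reader of the intermediate value makes the wrong decision, which is exactly the DOS vector kettenis@ mentions.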
Re: speedup io marshal in rpki-client
On Fri, Nov 05, 2021 at 09:18:15AM +0100, Claudio Jeker wrote:
> Noticed the other day.  The ip addr arrays and as number array are
> marshalled element by element which is not very efficient.
> All the data is in one big blob of memory so just use the basic io
> operations for a memory blob and ship the full array at once.
>
> This seems to reduce runtime by 5-10% (in my unscientific testing).
> Also it makes the code a fair bit simpler.

OK job@
Re: speedup io marshal in rpki-client
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.11.05 09:18:15 +0100: > Noticed the other day. The ip addr arrays and as number array are > marshalled element by element which is not very efficent. > All the data is in one big blob of memory so just use the basic io > operations for a memory blob and ship the full array at once. > > This seems to reduce runtime by 5-10% (in my unscientific testing). > Also it makes the code a fair bit simpler. can't test right now, but i think all the multiplications are correct. You dropped some asserts(), but i guess thats fine. ok benno@ > -- > :wq Claudio > > Index: cert.c > === > RCS file: /cvs/src/usr.sbin/rpki-client/cert.c,v > retrieving revision 1.46 > diff -u -p -r1.46 cert.c > --- cert.c4 Nov 2021 11:32:55 - 1.46 > +++ cert.c5 Nov 2021 07:53:50 - > @@ -1225,34 +1225,6 @@ cert_free(struct cert *p) > free(p); > } > > -static void > -cert_ip_buffer(struct ibuf *b, const struct cert_ip *p) > -{ > - io_simple_buffer(b, >afi, sizeof(enum afi)); > - io_simple_buffer(b, >type, sizeof(enum cert_ip_type)); > - > - if (p->type != CERT_IP_INHERIT) { > - io_simple_buffer(b, >min, sizeof(p->min)); > - io_simple_buffer(b, >max, sizeof(p->max)); > - } > - > - if (p->type == CERT_IP_RANGE) > - ip_addr_range_buffer(b, >range); > - else if (p->type == CERT_IP_ADDR) > - ip_addr_buffer(b, >ip); > -} > - > -static void > -cert_as_buffer(struct ibuf *b, const struct cert_as *p) > -{ > - io_simple_buffer(b, >type, sizeof(enum cert_as_type)); > - if (p->type == CERT_AS_RANGE) { > - io_simple_buffer(b, >range.min, sizeof(uint32_t)); > - io_simple_buffer(b, >range.max, sizeof(uint32_t)); > - } else if (p->type == CERT_AS_ID) > - io_simple_buffer(b, >id, sizeof(uint32_t)); > -} > - > /* > * Write certificate parsed content into buffer. > * See cert_read() for the other side of the pipe. 
> @@ -1260,18 +1232,15 @@ cert_as_buffer(struct ibuf *b, const str > void > cert_buffer(struct ibuf *b, const struct cert *p) > { > - size_t i; > - > io_simple_buffer(b, >expires, sizeof(p->expires)); > io_simple_buffer(b, >purpose, sizeof(p->purpose)); > io_simple_buffer(b, >talid, sizeof(p->talid)); > io_simple_buffer(b, >ipsz, sizeof(p->ipsz)); > - for (i = 0; i < p->ipsz; i++) > - cert_ip_buffer(b, >ips[i]); > - > io_simple_buffer(b, >asz, sizeof(p->asz)); > - for (i = 0; i < p->asz; i++) > - cert_as_buffer(b, >as[i]); > + > + io_simple_buffer(b, p->ips, p->ipsz * sizeof(p->ips[0])); > + io_simple_buffer(b, p->as, p->asz * sizeof(p->as[0])); > + > io_str_buffer(b, p->mft); > io_str_buffer(b, p->notify); > io_str_buffer(b, p->repo); > @@ -1282,34 +1251,6 @@ cert_buffer(struct ibuf *b, const struct > io_str_buffer(b, p->pubkey); > } > > -static void > -cert_ip_read(struct ibuf *b, struct cert_ip *p) > -{ > - io_read_buf(b, >afi, sizeof(enum afi)); > - io_read_buf(b, >type, sizeof(enum cert_ip_type)); > - > - if (p->type != CERT_IP_INHERIT) { > - io_read_buf(b, >min, sizeof(p->min)); > - io_read_buf(b, >max, sizeof(p->max)); > - } > - > - if (p->type == CERT_IP_RANGE) > - ip_addr_range_read(b, >range); > - else if (p->type == CERT_IP_ADDR) > - ip_addr_read(b, >ip); > -} > - > -static void > -cert_as_read(struct ibuf *b, struct cert_as *p) > -{ > - io_read_buf(b, >type, sizeof(enum cert_as_type)); > - if (p->type == CERT_AS_RANGE) { > - io_read_buf(b, >range.min, sizeof(uint32_t)); > - io_read_buf(b, >range.max, sizeof(uint32_t)); > - } else if (p->type == CERT_AS_ID) > - io_read_buf(b, >id, sizeof(uint32_t)); > -} > - > /* > * Allocate and read parsed certificate content from descriptor. > * The pointer must be freed with cert_free(). 
> @@ -1319,7 +1260,6 @@ struct cert * > cert_read(struct ibuf *b) > { > struct cert *p; > - size_t i; > > if ((p = calloc(1, sizeof(struct cert))) == NULL) > err(1, NULL); > @@ -1328,19 +1268,17 @@ cert_read(struct ibuf *b) > io_read_buf(b, >purpose, sizeof(p->purpose)); > io_read_buf(b, >talid, sizeof(p->talid)); > io_read_buf(b, >ipsz, sizeof(p->ipsz)); > + io_read_buf(b, >asz, sizeof(p->asz)); > > p->ips = calloc(p->ipsz, sizeof(struct cert_ip)); > if (p->ips == NULL) > err(1, NULL); > - for (i = 0; i < p->ipsz; i++) > - cert_ip_read(b, >ips[i]); > + io_read_buf(b, p->ips, p->ipsz * sizeof(p->ips[0])); > > - io_read_buf(b, >asz, sizeof(p->asz)); > p->as = calloc(p->asz, sizeof(struct cert_as)); > if (p->as == NULL) > err(1, NULL); > - for (i = 0; i < p->asz; i++) > -
poll/select: Lazy removal of knotes
The new poll/select(2) implementation converts 'struct pollfd' and
'fdset' entries to knotes (kqueue event descriptors) and then passes
them to the kqueue subsystem.  A knote is allocated, with
kqueue_register(), for every read, write and except condition watched
on a given FD, so at most 3 allocations might be necessary per FD.

The diff below reduces the overhead of per-syscall allocation/free of
those descriptors by leaving the ones that didn't trigger on the
kqueue across syscalls.  Leaving knotes on the kqueue allows
kqueue_register() to re-use an existing descriptor instead of
allocating a new one.

With this, knotes are now removed lazily.  The mechanism uses a serial
number, incremented on every syscall, that indicates whether a knote
sitting in the kqueue is still valid or should be freed.

Note that performance improvements might not be visible with this diff
alone because kqueue_register() still pre-allocates a descriptor and
then drops it.

visa@ already pointed out that the lazy removal logic could be
integrated into kqueue_scan(), which would reduce the complexity of
these two syscalls.  I'm arguing for doing that as a next step in-tree.
Please test and review :) Index: kern/sys_generic.c === RCS file: /cvs/src/sys/kern/sys_generic.c,v retrieving revision 1.139 diff -u -p -r1.139 sys_generic.c --- kern/sys_generic.c 29 Oct 2021 15:52:44 - 1.139 +++ kern/sys_generic.c 5 Nov 2021 08:11:05 - @@ -598,7 +598,7 @@ sys_pselect(struct proc *p, void *v, reg int dopselect(struct proc *p, int nd, fd_set *in, fd_set *ou, fd_set *ex, -struct timespec *timeout, const sigset_t *sigmask, register_t *retval) +struct timespec *tsp, const sigset_t *sigmask, register_t *retval) { struct kqueue_scan_state scan; fd_mask bits[6]; @@ -666,10 +666,10 @@ dopselect(struct proc *p, int nd, fd_set if (nevents == 0 && ncollected == 0) { uint64_t nsecs = INFSLP; - if (timeout != NULL) { - if (!timespecisset(timeout)) + if (tsp != NULL) { + if (!timespecisset(tsp)) goto done; - nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP)); + nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(tsp), MAXTSLP)); } error = tsleep_nsec(>p_kq, PSOCK | PCATCH, "kqsel", nsecs); /* select is not restarted after signals... */ @@ -682,28 +682,37 @@ dopselect(struct proc *p, int nd, fd_set /* Collect at most `nevents' possibly waiting in kqueue_scan() */ kqueue_scan_setup(, p->p_kq); - while (nevents > 0) { + while ((nevents - ncollected) > 0) { struct kevent kev[KQ_NEVENTS]; int i, ready, count; - /* Maximum number of events per iteration */ - count = MIN(nitems(kev), nevents); - ready = kqueue_scan(, count, kev, timeout, p, ); + /* +* Maximum number of events per iteration. Use the whole +* array to gather as many spurious events as possible. +*/ + count = nitems(kev); + ready = kqueue_scan(, count, kev, tsp, p, ); #ifdef KTRACE if (KTRPOINT(p, KTR_STRUCT)) ktrevent(p, kev, ready); #endif - /* Convert back events that are ready. */ + /* Convert back events that are ready/delete spurious ones. 
*/ for (i = 0; i < ready && error == 0; i++) error = pselcollect(p, [i], pobits, ); + /* -* Stop if there was an error or if we had enough -* space to collect all events that were ready. +* Stop if there was an error or if we had enough space +* to collect all non-spurious events that were ready. */ - if (error || ready < count) + if (error || !ready || (ncollected > 0 && ready < count)) break; - nevents -= ready; + /* +* If we only got spurious events try again repositioning +* the marker. +*/ + if (ncollected == 0 && ((tsp == NULL) || timespecisset(tsp))) + scan.kqs_nevent = 0; } kqueue_scan_finish(); *retval = ncollected; @@ -730,7 +739,7 @@ done: if (pibits[0] != (fd_set *)[0]) free(pibits[0], M_TEMP, 6 * ni); - kqueue_purge(p, p->p_kq); + /* Needed to remove events lazily. */ p->p_kq_serial += nd; return (error); @@ -759,7 +768,7 @@ pselregister(struct proc *p, fd_set *pib DPRINTFN(2, "select fd %d mask %d serial %lu\n", fd, msk, p->p_kq_serial);
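The serial-number idea can be modelled outside the kernel. The sketch below is a generic registry, not the kqueue code: each entry remembers the "generation" (syscall serial) that registered it, and a scan frees entries from older generations instead of purging everything on every return. Re-registration is simplified to always allocating, whereas kqueue_register() would re-use a matching knote.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	unsigned long serial;	/* generation that registered it */
	struct entry *next;
};

struct registry {
	unsigned long serial;	/* current generation, bumped per syscall */
	struct entry *head;
};

/* Lazily drop entries older than the current generation. */
static int
scan(struct registry *r)
{
	struct entry **pp = &r->head;
	int live = 0;

	while (*pp != NULL) {
		struct entry *e = *pp;

		if (e->serial < r->serial) {	/* stale: from an earlier
						 * syscall, free it now */
			*pp = e->next;
			free(e);
		} else {
			live++;
			pp = &e->next;
		}
	}
	return live;
}

static void
reg(struct registry *r)
{
	struct entry *e = calloc(1, sizeof(*e));

	assert(e != NULL);
	e->serial = r->serial;	/* stamp with the current generation */
	e->next = r->head;
	r->head = e;
}

static int
demo(void)
{
	struct registry r = { 1, NULL };

	reg(&r); reg(&r);	/* syscall 1 watches two FDs */
	if (scan(&r) != 2)	/* both current: reported, kept */
		return -1;

	r.serial = 2;		/* next syscall bumps the serial ... */
	reg(&r);		/* ... and re-registers only one FD */
	if (scan(&r) != 1)	/* the two stale entries are freed */
		return -1;
	return scan(&r);	/* still 1: survives across scans */
}
```

This is why the final hunk replaces kqueue_purge() with just `p->p_kq_serial += nd`: bumping the serial is what invalidates the leftover knotes.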
Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic
> Date: Fri, 5 Nov 2021 07:18:03 +0100 > From: Martin Pieuchot > > On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote: > > This completely removes global rwlock(9) from the unp_internalize() and > > unp_externalize() normal paths but only leaves it in unp_externalize() > > error path. Also we don't need to simultaneously hold both fdplock() > > and `unp_lock' in unp_internalize(). As non obvious profit this > > simplifies the future lock dances in the UNIX sockets layer. > > > > It's safe to call fptounp() without `unp_lock' held. We always got this > > file descriptor by fd_getfile(9) so we always have the extra reference > > and this descriptor can't be closed by concurrent thread. Some sockets > > could be destroyed through 'PRU_ABORT' path but they don't have > > associated file descriptor and they are not accessible in the > > unp_internalize() path. > > > > The `unp_file' access without `unp_lock' held is also safe. Each socket > > could have the only associated file descriptor and each file descriptor > > could have the only associated socket. We only assign `unp_file' in the > > unp_internalize() path where we got the socket by fd_getfile(9). This > > descriptor has the extra reference and couldn't be closed concurrently. > > We could override `unp_file' but with the same address because the > > associated file descriptor can't be changed so the address will be also > > the same. So while unp_gc() concurrently runs the dereference of > > non-NULL `unp_file' is always safe. > > Using an atomic operation for `unp_msgcount' is ok with me, one comment > about `unp_rights' below. 
> > > Index: sys/kern/uipc_usrreq.c > > === > > RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v > > retrieving revision 1.153 > > diff -u -p -r1.153 uipc_usrreq.c > > --- sys/kern/uipc_usrreq.c 30 Oct 2021 16:35:31 - 1.153 > > +++ sys/kern/uipc_usrreq.c 30 Oct 2021 18:41:25 - > > @@ -58,6 +58,7 @@ > > * Locks used to protect global data and struct members: > > * I immutable after creation > > * U unp_lock > > + * a atomic > > */ > > struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock"); > > > > @@ -99,7 +100,7 @@ SLIST_HEAD(,unp_deferral)unp_deferred = > > SLIST_HEAD_INITIALIZER(unp_deferred); > > > > ino_t unp_ino;/* [U] prototype for fake inode numbers */ > > -intunp_rights; /* [U] file descriptors in flight */ > > +intunp_rights; /* [a] file descriptors in flight */ > > intunp_defer; /* [U] number of deferred fp to close by the GC > > task */ > > intunp_gcing; /* [U] GC task currently running */ > > > > @@ -927,17 +928,16 @@ restart: > > */ > > rp = (struct fdpass *)CMSG_DATA(cm); > > > > - rw_enter_write(_lock); > > for (i = 0; i < nfds; i++) { > > struct unpcb *unp; > > > > fp = rp->fp; > > rp++; > > if ((unp = fptounp(fp)) != NULL) > > - unp->unp_msgcount--; > > - unp_rights--; > > + atomic_dec_long(>unp_msgcount); > > } > > - rw_exit_write(_lock); > > + > > + atomic_sub_int(_rights, nfds); > > > > /* > > * Copy temporary array to message and adjust length, in case of > > @@ -985,13 +985,10 @@ unp_internalize(struct mbuf *control, st > > return (EINVAL); > > nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int); > > > > - rw_enter_write(_lock); > > - if (unp_rights + nfds > maxfiles / 10) { > > - rw_exit_write(_lock); > > + if (atomic_add_int_nv(_rights, nfds) > maxfiles / 10) { > > + atomic_sub_int(_rights, nfds); > > I can't believe this is race free. If two threads, T1 and T2, call > atomic_add at the same time both might end up returning EMFILE even > if only the first one currently does. 
This could happen if T1 exceeds > the limit and T2 does atomic_add on an already-exceeded `unp_rights' > before T1 could do atomic_sub. > > I suggest using a mutex to protect `unp_rights' instead to solve this > issue. Yes, that would be better. Otherwise it would be trivial to DOS anything that does file descriptor passing. > > return (EMFILE); > > } > > - unp_rights += nfds; > > - rw_exit_write(_lock); > > > > /* Make sure we have room for the struct file pointers */ > > morespace: > > @@ -1031,7 +1028,6 @@ morespace: > > ip = ((int *)CMSG_DATA(cm)) + nfds - 1; > > rp = ((struct fdpass *)CMSG_DATA(cm)) + nfds - 1; > > fdplock(fdp); > > - rw_enter_write(_lock); > > for (i = 0; i < nfds; i++) { > > memcpy(, ip, sizeof fd); > > ip--; > > @@ -1056,15 +1052,13 @@ morespace: > > rp->flags = fdp->fd_ofileflags[fd] & UF_PLEDGED; > > rp--; > > if ((unp = fptounp(fp)) != NULL) { > > + atomic_inc_long(>unp_msgcount); > >
speedup io marshal in rpki-client
Noticed the other day. The ip addr arrays and as number array are marshalled element by element which is not very efficent. All the data is in one big blob of memory so just use the basic io operations for a memory blob and ship the full array at once. This seems to reduce runtime by 5-10% (in my unscientific testing). Also it makes the code a fair bit simpler. -- :wq Claudio Index: cert.c === RCS file: /cvs/src/usr.sbin/rpki-client/cert.c,v retrieving revision 1.46 diff -u -p -r1.46 cert.c --- cert.c 4 Nov 2021 11:32:55 - 1.46 +++ cert.c 5 Nov 2021 07:53:50 - @@ -1225,34 +1225,6 @@ cert_free(struct cert *p) free(p); } -static void -cert_ip_buffer(struct ibuf *b, const struct cert_ip *p) -{ - io_simple_buffer(b, >afi, sizeof(enum afi)); - io_simple_buffer(b, >type, sizeof(enum cert_ip_type)); - - if (p->type != CERT_IP_INHERIT) { - io_simple_buffer(b, >min, sizeof(p->min)); - io_simple_buffer(b, >max, sizeof(p->max)); - } - - if (p->type == CERT_IP_RANGE) - ip_addr_range_buffer(b, >range); - else if (p->type == CERT_IP_ADDR) - ip_addr_buffer(b, >ip); -} - -static void -cert_as_buffer(struct ibuf *b, const struct cert_as *p) -{ - io_simple_buffer(b, >type, sizeof(enum cert_as_type)); - if (p->type == CERT_AS_RANGE) { - io_simple_buffer(b, >range.min, sizeof(uint32_t)); - io_simple_buffer(b, >range.max, sizeof(uint32_t)); - } else if (p->type == CERT_AS_ID) - io_simple_buffer(b, >id, sizeof(uint32_t)); -} - /* * Write certificate parsed content into buffer. * See cert_read() for the other side of the pipe. 
@@ -1260,18 +1232,15 @@ cert_as_buffer(struct ibuf *b, const str void cert_buffer(struct ibuf *b, const struct cert *p) { - size_t i; - io_simple_buffer(b, >expires, sizeof(p->expires)); io_simple_buffer(b, >purpose, sizeof(p->purpose)); io_simple_buffer(b, >talid, sizeof(p->talid)); io_simple_buffer(b, >ipsz, sizeof(p->ipsz)); - for (i = 0; i < p->ipsz; i++) - cert_ip_buffer(b, >ips[i]); - io_simple_buffer(b, >asz, sizeof(p->asz)); - for (i = 0; i < p->asz; i++) - cert_as_buffer(b, >as[i]); + + io_simple_buffer(b, p->ips, p->ipsz * sizeof(p->ips[0])); + io_simple_buffer(b, p->as, p->asz * sizeof(p->as[0])); + io_str_buffer(b, p->mft); io_str_buffer(b, p->notify); io_str_buffer(b, p->repo); @@ -1282,34 +1251,6 @@ cert_buffer(struct ibuf *b, const struct io_str_buffer(b, p->pubkey); } -static void -cert_ip_read(struct ibuf *b, struct cert_ip *p) -{ - io_read_buf(b, >afi, sizeof(enum afi)); - io_read_buf(b, >type, sizeof(enum cert_ip_type)); - - if (p->type != CERT_IP_INHERIT) { - io_read_buf(b, >min, sizeof(p->min)); - io_read_buf(b, >max, sizeof(p->max)); - } - - if (p->type == CERT_IP_RANGE) - ip_addr_range_read(b, >range); - else if (p->type == CERT_IP_ADDR) - ip_addr_read(b, >ip); -} - -static void -cert_as_read(struct ibuf *b, struct cert_as *p) -{ - io_read_buf(b, >type, sizeof(enum cert_as_type)); - if (p->type == CERT_AS_RANGE) { - io_read_buf(b, >range.min, sizeof(uint32_t)); - io_read_buf(b, >range.max, sizeof(uint32_t)); - } else if (p->type == CERT_AS_ID) - io_read_buf(b, >id, sizeof(uint32_t)); -} - /* * Allocate and read parsed certificate content from descriptor. * The pointer must be freed with cert_free(). 
@@ -1319,7 +1260,6 @@ struct cert * cert_read(struct ibuf *b) { struct cert *p; - size_t i; if ((p = calloc(1, sizeof(struct cert))) == NULL) err(1, NULL); @@ -1328,19 +1268,17 @@ cert_read(struct ibuf *b) io_read_buf(b, >purpose, sizeof(p->purpose)); io_read_buf(b, >talid, sizeof(p->talid)); io_read_buf(b, >ipsz, sizeof(p->ipsz)); + io_read_buf(b, >asz, sizeof(p->asz)); p->ips = calloc(p->ipsz, sizeof(struct cert_ip)); if (p->ips == NULL) err(1, NULL); - for (i = 0; i < p->ipsz; i++) - cert_ip_read(b, >ips[i]); + io_read_buf(b, p->ips, p->ipsz * sizeof(p->ips[0])); - io_read_buf(b, >asz, sizeof(p->asz)); p->as = calloc(p->asz, sizeof(struct cert_as)); if (p->as == NULL) err(1, NULL); - for (i = 0; i < p->asz; i++) - cert_as_read(b, >as[i]); + io_read_buf(b, p->as, p->asz * sizeof(p->as[0])); io_read_str(b, >mft); io_read_str(b, >notify); Index: extern.h === RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v retrieving revision 1.93 diff -u -p -r1.93
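The trick the diff relies on can be shown in miniature. This is a model, not rpki-client's code: io_simple_buffer() is essentially a buffered write between two processes running the same binary on the same host, so an array of plain structs can be shipped as one flat copy (count first, then the whole array) instead of element by element and field by field. All names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct blob { uint8_t *data; size_t len; };	/* stands in for ibuf */

static void
put(struct blob *b, const void *src, size_t n)
{
	b->data = realloc(b->data, b->len + n);
	assert(b->data != NULL);
	memcpy(b->data + b->len, src, n);
	b->len += n;
}

struct as_range { uint32_t min, max; };		/* like struct cert_as */

static void
marshal(struct blob *b, const struct as_range *a, size_t asz)
{
	put(b, &asz, sizeof(asz));		/* element count first ... */
	put(b, a, asz * sizeof(a[0]));		/* ... then one big copy */
}

static size_t
unmarshal(const struct blob *b, struct as_range **out)
{
	size_t asz;

	memcpy(&asz, b->data, sizeof(asz));
	if ((*out = calloc(asz, sizeof(**out))) == NULL)
		return 0;
	memcpy(*out, b->data + sizeof(asz), asz * sizeof(**out));
	return asz;
}

static uint32_t
demo(void)
{
	struct blob b = { NULL, 0 };
	struct as_range in[2] = { { 1, 2 }, { 3, 4 } }, *out;

	marshal(&b, in, 2);
	return (unmarshal(&b, &out) == 2) ? out[1].max : 0;
}
```

Same-process (or same-binary, same-host) transfer is what makes the flat copy safe: struct padding and byte order are identical on both ends, which would not hold for an on-disk or network format.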
Re: UNIX sockets: use vnode(9) lock to protect `v_socket' dereference
On 26/10/21(Tue) 14:12, Vitaliy Makkoveev wrote: > Another step to make UNIX sockets locking fine grained. > > The listening socket has the references from file descriptors layer and > from the vnode(9) layer. This means when we close(2)'ing such socket it > still referenced by concurrent thread through connect(2) path. > > When we bind(2) UNIX socket we link it to vnode(9) by assigning > `v_socket'. When we connect(2)'ing socket to the socket we previously > bind(2)'ed we finding it by namei(9) and obtain it's reference through > `v_socket'. This socket has no extra reference in file descriptor > layer and could be closed by concurrent thread. > > This time we have `unp_lock' rwlock(9) which protects the whole layer > and the dereference of `v_socket' is safe. But with the fine grained > locking the `v_socket' will not be protected by global lock. When we > obtain the vnode(9) by namei(9) in connect(9) or bind(9) paths it is > already exclusively locked by vlode(9) lock. But in unp_detach() which > is called on the close(2)'ing socket we assume `unp_lock' protects > `v_socket'. > > I propose to use exclusive vnode(9) lock to protect `v_socket'. With the > fine grained locking, the `v_socket' dereference in unp_bind() or > unp_connect() threads will be safe because unp_detach() thread will wait > the vnode(9) lock release. The vnode referenced by `unp_vnod' has > reference counter bumped so it's dereference is also safe without > `unp_lock' held. This makes sense to me. Using the vnode lock here seems the simplest approach. > The `i_lock' should be take before `unp_lock' and unp_detach() should > release solock(). To prevent connections on this socket the > 'SO_ACCEPTCONN' bit cleared in soclose(). This is done to prevent races when solock() is released inside soabort(), right? Is it the only one or some more care is needed? Will this stay with per-socket locks or is this only necessary because of the global `unp_lock'? 
> Index: sys/kern/uipc_socket.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> retrieving revision 1.265
> diff -u -p -r1.265 uipc_socket.c
> --- sys/kern/uipc_socket.c	14 Oct 2021 23:05:10 -0000	1.265
> +++ sys/kern/uipc_socket.c	26 Oct 2021 11:05:59 -0000
> @@ -315,6 +315,8 @@ soclose(struct socket *so, int flags)
>  	/* Revoke async IO early. There is a final revocation in sofree(). */
>  	sigio_free(&so->so_sigio);
>  	if (so->so_options & SO_ACCEPTCONN) {
> +		so->so_options &= ~SO_ACCEPTCONN;
> +
>  		while ((so2 = TAILQ_FIRST(&so->so_q0)) != NULL) {
>  			(void) soqremque(so2, 0);
>  			(void) soabort(so2);
> Index: sys/kern/uipc_usrreq.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> retrieving revision 1.150
> diff -u -p -r1.150 uipc_usrreq.c
> --- sys/kern/uipc_usrreq.c	21 Oct 2021 22:11:07 -0000	1.150
> +++ sys/kern/uipc_usrreq.c	26 Oct 2021 11:05:59 -0000
> @@ -474,20 +474,30 @@ void
>  unp_detach(struct unpcb *unp)
>  {
>  	struct socket *so = unp->unp_socket;
> -	struct vnode *vp = NULL;
> +	struct vnode *vp = unp->unp_vnode;
>  
>  	rw_assert_wrlock(&unp_lock);
>  
>  	LIST_REMOVE(unp, unp_link);
> -	if (unp->unp_vnode) {
> +
> +	if (vp) {
> +		unp->unp_vnode = NULL;
> +
>  		/*
> -		 * `v_socket' is only read in unp_connect and
> -		 * unplock prevents concurrent access.
> +		 * Enforce `i_lock' -> `unp_lock' because fifo
> +		 * subsystem requires it.
>  		 */
>  
> -		unp->unp_vnode->v_socket = NULL;
> -		vp = unp->unp_vnode;
> -		unp->unp_vnode = NULL;
> +		sounlock(so, SL_LOCKED);
> +
> +		VOP_LOCK(vp, LK_EXCLUSIVE);
> +		vp->v_socket = NULL;
> +
> +		KERNEL_LOCK();
> +		vput(vp);
> +		KERNEL_UNLOCK();
> +
> +		solock(so);
>  	}
>  
>  	if (unp->unp_conn)
> @@ -500,21 +510,6 @@ unp_detach(struct unpcb *unp)
>  	pool_put(&unp_pool, unp);
>  	if (unp_rights)
>  		task_add(systqmp, &unp_gc_task);
> -
> -	if (vp != NULL) {
> -		/*
> -		 * Enforce `i_lock' -> `unplock' because fifo subsystem
> -		 * requires it. The socket can't be closed concurrently
> -		 * because the file descriptor reference is
> -		 * still hold.
> -		 */
> -
> -		sounlock(so, SL_LOCKED);
> -		KERNEL_LOCK();
> -		vrele(vp);
> -		KERNEL_UNLOCK();
> -		solock(so);
> -	}
>  }
>  
>  int
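For readers following along, the lock dance in the new unp_detach() can
be illustrated with a minimal userland sketch.  This is not kernel code:
the struct layouts and names below are illustrative stand-ins, with
pthread mutexes playing the roles of solock() and the exclusive vnode
lock.  The point is the ordering: a thread already holding the socket
lock must drop it before taking the vnode lock, and clearing `v_socket'
under the vnode lock is what makes the connect(2)-side dereference safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-ins for the kernel structures. */
struct vnode {
	pthread_mutex_t	v_lock;		/* stands in for the vnode lock */
	void		*v_socket;
};

struct socket {
	pthread_mutex_t	so_lock;	/* stands in for solock() */
	struct vnode	*unp_vnode;
};

/* Called with so->so_lock held, as unp_detach() runs with solock(). */
static void
detach(struct socket *so)
{
	struct vnode *vp = so->unp_vnode;

	so->unp_vnode = NULL;
	if (vp == NULL)
		return;

	/* Required order is vnode lock -> socket lock, so drop ours. */
	pthread_mutex_unlock(&so->so_lock);	/* sounlock() */

	pthread_mutex_lock(&vp->v_lock);	/* VOP_LOCK(vp, LK_EXCLUSIVE) */
	vp->v_socket = NULL;			/* connect path also holds
						 * the vnode lock, so this
						 * cannot race with it */
	pthread_mutex_unlock(&vp->v_lock);	/* vput() */

	pthread_mutex_lock(&so->so_lock);	/* solock() */
}
```

A connect(2)-like thread in this model would take `v_lock` before
reading `v_socket`, so it either sees the socket before detach starts or
sees NULL, never a half-torn-down pointer.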
Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic
On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote:
> This completely removes the global rwlock(9) from the unp_internalize()
> and unp_externalize() normal paths and only leaves it in the
> unp_externalize() error path.  Also we don't need to simultaneously
> hold both fdplock() and `unp_lock' in unp_internalize().  As a
> non-obvious benefit, this simplifies the future lock dances in the UNIX
> sockets layer.
>
> It's safe to call fptounp() without `unp_lock' held.  We always get
> this file descriptor by fd_getfile(9), so we always have an extra
> reference and the descriptor can't be closed by a concurrent thread.
> Some sockets could be destroyed through the 'PRU_ABORT' path, but they
> have no associated file descriptor and are not reachable in the
> unp_internalize() path.
>
> The `unp_file' access without `unp_lock' held is also safe.  Each
> socket can have only one associated file descriptor, and each file
> descriptor can have only one associated socket.  We only assign
> `unp_file' in the unp_internalize() path, where we got the socket by
> fd_getfile(9).  This descriptor has an extra reference and can't be
> closed concurrently.  We could overwrite `unp_file', but only with the
> same address, because the associated file descriptor can't change, so
> the address will also be the same.  So while unp_gc() runs
> concurrently, the dereference of a non-NULL `unp_file' is always safe.

Using an atomic operation for `unp_msgcount' is ok with me; one comment
about `unp_rights' below.
> Index: sys/kern/uipc_usrreq.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> retrieving revision 1.153
> diff -u -p -r1.153 uipc_usrreq.c
> --- sys/kern/uipc_usrreq.c	30 Oct 2021 16:35:31 -0000	1.153
> +++ sys/kern/uipc_usrreq.c	30 Oct 2021 18:41:25 -0000
> @@ -58,6 +58,7 @@
>   * Locks used to protect global data and struct members:
>   *	I	immutable after creation
>   *	U	unp_lock
> + *	a	atomic
>   */
>  struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock");
>  
> @@ -99,7 +100,7 @@ SLIST_HEAD(,unp_deferral) unp_deferred =
>      SLIST_HEAD_INITIALIZER(unp_deferred);
>  
>  ino_t	unp_ino;	/* [U] prototype for fake inode numbers */
> -int	unp_rights;	/* [U] file descriptors in flight */
> +int	unp_rights;	/* [a] file descriptors in flight */
>  int	unp_defer;	/* [U] number of deferred fp to close by the GC task */
>  int	unp_gcing;	/* [U] GC task currently running */
>  
> @@ -927,17 +928,16 @@ restart:
>  	 */
>  	rp = (struct fdpass *)CMSG_DATA(cm);
>  
> -	rw_enter_write(&unp_lock);
>  	for (i = 0; i < nfds; i++) {
>  		struct unpcb *unp;
>  
>  		fp = rp->fp;
>  		rp++;
>  		if ((unp = fptounp(fp)) != NULL)
> -			unp->unp_msgcount--;
> -		unp_rights--;
> +			atomic_dec_long(&unp->unp_msgcount);
>  	}
> -	rw_exit_write(&unp_lock);
> +
> +	atomic_sub_int(&unp_rights, nfds);
>  
>  	/*
>  	 * Copy temporary array to message and adjust length, in case of
> @@ -985,13 +985,10 @@ unp_internalize(struct mbuf *control, st
>  		return (EINVAL);
>  	nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int);
>  
> -	rw_enter_write(&unp_lock);
> -	if (unp_rights + nfds > maxfiles / 10) {
> -		rw_exit_write(&unp_lock);
> +	if (atomic_add_int_nv(&unp_rights, nfds) > maxfiles / 10) {
> +		atomic_sub_int(&unp_rights, nfds);

I don't believe this is race free.  If two threads, T1 and T2, do the
atomic add at the same time, both might end up returning EMFILE even
though only the first one actually exceeds the limit.  This can happen
if T1 pushes `unp_rights' over the limit and T2 then does its atomic add
on the already-exceeded counter before T1 has done its atomic_sub_int()
rollback.

I suggest protecting `unp_rights' with a mutex instead to solve this
issue.

> 		return (EMFILE);
> 	}
> -	unp_rights += nfds;
> -	rw_exit_write(&unp_lock);
>  
>  	/* Make sure we have room for the struct file pointers */
>  morespace:
> @@ -1031,7 +1028,6 @@ morespace:
>  	ip = ((int *)CMSG_DATA(cm)) + nfds - 1;
>  	rp = ((struct fdpass *)CMSG_DATA(cm)) + nfds - 1;
>  	fdplock(fdp);
> -	rw_enter_write(&unp_lock);
>  	for (i = 0; i < nfds; i++) {
>  		memcpy(&fd, ip, sizeof fd);
>  		ip--;
> @@ -1056,15 +1052,13 @@ morespace:
>  		rp->flags = fdp->fd_ofileflags[fd] & UF_PLEDGED;
>  		rp--;
>  		if ((unp = fptounp(fp)) != NULL) {
> +			atomic_inc_long(&unp->unp_msgcount);
>  			unp->unp_file = fp;
> -			unp->unp_msgcount++;
>  		}
>  	}
> -	rw_exit_write(&unp_lock);
>  	fdpunlock(fdp);
>  	return (0);
>  fail:
> -	rw_exit_write(&unp_lock);
>  	fdpunlock(fdp);
>  	if (fp != NULL)
>  		FRELE(fp, p);
> @@ -1072,17 +1066,13 @@ fail:
> 	for ( ; i >
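The spurious-EMFILE scenario and the suggested mutex alternative can be
shown with a small userland sketch.  This is illustrative only: the
names are hypothetical, C11 atomics stand in for the kernel atomic_*
primitives, a pthread mutex stands in for a kernel mutex(9), and LIMIT
stands in for maxfiles / 10.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define LIMIT	10		/* stands in for maxfiles / 10 */

static atomic_int rights_atomic;

/* Mimics the patched check: add first, test the new value afterwards. */
static int
reserve_atomic(int nfds)
{
	if (atomic_fetch_add(&rights_atomic, nfds) + nfds > LIMIT) {
		/* Roll back; another thread may observe the overshoot
		 * between the add above and this subtract. */
		atomic_fetch_sub(&rights_atomic, nfds);
		return (-1);	/* EMFILE */
	}
	return (0);
}

static int rights_locked;
static pthread_mutex_t rights_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Check and add under one lock: no transient overshoot is ever seen. */
static int
reserve_locked(int nfds)
{
	int error = 0;

	pthread_mutex_lock(&rights_mtx);
	if (rights_locked + nfds > LIMIT)
		error = -1;	/* EMFILE */
	else
		rights_locked += nfds;
	pthread_mutex_unlock(&rights_mtx);
	return (error);
}
```

The interleaving can be replayed deterministically: with 8 descriptors
in flight, T1's oversized request of 8 temporarily drives the counter to
16, and a concurrent T2 asking for just 1 then fails its add-then-check
even though 8 + 1 is under the limit.  The mutex version never exposes
that transient value.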