Re: new site.8: document site*.tgz and /{upgrade,install}.site

2021-11-05 Thread Klemens Nanni
On Fri, Nov 05, 2021 at 05:36:48PM -0700, Andrew Hewus Fresh wrote:
> I like it, some comments in-line but overall I think this would have
> helped me get started with siteXX stuff, so OK afresh1@

Great, I'll commit with tweaks as per below, thanks!

> > +If existent and executable,
> 
> This whole sentence reads weird to me.  I don't know if this is better,
> but it confused me.
> 
>  If they exist and are executable at the end of the install or
>  upgrade process, /install.site or /upgrade.site respectively, are
>  run with chroot(8) based at the system's root.
> 
> > +.Pa /install.site
> > +and
> > +.Pa /upgrade.site
> > +are run last during install and upgrade, respectively, with
> > +.Xr chroot 8
> > +based at the system's root.

I'll adapt this to

If they exist and are executable, /install.site and /upgrade.site are
run at the end of the install and upgrade process, respectively, with
chroot(8) based at the system's root.

> > +.Sh FILES
> > +.Bl -tag -width "site${VERSION}-$(hostname -s).tgz" -compact
> > +.It Pa site${ Ns Va VERSION Ns }.tgz
> > +Generic set.
> > +.It Pa site${ Ns Va VERSION Ns }-$( Ns Ic hostname Fl s Ns ).tgz
> > +Host-specific set.
> > +.It Pa /upgrade.site
> > +Generic post-upgrade script.
> > +.It Pa /install.site
> > +Generic post-install script.
> > +.El
> > +.Sh EXAMPLES
> > +Create
> > +.Nm
> > +sets and update the index:
> > +.Bd -literal -offset indent
> > +# tar -czhf site70.tgz generic/
> > +# tar -czhf site70-puffy.tgz puffy/
> > +# ls -lT > index.txt
> > +.Ed
> > +.Pp
> > +Trigger an upgrade of
> > +.Xr packages 7
> > +upon next boot:
> > +.Bd -literal -offset indent
> > +echo 'pkg_add -Iu' >>/etc/rc.firsttime
> 
> I think some clarity that this should go into one of those siteXX.tgz
> files as "/upgrade.site" would help folks connect how to get those files
> onto the install media and make it clearer that those files get
> extracted into the chroot and that you don't have to figure out how to
> get them into the root of the install media.

That's one way and certainly the only one for fresh installs (unless
you modify the install media, which seems pointless), but for upgrades
you can create /upgrade.site and the installer will pick it up from
there.

That is,

cat << EOF >/upgrade.site
echo 'pkg_add -Iu' >>/etc/rc.firsttime
EOF
chmod +x /upgrade.site
sysupgrade

will upgrade packages after sysupgrade reboots -- no need for dealing
with siteXX.tgz on your existing install.

But you're right, I completely missed creating /upgrade.site in that
example, so I'll fix that.

> Perhaps that is clear enough from "run [...] with chroot", but I think
> this example makes things less clear, at least the way it is.

So I'll leave that for now.  Again, we can polish in-tree.



Re: new site.8: document site*.tgz and /{upgrade,install}.site

2021-11-05 Thread Andrew Hewus Fresh
I like it, some comments in-line but overall I think this would have
helped me get started with siteXX stuff, so OK afresh1@

On Fri, Nov 05, 2021 at 03:19:03PM +, Klemens Nanni wrote:
> On Wed, Oct 27, 2021 at 07:35:28PM -0500, Aaron Poffenberger wrote:
> > Looks good. Nice to see this moving forward. Thanks.
> > 
> > On 2021-10-27 21:13 +, Klemens Nanni  wrote:
> > > On Mon, Sep 06, 2021 at 02:29:50PM -0500, Aaron Poffenberger wrote:
> > > > Ping.
> > > > 
> > > > Will someone commit this? Seems like no one knows about /upgrade.site 
> > > > and it
> > > > fits well with sysupgrade(8).
> > > 
> > > sysupgrade(8) is one potential /upgrade.site user but I disagree about
> > > documenting the latter through the former.
> > > 
> > > Here is a new manual roughly merging the FAQ bits with what you, Aaron,
> > > provided.
> > > 
> > > site(8) is a lame but fitting name;  autoinstall(8) and sysypgrade(8)
> > > reference it and both sets and scripts are documented.
> > > 
> > > Examples are intentionally brief to be shorter and more concise than the
> > > FAQ while showing enough to get going.
> > > 
> > > 
> > > I am quite certain that wording, examples and the references from other
> > > manuals can be tweaked, but this diff looks like a good enough start
> > > and --if this is the way to go-- I'd prefer to commit and keep polishing
> > > in-tree.
> > > 
> > > Feedback? Objections? OK?
> 
> I have not received any feedback on this except from Aaron.
> 
> Anyone? Anything?
> 
> Index: distrib/sets/lists/man/mi
> ===
> RCS file: /cvs/src/distrib/sets/lists/man/mi,v
> retrieving revision 1.1645
> diff -u -p -r1.1645 mi
> --- distrib/sets/lists/man/mi 2 Nov 2021 22:07:33 -   1.1645
> +++ distrib/sets/lists/man/mi 3 Nov 2021 02:26:16 -
> @@ -2561,6 +2561,7 @@
>  ./usr/share/man/man8/security.8
>  ./usr/share/man/man8/sendmail.8
>  ./usr/share/man/man8/sensorsd.8
> +./usr/share/man/man8/site.8
>  ./usr/share/man/man8/sftp-server.8
>  ./usr/share/man/man8/showmount.8
>  ./usr/share/man/man8/shutdown.8
> Index: usr.sbin/sysupgrade/sysupgrade.8
> ===
> RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
> retrieving revision 1.10
> diff -u -p -r1.10 sysupgrade.8
> --- usr.sbin/sysupgrade/sysupgrade.8  3 Oct 2019 12:43:58 -   1.10
> +++ usr.sbin/sysupgrade/sysupgrade.8  27 Oct 2021 20:42:26 -
> @@ -67,6 +67,10 @@ This is the default if the system is cur
>  Upgrade to a snapshot.
>  This is the default if the system is currently running a snapshot.
>  .El
> +.Pp
> +See
> +.Xr site 8
> +for how to customize the upgrade process.
>  .Sh FILES
>  .Bl -tag -width "/auto_upgrade.conf" -compact
>  .It Pa /auto_upgrade.conf
> @@ -83,7 +87,8 @@ Directory the upgrade is downloaded to.
>  .Xr signify 1 ,
>  .Xr installurl 5 ,
>  .Xr autoinstall 8 ,
> -.Xr release 8
> +.Xr release 8 ,
> +.Xr site 8
>  .Sh HISTORY
>  .Nm
>  first appeared in
> Index: share/man/man8/Makefile
> ===
> RCS file: /cvs/src/share/man/man8/Makefile,v
> retrieving revision 1.102
> diff -u -p -r1.102 Makefile
> --- share/man/man8/Makefile   1 May 2021 16:11:10 -   1.102
> +++ share/man/man8/Makefile   27 Oct 2021 20:38:11 -
> @@ -6,7 +6,7 @@ MAN=  afterboot.8 autoinstall.8 boot_conf
>   crash.8 daily.8 \
>   diskless.8 genassym.sh.8 intro.8 netstart.8 rc.8 \
>   rc.conf.8 rc.d.8 rc.shutdown.8 rc.subr.8 release.8 \
> - security.8 ssl.8 starttls.8 sticky.8 yp.8
> + security.8 site.8 ssl.8 starttls.8 sticky.8 yp.8
>  
>  SUBDIR=  man8.alpha man8.amd64 man8.arm64 man8.armv7 \
>   man8.hppa man8.i386 man8.landisk \
> Index: share/man/man8/autoinstall.8
> ===
> RCS file: /cvs/src/share/man/man8/autoinstall.8,v
> retrieving revision 1.23
> diff -u -p -r1.23 autoinstall.8
> --- share/man/man8/autoinstall.8  18 Jul 2021 11:08:34 -  1.23
> +++ share/man/man8/autoinstall.8  27 Oct 2021 20:42:24 -
> @@ -32,6 +32,10 @@ file and HTTP to fetch the file.
>  If that fails, the installer asks for the location which can either be
>  a URL or a local path.
>  .Pp
> +See
> +.Xr site 8
> +for how to provide custom configuration.
> +.Pp
>  To start unattended installation or upgrade choose '(A)utoinstall' at the
>  install prompt.
>  If there is only one network interface, the installer fetches the response
> @@ -235,7 +239,8 @@ host foo {
>  .Sh SEE ALSO
>  .Xr dhcp-options 5 ,
>  .Xr dhcpd.conf 5 ,
> -.Xr diskless 8
> +.Xr diskless 8 ,
> +.Xr site 8
>  .Sh HISTORY
>  The
>  .Nm
> Index: share/man/man8/site.8
> ===
> RCS file: share/man/man8/site.8
> diff -N share/man/man8/site.8
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ share/man/man8/site.8 27 Oct 2021 21:11:48 

ftp: do not URI encode tilde as per RFC 2396

2021-11-05 Thread Klemens Nanni
ftp(1) implements RFC 1738 from Dec. 1994 but RFC 2396 from Aug. 1998
updated this and the tl;dr is:  do not encode the tilde character.

In theory, this shouldn't make a difference as servers should decode
"%7e" to "~", BUT not all servers do so and thus some respond with 404.

I've hit this in the past already but didn't look at the code.
Now I came across this link that made me fix it:
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download

curl(1) and wget(1) both fetch this file successfully, i.e. they behave
identicall wrt. "%" and send it as-is:

$ 
url='https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download'

$ curl -I -sS -w '%{url}\n' "$url" | tail -1
200 
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download

$ wget -d "$url" 2>&1 | grep -Ew 'GET|OK'
GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download 
HTTP/1.1
HTTP/1.1 200 OK
200 OK

ftp(1) still sticks with RFC 1738 and encodes it:

ftp -d "$url"
host review.trustedfirmware.org, port https, path 
changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download, save as 
patch?download, auth none.
Trying 64:ff9b::339f:1211...
Requesting 
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download
GET 
/changes/TF-A%2Ftrusted-firmware-a%7e7726/revisions/9/patch?download HTTP/1.1
Connection: close
Host: review.trustedfirmware.org
User-Agent: OpenBSD ftp

received 'HTTP/1.1 404 Not Found'
ftp: Error retrieving 
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download:
 404 Not Found

RFC 2396 2.4.2. When to Escape and Unescape says:

   In some cases, data that could be represented by an unreserved
   character may appear escaped; for example, some of the unreserved
   "mark" characters are automatically escaped by some systems.  If the
   given URI scheme defines a canonicalization algorithm, then
   unreserved characters may be unescaped according to that algorithm.
   For example, "%7e" is sometimes used instead of "~" in an http URL
   path, but the two are equivalent for an http URL.

So they're equivalent and tilde does not need encoding.  Again, servers
should always decode it anyway, but not all of them do, so better send
it as-is/unencoded.

Diff below is effectively a one-character change that removes "~" from
`*unsafe_chars' but it also updates the comments to use RFC 2396 wording
and order of characters for easier code-to-standard comparison.

This allows me to fetch this patch:

$ ./obj/ftp -d "$url"
host review.trustedfirmware.org, port https, path 
changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download, save as 
patch?download, auth none.
Trying 64:ff9b::339f:1211...
Requesting 
https://review.trustedfirmware.org/changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download
GET /changes/TF-A%2Ftrusted-firmware-a~7726/revisions/9/patch?download 
HTTP/1.1
Connection: close
Host: review.trustedfirmware.org
User-Agent: OpenBSD ftp

received 'HTTP/1.1 200 OK'
received 'Date: Fri, 05 Nov 2021 23:34:49 GMT'
received 'Server: Apache'
received 'X-Frame-Options: DENY'
received 'Content-Disposition: attachment; 
filename="155911c.diff.base64"'
received 'X-Content-Type-Options: nosniff'
received 'Cache-Control: private'
received 'Pragma: no-cache'
received 'Expires: Mon, 01 Jan 1990 00:00:00 GMT'
received 'X-FYI-Content-Encoding: base64'
received 'X-FYI-Content-Type: application/mbox'
received 'Content-Type: text/plain;charset=iso-8859-1'
received 'Vary: Accept-Encoding'
received 'Connection: close'
received 'Transfer-Encoding: chunked'
9544 bytes received in 0.00 seconds (12.21 MB/s)


Looking at wget's source somewhat backs up this way of handling tilde
by explicitly mentioning broken "%7e" decoding, i.e. servers expecting
an unencoded "~" as-is:

>From wget-1.2.1/src/url.c 103f:

/* Table of "reserved" and "unsafe" characters.  Those terms are
   rfc1738-speak, as such largely obsoleted by rfc2396 and later
   specs, but the general idea remains.

   A reserved character is the one that you can't decode without
   changing the meaning of the URL.  For example, you can't decode
   "/foo/%2f/bar" into "/foo///bar" because the number and contents of
   path components is different.  Non-reserved characters can be
   changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar".  The
   unsafe characters are loosely based on rfc1738, plus "$" and ",",
   as recommended by rfc2396, and minus "~", which is very frequently
   used (and sometimes unrecognized as %7E by 

Re: [patch] httpd static gzip compression

2021-11-05 Thread Steffen Nurpmeso
  ...
 |You would be asking for
 |
 |https://exmaple.com/whatever/ls.1
 |
 |and with Accept-Encoding: gzip in the http header, and the
 |webserver would then look if it has a file
 |
 |/whatever/ls.1.gz
 |
 |(instead of without .gz) in its document tree and send you that, with 
 |"Content-Encoding: gzip" http header.

As an outsider i find this thread very amusing.
As you all know the normal approach is to have a cache directory
where some "compress" module performs on-the-fly storage if the
client asks for some file, accepts compression, and the compressed
version yet does not exist.  Funnily i once got not even an answer
when i asked for static compression, since on-the-fly compression
was already available, "so what", i have forgotten which
webserver that has been.  Cleanup via cron anyhow.

Now something for Theo, from the webserver i use.

  #if defined HAVE_ZLIB_H && defined HAVE_LIBZ
  # define USE_ZLIB
  # include 
  #endif
  #ifndef Z_DEFAULT_COMPRESSION
  #define Z_DEFAULT_COMPRESSION -1
  #endif
  #ifndef MAX_WBITS
  #define MAX_WBITS 15
  #endif
  
  #if defined HAVE_BZLIB_H && defined HAVE_LIBBZ2
  # define USE_BZ2LIB
  /* we don't need stdio interface */
  # define BZ_NO_STDIO
  # include 
  #endif
  
  #if defined HAVE_BROTLI_ENCODE_H && defined HAVE_BROTLI
  # define USE_BROTLI
  # include 
  #endif
  
  #if defined HAVE_ZSTD_H && defined HAVE_ZSTD
  # define USE_ZSTD
  # include 
  #endif
  
  #if defined HAVE_SYS_MMAN_H && defined HAVE_MMAP && defined ENABLE_MMAP
  #define USE_MMAP

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: [patch] httpd static gzip compression

2021-11-05 Thread Crystal Kolipe
On Fri, Nov 05, 2021 at 08:24:21AM -0600, Theo de Raadt wrote:
> prx  wrote:
> 
> > I think this remark should be placed into perspective.
> > 
> > When a file is requested, its gzipped version is send if : 
> > * The client ask for it with appropriate header.
> > * The server admin configured httpd to do so **and** compressed files.
> 
> and if the gzipd file does not contain the contents as the ungzip'd
> file, then two different clients may get different results.

Just as big-endian and little-endian architectures can be fed different files
from the same ISO-9660 filesystem.

The issue you describe certainly exists, but it's not without precedent.

It could even be considered a feature, if the client supports compression then
send more verbose content.



Re: [patch] httpd static gzip compression

2021-11-05 Thread Sebastian Benoit
Theo de Raadt(dera...@openbsd.org) on 2021.11.05 08:24:21 -0600:
> prx  wrote:
> 
> > I think this remark should be placed into perspective.
> > 
> > When a file is requested, its gzipped version is send if : 
> > * The client ask for it with appropriate header.
> > * The server admin configured httpd to do so **and** compressed files.
> 
> and if the gzipd file does not contain the contents as the ungzip'd
> file, then two different clients may get different results.

Yes, but it's the responsibility of whoever puts the content there to do the
correct thing.

If i put a html file into a web directory and call it foo.jpg i would not
expect a sensible result either.




Re: rpki-client show attr name in rrdp parse errors

2021-11-05 Thread Sebastian Benoit
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.11.05 15:26:57 +0100:
> On Wed, Nov 03, 2021 at 12:58:17PM +0100, Claudio Jeker wrote:
> > In one place this is already done but this makes sure we show the bad
> > attribute in all cases where a non conforming attribute is found.
> 
> Found another bunch of those non conforming attribute errors. Adjust them
> as well.
> 
> OK?

ok

> -- 
> :wq Claudio
> 
> Index: rrdp_notification.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/rrdp_notification.c,v
> retrieving revision 1.9
> diff -u -p -r1.9 rrdp_notification.c
> --- rrdp_notification.c   29 Oct 2021 09:27:36 -  1.9
> +++ rrdp_notification.c   5 Nov 2021 14:06:18 -
> @@ -141,7 +141,7 @@ start_notification_elem(struct notificat
>   continue;
>   }
>   PARSE_FAIL(p, "parse failed - non conforming "
> - "attribute found in notification elem");
> + "attribute '%s' found in notification elem", attr[i]);
>   }
>   if (!(has_xmlns && nxml->version && nxml->session_id && nxml->serial))
>   PARSE_FAIL(p, "parse failed - incomplete "
> @@ -185,7 +185,7 @@ start_snapshot_elem(struct notification_
>   continue;
>   }
>   PARSE_FAIL(p, "parse failed - non conforming "
> - "attribute found in snapshot elem");
> + "attribute '%s' found in snapshot elem", attr[i]);
>   }
>   if (hasUri != 1 || hasHash != 1)
>   PARSE_FAIL(p, "parse failed - incomplete snapshot attributes");
> @@ -239,7 +239,7 @@ start_delta_elem(struct notification_xml
>   continue;
>   }
>   PARSE_FAIL(p, "parse failed - non conforming "
> - "attribute found in snapshot elem");
> + "attribute '%s' found in snapshot elem", attr[i]);
>   }
>   /* Only add to the list if we are relevant */
>   if (hasUri != 1 || hasHash != 1 || delta_serial == 0)
> 



Re: [patch] httpd static gzip compression

2021-11-05 Thread Sebastian Benoit
Ingo Schwarze(schwa...@usta.de) on 2021.11.05 14:37:15 +0100:
> Hi Theo,
> 
> Theo de Raadt wrote on Thu, Nov 04, 2021 at 08:27:47AM -0600:
> > prx  wrote:
> >> On 2021/11/04 14:21, prx wrote:
> 
> >>> The attached patch add support for static gzip compression.
> >>> 
> >>> In other words, if a client support gzip compression, when "file" is
> >>> requested, httpd will check if "file.gz" is avaiable to serve.
> 
> >> This diff doesn't compress "on the fly".
> >> It's up to the webmaster to compress files **before** serving them.
> 
> > Does any other program work this way?
> 
> Yes.  The man(1) program does.  At least on the vast majority of
> Linux systems (at least those using the man-db implementation
> of man(1)), on FreeBSD, and on DragonFly BSD.
> 
> Those systems store most manual pages as gzipped man(7) and mdoc(7)
> files, and man(1) decompresses them every time a user wants to look
> at one of them.  You say "man ls", and what you get is actually
> /usr/share/man/man1/ls.1.gz or something like that.
> 
> For man(1), that is next to useless because du -sh /usr/share/man =
> 42.6M uncompressed.  But it has repeatedly caused bugs in the past.
> I would love to remove the feature from mandoc, but even though it is
> rarely used in OpenBSD (some ports installed gzipped manuals in the
> past, but i think the ports tree has been clean now for quite some
> time; you might still need the feature when installing software
> or unpacking foreign manual page packages without using ports)
> it would be a bad idea to remove it because it is too widely used
> elsewhere.  Note that even the old BSD man(1) supported it.
> 
> > Where you request one filename, and it gives you another?
> 
> You did not ask what web servers do, but we are discussing a patch to
> a webserver.  For this reason, let me note in passing that even some
> otherwise extremely useful sites get it very wrong the other way round:
> 
>  $ ftp https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
> Trying 130.89.148.77...
> Requesting https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
> 100% |**|  8050   00:00   
>  
> 8050 bytes received in 0.00 seconds (11.74 MB/s)
>  $ file ls.1.en.gz
> ls.1.en.gz: troff or preprocessor input text
>  $ grep -A 1 '^.SH NAME' ls.1.en.gz  
> .SH NAME
> ls \- list directory contents
>  $ gunzip ls.1.en.gz
> gunzip: ls.1.en.gz: unrecognized file format

But with this patch, you are not asking the webserver for 
https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz

You would be asking for

https://exmaple.com/whatever/ls.1

and with Accept-Encoding: gzip in the http header, and the
webserver would then look if it has a file

/whatever/ls.1.gz

(instead of without .gz) in its document tree and send you that, with 
"Content-Encoding: gzip" http header.
And because of that header, your client will know that the data is gzipped
and will unzip it before writing the file to the output.

I.e. there is no such problem (unless the patch has a bug).

/Benno

> 
> > I have a difficult time understanding why gzip has to sneak it's way
> > into everything.
> > 
> > I always prefer software that does precisely what I expect it to do.
> 
> Certainly.
> 
> I have no strong opinion whether a webserver qualifies as "everything",
> though, nor did i look at the diff.  While all manpages are small in the
> real world, some web servers may have to store huge amounts of data that
> clients might request, so disk space might occasionally be an issue on
> a web server even in 2021.  Also, some websites deliver huge amounts of
> data to the client even when the user merely asked for some text (not sure
> such sites would consider running OpenBSD httpd(8), but whatever :) - when
> browsing the web, bandwidth is still occasionally an issue even in 2021,
> even though that is a rather absurd fact.
> 
> Yours,
>   Ingo
> 



Re: new site.8: document site*.tgz and /{upgrade,install}.site

2021-11-05 Thread Klemens Nanni
On Wed, Oct 27, 2021 at 07:35:28PM -0500, Aaron Poffenberger wrote:
> Looks good. Nice to see this moving forward. Thanks.
> 
> On 2021-10-27 21:13 +, Klemens Nanni  wrote:
> > On Mon, Sep 06, 2021 at 02:29:50PM -0500, Aaron Poffenberger wrote:
> > > Ping.
> > > 
> > > Will someone commit this? Seems like no one knows about /upgrade.site and 
> > > it
> > > fits well with sysupgrade(8).
> > 
> > sysupgrade(8) is one potential /upgrade.site user but I disagree about
> > documenting the latter through the former.
> > 
> > Here is a new manual roughly merging the FAQ bits with what you, Aaron,
> > provided.
> > 
> > site(8) is a lame but fitting name;  autoinstall(8) and sysypgrade(8)
> > reference it and both sets and scripts are documented.
> > 
> > Examples are intentionally brief to be shorter and more concise than the
> > FAQ while showing enough to get going.
> > 
> > 
> > I am quite certain that wording, examples and the references from other
> > manuals can be tweaked, but this diff looks like a good enough start
> > and --if this is the way to go-- I'd prefer to commit and keep polishing
> > in-tree.
> > 
> > Feedback? Objections? OK?

I have not received any feedback on this except from Aaron.

Anyone? Anything?

Index: distrib/sets/lists/man/mi
===
RCS file: /cvs/src/distrib/sets/lists/man/mi,v
retrieving revision 1.1645
diff -u -p -r1.1645 mi
--- distrib/sets/lists/man/mi   2 Nov 2021 22:07:33 -   1.1645
+++ distrib/sets/lists/man/mi   3 Nov 2021 02:26:16 -
@@ -2561,6 +2561,7 @@
 ./usr/share/man/man8/security.8
 ./usr/share/man/man8/sendmail.8
 ./usr/share/man/man8/sensorsd.8
+./usr/share/man/man8/site.8
 ./usr/share/man/man8/sftp-server.8
 ./usr/share/man/man8/showmount.8
 ./usr/share/man/man8/shutdown.8
Index: usr.sbin/sysupgrade/sysupgrade.8
===
RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
retrieving revision 1.10
diff -u -p -r1.10 sysupgrade.8
--- usr.sbin/sysupgrade/sysupgrade.83 Oct 2019 12:43:58 -   1.10
+++ usr.sbin/sysupgrade/sysupgrade.827 Oct 2021 20:42:26 -
@@ -67,6 +67,10 @@ This is the default if the system is cur
 Upgrade to a snapshot.
 This is the default if the system is currently running a snapshot.
 .El
+.Pp
+See
+.Xr site 8
+for how to customize the upgrade process.
 .Sh FILES
 .Bl -tag -width "/auto_upgrade.conf" -compact
 .It Pa /auto_upgrade.conf
@@ -83,7 +87,8 @@ Directory the upgrade is downloaded to.
 .Xr signify 1 ,
 .Xr installurl 5 ,
 .Xr autoinstall 8 ,
-.Xr release 8
+.Xr release 8 ,
+.Xr site 8
 .Sh HISTORY
 .Nm
 first appeared in
Index: share/man/man8/Makefile
===
RCS file: /cvs/src/share/man/man8/Makefile,v
retrieving revision 1.102
diff -u -p -r1.102 Makefile
--- share/man/man8/Makefile 1 May 2021 16:11:10 -   1.102
+++ share/man/man8/Makefile 27 Oct 2021 20:38:11 -
@@ -6,7 +6,7 @@ MAN=afterboot.8 autoinstall.8 boot_conf
crash.8 daily.8 \
diskless.8 genassym.sh.8 intro.8 netstart.8 rc.8 \
rc.conf.8 rc.d.8 rc.shutdown.8 rc.subr.8 release.8 \
-   security.8 ssl.8 starttls.8 sticky.8 yp.8
+   security.8 site.8 ssl.8 starttls.8 sticky.8 yp.8
 
 SUBDIR=man8.alpha man8.amd64 man8.arm64 man8.armv7 \
man8.hppa man8.i386 man8.landisk \
Index: share/man/man8/autoinstall.8
===
RCS file: /cvs/src/share/man/man8/autoinstall.8,v
retrieving revision 1.23
diff -u -p -r1.23 autoinstall.8
--- share/man/man8/autoinstall.818 Jul 2021 11:08:34 -  1.23
+++ share/man/man8/autoinstall.827 Oct 2021 20:42:24 -
@@ -32,6 +32,10 @@ file and HTTP to fetch the file.
 If that fails, the installer asks for the location which can either be
 a URL or a local path.
 .Pp
+See
+.Xr site 8
+for how to provide custom configuration.
+.Pp
 To start unattended installation or upgrade choose '(A)utoinstall' at the
 install prompt.
 If there is only one network interface, the installer fetches the response
@@ -235,7 +239,8 @@ host foo {
 .Sh SEE ALSO
 .Xr dhcp-options 5 ,
 .Xr dhcpd.conf 5 ,
-.Xr diskless 8
+.Xr diskless 8 ,
+.Xr site 8
 .Sh HISTORY
 The
 .Nm
Index: share/man/man8/site.8
===
RCS file: share/man/man8/site.8
diff -N share/man/man8/site.8
--- /dev/null   1 Jan 1970 00:00:00 -
+++ share/man/man8/site.8   27 Oct 2021 21:11:48 -
@@ -0,0 +1,87 @@
+.\" $OpenBSD: $
+.\"
+.\" Copyright (c) 2021 Klemens Nanni 
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH 

Re: rpki-client show attr name in rrdp parse errors

2021-11-05 Thread Claudio Jeker
On Wed, Nov 03, 2021 at 12:58:17PM +0100, Claudio Jeker wrote:
> In one place this is already done but this makes sure we show the bad
> attribute in all cases where a non conforming attribute is found.

Found another bunch of those non conforming attribute errors. Adjust them
as well.

OK?
-- 
:wq Claudio

Index: rrdp_notification.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/rrdp_notification.c,v
retrieving revision 1.9
diff -u -p -r1.9 rrdp_notification.c
--- rrdp_notification.c 29 Oct 2021 09:27:36 -  1.9
+++ rrdp_notification.c 5 Nov 2021 14:06:18 -
@@ -141,7 +141,7 @@ start_notification_elem(struct notificat
continue;
}
PARSE_FAIL(p, "parse failed - non conforming "
-   "attribute found in notification elem");
+   "attribute '%s' found in notification elem", attr[i]);
}
if (!(has_xmlns && nxml->version && nxml->session_id && nxml->serial))
PARSE_FAIL(p, "parse failed - incomplete "
@@ -185,7 +185,7 @@ start_snapshot_elem(struct notification_
continue;
}
PARSE_FAIL(p, "parse failed - non conforming "
-   "attribute found in snapshot elem");
+   "attribute '%s' found in snapshot elem", attr[i]);
}
if (hasUri != 1 || hasHash != 1)
PARSE_FAIL(p, "parse failed - incomplete snapshot attributes");
@@ -239,7 +239,7 @@ start_delta_elem(struct notification_xml
continue;
}
PARSE_FAIL(p, "parse failed - non conforming "
-   "attribute found in snapshot elem");
+   "attribute '%s' found in snapshot elem", attr[i]);
}
/* Only add to the list if we are relevant */
if (hasUri != 1 || hasHash != 1 || delta_serial == 0)



Re: [patch] httpd static gzip compression

2021-11-05 Thread Theo de Raadt
prx  wrote:

> I think this remark should be placed into perspective.
> 
> When a file is requested, its gzipped version is send if : 
> * The client ask for it with appropriate header.
> * The server admin configured httpd to do so **and** compressed files.

and if the gzipd file does not contain the contents as the ungzip'd
file, then two different clients may get different results.





Re: [patch] httpd static gzip compression

2021-11-05 Thread prx
* Theo de Raadt  le [04-11-2021 08:27:47 -0600]:
> prx  wrote:
> 
> > * Stuart Henderson  le [04-11-2021 14:09:39 +]:
> > > On 2021/11/04 14:21, prx wrote:
> > > > Hello,
> > > > The attached patch add support for static gzip compression.
> > > > 
> > > > In other words, if a client support gzip compression, when "file" is
> > > > requested, httpd will check if "file.gz" is avaiable to serve.
> > > > 
> > > > Regards.
> > > > 
> > > > prx
> > > 
> > > btw this was rejected before,
> > > 
> > > https://github.com/reyk/httpd/issues/21
> > > 
> > 
> > This diff doesn't compress "on the fly".
> > It's up to the webmaster to compress files **before** serving them.
> 
> Does any other program work this way?
> 
> Where you request one filename, and it gives you another?
> 
> I have a difficult time understanding why gzip has to sneak it's way
> into everything.
> 
> I always prefer software that does precisely what I expect it to do.
> 

I think this remark should be placed into perspective.

When a file is requested, its gzipped version is send if : 
* The client ask for it with appropriate header.
* The server admin configured httpd to do so **and** compressed files.

In this situation, the client does get the expected file.

Of course, the admin has the responsibility to give the same content in "file" 
and "file.gz".

The cost for the server is negligible but reduce bandwidth usage (for
both client and server).

According to previous comments, find below a modified patch to enable
static gzip compression on a location match. ie : 

server "foo" {
#[... snip ... ]
location "/*.html" { gzip_static }
location "/*.css" { gzip_static }
}

Regards.


Index: httpd.conf.5
===
RCS file: /cvs/src/usr.sbin/httpd/httpd.conf.5,v
retrieving revision 1.119
diff -u -p -r1.119 httpd.conf.5
--- httpd.conf.524 Oct 2021 16:01:04 -  1.119
+++ httpd.conf.55 Nov 2021 14:04:22 -
@@ -425,6 +425,10 @@ A variable that is set to a comma separa
 features in use
 .Pq omitted when TLS client verification is not in use .
 .El
+.It Ic gzip_static
+Enable static gzip compression.
+.Pp
+When a file is requested, serves the file with .gz added to its path if it 
exists.
 .It Ic hsts Oo Ar option Oc
 Enable HTTP Strict Transport Security.
 Valid options are:
Index: httpd.h
===
RCS file: /cvs/src/usr.sbin/httpd/httpd.h,v
retrieving revision 1.158
diff -u -p -r1.158 httpd.h
--- httpd.h 24 Oct 2021 16:01:04 -  1.158
+++ httpd.h 5 Nov 2021 14:04:22 -
@@ -87,6 +87,7 @@
 #define SERVER_DEF_TLS_LIFETIME(2 * 3600)
 #define SERVER_MIN_TLS_LIFETIME(60)
 #define SERVER_MAX_TLS_LIFETIME(24 * 3600)
+#define SERVER_DEFAULT_GZIP_STATIC 0
 
 #define MEDIATYPE_NAMEMAX  128 /* file name extension */
 #define MEDIATYPE_TYPEMAX  64  /* length of type/subtype */
@@ -546,6 +547,7 @@ struct server_config {
struct server_fcgiparams fcgiparams;
int  fcgistrip;
char errdocroot[HTTPD_ERRDOCROOT_MAX];
+   int  gzip_static;
 
TAILQ_ENTRY(server_config) entry;
 };
Index: parse.y
===
RCS file: /cvs/src/usr.sbin/httpd/parse.y,v
retrieving revision 1.127
diff -u -p -r1.127 parse.y
--- parse.y 24 Oct 2021 16:01:04 -  1.127
+++ parse.y 5 Nov 2021 14:04:22 -
@@ -141,7 +141,7 @@ typedef struct {
 %token TIMEOUT TLS TYPE TYPES HSTS MAXAGE SUBDOMAINS DEFAULT PRELOAD REQUEST
 %token ERROR INCLUDE AUTHENTICATE WITH BLOCK DROP RETURN PASS REWRITE
 %token CA CLIENT CRL OPTIONAL PARAM FORWARDED FOUND NOT
-%token ERRDOCS
+%token ERRDOCS GZIPSTATIC
 %token   STRING
 %token   NUMBER
 %type  port
@@ -553,6 +553,7 @@ serveroptsl : LISTEN ON STRING opttls po
| logformat
| fastcgi
| authenticate
+   | gzip_static
| filter
| LOCATION optfound optmatch STRING {
struct server   *s;
@@ -1217,6 +1218,14 @@ fcgiport : NUMBER{
}
;
 
+gzip_static: NO GZIPSTATIC {
+   srv->srv_conf.gzip_static = SERVER_DEFAULT_GZIP_STATIC;
+   }
+   | GZIPSTATIC {
+   srv->srv_conf.gzip_static = 1;
+   }
+   ;
+
 tcpip  : TCP '{' optnl tcpflags_l '}'
| TCP tcpflags
;
@@ -1441,6 +1450,7 @@ lookup(char *s)
{ "fastcgi",FCGI },
{ "forwarded",  FORWARDED },
{ "found",  FOUND },
+   { "gzip_static",GZIPSTATIC },
{ "hsts",   HSTS },
{ "include",

Re: [patch] httpd static gzip compression

2021-11-05 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Thu, Nov 04, 2021 at 08:27:47AM -0600:
> prx  wrote:
>> On 2021/11/04 14:21, prx wrote:

>>> The attached patch add support for static gzip compression.
>>> 
>>> In other words, if a client support gzip compression, when "file" is
>>> requested, httpd will check if "file.gz" is avaiable to serve.

>> This diff doesn't compress "on the fly".
>> It's up to the webmaster to compress files **before** serving them.

> Does any other program work this way?

Yes.  The man(1) program does.  At least on the vast majority of
Linux systems (at least those using the man-db implementation
of man(1)), on FreeBSD, and on DragonFly BSD.

Those systems store most manual pages as gzipped man(7) and mdoc(7)
files, and man(1) decompresses them every time a user wants to look
at one of them.  You say "man ls", and what you get is actually
/usr/share/man/man1/ls.1.gz or something like that.

For man(1), that is next to useless because du -sh /usr/share/man =
42.6M uncompressed.  But it has repeatedly caused bugs in the past.
I would love to remove the feature from mandoc, but even though it is
rarely used in OpenBSD (some ports installed gzipped manuals in the
past, but i think the ports tree has been clean now for quite some
time; you might still need the feature when installing software
or unpacking foreign manual page packages without using ports)
it would be a bad idea to remove it because it is too widely used
elsewhere.  Note that even the old BSD man(1) supported it.

> Where you request one filename, and it gives you another?

You did not ask what web servers do, but we are discussing a patch to
a webserver.  For this reason, let me note in passing that even some
otherwise extremely useful sites get it very wrong the other way round:

 $ ftp https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
Trying 130.89.148.77...
Requesting https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
100% |**|  8050   00:00
8050 bytes received in 0.00 seconds (11.74 MB/s)
 $ file ls.1.en.gz
ls.1.en.gz: troff or preprocessor input text
 $ grep -A 1 '^.SH NAME' ls.1.en.gz  
.SH NAME
ls \- list directory contents
 $ gunzip ls.1.en.gz
gunzip: ls.1.en.gz: unrecognized file format

> I have a difficult time understanding why gzip has to sneak it's way
> into everything.
> 
> I always prefer software that does precisely what I expect it to do.

Certainly.

I have no strong opinion whether a webserver qualifies as "everything",
though, nor did i look at the diff.  While all manpages are small in the
real world, some web servers may have to store huge amounts of data that
clients might request, so disk space might occasionally be an issue on
a web server even in 2021.  Also, some websites deliver huge amounts of
data to the client even when the user merely asked for some text (not sure
such sites would consider running OpenBSD httpd(8), but whatever :) - when
browsing the web, bandwidth is still occasionally an issue even in 2021,
even though that is a rather absurd fact.

Yours,
  Ingo



Re: UNIX sockets: use vnode(9) lock to protect `v_socket' dereference

2021-11-05 Thread Vitaliy Makkoveev
On Fri, Nov 05, 2021 at 08:31:07AM +0100, Martin Pieuchot wrote:
> On 26/10/21(Tue) 14:12, Vitaliy Makkoveev wrote:
> > Another step to make UNIX sockets locking fine grained.
> > 
> > The listening socket has the references from file descriptors layer and
> > from the vnode(9) layer. This means when we close(2)'ing such socket it
> > still referenced by concurrent thread through connect(2) path.
> > 
> > When we bind(2) UNIX socket we link it to vnode(9) by assigning
> > `v_socket'. When we connect(2)'ing socket to the socket we previously
> > bind(2)'ed we finding it by namei(9) and obtain it's reference through
> > `v_socket'. This socket has no extra reference in file descriptor
> > layer and could be closed by concurrent thread.
> > 
> > This time we have `unp_lock' rwlock(9) which protects the whole layer
> > and the dereference of `v_socket' is safe. But with the fine grained
> > locking the `v_socket' will not be protected by global lock. When we
> > obtain the vnode(9) by namei(9) in connect(9) or bind(9) paths it is
> > already exclusively locked by vlode(9) lock. But in unp_detach() which
> > is called on the close(2)'ing socket we assume `unp_lock' protects
> > `v_socket'.
> > 
> > I propose to use exclusive vnode(9) lock to protect `v_socket'. With the
> > fine grained locking, the `v_socket' dereference in unp_bind() or
> > unp_connect() threads will be safe because unp_detach() thread will wait
> > the vnode(9) lock release. The vnode referenced by `unp_vnod' has
> > reference counter bumped so it's dereference is also safe without
> > `unp_lock' held.
> 
> This makes sense to me.  Using the vnode lock here seems the simplest
> approach.
> 
> > The `i_lock' should be take before `unp_lock' and unp_detach() should
> > release solock(). To prevent connections on this socket the
> > 'SO_ACCEPTCONN' bit cleared in soclose().
> 
> This is done to prevent races when solock() is released inside soabort(),
> right?  Is it the only one or some more care is needed?
> Will this stay with per-socket locks or is this only necessary because of
> the global `unp_lock'?
> 

The only "binded" sockets has the associated vnode(9). We link them
when we perform unp_bind(). I used the "binded" instead of "listening"
because datagram sockets could also be binded but they are not
connection oriented.

soabort() kills the not accept(2)ed peers of connection oriented sockets
when we close "listening" socket. I used "listening" because the socket
was bind(2)ed and then marked as "assepting connections" by listen(2).

Since the unaccepted peers we kills with soabort() has no associated
vnode(9) we don't release `unp_lock' because we don't need to call
vrele(9).

This diff solves lock dances in unp_connect(). Our thread own `so' we
passed to unp_connect(). It's file descriptor has refcouter bumped and
the `so' can't be killed by concurrent thread. But we get `so2' the
"binded" socket from vnode(9).

This time the global `unp_lock' protects both sockets, but with the
per-socket locking the only `so' will be protected. So the `so2'
dereference will be unsafe because concurrent thread could kill it.

Also we could release `unp_lock' within unp_attach(). So unp_attach()
called by sonewconn() within unp_connect() could release `unp_lock' too.
The upcoming diff which moves unp_gc() stuff from `unp_lock' will
release `unp_lock' in unp_detach() to enforce lock order `unp_gc_lock'
-> `unp_lock'.

We keep vnode(9) locked when we link the socket with vnode(9) in
unp_bind(). We also keep vnode(9) locked when we connect sockets in
unp_connect(). So unp_detach() should also lock the vnode(9) because
vnode(9) lock protects `v_socket'.

This will be actual with the upcoming fine grained locks implementation.
This time this diff mostly makes `v_socket' protection consistent.

Could I commit this diff?

> > Index: sys/kern/uipc_socket.c
> > ===
> > RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> > retrieving revision 1.265
> > diff -u -p -r1.265 uipc_socket.c
> > --- sys/kern/uipc_socket.c  14 Oct 2021 23:05:10 -  1.265
> > +++ sys/kern/uipc_socket.c  26 Oct 2021 11:05:59 -
> > @@ -315,6 +315,8 @@ soclose(struct socket *so, int flags)
> > /* Revoke async IO early. There is a final revocation in sofree(). */
> > sigio_free(>so_sigio);
> > if (so->so_options & SO_ACCEPTCONN) {
> > +   so->so_options &= ~SO_ACCEPTCONN;
> > +
> > while ((so2 = TAILQ_FIRST(>so_q0)) != NULL) {
> > (void) soqremque(so2, 0);
> > (void) soabort(so2);
> > Index: sys/kern/uipc_usrreq.c
> > ===
> > RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> > retrieving revision 1.150
> > diff -u -p -r1.150 uipc_usrreq.c
> > --- sys/kern/uipc_usrreq.c  21 Oct 2021 22:11:07 -  1.150
> > +++ sys/kern/uipc_usrreq.c  26 Oct 2021 11:05:59 -
> > @@ -474,20 +474,30 @@ 

Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic

2021-11-05 Thread Vitaliy Makkoveev
On Fri, Nov 05, 2021 at 09:50:05AM +0100, Mark Kettenis wrote:
> > Date: Fri, 5 Nov 2021 07:18:03 +0100
> > From: Martin Pieuchot 
> > 
> > On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote:
> > > This completely removes global rwlock(9) from the unp_internalize() and
> > > unp_externalize() normal paths but only leaves it in unp_externalize()
> > > error path. Also we don't need to simultaneously hold both fdplock()
> > > and `unp_lock' in unp_internalize(). As non obvious profit this
> > > simplifies the future lock dances in the UNIX sockets layer.
> > > 
> > > It's safe to call fptounp() without `unp_lock' held. We always got this
> > > file descriptor by fd_getfile(9) so we always have the extra reference
> > > and this descriptor can't be closed by concurrent thread. Some sockets
> > > could be destroyed through 'PRU_ABORT' path but they don't have
> > > associated file descriptor and they are not accessible in the
> > > unp_internalize() path.
> > > 
> > > The `unp_file' access without `unp_lock' held is also safe. Each socket
> > > could have the only associated file descriptor and each file descriptor
> > > could have the only associated socket. We only assign `unp_file' in the
> > > unp_internalize() path where we got the socket by fd_getfile(9). This
> > > descriptor has the extra reference and couldn't be closed concurrently.
> > > We could override `unp_file' but with the same address because the
> > > associated file descriptor can't be changed so the address will be also
> > > the same. So while unp_gc() concurrently runs the dereference of
> > > non-NULL `unp_file' is always safe.
> > 
> > Using an atomic operation for `unp_msgcount' is ok with me, one comment
> > about `unp_rights' below.
> > 
> > [...] 
> > 
> > >  
> > > - rw_enter_write(_lock);
> > > - if (unp_rights + nfds > maxfiles / 10) {
> > > - rw_exit_write(_lock);
> > > + if (atomic_add_int_nv(_rights, nfds) > maxfiles / 10) {
> > > + atomic_sub_int(_rights, nfds);
> > 
> > I can't believe this is race free. If two threads, T1 and T2, call
> > atomic_add at the same time both might end up returning EMFILE even
> > if only the first one currently does.  This could happen if T1 exceeds
> > the limit and T2 does atomic_add on an already-exceeded `unp_rights'
> > before T1 could do atomic_sub.
> > 
> > I suggest using a mutex to protect `unp_rights' instead to solve this
> > issue.
> 
> Yes, that would be better.  Otherwise it would be trivial to DOS
> anything that does file descriptor passing.
>

Thanks for pointing, this sounds reasonable. The updated diff introduces
`unp_rights_mtx' mutex(9) to protect `unp_rights'.

Index: sys/kern/uipc_usrreq.c
===
RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.153
diff -u -p -r1.153 uipc_usrreq.c
--- sys/kern/uipc_usrreq.c  30 Oct 2021 16:35:31 -  1.153
+++ sys/kern/uipc_usrreq.c  5 Nov 2021 12:56:41 -
@@ -52,14 +52,19 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
  * Locks used to protect global data and struct members:
  *  I   immutable after creation
  *  U   unp_lock
+ *  R   unp_rights_mtx
+ *  a   atomic
  */
+
 struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock");
+struct mutex unp_rights_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
 
 /*
  * Stack of sets of files that were passed over a socket but were
@@ -99,7 +104,7 @@ SLIST_HEAD(,unp_deferral)unp_deferred =
SLIST_HEAD_INITIALIZER(unp_deferred);
 
 ino_t  unp_ino;/* [U] prototype for fake inode numbers */
-intunp_rights; /* [U] file descriptors in flight */
+intunp_rights; /* [R] file descriptors in flight */
 intunp_defer;  /* [U] number of deferred fp to close by the GC task */
 intunp_gcing;  /* [U] GC task currently running */
 
@@ -927,17 +932,18 @@ restart:
 */
rp = (struct fdpass *)CMSG_DATA(cm);
 
-   rw_enter_write(_lock);
for (i = 0; i < nfds; i++) {
struct unpcb *unp;
 
fp = rp->fp;
rp++;
if ((unp = fptounp(fp)) != NULL)
-   unp->unp_msgcount--;
-   unp_rights--;
+   atomic_dec_long(>unp_msgcount);
}
-   rw_exit_write(_lock);
+
+   mtx_enter(_rights_mtx);
+   unp_rights -= nfds;
+   mtx_leave(_rights_mtx);
 
/*
 * Copy temporary array to message and adjust length, in case of
@@ -985,13 +991,13 @@ unp_internalize(struct mbuf *control, st
return (EINVAL);
nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int);
 
-   rw_enter_write(_lock);
+   mtx_enter(_rights_mtx);
if (unp_rights + nfds > maxfiles / 10) {
-   rw_exit_write(_lock);
+   mtx_leave(_rights_mtx);
return (EMFILE);
}
unp_rights += nfds;
-   rw_exit_write(_lock);
+ 

Re: speedup io marshal in rpki-client

2021-11-05 Thread Job Snijders
On Fri, Nov 05, 2021 at 09:18:15AM +0100, Claudio Jeker wrote:
> Noticed the other day. The ip addr arrays and as number array are
> marshalled element by element which is not very efficent.
> All the data is in one big blob of memory so just use the basic io
> operations for a memory blob and ship the full array at once.
> 
> This seems to reduce runtime by 5-10% (in my unscientific testing).
> Also it makes the code a fair bit simpler.

OK job@



Re: speedup io marshal in rpki-client

2021-11-05 Thread Sebastian Benoit
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.11.05 09:18:15 +0100:
> Noticed the other day. The ip addr arrays and as number array are
> marshalled element by element which is not very efficent.
> All the data is in one big blob of memory so just use the basic io
> operations for a memory blob and ship the full array at once.
> 
> This seems to reduce runtime by 5-10% (in my unscientific testing).
> Also it makes the code a fair bit simpler.

can't test right now, but i think all the multiplications are correct.
You dropped some asserts(), but i guess thats fine.

ok benno@

> -- 
> :wq Claudio
> 
> Index: cert.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/cert.c,v
> retrieving revision 1.46
> diff -u -p -r1.46 cert.c
> --- cert.c4 Nov 2021 11:32:55 -   1.46
> +++ cert.c5 Nov 2021 07:53:50 -
> @@ -1225,34 +1225,6 @@ cert_free(struct cert *p)
>   free(p);
>  }
>  
> -static void
> -cert_ip_buffer(struct ibuf *b, const struct cert_ip *p)
> -{
> - io_simple_buffer(b, >afi, sizeof(enum afi));
> - io_simple_buffer(b, >type, sizeof(enum cert_ip_type));
> -
> - if (p->type != CERT_IP_INHERIT) {
> - io_simple_buffer(b, >min, sizeof(p->min));
> - io_simple_buffer(b, >max, sizeof(p->max));
> - }
> -
> - if (p->type == CERT_IP_RANGE)
> - ip_addr_range_buffer(b, >range);
> - else if (p->type == CERT_IP_ADDR)
> - ip_addr_buffer(b, >ip);
> -}
> -
> -static void
> -cert_as_buffer(struct ibuf *b, const struct cert_as *p)
> -{
> - io_simple_buffer(b, >type, sizeof(enum cert_as_type));
> - if (p->type == CERT_AS_RANGE) {
> - io_simple_buffer(b, >range.min, sizeof(uint32_t));
> - io_simple_buffer(b, >range.max, sizeof(uint32_t));
> - } else if (p->type == CERT_AS_ID)
> - io_simple_buffer(b, >id, sizeof(uint32_t));
> -}
> -
>  /*
>   * Write certificate parsed content into buffer.
>   * See cert_read() for the other side of the pipe.
> @@ -1260,18 +1232,15 @@ cert_as_buffer(struct ibuf *b, const str
>  void
>  cert_buffer(struct ibuf *b, const struct cert *p)
>  {
> - size_t   i;
> -
>   io_simple_buffer(b, >expires, sizeof(p->expires));
>   io_simple_buffer(b, >purpose, sizeof(p->purpose));
>   io_simple_buffer(b, >talid, sizeof(p->talid));
>   io_simple_buffer(b, >ipsz, sizeof(p->ipsz));
> - for (i = 0; i < p->ipsz; i++)
> - cert_ip_buffer(b, >ips[i]);
> -
>   io_simple_buffer(b, >asz, sizeof(p->asz));
> - for (i = 0; i < p->asz; i++)
> - cert_as_buffer(b, >as[i]);
> +
> + io_simple_buffer(b, p->ips, p->ipsz * sizeof(p->ips[0]));
> + io_simple_buffer(b, p->as, p->asz * sizeof(p->as[0]));
> +
>   io_str_buffer(b, p->mft);
>   io_str_buffer(b, p->notify);
>   io_str_buffer(b, p->repo);
> @@ -1282,34 +1251,6 @@ cert_buffer(struct ibuf *b, const struct
>   io_str_buffer(b, p->pubkey);
>  }
>  
> -static void
> -cert_ip_read(struct ibuf *b, struct cert_ip *p)
> -{
> - io_read_buf(b, >afi, sizeof(enum afi));
> - io_read_buf(b, >type, sizeof(enum cert_ip_type));
> -
> - if (p->type != CERT_IP_INHERIT) {
> - io_read_buf(b, >min, sizeof(p->min));
> - io_read_buf(b, >max, sizeof(p->max));
> - }
> -
> - if (p->type == CERT_IP_RANGE)
> - ip_addr_range_read(b, >range);
> - else if (p->type == CERT_IP_ADDR)
> - ip_addr_read(b, >ip);
> -}
> -
> -static void
> -cert_as_read(struct ibuf *b, struct cert_as *p)
> -{
> - io_read_buf(b, >type, sizeof(enum cert_as_type));
> - if (p->type == CERT_AS_RANGE) {
> - io_read_buf(b, >range.min, sizeof(uint32_t));
> - io_read_buf(b, >range.max, sizeof(uint32_t));
> - } else if (p->type == CERT_AS_ID)
> - io_read_buf(b, >id, sizeof(uint32_t));
> -}
> -
>  /*
>   * Allocate and read parsed certificate content from descriptor.
>   * The pointer must be freed with cert_free().
> @@ -1319,7 +1260,6 @@ struct cert *
>  cert_read(struct ibuf *b)
>  {
>   struct cert *p;
> - size_t   i;
>  
>   if ((p = calloc(1, sizeof(struct cert))) == NULL)
>   err(1, NULL);
> @@ -1328,19 +1268,17 @@ cert_read(struct ibuf *b)
>   io_read_buf(b, >purpose, sizeof(p->purpose));
>   io_read_buf(b, >talid, sizeof(p->talid));
>   io_read_buf(b, >ipsz, sizeof(p->ipsz));
> + io_read_buf(b, >asz, sizeof(p->asz));
>  
>   p->ips = calloc(p->ipsz, sizeof(struct cert_ip));
>   if (p->ips == NULL)
>   err(1, NULL);
> - for (i = 0; i < p->ipsz; i++)
> - cert_ip_read(b, >ips[i]);
> + io_read_buf(b, p->ips, p->ipsz * sizeof(p->ips[0]));
>  
> - io_read_buf(b, >asz, sizeof(p->asz));
>   p->as = calloc(p->asz, sizeof(struct cert_as));
>   if (p->as == NULL)
>   err(1, NULL);
> - for (i = 0; i < p->asz; i++)
> -   

poll/select: Lazy removal of knotes

2021-11-05 Thread Martin Pieuchot
New poll/select(2) implementation convert 'struct pollfd' and 'fdset' to
knotes (kqueue event descriptors) then pass them to the kqueue subsystem.
A knote is allocated, with kqueue_register(), for every read, write and
except condition watched on a given FD.  That means at most 3 allocations
might be necessary per FD.

The diff below reduce the overhead of per-syscall allocation/free of those
descriptors by leaving those which didn't trigger on the kqueue across
syscall.  Leaving knotes on the kqueue allows kqueue_register() to re-use
existing descriptor instead of re-allocating a new one.

With this knotes are now lazily removed.  The mechanism uses a serial
number which is incremented for every syscall that indicates if a knote
sitting in the kqueue is still valid or should be freed.

Note that performance improvements might not be visible with this diff
alone because kqueue_register() still pre-allocate a descriptor then drop
it.

visa@ already pointed out that the lazy removal logic could be integrated
in kqueue_scan() which would reduce the complexity of those two syscalls.
I'm arguing for doing this in a next step in-tree.

Please test and review :)

Index: kern/sys_generic.c
===
RCS file: /cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.139
diff -u -p -r1.139 sys_generic.c
--- kern/sys_generic.c  29 Oct 2021 15:52:44 -  1.139
+++ kern/sys_generic.c  5 Nov 2021 08:11:05 -
@@ -598,7 +598,7 @@ sys_pselect(struct proc *p, void *v, reg
 
 int
 dopselect(struct proc *p, int nd, fd_set *in, fd_set *ou, fd_set *ex,
-struct timespec *timeout, const sigset_t *sigmask, register_t *retval)
+struct timespec *tsp, const sigset_t *sigmask, register_t *retval)
 {
struct kqueue_scan_state scan;
fd_mask bits[6];
@@ -666,10 +666,10 @@ dopselect(struct proc *p, int nd, fd_set
if (nevents == 0 && ncollected == 0) {
uint64_t nsecs = INFSLP;
 
-   if (timeout != NULL) {
-   if (!timespecisset(timeout))
+   if (tsp != NULL) {
+   if (!timespecisset(tsp))
goto done;
-   nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP));
+   nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(tsp), MAXTSLP));
}
error = tsleep_nsec(>p_kq, PSOCK | PCATCH, "kqsel", nsecs);
/* select is not restarted after signals... */
@@ -682,28 +682,37 @@ dopselect(struct proc *p, int nd, fd_set
 
/* Collect at most `nevents' possibly waiting in kqueue_scan() */
kqueue_scan_setup(, p->p_kq);
-   while (nevents > 0) {
+   while ((nevents - ncollected) > 0) {
struct kevent kev[KQ_NEVENTS];
int i, ready, count;
 
-   /* Maximum number of events per iteration */
-   count = MIN(nitems(kev), nevents);
-   ready = kqueue_scan(, count, kev, timeout, p, );
+   /*
+* Maximum number of events per iteration.  Use the whole
+* array to gather as many spurious events as possible.
+*/
+   count = nitems(kev);
+   ready = kqueue_scan(, count, kev, tsp, p, );
 #ifdef KTRACE
if (KTRPOINT(p, KTR_STRUCT))
ktrevent(p, kev, ready);
 #endif
-   /* Convert back events that are ready. */
+   /* Convert back events that are ready/delete spurious ones. */
for (i = 0; i < ready && error == 0; i++)
error = pselcollect(p, [i], pobits, );
+
/*
-* Stop if there was an error or if we had enough
-* space to collect all events that were ready.
+* Stop if there was an error or if we had enough space
+* to collect all non-spurious events that were ready.
 */
-   if (error || ready < count)
+   if (error || !ready || (ncollected > 0 && ready < count))
break;
 
-   nevents -= ready;
+   /*
+* If we only got spurious events try again repositioning
+* the marker.
+*/
+   if (ncollected == 0 && ((tsp == NULL) || timespecisset(tsp)))
+   scan.kqs_nevent = 0;
}
kqueue_scan_finish();
*retval = ncollected;
@@ -730,7 +739,7 @@ done:
if (pibits[0] != (fd_set *)[0])
free(pibits[0], M_TEMP, 6 * ni);
 
-   kqueue_purge(p, p->p_kq);
+   /* Needed to remove events lazily. */
p->p_kq_serial += nd;
 
return (error);
@@ -759,7 +768,7 @@ pselregister(struct proc *p, fd_set *pib
DPRINTFN(2, "select fd %d mask %d serial %lu\n",
fd, msk, p->p_kq_serial);

Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic

2021-11-05 Thread Mark Kettenis
> Date: Fri, 5 Nov 2021 07:18:03 +0100
> From: Martin Pieuchot 
> 
> On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote:
> > This completely removes global rwlock(9) from the unp_internalize() and
> > unp_externalize() normal paths but only leaves it in unp_externalize()
> > error path. Also we don't need to simultaneously hold both fdplock()
> > and `unp_lock' in unp_internalize(). As non obvious profit this
> > simplifies the future lock dances in the UNIX sockets layer.
> > 
> > It's safe to call fptounp() without `unp_lock' held. We always got this
> > file descriptor by fd_getfile(9) so we always have the extra reference
> > and this descriptor can't be closed by concurrent thread. Some sockets
> > could be destroyed through 'PRU_ABORT' path but they don't have
> > associated file descriptor and they are not accessible in the
> > unp_internalize() path.
> > 
> > The `unp_file' access without `unp_lock' held is also safe. Each socket
> > could have the only associated file descriptor and each file descriptor
> > could have the only associated socket. We only assign `unp_file' in the
> > unp_internalize() path where we got the socket by fd_getfile(9). This
> > descriptor has the extra reference and couldn't be closed concurrently.
> > We could override `unp_file' but with the same address because the
> > associated file descriptor can't be changed so the address will be also
> > the same. So while unp_gc() concurrently runs the dereference of
> > non-NULL `unp_file' is always safe.
> 
> Using an atomic operation for `unp_msgcount' is ok with me, one comment
> about `unp_rights' below.
> 
> > Index: sys/kern/uipc_usrreq.c
> > ===
> > RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> > retrieving revision 1.153
> > diff -u -p -r1.153 uipc_usrreq.c
> > --- sys/kern/uipc_usrreq.c  30 Oct 2021 16:35:31 -  1.153
> > +++ sys/kern/uipc_usrreq.c  30 Oct 2021 18:41:25 -
> > @@ -58,6 +58,7 @@
> >   * Locks used to protect global data and struct members:
> >   *  I   immutable after creation
> >   *  U   unp_lock
> > + *  a   atomic
> >   */
> >  struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock");
> >  
> > @@ -99,7 +100,7 @@ SLIST_HEAD(,unp_deferral)unp_deferred =
> > SLIST_HEAD_INITIALIZER(unp_deferred);
> >  
> >  ino_t  unp_ino;/* [U] prototype for fake inode numbers */
> > -intunp_rights; /* [U] file descriptors in flight */
> > +intunp_rights; /* [a] file descriptors in flight */
> >  intunp_defer;  /* [U] number of deferred fp to close by the GC 
> > task */
> >  intunp_gcing;  /* [U] GC task currently running */
> >  
> > @@ -927,17 +928,16 @@ restart:
> >  */
> > rp = (struct fdpass *)CMSG_DATA(cm);
> >  
> > -   rw_enter_write(_lock);
> > for (i = 0; i < nfds; i++) {
> > struct unpcb *unp;
> >  
> > fp = rp->fp;
> > rp++;
> > if ((unp = fptounp(fp)) != NULL)
> > -   unp->unp_msgcount--;
> > -   unp_rights--;
> > +   atomic_dec_long(>unp_msgcount);
> > }
> > -   rw_exit_write(_lock);
> > +
> > +   atomic_sub_int(_rights, nfds);
> >  
> > /*
> >  * Copy temporary array to message and adjust length, in case of
> > @@ -985,13 +985,10 @@ unp_internalize(struct mbuf *control, st
> > return (EINVAL);
> > nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int);
> >  
> > -   rw_enter_write(_lock);
> > -   if (unp_rights + nfds > maxfiles / 10) {
> > -   rw_exit_write(_lock);
> > +   if (atomic_add_int_nv(_rights, nfds) > maxfiles / 10) {
> > +   atomic_sub_int(_rights, nfds);
> 
> I can't believe this is race free. If two threads, T1 and T2, call
> atomic_add at the same time both might end up returning EMFILE even
> if only the first one currently does.  This could happen if T1 exceeds
> the limit and T2 does atomic_add on an already-exceeded `unp_rights'
> before T1 could do atomic_sub.
> 
> I suggest using a mutex to protect `unp_rights' instead to solve this
> issue.

Yes, that would be better.  Otherwise it would be trivial to DOS
anything that does file descriptor passing.

> > return (EMFILE);
> > }
> > -   unp_rights += nfds;
> > -   rw_exit_write(_lock);
> >  
> > /* Make sure we have room for the struct file pointers */
> >  morespace:
> > @@ -1031,7 +1028,6 @@ morespace:
> > ip = ((int *)CMSG_DATA(cm)) + nfds - 1;
> > rp = ((struct fdpass *)CMSG_DATA(cm)) + nfds - 1;
> > fdplock(fdp);
> > -   rw_enter_write(_lock);
> > for (i = 0; i < nfds; i++) {
> > memcpy(, ip, sizeof fd);
> > ip--;
> > @@ -1056,15 +1052,13 @@ morespace:
> > rp->flags = fdp->fd_ofileflags[fd] & UF_PLEDGED;
> > rp--;
> > if ((unp = fptounp(fp)) != NULL) {
> > +   atomic_inc_long(>unp_msgcount);
> >

speedup io marshal in rpki-client

2021-11-05 Thread Claudio Jeker
Noticed the other day. The ip addr arrays and as number array are
marshalled element by element which is not very efficent.
All the data is in one big blob of memory so just use the basic io
operations for a memory blob and ship the full array at once.

This seems to reduce runtime by 5-10% (in my unscientific testing).
Also it makes the code a fair bit simpler.
-- 
:wq Claudio

Index: cert.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/cert.c,v
retrieving revision 1.46
diff -u -p -r1.46 cert.c
--- cert.c  4 Nov 2021 11:32:55 -   1.46
+++ cert.c  5 Nov 2021 07:53:50 -
@@ -1225,34 +1225,6 @@ cert_free(struct cert *p)
free(p);
 }
 
-static void
-cert_ip_buffer(struct ibuf *b, const struct cert_ip *p)
-{
-   io_simple_buffer(b, >afi, sizeof(enum afi));
-   io_simple_buffer(b, >type, sizeof(enum cert_ip_type));
-
-   if (p->type != CERT_IP_INHERIT) {
-   io_simple_buffer(b, >min, sizeof(p->min));
-   io_simple_buffer(b, >max, sizeof(p->max));
-   }
-
-   if (p->type == CERT_IP_RANGE)
-   ip_addr_range_buffer(b, >range);
-   else if (p->type == CERT_IP_ADDR)
-   ip_addr_buffer(b, >ip);
-}
-
-static void
-cert_as_buffer(struct ibuf *b, const struct cert_as *p)
-{
-   io_simple_buffer(b, >type, sizeof(enum cert_as_type));
-   if (p->type == CERT_AS_RANGE) {
-   io_simple_buffer(b, >range.min, sizeof(uint32_t));
-   io_simple_buffer(b, >range.max, sizeof(uint32_t));
-   } else if (p->type == CERT_AS_ID)
-   io_simple_buffer(b, >id, sizeof(uint32_t));
-}
-
 /*
  * Write certificate parsed content into buffer.
  * See cert_read() for the other side of the pipe.
@@ -1260,18 +1232,15 @@ cert_as_buffer(struct ibuf *b, const str
 void
 cert_buffer(struct ibuf *b, const struct cert *p)
 {
-   size_t   i;
-
io_simple_buffer(b, >expires, sizeof(p->expires));
io_simple_buffer(b, >purpose, sizeof(p->purpose));
io_simple_buffer(b, >talid, sizeof(p->talid));
io_simple_buffer(b, >ipsz, sizeof(p->ipsz));
-   for (i = 0; i < p->ipsz; i++)
-   cert_ip_buffer(b, >ips[i]);
-
io_simple_buffer(b, >asz, sizeof(p->asz));
-   for (i = 0; i < p->asz; i++)
-   cert_as_buffer(b, >as[i]);
+
+   io_simple_buffer(b, p->ips, p->ipsz * sizeof(p->ips[0]));
+   io_simple_buffer(b, p->as, p->asz * sizeof(p->as[0]));
+
io_str_buffer(b, p->mft);
io_str_buffer(b, p->notify);
io_str_buffer(b, p->repo);
@@ -1282,34 +1251,6 @@ cert_buffer(struct ibuf *b, const struct
io_str_buffer(b, p->pubkey);
 }
 
-static void
-cert_ip_read(struct ibuf *b, struct cert_ip *p)
-{
-   io_read_buf(b, >afi, sizeof(enum afi));
-   io_read_buf(b, >type, sizeof(enum cert_ip_type));
-
-   if (p->type != CERT_IP_INHERIT) {
-   io_read_buf(b, >min, sizeof(p->min));
-   io_read_buf(b, >max, sizeof(p->max));
-   }
-
-   if (p->type == CERT_IP_RANGE)
-   ip_addr_range_read(b, >range);
-   else if (p->type == CERT_IP_ADDR)
-   ip_addr_read(b, >ip);
-}
-
-static void
-cert_as_read(struct ibuf *b, struct cert_as *p)
-{
-   io_read_buf(b, >type, sizeof(enum cert_as_type));
-   if (p->type == CERT_AS_RANGE) {
-   io_read_buf(b, >range.min, sizeof(uint32_t));
-   io_read_buf(b, >range.max, sizeof(uint32_t));
-   } else if (p->type == CERT_AS_ID)
-   io_read_buf(b, >id, sizeof(uint32_t));
-}
-
 /*
  * Allocate and read parsed certificate content from descriptor.
  * The pointer must be freed with cert_free().
@@ -1319,7 +1260,6 @@ struct cert *
 cert_read(struct ibuf *b)
 {
struct cert *p;
-   size_t   i;
 
if ((p = calloc(1, sizeof(struct cert))) == NULL)
err(1, NULL);
@@ -1328,19 +1268,17 @@ cert_read(struct ibuf *b)
io_read_buf(b, >purpose, sizeof(p->purpose));
io_read_buf(b, >talid, sizeof(p->talid));
io_read_buf(b, >ipsz, sizeof(p->ipsz));
+   io_read_buf(b, >asz, sizeof(p->asz));
 
p->ips = calloc(p->ipsz, sizeof(struct cert_ip));
if (p->ips == NULL)
err(1, NULL);
-   for (i = 0; i < p->ipsz; i++)
-   cert_ip_read(b, >ips[i]);
+   io_read_buf(b, p->ips, p->ipsz * sizeof(p->ips[0]));
 
-   io_read_buf(b, >asz, sizeof(p->asz));
p->as = calloc(p->asz, sizeof(struct cert_as));
if (p->as == NULL)
err(1, NULL);
-   for (i = 0; i < p->asz; i++)
-   cert_as_read(b, >as[i]);
+   io_read_buf(b, p->as, p->asz * sizeof(p->as[0]));
 
io_read_str(b, >mft);
io_read_str(b, >notify);
Index: extern.h
===
RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
retrieving revision 1.93
diff -u -p -r1.93 

Re: UNIX sockets: use vnode(9) lock to protect `v_socket' dereference

2021-11-05 Thread Martin Pieuchot
On 26/10/21(Tue) 14:12, Vitaliy Makkoveev wrote:
> Another step to make UNIX sockets locking fine grained.
> 
> The listening socket has the references from file descriptors layer and
> from the vnode(9) layer. This means when we close(2)'ing such socket it
> still referenced by concurrent thread through connect(2) path.
> 
> When we bind(2) UNIX socket we link it to vnode(9) by assigning
> `v_socket'. When we connect(2)'ing socket to the socket we previously
> bind(2)'ed we finding it by namei(9) and obtain it's reference through
> `v_socket'. This socket has no extra reference in file descriptor
> layer and could be closed by concurrent thread.
> 
> This time we have `unp_lock' rwlock(9) which protects the whole layer
> and the dereference of `v_socket' is safe. But with the fine grained
> locking the `v_socket' will not be protected by global lock. When we
> obtain the vnode(9) by namei(9) in connect(9) or bind(9) paths it is
> already exclusively locked by vlode(9) lock. But in unp_detach() which
> is called on the close(2)'ing socket we assume `unp_lock' protects
> `v_socket'.
> 
> I propose to use exclusive vnode(9) lock to protect `v_socket'. With the
> fine grained locking, the `v_socket' dereference in unp_bind() or
> unp_connect() threads will be safe because unp_detach() thread will wait
> the vnode(9) lock release. The vnode referenced by `unp_vnod' has
> reference counter bumped so it's dereference is also safe without
> `unp_lock' held.

This makes sense to me.  Using the vnode lock here seems the simplest
approach.

> The `i_lock' should be take before `unp_lock' and unp_detach() should
> release solock(). To prevent connections on this socket the
> 'SO_ACCEPTCONN' bit cleared in soclose().

This is done to prevent races when solock() is released inside soabort(),
right?  Is it the only one or some more care is needed?
Will this stay with per-socket locks or is this only necessary because of
the global `unp_lock'?

> Index: sys/kern/uipc_socket.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> retrieving revision 1.265
> diff -u -p -r1.265 uipc_socket.c
> --- sys/kern/uipc_socket.c14 Oct 2021 23:05:10 -  1.265
> +++ sys/kern/uipc_socket.c26 Oct 2021 11:05:59 -
> @@ -315,6 +315,8 @@ soclose(struct socket *so, int flags)
>   /* Revoke async IO early. There is a final revocation in sofree(). */
>   sigio_free(>so_sigio);
>   if (so->so_options & SO_ACCEPTCONN) {
> + so->so_options &= ~SO_ACCEPTCONN;
> +
>   while ((so2 = TAILQ_FIRST(>so_q0)) != NULL) {
>   (void) soqremque(so2, 0);
>   (void) soabort(so2);
> Index: sys/kern/uipc_usrreq.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> retrieving revision 1.150
> diff -u -p -r1.150 uipc_usrreq.c
> --- sys/kern/uipc_usrreq.c21 Oct 2021 22:11:07 -  1.150
> +++ sys/kern/uipc_usrreq.c26 Oct 2021 11:05:59 -
> @@ -474,20 +474,30 @@ void
>  unp_detach(struct unpcb *unp)
>  {
>   struct socket *so = unp->unp_socket;
> - struct vnode *vp = NULL;
> + struct vnode *vp = unp->unp_vnode;
>  
>   rw_assert_wrlock(_lock);
>  
>   LIST_REMOVE(unp, unp_link);
> - if (unp->unp_vnode) {
> +
> + if (vp) {
> + unp->unp_vnode = NULL;
> +
>   /*
> -  * `v_socket' is only read in unp_connect and
> -  * unplock prevents concurrent access.
> +  * Enforce `i_lock' -> `unp_lock' because fifo
> +  * subsystem requires it.
>*/
>  
> - unp->unp_vnode->v_socket = NULL;
> - vp = unp->unp_vnode;
> - unp->unp_vnode = NULL;
> + sounlock(so, SL_LOCKED);
> +
> + VOP_LOCK(vp, LK_EXCLUSIVE);
> + vp->v_socket = NULL;
> +
> + KERNEL_LOCK();
> + vput(vp);
> + KERNEL_UNLOCK();
> +
> + solock(so);
>   }
>  
>   if (unp->unp_conn)
> @@ -500,21 +510,6 @@ unp_detach(struct unpcb *unp)
>   pool_put(_pool, unp);
>   if (unp_rights)
>   task_add(systqmp, _gc_task);
> -
> - if (vp != NULL) {
> - /*
> -  * Enforce `i_lock' -> `unplock' because fifo subsystem
> -  * requires it. The socket can't be closed concurrently
> -  * because the file descriptor reference is
> -  * still hold.
> -  */
> -
> - sounlock(so, SL_LOCKED);
> - KERNEL_LOCK();
> - vrele(vp);
> - KERNEL_UNLOCK();
> - solock(so);
> - }
>  }
>  
>  int
> 



Re: UNIX sockets: make `unp_rights', `unp_msgcount' and `unp_file' atomic

2021-11-05 Thread Martin Pieuchot
On 30/10/21(Sat) 21:22, Vitaliy Makkoveev wrote:
> This completely removes global rwlock(9) from the unp_internalize() and
> unp_externalize() normal paths but only leaves it in unp_externalize()
> error path. Also we don't need to simultaneously hold both fdplock()
> and `unp_lock' in unp_internalize(). As non obvious profit this
> simplifies the future lock dances in the UNIX sockets layer.
> 
> It's safe to call fptounp() without `unp_lock' held. We always got this
> file descriptor by fd_getfile(9) so we always have the extra reference
> and this descriptor can't be closed by concurrent thread. Some sockets
> could be destroyed through 'PRU_ABORT' path but they don't have
> associated file descriptor and they are not accessible in the
> unp_internalize() path.
> 
> The `unp_file' access without `unp_lock' held is also safe. Each socket
> could have the only associated file descriptor and each file descriptor
> could have the only associated socket. We only assign `unp_file' in the
> unp_internalize() path where we got the socket by fd_getfile(9). This
> descriptor has the extra reference and couldn't be closed concurrently.
> We could override `unp_file' but with the same address because the
> associated file descriptor can't be changed so the address will be also
> the same. So while unp_gc() concurrently runs the dereference of
> non-NULL `unp_file' is always safe.

Using an atomic operation for `unp_msgcount' is ok with me, one comment
about `unp_rights' below.

> Index: sys/kern/uipc_usrreq.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> retrieving revision 1.153
> diff -u -p -r1.153 uipc_usrreq.c
> --- sys/kern/uipc_usrreq.c30 Oct 2021 16:35:31 -  1.153
> +++ sys/kern/uipc_usrreq.c30 Oct 2021 18:41:25 -
> @@ -58,6 +58,7 @@
>   * Locks used to protect global data and struct members:
>   *  I   immutable after creation
>   *  U   unp_lock
> + *  a   atomic
>   */
>  struct rwlock unp_lock = RWLOCK_INITIALIZER("unplock");
>  
> @@ -99,7 +100,7 @@ SLIST_HEAD(,unp_deferral)  unp_deferred =
>   SLIST_HEAD_INITIALIZER(unp_deferred);
>  
>  ino_tunp_ino;/* [U] prototype for fake inode numbers */
> -int  unp_rights; /* [U] file descriptors in flight */
> +int  unp_rights; /* [a] file descriptors in flight */
>  int  unp_defer;  /* [U] number of deferred fp to close by the GC task */
>  int  unp_gcing;  /* [U] GC task currently running */
>  
> @@ -927,17 +928,16 @@ restart:
>*/
>   rp = (struct fdpass *)CMSG_DATA(cm);
>  
> - rw_enter_write(_lock);
>   for (i = 0; i < nfds; i++) {
>   struct unpcb *unp;
>  
>   fp = rp->fp;
>   rp++;
>   if ((unp = fptounp(fp)) != NULL)
> - unp->unp_msgcount--;
> - unp_rights--;
> + atomic_dec_long(>unp_msgcount);
>   }
> - rw_exit_write(_lock);
> +
> + atomic_sub_int(_rights, nfds);
>  
>   /*
>* Copy temporary array to message and adjust length, in case of
> @@ -985,13 +985,10 @@ unp_internalize(struct mbuf *control, st
>   return (EINVAL);
>   nfds = (cm->cmsg_len - CMSG_ALIGN(sizeof(*cm))) / sizeof (int);
>  
> - rw_enter_write(_lock);
> - if (unp_rights + nfds > maxfiles / 10) {
> - rw_exit_write(_lock);
> + if (atomic_add_int_nv(_rights, nfds) > maxfiles / 10) {
> + atomic_sub_int(_rights, nfds);

I can't believe this is race free. If two threads, T1 and T2, call
atomic_add at the same time both might end up returning EMFILE even
if only the first one currently does.  This could happen if T1 exceeds
the limit and T2 does atomic_add on an already-exceeded `unp_rights'
before T1 could do atomic_sub.

I suggest using a mutex to protect `unp_rights' instead to solve this
issue.

>   return (EMFILE);
>   }
> - unp_rights += nfds;
> - rw_exit_write(_lock);
>  
>   /* Make sure we have room for the struct file pointers */
>  morespace:
> @@ -1031,7 +1028,6 @@ morespace:
>   ip = ((int *)CMSG_DATA(cm)) + nfds - 1;
>   rp = ((struct fdpass *)CMSG_DATA(cm)) + nfds - 1;
>   fdplock(fdp);
> - rw_enter_write(_lock);
>   for (i = 0; i < nfds; i++) {
>   memcpy(, ip, sizeof fd);
>   ip--;
> @@ -1056,15 +1052,13 @@ morespace:
>   rp->flags = fdp->fd_ofileflags[fd] & UF_PLEDGED;
>   rp--;
>   if ((unp = fptounp(fp)) != NULL) {
> + atomic_inc_long(>unp_msgcount);
>   unp->unp_file = fp;
> - unp->unp_msgcount++;
>   }
>   }
> - rw_exit_write(_lock);
>   fdpunlock(fdp);
>   return (0);
>  fail:
> - rw_exit_write(_lock);
>   fdpunlock(fdp);
>   if (fp != NULL)
>   FRELE(fp, p);
> @@ -1072,17 +1066,13 @@ fail:
>   for ( ; i >