Re: vn_open, EDUPFD, EMOVEFD

2021-06-29 Thread Christos Zoulas
In article ,
David Holland   wrote:
>
>* Does anyone know why dev/kloader.c calls namei and then vn_open on
>the same path? I remember seeing this before and leaving it alone
>because nobody I could find was sure what the deal was, but it's
>unlikely to work as-is and increasingly likely to break over time...

Just fix it, and if something breaks we'll put it back.

>diff -r 4c2d0a182ef8 external/cddl/osnet/sys/sys/vnode.h
>--- a/external/cddl/osnet/sys/sys/vnode.h  Sun Jun 27 18:13:54 2021 -0400
>+++ b/external/cddl/osnet/sys/sys/vnode.h  Mon Jun 28 11:05:45 2021 -0400
>@@ -239,7 +239,6 @@
> vnode_t **vpp, enum create crwhy, mode_t umask)
> {
>   struct pathbuf *pb;
>-  struct nameidata nd;
>   int error;
> 
>   ASSERT(seg == UIO_SYSSPACE);
>@@ -248,11 +247,9 @@
>   ASSERT(umask == 0);
> 
>   pb = pathbuf_create(pnamep);
>-  NDINIT(, LOOKUP, NOFOLLOW, pb);
>-  error = vn_open(, filemode, createmode);
>+  error = vn_open(NULL, pb, 0, filemode, createmode, vpp, NULL, NULL);

This is the only NOFOLLOW NDINIT case, should that be 'createmode | O_NOFOLLOW'?


>-  NDINIT(, CREATE, LOCKPARENT, pb);
>   
>   /*
>* Since we do not hold ulfs_extattr_uepm_lock anymore,
>* another thread may race with us for backend creation,
>-   * but only one can succeed here thanks to O_EXCL
>+   * but only one can succeed here thanks to O_EXCL.
>+   *
>+   * backing_vp is the backing store. 
>*/
>-  error = vn_open(, O_CREAT|O_EXCL|O_RDWR, 0600);
>+  error = vn_open(NULL, pb, 0, O_CREAT|O_EXCL|O_RDWR, 0600,
>+  _vp, NULL, NULL);

I guess O_CREAT will do the LOCKPARENT later...
I would have preferred if EMOVEFD/EDUPFD were gc'ed as part of the patch,
because there is a lot of ugliness left.

christos



Re: vn_open, EDUPFD, EMOVEFD

2021-06-28 Thread David Holland
On Tue, Jun 29, 2021 at 05:07:10AM -, Christos Zoulas wrote:
 > >* Does anyone know why dev/kloader.c calls namei and then vn_open on
 > >the same path? I remember seeing this before and leaving it alone
 > >because nobody I could find was sure what the deal was, but it's
 > >unlikely to work as-is and increasingly likely to break over time...
 > 
 > Just fix it, and if something breaks we'll put it back.

Ok, I will do exactly that :-) but separately.

 > >-   NDINIT(, LOOKUP, NOFOLLOW, pb);
 > >-   error = vn_open(, filemode, createmode);
 > >+   error = vn_open(NULL, pb, 0, filemode, createmode, vpp, NULL, NULL);
 > 
 > This is the only NOFOLLOW NDINIT case, should that be 'createmode |
 > O_NOFOLLOW'?

Yes it should, thanks. (Except filemode, not createmode.) Too many
mostly-the-same changes...

 > >-   NDINIT(, CREATE, LOCKPARENT, pb);
 > >
 > >/*
 > > * Since we do not hold ulfs_extattr_uepm_lock anymore,
 > > * another thread may race with us for backend creation,
 > >-* but only one can succeed here thanks to O_EXCL
 > >+* but only one can succeed here thanks to O_EXCL.
 > >+*
 > >+* backing_vp is the backing store. 
 > > */
 > >-   error = vn_open(, O_CREAT|O_EXCL|O_RDWR, 0600);
 > >+   error = vn_open(NULL, pb, 0, O_CREAT|O_EXCL|O_RDWR, 0600,
 > >+   _vp, NULL, NULL);
 > 
 > I guess O_CREAT will do the LOCKPARENT later...

Indeed.

 > I would have preferred if EMOVEFD/EDUPFD were gc'ed as part of the patch,
 > because there is a lot of ugliness left.

So would I, but it's a big deal (requires changing every [cb]devsw...)

-- 
David A. Holland
dholl...@netbsd.org


vn_open, EDUPFD, EMOVEFD

2021-06-28 Thread David Holland
A couple days ago it came to my attention that the hack used for
cloning devices (returning a magic error code and stuffing a file
descriptor number in a magic field of struct lwp) is also used by
/dev/stderr, and more importantly, every caller of vn_open in the
kernel (there are many) needs to know about this hack to avoid
returning internal negative errnos to users. Needless to say, none of
them (besides do_open) actually do, plus this is gross.

The following patch does not clean out the hack, but establishes a
containment perimeter inside vn_open; vn_open now returns either a
vnode or a file descriptor number. (And, along with the file
descriptor number, a flag saying whether it was the EDUPFD or EMOVEFD
case.) Callers of vn_open not prepared to deal with file descriptors,
which is all of them besides do_open, can pass NULL for the file
descriptor return parameter, in which case vn_open will instead return
EOPNOTSUPP.

It also takes advantage of having rearranged vn_open's signature to
stop exposing struct nameidata, which is a plus.

There are now eight arguments to vn_open, which is a lot, but it's
still simpler than struct nameidata:
at_dvp (directory vnode for openat() starting point, NULL ok)   
pb (pathbuf)
nmode (additional namei flags)
fmode (open flags, converted from O_* to F*)
cmode (creation permissions)
ret_vp (pointer for returned vnode)
ret_domove (pointer for returned is-EMOVEFD flag)
ret_fd (pointer for returned file descriptor)

I have also rearranged the interface to fd_dupopen, which uses the
vn_open results, to match, and so that its return value argument is
last like it ought to be. This was chosen to be incompatible (as
opposed to just moving int arguments around, which isn't safe) in case
there were other calls to it somewhere. (There weren't.)

This patch is missing the doc change to vnsubr(9) but I'll deal with
that before committing.

It also builds but hasn't been tested yet; test results are pending.

Requires a kernel bump because a number of the callers of vn_open are
in modules.

Comments? Objections? Tentative plan is to commit this along with the
VOP_PARSEPATH changes (proposed years back), which also require a
kernel bump, plus some other pending kernel bump stuff other folks
have, in the next few days.

Note that cleaning out the hack entirely would be expensive (because a
similar alternative return value scheme is needed in the open methods
of struct cdevsw and bdevsw) so I'm not going to attempt it yet; plus
it's a natural part of the long-term device plumbing rearrangement I
posted about a few weeks back, and it makes more sense to do it as
part of that than separately.

Oh also, note that a couple of callers of vn_open were passing
LOCKPARENT and relying on vn_open to ignore it. That's been
rectified.

And before I forget: currently uses of vn_open that yield EMOVEFD when
the caller isn't prepared to handle it leak the fd produced. This
patch doesn't change that; doing so without creating new layer
violations would be a major nuisance. Avoiding this behavior sensibly
requires a complete cleanup of the hack, which prompts layering
violations because it's itself a layer violation.

* Does anyone know why dev/kloader.c calls namei and then vn_open on
the same path? I remember seeing this before and leaving it alone
because nobody I could find was sure what the deal was, but it's
unlikely to work as-is and increasingly likely to break over time...


diff -r 4c2d0a182ef8 external/cddl/osnet/sys/sys/vnode.h
--- a/external/cddl/osnet/sys/sys/vnode.h   Sun Jun 27 18:13:54 2021 -0400
+++ b/external/cddl/osnet/sys/sys/vnode.h   Mon Jun 28 11:05:45 2021 -0400
@@ -239,7 +239,6 @@
 vnode_t **vpp, enum create crwhy, mode_t umask)
 {
struct pathbuf *pb;
-   struct nameidata nd;
int error;
 
ASSERT(seg == UIO_SYSSPACE);
@@ -248,11 +247,9 @@
ASSERT(umask == 0);
 
pb = pathbuf_create(pnamep);
-   NDINIT(, LOOKUP, NOFOLLOW, pb);
-   error = vn_open(, filemode, createmode);
+   error = vn_open(NULL, pb, 0, filemode, createmode, vpp, NULL, NULL);
if (error == 0) {
-   VOP_UNLOCK(nd.ni_vp, 0);
-   *vpp = nd.ni_vp;
+   VOP_UNLOCK(*vpp, 0);
}
pathbuf_destroy(pb);
return (error);
diff -r 4c2d0a182ef8 sys/dev/firmload.c
--- a/sys/dev/firmload.cSun Jun 27 18:13:54 2021 -0400
+++ b/sys/dev/firmload.cMon Jun 28 11:05:45 2021 -0400
@@ -204,7 +204,6 @@
 firmware_open(const char *drvname, const char *imgname, firmware_handle_t *fhp)
 {
struct pathbuf *pb;
-   struct nameidata nd;
struct vattr va;
char *pnbuf, *path, *prefix;
firmware_handle_t fh;
@@ -236,8 +235,7 @@
error = ENOMEM;
break;
}
-   NDINIT(, LOOKUP, FOLLOW | NOCHROOT, pb);
-   error = vn_open(,