Re: CVS commit: src/sys/kern
> Module Name:src > Committed By: mrg > Date: Thu Jul 4 05:59:05 UTC 2024 > > Modified Files: > src/sys/kern: vfs_syscalls.c > > Log Message: > don't fd_putfile() if you haven't grabbed a ref already. > > the condition to call fd_getvnode() was changed, but the condition > to call fd_putfile() afterwards was not changed, leading to a panic > seen by Chavdar on current-users, probably. > > builds, runs, seems obvious. The automatic testbed is failing to run tests to completion now: https://releng.netbsd.org/b5reports/i386/commits-2024.07.html#2024.07.04.05.59.05 Our sloppy process for fixing the vfs_syscalls.c issue is obviously failing now, after days of flailing around with band-aids. I propose to back out all of the recent changes: https://mail-index.netbsd.org/source-changes/2024/06/29/msg152020.html https://mail-index.netbsd.org/source-changes/2024/07/01/msg152081.html https://mail-index.netbsd.org/source-changes/2024/07/01/msg152082.html https://mail-index.netbsd.org/source-changes/2024/07/01/msg152083.html https://mail-index.netbsd.org/source-changes/2024/07/04/msg152216.html And then redo them with: 1. A PR that explains the problem, with references, and can track the changes in case we need to pull them up. 2. Posting the patch for public review first. 3. Adding automatic tests that exercise all the relevant cases and xfail, but that would pass with the fix. 4. _Then_ committing the fix.
Re: CVS commit: src/sys/kern
> Date: Fri, 13 Oct 2023 17:52:07 +0900 > From: Rin Okuyama > > It would be really nice if we can find some systematical/reliable methods to > figure out files that really depends on struct syncobj, e.g.. I tried > ctfdump(1) to > *.o for kernel modules, but I cannot extract information better than > ``grep syncobj.h .depend''... The attached patch removes all use of sys/syncobj.h outside .c files under sys, so we can be reasonably confident userland programs -- except for crash(8), which is kind of special -- are unaffected. However, I'm going to hold off on committing this until the tree's sleepq issues are fixed so our testbed can run again. >From 7e9e2af19ecc6f4262b928da8a97a49d171c8072 Mon Sep 17 00:00:00 2001 From: Taylor R Campbell Date: Fri, 13 Oct 2023 11:04:20 + Subject: [PATCH] sys/lwp.h: Nix sys/syncobj.h dependency. Remove it in ddb/db_syncobj.h too. New sys/wchan.h defines wchan_t so that users need not pull in sys/syncobj.h to get it. Sprinkle #include in .c files where it is now needed. --- sys/ddb/db_syncobj.h | 2 +- sys/kern/kern_condvar.c | 1 + sys/kern/kern_ktrace.c| 1 + sys/kern/kern_lwp.c | 1 + sys/kern/kern_mutex.c | 1 + sys/kern/kern_rwlock.c| 1 + sys/kern/kern_sleepq.c| 1 + sys/kern/kern_turnstile.c | 1 + sys/kern/sys_lwp.c| 1 + sys/kern/sys_select.c | 1 + sys/sys/lwp.h | 2 +- sys/sys/sleepq.h | 2 +- sys/sys/sleeptab.h| 6 +- sys/sys/syncobj.h | 4 ++-- sys/sys/wchan.h | 37 + 15 files changed, 56 insertions(+), 6 deletions(-) create mode 100644 sys/sys/wchan.h diff --git a/sys/ddb/db_syncobj.h b/sys/ddb/db_syncobj.h index 2c2ad89ba177..dc7594f5163e 100644 --- a/sys/ddb/db_syncobj.h +++ b/sys/ddb/db_syncobj.h @@ -29,7 +29,7 @@ #ifndef_DDB_DB_SYNCOBJ_H #define_DDB_DB_SYNCOBJ_H -#include +#include struct lwp; struct syncobj; diff --git a/sys/kern/kern_condvar.c b/sys/kern/kern_condvar.c index 478c4a35ff2b..c25282e1beb3 100644 --- a/sys/kern/kern_condvar.c +++ b/sys/kern/kern_condvar.c @@ -45,6 +45,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_condvar.c,v 1.59 2023/10/12 23:51:05 ad Exp $") #include #include #include +#include /* * Accessors for the private contents of the kcondvar_t data type. diff --git a/sys/kern/kern_ktrace.c b/sys/kern/kern_ktrace.c index 5ad5272af7d8..812be0c2c9ca 100644 --- a/sys/kern/kern_ktrace.c +++ b/sys/kern/kern_ktrace.c @@ -77,6 +77,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_ktrace.c,v 1.182 2022/07/01 01:07:56 riastradh #include #include #include +#include #include #include diff --git a/sys/kern/kern_lwp.c b/sys/kern/kern_lwp.c index 77e43242f0f9..971e0180f1f6 100644 --- a/sys/kern/kern_lwp.c +++ b/sys/kern/kern_lwp.c @@ -253,6 +253,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_lwp.c,v 1.265 2023/10/05 19:41:06 ad Exp $"); #include #include #include +#include #include #include diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c index 810ea121a0bd..640909bc533e 100644 --- a/sys/kern/kern_mutex.c +++ b/sys/kern/kern_mutex.c @@ -57,6 +57,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_mutex.c,v 1.110 2023/09/23 18:48:04 ad Exp $"); #include #include #include +#include #include diff --git a/sys/kern/kern_rwlock.c b/sys/kern/kern_rwlock.c index 88db7d507b4d..96312874a069 100644 --- a/sys/kern/kern_rwlock.c +++ b/sys/kern/kern_rwlock.c @@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_rwlock.c,v 1.74 2023/10/04 20:39:35 ad Exp $"); #include #include #include +#include #include diff --git a/sys/kern/kern_sleepq.c b/sys/kern/kern_sleepq.c index e9d39445f75b..bb43e78f6f6b 100644 --- a/sys/kern/kern_sleepq.c +++ b/sys/kern/kern_sleepq.c @@ -49,6 +49,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_sleepq.c,v 1.83 2023/10/08 13:37:26 ad Exp $"); #include #include #include +#include /* * for sleepq_abort: diff --git a/sys/kern/kern_turnstile.c b/sys/kern/kern_turnstile.c index 0cd8886cb6b5..85bdf946c325 100644 --- a/sys/kern/kern_turnstile.c +++ b/sys/kern/kern_turnstile.c @@ -68,6 +68,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_turnstile.c,v 1.53 2023/10/08 13:23:05 ad Exp $ #include #include #include +#include #include /* diff --git a/sys/kern/sys_lwp.c b/sys/kern/sys_lwp.c index c71cf1e823d6..108d40641e38 100644 --- a/sys/kern/sys_lwp.c +++ b/sys/kern/sys_lwp.c @@ -51,6 +51,7 @@ __KERNEL_RCSID(0, "$NetBSD: sys_lwp.c,v 1.87 2023/10/08 13:23:05 ad Exp $"); #include #include #include +#include #include diff --git a/sys/kern/sys_select.c b/sys/kern/sys_select.c index 9719d220e319..16962505663c 100644 --- a/sys/kern/sys_select.c +++ b/sys/kern/sys_select.c @@ -106,6 +106,7 @@ __KERNEL_RCSID(0, "$NetBSD: sys_select.c,v 1.64 2023/10/08 13:23:05 ad Exp $"); #include #include #include +#include /* Flags for lwp::l_selflag. */ #defineSEL_RESET 0 /* awoken, interrupted, or not yet
Re: CVS commit: src/sys/kern
On Thu, Oct 12, 2023 at 8:23 PM Taylor R Campbell wrote: > > > Date: Thu, 12 Oct 2023 17:06:02 +0900 > > From: Rin Okuyama > > > > On Thu, Oct 5, 2023 at 5:39 AM Andrew Doran wrote: > > > > > > Module Name:src > > > Committed By: ad > > > Date: Wed Oct 4 20:39:35 UTC 2023 > > > > > > Modified Files: > > > src/sys/kern: kern_rwlock.c kern_turnstile.c > > > > > > Log Message: > > > Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's > > > more informative than "tstile". > > > > Cool! Can I send a pull up request to netbsd-10? > > Not sure -- it would depend on this commit to introduce struct > syncobj::sobj_name: > > https://mail-index.netbsd.org/source-changes/2023/07/17/msg146058.html > > This is a potential kernel ABI change. I didn't investigate to > determine whether it would be safe to pull up. Thanks, I didn't notice that. It should be too risky to pull these up just before RC1. I withdraw this proposal. PS It would be really nice if we can find some systematical/reliable methods to figure out files that really depends on struct syncobj, e.g.. I tried ctfdump(1) to *.o for kernel modules, but I cannot extract information better than ``grep syncobj.h .depend''... Thanks, rin
Re: CVS commit: src/sys/kern
> Date: Thu, 12 Oct 2023 17:06:02 +0900 > From: Rin Okuyama > > On Thu, Oct 5, 2023 at 5:39â¯AM Andrew Doran wrote: > > > > Module Name:src > > Committed By: ad > > Date: Wed Oct 4 20:39:35 UTC 2023 > > > > Modified Files: > > src/sys/kern: kern_rwlock.c kern_turnstile.c > > > > Log Message: > > Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's > > more informative than "tstile". > > Cool! Can I send a pull up request to netbsd-10? Not sure -- it would depend on this commit to introduce struct syncobj::sobj_name: https://mail-index.netbsd.org/source-changes/2023/07/17/msg146058.html This is a potential kernel ABI change. I didn't investigate to determine whether it would be safe to pull up.
Re: CVS commit: src/sys/kern
Cool! Can I send a pull up request to netbsd-10? I've not yet observed deadlocks since this was committed, fortunately or unfortunately, although ;) Thanks, rin On Thu, Oct 5, 2023 at 5:39 AM Andrew Doran wrote: > > Module Name:src > Committed By: ad > Date: Wed Oct 4 20:39:35 UTC 2023 > > Modified Files: > src/sys/kern: kern_rwlock.c kern_turnstile.c > > Log Message: > Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's > more informative than "tstile". > > > To generate a diff of this commit: > cvs rdiff -u -r1.73 -r1.74 src/sys/kern/kern_rwlock.c > cvs rdiff -u -r1.51 -r1.52 src/sys/kern/kern_turnstile.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. >
Re: CVS commit: src/sys/kern
> Date: Fri, 28 Jul 2023 23:40:40 +0900 > From: Izumi Tsutsui > > > Module Name:src > > Committed By: riastradh > > Date: Fri Jul 28 10:37:28 UTC 2023 > > > > Modified Files: > > src/sys/kern: kern_tc.c > > > > Log Message: > > timecounter(9): Link to phk's timecounter paper for reference. > > Maybe it's better to refer our timecounter(9) man page? > https://man.netbsd.org/timecounter.9 That's fine to add, but the man page is about the interface, not about the algorithm. The paper is a critical reference for obscure details of the algorithm that are not obvious from the interface. The man page should perhaps also mention more of the API contract, like the relation between tc_frequency, tc_counter_mask, and hz. > (that refers http://phk.freebsd.dk/pubs/timecounter.pdf though) No opinion on which URL to use as long as it goes to the right paper (and preferably https).
Re: CVS commit: src/sys/kern
> Module Name: src > Committed By: riastradh > Date: Fri Jul 28 10:37:28 UTC 2023 > > Modified Files: > src/sys/kern: kern_tc.c > > Log Message: > timecounter(9): Link to phk's timecounter paper for reference. Maybe it's better to refer our timecounter(9) man page? https://man.netbsd.org/timecounter.9 (that refers http://phk.freebsd.dk/pubs/timecounter.pdf though) --- Izumi Tsutsui
Re: CVS commit: src/sys/kern
Hello, On Tue, 15 Nov 2022 10:29:56 + "Michael Lorenz" wrote: > Module Name: src > Committed By: macallan > Date: Tue Nov 15 10:29:56 UTC 2022 > > Modified Files: > src/sys/kern: subr_pserialize.c > > Log Message: > don't KASSERT(kpreempt_disabled()) while cold > pserialize_read_*() can be called *very* early in kernel startup these days ... with this clang-built kernels work again on macppc have fun Michael
Re: CVS commit: src/sys/kern
Date:Tue, 04 Oct 2022 10:09:35 -0400 From:Christos Zoulas Message-ID: <8dd220d16861eb3a890461bdf02d1...@zoulas.com> | I always forget the O_CLOEXEC is special | in that regard. I wish it was not, but it is difficult to fix. POSIX is adding O_CLOFORK in the next version (no guarantees I remembered the symbol name spelling correctly here) which will have essentially the same (open time) semantics (similar long term sematics as well, just applied at a different time). I assume we will need to add that at some point or other. | The question is how to find the vnode? Not really, I assume that part will be fairly easy (probably trivial), I just didn't have the energy to go work it out when sending that mail. We have the file descriptor, and I suspect the file* (need to check to make sure the right one is immediately available, but we can get it from the fd if not), we know it refers to a vnode (it came from vn_open()), so getting the vnode* from the file* is not something difficult, I think. | Perhaps it is easiest to fail the open call if O_EXLOCK or | O_SHLOCK are specified in a cloning open? That would be an option, and is better than just ignoring them, but better still would be to make them work. Since open_setfp() does nothing (much) when none of the relevant O_xxx flags that it tests are set (the fd open flags, as distinct from the fp ones), and the code calls VOP_UNLOCK(vp) after it, we know that vp is intended to be locked when open_setfp() is called (further confirmed as when any of the O_??LOCK flags is set, open_setfp() does a VOP_UNLOCK() and later a vn_lock() (which I am guessing is the inverse). Maybe all that's needed is a vn_lock() call (on the vp that we still need to fetch) and then call open_setfp() ? But this is all beyond what I know enough about to be sure, particularly to avoid doing anything which might deadlock, etc. kre ps: if this gets done properly, then special case code to handle O_NONBLOCK (and O_NOSIGPIPE, ...) in cloning device drivers won't be needed either, open_setfp() is where all of that is normally added to the file* for the fd being returned, it was not "simply happening" because that call is missing in the cloned device case.
Re: CVS commit: src/sys/kern
On 2022-10-01 3:39 pm, Robert Elz wrote: Date:Sat, 1 Oct 2022 13:00:04 -0400 [stuff deleted] Even when it is called, the code is: fp->f_flag = flags & FMASK; where FMASK is (from fcntl.h) #define FMASK (FREAD|FWRITE|FCNTLFLAGS|FEXEC) and #define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FDSYNC|FRSYNC|FALTIO|\ FDIRECT|FNOSIGPIPE) which all looks exactly as it should be to me - and note that O_CLOEXEC (which has no F equivalent name - it doesn't need one) is not there. So, fp->f_flag isn't set at all (is probably 0), and even if it were O_CLOEXEC would not be there, and should not be. Thanks for pointing that out, I always forget the O_CLOEXEC is special in that regard. I wish it was not, but it is difficult to fix. For the vnode actually being opened, none of this matters, as the open lasts as long (actually not as long) as the open() sys call - when that returns, the device being opened has been closed again, so what the file that refers to it looks like (or would have looked like) really doesn't matter at all, it never becomes visible. That's my guess as to why the open_setfp() call is missing in that case. But what got forgotten (or deliberately was not done) was anything to affect the modes of the clone which is opened. | What does it mean when the open specifies O_CLOEXEC | and ff->ff_exclose is false? Can that happen? Is that desirable? It is what is currently happening whenever we open a cloning device. (Or that is what it looks like to me). Desirable, no idea, I didn't define the semantics of cloning device opens, nor which of the open flags should apply to the clone that is created. | I am fine with the locking to stay where it is. I guess it is probably | not needed after dup/clone, since the underlying vnode is shared... Assuming you mean dup(2) (and dup2()) and clone(2) then no - those have no way to pass the relevant locking flags, if you have a fd and want to apply a lock, fcntl() is what does that (and the fcntl operations that duplicate fds do not also apply locks). The only issue is O_EXLOCK and O_SHLOCK (and O_CLOEXEC) for cloned devices. I suspect it makes sense for O_CLOEXEC to be applied in that case, it makes little sense for open("/dev/ptmx", O_RDWR|O_CLOEXEC) to succeed, returning an open fd which does not have cloexec set (which is the issue, along with O_NONBLOCK) which started all of this discussion. The locking flags I am less sure about. I don't see how they can fail to succeed if applied, as the vnode for the device has just been created, nothing else can possibly have any kind of lock on it. Whether there's any benefit in applying the lock so that the node is locked for later, I don't know - but it certainly should do no harm to do that. It seems clear to me that what we need is (something like) Index: vfs_syscalls.c === RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.555 diff -u -r1.555 vfs_syscalls.c --- vfs_syscalls.c 12 Feb 2022 15:51:29 - 1.555 +++ vfs_syscalls.c 1 Oct 2022 19:27:15 - @@ -1763,6 +1763,9 @@ error = fd_dupopen(dupfd, dupfd_move, flags, &indx); if (error) return error; + error = open_setfp(l, fp, XXXvp, indx, flags); + if (error) + return error; *fd = indx; } else { error = open_setfp(l, fp, vp, indx, flags); where XXXvp needs to be extracted from somewhere (it isn't vp, as vp==NULL) except that what follows in the else case is ... if (error) return error; VOP_UNLOCK(vp); *fd = indx; That VOP_UNLOCK(vp) is what is bothering me, It tells me that open_setfp() is expecting to be called with vp locked - but in the first case (the cloning case) there is no VOP_UNLOCK() call (and what's more, fd_dupopen() cannot do it, as it has no vp arg). That means, I believe, that when vn_open() returns in the normal case, vp is returned locked, but in the cloning case the vnode that was created for the clone is not locked. I'm not sure what is the right way to find the vnode, or how it should properly be locked so open_setfp() can do its thing. If I knew all of that I would have made an attempt at fixing this already. We need someone who really understands what is happening here, and the right way to handle it all (which very likely is nothing like I just suggested). The question is how to find the vnode? Perhaps it is easiest to fail the open call if O_EXLOCK or O_SHLOCK are specified in a cloning open? At least we will not silently ignore them? Best, christos
Re: CVS commit: src/sys/kern
Date:Sat, 1 Oct 2022 13:00:04 -0400 From:Christos Zoulas Message-ID: <8bd3a408-77fd-402c-8d6c-ad4b4a5e8...@zoulas.com> | This is what I meant: | | RCS file: /cvsroot/src/sys/kern/kern_descrip.c,v | retrieving revision 1.251 | diff -u -u -r1.251 kern_descrip.c | --- kern_descrip.c 29 Jun 2021 22:40:53 - 1.251 | +++ kern_descrip.c 1 Oct 2022 16:56:44 - | @@ -1162,6 +1162,7 @@ | KASSERT(ff->ff_allocated); | KASSERT(fd_isused(fdp, fd)); | KASSERT(fd >= NDFDFILE || ff == (fdfile_t *)fdp->fd_dfdfile[fd]); | + ff->ff_exclose = (fp->f_flag & O_CLOEXEC) != 0; | | /* No need to lock in order to make file initially visible. */ | ff->ff_file = fp; That's not going to work in the situation we're in, as nothing will have set O_CLOEXEC in fp->f_flag (or shouldn't have anyway, right, O_CLOEXEC isn't that kind of a flag, it belongs to the descriptor, not the file*). f_flag is set in vfs_syscalls.c in open_setfp() - which is never called in the cloning device case, that's the underlying problem I believe. Even when it is called, the code is: fp->f_flag = flags & FMASK; where FMASK is (from fcntl.h) #define FMASK (FREAD|FWRITE|FCNTLFLAGS|FEXEC) and #define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FDSYNC|FRSYNC|FALTIO|\ FDIRECT|FNOSIGPIPE) which all looks exactly as it should be to me - and note that O_CLOEXEC (which has no F equivalent name - it doesn't need one) is not there. So, fp->f_flag isn't set at all (is probably 0), and even if it were O_CLOEXEC would not be there, and should not be. For the vnode actually being opened, none of this matters, as the open lasts as long (actually not as long) as the open() sys call - when that returns, the device being opened has been closed again, so what the file that refers to it looks like (or would have looked like) really doesn't matter at all, it never becomes visible. That's my guess as to why the open_setfp() call is missing in that case. But what got forgotten (or deliberately was not done) was anything to affect the modes of the clone which is opened. | What does it mean when the open specifies O_CLOEXEC | and ff->ff_exclose is false? Can that happen? Is that desirable? It is what is currently happening whenever we open a cloning device. (Or that is what it looks like to me). Desirable, no idea, I didn't define the semantics of cloning device opens, nor which of the open flags should apply to the clone that is created. | I am fine with the locking to stay where it is. I guess it is probably | not needed after dup/clone, since the underlying vnode is shared... Assuming you mean dup(2) (and dup2()) and clone(2) then no - those have no way to pass the relevant locking flags, if you have a fd and want to apply a lock, fcntl() is what does that (and the fcntl operations that duplicate fds do not also apply locks). The only issue is O_EXLOCK and O_SHLOCK (and O_CLOEXEC) for cloned devices. I suspect it makes sense for O_CLOEXEC to be applied in that case, it makes little sense for open("/dev/ptmx", O_RDWR|O_CLOEXEC) to succeed, returning an open fd which does not have cloexec set (which is the issue, along with O_NONBLOCK) which started all of this discussion. The locking flags I am less sure about. I don't see how they can fail to succeed if applied, as the vnode for the device has just been created, nothing else can possibly have any kind of lock on it. Whether there's any benefit in applying the lock so that the node is locked for later, I don't know - but it certainly should do no harm to do that. It seems clear to me that what we need is (something like) Index: vfs_syscalls.c === RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.555 diff -u -r1.555 vfs_syscalls.c --- vfs_syscalls.c 12 Feb 2022 15:51:29 - 1.555 +++ vfs_syscalls.c 1 Oct 2022 19:27:15 - @@ -1763,6 +1763,9 @@ error = fd_dupopen(dupfd, dupfd_move, flags, &indx); if (error) return error; + error = open_setfp(l, fp, XXXvp, indx, flags); + if (error) + return error; *fd = indx; } else { error = open_setfp(l, fp, vp, indx, flags); where XXXvp needs to be extracted from somewhere (it isn't vp, as vp==NULL) except that what follows in the else case is ... if (error) return error; VOP_UNLOCK(vp); *fd = indx; That VOP_UNLOCK(vp) is what is bothering me, It tells me that open_setfp() is expecting to be called with vp locked - but in the first case (the cloning case) there is no VOP_UNLOCK() call (and what's more, fd_dupopen() cannot do it, as it has no vp arg). That means, I
Re: CVS commit: src/sys/kern
> On Sep 30, 2022, at 11:02 PM, Robert Elz wrote: > >Date:Fri, 30 Sep 2022 20:15:07 -0400 >From:Christos Zoulas >Message-ID: > > | It does not need an extra flag (it looks in the file descriptor flags to > | find if it needs to set or not. > > One of us is confused. From where in this case does anything > get the exclose flag set? That's the whole question here. The > flags arg that is passed around has O_CLOEXEC set in it - you used > that in the call to fd_set_exclose() in kern/tty_ptm.c ... but where > you said that would be better done in fd_affix(). This is what I meant: RCS file: /cvsroot/src/sys/kern/kern_descrip.c,v retrieving revision 1.251 diff -u -u -r1.251 kern_descrip.c --- kern_descrip.c 29 Jun 2021 22:40:53 - 1.251 +++ kern_descrip.c 1 Oct 2022 16:56:44 - @@ -1162,6 +1162,7 @@ KASSERT(ff->ff_allocated); KASSERT(fd_isused(fdp, fd)); KASSERT(fd >= NDFDFILE || ff == (fdfile_t *)fdp->fd_dfdfile[fd]); + ff->ff_exclose = (fp->f_flag & O_CLOEXEC) != 0; /* No need to lock in order to make file initially visible. */ ff->ff_file = fp; > > That does not have access to the flags. So from where is it going > to get the close on exec info ? > > My reading of do_open() is that the O_CLOEXEC flag is never even > examined when a cloning device is opened, it doesn't get set on > the original fd (the cloner) or the cloned device (other than by > your recent modification for /dev/pmx). > > Did I misread the code? > > Or are you planning something different than it seemed? > > | to find other cases where we forgot to call fd_set_exclose() before calling > | fd_affix(). > > My point is that it should not be necessary to call fd_set_exclose() > in every (or any) cloning device driver. The open syscall handling > is where that should be done, just as it is for all the opens that > are not cloning devices. What does it mean when the open specifies O_CLOEXEC and ff->ff_exclose is false? Can that happen? Is that desirable? > Why be different? > > | It also does not need locking because the process can't access > | the descriptor before calling fd_affix. > > The locking I was referring to are the vnode locks/references in > do_open(), not anything related to the file struct or descriptor. > I just do not feel competent to get all of that correct in this > case (more complex than the normal case because of the extra vnode > involved) and would prefer if someone familiar with all of that > were to handle it - particularly in the extra error case that will > need to be handled, even if I cannot see how it would actually fire > in the case in question. I am fine with the locking to stay where it is. I guess it is probably not needed after dup/clone, since the underlying vnode is shared... christos signature.asc Description: Message signed with OpenPGP
Re: CVS commit: src/sys/kern
Date:Fri, 30 Sep 2022 20:15:07 -0400 From:Christos Zoulas Message-ID: | It does not need an extra flag (it looks in the file descriptor flags to | find if it needs to set or not. One of us is confused. From where in this case does anything get the exclose flag set? That's the whole question here. The flags arg that is passed around has O_CLOEXEC set in it - you used that in the call to fd_set_exclose() in kern/tty_ptm.c ... but where you said that would be better done in fd_affix(). That does not have access to the flags. So from where is it going to get the close on exec info ? My reading of do_open() is that the O_CLOEXEC flag is never even examined when a cloning device is opened, it doesn't get set on the original fd (the cloner) or the cloned device (other than by your recent modification for /dev/pmx). Did I misread the code? Or are you planning something different than it seemed? | to find other cases where we forgot to call fd_set_exclose() before calling | fd_affix(). My point is that it should not be necessary to call fd_set_exclose() in every (or any) cloning device driver. The open syscall handling is where that should be done, just as it is for all the opens that are not cloning devices. Why be different? | It also does not need locking because the process can't access | the descriptor before calling fd_affix. The locking I was referring to are the vnode locks/references in do_open(), not anything related to the file struct or descriptor. I just do not feel competent to get all of that correct in this case (more complex than the normal case because of the extra vnode involved) and would prefer if someone familiar with all of that were to handle it - particularly in the extra error case that will need to be handled, even if I cannot see how it would actually fire in the case in question. kre
Re: CVS commit: src/sys/kern
> On Sep 30, 2022, at 5:57 PM, Robert Elz wrote: > >Date:Fri, 30 Sep 2022 16:34:20 -0400 >From:Christos Zoulas >Message-ID: <232331ad-d501-4547-b730-03590c0c9...@zoulas.com> > > | How about handling exclose there? > > That would be possible, but why? We still need higher level code to > handle the locking, which can also handle cloexec -- the problem we > have now is simply that the relevant call is missing, I don't think adding > it will be hard, but it needs to be done by someone who understands the > locking requirements, and correct exit strategy in this case if an error > occurs (failing to successfully lock a newly created clone would seem to > be a very bizarre case, but still...) That is, I don't feel competent to > suggest the 3 or 4 lines that ought be added in do_open to fix this (for > just O_CLOEXEC it would be trivial there, as that cannot fail). > > Currently fd_affix (I mistakenly made it fp_affix in the last message...) > doesn't have a flags parameter, so to do it the way you suggest, we'd need > to alter its signature, bump to 9.99.101 (and I haven't yet gotten around > to making my kernel be 98.99.100 which I'm kind of planning to do ...) > and go alter all the calls everywhere, mostly just filling in an extra > arg with a 0. It does not need an extra flag (it looks in the file descriptor flags to find if it needs to set or not. In fact the first thing I thought was to add an assertion to make sure that the flags agrees with that is set in exclose, to find other cases where we forgot to call fd_set_exclose() before calling fd_affix(). It also does not need locking because the process can't access the descriptor before calling fd_affix. christos signature.asc Description: Message signed with OpenPGP
Re: CVS commit: src/sys/kern
On Sat, 1 Oct 2022, Robert Elz wrote: Currently fd_affix (I mistakenly made it fp_affix in the last message...) doesn't have a flags parameter, so to do it the way you suggest, we'd need to alter its signature, bump to 9.99.101 ... and add some COMPAT_09 goop for backward compability :) ... (and I haven't yet gotten around to making my kernel be 98.99.100 which I'm kind of planning to do ...) and go alter all the calls everywhere, mostly just filling in an extra arg with a 0. kre !DSPAM:63376773211686829812153! ++--+--+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses:| | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com| | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | | & Network Engineer | | pgoyett...@gmail.com | ++--+--+
Re: CVS commit: src/sys/kern
Date:Fri, 30 Sep 2022 16:34:20 -0400 From:Christos Zoulas Message-ID: <232331ad-d501-4547-b730-03590c0c9...@zoulas.com> | How about handling exclose there? That would be possible, but why? We still need higher level code to handle the locking, which can also handle cloexec -- the problem we have now is simply that the relevant call is missing, I don't think adding it will be hard, but it needs to be done by someone who understands the locking requirements, and correct exit strategy in this case if an error occurs (failing to successfully lock a newly created clone would seem to be a very bizarre case, but still...) That is, I don't feel competent to suggest the 3 or 4 lines that ought be added in do_open to fix this (for just O_CLOEXEC it would be trivial there, as that cannot fail). Currently fd_affix (I mistakenly made it fp_affix in the last message...) doesn't have a flags parameter, so to do it the way you suggest, we'd need to alter its signature, bump to 9.99.101 (and I haven't yet gotten around to making my kernel be 98.99.100 which I'm kind of planning to do ...) and go alter all the calls everywhere, mostly just filling in an extra arg with a 0. kre
Re: CVS commit: src/sys/kern
> On Sep 30, 2022, at 10:13 AM, Robert Elz wrote: > >Date:Thu, 29 Sep 2022 16:47:06 - (UTC) >From:chris...@astron.com (Christos Zoulas) >Message-ID: > > | I think that the way to go is to: > | > | 1. Do the fd_set_exclose() in fd_affix(). That will remove most of the > calls > |to fd_set_exclose() *and* the open-coded versions of it. > | 2. Move the open_setfp locking initialization code to fd_affix() and do it > |if fp->f_type == DTYPE_VNODE. This should enable locking in all the > |appropriate cloners. > > I initially intended to reply and say that decisions where to put stuff > like that were for someone else (you, dholland, ...) rather than me, as > I haven't played around much at this level since before vnodes existed. > > But I have been thinking about it, and I disagree with that approach. > > fp_affix() has a job to do, and should be left to do it, without being > burdened by applying weird side effects, sometimes. The "do one thing > and do it well" philosophy applies to more than the commands. > > eg: currently fd_affix() is a void func, but to handle the lock flags > it would need to be able to fail, and return an error code. It would > also need to be able to sleep. That's just wrong. > > O_CLOEXEC and O_??LOCK are high level open() flags, and deserve to be > handled somewhere near the upper levels of the open syscall handling, > not buried in some utility function. That is the feedback that I wanted. But there were two parts to it. How about handling exclose there? It is just making sure that the value from flags is propagated to the exclose field. christos signature.asc Description: Message signed with OpenPGP
Re: CVS commit: src/sys/kern
Date:Thu, 29 Sep 2022 16:47:06 - (UTC) From:chris...@astron.com (Christos Zoulas) Message-ID: | I think that the way to go is to: | | 1. Do the fd_set_exclose() in fd_affix(). That will remove most of the calls |to fd_set_exclose() *and* the open-coded versions of it. | 2. Move the open_setfp locking initialization code to fd_affix() and do it |if fp->f_type == DTYPE_VNODE. This should enable locking in all the |appropriate cloners. I initially intended to reply and say that decisions where to put stuff like that were for someone else (you, dholland, ...) rather than me, as I haven't played around much at this level since before vnodes existed. But I have been thinking about it, and I disagree with that approach. fp_affix() has a job to do, and should be left to do it, without being burdened by applying weird side effects, sometimes. The "do one thing and do it well" philosophy applies to more than the commands. eg: currently fd_affix() is a void func, but to handle the lock flags it would need to be able to fail, and return an error code. It would also need to be able to sleep. That's just wrong. O_CLOEXEC and O_??LOCK are high level open() flags, and deserve to be handled somewhere near the upper levels of the open syscall handling, not buried in some utility function. kre
Re: CVS commit: src/sys/kern
In article <9275.1664462...@jacaranda.noi.kre.to>, Robert Elz wrote: >Date:Thu, 29 Sep 2022 08:18:28 -0400 >From:"Christos Zoulas" >Message-ID: <20220929121828.06edff...@cvs.netbsd.org> > > | Log Message: > | Add fd_set_exclose(). It is probably better to do this automatically in > | fd_affix()... > >Since that only affects /dev/ptmx I'd suggest fixing it generally for all >cloning devices (and handling O_??LOCK as well) would be a better method. I think that the way to go is to: 1. Do the fd_set_exclose() in fd_affix(). That will remove most of the calls to fd_set_exclose() *and* the open-coded versions of it. 2. Move the open_setfp locking initialization code to fd_affix() and do it if fp->f_type == DTYPE_VNODE. This should enable locking in all the appropriate cloners. Best, christos
Re: CVS commit: src/sys/kern
Date:Thu, 29 Sep 2022 08:18:28 -0400 From:"Christos Zoulas" Message-ID: <20220929121828.06edff...@cvs.netbsd.org> | Log Message: | Add fd_set_exclose(). It is probably better to do this automatically in | fd_affix()... Since that only affects /dev/ptmx I'd suggest fixing it generally for all cloning devices (and handling O_??LOCK as well) would be a better method. kre
Re: CVS commit: src/sys/kern
Reverted and alternative proposed on tech-kern! Sorry for the unilateral toe-stomping.
Re: CVS commit: src/sys/kern
On Sun, 7 Aug 2022, Paul Goyette wrote: On Sun, 7 Aug 2022, Taylor R Campbell wrote: Module Name:src Committed By: riastradh Date: Sun Aug 7 21:17:18 UTC 2022 Modified Files: src/sys/kern: kern_module.c Log Message: module(9): Disable module autounload by default. I don't know why this was ever enabled by default; many modules are still not safe to unload, let alone autounload. If any autounload is to happen by default, it should only be for modules that have opted into it in some way after audit. One reason for the current behavior involves the modules used for various emulations. When a file is executed, and none of the currently-loaded modules can "deal" with it, we load _all_ of the available emulation modules with the hope that one of them will "deal with" the new executable, and with the expectation that the remaining emulation modules will just "go away". Modules that are known to be unsafe to unload should declare that in their modcmd() unload (by returning EBUSY). After all, one might well expect that the module itself is the most likely place that the unloadable status would be known. Making no-autounload as the default seems like using a 20-pound sledge hammer on a carpet tack. I might also note that making such a fundamental behavior change when we're so close to the -10 release, without actually providing the suggested "opt-in" mechanism to retain current behavior, is not very user friendly. :-) At least this should be discussed (on tech-kern@, perhaps) before being unilaterally decided and committed. IIRC, we had this discussion some time ago, and the decision at that time was to retain current behavior. Unfortunately, I didn't save any pointer to that old discussion. :-( ++--+--+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses:| | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com| | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | | & Network Engineer | | pgoyett...@gmail.com | ++--+--+ !DSPAM:62f0321a29502088639869! ++--+--+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses:| | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com| | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | | & Network Engineer | | pgoyett...@gmail.com | ++--+--+
Re: CVS commit: src/sys/kern
On Sun, 7 Aug 2022, Taylor R Campbell wrote: Module Name:src Committed By: riastradh Date: Sun Aug 7 21:17:18 UTC 2022 Modified Files: src/sys/kern: kern_module.c Log Message: module(9): Disable module autounload by default. I don't know why this was ever enabled by default; many modules are still not safe to unload, let alone autounload. If any autounload is to happen by default, it should only be for modules that have opted into it in some way after audit. One reason for the current behavior involves the modules used for various emulations. When a file is executed, and none of the currently-loaded modules can "deal" with it, we load _all_ of the available emulation modules with the hope that one of them will "deal with" the new executable, and with the expectation that the remaining emulation modules will just "go away". Modules that are known to be unsafe to unload should declare that in their modcmd() unload (by returning EBUSY). After all, one might well expect that the module itself is the most likely place that the unloadable status would be known. Making no-autounload as the default seems like using a 20-pound sledge hammer on a carpet tack. ++--+--+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses:| | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com| | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | | & Network Engineer | | pgoyett...@gmail.com | ++--+--+
Re: CVS commit: src/sys/kern
> Date: Fri, 11 Feb 2022 16:50:16 -0800 > From: Jason Thorpe > > > On Feb 11, 2022, at 9:53 AM, Taylor R Campbell wrote: > > > > That is, this was presumably meant to ensure (A) happens-before (B). > > This relation is already guaranteed by ipi(9), so there is no need > > for any explicit memory barrier. > > Is this documented in ipi(9)? Is now!
Re: CVS commit: src/sys/kern
> On Feb 11, 2022, at 9:53 AM, Taylor R Campbell wrote: > > That is, this was presumably meant to ensure (A) happens-before (B). > This relation is already guaranteed by ipi(9), so there is no need > for any explicit memory barrier. Is this documented in ipi(9)? -- thorpej
Re: CVS commit: src/sys/kern
> Date: Sat, 5 Feb 2022 22:47:30 +0100 > From: Tobias Nygren > > On Sat, 5 Feb 2022 15:17:40 + > Martin Husemann wrote: > > > Modified Files: > > src/sys/kern: subr_autoconf.c > > > > Log Message: > > cfdriver_iattr_count() is only used in a KASSERT, so #ifdef DIAGNOSTIC it. > > This doesn't seem to work as intended. > In a kernel with "no options DIAGNOSTIC" I now get: > > /work/src/sys/kern/subr_autoconf.c: In function 'config_search_internal': > /work/src/sys/kern/subr_autoconf.c:1149:3: error: implicit declaration of > function 'cfdriver_iattr_count' [-Werror=implicit-function-declaration] I guess we need to decorate cfdriver_iattr_count with __diagused. (Point of the KASSERT change the other month was to eliminate most of these `oops I forgot to try a DIAGNOSTIC / !DIAGNOSTIC build' problems and reduce #ifdefs; thanks, clang...)
Re: CVS commit: src/sys/kern
On Sat, 5 Feb 2022 15:17:40 + Martin Husemann wrote: > Modified Files: > src/sys/kern: subr_autoconf.c > > Log Message: > cfdriver_iattr_count() is only used in a KASSERT, so #ifdef DIAGNOSTIC it. This doesn't seem to work as intended. In a kernel with "no options DIAGNOSTIC" I now get: /work/src/sys/kern/subr_autoconf.c: In function 'config_search_internal': /work/src/sys/kern/subr_autoconf.c:1149:3: error: implicit declaration of function 'cfdriver_iattr_count' [-Werror=implicit-function-declaration] 1149 | cfdriver_iattr_count(parent->dv_cfdriver) < 2); | ^~~~ /work/src/sys/lib/libkern/libkern.h:279:42: note: in definition of macro 'KASSERT' 279 | #define KASSERT(e) ((void)sizeof((long)(e)))
Re: CVS commit: src/sys/kern
On 2021/09/22 14:42, Rin Okuyama wrote: Module Name:src Committed By: rin Date: Wed Sep 22 05:42:19 UTC 2021 Modified Files: src/sys/kern: kern_ksyms.c Log Message: ksymsmmap: Add missing uao_reference(9) call for ks->ks_uobj. Fix failure for savecore(8) and subsequent kernel panic, introduced to kern_ksyms.c rev 1.03, at least for sh3 and alpha. Oops, I meant rev 1.103 here. For sh3 and alpha, savecore(8) supports coff and ecoff, respectively, via libkvm via nlist(3). nlist(3) routines for coff and ecoff use mmap(2) and munmap(2) for /dev/ksyms. This munmap(2) decrements reference count for ks->ks_uobj. Unless it is incremented in ksymsmmap(), ks->ks_uobj will be freed unexpectedly. To generate a diff of this commit: cvs rdiff -u -r1.104 -r1.105 src/sys/kern/kern_ksyms.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
Re: CVS commit: src/sys/kern
On Tue, Aug 17, 2021 at 11:39:26PM +, Taylor R Campbell wrote: > > > > Log Message: > > skip symbol tables that were unloaded again to avoid EFAULT when reading > > ksyms. > > > > also restore TAILQ_FOREACH idiom. > > This change isn't quite right: Reading past st = ksyms_last_snapshot > in ksymsread, or in any context where we rely on the snapshot without > holding ksyms_lock, is not allowed -- it will lead to a corrupted view > of the snapshot, and possibly end up reading uninitialized memory. > > That's why this code didn't use TAILQ_FOREACH -- it must stop at > ksyms_last_snapshot; if it gets to the end of the list, and witnesses > st = NULL, then that's a bug. TAILQ_FOREACH just adds another exit condition that prevents entering the loop with st = NULL. A problem might be to continue the loop in case ksyms_last_snapshot itself is gone. Something like: if (st->sd_gone) goto skip; ... skip: if (st == ksyms_last_snapshot) break; avoids that case. > The code also didn't skip entries with st->sd_gone because we arrange > for st->sd_nmap not to be freed until the last ksymsclose, at which > point no more ksymsread can be active. Keeping sd_nmap isn't sufficient, ksymsread failed with EFAULT as you still derefence pointers to the unloaded module. Skipping the unloaded module when reading the symbols avoids this.
Re: CVS commit: src/sys/kern
> Module Name:src > Committed By: mlelstv > Date: Sun Jul 18 06:57:28 UTC 2021 > > Modified Files: > src/sys/kern: kern_ksyms.c > > Log Message: > skip symbol tables that were unloaded again to avoid EFAULT when reading > ksyms. > > also restore TAILQ_FOREACH idiom. This change isn't quite right: Reading past st = ksyms_last_snapshot in ksymsread, or in any context where we rely on the snapshot without holding ksyms_lock, is not allowed -- it will lead to a corrupted view of the snapshot, and possibly end up reading uninitialized memory. That's why this code didn't use TAILQ_FOREACH -- it must stop at ksyms_last_snapshot; if it gets to the end of the list, and witnesses st = NULL, then that's a bug. The code also didn't skip entries with st->sd_gone because we arrange for st->sd_nmap not to be freed until the last ksymsclose, at which point no more ksymsread can be active. If you start skipping some entries, that will cause assertions to fire about mismatched symbol table size which we computed up front in ksymsopen. I made a mistake in some previous changes: Although we defer freeing st->sd_nmap while /dev/ksyms is open, we do not defer freeing st->sd_strstart or st->sd_symstart. And it is possible for a concurrent module _unload_ to issue kobj_unload in the middle of concurrent ksymsread and free st->sd_strstart or st->sd_symstart while ksymsread is reading from them. (My last round of changes was done to fix bugs found during module _load_.) The commit message doesn't say, but I assume your change was meant to fix that bug. Unfortunately, the change doesn't completely fix it -- we could easily find ourselves in the following situation: CPU 0 CPU 1 - - read from /dev/ksyms modunload foo.kmod load st->sd_gone = false kobj_unload store st->sd_gone = true kobj_free(ko, ko->ko_symtab); copy from st->sd_symstart -> FAULT In order to avoid this situation, we either need to: 1. Make ksyms_modunload block until last /dev/ksyms close. => Easy enough: slap a condvar on ksyms_opencnt, wait for ksyms_opencnt == 0 in ksyms_modunload, notify if ksyms_opencnt == 0 in ksymsclose. => Holding /dev/ksyms open indefinitely can block module unload indefinitely; unclear if this might lead to problematic deadlock scenarios. => Bonus: Can eliminate some of the logic in ksymsclose, since ksyms_modunload will handle it synchronously before return anyway. 2. Convert ksymsread to use psref or localcount. => Takes a little more effort to convert tailq to pslist -- which requires some care because struct ksyms_symtab is shared with userland. So we still have to keep the tailq entry records (or add disgusting compat code). This is why I didn't go for pserialize in my last round of changes. => Complicated because we would only get a snapshot within a single ksymsread call, not from open to close of /dev/ksyms; this would require some kind of mechanism to persuade userland tools to restart if the snapshot has changed from read to read. => (Can't use pserialize because we have to hold the snapshot stable across copyout to userland, which may sleep, which is not allowed in pserialize read sections.) 3. Find some other way to copy the tables or something while /dev/ksyms is open so they won't go away when kobj_unload runs, or just keep copies of the strtab and symtab for ksyms that persist unload -- might cost a lot of kernel memory, probably not worth pursuing. In any case this change should otherwise be reverted, since it is no necessary (sd_symstart/sd_strstart will no longer go away during ksymsread) and it is also wrong (iteration past ksyms_last_snapshot is not allowed). I'm leaning toward option (1); other thoughts welcome.
Re: CVS commit: src/sys/kern
On Thu, Jul 01, 2021 at 12:25:51AM -0400, Christos Zoulas wrote: > Modified Files: > src/sys/kern: vfs_vnops.c > > Log Message: > don't clear the error before we use it to determine if we are moving or > duping. oh ffs... *goes to soak head* -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/kern
Hmm, misleading commit log (code itself is correct). More precisely: s/the last element of ksyms_symtabs/ksyms_last_snapshot/ Anyway, does this help you to reintroduce "ksyms(4): Don't skip symbol tables that are soon to be freed."? That KASSERT does not fire anymore for aarch64, as far as I can see. Thanks, rin On 2021/06/03 0:43, Rin Okuyama wrote: Module Name:src Committed By: rin Date: Wed Jun 2 15:43:33 UTC 2021 Modified Files: src/sys/kern: kern_ksyms.c Log Message: Fix regression introduced in rev 1.90: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_ksyms.c#rev1.90 in which the last element of ksyms_symtabs is skipped by mistake. To generate a diff of this commit: cvs rdiff -u -r1.93 -r1.94 src/sys/kern/kern_ksyms.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/kern/kern_ksyms.c diff -u src/sys/kern/kern_ksyms.c:1.93 src/sys/kern/kern_ksyms.c:1.94 --- src/sys/kern/kern_ksyms.c:1.93 Wed Jun 2 08:46:16 2021 +++ src/sys/kern/kern_ksyms.c Wed Jun 2 15:43:33 2021 @@ -1,4 +1,4 @@ -/* $NetBSD: kern_ksyms.c,v 1.93 2021/06/02 08:46:16 riastradh Exp $ */ +/* $NetBSD: kern_ksyms.c,v 1.94 2021/06/02 15:43:33 rin Exp $ */ /*- * Copyright (c) 2008 The NetBSD Foundation, Inc. @@ -73,7 +73,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: kern_ksyms.c,v 1.93 2021/06/02 08:46:16 riastradh Exp $"); +__KERNEL_RCSID(0, "$NetBSD: kern_ksyms.c,v 1.94 2021/06/02 15:43:33 rin Exp $"); #if defined(_KERNEL) && defined(_KERNEL_OPT) #include "opt_copy_symtab.h" @@ -1087,7 +1087,7 @@ ksymsread(dev_t dev, struct uio *uio, in */ filepos = sizeof(struct ksyms_hdr); for (st = TAILQ_FIRST(&ksyms_symtabs); -st != ksyms_last_snapshot; +st != TAILQ_NEXT(ksyms_last_snapshot, sd_queue); st = TAILQ_NEXT(st, sd_queue)) { if (__predict_false(st->sd_gone)) continue; @@ -1109,7 +1109,7 @@ ksymsread(dev_t dev, struct uio *uio, in KASSERT(filepos <= sizeof(struct ksyms_hdr) + ksyms_hdr.kh_shdr[SYMTAB].sh_size); for (st = TAILQ_FIRST(&ksyms_symtabs); -st != ksyms_last_snapshot; +st != TAILQ_NEXT(ksyms_last_snapshot, sd_queue); st = TAILQ_NEXT(st, sd_queue)) { if (__predict_false(st->sd_gone)) continue;
Re: CVS commit: src/sys/kern
> Date: Wed, 2 Jun 2021 17:33:39 +0900 > From: Rin Okuyama > > On 2021/06/02 6:11, Taylor R Campbell wrote: > > - KASSERT(filepos <= sizeof(struct ksyms_hdr) + > > + KASSERT(filepos == sizeof(struct ksyms_hdr) + > > ksyms_hdr.kh_shdr[SYMTAB].sh_size); > ... > > This KASSERT fires at least for arm and aarch64, with savecore or ``nm > /dev/ksyms'': Oops. I reverted that change for now, will investigate later.
Re: CVS commit: src/sys/kern
Hi, On 2021/06/02 6:11, Taylor R Campbell wrote: Module Name:src Committed By: riastradh Date: Tue Jun 1 21:11:52 UTC 2021 Modified Files: src/sys/kern: kern_ksyms.c Log Message: ksyms(4): Don't skip symbol tables that are soon to be freed. They will not actually be freed until /dev/ksyms is closed, so continued access to them remains kosher. To generate a diff of this commit: cvs rdiff -u -r1.91 -r1.92 src/sys/kern/kern_ksyms.c ... Index: src/sys/kern/kern_ksyms.c diff -u src/sys/kern/kern_ksyms.c:1.91 src/sys/kern/kern_ksyms.c:1.92 --- src/sys/kern/kern_ksyms.c:1.91 Tue Jun 1 21:11:07 2021 +++ src/sys/kern/kern_ksyms.c Tue Jun 1 21:11:52 2021 ... - KASSERT(filepos <= sizeof(struct ksyms_hdr) + + KASSERT(filepos == sizeof(struct ksyms_hdr) + ksyms_hdr.kh_shdr[SYMTAB].sh_size); ... This KASSERT fires at least for arm and aarch64, with savecore or ``nm /dev/ksyms'': $ nm /dev/ksyms [ 1501.3538912] panic: kernel diagnostic assertion "filepos == sizeof(struct ksyms_hdr) + ksyms_hdr.kh_shdr[SYMTAB].sh_size" failed: file "../../../../kern/kern_ksyms.c", line 1107 [ 1501.3701109] cpu0: Begin traceback... [ 1501.3701109] 0xc9c2ecfc: netbsd:db_panic+0xc [ 1501.3806387] 0xc9c2ed14: netbsd:vpanic+0xc4 [ 1501.3806387] 0xc9c2ed2c: netbsd:kern_assert+0x3c [ 1501.3906766] 0xc9c2ed6c: netbsd:ksymsread+0x208 [ 1501.4006964] 0xc9c2ee1c: netbsd:spec_read+0xbc [ 1501.4112302] 0xc9c2ee44: netbsd:VOP_READ+0x58 [ 1501.4112302] 0xc9c2ee6c: netbsd:vn_read+0x78 [ 1501.4215160] 0xc9c2eebc: netbsd:dofileread+0x80 [ 1501.4315439] 0xc9c2eeec: netbsd:sys_read+0x50 [ 1501.4426847] 0xc9c2efac: netbsd:syscall+0x188 [ 1501.4426847] cpu0: End traceback... Stopped in pid 11825.11825 (nm) at netbsd:cpu_Debugger+0x4:bx r14 db> Can you please take a look? Thanks, rin
Re: CVS commit: src/sys/kern (kern_event.c)
On Fri, 22 Jan 2021, Paul Goyette wrote: On Thu, 21 Jan 2021, Paul Goyette wrote: Ooopppsss ignore me - looks like this was already fixed and my update missed it. I'll retry. OK, I built and installed a new kernel+userland. Most everything works, and syslogd seems to work fine (at least, it no longer panics during startup). HOWEVER, firefox seems to be badly broken. Attempting to open certain pages results in never-ending-hang, and nothing ever gets rendered. I can use the Stop-Reloading "X" button, and the "oscillating dot" load indicator stops oscillating, but nothing ever happens. At that point, the tab is hung and cannot load any other page, not even pages that loaded successfully previously! I _can_ delete the tab, and opening a new tab works. Some of the "failing" pages are: airnow.gov gmail.com www.prudential.com/login www.myaccountviewonline.com/AccountView/Logon Slight correction: above I said "nothing ever happens" but while I've been composing this Email a couple of the above pages seem to have made some progress (although none of them have finished and stopped the "oscillating dot"). So "ever" is at least 5 minutes or longer ... :) I don't know if the kern_event.c changes are responsible, but I haven't seen anything else recently. I reverted kern_event.c to rev 1.110 and firefox behaves correctly. So it's pretty fair bet that the subsequent kern_event.c changes are the reason for the breakage. PR kern/55946 has been filed. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern (kern_event.c)
On Thu, 21 Jan 2021, Paul Goyette wrote: Ooopppsss ignore me - looks like this was already fixed and my update missed it. I'll retry. OK, I built and installed a new kernel+userland. Most everything works, and syslogd seems to work fine (at least, it no longer panics during startup). HOWEVER, firefox seems to be badly broken. Attempting to open certain pages results in never-ending-hang, and nothing ever gets rendered. I can use the Stop-Reloading "X" button, and the "oscillating dot" load indicator stops oscillating, but nothing ever happens. At that point, the tab is hung and cannot load any other page, not even pages that loaded successfully previously! I _can_ delete the tab, and opening a new tab works. Some of the "failing" pages are: airnow.gov gmail.com www.prudential.com/login www.myaccountviewonline.com/AccountView/Logon Slight correction: above I said "nothing ever happens" but while I've been composing this Email a couple of the above pages seem to have made some progress (although none of them have finished and stopped the "oscillating dot"). So "ever" is at least 5 minutes or longer ... :) I don't know if the kern_event.c changes are responsible, but I haven't seen anything else recently. FWIW, I'm running firefox 83.0 from pkgsrc, around 2020-12-08 On Thu, 21 Jan 2021, Paul Goyette wrote: This change seems to break everything! As soon as I try to start syslogd I hit the panic() that you added [ 28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 kq->kq_count(1) != count(0), nmarker=1 [ 28.0253983] cpu0: Begin traceback... [ 28.0253983] vpanic() at netbsd:vpanic+0x156 [ 28.0253983] snprintf() at netbsd:snprintf [ 28.0253983] kqueue_check() at netbsd:kqueue_check+0x183 [ 28.0253983] kevent1() at netbsd:kevent1+0x49f [ 28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33 [ 28.0253983] syscall() at netbsd:syscall+0x23e [ 28.0253983] --- syscall (number 435) --- [ 28.0253983] netbsd:syscall+0x23e: [ 28.0253983] cpu0: End traceback... [ 28.0253983] fatal breakpoint trap in supervisor mode [ 28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 0x202 cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50 [ 28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 0xa809281e72c0 Stopped in pid 1352.1352 (syslogd) at netbsd:breakpoint+0x5: leave I have a full crash dump if you need any further info. Module Name:src Committed By: jdolecek Date: Thu Jan 21 18:09:23 UTC 2021 Modified Files: src/sys/kern: kern_event.c Log Message: adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly normal to have kq_count bigger than number of the linked entries on the kqueue PR kern/50094, problem pointed out by Chuck Silvers To generate a diff of this commit: cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+ ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+ ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern (kern_event.c)
Ooopppsss ignore me - looks like this was already fixed and my update missed it. I'll retry. On Thu, 21 Jan 2021, Paul Goyette wrote: This change seems to break everything! As soon as I try to start syslogd I hit the panic() that you added [ 28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 kq->kq_count(1) != count(0), nmarker=1 [ 28.0253983] cpu0: Begin traceback... [ 28.0253983] vpanic() at netbsd:vpanic+0x156 [ 28.0253983] snprintf() at netbsd:snprintf [ 28.0253983] kqueue_check() at netbsd:kqueue_check+0x183 [ 28.0253983] kevent1() at netbsd:kevent1+0x49f [ 28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33 [ 28.0253983] syscall() at netbsd:syscall+0x23e [ 28.0253983] --- syscall (number 435) --- [ 28.0253983] netbsd:syscall+0x23e: [ 28.0253983] cpu0: End traceback... [ 28.0253983] fatal breakpoint trap in supervisor mode [ 28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 0x202 cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50 [ 28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 0xa809281e72c0 Stopped in pid 1352.1352 (syslogd) at netbsd:breakpoint+0x5: leave I have a full crash dump if you need any further info. Module Name:src Committed By: jdolecek Date: Thu Jan 21 18:09:23 UTC 2021 Modified Files: src/sys/kern: kern_event.c Log Message: adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly normal to have kq_count bigger than number of the linked entries on the kqueue PR kern/50094, problem pointed out by Chuck Silvers To generate a diff of this commit: cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+ ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern (kern_event.c)
This change seems to break everything! As soon as I try to start syslogd I hit the panic() that you added [ 28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 kq->kq_count(1) != count(0), nmarker=1 [ 28.0253983] cpu0: Begin traceback... [ 28.0253983] vpanic() at netbsd:vpanic+0x156 [ 28.0253983] snprintf() at netbsd:snprintf [ 28.0253983] kqueue_check() at netbsd:kqueue_check+0x183 [ 28.0253983] kevent1() at netbsd:kevent1+0x49f [ 28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33 [ 28.0253983] syscall() at netbsd:syscall+0x23e [ 28.0253983] --- syscall (number 435) --- [ 28.0253983] netbsd:syscall+0x23e: [ 28.0253983] cpu0: End traceback... [ 28.0253983] fatal breakpoint trap in supervisor mode [ 28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 0x202 cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50 [ 28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 0xa809281e72c0 Stopped in pid 1352.1352 (syslogd) at netbsd:breakpoint+0x5: leave I have a full crash dump if you need any further info. Module Name:src Committed By: jdolecek Date: Thu Jan 21 18:09:23 UTC 2021 Modified Files: src/sys/kern: kern_event.c Log Message: adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly normal to have kq_count bigger than number of the linked entries on the kqueue PR kern/50094, problem pointed out by Chuck Silvers To generate a diff of this commit: cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
I believe it's this change that's made my vbox image panic at the drop of a hat; while it occasionally panics before it even hits single user, it also consistently panics when starting syslogd (even from single-user): panic: kqueue_scan,1491: kq=0xc779aeff6dc0 kq->kq_count(1) != count(0), nmarker=1 and the call chain is syscall -> sys___kevent50 -> kevent1 -> kqueue_check it also causes a panic while trying to dump more than half the time, but I have managed to get an image or two. ("panic: atastart: channel 0 busy, xfer not possible", if you're curious.) On Wed, Jan 20, 2021 at 09:39:09PM +, Jaromir Dolecek wrote: > Module Name: src > Committed By: jdolecek > Date: Wed Jan 20 21:39:09 UTC 2021 > > Modified Files: > src/sys/kern: kern_event.c > > Log Message: > fix a race in kqueue_scan() - when multiple threads check the same > kqueue, it could happen other thread seen empty kqueue while kevent > was being checked for re-firing and re-queued > > make sure to keep retrying if there are outstanding kevents even > if no kevent is found on first pass through the queue, and only > drop the KN_QUEUED flag and kq_count when actually completely done > with the kevent > > change is inspired by the FreeBSD in-flux handling, but without > introducing the reference counting > > PR kern/50094 by Christof Meerwald > > > To generate a diff of this commit: > cvs rdiff -u -r1.110 -r1.111 src/sys/kern/kern_event.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. >
Re: CVS commit: src/sys/kern
On 2020/11/05 2:45, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: On 2020/11/04 22:52, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. If the module is built-in (``options PTRACE'' selected in the config file), then the module will already have been initialized. If the module is not built-in, then a privileged user will need to modload(8) the module. Prior to this change, the built-in ptrace_common module was calling the ptrace module's init/fini routine. Quite likely ptrace_common was built-in (due to inclusion of file-system PROCFS), so the init was handled during init_main(). This change ensures that the ptrace init/fini routines are called ONLY if the ptrace module itself (not the ptrace_common) routine is built-in. Please check to make sure that ``options PTRACE'' is included in your kernel config. Yes: $ config -x netbsd.gdb | grep PTRACE ###> options PTRACE # Include ptrace(2) syscall ###> options PTRACE_HOOKS # Include ptrace hooks The problem is that ptrace_{init,fini}() are not called from ptrace_modcmd(): https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184 184 static int 185 ptrace_modcmd(modcmd_t cmd, void *arg) 186 { 187 int error; 188 189 switch (cmd) { 190 case MODULE_CMD_INIT: 191 error = syscall_establish(&emul_netbsd, ptrace_syscalls); 192 break; 193 case MODULE_CMD_FINI: 194 error = syscall_disestablish(&emul_netbsd, ptrace_syscalls); 195 break; 196 default: 197 error = ENOTTY; 198 break; 199 } 200 return error; 201 } Yes that would be a problem. Can you easily confirm that ktrace(2) is unusable for non-privileged users on 9.99.75 kernel: $ gdb echo GNU gdb (GDB) 8.3 ... (gdb) b main Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58. (gdb) r Starting program: /bin/echo warning: Could not trace the inferior process. Error: warning: ptrace: Operation not permitted terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR' [1] Abort trap (core dumped) gdb echo Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to sys_ptrace.c, IMO. I have some prior obligations, so I won't be able to look at this until this evening. Thanks for the detailed analysis. Fixed. Thanks! rin
Re: CVS commit: src/sys/kern
OK, this is my mistake. When I change the calls in the ptrace_common modcmd, I should also have renamed the functions (including their entries in sys/ptrace.h). I will commit this shortly, before I leave. Thanks for the "recipe" for reproducing the problem - I will try it later when I return. On Wed, 4 Nov 2020, Rin Okuyama wrote: On 2020/11/04 22:52, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. If the module is built-in (``options PTRACE'' selected in the config file), then the module will already have been initialized. If the module is not built-in, then a privileged user will need to modload(8) the module. Prior to this change, the built-in ptrace_common module was calling the ptrace module's init/fini routine. Quite likely ptrace_common was built-in (due to inclusion of file-system PROCFS), so the init was handled during init_main(). This change ensures that the ptrace init/fini routines are called ONLY if the ptrace module itself (not the ptrace_common) routine is built-in. Please check to make sure that ``options PTRACE'' is included in your kernel config. Yes: $ config -x netbsd.gdb | grep PTRACE ###> optionsPTRACE # Include ptrace(2) syscall ###> optionsPTRACE_HOOKS# Include ptrace hooks The problem is that ptrace_{init,fini}() are not called from ptrace_modcmd(): https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184 184 static int 185 ptrace_modcmd(modcmd_t cmd, void *arg) 186 { 187 int error; 188 189 switch (cmd) { 190 case MODULE_CMD_INIT: 191 error = syscall_establish(&emul_netbsd, ptrace_syscalls); 192 break; 193 case MODULE_CMD_FINI: 194 error = syscall_disestablish(&emul_netbsd, ptrace_syscalls); 195 break; 196 default: 197 error = ENOTTY; 198 break; 199 } 200 return error; 201 } Can you easily confirm that ktrace(2) is unusable for non-privileged users on 9.99.75 kernel: $ gdb echo GNU gdb (GDB) 8.3 ... (gdb) b main Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58. (gdb) r Starting program: /bin/echo warning: Could not trace the inferior process. Error: warning: ptrace: Operation not permitted terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR' [1] Abort trap (core dumped) gdb echo Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to sys_ptrace.c, IMO. Thanks, rin !DSPAM:5fa2b869233318156490363! ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
On Wed, 4 Nov 2020, Rin Okuyama wrote: On 2020/11/04 22:52, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. If the module is built-in (``options PTRACE'' selected in the config file), then the module will already have been initialized. If the module is not built-in, then a privileged user will need to modload(8) the module. Prior to this change, the built-in ptrace_common module was calling the ptrace module's init/fini routine. Quite likely ptrace_common was built-in (due to inclusion of file-system PROCFS), so the init was handled during init_main(). This change ensures that the ptrace init/fini routines are called ONLY if the ptrace module itself (not the ptrace_common) routine is built-in. Please check to make sure that ``options PTRACE'' is included in your kernel config. Yes: $ config -x netbsd.gdb | grep PTRACE ###> optionsPTRACE # Include ptrace(2) syscall ###> optionsPTRACE_HOOKS# Include ptrace hooks The problem is that ptrace_{init,fini}() are not called from ptrace_modcmd(): https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184 184 static int 185 ptrace_modcmd(modcmd_t cmd, void *arg) 186 { 187 int error; 188 189 switch (cmd) { 190 case MODULE_CMD_INIT: 191 error = syscall_establish(&emul_netbsd, ptrace_syscalls); 192 break; 193 case MODULE_CMD_FINI: 194 error = syscall_disestablish(&emul_netbsd, ptrace_syscalls); 195 break; 196 default: 197 error = ENOTTY; 198 break; 199 } 200 return error; 201 } Yes that would be a problem. Can you easily confirm that ktrace(2) is unusable for non-privileged users on 9.99.75 kernel: $ gdb echo GNU gdb (GDB) 8.3 ... (gdb) b main Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58. (gdb) r Starting program: /bin/echo warning: Could not trace the inferior process. Error: warning: ptrace: Operation not permitted terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR' [1] Abort trap (core dumped) gdb echo Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to sys_ptrace.c, IMO. I have some prior obligations, so I won't be able to look at this until this evening. Thanks for the detailed analysis. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
On 2020/11/04 22:52, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. If the module is built-in (``options PTRACE'' selected in the config file), then the module will already have been initialized. If the module is not built-in, then a privileged user will need to modload(8) the module. Prior to this change, the built-in ptrace_common module was calling the ptrace module's init/fini routine. Quite likely ptrace_common was built-in (due to inclusion of file-system PROCFS), so the init was handled during init_main(). This change ensures that the ptrace init/fini routines are called ONLY if the ptrace module itself (not the ptrace_common) routine is built-in. Please check to make sure that ``options PTRACE'' is included in your kernel config. Yes: $ config -x netbsd.gdb | grep PTRACE ###> optionsPTRACE # Include ptrace(2) syscall ###> optionsPTRACE_HOOKS# Include ptrace hooks The problem is that ptrace_{init,fini}() are not called from ptrace_modcmd(): https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184 184 static int 185 ptrace_modcmd(modcmd_t cmd, void *arg) 186 { 187 int error; 188 189 switch (cmd) { 190 case MODULE_CMD_INIT: 191 error = syscall_establish(&emul_netbsd, ptrace_syscalls); 192 break; 193 case MODULE_CMD_FINI: 194 error = syscall_disestablish(&emul_netbsd, ptrace_syscalls); 195 break; 196 default: 197 error = ENOTTY; 198 break; 199 } 200 return error; 201 } Can you easily confirm that ktrace(2) is unusable for non-privileged users on 9.99.75 kernel: $ gdb echo GNU gdb (GDB) 8.3 ... (gdb) b main Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58. (gdb) r Starting program: /bin/echo warning: Could not trace the inferior process. Error: warning: ptrace: Operation not permitted terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR' [1] Abort trap (core dumped) gdb echo Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to sys_ptrace.c, IMO. Thanks, rin
Re: CVS commit: src/sys/kern
On Wed, 4 Nov 2020, Rin Okuyama wrote: ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. If the module is built-in (``options PTRACE'' selected in the config file), then the module will already have been initialized. If the module is not built-in, then a privileged user will need to modload(8) the module. Prior to this change, the built-in ptrace_common module was calling the ptrace module's init/fini routine. Quite likely ptrace_common was built-in (due to inclusion of file-system PROCFS), so the init was handled during init_main(). This change ensures that the ptrace init/fini routines are called ONLY if the ptrace module itself (not the ptrace_common) routine is built-in. Please check to make sure that ``options PTRACE'' is included in your kernel config. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
On 2020/11/04 22:31, Paul Goyette wrote: On Wed, 4 Nov 2020, Rin Okuyama wrote: Hi, On 2020/10/26 0:55, Paul Goyette wrote: Module Name: src Committed By: pgoyette Date: Sun Oct 25 15:55:37 UTC 2020 Modified Files: src/sys/kern: sys_ptrace_common.c Log Message: ptrace_Common is a module unto itself. Don't use the ptrace module's init/fini routines. To generate a diff of this commit: cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. This commit makes ptrace(2) unusable for non-privileged users; ptrace_common_{init,fini}() should be called from somewhere. ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called at all since this commit, which forbids ptrace(2) for non-root users. Thanks, rin
Re: CVS commit: src/sys/kern
On Wed, 4 Nov 2020, Rin Okuyama wrote: Hi, On 2020/10/26 0:55, Paul Goyette wrote: Module Name:src Committed By: pgoyette Date: Sun Oct 25 15:55:37 UTC 2020 Modified Files: src/sys/kern: sys_ptrace_common.c Log Message: ptrace_Common is a module unto itself. Don't use the ptrace module's init/fini routines. To generate a diff of this commit: cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. This commit makes ptrace(2) unusable for non-privileged users; ptrace_common_{init,fini}() should be called from somewhere. ptrace_common_{init,fini} are called from the ptrace_common module's modcmd routine in kern/sys_ptrace_common.c. The modcmd routine in turn is called at module initialization time. In the case of a built-in module, it will be called by module_init via init_main; if the module is loaded (or auto-loaded) module_load will call the modcmd routine. The module will be built-in if either ``options PTRACE'' or ``file- system PROCFS'' is set in the kernel configuration file. ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
Hi, On 2020/10/26 0:55, Paul Goyette wrote: Module Name:src Committed By: pgoyette Date: Sun Oct 25 15:55:37 UTC 2020 Modified Files: src/sys/kern: sys_ptrace_common.c Log Message: ptrace_Common is a module unto itself. Don't use the ptrace module's init/fini routines. To generate a diff of this commit: cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. This commit makes ptrace(2) unusable for non-privileged users; ptrace_common_{init,fini}() should be called from somewhere. Thanks, rin
Re: CVS commit: src/sys/kern
In article <20201019144701.a6d3bf...@cvs.netbsd.org>, Kamil Rytarowski wrote: >-=-=-=-=-=- > >Module Name: src >Committed By: kamil >Date: Mon Oct 19 14:47:01 UTC 2020 > >Modified Files: > src/sys/kern: sys_ptrace.c sys_ptrace_common.c > >Log Message: >Remove obsolete references to 4.4BSD papers This removes the ptrace module code. Please diff before committing! christos
re: CVS commit: src/sys/kern
On Mon, 07 Sep 2020 at 20:47:25 +1000, matthew green wrote: "Jason R Thorpe" writes: Module Name:src Committed By: thorpej Date: Mon Sep 7 03:50:41 UTC 2020 Modified Files: src/sys/kern: files.kern init_main.c Log Message: Add the ability to set an alternate cnmagic in the kernel config file, e.g.: optionsCNMAGIC="\"+\"" thanks! i need this for my er4 that some how does do break properly.. options(4) update? I just added an entry for this to options(4). The bare bones, anyway. (It seems that DDB_BREAK_CHAR is only used in one place now, that being src/sys/arch/arm/sa11x0/sa11x0_com.c. I'm not sure if another detail to contextualize DDB_BREAK_CHAR vs. CNMAGIC would be warranted?) Dave
re: CVS commit: src/sys/kern
"Jason R Thorpe" writes: > Module Name: src > Committed By: thorpej > Date: Mon Sep 7 03:50:41 UTC 2020 > > Modified Files: > src/sys/kern: files.kern init_main.c > > Log Message: > Add the ability to set an alternate cnmagic in the kernel config > file, e.g.: > > optionsCNMAGIC="\"+\"" thanks! i need this for my er4 that some how does do break properly.. options(4) update?
Re: CVS commit: src/sys/kern
On 02.08.2020 17:50, Taylor R Campbell wrote: >> Date: Sun, 2 Aug 2020 17:35:06 +0200 >> From: Kamil Rytarowski >> >> On 02.08.2020 16:44, Taylor R Campbell wrote: Date: Sun, 2 Aug 2020 16:04:15 +0200 From: Kamil Rytarowski On 02.08.2020 15:57, Taylor R Campbell wrote: > But it sounds like the original motivation is that it triggered > -Wvla...which frankly strikes me as a compiler bug since there's > obviously no actual VLA created in sizeof; as far as I can tell > there's no semantic difference between sizeof(device_t[n]) and > sizeof(device_t) * n. This is not true: >>> >>> Which part of what I said are you claiming is not true, and what are >>> you illustrating with the example program below? >> >> Calling it a compiler bug. >> >> Clang behaves the same way. >> >> $ clang -Wvla test.c >> test.c:6:37: warning: variable length array used [-Wvla] >> printf("sizeof = %zu\n", sizeof(int[argc])); >>^ >> 1 warning generated. >> >> Creating VLA is not needed for using it as an intermediate. In practice >> in most/all cases it is optimized and actual VLA is not allocated. > > This is not a matter of optimization in practice. It's absolutely not > an `intermediate' at all -- there is no VLA created, period, any more > than sizeof(malloc(1)) causes any actual call to malloc or sizeof(*p) > causes any dereference of a pointer. > > This makes -Wvla less useful as a tool, because apparently it flags > code that unquestionably does not have any bad effects of VLAs. This > happens because -Wvla is dumb -- it just looks for the syntax, not for > the semantics of creating VLAs. That's why I call it a bug -- it > raises a false positive that makes it less useful. > Compilers warns about VLA using ("variable length array used"), not creating. There is no distinct compiler warning to discriminate VLA creating from usage. Anyway, code just using VLA is not that frequent, avoiding it is rather simple and VLA can be avoided altogether (at least for non-external). signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
> Date: Sun, 2 Aug 2020 17:35:06 +0200 > From: Kamil Rytarowski > > On 02.08.2020 16:44, Taylor R Campbell wrote: > >> Date: Sun, 2 Aug 2020 16:04:15 +0200 > >> From: Kamil Rytarowski > >> > >> On 02.08.2020 15:57, Taylor R Campbell wrote: > >>> But it sounds like the original motivation is that it triggered > >>> -Wvla...which frankly strikes me as a compiler bug since there's > >>> obviously no actual VLA created in sizeof; as far as I can tell > >>> there's no semantic difference between sizeof(device_t[n]) and > >>> sizeof(device_t) * n. > >> > >> This is not true: > > > > Which part of what I said are you claiming is not true, and what are > > you illustrating with the example program below? > > Calling it a compiler bug. > > Clang behaves the same way. > > $ clang -Wvla test.c > test.c:6:37: warning: variable length array used [-Wvla] > printf("sizeof = %zu\n", sizeof(int[argc])); >^ > 1 warning generated. > > Creating VLA is not needed for using it as an intermediate. In practice > in most/all cases it is optimized and actual VLA is not allocated. This is not a matter of optimization in practice. It's absolutely not an `intermediate' at all -- there is no VLA created, period, any more than sizeof(malloc(1)) causes any actual call to malloc or sizeof(*p) causes any dereference of a pointer. This makes -Wvla less useful as a tool, because apparently it flags code that unquestionably does not have any bad effects of VLAs. This happens because -Wvla is dumb -- it just looks for the syntax, not for the semantics of creating VLAs. That's why I call it a bug -- it raises a false positive that makes it less useful.
Re: CVS commit: src/sys/kern
On 02.08.2020 16:44, Taylor R Campbell wrote: >> Date: Sun, 2 Aug 2020 16:04:15 +0200 >> From: Kamil Rytarowski >> >> On 02.08.2020 15:57, Taylor R Campbell wrote: >>> But it sounds like the original motivation is that it triggered >>> -Wvla...which frankly strikes me as a compiler bug since there's >>> obviously no actual VLA created in sizeof; as far as I can tell >>> there's no semantic difference between sizeof(device_t[n]) and >>> sizeof(device_t) * n. >> >> This is not true: > > Which part of what I said are you claiming is not true, and what are > you illustrating with the example program below? > Calling it a compiler bug. Clang behaves the same way. $ clang -Wvla test.c test.c:6:37: warning: variable length array used [-Wvla] printf("sizeof = %zu\n", sizeof(int[argc])); ^ 1 warning generated. Creating VLA is not needed for using it as an intermediate. In practice in most/all cases it is optimized and actual VLA is not allocated. > It seems to illustrate that sizeof(int[argc]) does exactly what one > would expect it to do -- return the size of an argc-length array of > ints, just like sizeof(int) * argc does. > > The result is the same and I find the change as beneficial. > >> #include >> >> int >> main(int argc, char **argv) >> { >> printf("sizeof = %zu\n", sizeof(int[argc])); >> return 0; >> } >> >> $ ./a.out >> >> sizeof = 4 >> $ ./a.out 12 3 >> sizeof = 12 >> $ ./a.out 12 3 45 6 >> sizeof = 20 signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
Le dim. 2 août 2020 à 15:57, Taylor R Campbell a écrit : > Why does it improve readability? Certainly using cringe language features does not help readability. > What else does -Wvla choke on in src/sys? Some drm drivers, and uvm, particularly uvm_bio.c. I'd like to work towards enabling -Wvla in kernel (i.e. remove all VLAs, same as Linux kernel did in 2018), but I have some other stuff I want to finish before I pick a new project. Jaromir
Re: CVS commit: src/sys/kern
> Date: Sun, 2 Aug 2020 16:04:15 +0200 > From: Kamil Rytarowski > > On 02.08.2020 15:57, Taylor R Campbell wrote: > > But it sounds like the original motivation is that it triggered > > -Wvla...which frankly strikes me as a compiler bug since there's > > obviously no actual VLA created in sizeof; as far as I can tell > > there's no semantic difference between sizeof(device_t[n]) and > > sizeof(device_t) * n. > > This is not true: Which part of what I said are you claiming is not true, and what are you illustrating with the example program below? It seems to illustrate that sizeof(int[argc]) does exactly what one would expect it to do -- return the size of an argc-length array of ints, just like sizeof(int) * argc does. > #include > > int > main(int argc, char **argv) > { > printf("sizeof = %zu\n", sizeof(int[argc])); > return 0; > } > > $ ./a.out > > sizeof = 4 > $ ./a.out 12 3 > sizeof = 12 > $ ./a.out 12 3 45 6 > sizeof = 20
Re: CVS commit: src/sys/kern
On Sun, Aug 02, 2020 at 01:57:11PM +, Taylor R Campbell wrote: > But it sounds like the original motivation is that it triggered > -Wvla...which frankly strikes me as a compiler bug since there's > obviously no actual VLA created in sizeof; as far as I can tell > there's no semantic difference between sizeof(device_t[n]) and > sizeof(device_t) * n. Yes, it seems strange to trigger a VLA warning in code that it is required to be compile-time evaluated. Joerg
Re: CVS commit: src/sys/kern
On 02.08.2020 16:25, Paul Goyette wrote: > On Sun, 2 Aug 2020, Kamil Rytarowski wrote: > >> On 02.08.2020 15:57, Taylor R Campbell wrote: >>> But it sounds like the original motivation is that it triggered >>> -Wvla...which frankly strikes me as a compiler bug since there's >>> obviously no actual VLA created in sizeof; as far as I can tell >>> there's no semantic difference between sizeof(device_t[n]) and >>> sizeof(device_t) * n. >>> >> >> This is not true: >> >> #include >> >> int >> main(int argc, char **argv) >> { >> printf("sizeof = %zu\n", sizeof(int[argc])); >> return 0; >> } >> >> $ ./a.out >> >> sizeof = 4 >> $ ./a.out 12 3 >> sizeof = 12 >> $ ./a.out 12 3 45 6 >> sizeof = 20 > > Modifying your example slightly, I print both variations: > > #include > > int > main(int argc, char **argv) > { > printf("sizeof = %zu\t%zu\n", sizeof(int[argc]), sizeof(int) * argc); > return 0; > } > speedy:paul {653} ./a.out > sizeof = 4 4 > speedy:paul {654} ./a.out 12 3 > sizeof = 12 12 > speedy:paul {655} ./a.out 12 3 45 6 > sizeof = 20 20 > > > Looks the same to me! > The result is the same, but one uses VLA (at least formally), other not. > > ++--+---+ > | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | > | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | > | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | > ++--+---+ signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
On Sun, 2 Aug 2020, Kamil Rytarowski wrote: On 02.08.2020 15:57, Taylor R Campbell wrote: But it sounds like the original motivation is that it triggered -Wvla...which frankly strikes me as a compiler bug since there's obviously no actual VLA created in sizeof; as far as I can tell there's no semantic difference between sizeof(device_t[n]) and sizeof(device_t) * n. This is not true: #include int main(int argc, char **argv) { printf("sizeof = %zu\n", sizeof(int[argc])); return 0; } $ ./a.out sizeof = 4 $ ./a.out 12 3 sizeof = 12 $ ./a.out 12 3 45 6 sizeof = 20 Modifying your example slightly, I print both variations: #include int main(int argc, char **argv) { printf("sizeof = %zu\t%zu\n", sizeof(int[argc]), sizeof(int) * argc); return 0; } speedy:paul {653} ./a.out sizeof = 4 4 speedy:paul {654} ./a.out 12 3 sizeof = 12 12 speedy:paul {655} ./a.out 12 3 45 6 sizeof = 20 20 Looks the same to me! ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: CVS commit: src/sys/kern
On 02.08.2020 15:57, Taylor R Campbell wrote: > But it sounds like the original motivation is that it triggered > -Wvla...which frankly strikes me as a compiler bug since there's > obviously no actual VLA created in sizeof; as far as I can tell > there's no semantic difference between sizeof(device_t[n]) and > sizeof(device_t) * n. > This is not true: #include int main(int argc, char **argv) { printf("sizeof = %zu\n", sizeof(int[argc])); return 0; } $ ./a.out sizeof = 4 $ ./a.out 12 3 sizeof = 12 $ ./a.out 12 3 45 6 sizeof = 20 signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
> Date: Sun, 2 Aug 2020 10:47:21 +0200 > From: Jarom�r Dole ek > > Readability first and foremost in this case. > > I was exploring if I can disable VLAs for the kernel altogether, this > can't be done for now. Nevertheless, this change looked like it would > be useful to make anyway. Why does it improve readability? Surely if we want the size of an array of device_t of length n it's more to the point to say sizeof(device_t[n]) directly than to talk about multiplying sizeof(device_t) by n. If that were the only question I think the change makes readability worse, personally! But it sounds like the original motivation is that it triggered -Wvla...which frankly strikes me as a compiler bug since there's obviously no actual VLA created in sizeof; as far as I can tell there's no semantic difference between sizeof(device_t[n]) and sizeof(device_t) * n. What else does -Wvla choke on in src/sys?
Re: CVS commit: src/sys/kern
Readability first and foremost in this case. I was exploring if I can disable VLAs for the kernel altogether, this can't be done for now. Nevertheless, this change looked like it would be useful to make anyway. Le dim. 2 août 2020 à 01:15, Taylor R Campbell a écrit : > > > Module Name:src > > Committed By: jdolecek > > Date: Sat Aug 1 11:18:26 UTC 2020 > > > > Modified Files: > > src/sys/kern: subr_autoconf.c > > > > Log Message: > > avoid VLA for the sizeof() calculations > > Why?
Re: CVS commit: src/sys/kern
> Module Name:src > Committed By: jdolecek > Date: Sat Aug 1 11:18:26 UTC 2020 > > Modified Files: > src/sys/kern: subr_autoconf.c > > Log Message: > avoid VLA for the sizeof() calculations Why?
Re: [sctp fix] Re: CVS commit: src/sys/kern
Le 28/04/2020 à 09:16, Luke Mewburn a écrit : On 20-04-26 18:15, Maxime Villard wrote: | - There was no demonstrated use-case justifying importing it. In addition, |major OSes like Windows and macOS do not implement SCTP. There just is no |demand for SCTP on the market; and on NetBSD, proportionally even less. SCTP is used in mobile telco environments; the control plane for 3G and 4G networks uses SCTP (or TCP as an option, but mostly SCTP). That may be true. I am mostly aware of the multimedia use cases. Le 01/05/2020 à 18:46, m...@netbsd.org a écrit : We can setup an equivalence: put as much effort into the SCTP removal proposal as there was for the SCTP introduction proposal. Since SCTP was just dropped in src without any prior discussion, I don't think we need any discussion for removing it. That would be fair, yes. Le 02/05/2020 à 13:55, m...@netbsd.org a écrit : I'm sorry for picking on SCTP in particular. Apparently it was added because it was listed in src/doc/roadmaps.networking (and it's still listed there). But this doesn't address your own point, does it. The one-liner in src/doc/roadmaps/networking gives no explanation on why we would want SCTP in the first place. It doesn't even say where the code was imported from. This brings the question of who, exactly, made this list. Several of the items on this wanted list are actively *not* wanted, as Christos noted in five of his "Comment[christos]". At this point there is no doubt that SCTP in the NetBSD kernel has no future. The only viable option I see is usrsctp: https://github.com/sctplab/usrsctp A userland version of the code, but portable, and actively maintained. While here, notice the crazy buffer overflows that were fixed in it and are still present in the NetBSD SCTP kernel code... Adds to my point, that the code is of extremely poor quality. Maxime
Re: CVS commit: src/sys/kern
On 2020/06/02 2:08, Joerg Sonnenberger wrote: On Sun, May 31, 2020 at 11:24:20PM +, Rin Okuyama wrote: Module Name:src Committed By: rin Date: Sun May 31 23:24:20 UTC 2020 Modified Files: src/sys/kern: kern_timeout.c Log Message: Stop allocating buffers dynamically in a DDB session, in order not to disturb on-going debugged state of kernel datastructures. This breaks the build with clang as it uses a declared-but-not-defined type callout_cpu. Fixed. Sorry for breakage. Thanks, rin
Re: CVS commit: src/sys/kern
On Sun, May 31, 2020 at 11:24:20PM +, Rin Okuyama wrote: > Module Name: src > Committed By: rin > Date: Sun May 31 23:24:20 UTC 2020 > > Modified Files: > src/sys/kern: kern_timeout.c > > Log Message: > Stop allocating buffers dynamically in a DDB session, in order not to > disturb on-going debugged state of kernel datastructures. This breaks the build with clang as it uses a declared-but-not-defined type callout_cpu. Joerg
re: CVS commit: src/sys/kern
"Rin Okuyama" writes: > Module Name: src > Committed By: rin > Date: Sun May 31 08:33:48 UTC 2020 > > Modified Files: > src/sys/kern: kern_timeout.c > > Log Message: > db_show_callout(): struct callout_cpu and cpu_info are too much for stack. > > XXX > DDB can be running in the interrupt context, e.g., when activated from > console. Therefore, use kmem_intr_alloc(9) instead of kmem_alloc(9). > > Frame size, e.g. for m68k, becomes: > 9212 (oops!) --> 0 please don't use allocators from ddb. inspection from ddb shouldn't go change internal datastructures and rely upon things that can break working where possible allocate a single static in the bss to use? .mrg.
Re: CVS commit: src/sys/kern
> On May 31, 2020, at 1:33 AM, Rin Okuyama wrote: > > db_show_callout(): struct callout_cpu and cpu_info are too much for stack. > > XXX > DDB can be running in the interrupt context, e.g., when activated from > console. Therefore, use kmem_intr_alloc(9) instead of kmem_alloc(9). > > Frame size, e.g. for m68k, becomes: >9212 (oops!) --> 0 > I'm not sure we want to be calling memory allocators from within ddb. I think it would be better to simply allocate that space in the .bss segment -- ddb only runs on 1 CPU at a time. -- thorpej
Re: CVS commit: src/sys/kern
On 07.05.2020 07:46, Michael van Elst wrote: > On Wed, May 06, 2020 at 12:39:21PM +0200, Kamil Rytarowski wrote: > > Hi Kamil, > >> If I revert the pipe(2) changes on top of NetBSD-current, the test is no >> longer racy again. > > I tried 9.99.60 with and without my bugfix. The test always failed > after some time with exactly that error. > > If the change really had an effect, there would be a use-after-free bug > somewhere else. > > > Greetings, > Thanks, I will investigating further. signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
On Wed, May 06, 2020 at 12:39:21PM +0200, Kamil Rytarowski wrote: Hi Kamil, > If I revert the pipe(2) changes on top of NetBSD-current, the test is no > longer racy again. I tried 9.99.60 with and without my bugfix. The test always failed after some time with exactly that error. If the change really had an effect, there would be a use-after-free bug somewhere else. Greetings, -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: CVS commit: src/sys/kern
This caused regressions in t_ptrace_wait* tests and random hangs/timeouts/failures. If I revert the pipe(2) changes on top of NetBSD-current, the test is no longer racy again. http://netbsd.org/~kamil/patch-00249-pipe-revert.txt Reproducer: cd /usr/tests/lib/libc/sys ./t_ptrace_waitpid tracer_sysctl_lookup_without_duplicates It fails in a non-deterministic number of iterations: [ 126.7088900] sorry, pid 20803 was killed: orphaned traced process failed: /usr/src/tests/lib/libc/sys/t_ptrace_topology_wait.h:191: msg_read_child("tracer ready" " from parent " "parent_tracer", &parent_tracer, &msg, sizeof(msg)) == 0: Undefined error: 0 With this patch it is easier to reproduce the race: Index: t_ptrace_topology_wait.h === RCS file: /cvsroot/src/tests/lib/libc/sys/t_ptrace_topology_wait.h,v retrieving revision 1.1 diff -u -r1.1 t_ptrace_topology_wait.h --- t_ptrace_topology_wait.h5 May 2020 00:33:37 - 1.1 +++ t_ptrace_topology_wait.h6 May 2020 10:32:37 - @@ -248,7 +248,7 @@ ATF_TC(tracer_sysctl_lookup_without_duplicates); ATF_TC_HEAD(tracer_sysctl_lookup_without_duplicates, tc) { - atf_tc_set_md_var(tc, "timeout", "15"); +// atf_tc_set_md_var(tc, "timeout", "15"); atf_tc_set_md_var(tc, "descr", "Assert that await_zombie() in attach1 always finds a single " "process and no other error is reported"); @@ -269,11 +269,13 @@ start = time(NULL); while (true) { DPRINTF("Step: %lu\n", N); + if (N % 100 == 0) + printf("Step: %lu\n", N); tracer_sees_terminaton_before_the_parent_raw(true, false, false); end = time(NULL); diff = difftime(end, start); - if (diff >= 5.0) + if (diff >= 30.0) break; ++N; } Can you have a look? On 26.04.2019 19:20, Michael van Elst wrote: > Module Name: src > Committed By: mlelstv > Date: Fri Apr 26 17:20:49 UTC 2019 > > Modified Files: > src/sys/kern: sys_pipe.c > > Log Message: > Clean up pipe structure before recycling it. > > > To generate a diff of this commit: > cvs rdiff -u -r1.146 -r1.147 src/sys/kern/sys_pipe.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > > > Modified files: > > Index: src/sys/kern/sys_pipe.c > diff -u src/sys/kern/sys_pipe.c:1.146 src/sys/kern/sys_pipe.c:1.147 > --- src/sys/kern/sys_pipe.c:1.146 Sun Jun 10 17:54:51 2018 > +++ src/sys/kern/sys_pipe.c Fri Apr 26 17:20:49 2019 > @@ -1,4 +1,4 @@ > -/* $NetBSD: sys_pipe.c,v 1.146 2018/06/10 17:54:51 jdolecek Exp $ */ > +/* $NetBSD: sys_pipe.c,v 1.147 2019/04/26 17:20:49 mlelstv Exp $ */ > > /*- > * Copyright (c) 2003, 2007, 2008, 2009 The NetBSD Foundation, Inc. > @@ -68,7 +68,7 @@ > */ > > #include > -__KERNEL_RCSID(0, "$NetBSD: sys_pipe.c,v 1.146 2018/06/10 17:54:51 jdolecek > Exp $"); > +__KERNEL_RCSID(0, "$NetBSD: sys_pipe.c,v 1.147 2019/04/26 17:20:49 mlelstv > Exp $"); > > #include > #include > @@ -1331,6 +1331,8 @@ pipeclose(struct pipe *pipe) > free_resources: > pipe->pipe_pgid = 0; > pipe->pipe_state = PIPE_SIGNALR; > + pipe->pipe_peer = NULL; > + pipe->pipe_lock = NULL; > pipe_free_kmem(pipe); > if (pipe->pipe_kmem != 0) { > pool_cache_put(pipe_rd_cache, pipe); > signature.asc Description: OpenPGP digital signature
Re: [sctp fix] Re: CVS commit: src/sys/kern
On Fri, May 01, 2020 at 04:46:36PM +, m...@netbsd.org wrote: > We can setup an equivalence: put as much effort into the SCTP removal > proposal as there was for the SCTP introduction proposal. > > Since SCTP was just dropped in src without any prior discussion, I don't > think we need any discussion for removing it. I'm sorry for picking on SCTP in particular. Apparently it was added because it was listed in src/doc/roadmaps.networking (and it's still listed there).
Re: [sctp fix] Re: CVS commit: src/sys/kern
We can setup an equivalence: put as much effort into the SCTP removal proposal as there was for the SCTP introduction proposal. Since SCTP was just dropped in src without any prior discussion, I don't think we need any discussion for removing it.
Re: [sctp fix] Re: CVS commit: src/sys/kern
On 20-04-26 18:15, Maxime Villard wrote: | - There was no demonstrated use-case justifying importing it. In addition, |major OSes like Windows and macOS do not implement SCTP. There just is no |demand for SCTP on the market; and on NetBSD, proportionally even less. SCTP is used in mobile telco environments; the control plane for 3G and 4G networks uses SCTP (or TCP as an option, but mostly SCTP). NetBSD is a server OS; and could be viable in that market, which currently is mostly Linux (AFAICT), which does have a viable SCTP stack. It's understandable why macOS (a client OS) doesn't support SCTP. Besides the rest of your arguments (which may be valid), this particular one is not. regards, Luke.
Re: CVS commit: src/sys/kern
> On Apr 26, 2020, at 2:04 PM, Michael van Elst wrote: > > Log Message: > fix DIAGNOSTIC build non-DIAGNOSTIC you mean? -- thorpej
[sctp fix] Re: CVS commit: src/sys/kern
Le 26/04/2020 à 16:21, Jonathan A. Kollasch a écrit : > Module Name: src > Committed By: jakllsch > Date: Sun Apr 26 14:21:14 UTC 2020 > > Modified Files: > src/sys/kern: uipc_socket.c > > Log Message: > Implement SCTP bug fixes found by maxv@. > > Adding these seems to improve the SCTP situation. Yeah, thanks... I remember I had sent an email about these bugs, no one cared, so I just put big XXXs and left the broken thing as-is. That was more than a year ago. I remember also pointing out at some point the severe deficiencies of the SCTP code we have, with no one caring once again (not even me actually). In its current state (that is, the state it has been in ever since it was imported five years ago), our SCTP code is a near-perfect example of gigantic, buggy and useless bloat. Specifically, as far as I remember: - Last I checked, SCTP occupies, in number of lines of code, half of our IPv4 network stack. In other words, when it was imported, our kernel netinet stack suddenly doubled in size. I don't see how we will ever MP-ify all of that. - There was no demonstrated use-case justifying importing it. In addition, major OSes like Windows and macOS do not implement SCTP. There just is no demand for SCTP on the market; and on NetBSD, proportionally even less. - The code is of remarkably poor quality, with bugs in all directions, complex pieces of logic that seem rarely justified, and implementation errors like the pointer bugs you just fixed. The bloat and bugs are already evident as early as in the first twenty lines of code of the sctp_input() entry point. - IIRC the mbuf API usage is very poor (though I don't remember the specifics here; I must have kept paper notes somewhere). - It seems that no one is maintaining it? Nothing has improved over the last five years, and there are no apparent signs that this situation will ever change. There are piecemeal changes like yours and mine, to accommodate API changes and fix obvious bugs, and that's about it, as far as I can tell. At one point I hesitated about doing a big cleanup of SCTP. But I concluded that it is too structurally broken, and that fixing it is a waste of time; rewriting it would be faster and would result in a better, more functional, and easier to maintain code. Overall I think this is an example of bad policy. It's good to have support for new protocols, but at some point, we need to question whether landing 40K lines of buggy yet critical kernel code out of nowhere with no one maintaining it and little justification for the feature is a good thing to do. CC'ing core@ in case they are interested in making a useful statement for once. In all cases, this code is disabled by default, which is a good thing given its state, but results in us not giving a damn about it. Maxime
Re: CVS commit: src/sys/kern
On Mon, Apr 13, 2020 at 04:34:48PM +0100, Roy Marples wrote: > On 13/04/2020 16:31, Andrew Doran wrote: > > Hi Roy. > > > > On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote: > > > On 10/04/2020 23:34, Andrew Doran wrote: > > > > Module Name:src > > > > Committed By: ad > > > > Date: Fri Apr 10 22:34:36 UTC 2020 > > > > > > > > Modified Files: > > > > src/sys/kern: vfs_mount.c > > > > > > > > Log Message: > > > > vfs_mountroot(): don't needlessly grab a second reference to the root > > > > vnode > > > > (the kernel never chdirs) nor a lock on it. > > > > > > So the kernel chrooting to sysctl init.root is still ok? > > > > How is that accomplished? I couldn't find the code and don't recall seeing > > it. > > sysctl -w init.root=/altroot > > See the ZFS Root ramdisk: > https://nxr.netbsd.org/xref/src/distrib/common/zfsroot.rc#33 > > I tested kernel from yesterdays sources and it still works fine, so I guess > nothing was needed here. I had a look and it's init that chroots, not the kernel, so no issue. The kernel chrooting would be evil. Andrew
Re: CVS commit: src/sys/kern
On 13/04/2020 16:31, Andrew Doran wrote: Hi Roy. On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote: On 10/04/2020 23:34, Andrew Doran wrote: Module Name:src Committed By: ad Date: Fri Apr 10 22:34:36 UTC 2020 Modified Files: src/sys/kern: vfs_mount.c Log Message: vfs_mountroot(): don't needlessly grab a second reference to the root vnode (the kernel never chdirs) nor a lock on it. So the kernel chrooting to sysctl init.root is still ok? How is that accomplished? I couldn't find the code and don't recall seeing it. sysctl -w init.root=/altroot See the ZFS Root ramdisk: https://nxr.netbsd.org/xref/src/distrib/common/zfsroot.rc#33 I tested kernel from yesterdays sources and it still works fine, so I guess nothing was needed here. Roy
Re: CVS commit: src/sys/kern
Hi Roy. On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote: > On 10/04/2020 23:34, Andrew Doran wrote: > > Module Name:src > > Committed By: ad > > Date: Fri Apr 10 22:34:36 UTC 2020 > > > > Modified Files: > > src/sys/kern: vfs_mount.c > > > > Log Message: > > vfs_mountroot(): don't needlessly grab a second reference to the root vnode > > (the kernel never chdirs) nor a lock on it. > > So the kernel chrooting to sysctl init.root is still ok? How is that accomplished? I couldn't find the code and don't recall seeing it. Andrew
Re: CVS commit: src/sys/kern
On 10/04/2020 23:34, Andrew Doran wrote: Module Name:src Committed By: ad Date: Fri Apr 10 22:34:36 UTC 2020 Modified Files: src/sys/kern: vfs_mount.c Log Message: vfs_mountroot(): don't needlessly grab a second reference to the root vnode (the kernel never chdirs) nor a lock on it. So the kernel chrooting to sysctl init.root is still ok? Roy
Re: [vfs_cache] Re: CVS commit: src/sys/kern
On Sun, Mar 29, 2020 at 11:41:23AM +0200, Maxime Villard wrote: > Le 23/03/2020 ? 21:02, Andrew Doran a ?crit?: > > Module Name:src > > Committed By: ad > > Date: Mon Mar 23 20:02:14 UTC 2020 > > > > Modified Files: > > src/sys/kern: vfs_cache.c > > > > Log Message: > > cache_remove(): remove from the vnode list first, so cache_revlookup() > > doesn't try to re-activate an entry no longer on the LRU list. > > > > > > To generate a diff of this commit: > > cvs rdiff -u -r1.133 -r1.134 src/sys/kern/vfs_cache.c > > > > Please note that diffs are not public domain; they are subject to the > > copyright notices on the relevant files. > > It appears that your recent changes in vfs_cache.c have introduced a > use-after-free. Booting KASAN gives: > > ASan: Unauthorized Access In 0x...: Addr 0x... [1 byte, read, > PoolUseAfterFree] > > It seems that the problem is here: > > 557 if (nameiop == CREATE && (cnflags & ISLASTCN) != 0) { > 558 /* > 559* Last component and we are preparing to create > 560* the named object, so flush the negative cache > 561* entry. > 562*/ > 563 COUNT(ncs_badhits); > 564 cache_remove(ncp, true);< HERE > 565 hit = false; > 566 } else { > 567 COUNT(ncs_neghits); > 568 SDT_PROBE(vfs, namecache, lookup, hit, dvp, name, > 569 namelen, 0, 0); > 570 /* found neg entry; vn is already null from above */ > 571 hit = true; > 572 } > 573 if (iswht_ret != NULL) { > 574 /* > 575* Restore the ISWHITEOUT flag saved earlier. > 576*/ > 577 *iswht_ret = ncp->nc_whiteout; <-- ouch > 578 } else { > 579 KASSERT(!ncp->nc_whiteout); <-- ouch > 580 } > > cache_remove() frees 'ncp', and then 'ncp->nc_whiteout' is read. Fixed with vfs_cache.c 1.136. Thank you for bringing it to my attention. Andrew
[vfs_cache] Re: CVS commit: src/sys/kern
Le 23/03/2020 à 21:02, Andrew Doran a écrit : > Module Name: src > Committed By: ad > Date: Mon Mar 23 20:02:14 UTC 2020 > > Modified Files: > src/sys/kern: vfs_cache.c > > Log Message: > cache_remove(): remove from the vnode list first, so cache_revlookup() > doesn't try to re-activate an entry no longer on the LRU list. > > > To generate a diff of this commit: > cvs rdiff -u -r1.133 -r1.134 src/sys/kern/vfs_cache.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. It appears that your recent changes in vfs_cache.c have introduced a use-after-free. Booting KASAN gives: ASan: Unauthorized Access In 0x...: Addr 0x... [1 byte, read, PoolUseAfterFree] It seems that the problem is here: 557 if (nameiop == CREATE && (cnflags & ISLASTCN) != 0) { 558 /* 559 * Last component and we are preparing to create 560 * the named object, so flush the negative cache 561 * entry. 562 */ 563 COUNT(ncs_badhits); 564 cache_remove(ncp, true);< HERE 565 hit = false; 566 } else { 567 COUNT(ncs_neghits); 568 SDT_PROBE(vfs, namecache, lookup, hit, dvp, name, 569 namelen, 0, 0); 570 /* found neg entry; vn is already null from above */ 571 hit = true; 572 } 573 if (iswht_ret != NULL) { 574 /* 575 * Restore the ISWHITEOUT flag saved earlier. 576 */ 577 *iswht_ret = ncp->nc_whiteout; <-- ouch 578 } else { 579 KASSERT(!ncp->nc_whiteout); <-- ouch 580 } cache_remove() frees 'ncp', and then 'ncp->nc_whiteout' is read. Maxime
Re: CVS commit: src/sys/kern
Le 08/03/2020 à 21:41, Andrew Doran a écrit : > On Sun, Mar 08, 2020 at 08:34:29AM +0100, Maxime Villard wrote: >> Le 08/03/2020 ? 01:31, Andrew Doran a ?crit?: >>> Module Name:src >>> Committed By: ad >>> Date: Sun Mar 8 00:31:19 UTC 2020 >>> >>> Modified Files: >>> src/sys/kern: subr_kmem.c >>> >>> Log Message: >>> KMEM_SIZE: append the size_t to the allocated buffer, rather than >>> prepending, so it doesn't screw up the alignment of the buffer. >>> >>> Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com >>> >>> >>> To generate a diff of this commit: >>> cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c >>> >>> Please note that diffs are not public domain; they are subject to the >>> copyright notices on the relevant files. >> >> This is wrong. The point of KMEM_SIZE is to store at a reliable location >> the size of the buffer, in order to then verify that the 'size' argument >> given to kmem_free() is correct. >> >> Here, you are using that size argument to compute the location, which >> breaks the whole point of the check. > > Sure I understand the frustration, but it's still correct according to > the parameters I set for it 10+ years ago, which were for it to be a > quick-n-dirty diagnostic aid. No, it is not. KMEM_SIZE has found real bugs. Now this useful feature has been lessened just to shut a sanitizer which was pointing another issue. Again, as said previously, the real bug here is that you are making a caller assume specific alignment from an allocator that _does not_ guarantee this alignment. To fix that, either (1) revert the assumption or (2) make it part of the allocator's contract to guarantee this alignment. I don't have a preference on (1) or (2), but the current design is more a hack than anything else, and the subsequent change in KMEM_SIZE is definitely wrong. >> Also it probably collides with the KASAN shadow. > > Hmm, is that purely an alignment issue then, because it works in 8 byte > cells? Otherwise it sounds like this should not be enabled with KASAN at > all. Not sure what you mean, but KASAN definitely needs to be enabled on kmem. I've checked the code, and actually it is fine regarding KASAN for now, because the computation of the redzone doesn't take KMEM_SIZE into account (and so isn't affected by the fact that the location changed).
Re: CVS commit: src/sys/kern
On Sun, Mar 08, 2020 at 08:34:29AM +0100, Maxime Villard wrote: > Le 08/03/2020 ? 01:31, Andrew Doran a ?crit?: > > Module Name:src > > Committed By: ad > > Date: Sun Mar 8 00:31:19 UTC 2020 > > > > Modified Files: > > src/sys/kern: subr_kmem.c > > > > Log Message: > > KMEM_SIZE: append the size_t to the allocated buffer, rather than > > prepending, so it doesn't screw up the alignment of the buffer. > > > > Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com > > > > > > To generate a diff of this commit: > > cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c > > > > Please note that diffs are not public domain; they are subject to the > > copyright notices on the relevant files. > > This is wrong. The point of KMEM_SIZE is to store at a reliable location > the size of the buffer, in order to then verify that the 'size' argument > given to kmem_free() is correct. > > Here, you are using that size argument to compute the location, which > breaks the whole point of the check. Sure I understand the frustration, but it's still correct according to the parameters I set for it 10+ years ago, which were for it to be a quick-n-dirty diagnostic aid. > Also it probably collides with the KASAN shadow. Hmm, is that purely an alignment issue then, because it works in 8 byte cells? Otherwise it sounds like this should not be enabled with KASAN at all. Andrew > Please revert this change. > > As said previously, if cacheline alignment is expected this way, then it > should clearly be part of the kmem contract, and documented to be so.
Re: CVS commit: src/sys/kern
Le 08/03/2020 à 01:31, Andrew Doran a écrit : > Module Name: src > Committed By: ad > Date: Sun Mar 8 00:31:19 UTC 2020 > > Modified Files: > src/sys/kern: subr_kmem.c > > Log Message: > KMEM_SIZE: append the size_t to the allocated buffer, rather than > prepending, so it doesn't screw up the alignment of the buffer. > > Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com > > > To generate a diff of this commit: > cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. This is wrong. The point of KMEM_SIZE is to store at a reliable location the size of the buffer, in order to then verify that the 'size' argument given to kmem_free() is correct. Here, you are using that size argument to compute the location, which breaks the whole point of the check. Also it probably collides with the KASAN shadow. Please revert this change. As said previously, if cacheline alignment is expected this way, then it should clearly be part of the kmem contract, and documented to be so. Thanks
Re: CVS commit: src/sys/kern
On Tue, Mar 03, 2020 at 02:55:16PM -0500, Christos Zoulas wrote: > Module Name: src > Committed By: christos > Date: Tue Mar 3 19:55:16 UTC 2020 > > Modified Files: > src/sys/kern: vfs_syscalls.c > > Log Message: > don't skip the rdir check for the lazy case; breaks chroot df(1) hiding. That's likely my mistake. Thank you. Andrew
Re: CVS commit: src/sys/kern
> Date: Tue, 3 Mar 2020 12:41:32 +0900 > From: Rin Okuyama > > On 2020/03/03 1:00, Taylor R Campbell wrote: > > Include kern_crashme.c in non-DEBUG kernels. > > > > This is useful for simulating crashes in production to test failover. > > I like this. > > Could you please send pullup request to netbsd-9, or can I? Go for it!
Re: CVS commit: src/sys/kern
Hi, On 2020/03/03 1:00, Taylor R Campbell wrote: Module Name:src Committed By: riastradh Date: Mon Mar 2 16:00:54 UTC 2020 Modified Files: src/sys/kern: files.kern Log Message: Include kern_crashme.c in non-DEBUG kernels. This is useful for simulating crashes in production to test failover. To generate a diff of this commit: cvs rdiff -u -r1.43 -r1.44 src/sys/kern/files.kern Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. I like this. Could you please send pullup request to netbsd-9, or can I? Thanks, rin
Re: CVS commit: src/sys/kern
On 24.02.2020 21:47, Jaromir Dolecek wrote: > Module Name: src > Committed By: jdolecek > Date: Mon Feb 24 20:47:47 UTC 2020 > > Modified Files: > src/sys/kern: init_main.c > > Log Message: > move config_init_mi() call before vfsinit(), which can trigger loading > of VFS modules > > fixes crash with LOCKDEBUG due to uninitialized mutex when zfs > module is loaded in boot, because zfs's spa_init() calls config_mountroot() > which now requires the config init having been done > > > To generate a diff of this commit: > cvs rdiff -u -r1.521 -r1.522 src/sys/kern/init_main.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > > kASan is still broken on boot. Please fix. [ 1.0188350] acpicpu1 at cpu1: ACPI CPU [ 1.0188350] cpu0 has 2 core siblings: cpu1 cpu0 [ 1.0188350] cpu0 has 2 pkg siblings: cpu1 cpu0 [ 1.0188350] cpu0 has 1 1st siblings: cpu0 [ 1.0188350] cpu0 first in package: cpu0 [ 1.0188350] cpu1 has 2 core siblings: cpu0 cpu1 [ 1.0188350] cpu1 has 2 pkg siblings: cpu0 cpu1 [ 1.0188350] cpu1 has 1 1st siblings: cpu0 [ 1.0188350] cpu1 first in package: cpu0 [ 1.2307575] panic: ASan: Unauthorized Access In 0x811e6be6: Addr 0x98000f382b58 [8 bytes, read, PoolUseAfterFree] [ 1.2401020] cpu1: Begin traceback... [ 1.2501232] vpanic() at netbsd:vpanic+0x241 [ 1.2701652] snprintf() at netbsd:snprintf [ 1.2902064] kasan_report() at netbsd:kasan_report+0x98 [ 1.3102484] __asan_load8() at netbsd:__asan_load8+0x294 [ 1.3302897] config_interrupts_thread() at netbsd:config_interrupts_thread+0x68 [ 1.3403126] cpu1: End traceback... [ 1.3403126] fatal breakpoint trap in supervisor mode [ 1.3503263] trap type 1 code 0 rip 0x8021e4b5 cs 0x8 rflags 0x246 cr2 0 ilevel 0 rsp 0x98017de07d60 [ 1.3603479] curlwp 0x9800116a16c0 pid 0.30 lowest kstack 0x98017de002c0 Stopped in pid 0.30 (system) at netbsd:breakpoint+0x5: leave db{1}> https://syzkaller.appspot.com/bug?id=aa6e0c00233b3e55340da80d7636bb2c18181e5f signature.asc Description: OpenPGP digital signature
re: CVS commit: src/sys/kern
"Andrew Doran" writes: > Module Name: src > Committed By: ad > Date: Sun Feb 23 20:08:35 UTC 2020 > > Modified Files: > src/sys/kern: kern_pmf.c > > Log Message: > shutdown_all: take kernel_lock now that kern_reboot() doesn't. ah, i am now guessing that having the kernel lock held is why reboots hang and can't be broken into again. we really need to have autoconf run out side of a spin lock... anyone want to work on this please? .mrg.
re: CVS commit: src/sys/kern
"Andrew Doran" writes: > Module Name: src > Committed By: ad > Date: Sun Feb 23 20:06:30 UTC 2020 > > Modified Files: > src/sys/kern: kern_reboot.c > > Log Message: > - If concurrent calls to kern_reboot(), only let the first do the deed. > - Don't need kernel_lock for this (either OK, or suspendsched() called). what happens if i enter ddb while rebooting, such that this now sets 'rebooter' to some lwp. inside ddb, i use 'mach cpu' to change CPUs, which on some platforms actually swithces to that CPU -- ie, we now have a different curlwp. this while () will now hang forever, won't it? .mrg.
Re: CVS commit: src/sys/kern
On 20.02.2020 22:14, Jaromir Dolecek wrote: > Module Name: src > Committed By: jdolecek > Date: Thu Feb 20 21:14:23 UTC 2020 > > Modified Files: > src/sys/kern: subr_autoconf.c > > Log Message: > protect deferred lists' manipulation by config_misc_lock, same as > config_pending semaphore itself; right now this also covers > DVF_ATTACH_INPROGRESS flag > > > To generate a diff of this commit: > cvs rdiff -u -r1.265 -r1.266 src/sys/kern/subr_autoconf.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > After this commit kASan breaks on boot: [ 1.2418653] panic: ASan: Unauthorized Access In 0x811e0c46: Addr 0xa7000f382a58 [8 bytes, read, PoolUseAfterFree] [ 1.2511909] cpu1: Begin traceback... [ 1.2612093] vpanic() at netbsd:vpanic+0x241 sys/kern/subr_prf.c:336 [ 1.2812516] snprintf() at netbsd:snprintf [ 1.3012883] kasan_report() at netbsd:kasan_report+0x98 kasan_code_name sys/kern/subr_asan.c:186 [inline] [ 1.3012883] kasan_report() at netbsd:kasan_report+0x98 sys/kern/subr_asan.c:196 [ 1.3213274] __asan_load8() at netbsd:__asan_load8+0x294 kasan_shadow_4byte_isvalid sys/kern/subr_asan.c:346 [inline] [ 1.3213274] __asan_load8() at netbsd:__asan_load8+0x294 kasan_shadow_8byte_isvalid sys/kern/subr_asan.c:360 [inline] [ 1.3213274] __asan_load8() at netbsd:__asan_load8+0x294 kasan_shadow_check sys/kern/subr_asan.c:412 [inline] [ 1.3213274] __asan_load8() at netbsd:__asan_load8+0x294 sys/kern/subr_asan.c:1182 [ 1.3413734] config_interrupts_thread() at netbsd:config_interrupts_thread+0x68 sys/kern/subr_autoconf.c:459 [ 1.3513931] cpu1: End traceback... [ 1.3513931] fatal breakpoint trap in supervisor mode [ 1.3614094] trap type 1 code 0 rip 0x8021e4b5 cs 0x8 rflags 0x246 cr2 0 ilevel 0 rsp 0xa7017de07d60 [ 1.3714294] curlwp 0xa700116a16c0 pid 0.30 lowest kstack 0xa7017de002c0 Stopped in pid 0.30 (system) at netbsd:breakpoint+0x5: leave db{1}> https://syzkaller.appspot.com/bug?extid=1f0aefb06a387371fa14 signature.asc Description: OpenPGP digital signature
Re: CVS commit: src/sys/kern
> On Feb 17, 2020, at 11:19 PM, Andrew Doran wrote: > > Hi, > > On Thu, Feb 06, 2020 at 06:33:55PM +0100, J. Hannken-Illjes wrote: > >>> On 12. Jan 2020, at 18:49, Andrew Doran wrote: >>> >>> Module Name:src >>> Committed By: ad >>> Date: Sun Jan 12 17:49:17 UTC 2020 >>> >>> Modified Files: >>> src/sys/kern: vfs_vnode.c >>> >>> Log Message: >>> vput(): don't drop the vnode lock, carry the hold over into vrelel() which >>> might need it anyway. >>> >>> >>> To generate a diff of this commit: >>> cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c >>> >>> Please note that diffs are not public domain; they are subject to the >>> copyright notices on the relevant files. >>> >> >> vput(vnode_t *vp) >> { >> + int lktype; >> >> - VOP_UNLOCK(vp); >> - vrele(vp); >> + if ((vp->v_vflag & VV_LOCKSWORK) == 0) { >> + lktype = LK_EXCLUSIVE; >> + } else { >> + lktype = VOP_ISLOCKED(vp); >> + KASSERT(lktype != LK_NONE); >> + } >> + mutex_enter(vp->v_interlock); >> + vrelel(vp, 0, lktype); >> } >> >> This is quite wrong, from the manual: >> >> VOP_ISLOCKED(vp) >> Test if the vnode vp is locked. Possible return values are >> LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the >> calling thread, shared lock held by anyone or unlocked, >> respectively. >> >> This function must never be used to make locking decisions at >> run time: it is provided only for diagnostic purposes. >> >> I suppose you cannot determine if the current thread holds >> a shared lock. > > The intention of that last sentence was to dissuade people from doing error > prone, complicated stuff with locks. To my mind that idea of the locking > primitives is to take something difficult (concurrency) and package it up in > a way that's much easier to think about and work with - and yes it's still > complicated. > > There are limited cases when I think it makes sense to infer lock ownership, > for example when known for sure that a RW lock is held but the type of hold > needs to be known - like this case. The failure case there is the lock > being unheld, which would be caught by LOCKDEBUG in this case. Consider > that a rw_exit() with the lock unheld by the caller will produce what you > might charitably call "undefined behaviour" in a non-LOCKDEBUG kernel. > > Andrew Yes, I think it is safe here but it deserves a comment like the text above. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany) signature.asc Description: Message signed with OpenPGP
Re: CVS commit: src/sys/kern
Hi, On Thu, Feb 06, 2020 at 06:33:55PM +0100, J. Hannken-Illjes wrote: > > On 12. Jan 2020, at 18:49, Andrew Doran wrote: > > > > Module Name:src > > Committed By: ad > > Date: Sun Jan 12 17:49:17 UTC 2020 > > > > Modified Files: > > src/sys/kern: vfs_vnode.c > > > > Log Message: > > vput(): don't drop the vnode lock, carry the hold over into vrelel() which > > might need it anyway. > > > > > > To generate a diff of this commit: > > cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c > > > > Please note that diffs are not public domain; they are subject to the > > copyright notices on the relevant files. > > > > vput(vnode_t *vp) > { > + int lktype; > > - VOP_UNLOCK(vp); > - vrele(vp); > + if ((vp->v_vflag & VV_LOCKSWORK) == 0) { > + lktype = LK_EXCLUSIVE; > + } else { > + lktype = VOP_ISLOCKED(vp); > + KASSERT(lktype != LK_NONE); > + } > + mutex_enter(vp->v_interlock); > + vrelel(vp, 0, lktype); > } > > This is quite wrong, from the manual: > > VOP_ISLOCKED(vp) > Test if the vnode vp is locked. Possible return values are > LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the > calling thread, shared lock held by anyone or unlocked, > respectively. > > This function must never be used to make locking decisions at > run time: it is provided only for diagnostic purposes. > > I suppose you cannot determine if the current thread holds > a shared lock. The intention of that last sentence was to dissuade people from doing error prone, complicated stuff with locks. To my mind that idea of the locking primitives is to take something difficult (concurrency) and package it up in a way that's much easier to think about and work with - and yes it's still complicated. There are limited cases when I think it makes sense to infer lock ownership, for example when known for sure that a RW lock is held but the type of hold needs to be known - like this case. The failure case there is the lock being unheld, which would be caught by LOCKDEBUG in this case. Consider that a rw_exit() with the lock unheld by the caller will produce what you might charitably call "undefined behaviour" in a non-LOCKDEBUG kernel. Andrew
Re: CVS commit: src/sys/kern
> On 12. Jan 2020, at 18:49, Andrew Doran wrote: > > Module Name: src > Committed By: ad > Date: Sun Jan 12 17:49:17 UTC 2020 > > Modified Files: > src/sys/kern: vfs_vnode.c > > Log Message: > vput(): don't drop the vnode lock, carry the hold over into vrelel() which > might need it anyway. > > > To generate a diff of this commit: > cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c > > Please note that diffs are not public domain; they are subject to the > copyright notices on the relevant files. > vput(vnode_t *vp) { + int lktype; - VOP_UNLOCK(vp); - vrele(vp); + if ((vp->v_vflag & VV_LOCKSWORK) == 0) { + lktype = LK_EXCLUSIVE; + } else { + lktype = VOP_ISLOCKED(vp); + KASSERT(lktype != LK_NONE); + } + mutex_enter(vp->v_interlock); + vrelel(vp, 0, lktype); } This is quite wrong, from the manual: VOP_ISLOCKED(vp) Test if the vnode vp is locked. Possible return values are LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the calling thread, shared lock held by anyone or unlocked, respectively. This function must never be used to make locking decisions at run time: it is provided only for diagnostic purposes. I suppose you cannot determine if the current thread holds a shared lock. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig signature.asc Description: Message signed with OpenPGP
Re: CVS commit: src/sys/kern
"Christos Zoulas" wrote: > Log Message: > > Don't crash if we are on a hippie trail, head full of zombie +1 for any Australian references in a commit message :) Cheers, Simon.
re: CVS commit: src/sys/kern
> Therefore, I wouldn't bother adding kcov.h headers, and rolling back to the > previous version of in_interrupt for now is fine, considering that kcov > currently has no use outside of amd64. if you want to put code in sys/kern please make it portable. adding more MD code in MI places is the opposite stance we have been working towards since netbsd formed. .mrg.
Re: CVS commit: src/sys/kern
Le 08/12/2019 à 14:22, Martin Husemann a écrit : On Sun, Dec 08, 2019 at 12:58:20PM +0100, Maxime Villard wrote: kMSan has special constraints which, in this specific case, come down to: each function called from a KCOV instrumentation callback must be a static inline tagged with __nomsan. This was not the case with the updated in_interrupt(), but also still isn't the case with the lwp_getspecific() call, which will have to be dropped. This does not sound like a good reason to introduce MD code in sys/kern to me. Could should not be made worse to deal with sanitizer restrictions. Are there any alternatives? The clean way of doing this is having an MD kcov.h included from subr_kcov.c, like kASan. However, I expect that we will want a per-lwp kcov flag to indicate the current context (in exception, in interrupt, in nmi, etc), and when that happens, I don't expect that we will need in_interrupt anymore. Therefore, I wouldn't bother adding kcov.h headers, and rolling back to the previous version of in_interrupt for now is fine, considering that kcov currently has no use outside of amd64.
re: CVS commit: src/sys/kern
> > > Log Message: > > > Expunge the panicstr checks, we don't need them. > > > > can you explain why? > > Sure, [ .. ] ah, wow. i didn't realise it had such a bad effect on cpus before they actually go properly down. we should probably work hard to make them go down faster if possible, though maybe that's already as good as it can be. > > what sort of crash-time testing did you perform? > > That the system can be debugged and reset cleanly. If we've got code in DDB > that hangs up or crashdups don't work then that's something we should fix. > I've not run into any in a long time, they seem to get fixed. > > Do you have a particular concern or something else in mind? i was also curious about crash dumps working, but mostly this change is about reducing true lossage during a crash. thanks. .mrg.
Re: CVS commit: src/sys/kern
On Wed, Dec 11, 2019 at 09:06:33AM +1100, matthew green wrote: > "Andrew Doran" writes: > > Module Name:src > > Committed By: ad > > Date: Mon Dec 9 21:02:10 UTC 2019 > > > > Modified Files: > > src/sys/kern: kern_rwlock.c > > > > Log Message: > > Expunge the panicstr checks, we don't need them. > > can you explain why? Sure, I have developed a bit of a feel for it after years off watching the thing panic. The checks to not go too hard on the assertions once panicstr are set are pretty good in my experience - we don't want that to snowball. The ones that disable locking were of some kind of use (at least to me) back in 2007 before we had a decent LOCKDEBUG setup for the newlock2 primitives. So it sprang out of development requirements and an uncertainty about all how this new synchronization stuff was going to behave in practice more than a desire to do the right thing. On a uniprocessor or dual core box back then the panicstr checks didn't really seem to have many bad effects, but with more CPUs it often seems to make the crash much worse than it needs to be and I keep bumping into the effects of it. Here's a craptacular example: http://www.netbsd.org/~ad/ That's kind of amusing to look at and it's only my framebuffer memory but I worry that it could just as well be inodes or mbufs or anything else that belongs to the user, and I think that until the CPUs are locked up and activity stopped, the thing needs to keep working as properly as it can. > what sort of crash-time testing did you perform? That the system can be debugged and reset cleanly. If we've got code in DDB that hangs up or crashdups don't work then that's something we should fix. I've not run into any in a long time, they seem to get fixed. Do you have a particular concern or something else in mind? Cheers, Andrew
re: CVS commit: src/sys/kern
"Andrew Doran" writes: > Module Name: src > Committed By: ad > Date: Mon Dec 9 21:02:10 UTC 2019 > > Modified Files: > src/sys/kern: kern_rwlock.c > > Log Message: > Expunge the panicstr checks, we don't need them. can you explain why? what sort of crash-time testing did you perform? thanks. .mrg.