subject:"\"Re\\\: CVS commit\\\: src\\\/sys\\\/kern\""

Re: CVS commit: src/sys/kern

2024-07-04 Thread Taylor R Campbell

> Module Name:src
> Committed By:   mrg
> Date:   Thu Jul  4 05:59:05 UTC 2024
> 
> Modified Files:
> src/sys/kern: vfs_syscalls.c
> 
> Log Message:
> don't fd_putfile() if you haven't grabbed a ref already.
> 
> the condition to call fd_getvnode() was changed, but the condition
> to call fd_putfile() afterwards was not changed, leading to a panic
> seen by Chavdar on current-users, probably.
> 
> builds, runs, seems obvious.

The automatic testbed is failing to run tests to completion now:

https://releng.netbsd.org/b5reports/i386/commits-2024.07.html#2024.07.04.05.59.05

Our sloppy process for fixing the vfs_syscalls.c issue is obviously
failing now, after days of flailing around with band-aids.  I propose
to back out all of the recent changes:

https://mail-index.netbsd.org/source-changes/2024/06/29/msg152020.html
https://mail-index.netbsd.org/source-changes/2024/07/01/msg152081.html
https://mail-index.netbsd.org/source-changes/2024/07/01/msg152082.html
https://mail-index.netbsd.org/source-changes/2024/07/01/msg152083.html
https://mail-index.netbsd.org/source-changes/2024/07/04/msg152216.html

And then redo them with:

1. A PR that explains the problem, with references, and can track the
   changes in case we need to pull them up.

2. Posting the patch for public review first.

3. Adding automatic tests that exercise all the relevant cases and
   xfail, but that would pass with the fix.

4. _Then_ committing the fix.

Re: CVS commit: src/sys/kern

2023-10-13 Thread Taylor R Campbell

> Date: Fri, 13 Oct 2023 17:52:07 +0900
> From: Rin Okuyama 
> 
> It would be really nice if we can find some systematical/reliable methods to
> figure out files that really depends on struct syncobj, e.g.. I tried
> ctfdump(1) to
> *.o for kernel modules, but I cannot extract information better than
> ``grep syncobj.h .depend''...

The attached patch removes all use of sys/syncobj.h outside .c files
under sys, so we can be reasonably confident userland programs --
except for crash(8), which is kind of special -- are unaffected.

However, I'm going to hold off on committing this until the tree's
sleepq issues are fixed so our testbed can run again.
>From 7e9e2af19ecc6f4262b928da8a97a49d171c8072 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell 
Date: Fri, 13 Oct 2023 11:04:20 +
Subject: [PATCH] sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include  in .c files where it is now needed.
---
 sys/ddb/db_syncobj.h  |  2 +-
 sys/kern/kern_condvar.c   |  1 +
 sys/kern/kern_ktrace.c|  1 +
 sys/kern/kern_lwp.c   |  1 +
 sys/kern/kern_mutex.c |  1 +
 sys/kern/kern_rwlock.c|  1 +
 sys/kern/kern_sleepq.c|  1 +
 sys/kern/kern_turnstile.c |  1 +
 sys/kern/sys_lwp.c|  1 +
 sys/kern/sys_select.c |  1 +
 sys/sys/lwp.h |  2 +-
 sys/sys/sleepq.h  |  2 +-
 sys/sys/sleeptab.h|  6 +-
 sys/sys/syncobj.h |  4 ++--
 sys/sys/wchan.h   | 37 +
 15 files changed, 56 insertions(+), 6 deletions(-)
 create mode 100644 sys/sys/wchan.h

diff --git a/sys/ddb/db_syncobj.h b/sys/ddb/db_syncobj.h
index 2c2ad89ba177..dc7594f5163e 100644
--- a/sys/ddb/db_syncobj.h
+++ b/sys/ddb/db_syncobj.h
@@ -29,7 +29,7 @@
 #ifndef_DDB_DB_SYNCOBJ_H
 #define_DDB_DB_SYNCOBJ_H
 
-#include 
+#include 
 
 struct lwp;
 struct syncobj;
diff --git a/sys/kern/kern_condvar.c b/sys/kern/kern_condvar.c
index 478c4a35ff2b..c25282e1beb3 100644
--- a/sys/kern/kern_condvar.c
+++ b/sys/kern/kern_condvar.c
@@ -45,6 +45,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_condvar.c,v 1.59 2023/10/12 
23:51:05 ad Exp $")
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Accessors for the private contents of the kcondvar_t data type.
diff --git a/sys/kern/kern_ktrace.c b/sys/kern/kern_ktrace.c
index 5ad5272af7d8..812be0c2c9ca 100644
--- a/sys/kern/kern_ktrace.c
+++ b/sys/kern/kern_ktrace.c
@@ -77,6 +77,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_ktrace.c,v 1.182 2022/07/01 
01:07:56 riastradh
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/sys/kern/kern_lwp.c b/sys/kern/kern_lwp.c
index 77e43242f0f9..971e0180f1f6 100644
--- a/sys/kern/kern_lwp.c
+++ b/sys/kern/kern_lwp.c
@@ -253,6 +253,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_lwp.c,v 1.265 2023/10/05 
19:41:06 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
index 810ea121a0bd..640909bc533e 100644
--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -57,6 +57,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_mutex.c,v 1.110 2023/09/23 
18:48:04 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
diff --git a/sys/kern/kern_rwlock.c b/sys/kern/kern_rwlock.c
index 88db7d507b4d..96312874a069 100644
--- a/sys/kern/kern_rwlock.c
+++ b/sys/kern/kern_rwlock.c
@@ -62,6 +62,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_rwlock.c,v 1.74 2023/10/04 
20:39:35 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
diff --git a/sys/kern/kern_sleepq.c b/sys/kern/kern_sleepq.c
index e9d39445f75b..bb43e78f6f6b 100644
--- a/sys/kern/kern_sleepq.c
+++ b/sys/kern/kern_sleepq.c
@@ -49,6 +49,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_sleepq.c,v 1.83 2023/10/08 
13:37:26 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 /*
  * for sleepq_abort:
diff --git a/sys/kern/kern_turnstile.c b/sys/kern/kern_turnstile.c
index 0cd8886cb6b5..85bdf946c325 100644
--- a/sys/kern/kern_turnstile.c
+++ b/sys/kern/kern_turnstile.c
@@ -68,6 +68,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_turnstile.c,v 1.53 
2023/10/08 13:23:05 ad Exp $
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
diff --git a/sys/kern/sys_lwp.c b/sys/kern/sys_lwp.c
index c71cf1e823d6..108d40641e38 100644
--- a/sys/kern/sys_lwp.c
+++ b/sys/kern/sys_lwp.c
@@ -51,6 +51,7 @@ __KERNEL_RCSID(0, "$NetBSD: sys_lwp.c,v 1.87 2023/10/08 
13:23:05 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
diff --git a/sys/kern/sys_select.c b/sys/kern/sys_select.c
index 9719d220e319..16962505663c 100644
--- a/sys/kern/sys_select.c
+++ b/sys/kern/sys_select.c
@@ -106,6 +106,7 @@ __KERNEL_RCSID(0, "$NetBSD: sys_select.c,v 1.64 2023/10/08 
13:23:05 ad Exp $");
 #include 
 #include 
 #include 
+#include 
 
 /* Flags for lwp::l_selflag. */
 #defineSEL_RESET   0   /* awoken, interrupted, or not yet

Re: CVS commit: src/sys/kern

2023-10-13 Thread Rin Okuyama

On Thu, Oct 12, 2023 at 8:23 PM Taylor R Campbell
 wrote:
>
> > Date: Thu, 12 Oct 2023 17:06:02 +0900
> > From: Rin Okuyama 
> >
> > On Thu, Oct 5, 2023 at 5:39â€¯AM Andrew Doran  wrote:
> > >
> > > Module Name:src
> > > Committed By:   ad
> > > Date:   Wed Oct  4 20:39:35 UTC 2023
> > >
> > > Modified Files:
> > > src/sys/kern: kern_rwlock.c kern_turnstile.c
> > >
> > > Log Message:
> > > Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's
> > > more informative than "tstile".
> >
> > Cool! Can I send a pull up request to netbsd-10?
>
> Not sure -- it would depend on this commit to introduce struct
> syncobj::sobj_name:
>
> https://mail-index.netbsd.org/source-changes/2023/07/17/msg146058.html
>
> This is a potential kernel ABI change.  I didn't investigate to
> determine whether it would be safe to pull up.

Thanks, I didn't notice that. It should be too risky to pull these up
just before RC1.
I withdraw this proposal.

PS
It would be really nice if we can find some systematical/reliable methods to
figure out files that really depends on struct syncobj, e.g.. I tried
ctfdump(1) to
*.o for kernel modules, but I cannot extract information better than
``grep syncobj.h .depend''...

Thanks,
rin

Re: CVS commit: src/sys/kern

2023-10-12 Thread Taylor R Campbell

> Date: Thu, 12 Oct 2023 17:06:02 +0900
> From: Rin Okuyama 
> 
> On Thu, Oct 5, 2023 at 5:39â¯AM Andrew Doran  wrote:
> >
> > Module Name:src
> > Committed By:   ad
> > Date:   Wed Oct  4 20:39:35 UTC 2023
> >
> > Modified Files:
> > src/sys/kern: kern_rwlock.c kern_turnstile.c
> >
> > Log Message:
> > Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's
> > more informative than "tstile".
> 
> Cool! Can I send a pull up request to netbsd-10?

Not sure -- it would depend on this commit to introduce struct
syncobj::sobj_name:

https://mail-index.netbsd.org/source-changes/2023/07/17/msg146058.html

This is a potential kernel ABI change.  I didn't investigate to
determine whether it would be safe to pull up.

Re: CVS commit: src/sys/kern

2023-10-12 Thread Rin Okuyama

Cool! Can I send a pull up request to netbsd-10?

I've not yet observed deadlocks since this was committed,
fortunately or unfortunately, although ;)

Thanks,
rin

On Thu, Oct 5, 2023 at 5:39 AM Andrew Doran  wrote:
>
> Module Name:src
> Committed By:   ad
> Date:   Wed Oct  4 20:39:35 UTC 2023
>
> Modified Files:
> src/sys/kern: kern_rwlock.c kern_turnstile.c
>
> Log Message:
> Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's
> more informative than "tstile".
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.73 -r1.74 src/sys/kern/kern_rwlock.c
> cvs rdiff -u -r1.51 -r1.52 src/sys/kern/kern_turnstile.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>

Re: CVS commit: src/sys/kern

2023-07-28 Thread Taylor R Campbell

> Date: Fri, 28 Jul 2023 23:40:40 +0900
> From: Izumi Tsutsui 
> 
> > Module Name:src
> > Committed By:   riastradh
> > Date:   Fri Jul 28 10:37:28 UTC 2023
> > 
> > Modified Files:
> > src/sys/kern: kern_tc.c
> > 
> > Log Message:
> > timecounter(9): Link to phk's timecounter paper for reference.
> 
> Maybe it's better to refer our timecounter(9) man page?
> https://man.netbsd.org/timecounter.9

That's fine to add, but the man page is about the interface, not about
the algorithm.  The paper is a critical reference for obscure details
of the algorithm that are not obvious from the interface.

The man page should perhaps also mention more of the API contract,
like the relation between tc_frequency, tc_counter_mask, and hz.

> (that refers http://phk.freebsd.dk/pubs/timecounter.pdf though)

No opinion on which URL to use as long as it goes to the right paper
(and preferably https).

Re: CVS commit: src/sys/kern

2023-07-28 Thread Izumi Tsutsui

> Module Name:  src
> Committed By: riastradh
> Date: Fri Jul 28 10:37:28 UTC 2023
> 
> Modified Files:
>   src/sys/kern: kern_tc.c
> 
> Log Message:
> timecounter(9): Link to phk's timecounter paper for reference.

Maybe it's better to refer our timecounter(9) man page?
https://man.netbsd.org/timecounter.9
(that refers http://phk.freebsd.dk/pubs/timecounter.pdf though)

---
Izumi Tsutsui

Re: CVS commit: src/sys/kern

2022-11-15 Thread Michael

Hello,

On Tue, 15 Nov 2022 10:29:56 +
"Michael Lorenz"  wrote:

> Module Name:  src
> Committed By: macallan
> Date: Tue Nov 15 10:29:56 UTC 2022
> 
> Modified Files:
>   src/sys/kern: subr_pserialize.c
> 
> Log Message:
> don't KASSERT(kpreempt_disabled()) while cold
> pserialize_read_*() can be called *very* early in kernel startup these days

... with this clang-built kernels work again on macppc

have fun
Michael

Re: CVS commit: src/sys/kern

2022-10-04 Thread Robert Elz

Date:Tue, 04 Oct 2022 10:09:35 -0400
From:Christos Zoulas 
Message-ID:  <8dd220d16861eb3a890461bdf02d1...@zoulas.com>

  | I always forget the O_CLOEXEC is special
  | in that regard. I wish it was not, but it is difficult to fix.

POSIX is adding O_CLOFORK in the next version (no guarantees I remembered
the symbol name spelling correctly here) which will have essentially the
same (open time) semantics (similar long term sematics as well, just
applied at a different time).

I assume we will need to add that at some point or other.

  | The question is how to find the vnode?

Not really, I assume that part will be fairly easy (probably trivial),
I just didn't have the energy to go work it out when sending that mail.
We have the file descriptor, and I suspect the file* (need to check to
make sure the right one is immediately available, but we can get it from
the fd if not), we know it refers to a vnode (it came from vn_open()),
so getting the vnode* from the file* is not something difficult, I think.

  | Perhaps it is easiest to fail the open call if O_EXLOCK or
  | O_SHLOCK are specified in a cloning open?

That would be an option, and is better than just ignoring them, but
better still would be to make them work.

Since open_setfp() does nothing (much) when none of the relevant O_xxx
flags that it tests are set (the fd open flags, as distinct from the
fp ones), and the code calls VOP_UNLOCK(vp) after it, we know that vp
is intended to be locked when open_setfp() is called (further confirmed
as when any of the O_??LOCK flags is set, open_setfp() does a VOP_UNLOCK()
and later a vn_lock() (which I am guessing is the inverse).

Maybe all that's needed is a vn_lock() call (on the vp that we still need
to fetch) and then call open_setfp() ?   But this is all beyond what I
know enough about to be sure, particularly to avoid doing anything which
might deadlock, etc.

kre

ps: if this gets done properly, then special case code to handle O_NONBLOCK
(and O_NOSIGPIPE, ...) in cloning device drivers won't be needed either,
open_setfp() is where all of that is normally added to the file* for the
fd being returned, it was not "simply happening" because that call is
missing in the cloned device case.

Re: CVS commit: src/sys/kern

2022-10-04 Thread Christos Zoulas


On 2022-10-01 3:39 pm, Robert Elz wrote:

Date:Sat, 1 Oct 2022 13:00:04 -0400

[stuff deleted]


Even when it is called, the code is:

fp->f_flag = flags & FMASK;

where FMASK is (from fcntl.h)

#define FMASK   (FREAD|FWRITE|FCNTLFLAGS|FEXEC)

and

#define FCNTLFLAGS  
(FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FDSYNC|FRSYNC|FALTIO|\

 FDIRECT|FNOSIGPIPE)

which all looks exactly as it should be to me - and note that O_CLOEXEC
(which has no F equivalent name - it doesn't need one) is not 
there.


So, fp->f_flag isn't set at all (is probably 0), and even if it were
O_CLOEXEC would not be there, and should not be.


Thanks for pointing that out, I always forget the O_CLOEXEC is special
in that regard. I wish it was not, but it is difficult to fix.



For the vnode actually being opened, none of this matters, as the open
lasts as long (actually not as long) as the open() sys call - when that
returns, the device being opened has been closed again, so what the
file that refers to it looks like (or would have looked like) really
doesn't matter at all, it never becomes visible.   That's my guess as
to why the open_setfp() call is missing in that case.

But what got forgotten (or deliberately was not done) was anything to
affect the modes of the clone which is opened.

  | What does it mean when the open specifies O_CLOEXEC
  | and ff->ff_exclose is false? Can that happen? Is that desirable?

It is what is currently happening whenever we open a cloning device.
(Or that is what it looks like to me).   Desirable, no idea, I didn't
define the semantics of cloning device opens, nor which of the open
flags should apply to the clone that is created.

  | I am fine with the locking to stay where it is. I guess it is 
probably

  | not needed after dup/clone, since the underlying vnode is shared...

Assuming you mean dup(2) (and dup2()) and clone(2) then no - those have
no way to pass the relevant locking flags, if you have a fd and want to
apply a lock, fcntl() is what does that (and the fcntl operations that
duplicate fds do not also apply locks).

The only issue is O_EXLOCK and O_SHLOCK (and O_CLOEXEC) for cloned 
devices.


I suspect it makes sense for O_CLOEXEC to be applied in that case, it
makes little sense for open("/dev/ptmx", O_RDWR|O_CLOEXEC) to succeed,
returning an open fd which does not have cloexec set (which is the 
issue,

along with O_NONBLOCK) which started all of this discussion.

The locking flags I am less sure about.   I don't see how they can fail
to succeed if applied, as the vnode for the device has just been 
created,
nothing else can possibly have any kind of lock on it.   Whether 
there's

any benefit in applying the lock so that the node is locked for later,
I don't know - but it certainly should do no harm to do that.

It seems clear to me that what we need is (something like)

Index: vfs_syscalls.c
===
RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.555
diff -u -r1.555 vfs_syscalls.c
--- vfs_syscalls.c  12 Feb 2022 15:51:29 -  1.555
+++ vfs_syscalls.c  1 Oct 2022 19:27:15 -
@@ -1763,6 +1763,9 @@
error = fd_dupopen(dupfd, dupfd_move, flags, &indx);
if (error)
return error;
+   error = open_setfp(l, fp, XXXvp, indx, flags);
+   if (error)
+   return error;
*fd = indx;
} else {
error = open_setfp(l, fp, vp, indx, flags);


where XXXvp needs to be extracted from somewhere (it isn't vp, as 
vp==NULL)

except that what follows in the else case is ...

if (error)
return error;
VOP_UNLOCK(vp);
*fd = indx;


That VOP_UNLOCK(vp) is what is bothering me,   It tells me that 
open_setfp()
is expecting to be called with vp locked - but in the first case (the 
cloning
case) there is no VOP_UNLOCK() call (and what's more, fd_dupopen() 
cannot
do it, as it has no vp arg).   That means, I believe, that when 
vn_open()
returns in the normal case, vp is returned locked, but in the cloning 
case

the vnode that was created for the clone is not locked.

I'm not sure what is the right way to find the vnode, or how it should
properly be locked so open_setfp() can do its thing.   If I knew all of
that I would have made an attempt at fixing this already.   We need
someone who really understands what is happening here, and the right
way to handle it all (which very likely is nothing like I just 
suggested).


The question is how to find the vnode? Perhaps it is easiest to fail the
open call if O_EXLOCK or O_SHLOCK are specified in a cloning open? At 
least

we will not silently ignore them?

Best,

christos

Re: CVS commit: src/sys/kern

2022-10-01 Thread Robert Elz

Date:Sat, 1 Oct 2022 13:00:04 -0400
From:Christos Zoulas 
Message-ID:  <8bd3a408-77fd-402c-8d6c-ad4b4a5e8...@zoulas.com>

  | This is what I meant:
  |
  | RCS file: /cvsroot/src/sys/kern/kern_descrip.c,v
  | retrieving revision 1.251
  | diff -u -u -r1.251 kern_descrip.c
  | --- kern_descrip.c  29 Jun 2021 22:40:53 -  1.251
  | +++ kern_descrip.c  1 Oct 2022 16:56:44 -
  | @@ -1162,6 +1162,7 @@
  | KASSERT(ff->ff_allocated);
  | KASSERT(fd_isused(fdp, fd));
  | KASSERT(fd >= NDFDFILE || ff == (fdfile_t *)fdp->fd_dfdfile[fd]);
  | +   ff->ff_exclose = (fp->f_flag & O_CLOEXEC) != 0;
  |
  | /* No need to lock in order to make file initially visible. */
  | ff->ff_file = fp;

That's not going to work in the situation we're in, as nothing will have
set O_CLOEXEC in fp->f_flag (or shouldn't have anyway, right, O_CLOEXEC isn't
that kind of a flag, it belongs to the descriptor, not the file*).

f_flag is set in vfs_syscalls.c in open_setfp() - which is never called
in the cloning device case, that's the underlying problem I believe.

Even when it is called, the code is:

fp->f_flag = flags & FMASK;

where FMASK is (from fcntl.h)

#define FMASK   (FREAD|FWRITE|FCNTLFLAGS|FEXEC)

and

#define FCNTLFLAGS  (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FDSYNC|FRSYNC|FALTIO|\
 FDIRECT|FNOSIGPIPE)

which all looks exactly as it should be to me - and note that O_CLOEXEC
(which has no F equivalent name - it doesn't need one) is not there.

So, fp->f_flag isn't set at all (is probably 0), and even if it were
O_CLOEXEC would not be there, and should not be.

For the vnode actually being opened, none of this matters, as the open
lasts as long (actually not as long) as the open() sys call - when that
returns, the device being opened has been closed again, so what the
file that refers to it looks like (or would have looked like) really
doesn't matter at all, it never becomes visible.   That's my guess as
to why the open_setfp() call is missing in that case.

But what got forgotten (or deliberately was not done) was anything to
affect the modes of the clone which is opened.

  | What does it mean when the open specifies O_CLOEXEC
  | and ff->ff_exclose is false? Can that happen? Is that desirable?

It is what is currently happening whenever we open a cloning device.
(Or that is what it looks like to me).   Desirable, no idea, I didn't
define the semantics of cloning device opens, nor which of the open
flags should apply to the clone that is created.

  | I am fine with the locking to stay where it is. I guess it is probably
  | not needed after dup/clone, since the underlying vnode is shared...

Assuming you mean dup(2) (and dup2()) and clone(2) then no - those have
no way to pass the relevant locking flags, if you have a fd and want to
apply a lock, fcntl() is what does that (and the fcntl operations that
duplicate fds do not also apply locks).

The only issue is O_EXLOCK and O_SHLOCK (and O_CLOEXEC) for cloned devices.

I suspect it makes sense for O_CLOEXEC to be applied in that case, it
makes little sense for open("/dev/ptmx", O_RDWR|O_CLOEXEC) to succeed,
returning an open fd which does not have cloexec set (which is the issue,
along with O_NONBLOCK) which started all of this discussion.

The locking flags I am less sure about.   I don't see how they can fail
to succeed if applied, as the vnode for the device has just been created,
nothing else can possibly have any kind of lock on it.   Whether there's
any benefit in applying the lock so that the node is locked for later,
I don't know - but it certainly should do no harm to do that.

It seems clear to me that what we need is (something like)

Index: vfs_syscalls.c
===
RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.555
diff -u -r1.555 vfs_syscalls.c
--- vfs_syscalls.c  12 Feb 2022 15:51:29 -  1.555
+++ vfs_syscalls.c  1 Oct 2022 19:27:15 -
@@ -1763,6 +1763,9 @@
error = fd_dupopen(dupfd, dupfd_move, flags, &indx);
if (error)
return error;
+   error = open_setfp(l, fp, XXXvp, indx, flags);
+   if (error)
+   return error;
*fd = indx;
} else {
error = open_setfp(l, fp, vp, indx, flags);

where XXXvp needs to be extracted from somewhere (it isn't vp, as vp==NULL)
except that what follows in the else case is ...

if (error)
return error;
VOP_UNLOCK(vp);
*fd = indx;

That VOP_UNLOCK(vp) is what is bothering me,   It tells me that open_setfp()
is expecting to be called with vp locked - but in the first case (the cloning
case) there is no VOP_UNLOCK() call (and what's more, fd_dupopen() cannot
do it, as it has no vp arg).   That means, I

Re: CVS commit: src/sys/kern

2022-10-01 Thread Christos Zoulas



> On Sep 30, 2022, at 11:02 PM, Robert Elz  wrote:
> 
>Date:Fri, 30 Sep 2022 20:15:07 -0400
>From:Christos Zoulas 
>Message-ID:  
> 
>  | It does not need an extra flag (it looks in the file descriptor flags to
>  | find if it needs to set or not.
> 
> One of us is confused.   From where in this case does anything
> get the exclose flag set?   That's the whole question here.  The
> flags arg that is passed around has O_CLOEXEC set in it - you used
> that in the call to fd_set_exclose() in kern/tty_ptm.c ... but where
> you said that would be better done in fd_affix().

This is what I meant:

RCS file: /cvsroot/src/sys/kern/kern_descrip.c,v
retrieving revision 1.251
diff -u -u -r1.251 kern_descrip.c
--- kern_descrip.c  29 Jun 2021 22:40:53 -  1.251
+++ kern_descrip.c  1 Oct 2022 16:56:44 -
@@ -1162,6 +1162,7 @@
KASSERT(ff->ff_allocated);
KASSERT(fd_isused(fdp, fd));
KASSERT(fd >= NDFDFILE || ff == (fdfile_t *)fdp->fd_dfdfile[fd]);
+   ff->ff_exclose = (fp->f_flag & O_CLOEXEC) != 0;

/* No need to lock in order to make file initially visible. */
ff->ff_file = fp;

> 
> That does not have access to the flags.   So from where is it going
> to get the close on exec info ?
> 
> My reading of do_open() is that the O_CLOEXEC flag is never even
> examined when a cloning device is opened, it doesn't get set on
> the original fd (the cloner) or the cloned device (other than by
> your recent modification for /dev/pmx).
> 
> Did I misread the code?
> 
> Or are you planning something different than it seemed?
> 
>  | to find other cases where we forgot to call fd_set_exclose() before calling
>  | fd_affix().
> 
> My point is that it should not be necessary to call fd_set_exclose()
> in every (or any) cloning device driver.  The open syscall handling
> is where that should be done, just as it is for all the opens that
> are not cloning devices.

What does it mean when the open specifies O_CLOEXEC
and ff->ff_exclose is false? Can that happen? Is that desirable?

> Why be different?
> 
>  | It also does not need locking because the process can't access
>  | the descriptor before calling fd_affix.
> 
> The locking I was referring to are the vnode locks/references in
> do_open(), not anything related to the file struct or descriptor.
> I just do not feel competent to get all of that correct in this
> case (more complex than the normal case because of the extra vnode
> involved) and would prefer if someone familiar with all of that
> were to handle it - particularly in the extra error case that will
> need to be handled, even if I cannot see how it would actually fire
> in the case in question.

I am fine with the locking to stay where it is. I guess it is probably
not needed after dup/clone, since the underlying vnode is shared...

christos


signature.asc
Description: Message signed with OpenPGP

Re: CVS commit: src/sys/kern

2022-09-30 Thread Robert Elz

Date:Fri, 30 Sep 2022 20:15:07 -0400
From:Christos Zoulas 
Message-ID:  

  | It does not need an extra flag (it looks in the file descriptor flags to
  | find if it needs to set or not.

One of us is confused.   From where in this case does anything
get the exclose flag set?   That's the whole question here.  The
flags arg that is passed around has O_CLOEXEC set in it - you used
that in the call to fd_set_exclose() in kern/tty_ptm.c ... but where
you said that would be better done in fd_affix().

That does not have access to the flags.   So from where is it going
to get the close on exec info ?

My reading of do_open() is that the O_CLOEXEC flag is never even
examined when a cloning device is opened, it doesn't get set on
the original fd (the cloner) or the cloned device (other than by
your recent modification for /dev/pmx).

Did I misread the code?

Or are you planning something different than it seemed?

  | to find other cases where we forgot to call fd_set_exclose() before calling
  | fd_affix().

My point is that it should not be necessary to call fd_set_exclose()
in every (or any) cloning device driver.  The open syscall handling
is where that should be done, just as it is for all the opens that
are not cloning devices.

Why be different?

  | It also does not need locking because the process can't access
  | the descriptor before calling fd_affix.

The locking I was referring to are the vnode locks/references in
do_open(), not anything related to the file struct or descriptor.
I just do not feel competent to get all of that correct in this
case (more complex than the normal case because of the extra vnode
involved) and would prefer if someone familiar with all of that
were to handle it - particularly in the extra error case that will
need to be handled, even if I cannot see how it would actually fire
in the case in question.

kre

Re: CVS commit: src/sys/kern

2022-09-30 Thread Christos Zoulas



> On Sep 30, 2022, at 5:57 PM, Robert Elz  wrote:
> 
>Date:Fri, 30 Sep 2022 16:34:20 -0400
>From:Christos Zoulas 
>Message-ID:  <232331ad-d501-4547-b730-03590c0c9...@zoulas.com>
> 
>  | How about handling exclose there?
> 
> That would be possible, but why?   We still need higher level code to
> handle the locking, which can also handle cloexec -- the problem we
> have now is simply that the relevant call is missing, I don't think adding
> it will be hard, but it needs to be done by someone who understands the
> locking requirements, and correct exit strategy in this case if an error
> occurs (failing to successfully lock a newly created clone would seem to
> be a very bizarre case, but still...)   That is, I don't feel competent to
> suggest the 3 or 4 lines that ought be added in do_open to fix this (for
> just O_CLOEXEC it would be trivial there, as that cannot fail).
> 
> Currently fd_affix (I mistakenly made it fp_affix in the last message...)
> doesn't have a flags parameter, so to do it the way you suggest, we'd need
> to alter its signature, bump to 9.99.101 (and I haven't yet gotten around
> to making my kernel be 98.99.100 which I'm kind of planning to do ...)
> and go alter all the calls everywhere, mostly just filling in an extra
> arg with a 0.

It does not need an extra flag (it looks in the file descriptor flags to
find if it needs to set or not. In fact the first thing I thought was to add
an assertion to make sure that the flags agrees with that is set in exclose,
to find other cases where we forgot to call fd_set_exclose() before calling
fd_affix(). It also does not need locking because the process can't access
the descriptor before calling fd_affix.

christos


signature.asc
Description: Message signed with OpenPGP

Re: CVS commit: src/sys/kern

2022-09-30 Thread Paul Goyette


On Sat, 1 Oct 2022, Robert Elz wrote:


Currently fd_affix (I mistakenly made it fp_affix in the last message...)
doesn't have a flags parameter, so to do it the way you suggest, we'd need
to alter its signature, bump to 9.99.101 ...


and add some COMPAT_09 goop for backward compability  :)



... (and I haven't yet gotten around
to making my kernel be 98.99.100 which I'm kind of planning to do ...)
and go alter all the calls everywhere, mostly just filling in an extra
arg with a 0.

kre


!DSPAM:63376773211686829812153!




++--+--+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses:|
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com|
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org  |
| & Network Engineer |  | pgoyett...@gmail.com |
++--+--+

Re: CVS commit: src/sys/kern

2022-09-30 Thread Robert Elz

Date:Fri, 30 Sep 2022 16:34:20 -0400
From:Christos Zoulas 
Message-ID:  <232331ad-d501-4547-b730-03590c0c9...@zoulas.com>

  | How about handling exclose there?

That would be possible, but why?   We still need higher level code to
handle the locking, which can also handle cloexec -- the problem we
have now is simply that the relevant call is missing, I don't think adding
it will be hard, but it needs to be done by someone who understands the
locking requirements, and correct exit strategy in this case if an error
occurs (failing to successfully lock a newly created clone would seem to
be a very bizarre case, but still...)   That is, I don't feel competent to
suggest the 3 or 4 lines that ought be added in do_open to fix this (for
just O_CLOEXEC it would be trivial there, as that cannot fail).

Currently fd_affix (I mistakenly made it fp_affix in the last message...)
doesn't have a flags parameter, so to do it the way you suggest, we'd need
to alter its signature, bump to 9.99.101 (and I haven't yet gotten around
to making my kernel be 98.99.100 which I'm kind of planning to do ...)
and go alter all the calls everywhere, mostly just filling in an extra
arg with a 0.

kre

Re: CVS commit: src/sys/kern

2022-09-30 Thread Christos Zoulas



> On Sep 30, 2022, at 10:13 AM, Robert Elz  wrote:
> 
>Date:Thu, 29 Sep 2022 16:47:06 - (UTC)
>From:chris...@astron.com (Christos Zoulas)
>Message-ID:  
> 
>  | I think that the way to go is to:
>  |
>  | 1. Do the fd_set_exclose() in fd_affix(). That will remove most of the 
> calls
>  |to fd_set_exclose() *and* the open-coded versions of it.
>  | 2. Move the open_setfp locking initialization code to fd_affix() and do it
>  |if fp->f_type == DTYPE_VNODE. This should enable locking in all the
>  |appropriate cloners.
> 
> I initially intended to reply and say that decisions where to put stuff
> like that were for someone else (you, dholland, ...) rather than me, as
> I haven't played around much at this level since before vnodes existed.
> 
> But I have been thinking about it, and I disagree with that approach.
> 
> fp_affix() has a job to do, and should be left to do it, without being
> burdened by applying weird side effects, sometimes.   The "do one thing
> and do it well" philosophy applies to more than the commands.
> 
> eg: currently fd_affix() is a void func, but to handle the lock flags
> it would need to be able to fail, and return an error code.  It would
> also need to be able to sleep.   That's just wrong.
> 
> O_CLOEXEC and O_??LOCK are high level open() flags, and deserve to be
> handled somewhere near the upper levels of the open syscall handling,
> not buried in some utility function.

That is the feedback that I wanted. But there were two parts to it. How about
handling exclose there? It is just making sure that the value from flags is 
propagated
to the exclose field.

christos


signature.asc
Description: Message signed with OpenPGP

Re: CVS commit: src/sys/kern

2022-09-30 Thread Robert Elz

Date:Thu, 29 Sep 2022 16:47:06 - (UTC)
From:chris...@astron.com (Christos Zoulas)
Message-ID:  

  | I think that the way to go is to:
  |
  | 1. Do the fd_set_exclose() in fd_affix(). That will remove most of the calls
  |to fd_set_exclose() *and* the open-coded versions of it.
  | 2. Move the open_setfp locking initialization code to fd_affix() and do it
  |if fp->f_type == DTYPE_VNODE. This should enable locking in all the
  |appropriate cloners.

I initially intended to reply and say that decisions where to put stuff
like that were for someone else (you, dholland, ...) rather than me, as
I haven't played around much at this level since before vnodes existed.

But I have been thinking about it, and I disagree with that approach.

fp_affix() has a job to do, and should be left to do it, without being
burdened by applying weird side effects, sometimes.   The "do one thing
and do it well" philosophy applies to more than the commands.

eg: currently fd_affix() is a void func, but to handle the lock flags
it would need to be able to fail, and return an error code.  It would
also need to be able to sleep.   That's just wrong.

O_CLOEXEC and O_??LOCK are high level open() flags, and deserve to be
handled somewhere near the upper levels of the open syscall handling,
not buried in some utility function.

kre

Re: CVS commit: src/sys/kern

2022-09-29 Thread Christos Zoulas

In article <9275.1664462...@jacaranda.noi.kre.to>,
Robert Elz   wrote:
>Date:Thu, 29 Sep 2022 08:18:28 -0400
>From:"Christos Zoulas" 
>Message-ID:  <20220929121828.06edff...@cvs.netbsd.org>
>
>  | Log Message:
>  | Add fd_set_exclose(). It is probably better to do this automatically in
>  | fd_affix()...
>
>Since that only affects /dev/ptmx I'd suggest fixing it generally for all
>cloning devices (and handling O_??LOCK as well) would be a better method.

I think that the way to go is to:

1. Do the fd_set_exclose() in fd_affix(). That will remove most of the calls
   to fd_set_exclose() *and* the open-coded versions of it.
2. Move the open_setfp locking initialization code to fd_affix() and do it
   if fp->f_type == DTYPE_VNODE. This should enable locking in all the
   appropriate cloners.

Best,

christos

Re: CVS commit: src/sys/kern

2022-09-29 Thread Robert Elz

Date:Thu, 29 Sep 2022 08:18:28 -0400
From:"Christos Zoulas" 
Message-ID:  <20220929121828.06edff...@cvs.netbsd.org>

  | Log Message:
  | Add fd_set_exclose(). It is probably better to do this automatically in
  | fd_affix()...

Since that only affects /dev/ptmx I'd suggest fixing it generally for all
cloning devices (and handling O_??LOCK as well) would be a better method.

kre

Re: CVS commit: src/sys/kern

2022-08-07 Thread Taylor R Campbell

Reverted and alternative proposed on tech-kern!  Sorry for the
unilateral toe-stomping.

Re: CVS commit: src/sys/kern

2022-08-07 Thread Paul Goyette


On Sun, 7 Aug 2022, Paul Goyette wrote:


On Sun, 7 Aug 2022, Taylor R Campbell wrote:


Module Name:src
Committed By:   riastradh
Date:   Sun Aug  7 21:17:18 UTC 2022

Modified Files:
src/sys/kern: kern_module.c

Log Message:
module(9): Disable module autounload by default.

I don't know why this was ever enabled by default; many modules are
still not safe to unload, let alone autounload.  If any autounload is
to happen by default, it should only be for modules that have opted
into it in some way after audit.


One reason for the current behavior involves the modules used for
various emulations.  When a file is executed, and none of the
currently-loaded modules can "deal" with it, we load _all_ of the
available emulation modules with the hope that one of them will
"deal with" the new executable, and with the expectation that the
remaining emulation modules will just "go away".

Modules that are known to be unsafe to unload should declare that
in their modcmd() unload (by returning EBUSY).  After all, one
might well expect that the module itself is the most likely place
that the unloadable status would be known.

Making no-autounload as the default seems like using a 20-pound
sledge hammer on a carpet tack.


I might also note that making such a fundamental behavior change
when we're so close to the -10 release, without actually providing
the suggested "opt-in" mechanism to retain current behavior, is
not very user friendly.   :-)

At least this should be discussed (on tech-kern@, perhaps) before
being unilaterally decided and committed.  IIRC, we had this
discussion some time ago, and the decision at that time was to
retain current behavior.  Unfortunately, I didn't save any pointer
to that old discussion.  :-(




++--+--+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses:|
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com|
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org  |
| & Network Engineer |  | pgoyett...@gmail.com |
++--+--+

!DSPAM:62f0321a29502088639869!




++--+--+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses:|
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com|
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org  |
| & Network Engineer |  | pgoyett...@gmail.com |
++--+--+

Re: CVS commit: src/sys/kern

2022-08-07 Thread Paul Goyette


On Sun, 7 Aug 2022, Taylor R Campbell wrote:


Module Name:src
Committed By:   riastradh
Date:   Sun Aug  7 21:17:18 UTC 2022

Modified Files:
src/sys/kern: kern_module.c

Log Message:
module(9): Disable module autounload by default.

I don't know why this was ever enabled by default; many modules are
still not safe to unload, let alone autounload.  If any autounload is
to happen by default, it should only be for modules that have opted
into it in some way after audit.


One reason for the current behavior involves the modules used for
various emulations.  When a file is executed, and none of the
currently-loaded modules can "deal" with it, we load _all_ of the
available emulation modules with the hope that one of them will
"deal with" the new executable, and with the expectation that the
remaining emulation modules will just "go away".

Modules that are known to be unsafe to unload should declare that
in their modcmd() unload (by returning EBUSY).  After all, one
might well expect that the module itself is the most likely place
that the unloadable status would be known.

Making no-autounload as the default seems like using a 20-pound
sledge hammer on a carpet tack.


++--+--+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses:|
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com|
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org  |
| & Network Engineer |  | pgoyett...@gmail.com |
++--+--+

Re: CVS commit: src/sys/kern

2022-02-11 Thread Taylor R Campbell

> Date: Fri, 11 Feb 2022 16:50:16 -0800
> From: Jason Thorpe 
> 
> > On Feb 11, 2022, at 9:53 AM, Taylor R Campbell  wrote:
> > 
> >  That is, this was presumably meant to ensure (A) happens-before (B).
> >  This relation is already guaranteed by ipi(9), so there is no need
> >  for any explicit memory barrier.
> 
> Is this documented in ipi(9)?

Is now!

Re: CVS commit: src/sys/kern

2022-02-11 Thread Jason Thorpe



> On Feb 11, 2022, at 9:53 AM, Taylor R Campbell  wrote:
> 
>  That is, this was presumably meant to ensure (A) happens-before (B).
>  This relation is already guaranteed by ipi(9), so there is no need
>  for any explicit memory barrier.

Is this documented in ipi(9)?

-- thorpej

Re: CVS commit: src/sys/kern

2022-02-05 Thread Taylor R Campbell

> Date: Sat, 5 Feb 2022 22:47:30 +0100
> From: Tobias Nygren 
> 
> On Sat, 5 Feb 2022 15:17:40 +
> Martin Husemann  wrote:
> 
> > Modified Files:
> > src/sys/kern: subr_autoconf.c
> > 
> > Log Message:
> > cfdriver_iattr_count() is only used in a KASSERT, so #ifdef DIAGNOSTIC it.
> 
> This doesn't seem to work as intended.
> In a kernel with "no options DIAGNOSTIC" I now get:
> 
> /work/src/sys/kern/subr_autoconf.c: In function 'config_search_internal':
> /work/src/sys/kern/subr_autoconf.c:1149:3: error: implicit declaration of 
> function 'cfdriver_iattr_count' [-Werror=implicit-function-declaration]

I guess we need to decorate cfdriver_iattr_count with __diagused.

(Point of the KASSERT change the other month was to eliminate most of
these `oops I forgot to try a DIAGNOSTIC / !DIAGNOSTIC build' problems
and reduce #ifdefs; thanks, clang...)

Re: CVS commit: src/sys/kern

2022-02-05 Thread Tobias Nygren

On Sat, 5 Feb 2022 15:17:40 +
Martin Husemann  wrote:

> Modified Files:
>   src/sys/kern: subr_autoconf.c
> 
> Log Message:
> cfdriver_iattr_count() is only used in a KASSERT, so #ifdef DIAGNOSTIC it.

This doesn't seem to work as intended.
In a kernel with "no options DIAGNOSTIC" I now get:

/work/src/sys/kern/subr_autoconf.c: In function 'config_search_internal':
/work/src/sys/kern/subr_autoconf.c:1149:3: error: implicit declaration of 
function 'cfdriver_iattr_count' [-Werror=implicit-function-declaration]
 1149 |   cfdriver_iattr_count(parent->dv_cfdriver) < 2);
  |   ^~~~
/work/src/sys/lib/libkern/libkern.h:279:42: note: in definition of macro 
'KASSERT'
  279 | #define KASSERT(e)  ((void)sizeof((long)(e)))

Re: CVS commit: src/sys/kern

2021-09-22 Thread Rin Okuyama


On 2021/09/22 14:42, Rin Okuyama wrote:

Module Name:src
Committed By:   rin
Date:   Wed Sep 22 05:42:19 UTC 2021

Modified Files:
src/sys/kern: kern_ksyms.c

Log Message:
ksymsmmap: Add missing uao_reference(9) call for ks->ks_uobj.

Fix failure for savecore(8) and subsequent kernel panic, introduced to
kern_ksyms.c rev 1.03, at least for sh3 and alpha.


Oops, I meant rev 1.103 here.


For sh3 and alpha, savecore(8) supports coff and ecoff, respectively, via
libkvm via nlist(3). nlist(3) routines for coff and ecoff use mmap(2) and
munmap(2) for /dev/ksyms.

This munmap(2) decrements reference count for ks->ks_uobj. Unless it is
incremented in ksymsmmap(), ks->ks_uobj will be freed unexpectedly.


To generate a diff of this commit:
cvs rdiff -u -r1.104 -r1.105 src/sys/kern/kern_ksyms.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Re: CVS commit: src/sys/kern

2021-08-17 Thread Michael van Elst

On Tue, Aug 17, 2021 at 11:39:26PM +, Taylor R Campbell wrote:
> > 
> > Log Message:
> > skip symbol tables that were unloaded again to avoid EFAULT when reading
> > ksyms.
> > 
> > also restore TAILQ_FOREACH idiom.
> 
> This change isn't quite right: Reading past st = ksyms_last_snapshot
> in ksymsread, or in any context where we rely on the snapshot without
> holding ksyms_lock, is not allowed -- it will lead to a corrupted view
> of the snapshot, and possibly end up reading uninitialized memory.
> 
> That's why this code didn't use TAILQ_FOREACH -- it must stop at
> ksyms_last_snapshot; if it gets to the end of the list, and witnesses
> st = NULL, then that's a bug.

TAILQ_FOREACH just adds another exit condition that prevents entering
the loop with st = NULL.

A problem might be to continue the loop in case ksyms_last_snapshot itself
is gone.

Something like:

if (st->sd_gone)
goto skip;
...
skip:
if (st == ksyms_last_snapshot)
break;

avoids that case.

> The code also didn't skip entries with st->sd_gone because we arrange
> for st->sd_nmap not to be freed until the last ksymsclose, at which
> point no more ksymsread can be active.

Keeping sd_nmap isn't sufficient, ksymsread failed with EFAULT as
you still derefence pointers to the unloaded module.
Skipping the unloaded module when reading the symbols avoids this.

Re: CVS commit: src/sys/kern

2021-08-17 Thread Taylor R Campbell

> Module Name:src
> Committed By:   mlelstv
> Date:   Sun Jul 18 06:57:28 UTC 2021
> 
> Modified Files:
> src/sys/kern: kern_ksyms.c
> 
> Log Message:
> skip symbol tables that were unloaded again to avoid EFAULT when reading
> ksyms.
> 
> also restore TAILQ_FOREACH idiom.

This change isn't quite right: Reading past st = ksyms_last_snapshot
in ksymsread, or in any context where we rely on the snapshot without
holding ksyms_lock, is not allowed -- it will lead to a corrupted view
of the snapshot, and possibly end up reading uninitialized memory.

That's why this code didn't use TAILQ_FOREACH -- it must stop at
ksyms_last_snapshot; if it gets to the end of the list, and witnesses
st = NULL, then that's a bug.

The code also didn't skip entries with st->sd_gone because we arrange
for st->sd_nmap not to be freed until the last ksymsclose, at which
point no more ksymsread can be active.  If you start skipping some
entries, that will cause assertions to fire about mismatched symbol
table size which we computed up front in ksymsopen.

I made a mistake in some previous changes: Although we defer freeing
st->sd_nmap while /dev/ksyms is open, we do not defer freeing
st->sd_strstart or st->sd_symstart.  And it is possible for a
concurrent module _unload_ to issue kobj_unload in the middle of
concurrent ksymsread and free st->sd_strstart or st->sd_symstart while
ksymsread is reading from them.  (My last round of changes was done to
fix bugs found during module _load_.)  The commit message doesn't say,
but I assume your change was meant to fix that bug.

Unfortunately, the change doesn't completely fix it -- we could easily
find ourselves in the following situation:

CPU 0   CPU 1
-   -
read from /dev/ksyms
modunload foo.kmod
load st->sd_gone = false
kobj_unload
store st->sd_gone = true
kobj_free(ko, ko->ko_symtab);
copy from st->sd_symstart -> FAULT

In order to avoid this situation, we either need to:

1. Make ksyms_modunload block until last /dev/ksyms close.

   => Easy enough: slap a condvar on ksyms_opencnt, wait for
  ksyms_opencnt == 0 in ksyms_modunload, notify if ksyms_opencnt
  == 0 in ksymsclose.

   => Holding /dev/ksyms open indefinitely can block module unload
  indefinitely; unclear if this might lead to problematic deadlock
  scenarios.

   => Bonus: Can eliminate some of the logic in ksymsclose, since
  ksyms_modunload will handle it synchronously before return
  anyway.

2. Convert ksymsread to use psref or localcount.

   => Takes a little more effort to convert tailq to pslist -- which
  requires some care because struct ksyms_symtab is shared with
  userland.  So we still have to keep the tailq entry records (or
  add disgusting compat code).  This is why I didn't go for
  pserialize in my last round of changes.

   => Complicated because we would only get a snapshot within a single
  ksymsread call, not from open to close of /dev/ksyms; this would
  require some kind of mechanism to persuade userland tools to
  restart if the snapshot has changed from read to read.

   => (Can't use pserialize because we have to hold the snapshot
  stable across copyout to userland, which may sleep, which is not
  allowed in pserialize read sections.)

3. Find some other way to copy the tables or something while
   /dev/ksyms is open so they won't go away when kobj_unload runs, or
   just keep copies of the strtab and symtab for ksyms that persist
   unload -- might cost a lot of kernel memory, probably not worth
   pursuing.

In any case this change should otherwise be reverted, since it is no
necessary (sd_symstart/sd_strstart will no longer go away during
ksymsread) and it is also wrong (iteration past ksyms_last_snapshot is
not allowed).

I'm leaning toward option (1); other thoughts welcome.

Re: CVS commit: src/sys/kern

2021-06-30 Thread David Holland

On Thu, Jul 01, 2021 at 12:25:51AM -0400, Christos Zoulas wrote:
 > Modified Files:
 >  src/sys/kern: vfs_vnops.c
 > 
 > Log Message:
 > don't clear the error before we use it to determine if we are moving or 
 > duping.

oh ffs...

*goes to soak head*

-- 
David A. Holland
dholl...@netbsd.org

Re: CVS commit: src/sys/kern

2021-06-02 Thread Rin Okuyama


Hmm, misleading commit log (code itself is correct). More precisely:

s/the last element of ksyms_symtabs/ksyms_last_snapshot/

Anyway, does this help you to reintroduce
"ksyms(4): Don't skip symbol tables that are soon to be freed."?

That KASSERT does not fire anymore for aarch64, as far as I can see.

Thanks,
rin

On 2021/06/03 0:43, Rin Okuyama wrote:

Module Name:src
Committed By:   rin
Date:   Wed Jun  2 15:43:33 UTC 2021

Modified Files:
src/sys/kern: kern_ksyms.c

Log Message:
Fix regression introduced in rev 1.90:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_ksyms.c#rev1.90

in which the last element of ksyms_symtabs is skipped by mistake.


To generate a diff of this commit:
cvs rdiff -u -r1.93 -r1.94 src/sys/kern/kern_ksyms.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Modified files:

Index: src/sys/kern/kern_ksyms.c
diff -u src/sys/kern/kern_ksyms.c:1.93 src/sys/kern/kern_ksyms.c:1.94
--- src/sys/kern/kern_ksyms.c:1.93  Wed Jun  2 08:46:16 2021
+++ src/sys/kern/kern_ksyms.c   Wed Jun  2 15:43:33 2021
@@ -1,4 +1,4 @@
-/* $NetBSD: kern_ksyms.c,v 1.93 2021/06/02 08:46:16 riastradh Exp $
*/
+/* $NetBSD: kern_ksyms.c,v 1.94 2021/06/02 15:43:33 rin Exp $  */
  
  /*-

   * Copyright (c) 2008 The NetBSD Foundation, Inc.
@@ -73,7 +73,7 @@
   */
  
  #include 

-__KERNEL_RCSID(0, "$NetBSD: kern_ksyms.c,v 1.93 2021/06/02 08:46:16 riastradh Exp 
$");
+__KERNEL_RCSID(0, "$NetBSD: kern_ksyms.c,v 1.94 2021/06/02 15:43:33 rin Exp 
$");
  
  #if defined(_KERNEL) && defined(_KERNEL_OPT)

  #include "opt_copy_symtab.h"
@@ -1087,7 +1087,7 @@ ksymsread(dev_t dev, struct uio *uio, in
 */
filepos = sizeof(struct ksyms_hdr);
for (st = TAILQ_FIRST(&ksyms_symtabs);
-st != ksyms_last_snapshot;
+st != TAILQ_NEXT(ksyms_last_snapshot, sd_queue);
 st = TAILQ_NEXT(st, sd_queue)) {
if (__predict_false(st->sd_gone))
continue;
@@ -1109,7 +1109,7 @@ ksymsread(dev_t dev, struct uio *uio, in
KASSERT(filepos <= sizeof(struct ksyms_hdr) +
ksyms_hdr.kh_shdr[SYMTAB].sh_size);
for (st = TAILQ_FIRST(&ksyms_symtabs);
-st != ksyms_last_snapshot;
+st != TAILQ_NEXT(ksyms_last_snapshot, sd_queue);
 st = TAILQ_NEXT(st, sd_queue)) {
if (__predict_false(st->sd_gone))
continue;

Re: CVS commit: src/sys/kern

2021-06-02 Thread Taylor R Campbell

> Date: Wed, 2 Jun 2021 17:33:39 +0900
> From: Rin Okuyama 
> 
> On 2021/06/02 6:11, Taylor R Campbell wrote:
> > -   KASSERT(filepos <= sizeof(struct ksyms_hdr) +
> > +   KASSERT(filepos == sizeof(struct ksyms_hdr) +
> > ksyms_hdr.kh_shdr[SYMTAB].sh_size);
> ...
> 
> This KASSERT fires at least for arm and aarch64, with savecore or ``nm 
> /dev/ksyms'':

Oops.  I reverted that change for now, will investigate later.

Re: CVS commit: src/sys/kern

2021-06-02 Thread Rin Okuyama


Hi,

On 2021/06/02 6:11, Taylor R Campbell wrote:

Module Name:src
Committed By:   riastradh
Date:   Tue Jun  1 21:11:52 UTC 2021

Modified Files:
src/sys/kern: kern_ksyms.c

Log Message:
ksyms(4): Don't skip symbol tables that are soon to be freed.

They will not actually be freed until /dev/ksyms is closed, so
continued access to them remains kosher.


To generate a diff of this commit:
cvs rdiff -u -r1.91 -r1.92 src/sys/kern/kern_ksyms.c

...

Index: src/sys/kern/kern_ksyms.c
diff -u src/sys/kern/kern_ksyms.c:1.91 src/sys/kern/kern_ksyms.c:1.92
--- src/sys/kern/kern_ksyms.c:1.91  Tue Jun  1 21:11:07 2021
+++ src/sys/kern/kern_ksyms.c   Tue Jun  1 21:11:52 2021

...

-   KASSERT(filepos <= sizeof(struct ksyms_hdr) +
+   KASSERT(filepos == sizeof(struct ksyms_hdr) +
ksyms_hdr.kh_shdr[SYMTAB].sh_size);

...

This KASSERT fires at least for arm and aarch64, with savecore or ``nm 
/dev/ksyms'':

$ nm /dev/ksyms
[ 1501.3538912] panic: kernel diagnostic assertion "filepos == sizeof(struct ksyms_hdr) + 
ksyms_hdr.kh_shdr[SYMTAB].sh_size" failed: file "../../../../kern/kern_ksyms.c", 
line 1107
[ 1501.3701109] cpu0: Begin traceback...
[ 1501.3701109] 0xc9c2ecfc: netbsd:db_panic+0xc
[ 1501.3806387] 0xc9c2ed14: netbsd:vpanic+0xc4
[ 1501.3806387] 0xc9c2ed2c: netbsd:kern_assert+0x3c
[ 1501.3906766] 0xc9c2ed6c: netbsd:ksymsread+0x208
[ 1501.4006964] 0xc9c2ee1c: netbsd:spec_read+0xbc
[ 1501.4112302] 0xc9c2ee44: netbsd:VOP_READ+0x58
[ 1501.4112302] 0xc9c2ee6c: netbsd:vn_read+0x78
[ 1501.4215160] 0xc9c2eebc: netbsd:dofileread+0x80
[ 1501.4315439] 0xc9c2eeec: netbsd:sys_read+0x50
[ 1501.4426847] 0xc9c2efac: netbsd:syscall+0x188
[ 1501.4426847] cpu0: End traceback...
Stopped in pid 11825.11825 (nm) at  netbsd:cpu_Debugger+0x4:bx
r14
db>

Can you please take a look?

Thanks,
rin

Re: CVS commit: src/sys/kern (kern_event.c)

2021-01-22 Thread Paul Goyette


On Fri, 22 Jan 2021, Paul Goyette wrote:


On Thu, 21 Jan 2021, Paul Goyette wrote:


Ooopppsss ignore me - looks like this was already fixed and my update
missed it.

I'll retry.


OK, I built and installed a new kernel+userland.

Most everything works, and syslogd seems to work fine (at least, it
no longer panics during startup).

HOWEVER, firefox seems to be badly broken.  Attempting to open certain
pages results in never-ending-hang, and nothing ever gets rendered.  I
can use the Stop-Reloading "X" button, and the "oscillating dot" load
indicator stops oscillating, but nothing ever happens.  At that point,
the tab is hung and cannot load any other page, not even pages that
loaded successfully previously!  I _can_ delete the tab, and opening a
new tab works.

Some of the "failing" pages are:

airnow.gov
gmail.com
www.prudential.com/login
www.myaccountviewonline.com/AccountView/Logon


Slight correction:  above I said "nothing ever happens" but while I've
been composing this Email a couple of the above pages seem to have made
some progress (although none of them have finished and stopped the
"oscillating dot").  So "ever" is at least 5 minutes or longer ...  :)


I don't know if the kern_event.c changes are responsible, but I haven't
seen anything else recently.


I reverted kern_event.c to rev 1.110 and firefox behaves correctly.  So
it's pretty fair bet that the subsequent kern_event.c changes are the
reason for the breakage.

PR kern/55946 has been filed.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern (kern_event.c)

2021-01-22 Thread Paul Goyette


On Thu, 21 Jan 2021, Paul Goyette wrote:


Ooopppsss ignore me - looks like this was already fixed and my update
missed it.

I'll retry.


OK, I built and installed a new kernel+userland.

Most everything works, and syslogd seems to work fine (at least, it
no longer panics during startup).

HOWEVER, firefox seems to be badly broken.  Attempting to open certain
pages results in never-ending-hang, and nothing ever gets rendered.  I
can use the Stop-Reloading "X" button, and the "oscillating dot" load
indicator stops oscillating, but nothing ever happens.  At that point,
the tab is hung and cannot load any other page, not even pages that
loaded successfully previously!  I _can_ delete the tab, and opening a
new tab works.

Some of the "failing" pages are:

airnow.gov
gmail.com
www.prudential.com/login
www.myaccountviewonline.com/AccountView/Logon


Slight correction:  above I said "nothing ever happens" but while I've
been composing this Email a couple of the above pages seem to have made
some progress (although none of them have finished and stopped the
"oscillating dot").  So "ever" is at least 5 minutes or longer ...  :)


I don't know if the kern_event.c changes are responsible, but I haven't
seen anything else recently.

FWIW, I'm running firefox 83.0 from pkgsrc, around 2020-12-08





On Thu, 21 Jan 2021, Paul Goyette wrote:


This change seems to break everything!  As soon as I try to start
syslogd I hit the panic() that you added

[  28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 
kq->kq_count(1) != count(0), nmarker=1


[  28.0253983] cpu0: Begin traceback...
[  28.0253983] vpanic() at netbsd:vpanic+0x156
[  28.0253983] snprintf() at netbsd:snprintf
[  28.0253983] kqueue_check() at netbsd:kqueue_check+0x183
[  28.0253983] kevent1() at netbsd:kevent1+0x49f
[  28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33
[  28.0253983] syscall() at netbsd:syscall+0x23e
[  28.0253983] --- syscall (number 435) ---
[  28.0253983] netbsd:syscall+0x23e:
[  28.0253983] cpu0: End traceback...
[  28.0253983] fatal breakpoint trap in supervisor mode
[  28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 
0x202 cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50
[  28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 
0xa809281e72c0

Stopped in pid 1352.1352 (syslogd) at   netbsd:breakpoint+0x5:  leave

I have a full crash dump if you need any further info.


Module Name:src
Committed By:   jdolecek
Date:   Thu Jan 21 18:09:23 UTC 2021

Modified Files:
   src/sys/kern: kern_event.c

Log Message:
adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly
normal to have kq_count bigger than number of the linked entries
on the kqueue

PR kern/50094, problem pointed out by Chuck Silvers


To generate a diff of this commit:
cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern (kern_event.c)

2021-01-21 Thread Paul Goyette


Ooopppsss ignore me - looks like this was already fixed and my update
missed it.

I'll retry.


On Thu, 21 Jan 2021, Paul Goyette wrote:


This change seems to break everything!  As soon as I try to start
syslogd I hit the panic() that you added

[  28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 kq->kq_count(1) 
!= count(0), nmarker=1


[  28.0253983] cpu0: Begin traceback...
[  28.0253983] vpanic() at netbsd:vpanic+0x156
[  28.0253983] snprintf() at netbsd:snprintf
[  28.0253983] kqueue_check() at netbsd:kqueue_check+0x183
[  28.0253983] kevent1() at netbsd:kevent1+0x49f
[  28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33
[  28.0253983] syscall() at netbsd:syscall+0x23e
[  28.0253983] --- syscall (number 435) ---
[  28.0253983] netbsd:syscall+0x23e:
[  28.0253983] cpu0: End traceback...
[  28.0253983] fatal breakpoint trap in supervisor mode
[  28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 0x202 
cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50
[  28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 
0xa809281e72c0

Stopped in pid 1352.1352 (syslogd) at   netbsd:breakpoint+0x5:  leave

I have a full crash dump if you need any further info.


Module Name:src
Committed By:   jdolecek
Date:   Thu Jan 21 18:09:23 UTC 2021

Modified Files:
   src/sys/kern: kern_event.c

Log Message:
adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly
normal to have kq_count bigger than number of the linked entries
on the kqueue

PR kern/50094, problem pointed out by Chuck Silvers


To generate a diff of this commit:
cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern (kern_event.c)

2021-01-21 Thread Paul Goyette


This change seems to break everything!  As soon as I try to start
syslogd I hit the panic() that you added

[  28.0253983] panic: kqueue_scan,1491: kq=0xdc13890bc4c0 kq->kq_count(1) 
!= count(0), nmarker=1

[  28.0253983] cpu0: Begin traceback...
[  28.0253983] vpanic() at netbsd:vpanic+0x156
[  28.0253983] snprintf() at netbsd:snprintf
[  28.0253983] kqueue_check() at netbsd:kqueue_check+0x183
[  28.0253983] kevent1() at netbsd:kevent1+0x49f
[  28.0253983] sys___kevent50() at netbsd:sys___kevent50+0x33
[  28.0253983] syscall() at netbsd:syscall+0x23e
[  28.0253983] --- syscall (number 435) ---
[  28.0253983] netbsd:syscall+0x23e:
[  28.0253983] cpu0: End traceback...
[  28.0253983] fatal breakpoint trap in supervisor mode
[  28.0253983] trap type 1 code 0 rip 0x8021f415 cs 0x8 rflags 0x202 
cr2 0x78742459e000 ilevel 0x8 rsp 0xa809281ebb50
[  28.0253983] curlwp 0xdc138aa46540 pid 1352.1352 lowest kstack 
0xa809281e72c0
Stopped in pid 1352.1352 (syslogd) at   netbsd:breakpoint+0x5:  leave

I have a full crash dump if you need any further info.


Module Name:src
Committed By:   jdolecek
Date:   Thu Jan 21 18:09:23 UTC 2021

Modified Files:
src/sys/kern: kern_event.c

Log Message:
adjust kq_check() (enabled with DEBUG) to new reality - it's now 
perfectly

normal to have kq_count bigger than number of the linked entries
on the kqueue

PR kern/50094, problem pointed out by Chuck Silvers


To generate a diff of this commit:
cvs rdiff -u -r1.111 -r1.112 src/sys/kern/kern_event.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2021-01-21 Thread Tom Spindler (moof)

I believe it's this change that's made my vbox image panic at the drop of
a hat; while it occasionally panics before it even hits single user, it
also consistently panics when starting syslogd (even from single-user):

panic: kqueue_scan,1491: kq=0xc779aeff6dc0 kq->kq_count(1) != count(0), 
nmarker=1

and the call chain is syscall -> sys___kevent50 -> kevent1 -> kqueue_check

it also causes a panic while trying to dump more than half the time,
but I have managed to get an image or two. ("panic: atastart: channel 0
busy, xfer not possible", if you're curious.)

On Wed, Jan 20, 2021 at 09:39:09PM +, Jaromir Dolecek wrote:
> Module Name:  src
> Committed By: jdolecek
> Date: Wed Jan 20 21:39:09 UTC 2021
>
> Modified Files:
>   src/sys/kern: kern_event.c
>
> Log Message:
> fix a race in kqueue_scan() - when multiple threads check the same
> kqueue, it could happen other thread seen empty kqueue while kevent
> was being checked for re-firing and re-queued
>
> make sure to keep retrying if there are outstanding kevents even
> if no kevent is found on first pass through the queue, and only
> drop the KN_QUEUED flag and kq_count when actually completely done
> with the kevent
>
> change is inspired by the FreeBSD in-flux handling, but without
> introducing the reference counting
>
> PR kern/50094 by Christof Meerwald
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.110 -r1.111 src/sys/kern/kern_event.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>

Re: CVS commit: src/sys/kern

2020-11-05 Thread Rin Okuyama


On 2020/11/05 2:45, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


On 2020/11/04 22:52, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.


If the module is built-in (``options PTRACE'' selected in the config
file), then the module will already have been initialized.

If the module is not built-in, then a privileged user will need to
modload(8) the module.

Prior to this change, the built-in ptrace_common module was calling
the ptrace module's init/fini routine.  Quite likely ptrace_common
was built-in (due to inclusion of file-system PROCFS), so the init
was handled during init_main().  This change ensures that the ptrace
init/fini routines are called ONLY if the ptrace module itself (not
the ptrace_common) routine is built-in.

Please check to make sure that ``options PTRACE'' is included in
your kernel config.


Yes:

$ config -x netbsd.gdb | grep PTRACE
###> options    PTRACE  # Include ptrace(2) syscall
###> options    PTRACE_HOOKS    # Include ptrace hooks

The problem is that ptrace_{init,fini}() are not called from
ptrace_modcmd():

https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184

   184 static int
   185 ptrace_modcmd(modcmd_t cmd, void *arg)
   186 {
   187 int error;
   188
   189 switch (cmd) {
   190 case MODULE_CMD_INIT:
   191 error = syscall_establish(&emul_netbsd, ptrace_syscalls);
   192 break;
   193 case MODULE_CMD_FINI:
   194 error = syscall_disestablish(&emul_netbsd, ptrace_syscalls);
   195 break;
   196 default:
   197 error = ENOTTY;
   198 break;
   199 }
   200 return error;
   201 }


Yes that would be a problem.


Can you easily confirm that ktrace(2) is unusable for non-privileged
users on 9.99.75 kernel:

$ gdb echo
GNU gdb (GDB) 8.3
...
(gdb) b main
Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58.
(gdb) r
Starting program: /bin/echo
warning: Could not trace the inferior process.
Error:
warning: ptrace: Operation not permitted
terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR'
[1]   Abort trap (core dumped) gdb echo

Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to
sys_ptrace.c, IMO.


I have some prior obligations, so I won't be able to look at this
until this evening.

Thanks for the detailed analysis.


Fixed. Thanks!

rin

Re: CVS commit: src/sys/kern

2020-11-04 Thread Paul Goyette


OK, this is my mistake.  When I change the calls in the ptrace_common
modcmd, I should also have renamed the functions (including their
entries in sys/ptrace.h).  I will commit this shortly, before I leave.

Thanks for the "recipe" for reproducing the problem - I will try it 
later when I return.



On Wed, 4 Nov 2020, Rin Okuyama wrote:


On 2020/11/04 22:52, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.


If the module is built-in (``options PTRACE'' selected in the config
file), then the module will already have been initialized.

If the module is not built-in, then a privileged user will need to
modload(8) the module.

Prior to this change, the built-in ptrace_common module was calling
the ptrace module's init/fini routine.  Quite likely ptrace_common
was built-in (due to inclusion of file-system PROCFS), so the init
was handled during init_main().  This change ensures that the ptrace
init/fini routines are called ONLY if the ptrace module itself (not
the ptrace_common) routine is built-in.

Please check to make sure that ``options PTRACE'' is included in
your kernel config.


Yes:

$ config -x netbsd.gdb | grep PTRACE
###> optionsPTRACE  # Include ptrace(2) syscall
###> optionsPTRACE_HOOKS# Include ptrace hooks

The problem is that ptrace_{init,fini}() are not called from
ptrace_modcmd():

https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184

   184 static int
   185 ptrace_modcmd(modcmd_t cmd, void *arg)
   186 {
   187  int error;
   188
   189  switch (cmd) {
   190  case MODULE_CMD_INIT:
   191 		error = syscall_establish(&emul_netbsd, 
ptrace_syscalls);

   192  break;
   193  case MODULE_CMD_FINI:
   194 		error = syscall_disestablish(&emul_netbsd, 
ptrace_syscalls);

   195  break;
   196  default:
   197  error = ENOTTY;
   198  break;
   199  }
   200  return error;
   201 }

Can you easily confirm that ktrace(2) is unusable for non-privileged
users on 9.99.75 kernel:

$ gdb echo
GNU gdb (GDB) 8.3
...
(gdb) b main
Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58.
(gdb) r
Starting program: /bin/echo
warning: Could not trace the inferior process.
Error:
warning: ptrace: Operation not permitted
terminate called after throwing an instance of 
'gdb_exception_RETURN_MASK_ERROR'

[1]   Abort trap (core dumped) gdb echo

Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to
sys_ptrace.c, IMO.

Thanks,
rin

!DSPAM:5fa2b869233318156490363!




++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2020-11-04 Thread Paul Goyette


On Wed, 4 Nov 2020, Rin Okuyama wrote:


On 2020/11/04 22:52, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.


If the module is built-in (``options PTRACE'' selected in the config
file), then the module will already have been initialized.

If the module is not built-in, then a privileged user will need to
modload(8) the module.

Prior to this change, the built-in ptrace_common module was calling
the ptrace module's init/fini routine.  Quite likely ptrace_common
was built-in (due to inclusion of file-system PROCFS), so the init
was handled during init_main().  This change ensures that the ptrace
init/fini routines are called ONLY if the ptrace module itself (not
the ptrace_common) routine is built-in.

Please check to make sure that ``options PTRACE'' is included in
your kernel config.


Yes:

$ config -x netbsd.gdb | grep PTRACE
###> optionsPTRACE  # Include ptrace(2) syscall
###> optionsPTRACE_HOOKS# Include ptrace hooks

The problem is that ptrace_{init,fini}() are not called from
ptrace_modcmd():

https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184

   184 static int
   185 ptrace_modcmd(modcmd_t cmd, void *arg)
   186 {
   187  int error;
   188
   189  switch (cmd) {
   190  case MODULE_CMD_INIT:
   191 		error = syscall_establish(&emul_netbsd, 
ptrace_syscalls);

   192  break;
   193  case MODULE_CMD_FINI:
   194 		error = syscall_disestablish(&emul_netbsd, 
ptrace_syscalls);

   195  break;
   196  default:
   197  error = ENOTTY;
   198  break;
   199  }
   200  return error;
   201 }


Yes that would be a problem.


Can you easily confirm that ktrace(2) is unusable for non-privileged
users on 9.99.75 kernel:

$ gdb echo
GNU gdb (GDB) 8.3
...
(gdb) b main
Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58.
(gdb) r
Starting program: /bin/echo
warning: Could not trace the inferior process.
Error:
warning: ptrace: Operation not permitted
terminate called after throwing an instance of 
'gdb_exception_RETURN_MASK_ERROR'

[1]   Abort trap (core dumped) gdb echo

Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to
sys_ptrace.c, IMO.


I have some prior obligations, so I won't be able to look at this
until this evening.

Thanks for the detailed analysis.



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2020-11-04 Thread Rin Okuyama


On 2020/11/04 22:52, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.


If the module is built-in (``options PTRACE'' selected in the config
file), then the module will already have been initialized.

If the module is not built-in, then a privileged user will need to
modload(8) the module.

Prior to this change, the built-in ptrace_common module was calling
the ptrace module's init/fini routine.  Quite likely ptrace_common
was built-in (due to inclusion of file-system PROCFS), so the init
was handled during init_main().  This change ensures that the ptrace
init/fini routines are called ONLY if the ptrace module itself (not
the ptrace_common) routine is built-in.

Please check to make sure that ``options PTRACE'' is included in
your kernel config.


Yes:

$ config -x netbsd.gdb | grep PTRACE
###> optionsPTRACE  # Include ptrace(2) syscall
###> optionsPTRACE_HOOKS# Include ptrace hooks

The problem is that ptrace_{init,fini}() are not called from
ptrace_modcmd():

https://nxr.netbsd.org/xref/src/sys/kern/sys_ptrace.c#184

184 static int
185 ptrace_modcmd(modcmd_t cmd, void *arg)
186 {
187 int error;
188
189 switch (cmd) {
190 case MODULE_CMD_INIT:
191 error = syscall_establish(&emul_netbsd, 
ptrace_syscalls);
192 break;
193 case MODULE_CMD_FINI:
194 error = syscall_disestablish(&emul_netbsd, 
ptrace_syscalls);
195 break;
196 default:
197 error = ENOTTY;
198 break;
199 }
200 return error;
201 }

Can you easily confirm that ktrace(2) is unusable for non-privileged
users on 9.99.75 kernel:

$ gdb echo
GNU gdb (GDB) 8.3
...
(gdb) b main
Breakpoint 1 at 0x950: file /usr/src/bin/echo/echo.c, line 58.
(gdb) r
Starting program: /bin/echo
warning: Could not trace the inferior process.
Error:
warning: ptrace: Operation not permitted
terminate called after throwing an instance of 'gdb_exception_RETURN_MASK_ERROR'
[1]   Abort trap (core dumped) gdb echo

Also, ptrace_{init,fini} should be moved from sys_ptrace_common.c to
sys_ptrace.c, IMO.

Thanks,
rin

Re: CVS commit: src/sys/kern

2020-11-04 Thread Paul Goyette


On Wed, 4 Nov 2020, Rin Okuyama wrote:


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.


If the module is built-in (``options PTRACE'' selected in the config
file), then the module will already have been initialized.

If the module is not built-in, then a privileged user will need to
modload(8) the module.

Prior to this change, the built-in ptrace_common module was calling
the ptrace module's init/fini routine.  Quite likely ptrace_common
was built-in (due to inclusion of file-system PROCFS), so the init
was handled during init_main().  This change ensures that the ptrace
init/fini routines are called ONLY if the ptrace module itself (not
the ptrace_common) routine is built-in.

Please check to make sure that ``options PTRACE'' is included in
your kernel config.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2020-11-04 Thread Rin Okuyama


On 2020/11/04 22:31, Paul Goyette wrote:

On Wed, 4 Nov 2020, Rin Okuyama wrote:


Hi,

On 2020/10/26 0:55, Paul Goyette wrote:

Module Name:    src
Committed By:    pgoyette
Date:    Sun Oct 25 15:55:37 UTC 2020

Modified Files:
src/sys/kern: sys_ptrace_common.c

Log Message:
ptrace_Common is a module unto itself.  Don't use the ptrace module's
init/fini routines.


To generate a diff of this commit:
cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


This commit makes ptrace(2) unusable for non-privileged users;
ptrace_common_{init,fini}() should be called from somewhere.


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


Oops, sorry, I meant ptrace_{init,fini}(). These functions are not called
at all since this commit, which forbids ptrace(2) for non-root users.

Thanks,
rin

Re: CVS commit: src/sys/kern

2020-11-04 Thread Paul Goyette


On Wed, 4 Nov 2020, Rin Okuyama wrote:


Hi,

On 2020/10/26 0:55, Paul Goyette wrote:

Module Name:src
Committed By:   pgoyette
Date:   Sun Oct 25 15:55:37 UTC 2020

Modified Files:
src/sys/kern: sys_ptrace_common.c

Log Message:
ptrace_Common is a module unto itself.  Don't use the ptrace module's
init/fini routines.


To generate a diff of this commit:
cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


This commit makes ptrace(2) unusable for non-privileged users;
ptrace_common_{init,fini}() should be called from somewhere.


ptrace_common_{init,fini} are called from the ptrace_common module's
modcmd routine in kern/sys_ptrace_common.c.  The modcmd routine in
turn is called at module initialization time.  In the case of a
built-in module, it will be called by module_init via init_main; if
the module is loaded (or auto-loaded) module_load will call the
modcmd routine.

The module will be built-in if either ``options PTRACE'' or ``file-
system PROCFS'' is set in the kernel configuration file.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2020-11-03 Thread Rin Okuyama


Hi,

On 2020/10/26 0:55, Paul Goyette wrote:

Module Name:src
Committed By:   pgoyette
Date:   Sun Oct 25 15:55:37 UTC 2020

Modified Files:
src/sys/kern: sys_ptrace_common.c

Log Message:
ptrace_Common is a module unto itself.  Don't use the ptrace module's
init/fini routines.


To generate a diff of this commit:
cvs rdiff -u -r1.87 -r1.88 src/sys/kern/sys_ptrace_common.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


This commit makes ptrace(2) unusable for non-privileged users;
ptrace_common_{init,fini}() should be called from somewhere.

Thanks,
rin

Re: CVS commit: src/sys/kern

2020-10-19 Thread Christos Zoulas

In article <20201019144701.a6d3bf...@cvs.netbsd.org>,
Kamil Rytarowski  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  kamil
>Date:  Mon Oct 19 14:47:01 UTC 2020
>
>Modified Files:
>   src/sys/kern: sys_ptrace.c sys_ptrace_common.c
>
>Log Message:
>Remove obsolete references to 4.4BSD papers

This removes the ptrace module code. Please diff before committing!

christos

re: CVS commit: src/sys/kern

2020-10-08 Thread David H. Gutteridge


On Mon, 07 Sep 2020 at 20:47:25 +1000, matthew green wrote:

"Jason R Thorpe" writes:

Module Name:src
Committed By:   thorpej
Date:   Mon Sep  7 03:50:41 UTC 2020

Modified Files:
src/sys/kern: files.kern init_main.c

Log Message:
Add the ability to set an alternate cnmagic in the kernel config
file, e.g.:

optionsCNMAGIC="\"+\""


thanks!  i need this for my er4 that some how does do break properly..

options(4) update?


I just added an entry for this to options(4). The bare bones, anyway.

(It seems that DDB_BREAK_CHAR is only used in one place now, that being
src/sys/arch/arm/sa11x0/sa11x0_com.c. I'm not sure if another detail to
contextualize DDB_BREAK_CHAR vs. CNMAGIC would be warranted?)

Dave

re: CVS commit: src/sys/kern

2020-09-07 Thread matthew green

"Jason R Thorpe" writes:
> Module Name:  src
> Committed By: thorpej
> Date: Mon Sep  7 03:50:41 UTC 2020
> 
> Modified Files:
>   src/sys/kern: files.kern init_main.c
> 
> Log Message:
> Add the ability to set an alternate cnmagic in the kernel config
> file, e.g.:
> 
> optionsCNMAGIC="\"+\""

thanks!  i need this for my er4 that some how does do break properly..

options(4) update?

Re: CVS commit: src/sys/kern

2020-08-02 Thread Kamil Rytarowski

On 02.08.2020 17:50, Taylor R Campbell wrote:
>> Date: Sun, 2 Aug 2020 17:35:06 +0200
>> From: Kamil Rytarowski 
>>
>> On 02.08.2020 16:44, Taylor R Campbell wrote:
 Date: Sun, 2 Aug 2020 16:04:15 +0200
 From: Kamil Rytarowski 

 On 02.08.2020 15:57, Taylor R Campbell wrote:
> But it sounds like the original motivation is that it triggered
> -Wvla...which frankly strikes me as a compiler bug since there's
> obviously no actual VLA created in sizeof; as far as I can tell
> there's no semantic difference between sizeof(device_t[n]) and
> sizeof(device_t) * n.

 This is not true:
>>>
>>> Which part of what I said are you claiming is not true, and what are
>>> you illustrating with the example program below?
>>
>> Calling it a compiler bug.
>>
>> Clang behaves the same way.
>>
>> $ clang -Wvla test.c
>> test.c:6:37: warning: variable length array used [-Wvla]
>> printf("sizeof = %zu\n", sizeof(int[argc]));
>>^
>> 1 warning generated.
>>
>> Creating VLA is not needed for using it as an intermediate. In practice
>> in most/all cases it is optimized and actual VLA is not allocated.
> 
> This is not a matter of optimization in practice.  It's absolutely not
> an `intermediate' at all -- there is no VLA created, period, any more
> than sizeof(malloc(1)) causes any actual call to malloc or sizeof(*p)
> causes any dereference of a pointer.
> 
> This makes -Wvla less useful as a tool, because apparently it flags
> code that unquestionably does not have any bad effects of VLAs.  This
> happens because -Wvla is dumb -- it just looks for the syntax, not for
> the semantics of creating VLAs.  That's why I call it a bug -- it
> raises a false positive that makes it less useful.
> 

Compilers warns about VLA using ("variable length array used"), not
creating. There is no distinct compiler warning to discriminate VLA
creating from usage.

Anyway, code just using VLA is not that frequent, avoiding it is rather
simple and VLA can be avoided altogether (at least for non-external).



signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-08-02 Thread Taylor R Campbell

> Date: Sun, 2 Aug 2020 17:35:06 +0200
> From: Kamil Rytarowski 
> 
> On 02.08.2020 16:44, Taylor R Campbell wrote:
> >> Date: Sun, 2 Aug 2020 16:04:15 +0200
> >> From: Kamil Rytarowski 
> >>
> >> On 02.08.2020 15:57, Taylor R Campbell wrote:
> >>> But it sounds like the original motivation is that it triggered
> >>> -Wvla...which frankly strikes me as a compiler bug since there's
> >>> obviously no actual VLA created in sizeof; as far as I can tell
> >>> there's no semantic difference between sizeof(device_t[n]) and
> >>> sizeof(device_t) * n.
> >>
> >> This is not true:
> > 
> > Which part of what I said are you claiming is not true, and what are
> > you illustrating with the example program below?
> 
> Calling it a compiler bug.
> 
> Clang behaves the same way.
> 
> $ clang -Wvla test.c
> test.c:6:37: warning: variable length array used [-Wvla]
> printf("sizeof = %zu\n", sizeof(int[argc]));
>^
> 1 warning generated.
> 
> Creating VLA is not needed for using it as an intermediate. In practice
> in most/all cases it is optimized and actual VLA is not allocated.

This is not a matter of optimization in practice.  It's absolutely not
an `intermediate' at all -- there is no VLA created, period, any more
than sizeof(malloc(1)) causes any actual call to malloc or sizeof(*p)
causes any dereference of a pointer.

This makes -Wvla less useful as a tool, because apparently it flags
code that unquestionably does not have any bad effects of VLAs.  This
happens because -Wvla is dumb -- it just looks for the syntax, not for
the semantics of creating VLAs.  That's why I call it a bug -- it
raises a false positive that makes it less useful.

Re: CVS commit: src/sys/kern

2020-08-02 Thread Kamil Rytarowski

On 02.08.2020 16:44, Taylor R Campbell wrote:
>> Date: Sun, 2 Aug 2020 16:04:15 +0200
>> From: Kamil Rytarowski 
>>
>> On 02.08.2020 15:57, Taylor R Campbell wrote:
>>> But it sounds like the original motivation is that it triggered
>>> -Wvla...which frankly strikes me as a compiler bug since there's
>>> obviously no actual VLA created in sizeof; as far as I can tell
>>> there's no semantic difference between sizeof(device_t[n]) and
>>> sizeof(device_t) * n.
>>
>> This is not true:
> 
> Which part of what I said are you claiming is not true, and what are
> you illustrating with the example program below?
> 

Calling it a compiler bug.

Clang behaves the same way.

$ clang -Wvla test.c
test.c:6:37: warning: variable length array used [-Wvla]
printf("sizeof = %zu\n", sizeof(int[argc]));
   ^
1 warning generated.

Creating VLA is not needed for using it as an intermediate. In practice
in most/all cases it is optimized and actual VLA is not allocated.

> It seems to illustrate that sizeof(int[argc]) does exactly what one
> would expect it to do -- return the size of an argc-length array of
> ints, just like sizeof(int) * argc does.
> 
> 

The result is the same and I find the change as beneficial.

> 
>> #include 
>>
>> int
>> main(int argc, char **argv)
>> {
>> printf("sizeof = %zu\n", sizeof(int[argc]));
>> return 0;
>> }
>>
>> $ ./a.out
>>
>> sizeof = 4
>> $ ./a.out 12 3
>> sizeof = 12
>> $ ./a.out 12 3 45 6
>> sizeof = 20




signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-08-02 Thread Jaromír Doleček

Le dim. 2 août 2020 à 15:57, Taylor R Campbell
 a écrit :
> Why does it improve readability?

Certainly using cringe language features does not help readability.

> What else does -Wvla choke on in src/sys?

Some drm drivers, and uvm, particularly uvm_bio.c. I'd like to work
towards enabling -Wvla in kernel (i.e. remove all VLAs, same as Linux
kernel did in 2018), but I have some other stuff I want to finish
before I pick a new project.

Jaromir

Re: CVS commit: src/sys/kern

2020-08-02 Thread Taylor R Campbell

> Date: Sun, 2 Aug 2020 16:04:15 +0200
> From: Kamil Rytarowski 
> 
> On 02.08.2020 15:57, Taylor R Campbell wrote:
> > But it sounds like the original motivation is that it triggered
> > -Wvla...which frankly strikes me as a compiler bug since there's
> > obviously no actual VLA created in sizeof; as far as I can tell
> > there's no semantic difference between sizeof(device_t[n]) and
> > sizeof(device_t) * n.
> 
> This is not true:

Which part of what I said are you claiming is not true, and what are
you illustrating with the example program below?

It seems to illustrate that sizeof(int[argc]) does exactly what one
would expect it to do -- return the size of an argc-length array of
ints, just like sizeof(int) * argc does.

> #include 
> 
> int
> main(int argc, char **argv)
> {
> printf("sizeof = %zu\n", sizeof(int[argc]));
> return 0;
> }
> 
> $ ./a.out
> 
> sizeof = 4
> $ ./a.out 12 3
> sizeof = 12
> $ ./a.out 12 3 45 6
> sizeof = 20

Re: CVS commit: src/sys/kern

2020-08-02 Thread Joerg Sonnenberger

On Sun, Aug 02, 2020 at 01:57:11PM +, Taylor R Campbell wrote:
> But it sounds like the original motivation is that it triggered
> -Wvla...which frankly strikes me as a compiler bug since there's
> obviously no actual VLA created in sizeof; as far as I can tell
> there's no semantic difference between sizeof(device_t[n]) and
> sizeof(device_t) * n.

Yes, it seems strange to trigger a VLA warning in code that it is
required to be compile-time evaluated.

Joerg

Re: CVS commit: src/sys/kern

2020-08-02 Thread Kamil Rytarowski

On 02.08.2020 16:25, Paul Goyette wrote:
> On Sun, 2 Aug 2020, Kamil Rytarowski wrote:
> 
>> On 02.08.2020 15:57, Taylor R Campbell wrote:
>>> But it sounds like the original motivation is that it triggered
>>> -Wvla...which frankly strikes me as a compiler bug since there's
>>> obviously no actual VLA created in sizeof; as far as I can tell
>>> there's no semantic difference between sizeof(device_t[n]) and
>>> sizeof(device_t) * n.
>>>
>>
>> This is not true:
>>
>> #include 
>>
>> int
>> main(int argc, char **argv)
>> {
>>    printf("sizeof = %zu\n", sizeof(int[argc]));
>>    return 0;
>> }
>>
>> $ ./a.out
>>
>> sizeof = 4
>> $ ./a.out 12 3
>> sizeof = 12
>> $ ./a.out 12 3 45 6
>> sizeof = 20
> 
> Modifying your example slightly, I print both variations:
> 
> #include 
> 
> int
> main(int argc, char **argv)
> {
> printf("sizeof = %zu\t%zu\n", sizeof(int[argc]), sizeof(int) * argc);
> return 0;
> }
> speedy:paul {653} ./a.out
> sizeof = 4  4
> speedy:paul {654} ./a.out 12 3
> sizeof = 12 12
> speedy:paul {655} ./a.out 12 3 45 6
> sizeof = 20 20
> 
> 
> Looks the same to me!
> 

The result is the same, but one uses VLA (at least formally), other not.

> 
> ++--+---+
> | Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
> | (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
> | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
> ++--+---+




signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-08-02 Thread Paul Goyette


On Sun, 2 Aug 2020, Kamil Rytarowski wrote:


On 02.08.2020 15:57, Taylor R Campbell wrote:

But it sounds like the original motivation is that it triggered
-Wvla...which frankly strikes me as a compiler bug since there's
obviously no actual VLA created in sizeof; as far as I can tell
there's no semantic difference between sizeof(device_t[n]) and
sizeof(device_t) * n.



This is not true:

#include 

int
main(int argc, char **argv)
{
   printf("sizeof = %zu\n", sizeof(int[argc]));
   return 0;
}

$ ./a.out

sizeof = 4
$ ./a.out 12 3
sizeof = 12
$ ./a.out 12 3 45 6
sizeof = 20


Modifying your example slightly, I print both variations:

#include 

int
main(int argc, char **argv)
{
printf("sizeof = %zu\t%zu\n", sizeof(int[argc]), sizeof(int) * argc);
return 0;
}
speedy:paul {653} ./a.out
sizeof = 4  4
speedy:paul {654} ./a.out 12 3
sizeof = 12 12
speedy:paul {655} ./a.out 12 3 45 6
sizeof = 20 20


Looks the same to me!


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+

Re: CVS commit: src/sys/kern

2020-08-02 Thread Kamil Rytarowski

On 02.08.2020 15:57, Taylor R Campbell wrote:
> But it sounds like the original motivation is that it triggered
> -Wvla...which frankly strikes me as a compiler bug since there's
> obviously no actual VLA created in sizeof; as far as I can tell
> there's no semantic difference between sizeof(device_t[n]) and
> sizeof(device_t) * n.
> 

This is not true:

#include 

int
main(int argc, char **argv)
{
printf("sizeof = %zu\n", sizeof(int[argc]));
return 0;
}

$ ./a.out

sizeof = 4
$ ./a.out 12 3
sizeof = 12
$ ./a.out 12 3 45 6
sizeof = 20



signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-08-02 Thread Taylor R Campbell

> Date: Sun, 2 Aug 2020 10:47:21 +0200
> From: Jarom�r Dole ek 
> 
> Readability first and foremost in this case.
> 
> I was exploring if I can disable VLAs for the kernel altogether, this
> can't be done for now. Nevertheless, this change looked like it would
> be useful to make anyway.

Why does it improve readability?  Surely if we want the size of an
array of device_t of length n it's more to the point to say
sizeof(device_t[n]) directly than to talk about multiplying
sizeof(device_t) by n.  If that were the only question I think the
change makes readability worse, personally!

But it sounds like the original motivation is that it triggered
-Wvla...which frankly strikes me as a compiler bug since there's
obviously no actual VLA created in sizeof; as far as I can tell
there's no semantic difference between sizeof(device_t[n]) and
sizeof(device_t) * n.

What else does -Wvla choke on in src/sys?

Re: CVS commit: src/sys/kern

2020-08-02 Thread Jaromír Doleček

Readability first and foremost in this case.

I was exploring if I can disable VLAs for the kernel altogether, this
can't be done for now. Nevertheless, this change looked like it would
be useful to make anyway.

Le dim. 2 août 2020 à 01:15, Taylor R Campbell
 a écrit :
>
> > Module Name:src
> > Committed By:   jdolecek
> > Date:   Sat Aug  1 11:18:26 UTC 2020
> >
> > Modified Files:
> > src/sys/kern: subr_autoconf.c
> >
> > Log Message:
> > avoid VLA for the sizeof() calculations
>
> Why?

Re: CVS commit: src/sys/kern

2020-08-01 Thread Taylor R Campbell

> Module Name:src
> Committed By:   jdolecek
> Date:   Sat Aug  1 11:18:26 UTC 2020
> 
> Modified Files:
> src/sys/kern: subr_autoconf.c
> 
> Log Message:
> avoid VLA for the sizeof() calculations

Why?

Re: [sctp fix] Re: CVS commit: src/sys/kern

2020-07-18 Thread Maxime Villard


Le 28/04/2020 à 09:16, Luke Mewburn a écrit :

On 20-04-26 18:15, Maxime Villard wrote:
   |  - There was no demonstrated use-case justifying importing it. In addition,
   |major OSes like Windows and macOS do not implement SCTP. There just is 
no
   |demand for SCTP on the market; and on NetBSD, proportionally even less.

SCTP is used in mobile telco environments; the control plane for
3G and 4G networks uses SCTP (or TCP as an option, but mostly SCTP).


That may be true. I am mostly aware of the multimedia use cases.


Le 01/05/2020 à 18:46, m...@netbsd.org a écrit :

We can setup an equivalence: put as much effort into the SCTP removal
proposal as there was for the SCTP introduction proposal.

Since SCTP was just dropped in src without any prior discussion, I don't
think we need any discussion for removing it.


That would be fair, yes.


Le 02/05/2020 à 13:55, m...@netbsd.org a écrit :

I'm sorry for picking on SCTP in particular. Apparently it was added
because it was listed in src/doc/roadmaps.networking (and it's still
listed there).


But this doesn't address your own point, does it. The one-liner in
src/doc/roadmaps/networking gives no explanation on why we would want SCTP
in the first place. It doesn't even say where the code was imported from.

This brings the question of who, exactly, made this list. Several of the
items on this wanted list are actively *not* wanted, as Christos noted in
five of his "Comment[christos]".

At this point there is no doubt that SCTP in the NetBSD kernel has no
future. The only viable option I see is usrsctp:

https://github.com/sctplab/usrsctp

A userland version of the code, but portable, and actively maintained.

While here, notice the crazy buffer overflows that were fixed in it and
are still present in the NetBSD SCTP kernel code... Adds to my point,
that the code is of extremely poor quality.

Maxime

Re: CVS commit: src/sys/kern

2020-06-01 Thread Rin Okuyama


On 2020/06/02 2:08, Joerg Sonnenberger wrote:

On Sun, May 31, 2020 at 11:24:20PM +, Rin Okuyama wrote:

Module Name:src
Committed By:   rin
Date:   Sun May 31 23:24:20 UTC 2020

Modified Files:
src/sys/kern: kern_timeout.c

Log Message:
Stop allocating buffers dynamically in a DDB session, in order not to
disturb on-going debugged state of kernel datastructures.


This breaks the build with clang as it uses a declared-but-not-defined
type callout_cpu.


Fixed. Sorry for breakage.

Thanks,
rin

Re: CVS commit: src/sys/kern

2020-06-01 Thread Joerg Sonnenberger

On Sun, May 31, 2020 at 11:24:20PM +, Rin Okuyama wrote:
> Module Name:  src
> Committed By: rin
> Date: Sun May 31 23:24:20 UTC 2020
> 
> Modified Files:
>   src/sys/kern: kern_timeout.c
> 
> Log Message:
> Stop allocating buffers dynamically in a DDB session, in order not to
> disturb on-going debugged state of kernel datastructures.

This breaks the build with clang as it uses a declared-but-not-defined
type callout_cpu.

Joerg

re: CVS commit: src/sys/kern

2020-05-31 Thread matthew green

"Rin Okuyama" writes:
> Module Name:  src
> Committed By: rin
> Date: Sun May 31 08:33:48 UTC 2020
> 
> Modified Files:
>   src/sys/kern: kern_timeout.c
> 
> Log Message:
> db_show_callout(): struct callout_cpu and cpu_info are too much for stack.
> 
> XXX
> DDB can be running in the interrupt context, e.g., when activated from
> console. Therefore, use kmem_intr_alloc(9) instead of kmem_alloc(9).
> 
> Frame size, e.g. for m68k, becomes:
> 9212 (oops!) --> 0

please don't use allocators from ddb.  inspection from ddb
shouldn't go change internal datastructures and rely upon
things that can break working where possible

allocate a single static in the bss to use?

.mrg.

Re: CVS commit: src/sys/kern

2020-05-31 Thread Jason Thorpe

> On May 31, 2020, at 1:33 AM, Rin Okuyama  wrote:
> 
> db_show_callout(): struct callout_cpu and cpu_info are too much for stack.
> 
> XXX
> DDB can be running in the interrupt context, e.g., when activated from
> console. Therefore, use kmem_intr_alloc(9) instead of kmem_alloc(9).
> 
> Frame size, e.g. for m68k, becomes:
>9212 (oops!) --> 0
> 

I'm not sure we want to be calling memory allocators from within ddb.  I think 
it would be better to simply allocate that space in the .bss segment -- ddb 
only runs on 1 CPU at a time.

-- thorpej

Re: CVS commit: src/sys/kern

2020-05-07 Thread Kamil Rytarowski

On 07.05.2020 07:46, Michael van Elst wrote:
> On Wed, May 06, 2020 at 12:39:21PM +0200, Kamil Rytarowski wrote:
> 
> Hi Kamil,
> 
>> If I revert the pipe(2) changes on top of NetBSD-current, the test is no
>> longer racy again.
> 
> I tried 9.99.60 with and without my bugfix. The test always failed
> after some time with exactly that error.
> 
> If the change really had an effect, there would be a use-after-free bug
> somewhere else.
> 
> 
> Greetings,
> 

Thanks, I will investigating further.



signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-05-06 Thread Michael van Elst

On Wed, May 06, 2020 at 12:39:21PM +0200, Kamil Rytarowski wrote:

Hi Kamil,

> If I revert the pipe(2) changes on top of NetBSD-current, the test is no
> longer racy again.

I tried 9.99.60 with and without my bugfix. The test always failed
after some time with exactly that error.

If the change really had an effect, there would be a use-after-free bug
somewhere else.

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."

Re: CVS commit: src/sys/kern

2020-05-06 Thread Kamil Rytarowski

This caused regressions in t_ptrace_wait* tests and random
hangs/timeouts/failures.

If I revert the pipe(2) changes on top of NetBSD-current, the test is no
longer racy again.

http://netbsd.org/~kamil/patch-00249-pipe-revert.txt

Reproducer:

cd /usr/tests/lib/libc/sys
./t_ptrace_waitpid  tracer_sysctl_lookup_without_duplicates


It fails in a non-deterministic number of iterations:

[ 126.7088900] sorry, pid 20803 was killed: orphaned traced process
failed: /usr/src/tests/lib/libc/sys/t_ptrace_topology_wait.h:191:
msg_read_child("tracer ready" " from parent " "parent_tracer",
&parent_tracer, &msg, sizeof(msg)) == 0: Undefined error: 0

With this patch it is easier to reproduce the race:

Index: t_ptrace_topology_wait.h
===
RCS file: /cvsroot/src/tests/lib/libc/sys/t_ptrace_topology_wait.h,v
retrieving revision 1.1
diff -u -r1.1 t_ptrace_topology_wait.h
--- t_ptrace_topology_wait.h5 May 2020 00:33:37 -   1.1
+++ t_ptrace_topology_wait.h6 May 2020 10:32:37 -
@@ -248,7 +248,7 @@
 ATF_TC(tracer_sysctl_lookup_without_duplicates);
 ATF_TC_HEAD(tracer_sysctl_lookup_without_duplicates, tc)
 {
-   atf_tc_set_md_var(tc, "timeout", "15");
+// atf_tc_set_md_var(tc, "timeout", "15");
atf_tc_set_md_var(tc, "descr",
"Assert that await_zombie() in attach1 always finds a single "
"process and no other error is reported");
@@ -269,11 +269,13 @@
start = time(NULL);
while (true) {
DPRINTF("Step: %lu\n", N);
+   if (N % 100 == 0)
+   printf("Step: %lu\n", N);
tracer_sees_terminaton_before_the_parent_raw(true, false,
 false);
end = time(NULL);
diff = difftime(end, start);
-   if (diff >= 5.0)
+   if (diff >= 30.0)
break;
++N;
}


Can you have a look?

On 26.04.2019 19:20, Michael van Elst wrote:
> Module Name:  src
> Committed By: mlelstv
> Date: Fri Apr 26 17:20:49 UTC 2019
> 
> Modified Files:
>   src/sys/kern: sys_pipe.c
> 
> Log Message:
> Clean up pipe structure before recycling it.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.146 -r1.147 src/sys/kern/sys_pipe.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 
> 
> Modified files:
> 
> Index: src/sys/kern/sys_pipe.c
> diff -u src/sys/kern/sys_pipe.c:1.146 src/sys/kern/sys_pipe.c:1.147
> --- src/sys/kern/sys_pipe.c:1.146 Sun Jun 10 17:54:51 2018
> +++ src/sys/kern/sys_pipe.c   Fri Apr 26 17:20:49 2019
> @@ -1,4 +1,4 @@
> -/*   $NetBSD: sys_pipe.c,v 1.146 2018/06/10 17:54:51 jdolecek Exp $  */
> +/*   $NetBSD: sys_pipe.c,v 1.147 2019/04/26 17:20:49 mlelstv Exp $   */
>  
>  /*-
>   * Copyright (c) 2003, 2007, 2008, 2009 The NetBSD Foundation, Inc.
> @@ -68,7 +68,7 @@
>   */
>  
>  #include 
> -__KERNEL_RCSID(0, "$NetBSD: sys_pipe.c,v 1.146 2018/06/10 17:54:51 jdolecek 
> Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: sys_pipe.c,v 1.147 2019/04/26 17:20:49 mlelstv 
> Exp $");
>  
>  #include 
>  #include 
> @@ -1331,6 +1331,8 @@ pipeclose(struct pipe *pipe)
>  free_resources:
>   pipe->pipe_pgid = 0;
>   pipe->pipe_state = PIPE_SIGNALR;
> + pipe->pipe_peer = NULL;
> + pipe->pipe_lock = NULL;
>   pipe_free_kmem(pipe);
>   if (pipe->pipe_kmem != 0) {
>   pool_cache_put(pipe_rd_cache, pipe);
> 




signature.asc
Description: OpenPGP digital signature

Re: [sctp fix] Re: CVS commit: src/sys/kern

2020-05-02 Thread maya

On Fri, May 01, 2020 at 04:46:36PM +, m...@netbsd.org wrote:
> We can setup an equivalence: put as much effort into the SCTP removal
> proposal as there was for the SCTP introduction proposal.
> 
> Since SCTP was just dropped in src without any prior discussion, I don't
> think we need any discussion for removing it.

I'm sorry for picking on SCTP in particular. Apparently it was added
because it was listed in src/doc/roadmaps.networking (and it's still
listed there).

Re: [sctp fix] Re: CVS commit: src/sys/kern

2020-05-01 Thread maya

We can setup an equivalence: put as much effort into the SCTP removal
proposal as there was for the SCTP introduction proposal.

Since SCTP was just dropped in src without any prior discussion, I don't
think we need any discussion for removing it.

Re: [sctp fix] Re: CVS commit: src/sys/kern

2020-04-28 Thread Luke Mewburn

On 20-04-26 18:15, Maxime Villard wrote:
  |  - There was no demonstrated use-case justifying importing it. In addition,
  |major OSes like Windows and macOS do not implement SCTP. There just is no
  |demand for SCTP on the market; and on NetBSD, proportionally even less.

SCTP is used in mobile telco environments; the control plane for
3G and 4G networks uses SCTP (or TCP as an option, but mostly SCTP).

NetBSD is a server OS; and could be viable in that market, which
currently is mostly Linux (AFAICT), which does have a viable SCTP stack.

It's understandable why macOS (a client OS) doesn't support SCTP.


Besides the rest of your arguments (which may be valid), this particular
one is not.


regards,
Luke.

Re: CVS commit: src/sys/kern

2020-04-26 Thread Jason Thorpe




> On Apr 26, 2020, at 2:04 PM, Michael van Elst  wrote:
> 
> Log Message:
> fix DIAGNOSTIC build

non-DIAGNOSTIC you mean?

-- thorpej

[sctp fix] Re: CVS commit: src/sys/kern

2020-04-26 Thread Maxime Villard

Le 26/04/2020 à 16:21, Jonathan A. Kollasch a écrit :
> Module Name:  src
> Committed By: jakllsch
> Date: Sun Apr 26 14:21:14 UTC 2020
> 
> Modified Files:
>   src/sys/kern: uipc_socket.c
> 
> Log Message:
> Implement SCTP bug fixes found by maxv@.
> 
> Adding these seems to improve the SCTP situation.

Yeah, thanks... I remember I had sent an email about these bugs, no one cared,
so I just put big XXXs and left the broken thing as-is. That was more than a
year ago.

I remember also pointing out at some point the severe deficiencies of the SCTP
code we have, with no one caring once again (not even me actually).

In its current state (that is, the state it has been in ever since it was
imported five years ago), our SCTP code is a near-perfect example of gigantic,
buggy and useless bloat. Specifically, as far as I remember:

 - Last I checked, SCTP occupies, in number of lines of code, half of our IPv4
   network stack. In other words, when it was imported, our kernel netinet stack
   suddenly doubled in size. I don't see how we will ever MP-ify all of that.

 - There was no demonstrated use-case justifying importing it. In addition,
   major OSes like Windows and macOS do not implement SCTP. There just is no
   demand for SCTP on the market; and on NetBSD, proportionally even less.

 - The code is of remarkably poor quality, with bugs in all directions, complex
   pieces of logic that seem rarely justified, and implementation errors like
   the pointer bugs you just fixed. The bloat and bugs are already evident as
   early as in the first twenty lines of code of the sctp_input() entry point.

 - IIRC the mbuf API usage is very poor (though I don't remember the specifics
   here; I must have kept paper notes somewhere).

 - It seems that no one is maintaining it? Nothing has improved over the last
   five years, and there are no apparent signs that this situation will ever
   change. There are piecemeal changes like yours and mine, to accommodate API
   changes and fix obvious bugs, and that's about it, as far as I can tell.

At one point I hesitated about doing a big cleanup of SCTP. But I concluded
that it is too structurally broken, and that fixing it is a waste of time;
rewriting it would be faster and would result in a better, more functional,
and easier to maintain code.

Overall I think this is an example of bad policy. It's good to have support for
new protocols, but at some point, we need to question whether landing 40K lines
of buggy yet critical kernel code out of nowhere with no one maintaining it and
little justification for the feature is a good thing to do.

CC'ing core@ in case they are interested in making a useful statement for once.

In all cases, this code is disabled by default, which is a good thing given
its state, but results in us not giving a damn about it.

Maxime

Re: CVS commit: src/sys/kern

2020-04-13 Thread Andrew Doran

On Mon, Apr 13, 2020 at 04:34:48PM +0100, Roy Marples wrote:
> On 13/04/2020 16:31, Andrew Doran wrote:
> > Hi Roy.
> > 
> > On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote:
> > > On 10/04/2020 23:34, Andrew Doran wrote:
> > > > Module Name:src
> > > > Committed By:   ad
> > > > Date:   Fri Apr 10 22:34:36 UTC 2020
> > > > 
> > > > Modified Files:
> > > > src/sys/kern: vfs_mount.c
> > > > 
> > > > Log Message:
> > > > vfs_mountroot(): don't needlessly grab a second reference to the root 
> > > > vnode
> > > > (the kernel never chdirs) nor a lock on it.
> > > 
> > > So the kernel chrooting to sysctl init.root is still ok?
> > 
> > How is that accomplished?  I couldn't find the code and don't recall seeing
> > it.
> 
> sysctl -w init.root=/altroot
> 
> See the ZFS Root ramdisk:
> https://nxr.netbsd.org/xref/src/distrib/common/zfsroot.rc#33
> 
> I tested kernel from yesterdays sources and it still works fine, so I guess
> nothing was needed here.

I had a look and it's init that chroots, not the kernel, so no issue.  The
kernel chrooting would be evil.

Andrew

Re: CVS commit: src/sys/kern

2020-04-13 Thread Roy Marples


On 13/04/2020 16:31, Andrew Doran wrote:

Hi Roy.

On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote:

On 10/04/2020 23:34, Andrew Doran wrote:

Module Name:src
Committed By:   ad
Date:   Fri Apr 10 22:34:36 UTC 2020

Modified Files:
src/sys/kern: vfs_mount.c

Log Message:
vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


So the kernel chrooting to sysctl init.root is still ok?


How is that accomplished?  I couldn't find the code and don't recall seeing
it.


sysctl -w init.root=/altroot

See the ZFS Root ramdisk:
https://nxr.netbsd.org/xref/src/distrib/common/zfsroot.rc#33

I tested kernel from yesterdays sources and it still works fine, so I guess 
nothing was needed here.


Roy

Re: CVS commit: src/sys/kern

2020-04-13 Thread Andrew Doran

Hi Roy.

On Sat, Apr 11, 2020 at 02:13:06AM +0100, Roy Marples wrote:
> On 10/04/2020 23:34, Andrew Doran wrote:
> > Module Name:src
> > Committed By:   ad
> > Date:   Fri Apr 10 22:34:36 UTC 2020
> > 
> > Modified Files:
> > src/sys/kern: vfs_mount.c
> > 
> > Log Message:
> > vfs_mountroot(): don't needlessly grab a second reference to the root vnode
> > (the kernel never chdirs) nor a lock on it.
> 
> So the kernel chrooting to sysctl init.root is still ok?

How is that accomplished?  I couldn't find the code and don't recall seeing
it.

Andrew

Re: CVS commit: src/sys/kern

2020-04-10 Thread Roy Marples


On 10/04/2020 23:34, Andrew Doran wrote:

Module Name:src
Committed By:   ad
Date:   Fri Apr 10 22:34:36 UTC 2020

Modified Files:
src/sys/kern: vfs_mount.c

Log Message:
vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


So the kernel chrooting to sysctl init.root is still ok?

Roy

Re: [vfs_cache] Re: CVS commit: src/sys/kern

2020-04-05 Thread Andrew Doran

On Sun, Mar 29, 2020 at 11:41:23AM +0200, Maxime Villard wrote:
> Le 23/03/2020 ? 21:02, Andrew Doran a ?crit?:
> > Module Name:src
> > Committed By:   ad
> > Date:   Mon Mar 23 20:02:14 UTC 2020
> > 
> > Modified Files:
> > src/sys/kern: vfs_cache.c
> > 
> > Log Message:
> > cache_remove(): remove from the vnode list first, so cache_revlookup()
> > doesn't try to re-activate an entry no longer on the LRU list.
> > 
> > 
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.133 -r1.134 src/sys/kern/vfs_cache.c
> > 
> > Please note that diffs are not public domain; they are subject to the
> > copyright notices on the relevant files.
> 
> It appears that your recent changes in vfs_cache.c have introduced a
> use-after-free. Booting KASAN gives:
> 
>   ASan: Unauthorized Access In 0x...: Addr 0x... [1 byte, read, 
> PoolUseAfterFree]
> 
> It seems that the problem is here:
> 
> 557   if (nameiop == CREATE && (cnflags & ISLASTCN) != 0) {
> 558   /*
> 559* Last component and we are preparing to create
> 560* the named object, so flush the negative cache
> 561* entry.
> 562*/
> 563   COUNT(ncs_badhits);
> 564   cache_remove(ncp, true);< HERE
> 565   hit = false;
> 566   } else {
> 567   COUNT(ncs_neghits);
> 568   SDT_PROBE(vfs, namecache, lookup, hit, dvp, name,
> 569   namelen, 0, 0);
> 570   /* found neg entry; vn is already null from above */
> 571   hit = true;
> 572   }
> 573   if (iswht_ret != NULL) {
> 574   /*
> 575* Restore the ISWHITEOUT flag saved earlier.
> 576*/
> 577   *iswht_ret = ncp->nc_whiteout;  <-- ouch
> 578   } else {
> 579   KASSERT(!ncp->nc_whiteout);  <-- ouch
> 580   }
> 
> cache_remove() frees 'ncp', and then 'ncp->nc_whiteout' is read.

Fixed with vfs_cache.c 1.136.  Thank you for bringing it to my attention.

Andrew

[vfs_cache] Re: CVS commit: src/sys/kern

2020-03-29 Thread Maxime Villard

Le 23/03/2020 à 21:02, Andrew Doran a écrit :
> Module Name:  src
> Committed By: ad
> Date: Mon Mar 23 20:02:14 UTC 2020
> 
> Modified Files:
>   src/sys/kern: vfs_cache.c
> 
> Log Message:
> cache_remove(): remove from the vnode list first, so cache_revlookup()
> doesn't try to re-activate an entry no longer on the LRU list.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.133 -r1.134 src/sys/kern/vfs_cache.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.

It appears that your recent changes in vfs_cache.c have introduced a
use-after-free. Booting KASAN gives:

ASan: Unauthorized Access In 0x...: Addr 0x... [1 byte, read, 
PoolUseAfterFree]

It seems that the problem is here:

557 if (nameiop == CREATE && (cnflags & ISLASTCN) != 0) {
558 /*
559  * Last component and we are preparing to create
560  * the named object, so flush the negative cache
561  * entry.
562  */
563 COUNT(ncs_badhits);
564 cache_remove(ncp, true);< HERE
565 hit = false;
566 } else {
567 COUNT(ncs_neghits);
568 SDT_PROBE(vfs, namecache, lookup, hit, dvp, name,
569 namelen, 0, 0);
570 /* found neg entry; vn is already null from above */
571 hit = true;
572 }
573 if (iswht_ret != NULL) {
574 /*
575  * Restore the ISWHITEOUT flag saved earlier.
576  */
577 *iswht_ret = ncp->nc_whiteout;  <-- ouch
578 } else {
579 KASSERT(!ncp->nc_whiteout);  <-- ouch
580 }

cache_remove() frees 'ncp', and then 'ncp->nc_whiteout' is read.

Maxime

Re: CVS commit: src/sys/kern

2020-03-08 Thread Maxime Villard

Le 08/03/2020 à 21:41, Andrew Doran a écrit :
> On Sun, Mar 08, 2020 at 08:34:29AM +0100, Maxime Villard wrote:
>> Le 08/03/2020 ? 01:31, Andrew Doran a ?crit?:
>>> Module Name:src
>>> Committed By:   ad
>>> Date:   Sun Mar  8 00:31:19 UTC 2020
>>>
>>> Modified Files:
>>> src/sys/kern: subr_kmem.c
>>>
>>> Log Message:
>>> KMEM_SIZE: append the size_t to the allocated buffer, rather than
>>> prepending, so it doesn't screw up the alignment of the buffer.
>>>
>>> Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com
>>>
>>>
>>> To generate a diff of this commit:
>>> cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c
>>>
>>> Please note that diffs are not public domain; they are subject to the
>>> copyright notices on the relevant files.
>>
>> This is wrong. The point of KMEM_SIZE is to store at a reliable location
>> the size of the buffer, in order to then verify that the 'size' argument
>> given to kmem_free() is correct.
>>
>> Here, you are using that size argument to compute the location, which
>> breaks the whole point of the check.
> 
> Sure I understand the frustration, but it's still correct according to
> the parameters I set for it 10+ years ago, which were for it to be a
> quick-n-dirty diagnostic aid.

No, it is not. KMEM_SIZE has found real bugs. Now this useful feature has
been lessened just to shut a sanitizer which was pointing another issue.

Again, as said previously, the real bug here is that you are making a
caller assume specific alignment from an allocator that _does not_
guarantee this alignment. To fix that, either (1) revert the assumption or
(2) make it part of the allocator's contract to guarantee this alignment.

I don't have a preference on (1) or (2), but the current design is more a
hack than anything else, and the subsequent change in KMEM_SIZE is
definitely wrong.

>> Also it probably collides with the KASAN shadow.
> 
> Hmm, is that purely an alignment issue then, because it works in 8 byte
> cells?  Otherwise it sounds like this should not be enabled with KASAN at
> all.

Not sure what you mean, but KASAN definitely needs to be enabled on
kmem. I've checked the code, and actually it is fine regarding KASAN for
now, because the computation of the redzone doesn't take KMEM_SIZE into
account (and so isn't affected by the fact that the location changed).

Re: CVS commit: src/sys/kern

2020-03-08 Thread Andrew Doran

On Sun, Mar 08, 2020 at 08:34:29AM +0100, Maxime Villard wrote:
> Le 08/03/2020 ? 01:31, Andrew Doran a ?crit?:
> > Module Name:src
> > Committed By:   ad
> > Date:   Sun Mar  8 00:31:19 UTC 2020
> > 
> > Modified Files:
> > src/sys/kern: subr_kmem.c
> > 
> > Log Message:
> > KMEM_SIZE: append the size_t to the allocated buffer, rather than
> > prepending, so it doesn't screw up the alignment of the buffer.
> > 
> > Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com
> > 
> > 
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c
> > 
> > Please note that diffs are not public domain; they are subject to the
> > copyright notices on the relevant files.
> 
> This is wrong. The point of KMEM_SIZE is to store at a reliable location
> the size of the buffer, in order to then verify that the 'size' argument
> given to kmem_free() is correct.
> 
> Here, you are using that size argument to compute the location, which
> breaks the whole point of the check.

Sure I understand the frustration, but it's still correct according to
the parameters I set for it 10+ years ago, which were for it to be a
quick-n-dirty diagnostic aid.

> Also it probably collides with the KASAN shadow.

Hmm, is that purely an alignment issue then, because it works in 8 byte
cells?  Otherwise it sounds like this should not be enabled with KASAN at
all.

Andrew
 
> Please revert this change.
>
> As said previously, if cacheline alignment is expected this way, then it
> should clearly be part of the kmem contract, and documented to be so.

Re: CVS commit: src/sys/kern

2020-03-07 Thread Maxime Villard

Le 08/03/2020 à 01:31, Andrew Doran a écrit :
> Module Name:  src
> Committed By: ad
> Date: Sun Mar  8 00:31:19 UTC 2020
> 
> Modified Files:
>   src/sys/kern: subr_kmem.c
> 
> Log Message:
> KMEM_SIZE: append the size_t to the allocated buffer, rather than
> prepending, so it doesn't screw up the alignment of the buffer.
> 
> Reported-by: syzbot+c024c50570cccac51...@syzkaller.appspotmail.com
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.78 -r1.79 src/sys/kern/subr_kmem.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.

This is wrong. The point of KMEM_SIZE is to store at a reliable location
the size of the buffer, in order to then verify that the 'size' argument
given to kmem_free() is correct.

Here, you are using that size argument to compute the location, which
breaks the whole point of the check.

Also it probably collides with the KASAN shadow.

Please revert this change.

As said previously, if cacheline alignment is expected this way, then it
should clearly be part of the kmem contract, and documented to be so.

Thanks

Re: CVS commit: src/sys/kern

2020-03-03 Thread Andrew Doran

On Tue, Mar 03, 2020 at 02:55:16PM -0500, Christos Zoulas wrote:

> Module Name:  src
> Committed By: christos
> Date: Tue Mar  3 19:55:16 UTC 2020
> 
> Modified Files:
>   src/sys/kern: vfs_syscalls.c
> 
> Log Message:
> don't skip the rdir check for the lazy case; breaks chroot df(1) hiding.

That's likely my mistake.  Thank you.

Andrew

Re: CVS commit: src/sys/kern

2020-03-02 Thread Taylor R Campbell

> Date: Tue, 3 Mar 2020 12:41:32 +0900
> From: Rin Okuyama 
> 
> On 2020/03/03 1:00, Taylor R Campbell wrote:
> > Include kern_crashme.c in non-DEBUG kernels.
> > 
> > This is useful for simulating crashes in production to test failover.
> 
> I like this.
> 
> Could you please send pullup request to netbsd-9, or can I?

Go for it!

Re: CVS commit: src/sys/kern

2020-03-02 Thread Rin Okuyama


Hi,

On 2020/03/03 1:00, Taylor R Campbell wrote:

Module Name:src
Committed By:   riastradh
Date:   Mon Mar  2 16:00:54 UTC 2020

Modified Files:
src/sys/kern: files.kern

Log Message:
Include kern_crashme.c in non-DEBUG kernels.

This is useful for simulating crashes in production to test failover.


To generate a diff of this commit:
cvs rdiff -u -r1.43 -r1.44 src/sys/kern/files.kern

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


I like this.

Could you please send pullup request to netbsd-9, or can I?

Thanks,
rin

Re: CVS commit: src/sys/kern

2020-02-25 Thread Kamil Rytarowski

On 24.02.2020 21:47, Jaromir Dolecek wrote:
> Module Name:  src
> Committed By: jdolecek
> Date: Mon Feb 24 20:47:47 UTC 2020
> 
> Modified Files:
>   src/sys/kern: init_main.c
> 
> Log Message:
> move config_init_mi() call before vfsinit(), which can trigger loading
> of VFS modules
> 
> fixes crash with LOCKDEBUG due to uninitialized mutex when zfs
> module is loaded in boot, because zfs's spa_init() calls config_mountroot()
> which now requires the config init having been done
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.521 -r1.522 src/sys/kern/init_main.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 
> 

kASan is still broken on boot. Please fix.


[   1.0188350] acpicpu1 at cpu1: ACPI CPU
[   1.0188350] cpu0 has 2 core siblings: cpu1 cpu0
[   1.0188350] cpu0 has 2 pkg siblings: cpu1 cpu0
[   1.0188350] cpu0 has 1 1st siblings: cpu0
[   1.0188350] cpu0 first in package: cpu0
[   1.0188350] cpu1 has 2 core siblings: cpu0 cpu1
[   1.0188350] cpu1 has 2 pkg siblings: cpu0 cpu1
[   1.0188350] cpu1 has 1 1st siblings: cpu0
[   1.0188350] cpu1 first in package: cpu0
[   1.2307575] panic: ASan: Unauthorized Access In 0x811e6be6:
Addr 0x98000f382b58 [8 bytes, read, PoolUseAfterFree]

[   1.2401020] cpu1: Begin traceback...
[   1.2501232] vpanic() at netbsd:vpanic+0x241
[   1.2701652] snprintf() at netbsd:snprintf
[   1.2902064] kasan_report() at netbsd:kasan_report+0x98
[   1.3102484] __asan_load8() at netbsd:__asan_load8+0x294
[   1.3302897] config_interrupts_thread() at
netbsd:config_interrupts_thread+0x68
[   1.3403126] cpu1: End traceback...
[   1.3403126] fatal breakpoint trap in supervisor mode
[   1.3503263] trap type 1 code 0 rip 0x8021e4b5 cs 0x8 rflags
0x246 cr2 0 ilevel 0 rsp 0x98017de07d60
[   1.3603479] curlwp 0x9800116a16c0 pid 0.30 lowest kstack
0x98017de002c0
Stopped in pid 0.30 (system) at netbsd:breakpoint+0x5:  leave
db{1}>

https://syzkaller.appspot.com/bug?id=aa6e0c00233b3e55340da80d7636bb2c18181e5f



signature.asc
Description: OpenPGP digital signature

re: CVS commit: src/sys/kern

2020-02-23 Thread matthew green

"Andrew Doran" writes:
> Module Name:  src
> Committed By: ad
> Date: Sun Feb 23 20:08:35 UTC 2020
> 
> Modified Files:
>   src/sys/kern: kern_pmf.c
> 
> Log Message:
> shutdown_all: take kernel_lock now that kern_reboot() doesn't.

ah, i am now guessing that having the kernel lock held
is why reboots hang and can't be broken into again.

we really need to have autoconf run out side of a spin
lock...  anyone want to work on this please?

.mrg.

re: CVS commit: src/sys/kern

2020-02-23 Thread matthew green

"Andrew Doran" writes:
> Module Name:  src
> Committed By: ad
> Date: Sun Feb 23 20:06:30 UTC 2020
> 
> Modified Files:
>   src/sys/kern: kern_reboot.c
> 
> Log Message:
> - If concurrent calls to kern_reboot(), only let the first do the deed.
> - Don't need kernel_lock for this (either OK, or suspendsched() called).

what happens if i enter ddb while rebooting, such that this
now sets 'rebooter' to some lwp.  inside ddb, i use 'mach cpu'
to change CPUs, which on some platforms actually swithces to
that CPU -- ie, we now have a different curlwp.

this while () will now hang forever, won't it?

.mrg.

Re: CVS commit: src/sys/kern

2020-02-21 Thread Kamil Rytarowski

On 20.02.2020 22:14, Jaromir Dolecek wrote:
> Module Name:  src
> Committed By: jdolecek
> Date: Thu Feb 20 21:14:23 UTC 2020
> 
> Modified Files:
>   src/sys/kern: subr_autoconf.c
> 
> Log Message:
> protect deferred lists' manipulation by config_misc_lock, same as
> config_pending semaphore itself; right now this also covers
> DVF_ATTACH_INPROGRESS flag
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.265 -r1.266 src/sys/kern/subr_autoconf.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 

After this commit kASan breaks on boot:

[   1.2418653] panic: ASan: Unauthorized Access In 0x811e0c46:
Addr 0xa7000f382a58 [8 bytes, read, PoolUseAfterFree]

[   1.2511909] cpu1: Begin traceback...
[   1.2612093] vpanic() at netbsd:vpanic+0x241 sys/kern/subr_prf.c:336
[   1.2812516] snprintf() at netbsd:snprintf
[   1.3012883] kasan_report() at netbsd:kasan_report+0x98
kasan_code_name sys/kern/subr_asan.c:186 [inline]
[   1.3012883] kasan_report() at netbsd:kasan_report+0x98
sys/kern/subr_asan.c:196
[   1.3213274] __asan_load8() at netbsd:__asan_load8+0x294
kasan_shadow_4byte_isvalid sys/kern/subr_asan.c:346 [inline]
[   1.3213274] __asan_load8() at netbsd:__asan_load8+0x294
kasan_shadow_8byte_isvalid sys/kern/subr_asan.c:360 [inline]
[   1.3213274] __asan_load8() at netbsd:__asan_load8+0x294
kasan_shadow_check sys/kern/subr_asan.c:412 [inline]
[   1.3213274] __asan_load8() at netbsd:__asan_load8+0x294
sys/kern/subr_asan.c:1182
[   1.3413734] config_interrupts_thread() at
netbsd:config_interrupts_thread+0x68 sys/kern/subr_autoconf.c:459
[   1.3513931] cpu1: End traceback...
[   1.3513931] fatal breakpoint trap in supervisor mode
[   1.3614094] trap type 1 code 0 rip 0x8021e4b5 cs 0x8 rflags
0x246 cr2 0 ilevel 0 rsp 0xa7017de07d60
[   1.3714294] curlwp 0xa700116a16c0 pid 0.30 lowest kstack
0xa7017de002c0
Stopped in pid 0.30 (system) at netbsd:breakpoint+0x5:  leave
db{1}>

https://syzkaller.appspot.com/bug?extid=1f0aefb06a387371fa14



signature.asc
Description: OpenPGP digital signature

Re: CVS commit: src/sys/kern

2020-02-18 Thread J. Hannken-Illjes

> On Feb 17, 2020, at 11:19 PM, Andrew Doran  wrote:
> 
> Hi,
> 
> On Thu, Feb 06, 2020 at 06:33:55PM +0100, J. Hannken-Illjes wrote:
> 
>>> On 12. Jan 2020, at 18:49, Andrew Doran  wrote:
>>> 
>>> Module Name:src
>>> Committed By:   ad
>>> Date:   Sun Jan 12 17:49:17 UTC 2020
>>> 
>>> Modified Files:
>>> src/sys/kern: vfs_vnode.c
>>> 
>>> Log Message:
>>> vput(): don't drop the vnode lock, carry the hold over into vrelel() which
>>> might need it anyway.
>>> 
>>> 
>>> To generate a diff of this commit:
>>> cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c
>>> 
>>> Please note that diffs are not public domain; they are subject to the
>>> copyright notices on the relevant files.
>>> 
>> 
>> vput(vnode_t *vp)
>> {
>> +   int lktype;
>> 
>> -   VOP_UNLOCK(vp);
>> -   vrele(vp);
>> +   if ((vp->v_vflag & VV_LOCKSWORK) == 0) {
>> +   lktype = LK_EXCLUSIVE;
>> +   } else {
>> +   lktype = VOP_ISLOCKED(vp);
>> +   KASSERT(lktype != LK_NONE);
>> +   }
>> +   mutex_enter(vp->v_interlock);
>> +   vrelel(vp, 0, lktype);
>> }
>> 
>> This is quite wrong, from the manual:
>> 
>> VOP_ISLOCKED(vp)
>>  Test if the vnode vp is locked.  Possible return values are
>>  LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the
>>  calling thread, shared lock held by anyone or unlocked,
>>  respectively.
>> 
>>  This function must never be used to make locking decisions at
>>  run time: it is provided only for diagnostic purposes.
>> 
>> I suppose you cannot determine if the current thread holds
>> a shared lock.
> 
> The intention of that last sentence was to dissuade people from doing error
> prone, complicated stuff with locks.  To my mind that idea of the locking
> primitives is to take something difficult (concurrency) and package it up in
> a way that's much easier to think about and work with - and yes it's still
> complicated.
> 
> There are limited cases when I think it makes sense to infer lock ownership,
> for example when known for sure that a RW lock is held but the type of hold
> needs to be known - like this case.  The failure case there is the lock
> being unheld, which would be caught by LOCKDEBUG in this case.  Consider
> that a rw_exit() with the lock unheld by the caller will produce what you
> might charitably call "undefined behaviour" in a non-LOCKDEBUG kernel.
> 
> Andrew

Yes, I think it is safe here but it deserves a comment like the text above.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


signature.asc
Description: Message signed with OpenPGP

Re: CVS commit: src/sys/kern

2020-02-17 Thread Andrew Doran

Hi,

On Thu, Feb 06, 2020 at 06:33:55PM +0100, J. Hannken-Illjes wrote:

> > On 12. Jan 2020, at 18:49, Andrew Doran  wrote:
> > 
> > Module Name:src
> > Committed By:   ad
> > Date:   Sun Jan 12 17:49:17 UTC 2020
> > 
> > Modified Files:
> > src/sys/kern: vfs_vnode.c
> > 
> > Log Message:
> > vput(): don't drop the vnode lock, carry the hold over into vrelel() which
> > might need it anyway.
> > 
> > 
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c
> > 
> > Please note that diffs are not public domain; they are subject to the
> > copyright notices on the relevant files.
> > 
> 
>  vput(vnode_t *vp)
>  {
> +   int lktype;
> 
> -   VOP_UNLOCK(vp);
> -   vrele(vp);
> +   if ((vp->v_vflag & VV_LOCKSWORK) == 0) {
> +   lktype = LK_EXCLUSIVE;
> +   } else {
> +   lktype = VOP_ISLOCKED(vp);
> +   KASSERT(lktype != LK_NONE);
> +   }
> +   mutex_enter(vp->v_interlock);
> +   vrelel(vp, 0, lktype);
>  }
> 
> This is quite wrong, from the manual:
> 
>  VOP_ISLOCKED(vp)
>   Test if the vnode vp is locked.  Possible return values are
>   LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the
>   calling thread, shared lock held by anyone or unlocked,
>   respectively.
> 
>   This function must never be used to make locking decisions at
>   run time: it is provided only for diagnostic purposes.
> 
> I suppose you cannot determine if the current thread holds
> a shared lock.

The intention of that last sentence was to dissuade people from doing error
prone, complicated stuff with locks.  To my mind that idea of the locking
primitives is to take something difficult (concurrency) and package it up in
a way that's much easier to think about and work with - and yes it's still
complicated.

There are limited cases when I think it makes sense to infer lock ownership,
for example when known for sure that a RW lock is held but the type of hold
needs to be known - like this case.  The failure case there is the lock
being unheld, which would be caught by LOCKDEBUG in this case.  Consider
that a rw_exit() with the lock unheld by the caller will produce what you
might charitably call "undefined behaviour" in a non-LOCKDEBUG kernel.

Andrew

Re: CVS commit: src/sys/kern

2020-02-06 Thread J. Hannken-Illjes

> On 12. Jan 2020, at 18:49, Andrew Doran  wrote:
> 
> Module Name:  src
> Committed By: ad
> Date: Sun Jan 12 17:49:17 UTC 2020
> 
> Modified Files:
>   src/sys/kern: vfs_vnode.c
> 
> Log Message:
> vput(): don't drop the vnode lock, carry the hold over into vrelel() which
> might need it anyway.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.106 -r1.107 src/sys/kern/vfs_vnode.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 

 vput(vnode_t *vp)
 {
+   int lktype;

-   VOP_UNLOCK(vp);
-   vrele(vp);
+   if ((vp->v_vflag & VV_LOCKSWORK) == 0) {
+   lktype = LK_EXCLUSIVE;
+   } else {
+   lktype = VOP_ISLOCKED(vp);
+   KASSERT(lktype != LK_NONE);
+   }
+   mutex_enter(vp->v_interlock);
+   vrelel(vp, 0, lktype);
 }

This is quite wrong, from the manual:

 VOP_ISLOCKED(vp)
  Test if the vnode vp is locked.  Possible return values are
  LK_EXCLUSIVE, LK_SHARED or 0 for lock held exclusively by the
  calling thread, shared lock held by anyone or unlocked,
  respectively.

  This function must never be used to make locking decisions at
  run time: it is provided only for diagnostic purposes.

I suppose you cannot determine if the current thread holds
a shared lock.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig

signature.asc
Description: Message signed with OpenPGP

Re: CVS commit: src/sys/kern

2020-01-21 Thread Simon Burge

"Christos Zoulas" wrote:

> Log Message:
>
> Don't crash if we are on a hippie trail, head full of zombie

+1 for any Australian references in a commit message :)

Cheers,
Simon.

re: CVS commit: src/sys/kern

2019-12-12 Thread matthew green

> Therefore, I wouldn't bother adding kcov.h headers, and rolling back to the
> previous version of in_interrupt for now is fine, considering that kcov
> currently has no use outside of amd64.

if you want to put code in sys/kern please make it portable.

adding more MD code in MI places is the opposite stance we
have been working towards since netbsd formed.


.mrg.

Re: CVS commit: src/sys/kern

2019-12-12 Thread Maxime Villard


Le 08/12/2019 à 14:22, Martin Husemann a écrit :

On Sun, Dec 08, 2019 at 12:58:20PM +0100, Maxime Villard wrote:

kMSan has special constraints which, in this specific case, come down to: each
function called from a KCOV instrumentation callback must be a static inline
tagged with __nomsan.

This was not the case with the updated in_interrupt(), but also still isn't the
case with the lwp_getspecific() call, which will have to be dropped.


This does not sound like a good reason to introduce MD code in sys/kern to
me. Could should not be made worse to deal with sanitizer restrictions.

Are there any alternatives?


The clean way of doing this is having an MD kcov.h included from subr_kcov.c,
like kASan.

However, I expect that we will want a per-lwp kcov flag to indicate the current
context (in exception, in interrupt, in nmi, etc), and when that happens, I
don't expect that we will need in_interrupt anymore.

Therefore, I wouldn't bother adding kcov.h headers, and rolling back to the
previous version of in_interrupt for now is fine, considering that kcov
currently has no use outside of amd64.

re: CVS commit: src/sys/kern

2019-12-10 Thread matthew green

> > > Log Message:
> > > Expunge the panicstr checks, we don't need them.
> > 
> > can you explain why?
> 
> Sure, [ .. ]

ah, wow.  i didn't realise it had such a bad effect on cpus before
they actually go properly down.

we should probably work hard to make them go down faster if possible,
though maybe that's already as good as it can be.

> > what sort of crash-time testing did you perform?
> 
> That the system can be debugged and reset cleanly.  If we've got code in DDB
> that hangs up or crashdups don't work then that's something we should fix.
> I've not run into any in a long time, they seem to get fixed.
> 
> Do you have a particular concern or something else in mind?

i was also curious about crash dumps working, but mostly
this change is about reducing true lossage during a crash.

thanks.


.mrg.

Re: CVS commit: src/sys/kern

2019-12-10 Thread Andrew Doran

On Wed, Dec 11, 2019 at 09:06:33AM +1100, matthew green wrote:

> "Andrew Doran" writes:
> > Module Name:src
> > Committed By:   ad
> > Date:   Mon Dec  9 21:02:10 UTC 2019
> > 
> > Modified Files:
> > src/sys/kern: kern_rwlock.c
> > 
> > Log Message:
> > Expunge the panicstr checks, we don't need them.
> 
> can you explain why?

Sure, I have developed a bit of a feel for it after years off watching the
thing panic.  The checks to not go too hard on the assertions once panicstr
are set are pretty good in my experience - we don't want that to snowball.

The ones that disable locking were of some kind of use (at least to me) back
in 2007 before we had a decent LOCKDEBUG setup for the newlock2 primitives. 
So it sprang out of development requirements and an uncertainty about all
how this new synchronization stuff was going to behave in practice more than
a desire to do the right thing.

On a uniprocessor or dual core box back then the panicstr checks didn't
really seem to have many bad effects, but with more CPUs it often seems to
make the crash much worse than it needs to be and I keep bumping into the
effects of it.  Here's a craptacular example:

http://www.netbsd.org/~ad/

That's kind of amusing to look at and it's only my framebuffer memory but I
worry that it could just as well be inodes or mbufs or anything else that
belongs to the user, and I think that until the CPUs are locked up and
activity stopped, the thing needs to keep working as properly as it can.

> what sort of crash-time testing did you perform?

That the system can be debugged and reset cleanly.  If we've got code in DDB
that hangs up or crashdups don't work then that's something we should fix.
I've not run into any in a long time, they seem to get fixed.

Do you have a particular concern or something else in mind?

Cheers,
Andrew

re: CVS commit: src/sys/kern

2019-12-10 Thread matthew green

"Andrew Doran" writes:
> Module Name:  src
> Committed By: ad
> Date: Mon Dec  9 21:02:10 UTC 2019
> 
> Modified Files:
>   src/sys/kern: kern_rwlock.c
> 
> Log Message:
> Expunge the panicstr checks, we don't need them.

can you explain why?  what sort of crash-time testing did
you perform?

thanks.


.mrg.

1 2 3 4 5 6 7 >

1 - 100 of 685 matches

Mail list logo