amd64, i386: lapic_calibrate_timer: panic if timer calibration fails

2022-09-10 Thread Scott Cheloha
Hi,

In lapic_calibrate_timer() we only conditionally decide to use the
lapic timer as our interrupt clock.  That is, lapic timer calibration
can fail and the system will boot anyway.

If after measuring the lapic timer frequency we somehow come up with
zero hertz, we do *not* set initclock_func to lapic_initclocks().
Here are the relevant bits from amd64/lapic.c:

   554  skip_calibration:
   555  printf("%s: apic clock running at %dMHz\n",
   556  ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000));
   557  
   558  if (lapic_per_second != 0) {

  [...] /* (skip ahead a bit...) */

   588  /*
   589   * Now that the timer's calibrated, use the apic timer routines
   590   * for all our timing needs..
   591   */
   592  delay_init(lapic_delay, 3000);
   593  initclock_func = lapic_initclocks;
   594  }
   595  }

See line 558.  The corresponding code is identical in i386/lapic.c.

I went ahead and tried it on amd64.  If you force lapic_per_second to
zero the system still boots, but the secondary CPUs just sit idle.
lapic_tval is zero, so when they call lapic_startclock() from
cpu_hatch(), nothing happens.  The i8254 still sends clock interrupts
to CPU0, though, so the system runs in an oddball state where one
processor is doing all the work.
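
To make the "lapic_tval is zero" step concrete, here is a minimal userland
sketch of the interval arithmetic that appears in the patch further down.
The function name sketch_lapic_tval() is made up for illustration and is
not part of the kernel; only the two arithmetic lines are taken from
lapic.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland sketch of the tick-interval arithmetic used in
 * lapic_calibrate_timer().  Returns timer cycles per clock tick,
 * rounded to the nearest integer.  sketch_lapic_tval() is a
 * hypothetical name, not a kernel function.
 */
static uint32_t
sketch_lapic_tval(uint32_t per_second, uint32_t hz)
{
	uint32_t tval;

	assert(hz != 0);
	tval = (per_second * 2) / hz;		/* twice the cycles per tick */
	tval = (tval / 2) + (tval & 0x1);	/* halve, rounding up on a remainder */
	return tval;
}
```

With per_second == 0 the result is 0 for any hz, which is the value the
secondary CPUs end up programming into their timers in the scenario
described above.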

I don't think that this is the intended behavior.  I think this is
just an oversight left over from some older code.  It would be a lot
more sensible to just panic if lapic_per_second is zero here.  Patch
attached.

If a bunch of you prefer to develop a more elaborate fallback scheme
where we don't hatch the secondary CPUs in the event that lapic timer
calibration fails, we could explore that later.  But for now I would
prefer to panic and try to spotlight the problem if it ever occurs in
the wild.

If this change is too risky -- maybe I am breaking someone's weird
setup? -- I can wait until after release.

Thoughts?  Preferences?

Index: amd64/amd64/lapic.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.63
diff -u -p -r1.63 lapic.c
--- amd64/amd64/lapic.c 10 Sep 2022 01:30:14 -  1.63
+++ amd64/amd64/lapic.c 10 Sep 2022 01:59:52 -
@@ -555,43 +555,44 @@ skip_calibration:
printf("%s: apic clock running at %dMHz\n",
ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000));
 
-   if (lapic_per_second != 0) {
-   /*
-* reprogram the apic timer to run in periodic mode.
-* XXX need to program timer on other cpu's, too.
-*/
-   lapic_tval = (lapic_per_second * 2) / hz;
-   lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
-
-   lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval);
-
-   /*
-* Compute fixed-point ratios between cycles and
-* microseconds to avoid having to do any division
-* in lapic_delay.
-*/
-
-   tmp = (100 * (u_int64_t)1 << 32) / lapic_per_second;
-   lapic_frac_usec_per_cycle = tmp;
-
-   tmp = (lapic_per_second * (u_int64_t)1 << 32) / 100;
-
-   lapic_frac_cycle_per_usec = tmp;
-
-   /*
-* Compute delay in cycles for likely short delays in usec.
-*/
-   for (i = 0; i < 26; i++)
-   lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >>
-   32;
-
-   /*
-* Now that the timer's calibrated, use the apic timer routines
-* for all our timing needs..
-*/
-   delay_init(lapic_delay, 3000);
-   initclock_func = lapic_initclocks;
-   }
+   if (lapic_per_second == 0)
+   panic("%s: apic timer calibration failed", __func__);
+
+   /*
+* reprogram the apic timer to run in periodic mode.
+* XXX need to program timer on other cpu's, too.
+*/
+   lapic_tval = (lapic_per_second * 2) / hz;
+   lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
+
+   lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval);
+
+   /*
+* Compute fixed-point ratios between cycles and
+* microseconds to avoid having to do any division
+* in lapic_delay.
+*/
+
+   tmp = (100 * (u_int64_t)1 << 32) / lapic_per_second;
+   lapic_frac_usec_per_cycle = tmp;
+
+   tmp = (lapic_per_second * (u_int64_t)1 << 32) / 100;
+
+   lapic_frac_cycle_per_usec = tmp;
+
+   /*
+* Compute delay in cycles for likely short delays in usec.
+*/
+   for (i = 0; i < 26; i++)
+   lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >>
+   32;
+
+   /*
+* Now that the timer's 

Re: Change pru_rcvd() return type to the type of void

2022-09-10 Thread Philip Guenther
ok guenther@

(Thanks!)

On Sat, Sep 10, 2022 at 10:20 AM Vitaliy Makkoveev  wrote:

> We have no interest in the pru_rcvd() return value. Also, we call pru_rcvd()
> only if the socket's protocol has the PR_WANTRCVD flag set. Such sockets
> are route domain, tcp(4) and unix(4) sockets.
>
> This diff keeps the PR_WANTRCVD check. On the other hand, we could always
> call pru_rcvd() and do the "pru_rcvd != NULL" check within, but in the
> future, with per-buffer locking, we could have some re-locking around
> the pru_rcvd() call and I want to do it outside the wrapper.
>
>
> Index: sys/kern/uipc_usrreq.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
> retrieving revision 1.185
> diff -u -p -r1.185 uipc_usrreq.c
> --- sys/kern/uipc_usrreq.c  3 Sep 2022 22:43:38 -   1.185
> +++ sys/kern/uipc_usrreq.c  10 Sep 2022 18:51:42 -
> @@ -363,7 +363,7 @@ uipc_shutdown(struct socket *so)
> return (0);
>  }
>
> -int
> +void
>  uipc_rcvd(struct socket *so)
>  {
> struct socket *so2;
> @@ -390,8 +390,6 @@ uipc_rcvd(struct socket *so)
> default:
> panic("uipc 2");
> }
> -
> -   return (0);
>  }
>
>  int
> Index: sys/net/rtsock.c
> ===
> RCS file: /cvs/src/sys/net/rtsock.c,v
> retrieving revision 1.355
> diff -u -p -r1.355 rtsock.c
> --- sys/net/rtsock.c8 Sep 2022 10:22:06 -   1.355
> +++ sys/net/rtsock.c10 Sep 2022 18:51:42 -
> @@ -115,7 +115,7 @@ int route_attach(struct socket *, int);
>  introute_detach(struct socket *);
>  introute_disconnect(struct socket *);
>  introute_shutdown(struct socket *);
> -introute_rcvd(struct socket *);
> +void   route_rcvd(struct socket *);
>  introute_send(struct socket *, struct mbuf *, struct mbuf *,
> struct mbuf *);
>  introute_abort(struct socket *);
> @@ -299,7 +299,7 @@ route_shutdown(struct socket *so)
> return (0);
>  }
>
> -int
> +void
>  route_rcvd(struct socket *so)
>  {
> struct rtpcb *rop = sotortpcb(so);
> @@ -314,8 +314,6 @@ route_rcvd(struct socket *so)
> ((sbspace(rop->rop_socket, >rop_socket->so_rcv) ==
> rop->rop_socket->so_rcv.sb_hiwat)))
> rop->rop_flags &= ~ROUTECB_FLAG_FLUSH;
> -
> -   return (0);
>  }
>
>  int
> Index: sys/netinet/tcp_usrreq.c
> ===
> RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v
> retrieving revision 1.207
> diff -u -p -r1.207 tcp_usrreq.c
> --- sys/netinet/tcp_usrreq.c3 Sep 2022 22:43:38 -   1.207
> +++ sys/netinet/tcp_usrreq.c10 Sep 2022 18:51:42 -
> @@ -792,18 +792,17 @@ out:
>  /*
>   * After a receive, possibly send window update to peer.
>   */
> -int
> +void
>  tcp_rcvd(struct socket *so)
>  {
> struct inpcb *inp;
> struct tcpcb *tp;
> -   int error;
> short ostate;
>
> soassertlocked(so);
>
> -   if ((error = tcp_sogetpcb(so, , )))
> -   return (error);
> +   if (tcp_sogetpcb(so, , ))
> +   return;
>
> if (so->so_options & SO_DEBUG)
> ostate = tp->t_state;
> @@ -820,7 +819,6 @@ tcp_rcvd(struct socket *so)
>
> if (so->so_options & SO_DEBUG)
> tcp_trace(TA_USER, ostate, tp, tp, NULL, PRU_RCVD, 0);
> -   return (0);
>  }
>
>  /*
> Index: sys/netinet/tcp_var.h
> ===
> RCS file: /cvs/src/sys/netinet/tcp_var.h,v
> retrieving revision 1.157
> diff -u -p -r1.157 tcp_var.h
> --- sys/netinet/tcp_var.h   3 Sep 2022 22:43:38 -   1.157
> +++ sys/netinet/tcp_var.h   10 Sep 2022 18:51:42 -
> @@ -725,7 +725,7 @@ int  tcp_connect(struct socket *, struct
>  int tcp_accept(struct socket *, struct mbuf *);
>  int tcp_disconnect(struct socket *);
>  int tcp_shutdown(struct socket *);
> -int tcp_rcvd(struct socket *);
> +voidtcp_rcvd(struct socket *);
>  int tcp_send(struct socket *, struct mbuf *, struct mbuf *,
>  struct mbuf *);
>  int tcp_abort(struct socket *);
> Index: sys/sys/protosw.h
> ===
> RCS file: /cvs/src/sys/sys/protosw.h,v
> retrieving revision 1.55
> diff -u -p -r1.55 protosw.h
> --- sys/sys/protosw.h   5 Sep 2022 14:56:09 -   1.55
> +++ sys/sys/protosw.h   10 Sep 2022 18:51:42 -
> @@ -72,7 +72,7 @@ struct pr_usrreqs {
> int (*pru_accept)(struct socket *, struct mbuf *);
> int (*pru_disconnect)(struct socket *);
> int (*pru_shutdown)(struct socket *);
> -   int (*pru_rcvd)(struct socket *);
> +   void(*pru_rcvd)(struct socket *);
> int (*pru_send)(struct socket *, struct mbuf *, struct mbuf *,
> struct mbuf *);
> int (*pru_abort)(struct socket *);
> @@ -336,12 

Re: strtonum.3: Use the proper macro for "long long"

2022-09-10 Thread Ingo Schwarze
Hi,

yes, this is completely correct, with one tiny exception that should
be fixed while committing, see in-line below.

Jason, since you already started working on this, could you please
commit this patch with OK schwarze@?

I'm surprised there was still so much .Li in our tree where .Vt
should have been.  These are not even edge cases but completely
unambiguous .Vt.

Note that the mdoc(7) manual deprecates .Li (it is a presentational
macro with an invisible effect - we usually want semantic rather
than presentational markup).  Rare cases exist where it may not be
completely obvious what to use instead, but here it is.

Thanks,
  Ingo


Josiah Frentsos wrote on Sat, Sep 10, 2022 at 12:29:28PM -0400:

> Index: lib/libc/gen/frexp.3
> Index: lib/libc/gen/getgrent.3
> Index: lib/libc/gen/getpwent.3
> Index: lib/libc/gen/getpwnam.3
> Index: lib/libc/gen/glob.3
> Index: lib/libc/gen/isalnum.3
> Index: lib/libc/gen/isalpha.3
> Index: lib/libc/gen/isblank.3
> Index: lib/libc/gen/iscntrl.3
> Index: lib/libc/gen/isdigit.3
> Index: lib/libc/gen/isgraph.3
> Index: lib/libc/gen/islower.3
> Index: lib/libc/gen/isprint.3
> Index: lib/libc/gen/ispunct.3
> Index: lib/libc/gen/isspace.3
> Index: lib/libc/gen/isupper.3
> Index: lib/libc/gen/isxdigit.3
> Index: lib/libc/gen/lockf.3
> Index: lib/libc/gen/login_cap.3
> Index: lib/libc/gen/modf.3
> Index: lib/libc/gen/opendir.3
> Index: lib/libc/gen/setjmp.3
> Index: lib/libc/gen/times.3
> Index: lib/libc/gen/tolower.3
> Index: lib/libc/gen/toupper.3
> Index: lib/libc/gen/uname.3
> Index: lib/libc/gen/utime.3
> Index: lib/libc/locale/localeconv.3
> Index: lib/libc/net/ether_aton.3
> Index: lib/libc/net/getaddrinfo.3
> Index: lib/libc/net/getnameinfo.3
> Index: lib/libc/net/getpeereid.3
> Index: lib/libc/net/getrrsetbyname.3
> Index: lib/libc/net/htonl.3
> ===
> RCS file: /cvs/src/lib/libc/net/htonl.3,v
> retrieving revision 1.5
> diff -u -p -r1.5 htonl.3
> --- lib/libc/net/htonl.3  13 Feb 2019 07:02:09 -  1.5
> +++ lib/libc/net/htonl.3  10 Sep 2022 16:10:01 -
> @@ -66,14 +66,14 @@ or
>  .Sq l )
>  is a mnemonic
>  for the traditional names for such quantities,
> -.Li short
> +.Vt short
>  and
> -.Li long ,
> +.Vt long ,
>  respectively.

This is misleading, as explained in the very next sentence.
I suggest just dropping the .Li markup in these two instances
without any replacement, or .Dq if you insist on some markup.

>  Today, the C concept of
> -.Li short
> +.Vt short
>  and
> -.Li long
> +.Vt long
>  integers need not coincide with this traditional misunderstanding.
>  On machines which have a byte order which is the same as the network
>  order, routines are defined as null macros.

This part is correct.

> Index: lib/libc/net/inet_addr.3
> Index: lib/libc/net/inet_net_ntop.3
> Index: lib/libc/net/inet_ntop.3
> Index: lib/libc/regex/regex.3
> Index: lib/libc/rpc/xdr.3
> Index: lib/libc/stdio/fseek.3
> Index: lib/libc/stdio/getc.3
> Index: lib/libc/stdio/putc.3
> Index: lib/libc/stdio/ungetc.3
> Index: lib/libc/stdlib/atof.3
> Index: lib/libc/stdlib/atoi.3
> Index: lib/libc/stdlib/atol.3
> Index: lib/libc/stdlib/atoll.3
> Index: lib/libc/stdlib/div.3
> Index: lib/libc/stdlib/getopt_long.3
> Index: lib/libc/stdlib/imaxdiv.3
> Index: lib/libc/stdlib/ldiv.3
> Index: lib/libc/stdlib/lldiv.3
> Index: lib/libc/stdlib/strtod.3
> Index: lib/libc/stdlib/strtonum.3
> Index: lib/libc/string/memccpy.3
> Index: lib/libc/string/memchr.3
> Index: lib/libc/string/memcmp.3
> Index: lib/libc/string/memset.3
> Index: lib/libc/sys/accept.2
> Index: lib/libc/sys/fcntl.2
> Index: lib/libc/sys/getpeername.2
> Index: lib/libc/sys/getrlimit.2
> Index: lib/libc/sys/getsockname.2
> Index: lib/libc/sys/getsockopt.2
> Index: lib/libc/sys/ioctl.2
> Index: lib/libc/sys/ptrace.2
> Index: lib/libc/sys/quotactl.2
> Index: lib/libc/termios/tcsetattr.3
> Index: lib/libc/time/ctime.3
> Index: lib/libradius/radius_new_request_packet.3
> Index: share/man/man3/bit_alloc.3
> Index: share/man/man3/dl_iterate_phdr.3
> Index: share/man/man4/bpf.4
> Index: share/man/man4/ddb.4
> Index: share/man/man4/openprom.4
> Index: share/man/man4/options.4
> Index: share/man/man4/speaker.4
> Index: share/man/man5/ranlib.5
> Index: share/man/man8/crash.8
> Index: share/man/man9/printf.9
> Index: share/man/man9/socreate.9
> Index: share/man/man9/style.9
> Index: usr.bin/ssh/sshd.8
> Index: usr.sbin/zdump/zdump.8



uvm_vnode locking & documentation

2022-09-10 Thread Martin Pieuchot
Previous fix from gnezdo@ pointed out that `u_flags' accesses should be
serialized by `vmobjlock'.  Diff below documents this and fixes the
remaining places where the lock isn't yet taken.  One exception still
remains, the first loop of uvm_vnp_sync().  This cannot be fixed right
now due to possible deadlocks but that's not a reason for not documenting
& fixing the rest of this file.
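
As a rough userland analogy for the pattern the diff moves to (test and
modify the flags only with the object lock held, and sleep without ever
looking at the flags unlocked), consider the following sketch.  It uses
POSIX threads rather than rwlock(9), and every name in it (struct obj,
OBJ_*, obj_attach(), obj_unblock()) is hypothetical; pthread_cond_wait()
stands in for rwsleep_nsec() because both drop the lock while asleep and
retake it before returning.

```c
#include <assert.h>
#include <pthread.h>

#define OBJ_BLOCKED	0x1	/* hypothetical analogue of UVM_VNODE_BLOCKED */
#define OBJ_WANTED	0x2	/* hypothetical analogue of UVM_VNODE_WANTED */
#define OBJ_VALID	0x4	/* hypothetical analogue of UVM_VNODE_VALID */

struct obj {
	pthread_mutex_t	lock;	/* plays the role of vmobjlock */
	pthread_cond_t	cv;	/* wait channel */
	int		flags;	/* only touched with lock held */
	int		refs;
};

/* Take a reference, waiting for as long as the object is blocked. */
static void
obj_attach(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->flags & OBJ_BLOCKED) {
		o->flags |= OBJ_WANTED;
		/* Drops the lock while asleep, retakes it on wakeup. */
		pthread_cond_wait(&o->cv, &o->lock);
	}
	o->flags |= OBJ_VALID;
	o->refs++;
	assert(o->refs > 0);
	pthread_mutex_unlock(&o->lock);
}

/* Clear the blocked state and wake any waiters, all under the lock. */
static void
obj_unblock(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->flags &= ~OBJ_BLOCKED;
	if (o->flags & OBJ_WANTED) {
		o->flags &= ~OBJ_WANTED;
		pthread_cond_broadcast(&o->cv);
	}
	pthread_mutex_unlock(&o->lock);
}
```

The point of the sketch is only the ordering: the lock is taken before the
first read of the flags, so a waiter cannot miss a wakeup between testing
OBJ_BLOCKED and going to sleep.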

This has been tested on amd64 and arm64.

Comments?  Oks?

Index: uvm/uvm_vnode.c
===
RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.128
diff -u -p -r1.128 uvm_vnode.c
--- uvm/uvm_vnode.c 10 Sep 2022 16:14:36 -  1.128
+++ uvm/uvm_vnode.c 10 Sep 2022 18:23:57 -
@@ -68,11 +68,8 @@
  * we keep a simpleq of vnodes that are currently being sync'd.
  */
 
-LIST_HEAD(uvn_list_struct, uvm_vnode);
-struct uvn_list_struct uvn_wlist;  /* writeable uvns */
-
-SIMPLEQ_HEAD(uvn_sq_struct, uvm_vnode);
-struct uvn_sq_struct uvn_sync_q;   /* sync'ing uvns */
+LIST_HEAD(, uvm_vnode) uvn_wlist;  /* [K] writeable uvns */
+SIMPLEQ_HEAD(, uvm_vnode)  uvn_sync_q; /* [S] sync'ing uvns */
 struct rwlock uvn_sync_lock;   /* locks sync operation */
 
 extern int rebooting;
@@ -144,41 +141,40 @@ uvn_attach(struct vnode *vp, vm_prot_t a
struct partinfo pi;
u_quad_t used_vnode_size = 0;
 
-   /* first get a lock on the uvn. */
-   while (uvn->u_flags & UVM_VNODE_BLOCKED) {
-   uvn->u_flags |= UVM_VNODE_WANTED;
-   tsleep_nsec(uvn, PVM, "uvn_attach", INFSLP);
-   }
-
/* if we're mapping a BLK device, make sure it is a disk. */
if (vp->v_type == VBLK && bdevsw[major(vp->v_rdev)].d_type != D_DISK) {
return NULL;
}
 
+   /* first get a lock on the uvn. */
+   rw_enter(uvn->u_obj.vmobjlock, RW_WRITE);
+   while (uvn->u_flags & UVM_VNODE_BLOCKED) {
+   uvn->u_flags |= UVM_VNODE_WANTED;
+   rwsleep_nsec(uvn, uvn->u_obj.vmobjlock, PVM, "uvn_attach",
+   INFSLP);
+   }
+
/*
 * now uvn must not be in a blocked state.
 * first check to see if it is already active, in which case
 * we can bump the reference count, check to see if we need to
 * add it to the writeable list, and then return.
 */
-   rw_enter(uvn->u_obj.vmobjlock, RW_WRITE);
if (uvn->u_flags & UVM_VNODE_VALID) {   /* already active? */
KASSERT(uvn->u_obj.uo_refs > 0);
 
uvn->u_obj.uo_refs++;   /* bump uvn ref! */
-   rw_exit(uvn->u_obj.vmobjlock);
 
/* check for new writeable uvn */
if ((accessprot & PROT_WRITE) != 0 &&
(uvn->u_flags & UVM_VNODE_WRITEABLE) == 0) {
-   LIST_INSERT_HEAD(&uvn_wlist, uvn, u_wlist);
-   /* we are now on wlist! */
uvn->u_flags |= UVM_VNODE_WRITEABLE;
+   LIST_INSERT_HEAD(&uvn_wlist, uvn, u_wlist);
}
+   rw_exit(uvn->u_obj.vmobjlock);
 
return (&uvn->u_obj);
}
-   rw_exit(uvn->u_obj.vmobjlock);
 
/*
 * need to call VOP_GETATTR() to get the attributes, but that could
@@ -189,6 +185,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a
 * it.
 */
uvn->u_flags = UVM_VNODE_ALOCK;
+   rw_exit(uvn->u_obj.vmobjlock);
 
if (vp->v_type == VBLK) {
/*
@@ -213,9 +210,11 @@ uvn_attach(struct vnode *vp, vm_prot_t a
}
 
if (result != 0) {
+   rw_enter(uvn->u_obj.vmobjlock, RW_WRITE);
if (uvn->u_flags & UVM_VNODE_WANTED)
wakeup(uvn);
uvn->u_flags = 0;
+   rw_exit(uvn->u_obj.vmobjlock);
return NULL;
}
 
@@ -236,18 +235,19 @@ uvn_attach(struct vnode *vp, vm_prot_t a
uvn->u_nio = 0;
uvn->u_size = used_vnode_size;
 
-   /* if write access, we need to add it to the wlist */
-   if (accessprot & PROT_WRITE) {
-   LIST_INSERT_HEAD(&uvn_wlist, uvn, u_wlist);
-   uvn->u_flags |= UVM_VNODE_WRITEABLE;/* we are on wlist! */
-   }
-
/*
 * add a reference to the vnode.   this reference will stay as long
 * as there is a valid mapping of the vnode.   dropped when the
 * reference count goes to zero.
 */
vref(vp);
+
+   /* if write access, we need to add it to the wlist */
+   if (accessprot & PROT_WRITE) {
+   uvn->u_flags |= UVM_VNODE_WRITEABLE;
+   LIST_INSERT_HEAD(&uvn_wlist, uvn, u_wlist);
+   }
+
if (oldflags & UVM_VNODE_WANTED)
wakeup(uvn);
 
@@ -273,6 +273,7 @@ uvn_reference(struct uvm_object *uobj)
struct uvm_vnode *uvn = (struct uvm_vnode *) uobj;
 #endif
 
+   

soreceive() with shared netlock for raw sockets

2022-09-10 Thread Vitaliy Makkoveev
As it was done for udp and divert sockets.

Index: sys/netinet/ip_var.h
===
RCS file: /cvs/src/sys/netinet/ip_var.h,v
retrieving revision 1.104
diff -u -p -r1.104 ip_var.h
--- sys/netinet/ip_var.h3 Sep 2022 22:43:38 -   1.104
+++ sys/netinet/ip_var.h10 Sep 2022 19:41:56 -
@@ -258,6 +258,8 @@ int  rip_output(struct mbuf *, struct so
struct mbuf *);
 int rip_attach(struct socket *, int);
 int rip_detach(struct socket *);
+void rip_lock(struct socket *);
+void rip_unlock(struct socket *);
 int rip_bind(struct socket *so, struct mbuf *, struct proc *);
 int rip_connect(struct socket *, struct mbuf *);
 int rip_disconnect(struct socket *);
Index: sys/netinet/raw_ip.c
===
RCS file: /cvs/src/sys/netinet/raw_ip.c,v
retrieving revision 1.147
diff -u -p -r1.147 raw_ip.c
--- sys/netinet/raw_ip.c3 Sep 2022 22:43:38 -   1.147
+++ sys/netinet/raw_ip.c10 Sep 2022 19:41:56 -
@@ -106,6 +106,8 @@ struct inpcbtable rawcbtable;
 const struct pr_usrreqs rip_usrreqs = {
.pru_attach = rip_attach,
.pru_detach = rip_detach,
+   .pru_lock   = rip_lock,
+   .pru_unlock = rip_unlock,
.pru_bind   = rip_bind,
.pru_connect= rip_connect,
.pru_disconnect = rip_disconnect,
@@ -220,12 +222,19 @@ rip_input(struct mbuf **mp, int *offp, i
else
n = m_copym(m, 0, M_COPYALL, M_NOWAIT);
if (n != NULL) {
+   int ret;
+
if (inp->inp_flags & INP_CONTROLOPTS ||
inp->inp_socket->so_options & SO_TIMESTAMP)
ip_savecontrol(inp, , ip, n);
-   if (sbappendaddr(inp->inp_socket,
+
+   mtx_enter(>inp_mtx);
+   ret = sbappendaddr(inp->inp_socket,
>inp_socket->so_rcv,
-   sintosa(), n, opts) == 0) {
+   sintosa(), n, opts);
+   mtx_leave(>inp_mtx);
+
+   if (ret == 0) {
/* should notify about lost packet */
m_freem(n);
m_freem(opts);
@@ -498,6 +507,24 @@ rip_detach(struct socket *so)
in_pcbdetach(inp);
 
return (0);
+}
+
+void
+rip_lock(struct socket *so)
+{
+   struct inpcb *inp = sotoinpcb(so);
+
+   NET_ASSERT_LOCKED();
+   mtx_enter(&inp->inp_mtx);
+}
+
+void
+rip_unlock(struct socket *so)
+{
+   struct inpcb *inp = sotoinpcb(so);
+
+   NET_ASSERT_LOCKED();
+   mtx_leave(&inp->inp_mtx);
 }
 
 int
Index: sys/netinet6/ip6_var.h
===
RCS file: /cvs/src/sys/netinet6/ip6_var.h,v
retrieving revision 1.102
diff -u -p -r1.102 ip6_var.h
--- sys/netinet6/ip6_var.h  3 Sep 2022 22:43:38 -   1.102
+++ sys/netinet6/ip6_var.h  10 Sep 2022 19:41:56 -
@@ -353,6 +353,8 @@ int rip6_output(struct mbuf *, struct so
struct mbuf *);
 int    rip6_attach(struct socket *, int);
 int    rip6_detach(struct socket *);
+void   rip6_lock(struct socket *);
+void   rip6_unlock(struct socket *);
 int    rip6_bind(struct socket *, struct mbuf *, struct proc *);
 int    rip6_connect(struct socket *, struct mbuf *);
 int    rip6_disconnect(struct socket *);
Index: sys/netinet6/raw_ip6.c
===
RCS file: /cvs/src/sys/netinet6/raw_ip6.c,v
retrieving revision 1.168
diff -u -p -r1.168 raw_ip6.c
--- sys/netinet6/raw_ip6.c  3 Sep 2022 22:43:38 -   1.168
+++ sys/netinet6/raw_ip6.c  10 Sep 2022 19:41:56 -
@@ -108,6 +108,8 @@ struct cpumem *rip6counters;
 const struct pr_usrreqs rip6_usrreqs = {
.pru_attach = rip6_attach,
.pru_detach = rip6_detach,
+   .pru_lock   = rip6_lock,
+   .pru_unlock = rip6_unlock,
.pru_bind   = rip6_bind,
.pru_connect= rip6_connect,
.pru_disconnect = rip6_disconnect,
@@ -261,13 +263,20 @@ rip6_input(struct mbuf **mp, int *offp, 
else
n = m_copym(m, 0, M_COPYALL, M_NOWAIT);
if (n != NULL) {
+   int ret;
+
if (in6p->inp_flags & IN6P_CONTROLOPTS)
ip6_savecontrol(in6p, n, );
/* strip intermediate headers */
m_adj(n, *offp);
-   if (sbappendaddr(in6p->inp_socket,
+
+   mtx_enter(>inp_mtx);
+   ret = sbappendaddr(in6p->inp_socket,
>inp_socket->so_rcv,
-   sin6tosa(), n, opts) == 0) {
+   

Change pru_rcvd() return type to the type of void

2022-09-10 Thread Vitaliy Makkoveev
We have no interest in the pru_rcvd() return value. Also, we call pru_rcvd()
only if the socket's protocol has the PR_WANTRCVD flag set. Such sockets
are route domain, tcp(4) and unix(4) sockets.

This diff keeps the PR_WANTRCVD check. On the other hand, we could always
call pru_rcvd() and do the "pru_rcvd != NULL" check within, but in the
future, with per-buffer locking, we could have some re-locking around
the pru_rcvd() call and I want to do it outside the wrapper.
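
The split described above (flag check at the call site, a thin wrapper
that just dispatches) can be sketched in standalone C roughly as follows.
This is an illustration under assumptions, not the kernel code: struct
sk_socket, struct sk_usrreqs, SK_WANTRCVD and the sk_*() functions are
invented stand-ins for struct socket, struct pr_usrreqs, PR_WANTRCVD and
the pru_rcvd() wrapper.

```c
#include <assert.h>
#include <stddef.h>

#define SK_WANTRCVD	0x01	/* hypothetical stand-in for PR_WANTRCVD */

struct sk_socket;

struct sk_usrreqs {
	void	(*rcvd)(struct sk_socket *);	/* nothing to return */
};

struct sk_socket {
	int			 flags;
	const struct sk_usrreqs	*reqs;
	int			 rcvd_calls;	/* bookkeeping for the demo */
};

/* Thin wrapper: no NULL check, the caller has already tested the flag. */
static inline void
sk_rcvd(struct sk_socket *so)
{
	assert(so->reqs->rcvd != NULL);
	(*so->reqs->rcvd)(so);
}

/* Call site: the flag check (and any later re-locking) lives out here. */
static void
sk_after_receive(struct sk_socket *so)
{
	if (so->flags & SK_WANTRCVD)
		sk_rcvd(so);
}

/* Example handler for a protocol that wants the notification. */
static void
sk_demo_rcvd(struct sk_socket *so)
{
	so->rcvd_calls++;
}
```

A protocol without the flag never reaches the function pointer, so it may
leave the handler unset, which is the property the flag check preserves.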


Index: sys/kern/uipc_usrreq.c
===
RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.185
diff -u -p -r1.185 uipc_usrreq.c
--- sys/kern/uipc_usrreq.c  3 Sep 2022 22:43:38 -   1.185
+++ sys/kern/uipc_usrreq.c  10 Sep 2022 18:51:42 -
@@ -363,7 +363,7 @@ uipc_shutdown(struct socket *so)
return (0);
 }
 
-int
+void
 uipc_rcvd(struct socket *so)
 {
struct socket *so2;
@@ -390,8 +390,6 @@ uipc_rcvd(struct socket *so)
default:
panic("uipc 2");
}
-
-   return (0);
 }
 
 int
Index: sys/net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.355
diff -u -p -r1.355 rtsock.c
--- sys/net/rtsock.c8 Sep 2022 10:22:06 -   1.355
+++ sys/net/rtsock.c10 Sep 2022 18:51:42 -
@@ -115,7 +115,7 @@ int route_attach(struct socket *, int);
 int    route_detach(struct socket *);
 int    route_disconnect(struct socket *);
 int    route_shutdown(struct socket *);
-int    route_rcvd(struct socket *);
+void   route_rcvd(struct socket *);
 int    route_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *);
 int    route_abort(struct socket *);
@@ -299,7 +299,7 @@ route_shutdown(struct socket *so)
return (0);
 }
 
-int
+void
 route_rcvd(struct socket *so)
 {
struct rtpcb *rop = sotortpcb(so);
@@ -314,8 +314,6 @@ route_rcvd(struct socket *so)
((sbspace(rop->rop_socket, &rop->rop_socket->so_rcv) ==
rop->rop_socket->so_rcv.sb_hiwat)))
rop->rop_flags &= ~ROUTECB_FLAG_FLUSH;
-
-   return (0);
 }
 
 int
Index: sys/netinet/tcp_usrreq.c
===
RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.207
diff -u -p -r1.207 tcp_usrreq.c
--- sys/netinet/tcp_usrreq.c3 Sep 2022 22:43:38 -   1.207
+++ sys/netinet/tcp_usrreq.c10 Sep 2022 18:51:42 -
@@ -792,18 +792,17 @@ out:
 /*
  * After a receive, possibly send window update to peer.
  */
-int
+void
 tcp_rcvd(struct socket *so)
 {
struct inpcb *inp;
struct tcpcb *tp;
-   int error;
short ostate;
 
soassertlocked(so);
 
-   if ((error = tcp_sogetpcb(so, &inp, &tp)))
-   return (error);
+   if (tcp_sogetpcb(so, &inp, &tp))
+   return;
 
if (so->so_options & SO_DEBUG)
ostate = tp->t_state;
@@ -820,7 +819,6 @@ tcp_rcvd(struct socket *so)
 
if (so->so_options & SO_DEBUG)
tcp_trace(TA_USER, ostate, tp, tp, NULL, PRU_RCVD, 0);
-   return (0);
 }
 
 /*
Index: sys/netinet/tcp_var.h
===
RCS file: /cvs/src/sys/netinet/tcp_var.h,v
retrieving revision 1.157
diff -u -p -r1.157 tcp_var.h
--- sys/netinet/tcp_var.h   3 Sep 2022 22:43:38 -   1.157
+++ sys/netinet/tcp_var.h   10 Sep 2022 18:51:42 -
@@ -725,7 +725,7 @@ int  tcp_connect(struct socket *, struct
 int tcp_accept(struct socket *, struct mbuf *);
 int tcp_disconnect(struct socket *);
 int tcp_shutdown(struct socket *);
-int tcp_rcvd(struct socket *);
+void tcp_rcvd(struct socket *);
 int tcp_send(struct socket *, struct mbuf *, struct mbuf *,
 struct mbuf *);
 int tcp_abort(struct socket *);
Index: sys/sys/protosw.h
===
RCS file: /cvs/src/sys/sys/protosw.h,v
retrieving revision 1.55
diff -u -p -r1.55 protosw.h
--- sys/sys/protosw.h   5 Sep 2022 14:56:09 -   1.55
+++ sys/sys/protosw.h   10 Sep 2022 18:51:42 -
@@ -72,7 +72,7 @@ struct pr_usrreqs {
int (*pru_accept)(struct socket *, struct mbuf *);
int (*pru_disconnect)(struct socket *);
int (*pru_shutdown)(struct socket *);
-   int (*pru_rcvd)(struct socket *);
+   void (*pru_rcvd)(struct socket *);
int (*pru_send)(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *);
int (*pru_abort)(struct socket *);
@@ -336,12 +336,10 @@ pru_shutdown(struct socket *so)
return (*so->so_proto->pr_usrreqs->pru_shutdown)(so);
 }
 
-static inline int
+static inline void
 pru_rcvd(struct socket *so)
 {
-   if (so->so_proto->pr_usrreqs->pru_rcvd)
-   return (*so->so_proto->pr_usrreqs->pru_rcvd)(so);
-   return (EOPNOTSUPP);
+   

Re: immutable userland mappings

2022-09-10 Thread Theo de Raadt
Theo de Raadt  wrote:

> Theo de Raadt  wrote:
> 
> > Theo de Raadt  wrote:
> > 
> > > In this version of the diff, the kernel manages to mark immutable most
> > > of the main binary, and in the shared-binary case, also most of ld.so.
> > > But it cannot mark all of the ELF mapping -- because of two remaining
> > > problems (RELRO in .data, and the malloc.c self-protected bookkeeping
> > > page in .bss).  I am looking into various solutions for both of those.
> 
> Yet another version of the diff as I incrementally get it working better.
> Call it version 22..
> 
> Some things of note:
> 
> 1. Some linkers appear to be creating non-aligned relro sections, which
>has security implications as they cannot be mprotected correctly.  I
>am happy this work has exposed the problem as severe.  I have a
>workaround in ld.so for now that allows these cases to work, and
>later on we can perhaps add a warning to ld.so to identify these
>linkers and get them fixed.
> 
> 2. But the relro is still not handled perfectly, and I hope someone else's
>eyes can compare addresses and spot what's wrong.
> 
> 3. ld.so has to cut the list of mutable mappings from the immutable mappings,
>    before applying them (late, to satisfy 1).  This is in
>    _dl_apply_mutable().  After redoing this a couple of times, I am still
>    not proud of it.
> 
> 4. uvm_unmap_remove() must walk the entries in the region twice.  It cannot
>    do unmapping work until it knows the region is completely mutable.  This
>    might turn into a performance issue.
> 
> 5. binutils ld support completely untested, I mainly went in there to fix
>objdump and readelf.
> 
> 6. It would be nice to hear of a pkg that actually has a problem with this
>change.  I haven't found any yet but don't run many myself.
> 
> If anyone wants to debug issues, uncomment the // _dl_printf's in ld.so,
> and expect a lot of noise.  Then do something like "ktrace -di program
> >& log", and generate a kdump separately.  It also helps if you test
> with programs that don't exit, so you can procmap -a -p $pid.  In the
> kdump output, it is important to look for "mprotect -1", because that
> provides evidence of the worst (silent) problems...

Oops, 1 line error in the diff, try this instead.

And to point 1 above, notice that a rare page of shared library mapping
is not immutable, right on the edge of the relro...

Index: gnu/llvm/lld/ELF/ScriptParser.cpp
===
RCS file: /cvs/src/gnu/llvm/lld/ELF/ScriptParser.cpp,v
retrieving revision 1.1.1.4
diff -u -p -u -r1.1.1.4 ScriptParser.cpp
--- gnu/llvm/lld/ELF/ScriptParser.cpp   17 Dec 2021 12:25:02 -  1.1.1.4
+++ gnu/llvm/lld/ELF/ScriptParser.cpp   2 Sep 2022 15:23:20 -
@@ -1478,6 +1478,7 @@ unsigned ScriptParser::readPhdrType() {
  .Case("PT_GNU_EH_FRAME", PT_GNU_EH_FRAME)
  .Case("PT_GNU_STACK", PT_GNU_STACK)
  .Case("PT_GNU_RELRO", PT_GNU_RELRO)
+ .Case("PT_OPENBSD_MUTABLE", PT_OPENBSD_MUTABLE)
  .Case("PT_OPENBSD_RANDOMIZE", PT_OPENBSD_RANDOMIZE)
  .Case("PT_OPENBSD_WXNEEDED", PT_OPENBSD_WXNEEDED)
  .Case("PT_OPENBSD_BOOTDATA", PT_OPENBSD_BOOTDATA)
Index: gnu/llvm/lld/ELF/Writer.cpp
===
RCS file: /cvs/src/gnu/llvm/lld/ELF/Writer.cpp,v
retrieving revision 1.3
diff -u -p -u -r1.3 Writer.cpp
--- gnu/llvm/lld/ELF/Writer.cpp 17 Dec 2021 14:46:47 -  1.3
+++ gnu/llvm/lld/ELF/Writer.cpp 2 Sep 2022 21:53:22 -
@@ -146,7 +146,7 @@ StringRef elf::getOutputSectionName(cons
{".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.",
     ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", ".tbss.",
 ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab.",
-".openbsd.randomdata."})
+".openbsd.randomdata.", ".openbsd.mutable." })
 if (isSectionPrefix(v, s->name))
   return v.drop_back();
 
@@ -2469,6 +2469,12 @@ std::vector Writer::c
   part.ehFrame->getParent() && part.ehFrameHdr->getParent())
 addHdr(PT_GNU_EH_FRAME, part.ehFrameHdr->getParent()->getPhdrFlags())
 ->add(part.ehFrameHdr->getParent());
+
+  // PT_OPENBSD_MUTABLE is an OpenBSD-specific feature. That makes
+  // the dynamic linker fill the segment with zero data, like bss, but
+  // it can be treated differently.
+  if (OutputSection *cmd = findSection(".openbsd.mutable", partNo))
+addHdr(PT_OPENBSD_MUTABLE, cmd->getPhdrFlags())->add(cmd);
 
   // PT_OPENBSD_RANDOMIZE is an OpenBSD-specific feature. That makes
   // the dynamic linker fill the segment with random data.
Index: gnu/usr.bin/binutils/bfd/elf.c
===
RCS file: /cvs/src/gnu/usr.bin/binutils/bfd/elf.c,v
retrieving revision 1.23
diff -u -p -u -r1.23 

Re: immutable userland mappings

2022-09-10 Thread Theo de Raadt
Theo de Raadt  wrote:

> Theo de Raadt  wrote:
> 
> > In this version of the diff, the kernel manages to mark immutable most of
> > the main binary, and in the shared-binary case, also most of ld.so.  But it
> > cannot mark all of the ELF mapping -- because of two remaining problems
> > (RELRO in .data, and the malloc.c self-protected bookkeeping page in
> > .bss).  I am looking into various solutions for both of those.

Yet another version of the diff as I incrementally get it working better.
Call it version 22..

Some things of note:

1. Some linkers appear to be creating non-aligned relro sections, which
   has security implications as they cannot be mprotected correctly.  I
   am happy this work has exposed the problem as severe.  I have a
   workaround in ld.so for now that allows these cases to work, and
   later on we can perhaps add a warning to ld.so to identify these
   linkers and get them fixed.

2. But the relro is still not handled perfectly, and I hope someone else's
   eyes can compare addresses and spot what's wrong.

3. ld.so has to cut the list of mutable mappings from the immutable mappings,
   before applying them (late, to satisfy 1).  This is in _dl_apply_mutable().
   After redoing this a couple of times, I am still not proud of it.

4. uvm_unmap_remove() must walk the entries in the region twice.  It cannot
   do unmapping work until it knows the region is completely mutable.  This
   might turn into a performance issue.

5. binutils ld support completely untested, I mainly went in there to fix
   objdump and readelf.

6. It would be nice to hear of a pkg that actually has a problem with this
   change.  I haven't found any yet but don't run many myself.
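
To illustrate why point 1 matters: memory protection is applied in whole
pages, so a relro range that does not start and end on page boundaries
cannot be covered exactly.  The sketch below computes the largest
page-aligned range fully contained in a given range; anything outside it
shares a page with other data.  This is only an illustration of the
alignment arithmetic, not the ld.so workaround from the diff, and struct
range and range_inner_pages() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: a half-open address range [start, end). */
struct range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Shrink r to the largest page-aligned range it fully contains.
 * Returns false if no whole page fits.  pagesz must be a power of two.
 * Overflow near the top of the address space is not handled.
 */
static bool
range_inner_pages(struct range r, uintptr_t pagesz, struct range *out)
{
	uintptr_t mask = pagesz - 1;

	assert(pagesz != 0 && (pagesz & mask) == 0);
	out->start = (r.start + mask) & ~mask;	/* round start up */
	out->end = r.end & ~mask;		/* round end down */
	return out->start < out->end;
}
```

For an already aligned range the result equals the input; for a
misaligned one, the bytes between the input and the result are the ones
that cannot be protected without also affecting their neighbours.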

If anyone wants to debug issues, uncomment the // _dl_printf's in ld.so,
and expect a lot of noise.  Then do something like "ktrace -di program
>& log", and generate a kdump separately.  It also helps if you test
with programs that don't exit, so you can procmap -a -p $pid.  In the
kdump output, it is important to look for "mprotect -1", because that
provides evidence of the worst (silent) problems...
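
Returning to point 4 in the list above, the two-walk requirement can be
pictured with a small all-or-nothing sketch: the first pass only checks,
and the second pass does the work once the whole region is known to be
mutable.  The types and names here (struct entry, unmap_all_or_nothing())
are invented for illustration and are far simpler than the real
uvm_unmap_remove().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified map entry; not the kernel's vm_map_entry. */
struct entry {
	bool immutable;
	bool mapped;
};

/*
 * Unmap n entries, but only if none of them is immutable.
 * Returns 0 on success, -1 (with nothing changed) otherwise.
 */
static int
unmap_all_or_nothing(struct entry *e, size_t n)
{
	size_t i;

	assert(e != NULL || n == 0);

	/* Pass 1: refuse before touching anything. */
	for (i = 0; i < n; i++)
		if (e[i].immutable)
			return -1;

	/* Pass 2: the whole region is known to be mutable. */
	for (i = 0; i < n; i++)
		e[i].mapped = false;
	return 0;
}
```

A single-pass version would already have unmapped the leading entries by
the time it met an immutable one, which is the partial result the extra
walk avoids, at the cost of visiting every entry twice.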


Index: gnu/llvm/lld/ELF/ScriptParser.cpp
===
RCS file: /cvs/src/gnu/llvm/lld/ELF/ScriptParser.cpp,v
retrieving revision 1.1.1.4
diff -u -p -u -r1.1.1.4 ScriptParser.cpp
--- gnu/llvm/lld/ELF/ScriptParser.cpp   17 Dec 2021 12:25:02 -  1.1.1.4
+++ gnu/llvm/lld/ELF/ScriptParser.cpp   2 Sep 2022 15:23:20 -
@@ -1478,6 +1478,7 @@ unsigned ScriptParser::readPhdrType() {
  .Case("PT_GNU_EH_FRAME", PT_GNU_EH_FRAME)
  .Case("PT_GNU_STACK", PT_GNU_STACK)
  .Case("PT_GNU_RELRO", PT_GNU_RELRO)
+ .Case("PT_OPENBSD_MUTABLE", PT_OPENBSD_MUTABLE)
  .Case("PT_OPENBSD_RANDOMIZE", PT_OPENBSD_RANDOMIZE)
  .Case("PT_OPENBSD_WXNEEDED", PT_OPENBSD_WXNEEDED)
  .Case("PT_OPENBSD_BOOTDATA", PT_OPENBSD_BOOTDATA)
Index: gnu/llvm/lld/ELF/Writer.cpp
===
RCS file: /cvs/src/gnu/llvm/lld/ELF/Writer.cpp,v
retrieving revision 1.3
diff -u -p -u -r1.3 Writer.cpp
--- gnu/llvm/lld/ELF/Writer.cpp 17 Dec 2021 14:46:47 -  1.3
+++ gnu/llvm/lld/ELF/Writer.cpp 2 Sep 2022 21:53:22 -
@@ -146,7 +146,7 @@ StringRef elf::getOutputSectionName(cons
{".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.",
 ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", 
".tbss.",
 ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab.",
-".openbsd.randomdata."})
+".openbsd.randomdata.", ".openbsd.mutable." })
 if (isSectionPrefix(v, s->name))
   return v.drop_back();
 
@@ -2469,6 +2469,12 @@ std::vector Writer::c
   part.ehFrame->getParent() && part.ehFrameHdr->getParent())
 addHdr(PT_GNU_EH_FRAME, part.ehFrameHdr->getParent()->getPhdrFlags())
 ->add(part.ehFrameHdr->getParent());
+
+  // PT_OPENBSD_MUTABLE is an OpenBSD-specific feature. That makes
+  // the dynamic linker fill the segment with zero data, like bss, but
+  // it can be treated differently.
+  if (OutputSection *cmd = findSection(".openbsd.mutable", partNo))
+addHdr(PT_OPENBSD_MUTABLE, cmd->getPhdrFlags())->add(cmd);
 
   // PT_OPENBSD_RANDOMIZE is an OpenBSD-specific feature. That makes
   // the dynamic linker fill the segment with random data.
Index: gnu/usr.bin/binutils/bfd/elf.c
===
RCS file: /cvs/src/gnu/usr.bin/binutils/bfd/elf.c,v
retrieving revision 1.23
diff -u -p -u -r1.23 elf.c
--- gnu/usr.bin/binutils/bfd/elf.c  13 Jan 2015 20:05:01 -  1.23
+++ gnu/usr.bin/binutils/bfd/elf.c  10 Sep 2022 07:06:59 -
@@ -969,6 +969,7 @@ _bfd_elf_print_private_bfd_data (bfd *ab
case PT_GNU_EH_FRAME: pt = "EH_FRAME"; break;
case PT_GNU_STACK: pt 

Re: Unmap page in uvm_anon_release()

2022-09-10 Thread Martin Pieuchot
On 10/09/22(Sat) 15:12, Mark Kettenis wrote:
> > Date: Sat, 10 Sep 2022 14:18:02 +0200
> > From: Martin Pieuchot 
> > 
> > Diff below fixes a bug exposed when swapping on arm64.  When an anon is
> > released make sure that all the pmap references to the related page are
> > removed.
> 
> I'm a little bit puzzled by this.  So these pages are still mapped
> even though there are no references to the anon anymore?

I don't know.  I just realised that all the code paths leading to
uvm_pagefree() get rid of the pmap references by calling page_protect()
except a couple of them in the aiodone daemon and the clustering code in
the pager.

This can't hurt and makes the existing code coherent.  Maybe it just
hides the bug, I don't know.



Re: EVFILT_TIMER add support for different timer precisions NOTE_{,U,N,M}SECONDS

2022-09-10 Thread Visa Hankala
On Wed, Aug 31, 2022 at 04:48:37PM -0400, aisha wrote:
> I've added a patch which adds support for NOTE_{,U,M,N}SECONDS for
> EVFILT_TIMER in the kqueue interface.

It sort of makes sense to add an option to specify timeouts in
sub-millisecond precision. It feels like complete overengineering to add
multiple time units on the level of the kernel interface. However,
it looks like FreeBSD and NetBSD have already done this following
macOS' lead...

> I've also added the NOTE_ABSTIME but haven't done any actual implementation
> there as I am not sure how the `data` field should be interpreted (is it
> absolute time in seconds since epoch?).

I think FreeBSD and NetBSD take NOTE_ABSTIME as time since the epoch.

Below is a revised patch that takes into account some corner cases.
It tries to be API-compatible with FreeBSD and NetBSD. I have adjusted
the NOTE_{,M,U,N}SECONDS flags so that they are enum-like.

The manual page bits are from NetBSD.

It is quite late to introduce a feature like this within this release
cycle. Until now, the timer code has ignored the fflags field. There
might be pieces of software that are careless with struct kevent and
that would break as a result of this patch. Programs that are widely
used on different BSDs are probably fine already, though.
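
As a rough picture of what selecting a time unit through fflags involves,
here is a minimal sketch of converting the `data` value to nanoseconds.
Everything named `EX_*` and the function `timer_to_nsec()` are hypothetical;
the numeric values are made up for illustration and are not the ones in
<sys/event.h>.  The only behavior taken from the patch is that milliseconds
are the default when no unit is given.

```c
#include <stdint.h>

/* Hypothetical enum-like unit selectors; NOT the real NOTE_* values. */
#define EX_UNIT_DEFAULT	0	/* no unit given: treated as milliseconds */
#define EX_SECONDS	1
#define EX_MSECONDS	2
#define EX_USECONDS	3
#define EX_NSECONDS	4

/*
 * Convert a timer value in the given unit to nanoseconds.
 * Returns 0 on success, -1 on a negative value, an unknown unit,
 * or a result that would overflow int64_t.
 */
int
timer_to_nsec(int64_t data, unsigned int unit, int64_t *nsec)
{
	int64_t mult;

	switch (unit) {
	case EX_SECONDS:
		mult = 1000000000LL;
		break;
	case EX_UNIT_DEFAULT:
	case EX_MSECONDS:
		mult = 1000000LL;
		break;
	case EX_USECONDS:
		mult = 1000LL;
		break;
	case EX_NSECONDS:
		mult = 1LL;
		break;
	default:
		return -1;
	}

	if (data < 0 || data > INT64_MAX / mult)
		return -1;
	*nsec = data * mult;
	return 0;
}
```

Because the selectors are enum-like rather than independent bits, an unknown
combination falls into the default case and is rejected instead of being
silently interpreted.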

Index: lib/libc/sys/kqueue.2
===
RCS file: src/lib/libc/sys/kqueue.2,v
retrieving revision 1.46
diff -u -p -r1.46 kqueue.2
--- lib/libc/sys/kqueue.2   31 Mar 2022 17:27:16 -  1.46
+++ lib/libc/sys/kqueue.2   10 Sep 2022 13:01:36 -
@@ -457,17 +457,71 @@ Establishes an arbitrary timer identifie
 .Fa ident .
 When adding a timer,
 .Fa data
-specifies the timeout period in milliseconds.
-The timer will be periodic unless
+specifies the timeout period in units described below, or, if
+.Dv NOTE_ABSTIME
+is set in
+.Va fflags ,
+specifies the absolute time at which the timer should fire.
+The timer will repeat unless
 .Dv EV_ONESHOT
-is specified.
+is set in
+.Va flags
+or
+.Dv NOTE_ABSTIME
+is set in
+.Va fflags .
 On return,
 .Fa data
 contains the number of times the timeout has expired since the last call to
 .Fn kevent .
-This filter automatically sets the
+This filter automatically sets
 .Dv EV_CLEAR
-flag internally.
+in
+.Va flags
+for periodic timers.
+Timers created with
+.Dv NOTE_ABSTIME
+remain activated on the kqueue once the absolute time has passed unless
+.Dv EV_CLEAR
+or
+.Dv EV_ONESHOT
+are also specified.
+.Pp
+The filter accepts the following flags in the
+.Va fflags
+argument:
+.Bl -tag -width NOTE_MSECONDS
+.It Dv NOTE_SECONDS
+The timer value in
+.Va data
+is expressed in seconds.
+.It Dv NOTE_MSECONDS
+The timer value in
+.Va data
+is expressed in milliseconds.
+.It Dv NOTE_USECONDS
+The timer value in
+.Va data
+is expressed in microseconds.
+.It Dv NOTE_NSECONDS
+The timer value in
+.Va data
+is expressed in nanoseconds.
+.It Dv NOTE_ABSTIME
+The timer value is an absolute time with
+.Dv CLOCK_REALTIME
+as the reference clock.
+.El
+.Pp
+Note that
+.Dv NOTE_SECONDS ,
+.Dv NOTE_MSECONDS ,
+.Dv NOTE_USECONDS ,
+and
+.Dv NOTE_NSECONDS
+are mutually exclusive; behavior is undefined if more than one are specified.
+If a timer value unit is not specified, the default is
+.Dv NOTE_MSECONDS .
 .Pp
 If an existing timer is re-added, the existing timer and related pending events
 will be cancelled.
@@ -557,6 +611,7 @@ No memory was available to register the 
 The specified process to attach to does not exist.
 .El
 .Sh SEE ALSO
+.Xr clock_gettime 2 ,
 .Xr poll 2 ,
 .Xr read 2 ,
 .Xr select 2 ,
Index: regress/sys/kern/kqueue/kqueue-timer.c
===
RCS file: src/regress/sys/kern/kqueue/kqueue-timer.c,v
retrieving revision 1.4
diff -u -p -r1.4 kqueue-timer.c
--- regress/sys/kern/kqueue/kqueue-timer.c  12 Jun 2021 13:30:14 -  1.4
+++ regress/sys/kern/kqueue/kqueue-timer.c  10 Sep 2022 13:01:37 -
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,9 +32,13 @@
 int
 do_timer(void)
 {
-   int kq, n;
+   static const int units[] = {
+   NOTE_SECONDS, NOTE_MSECONDS, NOTE_USECONDS, NOTE_NSECONDS
+   };
struct kevent ev;
-   struct timespec ts;
+   struct timespec ts, start, end, now;
+   int64_t usecs;
+   int i, kq, n;
 
ASS((kq = kqueue()) >= 0,
warn("kqueue"));
@@ -68,6 +73,125 @@ do_timer(void)
 n = kevent(kq, NULL, 0, &ev, 1, &ts);
ASSX(n == 1);
 
+   /* Test with different time units */
+
+   for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
+   memset(&ev, 0, sizeof(ev));
+   ev.filter = EVFILT_TIMER;
+   ev.flags = EV_ADD | EV_ENABLE;
+   ev.fflags = units[i];
+   ev.data = 1;
+
+   n = kevent(kq, &ev, 1, NULL, 0, NULL);
+   ASSX(n != -1);
+
+   

Re: Unmap page in uvm_anon_release()

2022-09-10 Thread Mark Kettenis
> Date: Sat, 10 Sep 2022 14:18:02 +0200
> From: Martin Pieuchot 
> 
> Diff below fixes a bug exposed when swapping on arm64.  When an anon is
> released make sure that all the pmap references to the related page are
> removed.

I'm a little bit puzzled by this.  So these pages are still mapped
even though there are no references to the anon anymore?

> We could move the pmap_page_protect(pg, PROT_NONE) inside uvm_pagefree()
> to avoid future issues but that's for a later refactoring.

I don't think that makes sense.  In cases where pages are explicitly
mapped (instead of faulted in) we call pmap_remove() before we end up
calling uvm_pagefree().  Calling pmap_page_protect() in that case
doesn't make sense.

> With this diff I can no longer reproduce the SIGBUS issue on the
> rockpro64 and swapping is stable as long as I/O from sdmmc(4) works.
> 
> This should be good enough to commit the diff that got reverted, but I'll
> wait to be sure there's no regression.
> 
> ok?
> 
> Index: uvm/uvm_anon.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> retrieving revision 1.54
> diff -u -p -r1.54 uvm_anon.c
> --- uvm/uvm_anon.c  26 Mar 2021 13:40:05 -  1.54
> +++ uvm/uvm_anon.c  10 Sep 2022 12:10:34 -
> @@ -255,6 +255,7 @@ uvm_anon_release(struct vm_anon *anon)
>   KASSERT(anon->an_ref == 0);
>  
>   uvm_lock_pageq();
> + pmap_page_protect(pg, PROT_NONE);
>   uvm_pagefree(pg);
>   uvm_unlock_pageq();
>   KASSERT(anon->an_page == NULL);
> Index: uvm/uvm_fault.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.132
> diff -u -p -r1.132 uvm_fault.c
> --- uvm/uvm_fault.c 31 Aug 2022 01:27:04 -  1.132
> +++ uvm/uvm_fault.c 10 Sep 2022 12:10:34 -
> @@ -396,7 +396,6 @@ uvmfault_anonget(struct uvm_faultinfo *u
>* anon and try again.
>*/
>   if (pg->pg_flags & PG_RELEASED) {
> - pmap_page_protect(pg, PROT_NONE);
>   KASSERT(anon->an_ref == 0);
>   /*
>* Released while we had unlocked amap.
> 
> 



Re: [RFC] Adding ESRT and EFI variables for fwupd

2022-09-10 Thread Sergii Dmytruk
Hi Mark,

Any news? I have since set up gdb debugging of OVMF and figured out my EFI RT
issues, and it returns data fine now (it was the wrong calling ABI), but I'm not
making /dev/efi as it sounds like you've done that already. Where are you
at with this? Can I help to move this forward?

Cheers,
Sergii

On Tue, Aug 23, 2022 at 07:52:42PM +0300, Sergii Dmytruk wrote:
> Hi Mark,
>
> > I have done some work on adding support for EFI runtime services on
> > OpenBSD/amd64, based on the code for OpenBSD/arm64.  My plan was to
> > implement an ioctl(2) interface that is compatible with FreeBSD's
> >  interface.  Theo objected to putting EFI-related headers
> > in /usr/include/sys, so the EFI-related headers will probably end up
> > in /usr/include/dev/efi (so you'd be include 
> > instead).
>
> Great to hear that you went with FreeBSD's API.  It's a natural choice
> for DragonFly BSD, and NetBSD chose a compatible API on AArch64, so it sounds
> like all BSDs will have an almost identical API.  Having the header in a
> different location is a minor thing.
>
> I too was trying to make EFI RT work in [1]; it no longer crashes the
> kernel, but GetTime() doesn't return proper data either.
>
> [1]: https://github.com/3mdeb/openbsd-src/compare/esrt...3mdeb:openbsd-src:efi-vars
>
> > I hope to have some time to make progress on this next week, so let me
> > come back to you then.
>
> Is the code available anywhere?
>
> By the way, do you plan to have something like `libefivar` provided as
> part of OpenBSD?
>
> Cheers,
> Sergii



Unmap page in uvm_anon_release()

2022-09-10 Thread Martin Pieuchot
Diff below fixes a bug exposed when swapping on arm64.  When an anon is
released make sure that all the pmap references to the related page are
removed.

We could move the pmap_page_protect(pg, PROT_NONE) inside uvm_pagefree()
to avoid future issues but that's for a later refactoring.
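
The ordering the diff enforces (drop every mapping, then free the page) can
be modelled with a toy sketch.  All names here are hypothetical stand-ins:
`struct toy_page` is not struct vm_page, and unlike the real uvm_pagefree()
the toy free routine reports how many mappings were left dangling, purely so
the effect of the ordering is visible.

```c
#include <stdbool.h>

/* Toy page with hypothetical fields; not the real struct vm_page. */
struct toy_page {
	int	nmappings;	/* pmap references still pointing here */
	bool	freed;
};

/* Stand-in for pmap_page_protect(pg, PROT_NONE): drop every mapping. */
void
toy_page_protect_none(struct toy_page *pg)
{
	pg->nmappings = 0;
}

/*
 * Stand-in for uvm_pagefree().  Returns the number of mappings that
 * still referenced the page when it was freed (0 is the safe case).
 */
int
toy_pagefree(struct toy_page *pg)
{
	pg->freed = true;
	return pg->nmappings;
}

/* Release path using the ordering from the diff: unmap, then free. */
int
toy_anon_release(struct toy_page *pg)
{
	toy_page_protect_none(pg);
	return toy_pagefree(pg);
}
```

In this model, calling toy_pagefree() directly on a page with two mappings
leaves two stale references, while toy_anon_release() leaves none.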

With this diff I can no longer reproduce the SIGBUS issue on the
rockpro64 and swapping is stable as long as I/O from sdmmc(4) works.

This should be good enough to commit the diff that got reverted, but I'll
wait to be sure there's no regression.

ok?

Index: uvm/uvm_anon.c
===
RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
retrieving revision 1.54
diff -u -p -r1.54 uvm_anon.c
--- uvm/uvm_anon.c  26 Mar 2021 13:40:05 -  1.54
+++ uvm/uvm_anon.c  10 Sep 2022 12:10:34 -
@@ -255,6 +255,7 @@ uvm_anon_release(struct vm_anon *anon)
KASSERT(anon->an_ref == 0);
 
uvm_lock_pageq();
+   pmap_page_protect(pg, PROT_NONE);
uvm_pagefree(pg);
uvm_unlock_pageq();
KASSERT(anon->an_page == NULL);
Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.132
diff -u -p -r1.132 uvm_fault.c
--- uvm/uvm_fault.c 31 Aug 2022 01:27:04 -  1.132
+++ uvm/uvm_fault.c 10 Sep 2022 12:10:34 -
@@ -396,7 +396,6 @@ uvmfault_anonget(struct uvm_faultinfo *u
 * anon and try again.
 */
if (pg->pg_flags & PG_RELEASED) {
-   pmap_page_protect(pg, PROT_NONE);
KASSERT(anon->an_ref == 0);
/*
 * Released while we had unlocked amap.



Re: bgpd optimize bgpctl show rib 10/8 or-longer

2022-09-10 Thread Claudio Jeker
On Fri, Sep 09, 2022 at 07:07:14PM +0200, Theo Buehler wrote:
> On Fri, Sep 09, 2022 at 05:50:17PM +0200, Claudio Jeker wrote:
> > This diff optimizes subtree walks. In other words it specifies a subtree
> > (as a prefix/prefixlen combo) and only walks the entries that are under
> > this covering route.
> > 
> > Instead of doing a full table walk this will only walk part of the tree
> > and is therefore much faster if the subtree is small.
> 
> The diff looks good. The two new dump_subtree() functions are currently
> only called with a count of CTL_MSG_HIGH_MARK, so the two
> 
>   if (count == 0)
>   prefix_dump_r(ctx)
> 
> are currently dead code. I assume you anticipate that this might change.

Yes, and dump_subtree() is an extension of dump_new() and I want the two
to behave the same. These are generic APIs that can be used in various
places.
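
For readers unfamiliar with the idea, the test that decides whether an entry
lies under a covering route (prefix/prefixlen) can be sketched as below.
This is a minimal IPv4-only illustration with addresses in host byte order;
`prefix_covers()` is a hypothetical helper, not a function from bgpd.

```c
#include <stdint.h>

/*
 * Return 1 if addr lies under prefix/plen, 0 otherwise.
 * IPv4 only, addresses in host byte order.
 */
int
prefix_covers(uint32_t prefix, int plen, uint32_t addr)
{
	uint32_t mask;

	if (plen < 0 || plen > 32)
		return 0;
	/* A shift by 32 would be undefined, so handle /0 separately. */
	mask = (plen == 0) ? 0 : 0xffffffffU << (32 - plen);
	return (addr & mask) == (prefix & mask);
}
```

With 10/8 as the covering route, 10.1.2.3 (0x0a010203) is inside the subtree
and 11.0.0.0 (0x0b000000) is not; a subtree walk only has to visit entries
for which this test holds.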

It would be nice to drop the sync traversals in the long run. A sync
traversal on a big RIB just takes a long time and locks up any other
update. Maybe one day I'll figure out how to replace the last few.
 
-- 
:wq Claudio



pmap_collect and the page daemon

2022-09-10 Thread Miod Vallat
When the kernel is low on memory, the pagedaemon thread will try various
strategies to free memory.

One of those is to ask the pmap layer to free some memory. This is done
in uvm_swapout_threads(), which is roughly a wrapper around the
invocation of pmap_collect() on behalf of all processes.

However, most pmap layers do not implement pmap_collect() and only
provide a stub which does nothing. It doesn't make much sense to iterate
over the process list, only to invoke a function which does absolutely
nothing.

The following diff makes pmap_collect() an optional interface, with
pmaps implementing it defining __HAVE_PMAP_COLLECT. This feature macro
is used to completely omit uvm_swapout_threads() when pmap_collect() is
not available.
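
The shape of the change can be sketched in userland C.  This is only an
illustration: `pagedaemon_pass()`, `swapout_threads()`, the pid array and the
call counter are hypothetical, and only the __HAVE_PMAP_COLLECT macro name
comes from the diff.  With the macro undefined, as on most pmaps, the loop
over processes is not compiled at all.

```c
/* A pmap.h that really implements pmap_collect() would define this. */
/* #define __HAVE_PMAP_COLLECT */

static int collect_calls;	/* counts pmap_collect() calls in this toy */

#ifdef __HAVE_PMAP_COLLECT
/* Toy pmap_collect(): just records that it was asked to free memory. */
static void
pmap_collect(int pid)
{
	(void)pid;
	collect_calls++;
}

/* Toy swapout loop: invoke pmap_collect() on behalf of every process. */
static void
swapout_threads(const int *pids, int n)
{
	int i;

	for (i = 0; i < n; i++)
		pmap_collect(pids[i]);
}
#endif

/* One pass of the toy page daemon; returns the total collect count. */
int
pagedaemon_pass(const int *pids, int n)
{
#ifdef __HAVE_PMAP_COLLECT
	swapout_threads(pids, n);
#else
	(void)pids;
	(void)n;
#endif
	return collect_calls;
}
```

Uncommenting the define makes each pass call pmap_collect() once per
process; leaving it out removes both the stub and the iteration.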

Index: arch/alpha/include/pmap.h
===
RCS file: /OpenBSD/src/sys/arch/alpha/include/pmap.h,v
retrieving revision 1.40
diff -u -p -r1.40 pmap.h
--- arch/alpha/include/pmap.h   20 Apr 2016 05:24:18 -  1.40
+++ arch/alpha/include/pmap.h   10 Sep 2022 08:00:10 -
@@ -197,6 +197,8 @@ extern  pt_entry_t *VPT;/* Virtual Page
 
 paddr_t vtophys(vaddr_t);
 
+#define __HAVE_PMAP_COLLECT
+
 /* Machine-specific functions. */
 void   pmap_bootstrap(paddr_t ptaddr, u_int maxasn, u_long ncpuids);
 int    pmap_emulate_reference(struct proc *p, vaddr_t v, int user, int type);
Index: arch/amd64/amd64/pmap.c
===
RCS file: /OpenBSD/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.153
diff -u -p -r1.153 pmap.c
--- arch/amd64/amd64/pmap.c 30 Jun 2022 13:51:24 -  1.153
+++ arch/amd64/amd64/pmap.c 10 Sep 2022 08:00:10 -
@@ -2206,6 +2206,7 @@ pmap_unwire(struct pmap *pmap, vaddr_t v
 #endif
 }
 
+#if 0
 /*
  * pmap_collect: free resources held by a pmap
  *
@@ -2221,10 +,10 @@ pmap_collect(struct pmap *pmap)
 * for its entire address space.
 */
 
-/* pmap_do_remove(pmap, VM_MIN_ADDRESS, VM_MAX_ADDRESS,
+   pmap_do_remove(pmap, VM_MIN_ADDRESS, VM_MAX_ADDRESS,
PMAP_REMOVE_SKIPWIRED);
-*/
 }
+#endif
 
 /*
  * pmap_copy: copy mappings from one pmap to another
Index: arch/arm/arm/pmap7.c
===
RCS file: /OpenBSD/src/sys/arch/arm/arm/pmap7.c,v
retrieving revision 1.63
diff -u -p -r1.63 pmap7.c
--- arch/arm/arm/pmap7.c21 Feb 2022 19:15:58 -  1.63
+++ arch/arm/arm/pmap7.c10 Sep 2022 08:00:10 -
@@ -1743,21 +1743,6 @@ dab_access(trapframe_t *tf, u_int fsr, u
 }
 
 /*
- * pmap_collect: free resources held by a pmap
- *
- * => optional function.
- * => called when a process is swapped out to free memory.
- */
-void
-pmap_collect(pmap_t pm)
-{
-   /*
-* Nothing to do.
-* We don't even need to free-up the process' L1.
-*/
-}
-
-/*
  * Routine:pmap_proc_iflush
  *
  * Function:
Index: arch/arm64/arm64/pmap.c
===
RCS file: /OpenBSD/src/sys/arch/arm64/arm64/pmap.c,v
retrieving revision 1.84
diff -u -p -r1.84 pmap.c
--- arch/arm64/arm64/pmap.c 10 Jan 2022 09:20:27 -  1.84
+++ arch/arm64/arm64/pmap.c 10 Sep 2022 08:00:10 -
@@ -856,24 +856,6 @@ pmap_fill_pte(pmap_t pm, vaddr_t va, pad
 }
 
 /*
- * Garbage collects the physical map system for pages which are
- * no longer used. Success need not be guaranteed -- that is, there
- * may well be pages which are not referenced, but others may be collected
- * Called by the pageout daemon when pages are scarce.
- */
-void
-pmap_collect(pmap_t pm)
-{
-   /* This could return unused v->p table layers which
-* are empty.
-* could malicious programs allocate memory and eat
-* these wired pages? These are allocated via pool.
-* Are there pool functions which could be called
-* to lower the pool usage here?
-*/
-}
-
-/*
  * Fill the given physical page with zeros.
  */
 void
Index: arch/hppa/hppa/pmap.c
===
RCS file: /OpenBSD/src/sys/arch/hppa/hppa/pmap.c,v
retrieving revision 1.177
diff -u -p -r1.177 pmap.c
--- arch/hppa/hppa/pmap.c   14 Sep 2021 16:16:51 -  1.177
+++ arch/hppa/hppa/pmap.c   10 Sep 2022 08:00:10 -
@@ -734,13 +734,6 @@ pmap_reference(struct pmap *pmap)
atomic_inc_int(&pmap->pm_obj.uo_refs);
 }
 
-void
-pmap_collect(struct pmap *pmap)
-{
-   DPRINTF(PDB_FOLLOW|PDB_PMAP, ("pmap_collect(%p)\n", pmap));
-   /* nothing yet */
-}
-
 int
 pmap_enter(struct pmap *pmap, vaddr_t va, paddr_t pa, vm_prot_t prot, int flags)
 {
Index: arch/hppa/include/param.h
===
RCS file: /OpenBSD/src/sys/arch/hppa/include/param.h,v
retrieving revision 1.47
diff -u -p -r1.47 param.h
--- arch/hppa/include/param.h   14 Sep 2018 13:58:20 -  1.47
+++ 

Re: strtonum.3: Use the proper macro for "long long"

2022-09-10 Thread Jason McIntyre
On Fri, Sep 09, 2022 at 08:06:32PM -0400, Josiah Frentsos wrote:
> Index: strtonum.3
> ===
> RCS file: /cvs/src/lib/libc/stdlib/strtonum.3,v
> retrieving revision 1.18
> diff -u -p -r1.18 strtonum.3
> --- strtonum.3  7 Feb 2016 20:50:24 -   1.18
> +++ strtonum.3  10 Sep 2022 00:04:29 -
> @@ -35,7 +35,7 @@ The
>  function converts the string in
>  .Fa nptr
>  to a
> -.Li long long
> +.Vt long long
>  value.
>  The
>  .Fn strtonum
> @@ -56,7 +56,7 @@ or
>  sign.
>  .Pp
>  The remainder of the string is converted to a
> -.Li long long
> +.Vt long long
>  value according to base 10.
>  .Pp
>  The value obtained is then checked against the provided
> 

hi.

i fear this is either incomplete or unwanted:

$ cd /usr/src/lib/libc/stdlib
$ grep ^.Li *.3
atof.3:.Li double
atoi.3:.Li integer
atol.3:.Li long integer
atoll.3:.Li long long integer
div.3:.Li int
getopt_long.3:.Li struct option
imaxdiv.3:.Li imaxdiv_t
imaxdiv.3:.Li intmax_t
insque.3:.Li insque
insque.3:.Li remque
ldiv.3:.Li ldiv_t
ldiv.3:.Li long integer
lldiv.3:.Li lldiv_t
lldiv.3:.Li long long integer
strtod.3:.Li double
strtod.3:.Li float
strtod.3:.Li long double
strtonum.3:.Li long long
strtonum.3:.Li long long

so your fix might be correct (i'm never 100% sure on the various code
mark ups) but it doesn't address the bigger picture. the example above
is only for libc/stdlib.

maybe ingo has an opinion on whether this needs fixed everywhere or not?

jmc