amd64, i386: lapic_calibrate_timer: panic if timer calibration fails
Hi,

In lapic_calibrate_timer() we only conditionally decide to use the lapic timer as our interrupt clock. That is, lapic timer calibration can fail and the system will boot anyway. If after measuring the lapic timer frequency we somehow come up with zero hertz, we do *not* set initclock_func to lapic_initclocks().

Here are the relevant bits from amd64/lapic.c:

   554	skip_calibration:
   555		printf("%s: apic clock running at %dMHz\n",
   556		    ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000));
   557
   558		if (lapic_per_second != 0) {

[...]

	/* (skip ahead a bit...) */

   588			/*
   589			 * Now that the timer's calibrated, use the apic timer routines
   590			 * for all our timing needs..
   591			 */
   592			delay_init(lapic_delay, 3000);
   593			initclock_func = lapic_initclocks;
   594		}
   595	}

The check in question is on line 558. The corresponding code is identical in i386/lapic.c.

I went ahead and tried it on amd64. If you force lapic_per_second to zero the system still boots, but the secondary CPUs just sit idle. lapic_tval is zero, so when they call lapic_startclock() from cpu_hatch(), nothing happens. The i8254 still sends clock interrupts to CPU0, though, so the system runs in an oddball state where one processor is doing all the work.

I don't think that this is the intended behavior. I think this is just an oversight left over from some older code. It would be a lot more sensible to just panic if lapic_per_second is zero here. Patch attached.

If a bunch of you prefer to develop a more elaborate fallback scheme where we don't hatch the secondary CPUs in the event that lapic timer calibration fails, we could explore that later. But for now I would prefer to panic and try to spotlight the problem if it ever occurs in the wild.

If this change is too risky -- maybe I am breaking someone's weird setup? -- I can wait until after release.

Thoughts? Preferences?
Index: amd64/amd64/lapic.c === RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v retrieving revision 1.63 diff -u -p -r1.63 lapic.c --- amd64/amd64/lapic.c 10 Sep 2022 01:30:14 - 1.63 +++ amd64/amd64/lapic.c 10 Sep 2022 01:59:52 - @@ -555,43 +555,44 @@ skip_calibration: printf("%s: apic clock running at %dMHz\n", ci->ci_dev->dv_xname, lapic_per_second / (1000 * 1000)); - if (lapic_per_second != 0) { - /* -* reprogram the apic timer to run in periodic mode. -* XXX need to program timer on other cpu's, too. -*/ - lapic_tval = (lapic_per_second * 2) / hz; - lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1); - - lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval); - - /* -* Compute fixed-point ratios between cycles and -* microseconds to avoid having to do any division -* in lapic_delay. -*/ - - tmp = (100 * (u_int64_t)1 << 32) / lapic_per_second; - lapic_frac_usec_per_cycle = tmp; - - tmp = (lapic_per_second * (u_int64_t)1 << 32) / 100; - - lapic_frac_cycle_per_usec = tmp; - - /* -* Compute delay in cycles for likely short delays in usec. -*/ - for (i = 0; i < 26; i++) - lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >> - 32; - - /* -* Now that the timer's calibrated, use the apic timer routines -* for all our timing needs.. -*/ - delay_init(lapic_delay, 3000); - initclock_func = lapic_initclocks; - } + if (lapic_per_second == 0) + panic("%s: apic timer calibration failed", __func__); + + /* +* reprogram the apic timer to run in periodic mode. +* XXX need to program timer on other cpu's, too. +*/ + lapic_tval = (lapic_per_second * 2) / hz; + lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1); + + lapic_timer_periodic(LAPIC_LVTT_M, lapic_tval); + + /* +* Compute fixed-point ratios between cycles and +* microseconds to avoid having to do any division +* in lapic_delay. 
+*/ + + tmp = (100 * (u_int64_t)1 << 32) / lapic_per_second; + lapic_frac_usec_per_cycle = tmp; + + tmp = (lapic_per_second * (u_int64_t)1 << 32) / 100; + + lapic_frac_cycle_per_usec = tmp; + + /* +* Compute delay in cycles for likely short delays in usec. +*/ + for (i = 0; i < 26; i++) + lapic_delaytab[i] = (lapic_frac_cycle_per_usec * i) >> + 32; + + /* +* Now that the timer's
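For readers following the arithmetic in the patch above, here is a minimal userland sketch of the calibration math, not the kernel code itself. The struct and function names are hypothetical, the microseconds-per-second constant 1000000 is an assumption based on the "cycles and microseconds" comment, and the sketch returns an error where the patch would panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Results of a successful calibration (illustrative names, not kernel ones). */
struct lapic_calib {
	uint32_t tval;			/* timer counts per clock tick */
	uint64_t frac_usec_per_cycle;	/* 32.32 fixed point */
	uint64_t frac_cycle_per_usec;	/* 32.32 fixed point */
};

/*
 * Returns 0 on success, -1 if the measured frequency is zero.
 * The kernel patch panics in the failing case; this sketch only
 * reports the error so it can run in userland.
 */
int
lapic_calib_compute(uint32_t per_second, uint32_t hz, struct lapic_calib *out)
{
	uint64_t tval;

	assert(hz != 0 && out != NULL);
	if (per_second == 0)
		return -1;

	/* Counts per tick, rounded to nearest: compute at 2x, then halve. */
	tval = ((uint64_t)per_second * 2) / hz;
	out->tval = (uint32_t)((tval / 2) + (tval & 0x1));

	/* Fixed-point ratios so later conversions need no division. */
	out->frac_usec_per_cycle = ((uint64_t)1000000 << 32) / per_second;
	out->frac_cycle_per_usec = ((uint64_t)per_second << 32) / 1000000;
	return 0;
}

/* Convert microseconds to timer cycles with a multiply and a shift. */
uint64_t
lapic_usec_to_cycles(const struct lapic_calib *c, uint32_t usec)
{
	return (c->frac_cycle_per_usec * usec) >> 32;
}
```

With a zero frequency both ratios would be meaningless (and the first one a division by zero), which is why the early check has to come before any of the arithmetic.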
Re: Change pru_rcvd() return type to the type of void
ok guenther@ (Thanks!) On Sat, Sep 10, 2022 at 10:20 AM Vitaliy Makkoveev wrote: > We have no interest on pru_rcvd() return value. Also, we call pru_rcvd() > only if the socket's protocol have PR_WANTRCVD flag set. Such sockets > are route domain, tcp(4) and unix(4) sockets. > > This diff keeps the PR_WANTRCVD check. In other hand we could always > call pru_rcvd() and do "pru_rcvd != NULL" check within, but in the > future with per buffer locking, we could have some re-locking around > pru_rcvd() call and I want to do it outside wrapper. > > > Index: sys/kern/uipc_usrreq.c > === > RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v > retrieving revision 1.185 > diff -u -p -r1.185 uipc_usrreq.c > --- sys/kern/uipc_usrreq.c 3 Sep 2022 22:43:38 - 1.185 > +++ sys/kern/uipc_usrreq.c 10 Sep 2022 18:51:42 - > @@ -363,7 +363,7 @@ uipc_shutdown(struct socket *so) > return (0); > } > > -int > +void > uipc_rcvd(struct socket *so) > { > struct socket *so2; > @@ -390,8 +390,6 @@ uipc_rcvd(struct socket *so) > default: > panic("uipc 2"); > } > - > - return (0); > } > > int > Index: sys/net/rtsock.c > === > RCS file: /cvs/src/sys/net/rtsock.c,v > retrieving revision 1.355 > diff -u -p -r1.355 rtsock.c > --- sys/net/rtsock.c8 Sep 2022 10:22:06 - 1.355 > +++ sys/net/rtsock.c10 Sep 2022 18:51:42 - > @@ -115,7 +115,7 @@ int route_attach(struct socket *, int); > introute_detach(struct socket *); > introute_disconnect(struct socket *); > introute_shutdown(struct socket *); > -introute_rcvd(struct socket *); > +void route_rcvd(struct socket *); > introute_send(struct socket *, struct mbuf *, struct mbuf *, > struct mbuf *); > introute_abort(struct socket *); > @@ -299,7 +299,7 @@ route_shutdown(struct socket *so) > return (0); > } > > -int > +void > route_rcvd(struct socket *so) > { > struct rtpcb *rop = sotortpcb(so); > @@ -314,8 +314,6 @@ route_rcvd(struct socket *so) > ((sbspace(rop->rop_socket, >rop_socket->so_rcv) == > rop->rop_socket->so_rcv.sb_hiwat))) > rop->rop_flags &= 
~ROUTECB_FLAG_FLUSH; > - > - return (0); > } > > int > Index: sys/netinet/tcp_usrreq.c > === > RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v > retrieving revision 1.207 > diff -u -p -r1.207 tcp_usrreq.c > --- sys/netinet/tcp_usrreq.c3 Sep 2022 22:43:38 - 1.207 > +++ sys/netinet/tcp_usrreq.c10 Sep 2022 18:51:42 - > @@ -792,18 +792,17 @@ out: > /* > * After a receive, possibly send window update to peer. > */ > -int > +void > tcp_rcvd(struct socket *so) > { > struct inpcb *inp; > struct tcpcb *tp; > - int error; > short ostate; > > soassertlocked(so); > > - if ((error = tcp_sogetpcb(so, , ))) > - return (error); > + if (tcp_sogetpcb(so, , )) > + return; > > if (so->so_options & SO_DEBUG) > ostate = tp->t_state; > @@ -820,7 +819,6 @@ tcp_rcvd(struct socket *so) > > if (so->so_options & SO_DEBUG) > tcp_trace(TA_USER, ostate, tp, tp, NULL, PRU_RCVD, 0); > - return (0); > } > > /* > Index: sys/netinet/tcp_var.h > === > RCS file: /cvs/src/sys/netinet/tcp_var.h,v > retrieving revision 1.157 > diff -u -p -r1.157 tcp_var.h > --- sys/netinet/tcp_var.h 3 Sep 2022 22:43:38 - 1.157 > +++ sys/netinet/tcp_var.h 10 Sep 2022 18:51:42 - > @@ -725,7 +725,7 @@ int tcp_connect(struct socket *, struct > int tcp_accept(struct socket *, struct mbuf *); > int tcp_disconnect(struct socket *); > int tcp_shutdown(struct socket *); > -int tcp_rcvd(struct socket *); > +voidtcp_rcvd(struct socket *); > int tcp_send(struct socket *, struct mbuf *, struct mbuf *, > struct mbuf *); > int tcp_abort(struct socket *); > Index: sys/sys/protosw.h > === > RCS file: /cvs/src/sys/sys/protosw.h,v > retrieving revision 1.55 > diff -u -p -r1.55 protosw.h > --- sys/sys/protosw.h 5 Sep 2022 14:56:09 - 1.55 > +++ sys/sys/protosw.h 10 Sep 2022 18:51:42 - > @@ -72,7 +72,7 @@ struct pr_usrreqs { > int (*pru_accept)(struct socket *, struct mbuf *); > int (*pru_disconnect)(struct socket *); > int (*pru_shutdown)(struct socket *); > - int (*pru_rcvd)(struct socket *); > + void(*pru_rcvd)(struct socket *); > int 
(*pru_send)(struct socket *, struct mbuf *, struct mbuf *, > struct mbuf *); > int (*pru_abort)(struct socket *); > @@ -336,12
Re: strtonum.3: Use the proper macro for "long long"
Hi, yes, this is completely correct, with one tiny exception that should be fixed while committing, see in-line below. Jason, since you already started working on this, could you please commit this patch with OK schwarze@? I'm surprised there was still so much .Li in our tree where .Vt should have been. These are not even edge cases but completely unambiguous .Vt. Note that the mdoc(7) manual deprecates .Li (it is a presentational macro with an invisible effect - we usually want semantic rather than presentational markup). Rare cases exist where it may not be completely obvious what to use instead, but here it is. Thanks, Ingo Josiah Frentsos wrote on Sat, Sep 10, 2022 at 12:29:28PM -0400: > Index: lib/libc/gen/frexp.3 > Index: lib/libc/gen/getgrent.3 > Index: lib/libc/gen/getpwent.3 > Index: lib/libc/gen/getpwnam.3 > Index: lib/libc/gen/glob.3 > Index: lib/libc/gen/isalnum.3 > Index: lib/libc/gen/isalpha.3 > Index: lib/libc/gen/isblank.3 > Index: lib/libc/gen/iscntrl.3 > Index: lib/libc/gen/isdigit.3 > Index: lib/libc/gen/isgraph.3 > Index: lib/libc/gen/islower.3 > Index: lib/libc/gen/isprint.3 > Index: lib/libc/gen/ispunct.3 > Index: lib/libc/gen/isspace.3 > Index: lib/libc/gen/isupper.3 > Index: lib/libc/gen/isxdigit.3 > Index: lib/libc/gen/lockf.3 > Index: lib/libc/gen/login_cap.3 > Index: lib/libc/gen/modf.3 > Index: lib/libc/gen/opendir.3 > Index: lib/libc/gen/setjmp.3 > Index: lib/libc/gen/times.3 > Index: lib/libc/gen/tolower.3 > Index: lib/libc/gen/toupper.3 > Index: lib/libc/gen/uname.3 > Index: lib/libc/gen/utime.3 > Index: lib/libc/locale/localeconv.3 > Index: lib/libc/net/ether_aton.3 > Index: lib/libc/net/getaddrinfo.3 > Index: lib/libc/net/getnameinfo.3 > Index: lib/libc/net/getpeereid.3 > Index: lib/libc/net/getrrsetbyname.3 > Index: lib/libc/net/htonl.3 > === > RCS file: /cvs/src/lib/libc/net/htonl.3,v > retrieving revision 1.5 > diff -u -p -r1.5 htonl.3 > --- lib/libc/net/htonl.3 13 Feb 2019 07:02:09 - 1.5 > +++ lib/libc/net/htonl.3 10 Sep 2022 
16:10:01 - > @@ -66,14 +66,14 @@ or > .Sq l ) > is a mnemonic > for the traditional names for such quantities, > -.Li short > +.Vt short > and > -.Li long , > +.Vt long , > respectively. This is misleading, as explained in the very next sentence. I suggest just dropping the .Li markup in these two instances without any replacement, or .Dq if you insist on some markup. > Today, the C concept of > -.Li short > +.Vt short > and > -.Li long > +.Vt long > integers need not coincide with this traditional misunderstanding. > On machines which have a byte order which is the same as the network > order, routines are defined as null macros. This part is correct. > Index: lib/libc/net/inet_addr.3 > Index: lib/libc/net/inet_net_ntop.3 > Index: lib/libc/net/inet_ntop.3 > Index: lib/libc/regex/regex.3 > Index: lib/libc/rpc/xdr.3 > Index: lib/libc/stdio/fseek.3 > Index: lib/libc/stdio/getc.3 > Index: lib/libc/stdio/putc.3 > Index: lib/libc/stdio/ungetc.3 > Index: lib/libc/stdlib/atof.3 > Index: lib/libc/stdlib/atoi.3 > Index: lib/libc/stdlib/atol.3 > Index: lib/libc/stdlib/atoll.3 > Index: lib/libc/stdlib/div.3 > Index: lib/libc/stdlib/getopt_long.3 > Index: lib/libc/stdlib/imaxdiv.3 > Index: lib/libc/stdlib/ldiv.3 > Index: lib/libc/stdlib/lldiv.3 > Index: lib/libc/stdlib/strtod.3 > Index: lib/libc/stdlib/strtonum.3 > Index: lib/libc/string/memccpy.3 > Index: lib/libc/string/memchr.3 > Index: lib/libc/string/memcmp.3 > Index: lib/libc/string/memset.3 > Index: lib/libc/sys/accept.2 > Index: lib/libc/sys/fcntl.2 > Index: lib/libc/sys/getpeername.2 > Index: lib/libc/sys/getrlimit.2 > Index: lib/libc/sys/getsockname.2 > Index: lib/libc/sys/getsockopt.2 > Index: lib/libc/sys/ioctl.2 > Index: lib/libc/sys/ptrace.2 > Index: lib/libc/sys/quotactl.2 > Index: lib/libc/termios/tcsetattr.3 > Index: lib/libc/time/ctime.3 > Index: lib/libradius/radius_new_request_packet.3 > Index: share/man/man3/bit_alloc.3 > Index: share/man/man3/dl_iterate_phdr.3 > Index: share/man/man4/bpf.4 > Index: 
share/man/man4/ddb.4 > Index: share/man/man4/openprom.4 > Index: share/man/man4/options.4 > Index: share/man/man4/speaker.4 > Index: share/man/man5/ranlib.5 > Index: share/man/man8/crash.8 > Index: share/man/man9/printf.9 > Index: share/man/man9/socreate.9 > Index: share/man/man9/style.9 > Index: usr.bin/ssh/sshd.8 > Index: usr.sbin/zdump/zdump.8
uvm_vnode locking & documentation
Previous fix from gnezdo@ pointed out that `u_flags' accesses should be serialized by `vmobjlock'. Diff below documents this and fix the remaining places where the lock isn't yet taken. One exception still remains, the first loop of uvm_vnp_sync(). This cannot be fixed right now due to possible deadlocks but that's not a reason for not documenting & fixing the rest of this file. This has been tested on amd64 and arm64. Comments? Oks? Index: uvm/uvm_vnode.c === RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v retrieving revision 1.128 diff -u -p -r1.128 uvm_vnode.c --- uvm/uvm_vnode.c 10 Sep 2022 16:14:36 - 1.128 +++ uvm/uvm_vnode.c 10 Sep 2022 18:23:57 - @@ -68,11 +68,8 @@ * we keep a simpleq of vnodes that are currently being sync'd. */ -LIST_HEAD(uvn_list_struct, uvm_vnode); -struct uvn_list_struct uvn_wlist; /* writeable uvns */ - -SIMPLEQ_HEAD(uvn_sq_struct, uvm_vnode); -struct uvn_sq_struct uvn_sync_q; /* sync'ing uvns */ +LIST_HEAD(, uvm_vnode) uvn_wlist; /* [K] writeable uvns */ +SIMPLEQ_HEAD(, uvm_vnode) uvn_sync_q; /* [S] sync'ing uvns */ struct rwlock uvn_sync_lock; /* locks sync operation */ extern int rebooting; @@ -144,41 +141,40 @@ uvn_attach(struct vnode *vp, vm_prot_t a struct partinfo pi; u_quad_t used_vnode_size = 0; - /* first get a lock on the uvn. */ - while (uvn->u_flags & UVM_VNODE_BLOCKED) { - uvn->u_flags |= UVM_VNODE_WANTED; - tsleep_nsec(uvn, PVM, "uvn_attach", INFSLP); - } - /* if we're mapping a BLK device, make sure it is a disk. */ if (vp->v_type == VBLK && bdevsw[major(vp->v_rdev)].d_type != D_DISK) { return NULL; } + /* first get a lock on the uvn. */ + rw_enter(uvn->u_obj.vmobjlock, RW_WRITE); + while (uvn->u_flags & UVM_VNODE_BLOCKED) { + uvn->u_flags |= UVM_VNODE_WANTED; + rwsleep_nsec(uvn, uvn->u_obj.vmobjlock, PVM, "uvn_attach", + INFSLP); + } + /* * now uvn must not be in a blocked state. 
* first check to see if it is already active, in which case * we can bump the reference count, check to see if we need to * add it to the writeable list, and then return. */ - rw_enter(uvn->u_obj.vmobjlock, RW_WRITE); if (uvn->u_flags & UVM_VNODE_VALID) { /* already active? */ KASSERT(uvn->u_obj.uo_refs > 0); uvn->u_obj.uo_refs++; /* bump uvn ref! */ - rw_exit(uvn->u_obj.vmobjlock); /* check for new writeable uvn */ if ((accessprot & PROT_WRITE) != 0 && (uvn->u_flags & UVM_VNODE_WRITEABLE) == 0) { - LIST_INSERT_HEAD(_wlist, uvn, u_wlist); - /* we are now on wlist! */ uvn->u_flags |= UVM_VNODE_WRITEABLE; + LIST_INSERT_HEAD(_wlist, uvn, u_wlist); } + rw_exit(uvn->u_obj.vmobjlock); return (>u_obj); } - rw_exit(uvn->u_obj.vmobjlock); /* * need to call VOP_GETATTR() to get the attributes, but that could @@ -189,6 +185,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a * it. */ uvn->u_flags = UVM_VNODE_ALOCK; + rw_exit(uvn->u_obj.vmobjlock); if (vp->v_type == VBLK) { /* @@ -213,9 +210,11 @@ uvn_attach(struct vnode *vp, vm_prot_t a } if (result != 0) { + rw_enter(uvn->u_obj.vmobjlock, RW_WRITE); if (uvn->u_flags & UVM_VNODE_WANTED) wakeup(uvn); uvn->u_flags = 0; + rw_exit(uvn->u_obj.vmobjlock); return NULL; } @@ -236,18 +235,19 @@ uvn_attach(struct vnode *vp, vm_prot_t a uvn->u_nio = 0; uvn->u_size = used_vnode_size; - /* if write access, we need to add it to the wlist */ - if (accessprot & PROT_WRITE) { - LIST_INSERT_HEAD(_wlist, uvn, u_wlist); - uvn->u_flags |= UVM_VNODE_WRITEABLE;/* we are on wlist! */ - } - /* * add a reference to the vnode. this reference will stay as long * as there is a valid mapping of the vnode. dropped when the * reference count goes to zero. 
*/ vref(vp); + + /* if write access, we need to add it to the wlist */ + if (accessprot & PROT_WRITE) { + uvn->u_flags |= UVM_VNODE_WRITEABLE; + LIST_INSERT_HEAD(_wlist, uvn, u_wlist); + } + if (oldflags & UVM_VNODE_WANTED) wakeup(uvn); @@ -273,6 +273,7 @@ uvn_reference(struct uvm_object *uobj) struct uvm_vnode *uvn = (struct uvm_vnode *) uobj; #endif +
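The locking pattern in uvn_attach() above (take the object lock first, then sleep on it while the object is blocked) can be sketched in userland with POSIX threads. This is only an analogy under stated assumptions: the struct, flag, and function names are hypothetical, a mutex and condition variable stand in for `vmobjlock` and the sleep channel, and nothing here is the UVM API.

```c
#include <pthread.h>
#include <stddef.h>

#define OBJ_BLOCKED	0x1	/* hypothetical stand-in for UVM_VNODE_BLOCKED */
#define OBJ_WANTED	0x2	/* hypothetical stand-in for UVM_VNODE_WANTED */

struct obj {
	pthread_mutex_t	lock;	/* plays the role of vmobjlock */
	pthread_cond_t	wake;	/* plays the role of the sleep channel */
	int		flags;	/* protected by lock */
	int		refs;	/* protected by lock */
};

void
obj_init(struct obj *o)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->wake, NULL);
	o->flags = 0;
	o->refs = 0;
}

/* Mark the object busy; every flag update happens under the lock. */
void
obj_block(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->flags |= OBJ_BLOCKED;
	pthread_mutex_unlock(&o->lock);
}

/* Clear the busy state and wake anyone who asked to be woken. */
void
obj_unblock(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->flags &= ~OBJ_BLOCKED;
	if (o->flags & OBJ_WANTED) {
		o->flags &= ~OBJ_WANTED;
		pthread_cond_broadcast(&o->wake);
	}
	pthread_mutex_unlock(&o->lock);
}

/*
 * Take a reference once the object is not blocked.  The flag is
 * tested and modified only with the lock held; the wait atomically
 * drops the lock while sleeping and reacquires it before returning.
 */
void
obj_attach(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->flags & OBJ_BLOCKED) {
		o->flags |= OBJ_WANTED;
		pthread_cond_wait(&o->wake, &o->lock);
	}
	o->refs++;
	pthread_mutex_unlock(&o->lock);
}
```

The point of the pattern is that the test of the blocked flag and the decision to sleep cannot race with another thread changing the flags, which is what moving the lock acquisition ahead of the loop achieves in the diff.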
soreceive() with shared netlock for raw sockets
As it was done for udp and divert sockets. Index: sys/netinet/ip_var.h === RCS file: /cvs/src/sys/netinet/ip_var.h,v retrieving revision 1.104 diff -u -p -r1.104 ip_var.h --- sys/netinet/ip_var.h3 Sep 2022 22:43:38 - 1.104 +++ sys/netinet/ip_var.h10 Sep 2022 19:41:56 - @@ -258,6 +258,8 @@ int rip_output(struct mbuf *, struct so struct mbuf *); int rip_attach(struct socket *, int); int rip_detach(struct socket *); +voidrip_lock(struct socket *); +voidrip_unlock(struct socket *); int rip_bind(struct socket *so, struct mbuf *, struct proc *); int rip_connect(struct socket *, struct mbuf *); int rip_disconnect(struct socket *); Index: sys/netinet/raw_ip.c === RCS file: /cvs/src/sys/netinet/raw_ip.c,v retrieving revision 1.147 diff -u -p -r1.147 raw_ip.c --- sys/netinet/raw_ip.c3 Sep 2022 22:43:38 - 1.147 +++ sys/netinet/raw_ip.c10 Sep 2022 19:41:56 - @@ -106,6 +106,8 @@ struct inpcbtable rawcbtable; const struct pr_usrreqs rip_usrreqs = { .pru_attach = rip_attach, .pru_detach = rip_detach, + .pru_lock = rip_lock, + .pru_unlock = rip_unlock, .pru_bind = rip_bind, .pru_connect= rip_connect, .pru_disconnect = rip_disconnect, @@ -220,12 +222,19 @@ rip_input(struct mbuf **mp, int *offp, i else n = m_copym(m, 0, M_COPYALL, M_NOWAIT); if (n != NULL) { + int ret; + if (inp->inp_flags & INP_CONTROLOPTS || inp->inp_socket->so_options & SO_TIMESTAMP) ip_savecontrol(inp, , ip, n); - if (sbappendaddr(inp->inp_socket, + + mtx_enter(>inp_mtx); + ret = sbappendaddr(inp->inp_socket, >inp_socket->so_rcv, - sintosa(), n, opts) == 0) { + sintosa(), n, opts); + mtx_leave(>inp_mtx); + + if (ret == 0) { /* should notify about lost packet */ m_freem(n); m_freem(opts); @@ -498,6 +507,24 @@ rip_detach(struct socket *so) in_pcbdetach(inp); return (0); +} + +void +rip_lock(struct socket *so) +{ + struct inpcb *inp = sotoinpcb(so); + + NET_ASSERT_LOCKED(); + mtx_enter(>inp_mtx); +} + +void +rip_unlock(struct socket *so) +{ + struct inpcb *inp = sotoinpcb(so); + + NET_ASSERT_LOCKED(); + 
mtx_leave(>inp_mtx); } int Index: sys/netinet6/ip6_var.h === RCS file: /cvs/src/sys/netinet6/ip6_var.h,v retrieving revision 1.102 diff -u -p -r1.102 ip6_var.h --- sys/netinet6/ip6_var.h 3 Sep 2022 22:43:38 - 1.102 +++ sys/netinet6/ip6_var.h 10 Sep 2022 19:41:56 - @@ -353,6 +353,8 @@ int rip6_output(struct mbuf *, struct so struct mbuf *); intrip6_attach(struct socket *, int); intrip6_detach(struct socket *); +void rip6_lock(struct socket *); +void rip6_unlock(struct socket *); intrip6_bind(struct socket *, struct mbuf *, struct proc *); intrip6_connect(struct socket *, struct mbuf *); intrip6_disconnect(struct socket *); Index: sys/netinet6/raw_ip6.c === RCS file: /cvs/src/sys/netinet6/raw_ip6.c,v retrieving revision 1.168 diff -u -p -r1.168 raw_ip6.c --- sys/netinet6/raw_ip6.c 3 Sep 2022 22:43:38 - 1.168 +++ sys/netinet6/raw_ip6.c 10 Sep 2022 19:41:56 - @@ -108,6 +108,8 @@ struct cpumem *rip6counters; const struct pr_usrreqs rip6_usrreqs = { .pru_attach = rip6_attach, .pru_detach = rip6_detach, + .pru_lock = rip6_lock, + .pru_unlock = rip6_unlock, .pru_bind = rip6_bind, .pru_connect= rip6_connect, .pru_disconnect = rip6_disconnect, @@ -261,13 +263,20 @@ rip6_input(struct mbuf **mp, int *offp, else n = m_copym(m, 0, M_COPYALL, M_NOWAIT); if (n != NULL) { + int ret; + if (in6p->inp_flags & IN6P_CONTROLOPTS) ip6_savecontrol(in6p, n, ); /* strip intermediate headers */ m_adj(n, *offp); - if (sbappendaddr(in6p->inp_socket, + + mtx_enter(>inp_mtx); + ret = sbappendaddr(in6p->inp_socket, >inp_socket->so_rcv, - sin6tosa(), n, opts) == 0) { +
Change pru_rcvd() return type to the type of void
We have no interest in the pru_rcvd() return value. Also, we call pru_rcvd() only if the socket's protocol has the PR_WANTRCVD flag set. Such sockets are route domain, tcp(4) and unix(4) sockets.

This diff keeps the PR_WANTRCVD check. On the other hand, we could always call pru_rcvd() and do the "pru_rcvd != NULL" check within, but in the future, with per-buffer locking, we could have some re-locking around the pru_rcvd() call and I want to do it outside the wrapper.

Index: sys/kern/uipc_usrreq.c
===
RCS file: /cvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.185
diff -u -p -r1.185 uipc_usrreq.c
--- sys/kern/uipc_usrreq.c	3 Sep 2022 22:43:38 - 1.185
+++ sys/kern/uipc_usrreq.c	10 Sep 2022 18:51:42 -
@@ -363,7 +363,7 @@ uipc_shutdown(struct socket *so)
 	return (0);
 }
 
-int
+void
 uipc_rcvd(struct socket *so)
 {
 	struct socket *so2;
@@ -390,8 +390,6 @@ uipc_rcvd(struct socket *so)
 	default:
 		panic("uipc 2");
 	}
-
-	return (0);
 }
 
 int
Index: sys/net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.355
diff -u -p -r1.355 rtsock.c
--- sys/net/rtsock.c	8 Sep 2022 10:22:06 - 1.355
+++ sys/net/rtsock.c	10 Sep 2022 18:51:42 -
@@ -115,7 +115,7 @@ int route_attach(struct socket *, int);
 int	route_detach(struct socket *);
 int	route_disconnect(struct socket *);
 int	route_shutdown(struct socket *);
-int	route_rcvd(struct socket *);
+void	route_rcvd(struct socket *);
 int	route_send(struct socket *, struct mbuf *, struct mbuf *,
 	    struct mbuf *);
 int	route_abort(struct socket *);
@@ -299,7 +299,7 @@ route_shutdown(struct socket *so)
 	return (0);
 }
 
-int
+void
 route_rcvd(struct socket *so)
 {
 	struct rtpcb *rop = sotortpcb(so);
@@ -314,8 +314,6 @@ route_rcvd(struct socket *so)
 	    ((sbspace(rop->rop_socket, &rop->rop_socket->so_rcv) ==
 	    rop->rop_socket->so_rcv.sb_hiwat)))
 		rop->rop_flags &= ~ROUTECB_FLAG_FLUSH;
-
-	return (0);
 }
 
 int
Index: sys/netinet/tcp_usrreq.c
===
RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.207
diff -u -p -r1.207 tcp_usrreq.c
--- sys/netinet/tcp_usrreq.c	3 Sep 2022
22:43:38 - 1.207 +++ sys/netinet/tcp_usrreq.c10 Sep 2022 18:51:42 - @@ -792,18 +792,17 @@ out: /* * After a receive, possibly send window update to peer. */ -int +void tcp_rcvd(struct socket *so) { struct inpcb *inp; struct tcpcb *tp; - int error; short ostate; soassertlocked(so); - if ((error = tcp_sogetpcb(so, , ))) - return (error); + if (tcp_sogetpcb(so, , )) + return; if (so->so_options & SO_DEBUG) ostate = tp->t_state; @@ -820,7 +819,6 @@ tcp_rcvd(struct socket *so) if (so->so_options & SO_DEBUG) tcp_trace(TA_USER, ostate, tp, tp, NULL, PRU_RCVD, 0); - return (0); } /* Index: sys/netinet/tcp_var.h === RCS file: /cvs/src/sys/netinet/tcp_var.h,v retrieving revision 1.157 diff -u -p -r1.157 tcp_var.h --- sys/netinet/tcp_var.h 3 Sep 2022 22:43:38 - 1.157 +++ sys/netinet/tcp_var.h 10 Sep 2022 18:51:42 - @@ -725,7 +725,7 @@ int tcp_connect(struct socket *, struct int tcp_accept(struct socket *, struct mbuf *); int tcp_disconnect(struct socket *); int tcp_shutdown(struct socket *); -int tcp_rcvd(struct socket *); +voidtcp_rcvd(struct socket *); int tcp_send(struct socket *, struct mbuf *, struct mbuf *, struct mbuf *); int tcp_abort(struct socket *); Index: sys/sys/protosw.h === RCS file: /cvs/src/sys/sys/protosw.h,v retrieving revision 1.55 diff -u -p -r1.55 protosw.h --- sys/sys/protosw.h 5 Sep 2022 14:56:09 - 1.55 +++ sys/sys/protosw.h 10 Sep 2022 18:51:42 - @@ -72,7 +72,7 @@ struct pr_usrreqs { int (*pru_accept)(struct socket *, struct mbuf *); int (*pru_disconnect)(struct socket *); int (*pru_shutdown)(struct socket *); - int (*pru_rcvd)(struct socket *); + void(*pru_rcvd)(struct socket *); int (*pru_send)(struct socket *, struct mbuf *, struct mbuf *, struct mbuf *); int (*pru_abort)(struct socket *); @@ -336,12 +336,10 @@ pru_shutdown(struct socket *so) return (*so->so_proto->pr_usrreqs->pru_shutdown)(so); } -static inline int +static inline void pru_rcvd(struct socket *so) { - if (so->so_proto->pr_usrreqs->pru_rcvd) - return 
(*so->so_proto->pr_usrreqs->pru_rcvd)(so); - return (EOPNOTSUPP); +
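The calling convention described in this message (a handler with no return value, invoked only when the protocol advertises that it wants the notification, with the flag check kept outside the wrapper) can be sketched as follows. All names here are hypothetical stand-ins rather than the kernel's own, and the shape of the final wrapper is an assumption since the diff above is cut off.

```c
#include <assert.h>
#include <stddef.h>

#define SK_WANTRCVD	0x01	/* hypothetical analogue of PR_WANTRCVD */

struct sk_socket;

struct sk_usrreqs {
	void	(*rcvd)(struct sk_socket *);	/* no return value to check */
};

struct sk_socket {
	int			 proto_flags;
	const struct sk_usrreqs	*reqs;
	int			 rcvd_calls;	/* demo bookkeeping only */
};

/* Thin wrapper: no NULL check and no error code to propagate. */
static inline void
sk_rcvd(struct sk_socket *so)
{
	(*so->reqs->rcvd)(so);
}

/*
 * Caller side: the flag check stays outside the wrapper, so any
 * future re-locking can be placed around the call right here.
 */
void
sk_after_receive(struct sk_socket *so)
{
	if (so->proto_flags & SK_WANTRCVD) {
		assert(so->reqs->rcvd != NULL);
		sk_rcvd(so);
	}
}

/* Example handler for a protocol that sets SK_WANTRCVD. */
void
sk_demo_rcvd(struct sk_socket *so)
{
	so->rcvd_calls++;
}
```

A protocol that does not set the flag never has its handler pointer dereferenced, so it can leave the pointer unset.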
Re: immutable userland mappings
Theo de Raadt wrote: > Theo de Raadt wrote: > > > Theo de Raadt wrote: > > > > > In this version of the diff, the kernel manages to mark immutable most of > > > the main binary, and in the shared-binary case, also most of ld.so. But > > > it > > > cannot mark all of the ELF mapping -- because of two remaining problems > > > (RELRO > > > in .data, and the malloc.c self-protected bookkeeping page in .bss). I am > > > looking into various solutions for both of those. > > Yet another version of the diff as I incrementally get it working better. > Call it version 22.. > > Some things of note: > > 1. Some linkers appear to be creating non-aligned relro sections, which >has security implications as they cannot be mprotected correctly. I >am happy this work has exposed the problem as severe. I have a >workaround in ld.so for now that allows these cases to work, and >later on we can perhaps add a warning to ld.so to identify these >linkers and get them fixed. > > 2. But the relro is still not handled perfectly, and I hope someone else's >eyes can compare addresses and spot what's wrong. > > 3. ld.so has to cut the list of mutable mappings from the immutable mappings, >before applying them (late, to satisfy 1).This is in > _dl_apply_mutable(). >After redoing this a couple of times, I am still not proud of it. > > 4. uvm_unmap_remove() must walk the entries in the region twice. It cannot >do unmapping work until it knows the region is completely muteable. This >might turn into a performance issue. > > 5. binutils ld support completely untested, I mainly went in there to fix >objdump and readelf. > > 6. It would be nice to hear of a pkg that actually has a problem with this >change. I haven't found any yet but don't run many myself. > > If anyone wants to debug issues, uncomment the // _dl_printf's in ld.so, > and expect a lot of noise. Then do something like "ktrace -di program > >& log", and generate a kdump seperately. 
It also helps if you test > with programs that don't exit, so you can procmap -a -p $pid. In the > kdump output, it is important to look for "mprotect -1", because that > provides evidence of the worst (silent) problems... Oops, 1 line error in the diff, try this instead. And to point 1 above, notice that a rare page of shared library mapping is not immutable, right on the edge of the relro... Index: gnu/llvm/lld/ELF/ScriptParser.cpp === RCS file: /cvs/src/gnu/llvm/lld/ELF/ScriptParser.cpp,v retrieving revision 1.1.1.4 diff -u -p -u -r1.1.1.4 ScriptParser.cpp --- gnu/llvm/lld/ELF/ScriptParser.cpp 17 Dec 2021 12:25:02 - 1.1.1.4 +++ gnu/llvm/lld/ELF/ScriptParser.cpp 2 Sep 2022 15:23:20 - @@ -1478,6 +1478,7 @@ unsigned ScriptParser::readPhdrType() { .Case("PT_GNU_EH_FRAME", PT_GNU_EH_FRAME) .Case("PT_GNU_STACK", PT_GNU_STACK) .Case("PT_GNU_RELRO", PT_GNU_RELRO) + .Case("PT_OPENBSD_MUTABLE", PT_OPENBSD_MUTABLE) .Case("PT_OPENBSD_RANDOMIZE", PT_OPENBSD_RANDOMIZE) .Case("PT_OPENBSD_WXNEEDED", PT_OPENBSD_WXNEEDED) .Case("PT_OPENBSD_BOOTDATA", PT_OPENBSD_BOOTDATA) Index: gnu/llvm/lld/ELF/Writer.cpp === RCS file: /cvs/src/gnu/llvm/lld/ELF/Writer.cpp,v retrieving revision 1.3 diff -u -p -u -r1.3 Writer.cpp --- gnu/llvm/lld/ELF/Writer.cpp 17 Dec 2021 14:46:47 - 1.3 +++ gnu/llvm/lld/ELF/Writer.cpp 2 Sep 2022 21:53:22 - @@ -146,7 +146,7 @@ StringRef elf::getOutputSectionName(cons {".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.", ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", ".tbss.", ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab.", -".openbsd.randomdata."}) +".openbsd.randomdata.", ".openbsd.mutable." }) if (isSectionPrefix(v, s->name)) return v.drop_back(); @@ -2469,6 +2469,12 @@ std::vector Writer::c part.ehFrame->getParent() && part.ehFrameHdr->getParent()) addHdr(PT_GNU_EH_FRAME, part.ehFrameHdr->getParent()->getPhdrFlags()) ->add(part.ehFrameHdr->getParent()); + + // PT_OPENBSD_MUTABLE is an OpenBSD-specific feature. 
That makes + // the dynamic linker fill the segment with zero data, like bss, but + // it can be treated differently. + if (OutputSection *cmd = findSection(".openbsd.mutable", partNo)) +addHdr(PT_OPENBSD_MUTABLE, cmd->getPhdrFlags())->add(cmd); // PT_OPENBSD_RANDOMIZE is an OpenBSD-specific feature. That makes // the dynamic linker fill the segment with random data. Index: gnu/usr.bin/binutils/bfd/elf.c === RCS file: /cvs/src/gnu/usr.bin/binutils/bfd/elf.c,v retrieving revision 1.23 diff -u -p -u -r1.23
Re: immutable userland mappings
Theo de Raadt wrote:

> Theo de Raadt wrote:
>
> > In this version of the diff, the kernel manages to mark immutable most of
> > the main binary, and in the shared-binary case, also most of ld.so. But it
> > cannot mark all of the ELF mapping -- because of two remaining problems
> > (RELRO in .data, and the malloc.c self-protected bookkeeping page in
> > .bss). I am looking into various solutions for both of those.

Yet another version of the diff as I incrementally get it working better. Call it version 22..

Some things of note:

1. Some linkers appear to be creating non-aligned relro sections, which
   has security implications as they cannot be mprotected correctly. I
   am happy this work has exposed the problem as severe. I have a
   workaround in ld.so for now that allows these cases to work, and
   later on we can perhaps add a warning to ld.so to identify these
   linkers and get them fixed.

2. But the relro is still not handled perfectly, and I hope someone else's
   eyes can compare addresses and spot what's wrong.

3. ld.so has to cut the list of mutable mappings from the immutable mappings,
   before applying them (late, to satisfy 1). This is in _dl_apply_mutable().
   After redoing this a couple of times, I am still not proud of it.

4. uvm_unmap_remove() must walk the entries in the region twice. It cannot
   do unmapping work until it knows the region is completely mutable. This
   might turn into a performance issue.

5. binutils ld support is completely untested, I mainly went in there to fix
   objdump and readelf.

6. It would be nice to hear of a pkg that actually has a problem with this
   change. I haven't found any yet but don't run many myself.

If anyone wants to debug issues, uncomment the // _dl_printf's in ld.so, and expect a lot of noise. Then do something like "ktrace -di program >& log", and generate a kdump separately. It also helps if you test with programs that don't exit, so you can procmap -a -p $pid.
In the kdump output, it is important to look for "mprotect -1", because that provides evidence of the worst (silent) problems... Index: gnu/llvm/lld/ELF/ScriptParser.cpp === RCS file: /cvs/src/gnu/llvm/lld/ELF/ScriptParser.cpp,v retrieving revision 1.1.1.4 diff -u -p -u -r1.1.1.4 ScriptParser.cpp --- gnu/llvm/lld/ELF/ScriptParser.cpp 17 Dec 2021 12:25:02 - 1.1.1.4 +++ gnu/llvm/lld/ELF/ScriptParser.cpp 2 Sep 2022 15:23:20 - @@ -1478,6 +1478,7 @@ unsigned ScriptParser::readPhdrType() { .Case("PT_GNU_EH_FRAME", PT_GNU_EH_FRAME) .Case("PT_GNU_STACK", PT_GNU_STACK) .Case("PT_GNU_RELRO", PT_GNU_RELRO) + .Case("PT_OPENBSD_MUTABLE", PT_OPENBSD_MUTABLE) .Case("PT_OPENBSD_RANDOMIZE", PT_OPENBSD_RANDOMIZE) .Case("PT_OPENBSD_WXNEEDED", PT_OPENBSD_WXNEEDED) .Case("PT_OPENBSD_BOOTDATA", PT_OPENBSD_BOOTDATA) Index: gnu/llvm/lld/ELF/Writer.cpp === RCS file: /cvs/src/gnu/llvm/lld/ELF/Writer.cpp,v retrieving revision 1.3 diff -u -p -u -r1.3 Writer.cpp --- gnu/llvm/lld/ELF/Writer.cpp 17 Dec 2021 14:46:47 - 1.3 +++ gnu/llvm/lld/ELF/Writer.cpp 2 Sep 2022 21:53:22 - @@ -146,7 +146,7 @@ StringRef elf::getOutputSectionName(cons {".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.", ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", ".tbss.", ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab.", -".openbsd.randomdata."}) +".openbsd.randomdata.", ".openbsd.mutable." }) if (isSectionPrefix(v, s->name)) return v.drop_back(); @@ -2469,6 +2469,12 @@ std::vector Writer::c part.ehFrame->getParent() && part.ehFrameHdr->getParent()) addHdr(PT_GNU_EH_FRAME, part.ehFrameHdr->getParent()->getPhdrFlags()) ->add(part.ehFrameHdr->getParent()); + + // PT_OPENBSD_MUTABLE is an OpenBSD-specific feature. That makes + // the dynamic linker fill the segment with zero data, like bss, but + // it can be treated differently. 
+ if (OutputSection *cmd = findSection(".openbsd.mutable", partNo)) +addHdr(PT_OPENBSD_MUTABLE, cmd->getPhdrFlags())->add(cmd); // PT_OPENBSD_RANDOMIZE is an OpenBSD-specific feature. That makes // the dynamic linker fill the segment with random data. Index: gnu/usr.bin/binutils/bfd/elf.c === RCS file: /cvs/src/gnu/usr.bin/binutils/bfd/elf.c,v retrieving revision 1.23 diff -u -p -u -r1.23 elf.c --- gnu/usr.bin/binutils/bfd/elf.c 13 Jan 2015 20:05:01 - 1.23 +++ gnu/usr.bin/binutils/bfd/elf.c 10 Sep 2022 07:06:59 - @@ -969,6 +969,7 @@ _bfd_elf_print_private_bfd_data (bfd *ab case PT_GNU_EH_FRAME: pt = "EH_FRAME"; break; case PT_GNU_STACK: pt
Re: Unmap page in uvm_anon_release()
On 10/09/22(Sat) 15:12, Mark Kettenis wrote:
> > Date: Sat, 10 Sep 2022 14:18:02 +0200
> > From: Martin Pieuchot
> >
> > Diff below fixes a bug exposed when swapping on arm64. When an anon is
> > released make sure the all the pmap references to the related page are
> > removed.
>
> I'm a little bit puzzled by this. So these pages are still mapped
> even though there are no references to the anon anymore?

I don't know. I just realised that all the code paths leading to
uvm_pagefree() get rid of the pmap references by calling page_protect()
except a couple of them in the aiodone daemon and the clustering code in
the pager.

This can't hurt and makes the existing code coherent. Maybe it just
hides the bug, I don't know.
Re: EVFILT_TIMER add support for different timer precisions NOTE_{,U,N,M}SECONDS
On Wed, Aug 31, 2022 at 04:48:37PM -0400, aisha wrote:
> I've added a patch which adds support for NOTE_{,U,M,N}SECONDS for
> EVFILT_TIMER in the kqueue interface.

It sort of makes sense to add an option to specify timeouts in
sub-millisecond precision. It feels like complete overengineering to add
multiple time units on the level of the kernel interface. However,
it looks like FreeBSD and NetBSD have already done this following
macOS' lead...

> I've also added the NOTE_ABSTIME but haven't done any actual implementation
> there as I am not sure how the `data` field should be interpreted (is it
> absolute time in seconds since epoch?).

I think FreeBSD and NetBSD take NOTE_ABSTIME as time since the epoch.

Below is a revised patch that takes into account some corner cases.
It tries to be API-compatible with FreeBSD and NetBSD. I have adjusted
the NOTE_{,M,U,N}SECONDS flags so that they are enum-like.

The manual page bits are from NetBSD.

It is quite late to introduce a feature like this within this release
cycle. Until now, the timer code has ignored the fflags field. There
might be pieces of software that are careless with struct kevent and
that would break as a result of this patch. Programs that are widely
used on different BSDs are probably fine already, though.

Index: lib/libc/sys/kqueue.2
===
RCS file: src/lib/libc/sys/kqueue.2,v
retrieving revision 1.46
diff -u -p -r1.46 kqueue.2
--- lib/libc/sys/kqueue.2	31 Mar 2022 17:27:16 -	1.46
+++ lib/libc/sys/kqueue.2	10 Sep 2022 13:01:36 -
@@ -457,17 +457,71 @@ Establishes an arbitrary timer identifie
 .Fa ident .
 When adding a timer,
 .Fa data
-specifies the timeout period in milliseconds.
-The timer will be periodic unless
+specifies the timeout period in units described below, or, if
+.Dv NOTE_ABSTIME
+is set in
+.Va fflags ,
+specifies the absolute time at which the timer should fire.
+The timer will repeat unless
 .Dv EV_ONESHOT
-is specified.
+is set in
+.Va flags
+or
+.Dv NOTE_ABSTIME
+is set in
+.Va fflags .
On return, .Fa data contains the number of times the timeout has expired since the last call to .Fn kevent . -This filter automatically sets the +This filter automatically sets .Dv EV_CLEAR -flag internally. +in +.Va flags +for periodic timers. +Timers created with +.Dv NOTE_ABSTIME +remain activated on the kqueue once the absolute time has passed unless +.Dv EV_CLEAR +or +.Dv EV_ONESHOT +are also specified. +.Pp +The filter accepts the following flags in the +.Va fflags +argument: +.Bl -tag -width NOTE_MSECONDS +.It Dv NOTE_SECONDS +The timer value in +.Va data +is expressed in seconds. +.It Dv NOTE_MSECONDS +The timer value in +.Va data +is expressed in milliseconds. +.It Dv NOTE_USECONDS +The timer value in +.Va data +is expressed in microseconds. +.It Dv NOTE_NSECONDS +The timer value in +.Va data +is expressed in nanoseconds. +.It Dv NOTE_ABSTIME +The timer value is an absolute time with +.Dv CLOCK_REALTIME +as the reference clock. +.El +.Pp +Note that +.Dv NOTE_SECONDS , +.Dv NOTE_MSECONDS , +.Dv NOTE_USECONDS , +and +.Dv NOTE_NSECONDS +are mutually exclusive; behavior is undefined if more than one are specified. +If a timer value unit is not specified, the default is +.Dv NOTE_MSECONDS . .Pp If an existing timer is re-added, the existing timer and related pending events will be cancelled. @@ -557,6 +611,7 @@ No memory was available to register the The specified process to attach to does not exist. 
.El .Sh SEE ALSO +.Xr clock_gettime 2 , .Xr poll 2 , .Xr read 2 , .Xr select 2 , Index: regress/sys/kern/kqueue/kqueue-timer.c === RCS file: src/regress/sys/kern/kqueue/kqueue-timer.c,v retrieving revision 1.4 diff -u -p -r1.4 kqueue-timer.c --- regress/sys/kern/kqueue/kqueue-timer.c 12 Jun 2021 13:30:14 - 1.4 +++ regress/sys/kern/kqueue/kqueue-timer.c 10 Sep 2022 13:01:37 - @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -31,9 +32,13 @@ int do_timer(void) { - int kq, n; + static const int units[] = { + NOTE_SECONDS, NOTE_MSECONDS, NOTE_USECONDS, NOTE_NSECONDS + }; struct kevent ev; - struct timespec ts; + struct timespec ts, start, end, now; + int64_t usecs; + int i, kq, n; ASS((kq = kqueue()) >= 0, warn("kqueue")); @@ -68,6 +73,125 @@ do_timer(void) n = kevent(kq, NULL, 0, , 1, ); ASSX(n == 1); + /* Test with different time units */ + + for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) { + memset(, 0, sizeof(ev)); + ev.filter = EVFILT_TIMER; + ev.flags = EV_ADD | EV_ENABLE; + ev.fflags = units[i]; + ev.data = 1; + + n = kevent(kq, , 1, NULL, 0, NULL); + ASSX(n != -1); + +
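The unit handling described in the manual page text above can be sketched in user space as a conversion of the timer's data value to nanoseconds. This is only an illustration, not the kernel code from the patch: the DEMO_NOTE_* codes and demo_timer_to_nsec() are hypothetical names, the numeric values are placeholders rather than the ones in <sys/event.h>, and the overflow check is an assumed example of a corner case.

```c
#include <stdint.h>

/* Hypothetical enum-like unit codes; NOT the real values from <sys/event.h>. */
enum demo_unit {
	DEMO_NOTE_MSECONDS = 0,	/* the default when no unit is given */
	DEMO_NOTE_SECONDS = 1,
	DEMO_NOTE_USECONDS = 2,
	DEMO_NOTE_NSECONDS = 3
};

/*
 * Convert a timer value expressed in the given unit to nanoseconds.
 * Returns 0 on success, or -1 for an unknown unit, a negative value,
 * or a value that would overflow int64_t after scaling.
 */
int
demo_timer_to_nsec(int unit, int64_t data, int64_t *nsec)
{
	int64_t scale;

	if (data < 0)
		return -1;

	switch (unit) {
	case DEMO_NOTE_SECONDS:
		scale = 1000000000LL;
		break;
	case DEMO_NOTE_MSECONDS:
		scale = 1000000LL;
		break;
	case DEMO_NOTE_USECONDS:
		scale = 1000LL;
		break;
	case DEMO_NOTE_NSECONDS:
		scale = 1LL;
		break;
	default:
		return -1;
	}

	/* Reject values whose scaled result does not fit in int64_t. */
	if (data > INT64_MAX / scale)
		return -1;

	*nsec = data * scale;
	return 0;
}
```

Treating the units as small enum-like codes instead of independent bits makes "more than one unit specified" unrepresentable in this sketch, which fits the note that the units are mutually exclusive.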
Re: Unmap page in uvm_anon_release()
> Date: Sat, 10 Sep 2022 14:18:02 +0200
> From: Martin Pieuchot
>
> Diff below fixes a bug exposed when swapping on arm64. When an anon is
> released make sure the all the pmap references to the related page are
> removed.

I'm a little bit puzzled by this. So these pages are still mapped
even though there are no references to the anon anymore?

> We could move the pmap_page_protect(pg, PROT_NONE) inside uvm_pagefree()
> to avoid future issue but that's for a later refactoring.

I don't think that makes sense. In cases where pages are explicitly
mapped (instead of faulted in) we call pmap_remove() before we end up
calling uvm_pagefree(). Calling pmap_page_protect() in that case
doesn't make sense.

> With this diff I can no longer reproduce the SIGBUS issue on the
> rockpro64 and swapping is stable as long as I/O from sdmmc(4) work.
>
> This should be good enough to commit the diff that got reverted, but I'll
> wait to be sure there's no regression.
>
> ok?
>
> Index: uvm/uvm_anon.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> retrieving revision 1.54
> diff -u -p -r1.54 uvm_anon.c
> --- uvm/uvm_anon.c	26 Mar 2021 13:40:05 -	1.54
> +++ uvm/uvm_anon.c	10 Sep 2022 12:10:34 -
> @@ -255,6 +255,7 @@ uvm_anon_release(struct vm_anon *anon)
>  	KASSERT(anon->an_ref == 0);
>  
>  	uvm_lock_pageq();
> +	pmap_page_protect(pg, PROT_NONE);
>  	uvm_pagefree(pg);
>  	uvm_unlock_pageq();
>  	KASSERT(anon->an_page == NULL);
> Index: uvm/uvm_fault.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.132
> diff -u -p -r1.132 uvm_fault.c
> --- uvm/uvm_fault.c	31 Aug 2022 01:27:04 -	1.132
> +++ uvm/uvm_fault.c	10 Sep 2022 12:10:34 -
> @@ -396,7 +396,6 @@ uvmfault_anonget(struct uvm_faultinfo *u
>  			 * anon and try again.
>  			 */
>  			if (pg->pg_flags & PG_RELEASED) {
> -				pmap_page_protect(pg, PROT_NONE);
>  				KASSERT(anon->an_ref == 0);
>  				/*
>  				 * Released while we had unlocked amap.
Re: [RFC] Adding ESRT and EFI variables for fwupd
Hi Mark,

Any news? I have since set up gdb debugging of OVMF and figured out my
EFI RT issues, and it returns data fine now (it was the wrong calling
ABI), but I'm not making /dev/efi as it sounds like you've done it
already.

Where are you at with this? Can I help to move this forward?

Cheers,
Sergii

On Tue, Aug 23, 2022 at 07:52:42PM +0300, Sergii Dmytruk wrote:
> Hi Mark,
>
> > I have done some work on adding support for EFI runtime services on
> > OpenBSD/amd64, based on the code for OpenBSD/arm64. My plan was to
> > implement an ioctl(2) interface that is compatible with FreeBSD's
> > interface. Theo objected to putting EFI-related headers
> > in /usr/include/sys, so the EFI-related headers will probably end up
> > in /usr/include/dev/efi (so you'd be include
> > instead).
>
> Great to hear that you went with FreeBSD's API. It's a natural choice
> for DragonBSD, and NetBSD chose compatible API on AArch64, so it sounds
> like all BSDs will have an almost identical API. Having header in a
> different location is a minor thing.
>
> I too was trying to make EFI RT work in [1], it doesn't crash the
> kernel by now, but GetTime() doesn't return proper data either.
>
> [1]:
> https://github.com/3mdeb/openbsd-src/compare/esrt...3mdeb:openbsd-src:efi-vars
>
> > I hope to have some time to make progress on this next week, so let me
> > come back to you then.
>
> Is the code available anywhere?
>
> By the way, do you plan to have something like `libefivar` provided as
> part of OpenBSD?
>
> Cheers,
> Sergii
Unmap page in uvm_anon_release()
Diff below fixes a bug exposed when swapping on arm64. When an anon is
released, make sure that all the pmap references to the related page are
removed.

We could move the pmap_page_protect(pg, PROT_NONE) inside uvm_pagefree()
to avoid future issues but that's for a later refactoring.

With this diff I can no longer reproduce the SIGBUS issue on the
rockpro64 and swapping is stable as long as I/O from sdmmc(4) works.

This should be good enough to commit the diff that got reverted, but I'll
wait to be sure there's no regression.

ok?

Index: uvm/uvm_anon.c
===
RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
retrieving revision 1.54
diff -u -p -r1.54 uvm_anon.c
--- uvm/uvm_anon.c	26 Mar 2021 13:40:05 -	1.54
+++ uvm/uvm_anon.c	10 Sep 2022 12:10:34 -
@@ -255,6 +255,7 @@ uvm_anon_release(struct vm_anon *anon)
 	KASSERT(anon->an_ref == 0);
 
 	uvm_lock_pageq();
+	pmap_page_protect(pg, PROT_NONE);
 	uvm_pagefree(pg);
 	uvm_unlock_pageq();
 	KASSERT(anon->an_page == NULL);
Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.132
diff -u -p -r1.132 uvm_fault.c
--- uvm/uvm_fault.c	31 Aug 2022 01:27:04 -	1.132
+++ uvm/uvm_fault.c	10 Sep 2022 12:10:34 -
@@ -396,7 +396,6 @@ uvmfault_anonget(struct uvm_faultinfo *u
 			 * anon and try again.
 			 */
 			if (pg->pg_flags & PG_RELEASED) {
-				pmap_page_protect(pg, PROT_NONE);
 				KASSERT(anon->an_ref == 0);
 				/*
 				 * Released while we had unlocked amap.
Re: bgpd optimize bgpctl show rib 10/8 or-longer
On Fri, Sep 09, 2022 at 07:07:14PM +0200, Theo Buehler wrote:
> On Fri, Sep 09, 2022 at 05:50:17PM +0200, Claudio Jeker wrote:
> > This diff optimized subtree walks. In other words it specifies a subtree
> > (as a prefix/prefixlen combo) and only walks the entries that are under
> > this covering route.
> >
> > Instead of doing a full table walk this will only walk part of the tree
> > and is therefor much faster if the subtree is small.
>
> The diff looks good. The two new dump_subtree() functions are currently
> only called with a count of CTL_MSG_HIGH_MARK, so the two
>
> 	if (count == 0)
> 		prefix_dump_r(ctx)
>
> are currently dead code. I assume you anticipate that this might change.

Yes, and dump_subtree() is an extension of dump_new() and I want the two to
behave the same. These are generic APIs that can be used in various
places. It would be nice to drop the sync traversals in the long run.
A sync traversal on a big RIB just takes a long time and locks up any
other update. Maybe one day I will figure out how to replace the last few
ones.

-- 
:wq Claudio
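
The "covering route" idea quoted above, a prefix/prefixlen pair that bounds which entries a walk visits, can be sketched for IPv4 as a simple containment test. This is a minimal illustration only and not bgpd's code: struct demo_prefix, demo_mask() and demo_prefix_covered() are hypothetical names, and a real subtree walk would start at the covering node in the tree rather than test every entry.

```c
#include <stdint.h>

/* Hypothetical IPv4 prefix type for illustration; not bgpd's own struct. */
struct demo_prefix {
	uint32_t addr;	/* address in host byte order */
	uint8_t  len;	/* prefix length, 0..32 */
};

/* Build the netmask for a prefix length; len == 0 must not shift by 32. */
static uint32_t
demo_mask(uint8_t len)
{
	return len == 0 ? 0 : 0xffffffffU << (32 - len);
}

/*
 * Return 1 if p lies in the subtree rooted at cover, i.e. p is the
 * covering prefix itself or a longer prefix inside it ("or-longer").
 */
int
demo_prefix_covered(const struct demo_prefix *cover,
    const struct demo_prefix *p)
{
	if (cover->len > 32 || p->len > 32)
		return 0;
	/* A shorter prefix can never sit below the covering route. */
	if (p->len < cover->len)
		return 0;
	/* The leading cover->len bits of both addresses must be equal. */
	return ((p->addr ^ cover->addr) & demo_mask(cover->len)) == 0;
}
```

With cover set to 10.0.0.0/8 (addr 0x0a000000, len 8), 10.1.2.0/24 is covered while 11.0.0.0/8 and 10.0.0.0/7 are not.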
pmap_collect and the page daemon
When the kernel is low on memory, the pagedaemon thread will try various strategies to free memory. One of those is to ask the pmap layer to free some memory. This is done in uvm_swapout_threads(), which is roughly a wrapper around the invocation of pmap_collect() on behalf of all processes. However, most pmap layers do not implement pmap_collect() and only provide a stub which does nothing. It doesn't make much sense to iterate over the process list, only to invoke a function which does absolutely nothing. The following diff makes pmap_collect() an optional interface, with pmaps implementing it defining __HAVE_PMAP_COLLECT. This feature macro is used to completely omit uvm_swapout_threads() when pmap_collect() is not available. Index: arch/alpha/include/pmap.h === RCS file: /OpenBSD/src/sys/arch/alpha/include/pmap.h,v retrieving revision 1.40 diff -u -p -r1.40 pmap.h --- arch/alpha/include/pmap.h 20 Apr 2016 05:24:18 - 1.40 +++ arch/alpha/include/pmap.h 10 Sep 2022 08:00:10 - @@ -197,6 +197,8 @@ extern pt_entry_t *VPT;/* Virtual Page paddr_t vtophys(vaddr_t); +#define__HAVE_PMAP_COLLECT + /* Machine-specific functions. */ void pmap_bootstrap(paddr_t ptaddr, u_int maxasn, u_long ncpuids); intpmap_emulate_reference(struct proc *p, vaddr_t v, int user, int type); Index: arch/amd64/amd64/pmap.c === RCS file: /OpenBSD/src/sys/arch/amd64/amd64/pmap.c,v retrieving revision 1.153 diff -u -p -r1.153 pmap.c --- arch/amd64/amd64/pmap.c 30 Jun 2022 13:51:24 - 1.153 +++ arch/amd64/amd64/pmap.c 10 Sep 2022 08:00:10 - @@ -2206,6 +2206,7 @@ pmap_unwire(struct pmap *pmap, vaddr_t v #endif } +#if 0 /* * pmap_collect: free resources held by a pmap * @@ -2221,10 +,10 @@ pmap_collect(struct pmap *pmap) * for its entire address space. 
*/ -/* pmap_do_remove(pmap, VM_MIN_ADDRESS, VM_MAX_ADDRESS, + pmap_do_remove(pmap, VM_MIN_ADDRESS, VM_MAX_ADDRESS, PMAP_REMOVE_SKIPWIRED); -*/ } +#endif /* * pmap_copy: copy mappings from one pmap to another Index: arch/arm/arm/pmap7.c === RCS file: /OpenBSD/src/sys/arch/arm/arm/pmap7.c,v retrieving revision 1.63 diff -u -p -r1.63 pmap7.c --- arch/arm/arm/pmap7.c21 Feb 2022 19:15:58 - 1.63 +++ arch/arm/arm/pmap7.c10 Sep 2022 08:00:10 - @@ -1743,21 +1743,6 @@ dab_access(trapframe_t *tf, u_int fsr, u } /* - * pmap_collect: free resources held by a pmap - * - * => optional function. - * => called when a process is swapped out to free memory. - */ -void -pmap_collect(pmap_t pm) -{ - /* -* Nothing to do. -* We don't even need to free-up the process' L1. -*/ -} - -/* * Routine:pmap_proc_iflush * * Function: Index: arch/arm64/arm64/pmap.c === RCS file: /OpenBSD/src/sys/arch/arm64/arm64/pmap.c,v retrieving revision 1.84 diff -u -p -r1.84 pmap.c --- arch/arm64/arm64/pmap.c 10 Jan 2022 09:20:27 - 1.84 +++ arch/arm64/arm64/pmap.c 10 Sep 2022 08:00:10 - @@ -856,24 +856,6 @@ pmap_fill_pte(pmap_t pm, vaddr_t va, pad } /* - * Garbage collects the physical map system for pages which are - * no longer used. Success need not be guaranteed -- that is, there - * may well be pages which are not referenced, but others may be collected - * Called by the pageout daemon when pages are scarce. - */ -void -pmap_collect(pmap_t pm) -{ - /* This could return unused v->p table layers which -* are empty. -* could malicious programs allocate memory and eat -* these wired pages? These are allocated via pool. -* Are there pool functions which could be called -* to lower the pool usage here? -*/ -} - -/* * Fill the given physical page with zeros. 
*/ void Index: arch/hppa/hppa/pmap.c === RCS file: /OpenBSD/src/sys/arch/hppa/hppa/pmap.c,v retrieving revision 1.177 diff -u -p -r1.177 pmap.c --- arch/hppa/hppa/pmap.c 14 Sep 2021 16:16:51 - 1.177 +++ arch/hppa/hppa/pmap.c 10 Sep 2022 08:00:10 - @@ -734,13 +734,6 @@ pmap_reference(struct pmap *pmap) atomic_inc_int(>pm_obj.uo_refs); } -void -pmap_collect(struct pmap *pmap) -{ - DPRINTF(PDB_FOLLOW|PDB_PMAP, ("pmap_collect(%p)\n", pmap)); - /* nothing yet */ -} - int pmap_enter(struct pmap *pmap, vaddr_t va, paddr_t pa, vm_prot_t prot, int flags) { Index: arch/hppa/include/param.h === RCS file: /OpenBSD/src/sys/arch/hppa/include/param.h,v retrieving revision 1.47 diff -u -p -r1.47 param.h --- arch/hppa/include/param.h 14 Sep 2018 13:58:20 - 1.47 +++
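
The feature-macro approach described at the top of this message can be sketched in isolation as follows. This is a simplified user-space illustration under stated assumptions, not the kernel code: the DEMO_HAVE_PMAP_COLLECT macro, struct demo_pmap and the demo_* functions are hypothetical stand-ins, and unlike the real pmap_collect(), which returns void, the stand-in returns a count so the effect is observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pmap; the real struct pmap is per-arch. */
struct demo_pmap {
	int reclaimable;	/* pages this pmap could give back */
};

#ifdef DEMO_HAVE_PMAP_COLLECT
/* Only compiled on platforms that opt in by defining the macro. */
static int
demo_pmap_collect(struct demo_pmap *pm)
{
	int freed = pm->reclaimable;

	pm->reclaimable = 0;
	return freed;
}

/* Walk every pmap and ask it to release what it can. */
static int
demo_swapout_threads(struct demo_pmap *pmaps, size_t n)
{
	size_t i;
	int freed = 0;

	for (i = 0; i < n; i++)
		freed += demo_pmap_collect(&pmaps[i]);
	return freed;
}
#endif /* DEMO_HAVE_PMAP_COLLECT */

/*
 * Page daemon side: without the feature macro the walk over all pmaps
 * is omitted entirely instead of calling an empty stub for each one.
 */
int
demo_pagedaemon_reclaim(struct demo_pmap *pmaps, size_t n)
{
#ifdef DEMO_HAVE_PMAP_COLLECT
	return demo_swapout_threads(pmaps, n);
#else
	(void)pmaps;
	(void)n;
	return 0;
#endif
}
```

Building with -DDEMO_HAVE_PMAP_COLLECT enables the walk; without it, demo_pagedaemon_reclaim() does no per-pmap work at all, which is the point of making the interface optional.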
Re: strtonum.3: Use the proper macro for "long long"
On Fri, Sep 09, 2022 at 08:06:32PM -0400, Josiah Frentsos wrote:
> Index: strtonum.3
> ===
> RCS file: /cvs/src/lib/libc/stdlib/strtonum.3,v
> retrieving revision 1.18
> diff -u -p -r1.18 strtonum.3
> --- strtonum.3	7 Feb 2016 20:50:24 -	1.18
> +++ strtonum.3	10 Sep 2022 00:04:29 -
> @@ -35,7 +35,7 @@ The
>  function converts the string in
>  .Fa nptr
>  to a
> -.Li long long
> +.Vt long long
>  value.
>  The
>  .Fn strtonum
> @@ -56,7 +56,7 @@ or
>  sign.
>  .Pp
>  The remainder of the string is converted to a
> -.Li long long
> +.Vt long long
>  value according to base 10.
>  .Pp
>  The value obtained is then checked against the provided
>

hi.

i fear this is either incomplete or unwanted:

$ cd /usr/src/lib/libc/stdlib
$ grep ^.Li *.3
atof.3:.Li double
atoi.3:.Li integer
atol.3:.Li long integer
atoll.3:.Li long long integer
div.3:.Li int
getopt_long.3:.Li struct option
imaxdiv.3:.Li imaxdiv_t
imaxdiv.3:.Li intmax_t
insque.3:.Li insque
insque.3:.Li remque
ldiv.3:.Li ldiv_t
ldiv.3:.Li long integer
lldiv.3:.Li lldiv_t
lldiv.3:.Li long long integer
strtod.3:.Li double
strtod.3:.Li float
strtod.3:.Li long double
strtonum.3:.Li long long
strtonum.3:.Li long long

so your fix might be correct (i'm never 100% sure on the various code
mark ups) but it doesn't address the bigger picture. the example above
is only for libc/stdlib.

maybe ingo has an opinion on whether this needs fixing everywhere or
not?

jmc