Re: in pcb table mutex
Regress tested with witness, rebased to current, anyone?

On Wed, Mar 02, 2022 at 07:13:09PM +0100, Alexander Bluhm wrote:
> Hi,
>
> pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer.  So
> to run pf in parallel we must make parts of the stack MP safe.
> Protect the list and hashes in the PCB tables with a mutex.
>
> Note that the protocol notify functions may call pf via tcp_output().
> As the pf lock is a sleeping rw_lock, we must not hold a mutex.  To
> solve this for now, I collect these PCBs in inp_notify list and
> protect it with the exclusive netlock.
>
> ok?
>
> bluhm

Index: kern/kern_sysctl.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.399
diff -u -p -r1.399 kern_sysctl.c
--- kern/kern_sysctl.c	25 Jan 2022 04:04:40 -0000	1.399
+++ kern/kern_sysctl.c	11 Mar 2022 15:07:13 -0000
@@ -1366,16 +1366,24 @@ sysctl_file(int *name, u_int namelen, ch
 		struct inpcb *inp;
 
 		NET_LOCK();
+		mtx_enter(&tcbtable.inpt_mtx);
 		TAILQ_FOREACH(inp, &tcbtable.inpt_queue, inp_queue)
 			FILLSO(inp->inp_socket);
+		mtx_leave(&tcbtable.inpt_mtx);
+		mtx_enter(&udbtable.inpt_mtx);
 		TAILQ_FOREACH(inp, &udbtable.inpt_queue, inp_queue)
 			FILLSO(inp->inp_socket);
+		mtx_leave(&udbtable.inpt_mtx);
+		mtx_enter(&rawcbtable.inpt_mtx);
 		TAILQ_FOREACH(inp, &rawcbtable.inpt_queue, inp_queue)
 			FILLSO(inp->inp_socket);
+		mtx_leave(&rawcbtable.inpt_mtx);
 #ifdef INET6
+		mtx_enter(&rawin6pcbtable.inpt_mtx);
 		TAILQ_FOREACH(inp, &rawin6pcbtable.inpt_queue, inp_queue)
 			FILLSO(inp->inp_socket);
+		mtx_leave(&rawin6pcbtable.inpt_mtx);
 #endif
 		NET_UNLOCK();
 	}
Index: netinet/in_pcb.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.259
diff -u -p -r1.259 in_pcb.c
--- netinet/in_pcb.c	4 Mar 2022 20:35:10 -0000	1.259
+++ netinet/in_pcb.c	11 Mar 2022 15:07:13 -0000
@@ -120,7 +120,8 @@ struct baddynamicports baddynamicports;
 struct baddynamicports rootonlyports;
 struct pool inpcb_pool;
 
-int in_pcbresize (struct inpcbtable *, int);
+void	in_pcbrehash_locked(struct inpcb *);
+int	in_pcbresize(struct inpcbtable *, int);
 
 #define	INPCBHASH_LOADFACTOR(_x)	(((_x) * 3) / 4)
@@ -173,7 +174,7 @@ in_pcblhash(struct inpcbtable *table, in
 void
 in_pcbinit(struct inpcbtable *table, int hashsize)
 {
-
+	mtx_init(&table->inpt_mtx, IPL_SOFTNET);
 	TAILQ_INIT(&table->inpt_queue);
 	table->inpt_hashtbl = hashinit(hashsize, M_PCB, M_WAITOK,
 	    &table->inpt_mask);
@@ -252,6 +253,7 @@ in_pcballoc(struct socket *so, struct in
 	inp->inp_cksum6 = -1;
 #endif /* INET6 */
 
+	mtx_enter(&table->inpt_mtx);
 	if (table->inpt_count++ > INPCBHASH_LOADFACTOR(table->inpt_size))
 		(void)in_pcbresize(table, table->inpt_size * 2);
 	TAILQ_INSERT_HEAD(&table->inpt_queue, inp, inp_queue);
@@ -268,6 +270,8 @@ in_pcballoc(struct socket *so, struct in
 	    &inp->inp_faddr, inp->inp_fport, &inp->inp_laddr, inp->inp_lport);
 	LIST_INSERT_HEAD(head, inp, inp_hash);
+	mtx_leave(&table->inpt_mtx);
+
 	so->so_pcb = inp;
 
 	return (0);
@@ -556,6 +560,7 @@ void
 in_pcbdetach(struct inpcb *inp)
 {
 	struct socket *so = inp->inp_socket;
+	struct inpcbtable *table = inp->inp_table;
 
 	NET_ASSERT_LOCKED();
 
@@ -585,10 +590,13 @@ in_pcbdetach(struct inpcb *inp)
 		pf_inp_unlink(inp);
 	}
 #endif
+	mtx_enter(&table->inpt_mtx);
 	LIST_REMOVE(inp, inp_lhash);
 	LIST_REMOVE(inp, inp_hash);
-	TAILQ_REMOVE(&inp->inp_table->inpt_queue, inp, inp_queue);
-	inp->inp_table->inpt_count--;
+	TAILQ_REMOVE(&table->inpt_queue, inp, inp_queue);
+	table->inpt_count--;
+	mtx_leave(&table->inpt_mtx);
+
 	in_pcbunref(inp);
 }
@@ -661,20 +669,25 @@ void
 in_pcbnotifyall(struct inpcbtable *table, struct sockaddr *dst,
     u_int rtable, int errno, void (*notify)(struct inpcb *, int))
 {
-	struct inpcb *inp, *ninp;
+	SIMPLEQ_HEAD(, inpcb) inpcblist;
+	struct inpcb *inp;
 	struct in_addr faddr;
 	u_int rdomain;
 
-	NET_ASSERT_LOCKED();
+	NET_ASSERT_WLOCKED();
 
 	if (dst->sa_family != AF_INET)
 		return;
 	faddr = satosin(dst)->sin_addr;
 	if (faddr.s_addr == INADDR_ANY)
 		return;
+	if (notify == NULL)
+		return;
 
+	SIMPLEQ_INIT(&inpcblist);
 	rdomain = rtable_l2(rtable);
-	TAILQ_FOREACH_SAFE(inp, &table->inpt_queue, inp_queue, ninp) {
+	mtx_enter(&table->inpt_mtx);
+	TAILQ_FOREACH(inp, &table->inpt_queue,
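The locking pattern the diff introduces for in_pcbnotifyall() — walk the mutex-protected table list, collect matching PCBs on a private list, and only run the potentially sleeping notify callbacks after dropping the mutex — can be sketched generically. This is a userland pthread illustration, not the kernel code; the struct and function names are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

struct pcb_sketch {
	struct pcb_sketch *p_next;	/* table list, protected by mtx */
	struct pcb_sketch *p_notify;	/* private collection list */
	int p_match;
};

/*
 * Walk the mutex-protected table list and collect matching entries on
 * a private list.  The caller runs the (potentially sleeping) notify
 * callbacks on the returned list after the mutex has been dropped, so
 * no mutex is held while sleeping.
 */
struct pcb_sketch *
collect_notify_sketch(struct pcb_sketch *head, pthread_mutex_t *mtx)
{
	struct pcb_sketch *p, *list = NULL;

	pthread_mutex_lock(mtx);
	for (p = head; p != NULL; p = p->p_next) {
		if (p->p_match) {
			/* Prepend to the private notify list. */
			p->p_notify = list;
			list = p;
		}
	}
	pthread_mutex_unlock(mtx);
	return list;
}
```

In the kernel diff the private list is the SIMPLEQ built under `inpt_mtx`, and the exclusive netlock keeps the collected PCBs stable while the callbacks run.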
Re: ssh: xstrdup(): use memcpy(3)
On Wed, 09 Mar 2022 19:20:08 -0600, Scott Cheloha wrote:

> The strdup(3) implementation in libc uses memcpy(3), not strlcpy(3).
>
> There is no upside to using strlcpy(3) here if we know the length of
> str before we copy it to the destination buffer.

Sure, using memcpy() here is fine since the length includes space for
the NUL terminator.

OK millert@

 - todd
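A minimal sketch of the pattern being OK'd (this is an illustration, not ssh's actual xstrdup): once strlen() has determined the length, memcpy() of len + 1 bytes copies the string including its NUL terminator, so strlcpy()'s truncation handling buys nothing.

```c
#include <stdlib.h>
#include <string.h>
#include <err.h>

/*
 * Hypothetical xstrdup-style helper: the copy length is known up
 * front, so memcpy() suffices.  len includes the NUL terminator.
 */
static char *
xstrdup_sketch(const char *str)
{
	size_t len;
	char *cp;

	len = strlen(str) + 1;		/* + 1 for the NUL terminator */
	if ((cp = malloc(len)) == NULL)
		err(1, "malloc");
	memcpy(cp, str, len);		/* copies the NUL as well */
	return cp;
}
```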
Re: ipsec acquire mutex refcount
On 9.3.2022. 21:02, Hrvoje Popovski wrote:
> On 9.3.2022. 19:14, Alexander Bluhm wrote:
>> Hi,
>>
>> Hrvoje has hit a crash with IPsec acquire while testing the parallel
>> ip forwarding diff.  Analysis with sashan@ concludes that this part
>> is not MP safe.  Mutex and refcount should fix the memory management.
>>
>> Hrvoje: Could you test this diff?
>
> Hi,
>
> no problem. Give me 2 or 3 days to hit it properly...

Hi,

after 2 days of hammering the sasyncd setup I can't trigger the panic
as before.

Thank you ..
Re: atomic read write
I've been reading the newer C standard (n2310.pdf, to be specific).
That thing is scary.  I believe that, in the long haul, we will have
to use the _Atomic types that are apparently part of the C standard
now.

Yeah, the switch to multi-threading made the standard way more
complicated.  Sequence points were so much easier to explain.

I'm not quite sure how _Atomic types actually relate to SMP, but I'm
afraid they're currently the "best" guarantee we will be stuck with to
try and get "better" atomic writes...  I haven't even looked to see
what clang does with them.
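For reference, the machinery in question looks like this in plain C11 (a standalone illustration, not OpenBSD kernel code): loads and stores on an `_Atomic` object are guaranteed free of tearing, and the `_explicit` variants let you request a memory order weaker than the default sequentially consistent one.

```c
#include <stdatomic.h>

/* A shared counter; all accesses below are tear-free. */
static _Atomic unsigned int counter = 0;

unsigned int
bump(void)
{
	/* Atomic read-modify-write; defaults to memory_order_seq_cst. */
	return atomic_fetch_add(&counter, 1) + 1;
}

unsigned int
peek(void)
{
	/* Relaxed load: atomicity (no tearing), but no ordering. */
	return atomic_load_explicit(&counter, memory_order_relaxed);
}
```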
Re: atomic read write
On Fri, Mar 11, 2022 at 11:51:31AM +0100, Alexander Bluhm wrote:
> On Fri, Mar 11, 2022 at 05:32:11AM +0000, Visa Hankala wrote:
> > On Thu, Mar 10, 2022 at 07:17:43PM +0100, Alexander Bluhm wrote:
> > > On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > > > > Below is a patch that shows how to accomplish release semantics with
> > > > > the file and refcnt APIs. (The added memory barriers ensure that the
> > > > > CPU completes its loads and stores to the object before dropping the
> > > > > reference. Another CPU might delete the object immediately after.
> > > > > The barrier might not be strictly necessary with refcnt_finalize(),
> > > > > though.)
> > >
> > > The enter and exit membars that protect the critical section should
> > > be symmetric. Maybe this diff is better.
> >
> > No, symmetry is not always present. See below.
>
> In general if you want to move data from one CPU to another, you
> have to push them out and pull them in.  That is where the symmetry
> comes from and it should be reflected in code.  At least that is
> my understanding, but understanding the whole topic is hard.

A skeptical take on memory barriers is that they do not push anything;
they only make certain things happen in a specific order, but the
exact time frame remains open.  To ensure immediacy, the code needs to
use a read-modify-write atomic operation or something similar that the
processor cannot delay.

For example, on octeon, membar_producer() prevents subsequent writes
from getting reordered with earlier writes in the write buffer.
However, the operation does not block the pipeline; it takes effect in
the background.

> I came to the conclusion that refcnt does not need membars at all.
> It does not protect other data, it is only looking at one counter
> variable.  When using refcnt, it protects the lifetime of an object.
> For the data you need another lock which brings its own barriers.

To make a reasonable API, one should consider the intended usage.
Putting the barriers inside refcnt code keeps the API sane.  Otherwise
callers would have to remember to issue barriers, which would also be
wasteful on systems where RMW atomics do enforce memory order.

> Not sure about the purpose of cond.  Maybe membar_enter/exit is the
> way to go.  Does it guarantee anything about memory access?

The caller of cond_signal() has state that the caller of cond_wait()
wants to see.  cond_signal() should make the (local) state visible to
other CPUs before the clearing of c_wait becomes visible.
membar_exit() does that.

cond_wait() should prevent premature peeking into the data from
happening before the clearing of c_wait has been seen.  I think
membar_sync(), and not membar_enter(), is the right choice.  My
earlier suggestion about membar_enter() is wrong.  This barrier orders
subsequent reads and writes relative to earlier writes.  The pivot in
cond_wait() is a read!

> > > And to avoid memory barriers that nobody understands we should convert
> > > FRELE to refcnt_rele.
> >
> > I am not sure about that. I think as long as file reference counting
> > does unusual things that refcnt does not implement, f_count handling
> > should be separate.
>
> Let's keep FRELE out of the discussion.  Either it works as it is
> or it should use suitable primitives.  But please no membars in the
> file system code.

I believe the membars are necessary if f_count is updated using atomic
operations.  An alternative is to wrap the reference count updates
with a mutex, which invokes membars internally but with increased
total cost.

Below is an updated patch that acknowledges the acquire semantics
aspect of FRELE() when the reference count has dropped to zero.  The
destructor wants to see the (aggregate) state that follows the
f_count 1->0 transition.
Index: kern/kern_descrip.c
===================================================================
RCS file: src/sys/kern/kern_descrip.c,v
retrieving revision 1.205
diff -u -p -r1.205 kern_descrip.c
--- kern/kern_descrip.c	20 Jan 2022 11:06:57 -0000	1.205
+++ kern/kern_descrip.c	11 Mar 2022 14:17:22 -0000
@@ -1268,7 +1268,16 @@ fdrop(struct file *fp, struct proc *p)
 {
 	int error;
 
-	KASSERTMSG(fp->f_count == 0, "count (%u) != 0", fp->f_count);
+	membar_exit_before_atomic();
+	if (atomic_dec_int_nv(&fp->f_count) > 0)
+		return 0;
+
+	/*
+	 * Make this CPU see the latest state relative to f_count updates.
+	 * Note this memory barrier is redundant because the following
+	 * critical section provides an acquire-release barrier.
+	 */
+	/* membar_enter_after_atomic(); */
 
 	mtx_enter(&fhdlk);
 	if (fp->f_iflags & FIF_INSERTED)
Index: sys/file.h
===================================================================
RCS file: src/sys/sys/file.h,v
retrieving revision 1.65
diff -u -p -r1.65 file.h
--- sys/file.h	20 Jan 2022 03:43:31 -0000	1.65
+++ sys/file.h	11 Mar 2022 14:17:22 -0000
@@ -113,8 +113,7 @@ struct file {
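The release/acquire pairing the patch encodes with membar_exit_before_atomic() and the mutex critical section can be written in portable C11 (a sketch, not the kernel's refcnt/FRELE code): the dropping side needs release semantics so its stores to the object happen before the count reaches zero, and the destroying side needs acquire semantics so it observes those stores before freeing.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object for illustration. */
struct obj {
	atomic_uint refs;
	int state;
};

void
obj_rele(struct obj *o)
{
	/* Release: order our prior stores before the decrement. */
	if (atomic_fetch_sub_explicit(&o->refs, 1,
	    memory_order_release) == 1) {
		/* Acquire: see every store made before the last drop. */
		atomic_thread_fence(memory_order_acquire);
		free(o);
	}
}
```

On architectures where RMW atomics already enforce full ordering (e.g. amd64), the explicit orderings compile to nothing extra, which is the "wasteful on some systems, free on others" trade-off discussed above.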
Re: atomic read write
On Fri, Mar 11, 2022 at 09:33:29AM +0100, Mark Kettenis wrote:

Hello Mark,

>>>> Unfortunately this transformation almost certainly isn't safe: for
>>>> example, the non-atomic load can return values that were never
>>>> written by any thread (e.g. due to load/store tearing amongst other
>>>> fun effects).
>>> is that true even when care is taken to only use int/long sized
>>> variables that are naturally aligned? are compilers that pathological
>>> now?
>> Yes and yes, I'm afraid -- though rare, there are real examples of
>> compilers doing this (pity the poor person who debugs it!).  My
>> working guess is that they'll be increasingly likely to occur as
>> compilers become ever more aggressive at taking advantage of
>> undefined behaviour.
> Such a compiler is broken beyond repair.

Unfortunately, the compiler is correct with respect to C's semantics.
We might wish that it did not choose to take advantage of this, but it
does.  As much as I dislike this, I've given up hoping that the
situation will change.

To be specific, since we're using such a compiler in clang 13 (I have
no idea if gcc 4.* is as aggressive, though more recent gcc versions
definitely are), we've got little choice but to fit into its mould.
For example, if an integer can be shared between threads, it must
either always be read/written atomically or synchronised via some sort
of guard (e.g. a mutex): failing to do so can lead to the horrors that
the Alglave et al. article details.


Laurie
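The "always atomic or always guarded" rule can be made concrete with a small C11 sketch (an illustration, not project code): every access to the shared flag goes through atomic operations, so the compiler may not tear the load or store, cache the value in a register across iterations, or invent intermediate values.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A flag shared between a producer and a consumer thread. */
static atomic_bool done = false;

void
producer_finish(void)
{
	/* Release store: publishes prior work before setting the flag. */
	atomic_store_explicit(&done, true, memory_order_release);
}

bool
consumer_poll(void)
{
	/* Acquire load: a plain read here would be a data race. */
	return atomic_load_explicit(&done, memory_order_acquire);
}
```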
slow xrandr / DRM_IOCTL_MODE_GETCONNECTOR
This isn't new and not sure if there is anything that can be done, but
it annoyed me again so I thought I'd mention it to see if anyone has
ideas.

Machine is X1Y4 (UHD Graphics 620; COFFEELAKE, gen 9).  Displays are
the built-in LCD and DP-1, which is on a USB-C to HDMI hub.

If xrandr is run while the external monitor is connected then xrandr
is slow (about 3 seconds) and during the time it's running, userland
stalls - no screen/cursor movement in X.  ssh'd into the machine and
running top, that stops updating too.  If the external monitor is
disconnected and I plug it in, the same stall occurs when the cable is
connected (including if I am switched to a text vt).

$ time xrandr
Screen 0: minimum 320 x 200, current 1920 x 1200, maximum 16384 x 16384
eDP-1 connected primary (normal left inverted right x axis y axis)
   1920x1080     60.01 +  60.01    59.97    59.96    59.93
   1680x1050     59.95    59.88
   1400x1050     59.98
   1600x900      59.99    59.94    59.95    59.82
   1280x1024     60.02
   1400x900      59.96    59.88
   1280x960      60.00
   1440x810      60.00    59.97
   1368x768      59.88    59.85
   1280x800      59.99    59.97    59.81    59.91
   1280x720      60.00    59.99    59.86    59.74
   1024x768      60.04    60.00
   960x720       60.00
   928x696       60.05
   896x672       60.01
   1024x576      59.95    59.96    59.90    59.82
   960x600       59.93    60.00
   960x540       59.96    59.99    59.63    59.82
   800x600       60.00    60.32    56.25
   840x525       60.01    59.88
   864x486       59.92    59.57
   700x525       59.98
   800x450       59.95    59.82
   640x512       60.02
   700x450       59.96    59.88
   640x480       60.00    59.94
   720x405       59.51    58.99
   684x384       59.88    59.85
   640x400       59.88    59.98
   640x360       59.86    59.83    59.84    59.32
   512x384       60.00
   512x288       60.00    59.92
   480x270       59.63    59.82
   400x300       60.32    56.34
   432x243       59.92    59.57
   320x240       60.05
   360x202       59.51    59.13
   320x180       59.84    59.32
DP-1 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 518mm x 324mm
   1920x1200     59.95*+
   1920x1080     60.00    50.00    59.94    30.00    25.00    24.00    29.97    23.98
   1920x1080i    60.00    50.00    59.94
   1600x1200     60.00
   1280x1024     75.02    60.02
   1152x864      75.00
   1280x720      60.00    50.00    59.94
   1024x768      75.03    60.00
   800x600       75.00    60.32
   720x576       50.00
   720x480       60.00    59.94
   640x480       75.00    60.00    59.94
   720x400       70.08
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
    0m03.65s real     0m00.00s user     0m00.00s system

Some ktrace -TR from around the time, followed by dmesg then Xorg.log
(which covers a couple of instances of running xrandr)

Xorg ktrace:

 49523 Xorg     0.896745 CALL  ioctl(11,DRM_IOCTL_MODE_GETPROPERTY,0x7f7df6a0)
 49523 Xorg     0.896748 RET   ioctl 0
 49523 Xorg     0.896755 CALL  ioctl(11,DRM_IOCTL_MODE_GETCONNECTOR,0x7f7df6b0)
 49523 Xorg     4.241310 RET   ioctl 0
 49523 Xorg     4.241313 STRU  struct kevent { ident=7, filter=EVFILT_READ, flags=0x1005, fflags=0<>, data=2, udata=0xe943e602 }
 49523 Xorg     4.241317 STRU  struct pollfd [9] { fd=7, events=0x1, revents=0x1 } { fd=21, events=0x1, revents=0<> } { fd=23, events=0x1, revents=0<> } { fd=24, events=0x1, revents=0<> } { fd=25, events=0x1, revents=0<> } { fd=26, events=0x1, revents=0<> } { fd=27, events=0x1, revents=0<> } { fd=28, events=0x1, revents=0<> } { fd=29, events=0x1, revents=0<> }
 49523 Xorg     4.241368 CALL  ioctl(11,DRM_IOCTL_MODE_GETCONNECTOR,0x7f7df6b0)
 49523 Xorg     4.241370 RET   poll 1

xrandr:

 78338 xrandr   0.013900 CALL  recvmsg(3,0x7f7d1f20,0)
 78338 xrandr   0.013981 RET   recvmsg -1 errno 35 Resource temporarily unavailable
 78338 xrandr   0.013988 CALL  kbind(0x7f7d2098,24,0xac0c8ac752cf4b56)
 78338 xrandr   0.014018 RET   kbind 0
 78338 xrandr   0.014023 CALL  poll(0x7f7d1ec8,1,INFTIM)
 78338 xrandr   0.014025 STRU  struct kevent [2] { ident=3, filter=EVFILT_READ, flags=0x1005, fflags=0<>, data=0, udata=0x33b648e4 } { ident=3, filter=EVFILT_WRITE, flags=0x1005, fflags=0<>, data=0, udata=0x33b648e4 }
 78338 xrandr   0.014047 STRU  struct kevent { ident=3, filter=EVFILT_WRITE, flags=0x1005, fflags=0<>, data=65536, udata=0x33b648e4 }
 78338 xrandr   0.014049 STRU  struct pollfd { fd=3, events=0x5, revents=0x4 }
 78338 xrandr   0.014057 RET   poll 1
 78338 xrandr   0.014066 CALL  writev(3,0x7f7d1f70,3)
 78338 xrandr   0.014068 STRU  struct iovec [3] { base=0xc084bba7000, len=8 } { base=0x0, len=0 } { base=0xc0856384ce6, len=0 }
 78338 xrandr   0.014085 GIO   fd 3 wrote 8 bytes
       "\M^L\b\^B\0\M-4\a\0\0"
Re: atomic read write
On Fri, Mar 11, 2022 at 05:32:11AM +0000, Visa Hankala wrote:
> On Thu, Mar 10, 2022 at 07:17:43PM +0100, Alexander Bluhm wrote:
> > On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > > > Below is a patch that shows how to accomplish release semantics with
> > > > the file and refcnt APIs. (The added memory barriers ensure that the
> > > > CPU completes its loads and stores to the object before dropping the
> > > > reference. Another CPU might delete the object immediately after.
> > > > The barrier might not be strictly necessary with refcnt_finalize(),
> > > > though.)
> >
> > The enter and exit membars that protect the critical section should
> > be symmetric. Maybe this diff is better.
>
> No, symmetry is not always present. See below.

In general if you want to move data from one CPU to another, you
have to push them out and pull them in.  That is where the symmetry
comes from and it should be reflected in code.  At least that is
my understanding, but understanding the whole topic is hard.

As kettenis mentioned, atomic_membar does not fit atomic_load/store
on amd64.  So membar_enter/exit would be better.

I came to the conclusion that refcnt does not need membars at all.
It does not protect other data, it is only looking at one counter
variable.  When using refcnt, it protects the lifetime of an object.
For the data you need another lock which brings its own barriers.

Not sure about the purpose of cond.  Maybe membar_enter/exit is the
way to go.  Does it guarantee anything about memory access?

> > And to avoid memory barriers that nobody understands we should convert
> > FRELE to refcnt_rele.
>
> I am not sure about that. I think as long as file reference counting
> does unusual things that refcnt does not implement, f_count handling
> should be separate.

Let's keep FRELE out of the discussion.  Either it works as it is
or it should use suitable primitives.  But please no membars in the
file system code.
bluhm

> > Index: kern/kern_synch.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.183
> > diff -u -p -r1.183 kern_synch.c
> > --- kern/kern_synch.c	10 Mar 2022 15:21:08 -0000	1.183
> > +++ kern/kern_synch.c	10 Mar 2022 18:11:39 -0000
> > @@ -805,6 +805,7 @@ void
> >  refcnt_init(struct refcnt *r)
> >  {
> >  	atomic_store_int(&r->r_refs, 1);
> > +	membar_enter_after_atomic();
> >  }
>
> I think membar is unnecessary here. Concurrent access can only happen
> after the object has been "published", and the publishing should have
> the appropriate memory barriers.
>
> Without proper publishing, the code is prone to race conditions.
>
> Also note that membar_enter_after_atomic() is not valid with
> atomic_store_* or atomic_load_*. See my comment about cond_signal().
>
> >  void
> > @@ -818,6 +819,7 @@ refcnt_take(struct refcnt *r)
> >  #else
> >  	atomic_inc_int(&r->r_refs);
> >  #endif
> > +	membar_enter_after_atomic();
> >  }
>
> This is unnecessary. The caller already has a reference to the object.
> refcnt_take() only has the intent of increasing the reference count.
>
> >  int
> > @@ -825,6 +827,7 @@ refcnt_rele(struct refcnt *r)
> >  {
> >  	u_int refcnt;
> >
> > +	membar_exit_before_atomic();
> >  	refcnt = atomic_dec_int_nv(&r->r_refs);
> >  	KASSERT(refcnt != ~0);
> >
> > @@ -844,6 +847,7 @@ refcnt_finalize(struct refcnt *r, const
> >  	struct sleep_state sls;
> >  	u_int refcnt;
> >
> > +	membar_exit_before_atomic();
> >  	refcnt = atomic_dec_int_nv(&r->r_refs);
> >  	while (refcnt) {
> >  		sleep_setup(&sls, r, PWAIT, wmesg, 0);
> > @@ -856,11 +860,13 @@ void
> >  cond_init(struct cond *c)
> >  {
> >  	atomic_store_int(&c->c_wait, 1);
> > +	membar_enter_after_atomic();
> >  }
>
> Same point here as with refcnt_init().
>
> >  void
> >  cond_signal(struct cond *c)
> >  {
> > +	membar_exit_before_atomic();
> >  	atomic_store_int(&c->c_wait, 0);
>
> This should use membar_exit().  membar_exit_before_atomic() is valid
> only when accompanied with a true read-modify-write atomic operation.
>
> The atomic_ prefix with the store and load instructions confuses this
> somewhat.
>
> The wakeup call that follows provides a membar function, but it comes
> too late as c_wait has already been cleared.
>
> >  	wakeup_one(c);
> > @@ -872,9 +878,11 @@ cond_wait(struct cond *c, const char *wm
> >  	struct sleep_state sls;
> >  	unsigned int wait;
> >
> > +	membar_exit_before_atomic();
> >  	wait = atomic_load_int(&c->c_wait);
> >  	while (wait) {
> >  		sleep_setup(&sls, c, PWAIT, wmesg, 0);
> > +		membar_exit_before_atomic();
> >  		wait = atomic_load_int(&c->c_wait);
> >  		sleep_finish(&sls, wait);
> >  	}
>
> I think this should use membar_enter() after the loop.
>
> cond_wait() is supposed to provide acquire semantics; once cond_wait()
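The cond ordering argued about above — the signaller needs release semantics so its state is visible before c_wait reads as cleared, and the waiter needs acquire semantics so it cannot peek at the data before seeing c_wait cleared — can be expressed in portable C11. This is a single-purpose sketch, not the kernel's cond API; it spins instead of sleeping and omits the wakeup.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct cond. */
struct cond_sketch {
	atomic_int c_wait;	/* 1 while the waiter must keep waiting */
};

void
cond_signal_sketch(struct cond_sketch *c)
{
	/* Release: publish the signaller's prior stores before the
	 * clearing of c_wait becomes visible. */
	atomic_store_explicit(&c->c_wait, 0, memory_order_release);
	/* A real implementation would wake the sleeping thread here. */
}

void
cond_wait_sketch(struct cond_sketch *c)
{
	/* Acquire: no read of the protected data may be moved before
	 * the load that observed c_wait == 0. */
	while (atomic_load_explicit(&c->c_wait, memory_order_acquire))
		;	/* a real implementation would sleep, not spin */
}
```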
Re: atomic read write
> Date: Thu, 10 Mar 2022 23:09:59 + > From: Laurence Tratt > > On Fri, Mar 11, 2022 at 09:00:57AM +1000, David Gwynne wrote: > > Hello David, > > >> Unfortunately this transformation almost certainly isn't safe: for > >> example, the non-atomic load can return values that were never written by > >> any thread (e.g. due to load/store tearing amongst other fun effects). > > is that true even when care is taken to only use int/long sized variables > > that are naturally aligned? are compilers that pathological now? > > Yes and yes, I'm afraid -- though rare, there are real examples of compilers > doing this (pity the poor person who debugs it!). My working guess is that > they'll be increasingly likely to occur as compilers become ever more > aggressive at taking advantage of undefined behaviour. > > Jade Alglave et al. did an approachable intro to many (all?) of the horrors > one might observe in the context of the Linux kernel [1], which is worth a > read IMHO. Such a compiler is broken beyond repair. > [1] https://lwn.net/Articles/793253/ > >