Re: in pcb table mutex

2022-03-11 Thread Alexander Bluhm
Regress tested with witness, rebased to current, anyone?

On Wed, Mar 02, 2022 at 07:13:09PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer.  So
> to run pf in parallel we must make parts of the stack MP safe.
> Protect the list and hashes in the PCB tables with a mutex.
> 
> Note that the protocol notify functions may call pf via tcp_output().
> As the pf lock is a sleeping rw_lock, we must not hold a mutex.  To
> solve this for now, I collect these PCBs in inp_notify list and
> protect it with the exclusive netlock.
> 
> ok?
> 
> bluhm

Index: kern/kern_sysctl.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.399
diff -u -p -r1.399 kern_sysctl.c
--- kern/kern_sysctl.c  25 Jan 2022 04:04:40 -  1.399
+++ kern/kern_sysctl.c  11 Mar 2022 15:07:13 -
@@ -1366,16 +1366,24 @@ sysctl_file(int *name, u_int namelen, ch
struct inpcb *inp;
 
NET_LOCK();
+   mtx_enter(&tcbtable.inpt_mtx);
TAILQ_FOREACH(inp, &tcbtable.inpt_queue, inp_queue)
FILLSO(inp->inp_socket);
+   mtx_leave(&tcbtable.inpt_mtx);
+   mtx_enter(&udbtable.inpt_mtx);
TAILQ_FOREACH(inp, &udbtable.inpt_queue, inp_queue)
FILLSO(inp->inp_socket);
+   mtx_leave(&udbtable.inpt_mtx);
+   mtx_enter(&rawcbtable.inpt_mtx);
TAILQ_FOREACH(inp, &rawcbtable.inpt_queue, inp_queue)
FILLSO(inp->inp_socket);
+   mtx_leave(&rawcbtable.inpt_mtx);
 #ifdef INET6
+   mtx_enter(&rawin6pcbtable.inpt_mtx);
TAILQ_FOREACH(inp, &rawin6pcbtable.inpt_queue,
inp_queue)
FILLSO(inp->inp_socket);
+   mtx_leave(&rawin6pcbtable.inpt_mtx);
 #endif
NET_UNLOCK();
}
Index: netinet/in_pcb.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.259
diff -u -p -r1.259 in_pcb.c
--- netinet/in_pcb.c4 Mar 2022 20:35:10 -   1.259
+++ netinet/in_pcb.c11 Mar 2022 15:07:13 -
@@ -120,7 +120,8 @@ struct baddynamicports baddynamicports;
 struct baddynamicports rootonlyports;
 struct pool inpcb_pool;
 
-int in_pcbresize (struct inpcbtable *, int);
+void   in_pcbrehash_locked(struct inpcb *);
+intin_pcbresize(struct inpcbtable *, int);
 
 #defineINPCBHASH_LOADFACTOR(_x)(((_x) * 3) / 4)
 
@@ -173,7 +174,7 @@ in_pcblhash(struct inpcbtable *table, in
 void
 in_pcbinit(struct inpcbtable *table, int hashsize)
 {
-
+   mtx_init(&table->inpt_mtx, IPL_SOFTNET);
TAILQ_INIT(>inpt_queue);
table->inpt_hashtbl = hashinit(hashsize, M_PCB, M_WAITOK,
>inpt_mask);
@@ -252,6 +253,7 @@ in_pcballoc(struct socket *so, struct in
inp->inp_cksum6 = -1;
 #endif /* INET6 */
 
+   mtx_enter(&table->inpt_mtx);
if (table->inpt_count++ > INPCBHASH_LOADFACTOR(table->inpt_size))
(void)in_pcbresize(table, table->inpt_size * 2);
TAILQ_INSERT_HEAD(&table->inpt_queue, inp, inp_queue);
@@ -268,6 +270,8 @@ in_pcballoc(struct socket *so, struct in
&inp->inp_faddr, inp->inp_fport,
&inp->inp_laddr, inp->inp_lport);
LIST_INSERT_HEAD(head, inp, inp_hash);
+   mtx_leave(&table->inpt_mtx);
+
so->so_pcb = inp;
 
return (0);
@@ -556,6 +560,7 @@ void
 in_pcbdetach(struct inpcb *inp)
 {
struct socket *so = inp->inp_socket;
+   struct inpcbtable *table = inp->inp_table;
 
NET_ASSERT_LOCKED();
 
@@ -585,10 +590,13 @@ in_pcbdetach(struct inpcb *inp)
pf_inp_unlink(inp);
}
 #endif
+   mtx_enter(&table->inpt_mtx);
LIST_REMOVE(inp, inp_lhash);
LIST_REMOVE(inp, inp_hash);
-   TAILQ_REMOVE(&inp->inp_table->inpt_queue, inp, inp_queue);
-   inp->inp_table->inpt_count--;
+   TAILQ_REMOVE(&table->inpt_queue, inp, inp_queue);
+   table->inpt_count--;
+   mtx_leave(&table->inpt_mtx);
+
in_pcbunref(inp);
 }
 
@@ -661,20 +669,25 @@ void
 in_pcbnotifyall(struct inpcbtable *table, struct sockaddr *dst, u_int rtable,
 int errno, void (*notify)(struct inpcb *, int))
 {
-   struct inpcb *inp, *ninp;
+   SIMPLEQ_HEAD(, inpcb) inpcblist;
+   struct inpcb *inp;
struct in_addr faddr;
u_int rdomain;
 
-   NET_ASSERT_LOCKED();
+   NET_ASSERT_WLOCKED();
 
if (dst->sa_family != AF_INET)
return;
faddr = satosin(dst)->sin_addr;
if (faddr.s_addr == INADDR_ANY)
return;
+   if (notify == NULL)
+   return;
 
+   SIMPLEQ_INIT(&inpcblist);
rdomain = rtable_l2(rtable);
-   TAILQ_FOREACH_SAFE(inp, &table->inpt_queue, inp_queue, ninp) {
+   mtx_enter(&table->inpt_mtx);
TAILQ_FOREACH(inp, &table->inpt_queue,

Re: ssh: xstrdup(): use memcpy(3)

2022-03-11 Thread Todd C . Miller
On Wed, 09 Mar 2022 19:20:08 -0600, Scott Cheloha wrote:

> The strdup(3) implementation in libc uses memcpy(3), not strlcpy(3).
>
> There is no upside to using strlcpy(3) here if we know the length of
> str before we copy it to the destination buffer.

Sure, using memcpy() here is fine since the length includes space
for the NUL terminator.  OK millert@

 - todd



Re: ipsec acquire mutex refcount

2022-03-11 Thread Hrvoje Popovski
On 9.3.2022. 21:02, Hrvoje Popovski wrote:
> On 9.3.2022. 19:14, Alexander Bluhm wrote:
>> Hi,
>>
>> Hrvoje has hit a crash with IPsec acquire while testing the parallel
>> ip forwarding diff.  Analysis with sashan@ concludes that this part
>> is not MP safe.  Mutex and refcount should fix the memory management.
>>
>> Hrvoje: Could you test this diff?
> 
> Hi,
> 
> no problem. Give me 2 or 3 days to hit it properly...

Hi,

after 2 days of hitting the sasyncd setup I can't trigger the panic as before.

Thank you ..




Re: atomic read write

2022-03-11 Thread Marc Espie
I've been reading the newer C standard
(n2310.pdf to be specific).

That thing is scary.

I believe that, in the long haul, we will have to use the Atomic types stuff
that's apparently part of the C standard now.

Yeah, the switch to multi-threading made the standard way more complicated.
Sequence points were so much easier to explain.

I'm not quite sure how Atomic types actually relate to SMP, but I'm afraid
it's currently the "best" guarantee we will be stuck with to try and get
"better" atomic writes...

I haven't even looked to see what clang does with it.



Re: atomic read write

2022-03-11 Thread Visa Hankala
On Fri, Mar 11, 2022 at 11:51:31AM +0100, Alexander Bluhm wrote:
> On Fri, Mar 11, 2022 at 05:32:11AM +, Visa Hankala wrote:
> > On Thu, Mar 10, 2022 at 07:17:43PM +0100, Alexander Bluhm wrote:
> > > On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > > > > Below is a patch that shows how to accomplish release semantics with
> > > > > the file and refcnt APIs. (The added memory barriers ensure that the
> > > > > CPU completes its loads and stores to the object before dropping the
> > > > > reference. Another CPU might delete the object immediately after.
> > > > > The barrier might not be strictly necessary with refcnt_finalize(),
> > > > > though.)
> > > 
> > > The enter and exit membars that protect the critical section should
> > > be symmetric.  Maybe this diff is better.
> > 
> > No, symmetry is not always present. See below.
> 
> In general if you want to move data from one CPU to another, you
> have to push them out and pull them in.  That is where the symetry
> comes from and it should be reflected in code.  At least that is
> my understanding, but understanding the whole topic is hard.

A skeptical take on memory barriers is that they do not push anything;
they only make certain things happen in a specific order but the exact
time frame remains open. To ensure immediacy, the code needs to use
a read-modify-write atomic operation or something similar that the
processor cannot delay.

For example, on octeon, membar_producer() prevents subsequent writes
from getting reordered with earlier writes in the write buffer.
However, the operation does not block the pipeline, it takes effect
in the background.

> I came to the conclusion that refcnt does not need membars at all.
> It does not protect other data, it is only looking at one counter
> variable.  When using refcnt, it protects the lifetime of an object.
> For the data you need another lock which brings its own barriers.

To make a reasonable API, one should consider the intended usage.
Putting the barriers inside refcnt code keeps the API sane. Otherwise
callers would have to remember to issue barriers, which would also be
wasteful on systems where RMW atomics do enforce memory order.

> Not sure about the purpose of cond.  Maybe membar_enter/exit is the
> way to go.  Does it guarantee anything about memory access?

The caller of cond_signal() has state that the caller of cond_wait()
wants to see.

cond_signal() should make the (local) state visible to other CPUs
before the clearing of c_wait becomes visible. membar_exit() does
that.

cond_wait() should prevent premature peeking into the data from
happening before the clearing of c_wait has been seen. I think
membar_sync(), and not membar_enter(), is the right choice.

My earlier suggestion about membar_enter() is wrong. This barrier
orders subsequent reads and writes relative to earlier writes. The
pivot in cond_wait() is a read!

> > > And to avoid memory barriers that nobody understands we should convert
> > > FRELE to refcnt_rele.
> > 
> > I am not sure about that. I think as long as file reference counting
> > does unusual things that refcnt does not implement, f_count handling
> > should be separate.
> 
> Let's keep FRELE out of the discussion.  Either it works as it is
> or it should use suitable primitives.  But please no membars in the
> file system code.

I believe the membars are necessary if f_count is updated using atomic
operations. An alternative is to wrap the reference count updates with
a mutex which invokes membars internally but with increased total cost.

Below is an updated patch that acknowledges the acquire semantics
aspect of FRELE() when reference count has dropped to zero. The
destructor wants to see the (aggregate) state that follows the
f_count 1->0 transition.

Index: kern/kern_descrip.c
===================================================================
RCS file: src/sys/kern/kern_descrip.c,v
retrieving revision 1.205
diff -u -p -r1.205 kern_descrip.c
--- kern/kern_descrip.c 20 Jan 2022 11:06:57 -  1.205
+++ kern/kern_descrip.c 11 Mar 2022 14:17:22 -
@@ -1268,7 +1268,16 @@ fdrop(struct file *fp, struct proc *p)
 {
int error;
 
-   KASSERTMSG(fp->f_count == 0, "count (%u) != 0", fp->f_count);
+   membar_exit_before_atomic();
+   if (atomic_dec_int_nv(&fp->f_count) > 0)
+   return 0;
+
+   /*
+* Make this CPU see the latest state relative to f_count updates.
+* Note this memory barrier is redundant because the following
+* critical section provides an acquire-release barrier.
+*/
+   /* membar_enter_after_atomic(); */
 
mtx_enter(&fhdlk);
if (fp->f_iflags & FIF_INSERTED)
Index: sys/file.h
===================================================================
RCS file: src/sys/sys/file.h,v
retrieving revision 1.65
diff -u -p -r1.65 file.h
--- sys/file.h  20 Jan 2022 03:43:31 -  1.65
+++ sys/file.h  11 Mar 2022 14:17:22 -
@@ -113,8 +113,7 @@ struct file {
 

Re: atomic read write

2022-03-11 Thread Laurence Tratt
On Fri, Mar 11, 2022 at 09:33:29AM +0100, Mark Kettenis wrote:

Hello Mark,

>>>> Unfortunately this transformation almost certainly isn't safe: for
>>>> example, the non-atomic load can return values that were never written
>>>> by any thread (e.g. due to load/store tearing amongst other fun
>>>> effects).
>>> is that true even when care is taken to only use int/long sized variables
>>> that are naturally aligned? are compilers that pathological now?
>> Yes and yes, I'm afraid -- though rare, there are real examples of
>> compilers doing this (pity the poor person who debugs it!). My working
>> guess is that they'll be increasingly likely to occur as compilers become
>> ever more aggressive at taking advantage of undefined behaviour.
> Such a compiler is broken beyond repair.

Unfortunately, the compiler is correct with respect to C's semantics. We
might wish that it did not choose to take advantage of this, but it does. As
much as I dislike this, I've given up hoping that the situation will change.

To be specific, since we're using such a compiler in clang 13 (I have no idea
if gcc 4.* is as aggressive, though more recent gcc versions definitely are),
we've got little choice but to fit into its mould. For example, if an integer
can be shared between threads, it must either always be read/written
atomically or synchronised via some sort of guard (e.g. a mutex): failing to
do so can lead to the horrors that the Alglave et al. article details.


Laurie



slow xrandr / DRM_IOCTL_MODE_GETCONNECTOR

2022-03-11 Thread Stuart Henderson
This isn't new and I'm not sure if there is anything that can be done, but
it annoyed me again so I thought I'd mention it to see if anyone has ideas.

Machine is X1Y4 (UHD Graphics 620; COFFEELAKE, gen 9). Displays are the
built-in LCD and DP-1 which is on a USB-C to HDMI hub.

If xrandr is run while the external monitor is connected then xrandr is
slow (about 3 seconds) and during the time it's running, userland stalls -
no screen/cursor movement in X. ssh'd into the machine and running top,
that stops updating too.

If the external monitor is disconnected and I plug it in, the same stall
occurs when the cable is connected (including if I am switched to a text vt).

$ time xrandr
Screen 0: minimum 320 x 200, current 1920 x 1200, maximum 16384 x 16384
eDP-1 connected primary (normal left inverted right x axis y axis)
   1920x1080 60.01 +  60.01  59.97  59.96  59.93
   1680x1050 59.95  59.88
   1400x1050 59.98
   1600x900  59.99  59.94  59.95  59.82
   1280x1024 60.02
   1400x900  59.96  59.88
   1280x960  60.00
   1440x810  60.00  59.97
   1368x768  59.88  59.85
   1280x800  59.99  59.97  59.81  59.91
   1280x720  60.00  59.99  59.86  59.74
   1024x768  60.04  60.00
   960x720   60.00
   928x696   60.05
   896x672   60.01
   1024x576  59.95  59.96  59.90  59.82
   960x600   59.93  60.00
   960x540   59.96  59.99  59.63  59.82
   800x600   60.00  60.32  56.25
   840x525   60.01  59.88
   864x486   59.92  59.57
   700x525   59.98
   800x450   59.95  59.82
   640x512   60.02
   700x450   59.96  59.88
   640x480   60.00  59.94
   720x405   59.51  58.99
   684x384   59.88  59.85
   640x400   59.88  59.98
   640x360   59.86  59.83  59.84  59.32
   512x384   60.00
   512x288   60.00  59.92
   480x270   59.63  59.82
   400x300   60.32  56.34
   432x243   59.92  59.57
   320x240   60.05
   360x202   59.51  59.13
   320x180   59.84  59.32
DP-1 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 518mm x 324mm
   1920x1200 59.95*+
   1920x1080 60.00  50.00  59.94  30.00  25.00  24.00  29.97  23.98
   1920x1080i 60.00  50.00  59.94
   1600x1200 60.00
   1280x1024 75.02  60.02
   1152x864  75.00
   1280x720  60.00  50.00  59.94
   1024x768  75.03  60.00
   800x600   75.00  60.32
   720x576   50.00
   720x480   60.00  59.94
   640x480   75.00  60.00  59.94
   720x400   70.08
HDMI-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
0m03.65s real 0m00.00s user 0m00.00s system

Some ktrace -TR from around the time, followed by dmesg then Xorg.log
(which covers a couple of instances of running xrandr)

Xorg ktrace:

 49523 Xorg 0.896745 CALL  
ioctl(11,DRM_IOCTL_MODE_GETPROPERTY,0x7f7df6a0)
 49523 Xorg 0.896748 RET   ioctl 0
 49523 Xorg 0.896755 CALL  
ioctl(11,DRM_IOCTL_MODE_GETCONNECTOR,0x7f7df6b0)
 49523 Xorg 4.241310 RET   ioctl 0
 49523 Xorg 4.241313 STRU  struct kevent { ident=7, filter=EVFILT_READ, 
flags=0x1005, fflags=0<>, data=2, udata=0xe943e602 }
 49523 Xorg 4.241317 STRU  struct pollfd [9] { fd=7, events=0x1, 
revents=0x1 } { fd=21, events=0x1, revents=0<> } { fd=23, 
events=0x1, revents=0<> } { fd=24, events=0x1, revents=0<> } { 
fd=25, events=0x1, revents=0<> } { fd=26, events=0x1, 
revents=0<> } { fd=27, events=0x1, revents=0<> } { fd=28, 
events=0x1, revents=0<> } { fd=29, events=0x1, revents=0<> }
 49523 Xorg 4.241368 CALL  
ioctl(11,DRM_IOCTL_MODE_GETCONNECTOR,0x7f7df6b0)
 49523 Xorg 4.241370 RET   poll 1

xrandr

 78338 xrandr   0.013900 CALL  recvmsg(3,0x7f7d1f20,0)
 78338 xrandr   0.013981 RET   recvmsg -1 errno 35 Resource temporarily 
unavailable
 78338 xrandr   0.013988 CALL  kbind(0x7f7d2098,24,0xac0c8ac752cf4b56)
 78338 xrandr   0.014018 RET   kbind 0
 78338 xrandr   0.014023 CALL  poll(0x7f7d1ec8,1,INFTIM)
 78338 xrandr   0.014025 STRU  struct kevent [2] { ident=3, filter=EVFILT_READ, 
flags=0x1005, fflags=0<>, data=0, udata=0x33b648e4 } { 
ident=3, filter=EVFILT_WRITE, flags=0x1005, fflags=0<>, 
data=0, udata=0x33b648e4 }
 78338 xrandr   0.014047 STRU  struct kevent { ident=3, filter=EVFILT_WRITE, 
flags=0x1005, fflags=0<>, data=65536, udata=0x33b648e4 }
 78338 xrandr   0.014049 STRU  struct pollfd { fd=3, 
events=0x5, revents=0x4 }
 78338 xrandr   0.014057 RET   poll 1
 78338 xrandr   0.014066 CALL  writev(3,0x7f7d1f70,3)
 78338 xrandr   0.014068 STRU  struct iovec [3] { base=0xc084bba7000, len=8 } { 
base=0x0, len=0 } { base=0xc0856384ce6, len=0 }
 78338 xrandr   0.014085 GIO   fd 3 wrote 8 bytes
   "\M^L\b\^B\0\M-4\a\0\0"
 

Re: atomic read write

2022-03-11 Thread Alexander Bluhm
On Fri, Mar 11, 2022 at 05:32:11AM +, Visa Hankala wrote:
> On Thu, Mar 10, 2022 at 07:17:43PM +0100, Alexander Bluhm wrote:
> > On Thu, Mar 10, 2022 at 04:39:49PM +0100, Alexander Bluhm wrote:
> > > > Below is a patch that shows how to accomplish release semantics with
> > > > the file and refcnt APIs. (The added memory barriers ensure that the
> > > > CPU completes its loads and stores to the object before dropping the
> > > > reference. Another CPU might delete the object immediately after.
> > > > The barrier might not be strictly necessary with refcnt_finalize(),
> > > > though.)
> > 
> > The enter and exit membars that protect the critical section should
> > be symmetric.  Maybe this diff is better.
> 
> No, symmetry is not always present. See below.

In general if you want to move data from one CPU to another, you
have to push them out and pull them in.  That is where the symmetry
comes from and it should be reflected in code.  At least that is
my understanding, but understanding the whole topic is hard.

As kettenis mentioned atomic_membar does not fit atomic_load/store
in amd64.  So membar_enter/exit would be better.

I came to the conclusion that refcnt does not need membars at all.
It does not protect other data, it is only looking at one counter
variable.  When using refcnt, it protects the lifetime of an object.
For the data you need another lock which brings its own barriers.

Not sure about the purpose of cond.  Maybe membar_enter/exit is the
way to go.  Does it guarantee anything about memory access?

> > And to avoid memory barriers that nobody understands we should convert
> > FRELE to refcnt_rele.
> 
> I am not sure about that. I think as long as file reference counting
> does unusual things that refcnt does not implement, f_count handling
> should be separate.

Let's keep FRELE out of the discussion.  Either it works as it is
or it should use suitable primitives.  But please no membars in the
file system code.

bluhm

> 
> > Index: kern/kern_synch.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.183
> > diff -u -p -r1.183 kern_synch.c
> > --- kern/kern_synch.c   10 Mar 2022 15:21:08 -  1.183
> > +++ kern/kern_synch.c   10 Mar 2022 18:11:39 -
> > @@ -805,6 +805,7 @@ void
> >  refcnt_init(struct refcnt *r)
> >  {
> > atomic_store_int(&r->r_refs, 1);
> > +   membar_enter_after_atomic();
> >  }
> 
> I think membar is unnecessary here. Concurrent access can only happen
> after the object has been "published", and the publishing should have
> the appropriate memory barriers.
> 
> Without proper publishing, the code is prone to race conditions.
> 
> Also note that membar_enter_after_atomic() is not valid with
> atomic_store_* or atomic_load_*. See my comment about cond_signal().
> 
> >  
> >  void
> > @@ -818,6 +819,7 @@ refcnt_take(struct refcnt *r)
> >  #else
> > atomic_inc_int(&r->r_refs);
> >  #endif
> > +   membar_enter_after_atomic();
> >  }
> 
> This is unnecessary. The caller already has a reference to the object.
> refcnt_take() only has the intent of increasing the reference count.
> 
> >  
> >  int
> > @@ -825,6 +827,7 @@ refcnt_rele(struct refcnt *r)
> >  {
> > u_int refcnt;
> >  
> > +   membar_exit_before_atomic();
> > refcnt = atomic_dec_int_nv(&r->r_refs);
> > KASSERT(refcnt != ~0);
> >  
> > @@ -844,6 +847,7 @@ refcnt_finalize(struct refcnt *r, const 
> > struct sleep_state sls;
> > u_int refcnt;
> >  
> > +   membar_exit_before_atomic();
> > refcnt = atomic_dec_int_nv(&r->r_refs);
> > while (refcnt) {
> > sleep_setup(&sls, r, PWAIT, wmesg, 0);
> > @@ -856,11 +860,13 @@ void
> >  cond_init(struct cond *c)
> >  {
> > atomic_store_int(&c->c_wait, 1);
> > +   membar_enter_after_atomic();
> >  }
> 
> Same point here as with refcnt_init().
> 
> >  
> >  void
> >  cond_signal(struct cond *c)
> >  {
> > +   membar_exit_before_atomic();
> > atomic_store_int(&c->c_wait, 0);
> 
> This should use membar_exit(). membar_exit_before_atomic() is valid
> only when accompanied with a true read-modify-write atomic operation.
> 
> The atomic_ prefix with the store and load instructions confuses this
> somewhat.
> 
> The wakeup call that follows provides a membar function, but it comes
> too late as c_wait has already been cleared.
> 
> >  
> > wakeup_one(c);
> > @@ -872,9 +878,11 @@ cond_wait(struct cond *c, const char *wm
> > struct sleep_state sls;
> > unsigned int wait;
> >  
> > +   membar_exit_before_atomic();
> > wait = atomic_load_int(&c->c_wait);
> > while (wait) {
> > sleep_setup(&sls, c, PWAIT, wmesg, 0);
> > +   membar_exit_before_atomic();
> > wait = atomic_load_int(&c->c_wait);
> > sleep_finish(&sls, wait);
> > }
> > 
> 
> I think this should use membar_enter() after the loop.
> 
> cond_wait() is supposed to provide acquire semantics; once cond_wait()
> 

Re: atomic read write

2022-03-11 Thread Mark Kettenis
> Date: Thu, 10 Mar 2022 23:09:59 +
> From: Laurence Tratt 
> 
> On Fri, Mar 11, 2022 at 09:00:57AM +1000, David Gwynne wrote:
> 
> Hello David,
> 
> >> Unfortunately this transformation almost certainly isn't safe: for
> >> example, the non-atomic load can return values that were never written by
> >> any thread (e.g. due to load/store tearing amongst other fun effects).
> > is that true even when care is taken to only use int/long sized variables
> > that are naturally aligned? are compilers that pathological now?
> 
> Yes and yes, I'm afraid -- though rare, there are real examples of compilers
> doing this (pity the poor person who debugs it!). My working guess is that
> they'll be increasingly likely to occur as compilers become ever more
> aggressive at taking advantage of undefined behaviour.
> 
> Jade Alglave et al. did an approachable intro to many (all?) of the horrors
> one might observe in the context of the Linux kernel [1], which is worth a
> read IMHO.

Such a compiler is broken beyond repair.

> [1] https://lwn.net/Articles/793253/
> 
>