Hi,

On 2025-12-12 14:21:47 -0500, Greg Burd wrote:
> 
> On Fri, Dec 12, 2025, at 11:03 AM, Nathan Bossart wrote:
> > +/*
> > + * _InterlockedExchange() generates a full memory barrier (or release
> > + * semantics that ensures all prior memory operations are visible to
> > + * other cores before the lock is released.
> > + */
> > +#define S_UNLOCK(lock) (InterlockedExchange(lock, 0))
> 
> Nathan, thanks for looking at the patch!
> 
> > This seems to change the implementation from
> >
> >     #define S_UNLOCK(lock)  \
> >             do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > in some cases, but I am insufficiently caffeinated to figure out what
> > platforms use which implementation.  In any case, it looks like we are
> > changing it for some currently-supported platforms, and I'm curious why.
> 
> This change is within _MSC_VER, but AFAICT this intrinsic is available
> across their supported platforms.  The previous implementation of S_UNLOCK()
> boils down to a no-op because the _ReadWriteBarrier()[1] is a hint to the
> compiler and does not emit any instruction on any platform and it's also
> deprecated.  So, on MSVC S_UNLOCK is an unguarded assignment and then a loop
> that will be optimized out, not really what we wanted I'd imagine.

I don't think it can be optimized out, that should be prevented by
_ReadWriteBarrier() being a compiler barrier.


> My tests with godbolt showed this to be true, no instruction barriers
> emitted.  I think it was Andres[2] who suggested replacing it with
> _InterlockedExchange()[3].  So, given that _ReadWriteBarrier() is deprecated
> I decided not to specialize this change to only the ARM64 platform, sorry
> for not making that clear in the commit or email.

I don't think that's a good idea - the _ReadWriteBarrier() is sufficient on
x86 to implement a spinlock release (due to x86 being a total store order
architecture, once the lock is observed as being released, all the effects
protected by the lock are also guaranteed to be visible).  Making the
spinlocks use an atomic op for both acquire and release does cause measurable
slowdowns on x86 with gcc, so I'd expect the same to be true on windows.

Greetings,

Andres Freund


Reply via email to