On Fri, Mar 7, 2025 at 7:07 PM Andres Freund <and...@anarazel.de> wrote:
> On 2025-03-07 18:39:42 +0200, Alexander Korotkov wrote:
> > On Fri, Mar 7, 2025 at 6:02 PM Andres Freund <and...@anarazel.de> wrote:
> > >
> > > On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote:
> > > > While investigating a bug in the patch to get rid of
> > > > WALBufMappingLock, I found that surrounding
> > > > pg_atomic_compare_exchange_u64() with memory barriers fixes the
> > > > problem for me.
> > >
> > > That sounds more likely to be due to slowing things down enough to
> > > make a race less likely to be hit.  Or a compiler bug.
> > >
> > > > That doesn't feel right because, according to the comments, both
> > > > pg_atomic_compare_exchange_u32() and
> > > > pg_atomic_compare_exchange_u64() should provide full memory
> > > > barrier semantics.  So, I decided to investigate this further.
> > > >
> > > > In my case, the implementation of
> > > > pg_atomic_compare_exchange_u64() is based on
> > > > __atomic_compare_exchange_n().
> > > >
> > > > static inline bool
> > > > pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
> > > >                                     uint64 *expected, uint64 newval)
> > > > {
> > > >     AssertPointerAlignment(expected, 8);
> > > >     return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
> > > >                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> > > > }
> > > >
> > > > According to the docs, __ATOMIC_SEQ_CST enforces total ordering
> > > > with *all other __ATOMIC_SEQ_CST operations*.  It only speaks of
> > > > other __ATOMIC_SEQ_CST operations; nothing is said about regular
> > > > reads/writes.  That sounds quite far from our comment promising
> > > > full barrier semantics, which, in my understanding, means the
> > > > equivalent of both pg_read_barrier() and pg_write_barrier(),
> > > > i.e. ordering *all other reads/writes*.
> > >
> > > A memory barrier in one process/thread always needs to be paired
> > > with a barrier in another process/thread.  It's not enough to have
> > > a memory barrier on one side but not the other.
> >
> > Yes, there surely should be a memory barrier on the other side.  But
> > does __ATOMIC_SEQ_CST work as a barrier for the regular read/write
> > operations on the same side?
>
> Yes, if it's paired with another barrier on the other side.  The
> regular read/write operations are basically protected transitively,
> due to
>
> a) An acquire barrier preventing non-atomic reads/writes in the same
>    thread from appearing to have been moved to before the barrier
>
> b) A release barrier preventing non-atomic reads/writes in the same
>    thread from appearing to have been moved to after the barrier
>
> c) The other thread being guaranteed a) and b) for the other thread's
>    non-atomic reads/writes, depending on the type of barrier
>
> d) The atomic value itself being guaranteed to be, well, atomic

Yes, looks good to me.
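To make a)-d) concrete, here is a minimal standalone sketch of the
message-passing pattern they describe (my own illustration, not code
from the tree; all names are made up).  The writer publishes a
non-atomic payload via a SEQ_CST compare-exchange, and the reader pairs
it with a SEQ_CST load:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t payload;        /* plain, non-atomic data */
static uint64_t flag;           /* atomic flag, 0 -> 1 */

static void *
writer(void *arg)
{
    uint64_t    expected = 0;

    (void) arg;
    payload = 42;               /* non-atomic store */

    /*
     * Release side, per b): the payload store above must not appear
     * to have been moved past the successful compare-exchange.
     */
    __atomic_compare_exchange_n(&flag, &expected, 1, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return NULL;
}

static void *
reader(void *arg)
{
    (void) arg;

    /*
     * Acquire side, per a): the payload read below must not appear to
     * have been moved above this load.
     */
    while (__atomic_load_n(&flag, __ATOMIC_SEQ_CST) == 0)
        ;

    /* By a), b), and d), this is guaranteed to print 42. */
    printf("payload = %llu\n", (unsigned long long) payload);
    return NULL;
}

int
main(void)
{
    pthread_t   w, r;

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}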
> > > > This function first checks if LSE instructions are present.  If
> > > > so, the LSE instruction casal is used.  If not (in my case), a
> > > > loop of ldaxr/stlxr is used instead.  The documentation says both
> > > > ldaxr and stlxr provide one-way barriers: reads/writes after
> > > > ldaxr will be observed after it, and reads/writes before stlxr
> > > > will be observed before it.  So, a pair of these instructions
> > > > provides a full memory barrier.  However, if we don't observe the
> > > > expected value, only ldaxr gets executed.  So, an unsuccessful
> > > > pg_atomic_compare_exchange_u64() attempt will leave us with a
> > > > one-way barrier, and that caused a bug in my case.
> > >
> > > That has to be a compiler bug.  We specify __ATOMIC_SEQ_CST for
> > > both success_memorder *and* failure_memorder.
> > >
> > > What compiler & version is this?
> >
> > I've tried and got the same results with two compilers.
> > gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
> > Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2)
>
> Thinking more about it, I wonder if the behaviour of not doing a
> release in case the atomic fails *might* arguably actually be correct
> - if the compare-exchange fails, there's nothing that the non-atomic
> values could be ordered against.
>
> What is the access pattern and the observed problems with it that
> made you look at the disassembly?

Check this code.

l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);

        /*
         * Try to advance XLogCtl->InitializedUpTo.
         *
         * If the CAS operation fails, then some of the previous pages
         * are not initialized yet, and this backend gives up.
         *
         * Since the initializer of the next page might give up on
         * advancing InitializedUpTo, this backend has to attempt
         * advancing until it finds a page "in the past" or a
         * concurrent backend succeeds at advancing.  When we finish
         * advancing XLogCtl->InitializedUpTo, we notify all the
         * waiters via XLogCtl->InitializedUpToCondVar.
         */
l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                              &NewPageBeginPtr,
                                              NewPageEndPtr))
        {
            NewPageBeginPtr = NewPageEndPtr;
            NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
            nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);

l3:         if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) != NewPageEndPtr)
            {
                /*
                 * The page at nextidx wasn't initialized yet, so we
                 * can't move InitializedUpTo further.  It will be
                 * moved by the backend that initializes nextidx.
                 */
                ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
                break;
            }
        }

Consider the following execution order with process 1 (p1) and
process 2 (p2).

1. p1 executes l1
2. p2 executes l2 with success
3. p1 executes l2 with failure
4. p2 executes l3, but doesn't see the result of step 1, because
step 3 didn't provide a full memory barrier
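This is the store-buffering pattern: p1 does W(xlblocks) then
R(InitializedUpTo), while p2 does W(InitializedUpTo) then R(xlblocks),
and that needs a full barrier between the store and the load on both
sides.  p2's successful CAS provides one; p1's failed CAS does not.
As a rough sketch of the workaround I mentioned upthread (the barrier
choice and placement here are illustrative, not the final patch), an
explicit full barrier between l1 and l2 closes that gap:

l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);

        /*
         * Illustrative fix: a full barrier orders the xlblocks write
         * above before the read of InitializedUpTo inside the CAS
         * below, even when the CAS fails and (on ARM without LSE)
         * executes only the acquire half of the ldaxr/stlxr pair.
         */
        pg_memory_barrier();

l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                              &NewPageBeginPtr,
                                              NewPageEndPtr))
        {
            ...
        }

------
Regards,
Alexander Korotkov
Supabase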