Hi, Andres! On Fri, Mar 7, 2025 at 7:54 PM Alexander Korotkov <aekorot...@gmail.com> wrote: > On Fri, Mar 7, 2025 at 7:46 PM Andres Freund <and...@anarazel.de> wrote: > > On 2025-03-07 19:44:20 +0200, Alexander Korotkov wrote: > > > On Fri, Mar 7, 2025 at 7:38 PM Andres Freund <and...@anarazel.de> wrote: > > > > On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote: > > > > > On Fri, Mar 7, 2025 at 7:07 PM Andres Freund <and...@anarazel.de> wrote: > > > > > > What is the access pattern and the observed problems with it that made you > > > > > > look at the disassembly? > > > > > > > > > > Check this code. > > > > > > > > > > l1: pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr); > > > > > > > > > /* > > > > > * Try to advance XLogCtl->InitializedUpTo. > > > > > * > > > > > * If the CAS operation failed, then some of previous pages are not > > > > > * initialized yet, and this backend gives up. > > > > > * > > > > > * Since initializer of next page might give up on advancing of > > > > > * InitializedUpTo, this backend have to attempt advancing until it > > > > > * find page "in the past" or concurrent backend succeeded at > > > > > * advancing. When we finish advancing XLogCtl->InitializedUpTo, we > > > > > * notify all the waiters with XLogCtl->InitializedUpToCondVar. > > > > > */ > > > > > l2: while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo, > > > > > &NewPageBeginPtr, NewPageEndPtr)) > > > > > { > > > > > NewPageBeginPtr = NewPageEndPtr; > > > > > NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ; > > > > > nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr); > > > > > > > > > > l3: if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) != > > > > > NewPageEndPtr) > > > > > { > > > > > /* > > > > > * Page at nextidx wasn't initialized yet, so we cann't move > > > > > * InitializedUpto further. It will be moved by backend > > > > > which > > > > > * will initialize nextidx. > > > > > */ > > > > > > > > > > ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar); > > > > > break; > > > > > } > > > > > } > > > > > > > > > > Consider the following execution order with process 1 (p1) and process 2 > > > > > (p2). > > > > > > > > On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote: > > > > > Sorry, I messed this up. > > > > > The correct sequence is following. > > > > > > > > > > 1. p1 executes l1 > > > > > 2. p1 executes l2 with failure > > > > > 3. p2 executes l2 with success > > > > > 4. p2 execute l3, but doesn't see the results of step 1, because 3 > > > > > didn't provide enough of memory barrier > > > > > > > > Did you mean because 2) didn't provide enough of a memory barrier? Because 3) > > > > does, right? > > > > > > Yes, exactly. > > > > > > > You could get in exactly same the situation if the p1 is scheduled out by the > > > > OS after step 1, no? > > > > > > No. In that case, p1 will execute l2 with success. p1 executes l2 > > > with failure only because it goes before p2 executes l2. > > > > In my scenario p1 will not execute l2 because p2 gets scheduled before it can > > do so. So p1 cant yet execute l2 before p2 executes l2. > > In my scenario p1.l2 will be successful if executed after p2.l2 and > failed if executed before p2.l2. Imagine initial value is 0. p1 > tries to change 1=>2, while p2 tries to change 0 => 1.
My explanation was cumbersome form the beginning. Let me come with more clear example. Let us have a shared memory variable v, and shared memory flag f. The initial state is as following. v = 0 f = false Two processes p1 and p2 runs the following programs in pseudocode. Atomic compare-exchange operation is represented as CAS(variable, oldval, newval). In this example we don't need to update oldval on failure or even know whether operation is successful. p1: CAS(v, 0, 1) if f: CAS(v, 1, 2) p2: f = true CAS(v, 1, 2) I think if CAS() implements full barrier semantics, the invariant is that v eventually get the value 2. If CAS(v, 1, 2) by p2 goes before CAS(v, 0, 1) by p1, p1 will reliably see f == true and apply CAS(v, 1, 2). If CAS() don't implement full barrier on failure this invariant gets broken (that is CAS() by p2 fails, but p1 see f == false). I'm not an expert in formal specifications of memory models. But I'm quite surprised we're discussing whether memory barrier on compare-exchange failure might matter. For me at least the fact that __atomic_compare_exchange_n() have failure_memorder argument is a quite an evidence of that. ------ Regards, Alexander Korotkov Supabase