My last diff (11 May 2021) still has a potential problem with memory barriers. I will mail a new diff if I think of a fix.
On Tue, 11 May 2021 11:08:55 -0500
Dale Rahn <[email protected]> wrote:

> This structure really should be cache-line aligned, which should prevent it
> from spilling across a page boundary.

There is both a struct __ppc_lock and a function __ppc_lock(); my G5
got stuck at boot when the function crossed a page boundary.  It might
help to align parts of the function, but to do so, I might need to
write the entire function in asm and count instructions.

The potential problem in my last diff is here, inside __ppc_lock():

  15eb8c:	7c c0 19 2d 	stwcx.  r6,0,r3
  15eb90:	40 c2 ff f0 	bne+    15eb80 <__ppc_lock+0x40>
  15eb94:	7c 08 38 40 	cmplw   r8,r7
  15eb98:	41 82 00 28 	beq-    15ebc0 <__ppc_lock+0x80>
	...
  15ebc0:	4c 00 01 2c 	isync

After "stwcx." grabs the lock, we need a "bc; isync" memory barrier,
which is the "bne+" and "isync" in the above disassembly.  A page
fault might act like an "isync", so if we can do the "bne+" before the
page fault, we might be fine.  The problem is that, if the "stwcx."
and "bne+" are in different pages (a 1 in 1024 chance?), then I would
get a page fault before the "bne+", so my code would skip the
necessary memory barrier.  I guess that I can fix it by adding another
membar_enter() somewhere.

> The powerpc pmap was originally designed to have 8 'way' locks so that
> only a single way would get locked, thus (as long as one doesn't have way
> more than 8 cores) any core should be able to choose some way to replace
> and get the work done.  This also would have allowed limited recursion.
> However at some point those were taken out.  Note that before locking one of
> the ways of the hashtable, it would hold the lock on the pmap that it was
> inserting the mapping from (which could be the kernel).  Not certain if the
> kernel pmap lock is recursive or not.

That's a good history lesson for me; I'm too new.  The pmap lock is
"struct mutex pm_mtx" (powerpc/include/pmap.h), so it isn't recursive.
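In C11 terms, the ordering rule looks roughly like this.  This is only
a hedged userland sketch, not the kernel's __ppc_lock(): the __atomic
CAS stands in for the lwarx/stwcx. loop, sketch_lock()/sketch_unlock()
are made-up names, and atomic_thread_fence() plays the role of
membar_enter()'s "bne+; isync" and membar_exit()'s barrier.  The point
is that the acquire barrier must execute after a successful
compare-and-swap and before any access to the data the lock protects.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint lock_word;	/* 0 = unlocked, 1 = locked */
static int protected_data;	/* guarded by lock_word */

void
sketch_lock(void)
{
	unsigned int expect;

	do {
		/* On powerpc a successful CAS ends in "stwcx.". */
		expect = 0;
	} while (!atomic_compare_exchange_weak_explicit(&lock_word,
	    &expect, 1, memory_order_relaxed, memory_order_relaxed));

	/*
	 * Acquire barrier, the role of "bne+; isync" (membar_enter()).
	 * A page fault taken between the CAS and this fence would skip
	 * the barrier: that is the window in the disassembly above.
	 */
	atomic_thread_fence(memory_order_acquire);
}

void
sketch_unlock(void)
{
	/* Release barrier before dropping the lock (membar_exit()). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
}
```

A single-threaded caller can only check the lock/unlock state machine;
the fences matter on a real multiprocessor.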
The code in pmap.c pte_spill_r() doesn't hold the pmap lock if the
page is in the direct map.  The kernel text, including the function
__ppc_lock(), is in the direct map.

> The point was to allow multiple cores to access different regions of the
> hashtable at the same time (minimal contention).  However it would also
> allow a core to reenter the lock_try on a different way where it should be
> able to succeed (as long as the recursion doesn't go 8 deep?)

If I understand the code, the max recursion is 2 levels.  The kernel
can grab the hash lock, get a page fault, then the page fault handler
can do a 2nd grab.  The handler is in real mode (no address
translation) and would never cause another page fault and a 3rd grab.
If we had the 8 locks, then they might deadlock if 8 cpus each held a
lock and tried to grab a 2nd lock (but a macppc has at most 4 cpus:
PPC_MAXPROCS in powerpc/include/cpu.h).

--George

> On Tue, May 11, 2021 at 12:42 AM George Koehler <[email protected]> wrote:
>
> > I made a mistake in my last diff by deleting the memory barriers.
> > Here is a diff that keeps membar_enter() and membar_exit().
> >
> > With my last diff, my G5 froze after 15 hours of uptime, and again
> > after 10 hours of uptime.  I'm not sure whether the missing memory
> > barriers caused the freeze, but I will be running this diff and hoping
> > for fewer freezes.
> >
> > On Sat, 8 May 2021 18:59:35 +0200 (CEST)
> > Mark Kettenis <[email protected]> wrote:
> >
> > > Good find!  On powerpc64 I avoid the issue because I guarantee that
> > > the kernel mappings are never evicted from the hash.  But doing so on
> > > powerpc would require more serious development.  I'm not sure we
> > > really need a ticket lock for this, but since you already did the
> > > work, let's stick with it for now.
> >
> > __ppc_lock (before and after this diff) doesn't give tickets like
> > __mp_lock does, but __ppc_lock and __mp_lock are both recursive locks.
> > I don't know an easy way to avoid the recursion, if some of the kernel
> > mappings might not be in the hash.  My __ppc_lock diff should be easy
> > if I don't make mistakes like wrong memory barriers.
> >
> > --George
> >
> > Index: arch/powerpc/include/mplock.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/powerpc/include/mplock.h,v
> > retrieving revision 1.4
> > diff -u -p -r1.4 mplock.h
> > --- arch/powerpc/include/mplock.h	15 Apr 2020 08:09:00 -0000	1.4
> > +++ arch/powerpc/include/mplock.h	10 May 2021 23:33:00 -0000
> > @@ -30,13 +30,13 @@
> >  #define __USE_MI_MPLOCK
> >  
> >  /*
> > + * __ppc_lock exists because pte_spill_r() can't use __mp_lock.
> >   * Really simple spinlock implementation with recursive capabilities.
> >   * Correctness is paramount, no fancyness allowed.
> >   */
> >  
> >  struct __ppc_lock {
> > -	volatile struct cpu_info *mpl_cpu;
> > -	volatile long mpl_count;
> > +	volatile unsigned int mpl_bolt;
> >  };
> >  
> >  #ifndef _LOCORE
> >  
> > @@ -44,10 +44,6 @@ struct __ppc_lock {
> >  void __ppc_lock_init(struct __ppc_lock *);
> >  void __ppc_lock(struct __ppc_lock *);
> >  void __ppc_unlock(struct __ppc_lock *);
> > -int __ppc_release_all(struct __ppc_lock *);
> > -int __ppc_release_all_but_one(struct __ppc_lock *);
> > -void __ppc_acquire_count(struct __ppc_lock *, int);
> > -int __ppc_lock_held(struct __ppc_lock *, struct cpu_info *);
> >  
> >  #endif
> >  
> > Index: arch/powerpc/powerpc/lock_machdep.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/powerpc/powerpc/lock_machdep.c,v
> > retrieving revision 1.9
> > diff -u -p -r1.9 lock_machdep.c
> > --- arch/powerpc/powerpc/lock_machdep.c	15 Apr 2020 08:09:00 -0000	1.9
> > +++ arch/powerpc/powerpc/lock_machdep.c	10 May 2021 23:33:00 -0000
> > @@ -1,6 +1,7 @@
> >  /*	$OpenBSD: lock_machdep.c,v 1.9 2020/04/15 08:09:00 mpi Exp $	*/
> >  
> >  /*
> > + * Copyright (c) 2021 George Koehler <[email protected]>
> >   * Copyright (c) 2007 Artur Grabowski <[email protected]>
> >   *
> >   * Permission to use, copy, modify, and distribute this software for any
> > @@ -22,15 +23,29 @@
> >  #include <sys/atomic.h>
> >  
> >  #include <machine/cpu.h>
> > -#include <machine/psl.h>
> >  
> >  #include <ddb/db_output.h>
> >  
> > +/*
> > + * If __ppc_lock() crosses a page boundary in the kernel text, then it
> > + * may cause a page fault (on G5 with ppc_nobat), and pte_spill_r()
> > + * would recursively call __ppc_lock().  The lock must be in a valid
> > + * state when the page fault happens.
> > + *
> > + * This lock has 2 fields, "cpuid" and "count", packed in a 32-bit
> > + * "bolt", so we can use 32-bit atomic ops to ensure that our lock is
> > + * always in a valid state.
> > + */
> > +#define BOLT(cpuid, count)	((cpuid) << 24 | (count) & 0x00ffffff)
> > +#define BOLT_CPUID(bolt)	((bolt) >> 24)
> > +#define BOLT_COUNT(bolt)	((bolt) & 0x00ffffff)
> > +#define BOLT_LOCKED(bolt)	(BOLT_COUNT(bolt) != 0)
> > +#define BOLT_UNLOCKED(bolt)	(BOLT_COUNT(bolt) == 0)
> > +
> >  void
> >  __ppc_lock_init(struct __ppc_lock *lock)
> >  {
> > -	lock->mpl_cpu = NULL;
> > -	lock->mpl_count = 0;
> > +	lock->mpl_bolt = 0;
> >  }
> >  
> >  #if defined(MP_LOCKDEBUG)
> > @@ -46,12 +61,12 @@ static __inline void
> >  __ppc_lock_spin(struct __ppc_lock *mpl)
> >  {
> >  #ifndef MP_LOCKDEBUG
> > -	while (mpl->mpl_count != 0)
> > +	while (BOLT_LOCKED(mpl->mpl_bolt))
> >  		CPU_BUSY_CYCLE();
> >  #else
> >  	int nticks = __mp_lock_spinout;
> >  
> > -	while (mpl->mpl_count != 0 && --nticks > 0)
> > +	while (BOLT_LOCKED(mpl->mpl_bolt) && --nticks > 0)
> >  		CPU_BUSY_CYCLE();
> >  
> >  	if (nticks == 0) {
> > @@ -64,33 +79,23 @@ __ppc_lock_spin(struct __ppc_lock *mpl)
> >  void
> >  __ppc_lock(struct __ppc_lock *mpl)
> >  {
> > -	/*
> > -	 * Please notice that mpl_count gets incremented twice for the
> > -	 * first lock.  This is on purpose.  The way we release the lock
> > -	 * in mp_unlock is to decrement the mpl_count and then check if
> > -	 * the lock should be released.  Since mpl_count is what we're
> > -	 * spinning on, decrementing it in mpl_unlock to 0 means that
> > -	 * we can't clear mpl_cpu, because we're no longer holding the
> > -	 * lock.  In theory mpl_cpu doesn't need to be cleared, but it's
> > -	 * safer to clear it and besides, setting mpl_count to 2 on the
> > -	 * first lock makes most of this code much simpler.
> > -	 */
> > +	int id = curcpu()->ci_cpuid;
> >  
> >  	while (1) {
> > -		int s;
> > -
> > -		s = ppc_intr_disable();
> > -		if (atomic_cas_ulong(&mpl->mpl_count, 0, 1) == 0) {
> > -			membar_enter();
> > -			mpl->mpl_cpu = curcpu();
> > -		}
> > +		unsigned int bolt = mpl->mpl_bolt;
> >  
> > -		if (mpl->mpl_cpu == curcpu()) {
> > -			mpl->mpl_count++;
> > -			ppc_intr_enable(s);
> > +		if (BOLT_UNLOCKED(bolt)) {
> > +			/* Try to grab the lock.  count = 1 */
> > +			if (atomic_cas_uint(&mpl->mpl_bolt, bolt,
> > +			    BOLT(id, 1)) == bolt) {
> > +				membar_enter();
> > +				break;
> > +			}
> > +		} else if (BOLT_CPUID(bolt) == id) {
> > +			/* We hold the lock.  count++ */
> > +			mpl->mpl_bolt = BOLT(id, BOLT_COUNT(bolt) + 1);
> >  			break;
> >  		}
> > -		ppc_intr_enable(s);
> >  
> >  		__ppc_lock_spin(mpl);
> >  	}
> > @@ -99,72 +104,18 @@ __ppc_lock(struct __ppc_lock *mpl)
> >  void
> >  __ppc_unlock(struct __ppc_lock *mpl)
> >  {
> > -	int s;
> > +	unsigned int bolt = mpl->mpl_bolt;
> >  
> >  #ifdef MP_LOCKDEBUG
> > -	if (mpl->mpl_cpu != curcpu()) {
> > +	if (BOLT_CPUID(bolt) != curcpu()->ci_cpuid ||
> > +	    BOLT_COUNT(bolt) == 0) {
> >  		db_printf("__ppc_unlock(%p): not held lock\n", mpl);
> >  		db_enter();
> >  	}
> >  #endif
> >  
> > -	s = ppc_intr_disable();
> > -	if (--mpl->mpl_count == 1) {
> > -		mpl->mpl_cpu = NULL;
> > -		membar_exit();
> > -		mpl->mpl_count = 0;
> > -	}
> > -	ppc_intr_enable(s);
> > -}
> > -
> > -int
> > -__ppc_release_all(struct __ppc_lock *mpl)
> > -{
> > -	int rv = mpl->mpl_count - 1;
> > -	int s;
> > -
> > -#ifdef MP_LOCKDEBUG
> > -	if (mpl->mpl_cpu != curcpu()) {
> > -		db_printf("__ppc_release_all(%p): not held lock\n", mpl);
> > -		db_enter();
> > -	}
> > -#endif
> > -
> > -	s = ppc_intr_disable();
> > -	mpl->mpl_cpu = NULL;
> > +	/* count-- */
> > +	bolt = BOLT(BOLT_CPUID(bolt), BOLT_COUNT(bolt) - 1);
> >  	membar_exit();
> > -	mpl->mpl_count = 0;
> > -	ppc_intr_enable(s);
> > -
> > -	return (rv);
> > -}
> > -
> > -int
> > -__ppc_release_all_but_one(struct __ppc_lock *mpl)
> > -{
> > -	int rv = mpl->mpl_count - 2;
> > -
> > -#ifdef MP_LOCKDEBUG
> > -	if (mpl->mpl_cpu != curcpu()) {
> > -		db_printf("__ppc_release_all_but_one(%p): not held lock\n", mpl);
> > -		db_enter();
> > -	}
> > -#endif
> > -
> > -	mpl->mpl_count = 2;
> > -
> > -	return (rv);
> > -}
> > -
> > -void
> > -__ppc_acquire_count(struct __ppc_lock *mpl, int count)
> > -{
> > -	while (count--)
> > -		__ppc_lock(mpl);
> > -}
> > -
> > -int
> > -__ppc_lock_held(struct __ppc_lock *mpl, struct cpu_info *ci)
> > -{
> > -	return mpl->mpl_cpu == ci;
> > +	mpl->mpl_bolt = bolt;
> >  }

-- 
George Koehler <[email protected]>
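The "bolt" packing in the diff can be exercised on its own.  Below is
a hedged userland copy of the BOLT macros (the macro definitions are
taken verbatim from the diff; bolt_recurse() is a made-up helper for
illustration).  Because `&` binds tighter than `|`, BOLT() parses as
`((cpuid) << 24) | ((count) & 0x00ffffff)`, and the whole cpuid/count
pair fits in one 32-bit word, so a single store or CAS can never
expose a half-updated lock state:

```c
/* Same packing as the diff: cpuid in bits 31..24, count in bits 23..0. */
#define BOLT(cpuid, count)	((cpuid) << 24 | (count) & 0x00ffffff)
#define BOLT_CPUID(bolt)	((bolt) >> 24)
#define BOLT_COUNT(bolt)	((bolt) & 0x00ffffff)
#define BOLT_LOCKED(bolt)	(BOLT_COUNT(bolt) != 0)
#define BOLT_UNLOCKED(bolt)	(BOLT_COUNT(bolt) == 0)

/* Hypothetical helper: a recursive grab (count++, same cpuid), the
 * transition __ppc_lock() performs when it already holds the lock. */
unsigned int
bolt_recurse(unsigned int bolt)
{
	return BOLT(BOLT_CPUID(bolt), BOLT_COUNT(bolt) + 1);
}
```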
