On 3/10/20, Andrew Doran <a...@netbsd.org> wrote:
> Following up again.. It turns out the deadlocks I saw with this were down
> to a problem with kernel_lock that has since been solved.
>
> I made a couple more tweaks to it: up the max number of tracked holds to 8
> because vmobjlock became a rwlock, and because aiodoned is gone now be more
> selective giving up and going to sleep in the softint case (walk down the
> list of pinned LWPs on curcpu and see if any of them holds the lock that's
> wanted, if not held on curcpu then some other CPU has it, and it's OK to
> spin).
I think the proposed patch avoidably adds overhead to the most common use case in order to cover a corner case. Sensibly bounded spinning on readers can be done in a speculative fashion, i.e. without actively tracking whether there are threads off CPU. FreeBSD is not completely there yet, but the general idea is pretty simple: once a writer shows up it sets a bit (RW_SPINNER or so), then it can observe whether the read owners are draining, with some pre-defined upper bound on how long it is willing to wait. There is no spinning if there are already blocked would-be writers. This may waste some cycles initially, but the waste is bounded. As a side note, while waiting for readers to drain, the reader count can be used to estimate how long to wait before re-reading the lock word, instead of re-reading it on every pause.

> void
> +rw_enter(krwlock_t *rw, const krw_t op)
> +{
> [..]
> +
> +	/*
> +	 * Read the lock owner field.  If the need-to-wait
> +	 * indicator is clear, then try to acquire the lock.
> +	 */
> +	owner = rw->rw_owner;
> +	if (__predict_true((owner & need_wait) == 0)) {
> +		next = rw_cas(rw, owner, (owner + incr) & mask);
> +		if (__predict_true(next == owner)) {
> +			/* Got it! */
> +			RW_LOCKED(rw, op);
> +			KPREEMPT_ENABLE(l);
> +			RW_MEMBAR_ENTER();
> +			return;
> +		}
> +	}

Both this and the unlock path lose throughput when the lock is heavily used by readers. The original amd64 fast path has a tight loop which immediately retries the cmpxchg, while the patched variant gives up after a single attempt.

Assume the lock is only used by readers. The patched rw_enter will always conclude it can rw_cas on it. However, the cas can very easily fail when racing against another rw_enter or rw_exit, and in that case execution falls to the slow path, which starts with a re-read and then performs a tight loop. Whether a strict tight loop is a good idea is highly architecture-dependent; I believe it is the best default on amd64, while other architectures may want some form of pause-equivalent.
Nonetheless, the fallback to a re-read is definitely pessimal and will keep happening under load. I don't know if it is worth the effort at this stage, but this will eventually have to morph into an xadd-based lock. As noted elsewhere, cmpxchg-based loops degrade under load to a significantly higher degree (of course everything is already pretty bad if you have a heavily bounced lock in the first place). Note that the conversion would also take care of the pause problem.

I was looking at doing this in FreeBSD, but there are some constraints which pose important design questions, and to my understanding your code shares most of them (if not all). Blindly adding into the lock word means there is no space to store the entire thread pointer (since an add could partially overwrite the address), but the owner needs to be findable in some manner in order to lend it priority.

Assuming 32-bit architectures can stick to the current code, there are some options which avoid growing the lock. One variant is "compressing" the pointer: if all threads are guaranteed to come from a pre-defined VA range, the lock can store only the relevant portion of the address, which can be pretty small. Alternatively there can be a lockless hash of threads keyed by thread id. Or this can use a variant of the scheme you implemented here, except done by the writer: it denotes in the lock word that it went off CPU with the relevant turnstile lock held, and everyone waiting can then look the owner up based on the lock. Of course one can also "give up" and add another word for the pointer, at which point this mostly degrades to the usual read-write lock problems.

-- 
Mateusz Guzik <mjguzik gmail.com>