Hi, On 2023-06-27 15:33:57 +1200, Thomas Munro wrote: > On Tue, Jun 27, 2023 at 2:05 PM Andres Freund <and...@anarazel.de> wrote: > > Unfortunately it scaled way worse at first. This is not an inherent issue, > > but > > due to an implementation choice in ReadRecentBuffer(). Whereas the normal > > BufferAlloc() path uses PinBuffer(), ReadRecentBuffer() first does > > LockBufHdr(), checks if the buffer ID is the same and then uses > > PinBuffer_Locked(). > > > > The problem with that is that PinBuffer() takes care to not hold the buffer > > header spinlock, it uses compare_exchange to atomically acquire the pin, > > while > > guaranteing nobody holds the lock. When holding the buffer header spinlock, > > there obviously is the risk of being scheduled out (or even just not have > > exclusive access to the cacheline). > > Yeah. Aside from inherent nastiness of user-space spinlocks
I've been wondering about making our backoff path use futexes, after some adaptive spinning. > > The fairly obvious solution to this is to just use PinBuffer() and just > > unpin > > the buffer if its identity was changed concurrently. There could be an > > unlocked pre-check as well. However, there's the following comment in > > ReadRecentBuffer(): > > * It's now safe to pin the buffer. We can't pin > > first and ask > > * questions later, because it might confuse code > > paths like > > * InvalidateBuffer() if we pinned a random > > non-matching buffer. > > */ > > > > But I'm not sure I buy that - there's plenty other things that can briefly > > acquire a buffer pin (e.g. checkpointer, reclaiming the buffer for other > > contents, etc). > > I may well have been too cautious with that. The worst thing I can > think of right now is that InvalidateBuffer() would busy loop (as it > already does in other rare cases) when it sees a pin. Right. Particularly if we were to add a pre-check for the tag to match, that should be extremely rare. > > Another difference between using PinBuffer() and PinBuffer_locked() is that > > the latter does not adjust a buffer's usagecount. > > > > Leaving the scalability issue aside, isn't it somewhat odd that optimizing a > > codepath to use ReadRecentBuffer() instead of ReadBuffer() leads to not > > increasing usagecount anymore? > > Yeah, that is not great. The simplification you suggest would fix > that too, though I guess it would also bump the usage count of buffers > that don't have the tag we expected; that's obviously rare and erring > on a better side though. Yea, I'm not worried about that. If somebody is, we could just add code to decrement the usagecount again. > > FWIW, once that's fixed, using ReadRecentBuffer() for _bt_getroot(), caching > > the root page's buffer id in RelationData, seems a noticeable win. About 7% > > in > > a concurrent, read-only pgbench that utilizes batches of 10. And it should > > be > > easy to get much bigger wins, e.g. with a index nested loop with a > > relatively > > small index on the inner side. > > Wooo, that's better than I was hoping. Thanks for trying it out! I > think, for the complexity involved (ie very little) I don't really have a concrete thought for where to store the id of the recent buffer. I just added a new field into some padding in RelationData, but we might go for something fancier. > smgr_targblock could be another easy-to-cache candidate, ie a place where > there is a single interesting hot page that we're already keeping track of > with no requirement for new backend-local mapping machinery. I wonder if we should simple add a generic field for such a Buffer to RelationData, that the AM can use as it desires. For btree that would be the root page, for heap the target block ... > it's a nice result, and worth considering even though it's also a solid clue > that we could do much better than this with a (yet to be designed) > longer-lived pin scheme. Indeed. PinBuffer() is pretty hot after the change. As is the buffer content lock. Particularly for the root page, it'd be really interesting to come up with a scheme that keeps an offline copy of the root page while also pinning the real root page. I think we should be able to have a post-check that can figure out if the copied root page is out of date after searching it, without needing the content lock. Greetings, Andres Freund