On Mon, Feb 17, 2025 at 5:55 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> The solution we agreed on is to introduce a way for StartReadBuffers()
> to communicate with future calls, and "forward" pinned buffers between
> calls. The function arguments don't change, but its "buffers"
> argument becomes an in/out array: one StartReadBuffers() call can
> leave extra pinned buffers after the ones that were included in a
> short read (*nblocks), and then when you retry (or possibly extend)
> the rest of the read, you have to pass them back in. That is easy for
> the read stream code, as it can just leave them in its circular queue
> for the next call to take as input. It only needs to be aware of them
> for pin limit accounting and stream reset (including early shutdown).
BTW here's a small historical footnote about that: I had another solution to the same problem in the original stream proposal[1], which used a three-step bufmgr API. You had to call PrepareReadBuffer() for each block you intended to read, then StartReadBuffers() for each cluster of adjacent misses it reported, and finally WaitReadBuffers(); hits required only the first step. That was intended to allow the stream to manage the prepared buffers itself: short reads would leave some of them for a later call. Based on review feedback, I simplified to arrive at the two-step API, ie just start and wait, and PrepareReadBuffer() became the private helper function PinBufferForBlock(). I had to invent the "hide-the-trailing-hit" trick to avoid an unpin/repin sequence in the two-step API, thinking we might eventually want to consider the three-step API again later. This new design achieves the same end result: buffers that can't be part of an I/O stay pinned and in the stream's queue, ready for the next call, just like "prepared" buffers in the early prototypes, but it keeps the two-step bufmgr API and calls them "forwarded".

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com